Eight years ago I wrote an article called “It’s Always The Load Balancer”, and eight years later, that’s still the case. But although the load balancer is the undeserved scapegoat for an infrastructure’s (and society’s) ills, there are a few things that actually are the load balancer’s fault (and the fault of those charged with its administration). And when these gotchas occur, the pitchfork- and torch-toting villagers who come for your head are somewhat justified.
These are the most common gotchas I’ve come across, and most of them have been utterly my fault.
#5: A Load Balancer’s Got To Know Its Limitations
It’s a good idea to tell the other groups involved in your infrastructure about the performance and functional limitations of your product. All products have limitations (connection rate, SSL TPS, throughput). You’ve got to understand them, and you’ve got to make sure those groups understand them too.
Otherwise, they might think it’s OK to put a 100 MByte video file on a server and overwhelm a 100 Mbit link. Or they get Slashdotted and end up sending tens of thousands of connections per second to an entry-level device.
After all, as the application developers and content producers, they are the ones who control how popular a site is, and as a result how much load ends up on the devices under your charge. So it’s best to make sure they understand the limitations.
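The limits conversation goes better with numbers attached. 100 MBytes is 800 Mbits, so one download of that file fills a 100 Mbit link for about eight seconds, and every concurrent download shares that same pipe. The Python sketch below does that arithmetic and compares a peak load against device limits. The metric names and every number in it are hypothetical illustrations, not the specs of any real product.

```python
# Back-of-the-envelope capacity check. All limits and loads are hypothetical
# example figures, not the specs of any real load balancer.


def transfer_seconds(file_megabytes: float, link_megabits_per_s: float) -> float:
    """Time to push one file over a link, ignoring protocol overhead."""
    return file_megabytes * 8 / link_megabits_per_s  # 8 bits per byte


def over_limit(limits: dict, expected: dict) -> list:
    """Return (sorted) the metrics where the expected peak exceeds the device limit."""
    return sorted(name for name, peak in expected.items()
                  if peak > limits.get(name, float("inf")))


if __name__ == "__main__":
    # One 100 MByte video monopolizes a 100 Mbit link for about 8 seconds.
    print(transfer_seconds(100, 100))  # 8.0

    entry_level = {"conn_per_s": 5_000, "ssl_tps": 500, "throughput_mbps": 100}
    slashdotted = {"conn_per_s": 20_000, "ssl_tps": 300, "throughput_mbps": 400}
    print(over_limit(entry_level, slashdotted))  # ['conn_per_s', 'throughput_mbps']
```

Metrics that have no published limit are treated as unlimited here, which is optimistic; a real review should fill in every limit the vendor documents.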
#4: Backups, What Backups?
Even in HA setups, it’s possible to have your configuration files completely blown away, whether by an upgrade gone terribly awry, a bad hardware day, or, more likely, a fat-fingered command. An HA pair is not a backup; with config sync turned on, a mistake on one unit can simply be copied to its peer. Save the configurations on a regular basis, and store them somewhere that isn’t the load balancer(s).
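Pulling the running configuration off the box is vendor-specific, so this Python sketch assumes you already have it as text and handles only the other half: writing a timestamped copy somewhere else and keeping a bounded history. The file-naming scheme and the 30-copy retention are arbitrary assumptions; run something like it from cron on a host that isn’t the load balancer.

```python
import datetime
import pathlib


def save_backup(config_text: str, backup_dir: pathlib.Path,
                now: datetime.datetime, keep: int = 30) -> pathlib.Path:
    """Write a timestamped copy of a load balancer config and prune old copies.

    backup_dir should live off the load balancer (file server, VCS checkout, ...).
    keep is the number of most recent copies to retain and should be >= 1.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    path = backup_dir / f"lb-config-{now:%Y%m%d-%H%M%S}.conf"
    path.write_text(config_text)
    # Timestamped names sort chronologically, so the oldest copies come first.
    backups = sorted(backup_dir.glob("lb-config-*.conf"))
    for old in backups[:-keep]:
        old.unlink()
    return path
```

A typical call is `save_backup(text, pathlib.Path("/srv/lb-backups"), datetime.datetime.now())`, where the directory is a hypothetical mount on a separate server.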
#3: Duplex Mismatch
Unlike Gigabit Ethernet, 100 Mbps Fast Ethernet duplex negotiation doesn’t !#@*ing work. At least not the way you would think it would. If one side is set to autonegotiate and the other side is hard-set to full or half, the auto end can’t learn the other side’s duplex setting and falls back to half duplex, so a hard-set full-duplex peer leaves you with a mismatch. And sometimes with older drivers, even with both ends set to autonegotiate, they won’t negotiate correctly.
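A hard-set port doesn’t send autonegotiation information, so its auto-negotiating partner can detect only the speed and must assume half duplex. The toy Python model below encodes just that fallback rule to show which 100 Mbps setting combinations end in a mismatch. It assumes well-behaved drivers that advertise full duplex, so it deliberately does not model the flaky-driver case.

```python
def effective_duplex(local: str, peer: str) -> str:
    """Duplex a 100 Mbps port ends up using. Settings: 'auto', 'full' or 'half'."""
    if local != "auto":
        return local  # a hard-set port uses whatever it was told
    if peer == "auto":
        return "full"  # both advertise; assumes both ends advertise full duplex
    # The hard-set peer sends no autonegotiation information, so the auto side
    # can only detect the speed and must assume half duplex.
    return "half"


def link_state(a: str, b: str) -> tuple:
    """Return (duplex on side a, duplex on side b, mismatch?)."""
    duplex_a, duplex_b = effective_duplex(a, b), effective_duplex(b, a)
    return duplex_a, duplex_b, duplex_a != duplex_b


if __name__ == "__main__":
    for a, b in [("auto", "auto"), ("full", "full"), ("auto", "full"), ("auto", "half")]:
        print(a, b, link_state(a, b))
```

Only auto against a hard-set full-duplex port produces a mismatch, which is why the fix is to hard-set both ends the same way.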
The worst part of a duplex mismatch is that the link still works. You can pass traffic, and at low traffic levels there are almost no symptoms. But as traffic ramps up, you’ll start to see slowness that’s intermittent and easily blamed on congested Internet connections, or perhaps on the load balancer just being slow. The tell-tale signs are late collisions on the half-duplex side and CRC errors on the full-duplex side.
So make sure any load balancer links running at Fast Ethernet speeds are manually set to the same speed and duplex (ideally full duplex) on both the load balancer side and the Ethernet switch side.
Gigabit Ethernet seems to work just fine (autonegotiation is mandatory for 1000BASE-T), so if both ends are set to auto, you should be good to go.
#2: Persistence on Stateful Apps
You configure the virtual service, the real servers, and all the associated goodies, pull the page up in the browser, find that it comes up as expected, and understandably pat yourself on the back (or, like me, do an NFL-inspired victory dance). You hand the configuration over to the app guys and wait for the impending adulation, but instead you get angry emails. The app is going berserk. Sure, pages come up, but there are crazy redirects, and the shopping cart seems to have developed multiple personality disorder.
The issue is that you forgot to set up some sort of persistence. A stateful app typically keeps session state (the login, the shopping cart) on whichever server handled the earlier request, so when the load balancer sends the next request to a different server, that state isn’t there. Sure, they didn’t ask for persistence, but they probably didn’t know they needed it. So when setting up load balancing for an application, make sure to ask the app group whether they need persistence.
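The difference persistence makes is easy to see in a toy model. This Python sketch uses round-robin selection and, when persistence is on, pins each client key to the first server it was sent to. The key stands in for whatever your load balancer persists on (source IP, a cookie value); the class and server names are hypothetical, and real persistence entries expire after a timeout, which this sketch leaves out.

```python
import itertools


class Pool:
    """Toy round-robin pool with optional persistence keyed on a client ID."""

    def __init__(self, servers, persistent: bool):
        self._next = itertools.cycle(servers)
        self._persistent = persistent
        self._sticky = {}  # client key -> server it was first sent to

    def pick(self, client_key: str) -> str:
        if not self._persistent:
            return next(self._next)  # every request goes to the next server in turn
        if client_key not in self._sticky:
            self._sticky[client_key] = next(self._next)
        return self._sticky[client_key]


if __name__ == "__main__":
    servers = ["web1", "web2", "web3"]
    plain, sticky = Pool(servers, False), Pool(servers, True)
    print([plain.pick("10.1.1.1") for _ in range(3)])   # ['web1', 'web2', 'web3']
    print([sticky.pick("10.1.1.1") for _ in range(3)])  # ['web1', 'web1', 'web1']
```

Without persistence, one shopper’s three requests hit three servers, and two of them have never seen that shopper’s cart.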
#1: Where the #$%^ are my logs!?!?!
You’ve set up load balancing, and everything looks great. The app people have tested it out, and things look solid. A couple of weeks go by with a happy applications team, and then they go to munge their server logs. The problem is that you set up the virtual service as non-transparent: from the server’s perspective, every request appears to come from the load balancer, and the client’s IP address is lost. Lost, and you can’t ever get it back. Which leads to a conversation that goes something like this:
“Where the #@#! are my logs? No, seriously, the joke’s over, where the #@#$!!$ @#$# are my #@%@ing logs!?!?!”
Load balancers generally don’t log HTTP requests. With some load balancers you can set up rudimentary logging (such as an iRule on an F5), but it’ll quickly fill up a disk and likely impose a serious capacity penalty. And even if it didn’t, correlating those events with events on the server (such as database calls) would require some intensive munging. So no, the load balancer isn’t going to have those logs for you.
The missing client IPs can be fixed going forward, either by switching the virtual service to transparent mode or by inserting the client’s source IP address into an HTTP request header (commonly X-Forwarded-For) and having the servers log that header. But this isn’t going to bring back the source IPs that were already lost. Once they’re gone, they’re gone.
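If you go the header route, the server (or its logging layer) has to pick the client IP back out. This Python sketch assumes the header is named X-Forwarded-For, that the load balancer appends the address it saw to the end of it, and that the load balancer’s self-IP is the hypothetical 10.0.0.5. Headers are passed as a plain dict with exact-case keys for simplicity. It trusts the header only when the connection actually comes from the load balancer, because a client can send its own forged X-Forwarded-For.

```python
import ipaddress

# Hypothetical self-IP(s) of the load balancer; only these may set the header.
TRUSTED_PROXIES = {ipaddress.ip_address("10.0.0.5")}


def client_ip(peer_addr: str, headers: dict) -> str:
    """Recover the original client IP behind a non-transparent load balancer."""
    peer = ipaddress.ip_address(peer_addr)
    forwarded = headers.get("X-Forwarded-For", "")
    if peer not in TRUSTED_PROXIES or not forwarded:
        return str(peer)  # direct connection, or no header to go on
    # Our load balancer appends the address it saw, so take the rightmost entry;
    # anything to its left came from the client and may be forged.
    candidate = forwarded.split(",")[-1].strip()
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return str(peer)  # malformed header: fall back to the connecting address
```

Usage: `client_ip("10.0.0.5", {"X-Forwarded-For": "203.0.113.7"})` returns `"203.0.113.7"`, while the same header arriving directly from any other address is ignored.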

As for persistence, if you have multiple data centers and you are active-active across them, you will need to make sure that clients also stick to the same data center.
This gets a little tricky. Your GSLB device will also need to be configured for persistence, and since GSLB is an intelligent DNS system, you will need to take into account where the DNS requests are coming from: usually the client’s recursive resolver, not the client itself.
-Ron
siteredundancy.com
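Because the GSLB answers the resolver rather than the browser, DNS-level persistence has to key on the resolver’s address. This Python sketch pins each resolver network to one data center until it has been idle for a timeout. Grouping resolvers by /24 (so clients of an ISP spread across several resolvers in one block still land together), the data-center names, and the 300-second timeout are all hypothetical choices, and only IPv4 is considered.

```python
import ipaddress
import itertools


class GslbPersistence:
    """Toy DNS-based persistence: pin a resolver's network to one data center."""

    def __init__(self, data_centers, ttl_seconds: float, prefix_len: int = 24):
        self._next = itertools.cycle(data_centers)
        self._ttl = ttl_seconds
        self._prefix_len = prefix_len  # IPv4 prefix used to group resolvers
        self._table = {}  # resolver network -> (data center, last seen)

    def resolve(self, resolver_ip: str, now: float) -> str:
        net = ipaddress.ip_network(f"{resolver_ip}/{self._prefix_len}", strict=False)
        entry = self._table.get(net)
        if entry is not None and now - entry[1] < self._ttl:
            dc = entry[0]  # still persistent: same answer as before
        else:
            dc = next(self._next)  # new or expired: pick the next data center
        self._table[net] = (dc, now)  # refresh the idle timer on every lookup
        return dc


if __name__ == "__main__":
    gslb = GslbPersistence(["dc-east", "dc-west"], ttl_seconds=300)
    print(gslb.resolve("203.0.113.7", now=0))    # dc-east
    print(gslb.resolve("203.0.113.99", now=10))  # dc-east (same /24)
    print(gslb.resolve("198.51.100.1", now=20))  # dc-west
```

A client whose resolver moves to a different network can still end up in the other data center, so this narrows the problem rather than eliminating it.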