It’s a new year, and time for resolutions and all that. In that spirit, I’ve put together a list of 4 things you can do to improve your load balancing infrastructure. Some are quick, some are more involved, but they’ll all pay huge dividends.

Number 1: Get Your Sniffing In Order
Take some time to prepare your infrastructure to make sure you can sniff the points where traffic enters and leaves your load balancing infrastructure. If you’re lucky, your load balancer has the tcpdump utility built in. If that’s the case, you’re done. If not, you’ll need a way to sniff traffic on all LANs that your load balancer operates on.
Without tcpdump on your load balancer, you’ll need to set up some type of span/mirror port on your switching infrastructure, or put a hub near the ingress/egress point. This can be somewhat of a pain to set up, which is why it’s best to do it when there isn’t a problem that needs diagnosing.
What works great is some type of Unix or Windows box that you can log into remotely and that has a spare Gigabit Ethernet port. Have a couple of ports on your Layer 2 infrastructure switches ready to be turned into span/mirror ports, and all you need to do is plug the spare Ethernet port into one of them to do the sniffing. You can of course move the cable around to different span ports as needed.
If you’re really in a pinch, you can plug a hub in between the load balancer and the servers, or between the load balancer and the Internet, and sniff traffic that way. You’ll probably degrade network performance some, though, since a hub turns that segment into a shared, half-duplex network where collisions can occur.
Number 2: Run MRTG/RRDTool
Management guru Peter Drucker said “What gets measured, gets managed”, and that’s certainly true for networks. Installing MRTG/RRDTool on your network (if you haven’t already) and pulling stats from your load balancer will definitely help you manage your infrastructure.
Getting interface bandwidth stats over SNMP is trivial with any load balancer. Most also support extended, vendor-specific SNMP objects, which will get you even more detailed metrics on your load balancing infrastructure.
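To give a feel for how polled counters become a bandwidth graph, here is a minimal sketch that turns two samples of an interface octet counter into a bits-per-second rate, allowing for one wrap of a 32-bit counter. The function name, the sample values, and the 300-second polling interval are assumptions made up for illustration; they don’t come from any particular load balancer or from MRTG’s own code.

```python
COUNTER32_MAX = 2**32  # a 32-bit counter wraps back to zero at this value


def bits_per_second(prev_octets, curr_octets, interval_seconds):
    """Average rate between two octet-counter samples, in bits per second.

    Assumes the counter wrapped at most once between the two polls.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        # The counter wrapped between polls; add one full counter range.
        delta += COUNTER32_MAX
    return delta * 8 / interval_seconds


# Hypothetical samples taken 300 seconds apart.
rate = bits_per_second(1_000_000, 38_500_000, 300)
print(f"{rate / 1_000_000:.2f} Mbps")
```

On a busy Gigabit link a 32-bit counter can wrap more than once between polls, so the wider 64-bit counters are worth using if your device offers them.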
Number 3: Check For Ethernet Errors
It’s always a good idea to check your Ethernet interfaces for errors. If you’ve got a managed switch, such as a Cisco Catalyst, the command is something along the lines of “show port counters [port]”.
Port  Align-Err  FCS-Err    Xmit-Err   Rcv-Err    UnderSize
----- ---------- ---------- ---------- ---------- ---------
3/2   1511       213        0          0          0

Port  Single-Col Multi-Coll Late-Coll  Excess-Col Carri-Sen Runts     Giants
----- ---------- ---------- ---------- ---------- --------- --------- ---------
3/2   0          0          0          0          0         5511      0
On a Linux, FreeBSD, or other Unix-type system, “ifconfig -a” usually does it.
RX packets:157721935 errors:5869 dropped:6008 overruns:5869 frame:0
TX packets:172114172 errors:0 dropped:0 overruns:15 carrier:0
collisions:0 txqueuelen:1000
In both cases, there are errors on the interface. These types of errors can be caused by a couple of things (including faulty wiring), but the usual suspect is a duplex mismatch.
Duplex mismatches are insidious because when they occur, you can still pass traffic. With light traffic, you won’t even notice a difference. With moderate to heavy traffic, things will get slow and connections will drop, but you’ll still pass traffic. Other failures block traffic completely, which makes them easy to spot; a duplex mismatch isn’t always obvious, and it’ll make traffic crawl.
Duplex auto-detect at 100 Mbps doesn’t really work that well. With Gigabit, the protocol is much more refined, and I’ve yet to see a duplex mismatch even on auto when Gigabit is used. But with 100 Mbps, you can’t really trust auto-detect.
The solution: Always hard-code speed and duplex on 100 Mbps links (on both ends, since hard-coding only one side is itself a common cause of mismatches), and check your interfaces occasionally for errors.
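The occasional error check is easy to script. Below is a minimal sketch that pulls the counters out of ifconfig-style RX/TX lines and reports any error counters above zero. It assumes the older colon-separated output format shown above (newer ifconfig versions print these fields differently), and the function names and the set of fields treated as errors are my own choices for illustration.

```python
import re

# Counters that should normally stay at zero on a healthy link (an assumption).
ERROR_FIELDS = {"errors", "dropped", "overruns", "frame", "carrier", "collisions"}


def parse_counters(line):
    """Return the name:value pairs from one ifconfig RX or TX line as a dict."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def nonzero_errors(line):
    """Return only the error-type counters that are above zero."""
    counters = parse_counters(line)
    return {k: v for k, v in counters.items() if k in ERROR_FIELDS and v > 0}


# The sample lines from the ifconfig output above.
rx = "RX packets:157721935 errors:5869 dropped:6008 overruns:5869 frame:0"
tx = "TX packets:172114172 errors:0 dropped:0 overruns:15 carrier:0"

print("RX:", nonzero_errors(rx))
print("TX:", nonzero_errors(tx))
```

Counters only ever go up, so what matters is whether they’re still climbing. Run the check twice a few minutes apart and compare the numbers before you start swapping cables.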
Number 4: Check For Software Updates
Even if there’s no immediate need to update your code, it’s a good idea to keep current. It’s a lot easier to schedule and apply updates at regular intervals than it is to find yourself in a situation where you need to be on the latest code and you’re several major versions back.

