You’re stumped. There’s a problem with your infrastructure, and you’re not positive what it is. You checked a few things out, but the symptoms befuddle you. You’re pretty sure it’s not the load balancer, but everyone is pointing at you, and you’ve got no proof.
What do you do?
I’ve been in that situation so many times that I’ve developed a checklist that can be worked through quickly. This checklist has a couple of benefits:
- It’s methodical and process-based, so it can pick up both the obvious problems and the oddities
- When different groups are responsible for different parts of the infrastructure, it provides clear lines of demarcation between them and makes it easier for them to work together
- If the problem lies with the load balancer, this troubleshooting will point to the problem in about 90% of the cases
- If the problem lies elsewhere, this troubleshooting will provide hard evidence to back up that claim
The heart of this checklist is the 4-step process basic to all load balancing:
[Diagram: The 4-step process, basic to all load balancing]
This process starts at the beginning, from the client’s perspective, and moves through the entire connection end to end, testing to make sure everything is hunky dory along the way.
- Make sure the load balancer sees the connection
- Determine how the load balancer handles incoming connections (Layer 4 or Layer 7)
- Check connectivity from the load balancer to the server
As you go through the list, if you find a problem, resolve it before you continue. There may be other problems, but you need to fix the first one you encounter before moving on; otherwise there will be too many variables.
This checklist is particularly useful in situations where you don’t have access to all of the equipment on the network, such as large enterprises where separate groups are responsible for areas like firewalls, network routing, switch infrastructure, and servers.
Prepping for the Checklist
The tools you’ll need for this checklist are telnet, openssl, and tcpdump (or some other network sniffer). It’s best to run tcpdump on the load balancer itself (it’s included on most load balancers), but if that’s not possible, set up a network tap of some sort. For this checklist, we’ll assume you’re using a load balancer with tcpdump.
Step #1: Confirm that the connection is reaching the load balancer
Step 1 is simply to ensure that connections are going where they are supposed to go. This is obviously useful if the connection times out, but it also helps when you get a definite reaction (connection accepted or connection refused): it at least proves that the load balancer, and not some other device, is the one sending the response.
In this test, we’re only concerned with whether the connection is reaching the load balancer (step 1 in the diagram above). To do this, run tcpdump on the load balancer with the following options:
tcpdump -i [interface] -n host [ip address of virtual] and port [port of virtual]
Then telnet to the IP and port of the virtual service on the load balancer. If you’re doing SSL termination at the load balancer, use telnet anyway, since we’re only testing for a valid TCP connection. It’s best to do this from a different subnet than the one the virtual service is on, so as to eliminate routing issues.
You can try ping, but it doesn’t really tell us anything. For one, ICMP is not the protocol we’re concerned with. Also, firewall rules may block ICMP and not TCP, or block TCP and not ICMP. Either way, telnet works much better because it operates at the TCP level and mimics a connection from a browser.
Typically one of three things will happen:
- Connection refused
- Nothing connects, and the operation times out
- A connection is made
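If you want to repeat the telnet test from several places, or keep a record of it, it can be scripted. Here’s a minimal Python sketch (the `probe` name and the 3-second default timeout are my own choices, not part of any tool) that makes a plain TCP connection and sorts the result into the same three outcomes. Keep tcpdump running on the load balancer while you use it: a “refused” only tells you that something sent back a reset, not which device it was.

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection the way telnet would and classify the result.

    Returns "connected", "refused", or "timeout". Any other socket error
    (for example "no route to host") is returned as "error: <message>".
    """
    try:
        # create_connection performs the full TCP three-way handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # A reset came back: some device actively refused the connection
        return "refused"
    except socket.timeout:
        # Neither a SYN-ACK nor a reset arrived before the timeout
        return "timeout"
    except OSError as exc:
        return f"error: {exc.strerror or exc}"
```

For example, `probe("192.168.0.200", 80)` tests the virtual service used in the examples later on.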
What we’re looking for is to see if the load balancer sees the attempted incoming connection. If the load balancer doesn’t see the incoming connection, it may be a routing issue (either Layer 3 or even Layer 2) or it may be that a firewall rule is blocking the connection.
In any event, if you’re not seeing the connection, stop at this step, and figure out why. If you’re dealing with different networking groups, you can bring this tcpdump information to them and they’ll have something substantive to go on.
If you do see the incoming connection, move on to the next step.
Step #2: How Is The Load Balancer Handling The Connection?
As mentioned in a previous blog entry, load balancers behave differently depending on whether the virtual service is configured for Layer 4 or Layer 7. A Layer 4-configured virtual service will not complete a TCP connection unless it can make a connection all the way through to a real server. With a Layer 7-configured virtual service, as long as you can reach the IP and port of the load balancer, you’ll probably get an established TCP connection (although some load balancers allow you to change this behavior).
Most load balancers don’t tell you explicitly whether you’re running in Layer 4 or Layer 7 mode. They switch between one and the other automatically depending on how you configure the virtual service. Only one load balancer that I know of tells you (KEMP Technologies). With the others, however, you’re pretty much left to deduce this on your own.
So how do you tell? It depends on the vendor, but generally if you’re using any type of cookie persistence, SSL termination, content rules, or programming language on the load balancer, you’re in Layer 7. In F5’s BIG-IP V9, when you set up a virtual server, there are a few options for the type of virtual server to set up.
Standard and Performance HTTP are Layer 7 configurations, while the others are Layer 4-limited.
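The rule of thumb above boils down to a simple check. In this sketch the feature names are hypothetical labels for the options just described, not real configuration keys from any vendor, so treat the result as a first guess and confirm it against your vendor’s documentation.

```python
# Hypothetical labels for the Layer 7 features described in the text
LAYER7_FEATURES = {"cookie_persistence", "ssl_termination", "content_rules", "scripting"}

def likely_layer(features):
    """Guess Layer 4 vs Layer 7 from the features enabled on a virtual service.

    Returns (layer, reasons), where reasons lists the Layer 7 features found.
    """
    # Each of these generally requires the load balancer to terminate the
    # client's TCP connection and inspect the payload itself
    reasons = sorted(set(features) & LAYER7_FEATURES)
    return (7 if reasons else 4), reasons
```

For example, `likely_layer(["ssl_termination", "round_robin"])` returns `(7, ["ssl_termination"])`, which tells you to expect the Layer 7 behavior described next.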
So what happens when you try to connect to a Layer 7 virtual service that has connectivity problems with real servers on the back end?
First, the connection will be accepted:
system1> telnet 192.168.0.200 80
Trying 192.168.0.200...
Connected to testvip (192.168.0.200).
Escape character is '^]'.
A valid TCP connection has been established. If you’re troubleshooting and you get this, you may assume that the device you’ve connected to is the server, but this is not the case. You never connect directly to the server when the load balancer operates at Layer 7. In this example, none of the real servers are online; the BIG-IP shows all of them as unavailable. Yet I was still able to make a connection.
Now I’ll do a simple “GET /”. What happens with this “GET /” depends on the vendor, and even on the version. Take, for example, BIG-IP version 4 and BIG-IP version 9.
With version 9, this happens:
GET /
As soon as I hit <Enter>, the connection is closed by the BIG-IP sending a reset packet.
Connection closed by foreign host.
13:37:36.679409 192.168.0.200:80 > 192.168.2.2.33962: R 1:1(0) ack 7 win 4387 (DF)
With BIG-IP version 4, the behavior is different: it hangs for a while before sending the reset. This can make you think that the web server is hanging, but again, you’re talking to the load balancer, not the server.
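You can reproduce this test with a script, too. The sketch below (the `l7_probe` name and 5-second timeout are my own) connects, sends `GET /` followed by a blank line, just like typing it into telnet and pressing <Enter> twice, and reports whether data came back, the connection was reset (the version 9 behavior above), it was closed cleanly, or nothing arrived before the timeout. Be careful with that last result: a “hang” looks the same whether it comes from a version 4-style load balancer or a genuinely stuck server, which is why you still want tcpdump running.

```python
import socket

def l7_probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Send a bare 'GET /' and classify how the far end reacts.

    Returns "response: <first line>", "reset", "closed", or "hang".
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Same bytes telnet sends for: GET / <Enter> <Enter>
        sock.sendall(b"GET /\r\n\r\n")
        try:
            data = sock.recv(4096)
        except ConnectionResetError:
            # A reset arrived instead of data
            return "reset"
        except socket.timeout:
            # Connection stays open, but nothing comes back
            return "hang"
    if not data:
        # Orderly close (FIN) with no payload
        return "closed"
    first_line = data.split(b"\r\n", 1)[0].decode("latin-1")
    return f"response: {first_line}"
```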
Step #3: Connectivity From The Load Balancer To The Server
First off, perform some sort of test to see if the real servers are even operational. Open up a browser, plug in a server’s IP address (and port), and see if you can bring up a site. If that doesn’t work, or if the server runs a non-HTTP protocol, use telnet to see if you can get a TCP connection. If you can’t, you’ll want to figure out why: if the servers aren’t responding, you’re obviously not going to get far with a load balancer.
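The browser check can also be scripted so you can run it against each server in turn. This sketch uses Python’s standard `http.client` to request `/` directly from a real server, bypassing the load balancer; `check_server` is a hypothetical name, and any HTTP status at all (even a 404 or 500) counts as a sign that the server is up and speaking HTTP, which is all this step is trying to establish.

```python
import http.client
import socket

def check_server(host: str, port: int = 80, timeout: float = 3.0) -> str:
    """Request / directly from a real server and summarize what happened."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        # Any status line proves the server is up and speaking HTTP
        return f"HTTP {resp.status} {resp.reason}"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except (OSError, http.client.HTTPException) as exc:
        return f"error: {exc}"
    finally:
        conn.close()
```

For example, `check_server("10.0.0.100")` checks the server used in the telnet example below.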
If you can get to the servers, log onto the load balancer and telnet from the load balancer to the real server on the port configured. Try connecting to at least one of the servers in a multi-server group.
> telnet 10.0.0.100 80
Again, one of three things will likely happen:
- You’ll get a valid TCP connection. If this occurs, try to make an HTTP request. A simple “GET /” and <enter><enter> (hit enter twice) should suffice to get some sort of response. As long as you get some sort of response, that’s good.
- You’ll get a connection refused. If this happens, it’s either because a firewall is blocking you or because the server isn’t listening on that port.
- Your connection will time out. For whatever reason, packets aren’t getting to the server. This can be either a firewalling issue or some other routing issue. If you’ve got access to run tcpdump or another type of network trace on the server, see if you can see the incoming connection from the load balancer.
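With several servers in a group, it helps to run the same connection test against each of them and line the results up against the three cases above. This is a sketch under a couple of assumptions: that you can run Python from the load balancer or from a host on the same network segment (many load balancers won’t let you), and that you supply your own server list. The diagnosis strings simply restate the cases above, plus a catch-all for other socket errors.

```python
import socket

# Likely causes, paraphrased from the three cases in the text above
DIAGNOSIS = {
    "connected": "TCP OK; next, try an HTTP request (GET / and <Enter> twice)",
    "refused": "a firewall is rejecting you, or nothing is listening on that port",
    "timeout": "packets aren't reaching the server (firewall or routing); trace on the server",
    "error": "other socket error (e.g. no route to host); check routing",
}

def tcp_outcome(host: str, port: int, timeout: float) -> str:
    """Classify a single TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"

def sweep(servers, timeout: float = 3.0):
    """Return (host, port, outcome, diagnosis) for each (host, port) pair."""
    results = []
    for host, port in servers:
        outcome = tcp_outcome(host, port, timeout)
        results.append((host, port, outcome, DIAGNOSIS[outcome]))
    return results
```

A run might look like `for row in sweep([("10.0.0.100", 80), ("10.0.0.101", 80)]): print(row)`, where the second address is a made-up pool member.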
After running these tests, you should have a much better picture of what’s going on with your network. If the issue was caused by the load balancer, these tests have probably spotted the root cause. If it wasn’t the load balancer, you now have evidence to prove it.