When Your Load Balancer Has A Short Attention Span
The ability for a load balancer to peer into (and potentially manipulate) the HTTP headers of incoming connections was once an advanced feature, but is now fairly commonplace. Most often it’s used for cookie-based persistence, but it’s also used in web switching, true-source IP resolution, and other tasks.
But the ability to look at the HTTP headers doesn’t always work the way you might think it would. Often, the load balancer can have a short attention span.
In a traditional HTTP 1.1 connection, multiple HTTP requests are sent through a single TCP connection. Most load balancers by default will only look at the first HTTP request, and ignore the rest. To elaborate on this, let’s take a look at two of the basic concepts of HTTP.
HTTP Basics
The HTTP protocol can be broken up into two parts: HTTP requests and HTTP responses. Both are made up of two components: HTTP headers and HTTP content.
In both HTTP requests and HTTP responses, there are always HTTP headers. In an HTTP request, there is sometimes content, such as a form POST, or uploading a file. In the HTTP response, there is usually content, but there are cases when there is none (such as with an HTTP HEAD request).
| HTTP Request | HTTP Response |
| --- | --- |
| HTTP Header (always) | HTTP Header (always) |
| HTTP Content (sometimes) | HTTP Content (usually) |
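To make the header/content split concrete, here’s a minimal Python sketch that cuts a raw HTTP message at the blank line separating the two. The function name `split_http_message`, the host name, and the sample messages are all made up for illustration; real parsers handle many more edge cases.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, content)."""
    # Headers end at the first blank line; whatever follows is the content.
    head, _, content = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, content


# A form POST: headers plus content (hypothetical example data).
request = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"user=tony"
)
start_line, headers, content = split_http_message(request)
print(start_line)   # POST /login HTTP/1.1
print(content)      # b'user=tony'

# A response to a HEAD request: headers only, no content.
head_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 1024\r\n"
    b"\r\n"
)
_, head_headers, head_content = split_http_message(head_response)
print(head_content)  # b''
```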
And there’s one more important bit to keep in mind with regard to HTTP: Every object has a separate request and a separate response. That’s every JPG, GIF, Flash file, HTML file, etc. So a web page with 20 images will generate 21 different HTTP requests: one for the HTML page itself, and 20 for the objects (such as images) referenced in the HTML file.
With HTTP 1.1, all of those 21 objects in a web page are typically requested in a single TCP stream, rather than 21 individual connections (which would be fairly inefficient). But this presents a problem for load balancers.
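Here’s a rough Python sketch of what that single TCP stream looks like from the load balancer’s side: 21 requests, back to back, in one stream of bytes. The helper names (`get`, `request_lines`), the host name, and the paths are invented for the example, and it assumes every request is a GET with no content, so each request ends at its first blank line.

```python
def request_lines(stream: bytes) -> list[str]:
    """Return the request line of every HTTP request found in one TCP byte stream.

    Assumption: every request is a body-less GET, so each one ends at the
    first blank line (CRLF CRLF).
    """
    lines = []
    for chunk in stream.split(b"\r\n\r\n"):
        if chunk:  # the final split leaves an empty trailing piece
            lines.append(chunk.split(b"\r\n")[0].decode("ascii"))
    return lines


def get(path: str) -> bytes:
    """Build a minimal GET request for a hypothetical host."""
    return f"GET {path} HTTP/1.1\r\nHost: www.example.com\r\n\r\n".encode("ascii")


# One HTML page plus 20 images, all written to the same connection.
stream = get("/index.html") + b"".join(get(f"/img/{n}.jpg") for n in range(1, 21))
found = request_lines(stream)
print(len(found))  # 21
print(found[0])    # GET /index.html HTTP/1.1
```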
Does the load balancer look only at the first of those 21 requests? Or does it look at each request individually?
Typically, the load balancer will (by default) only pay attention to the first HTTP request in a TCP connection. Any subsequent HTTP request headers are ignored.
This means that once the decision is made on the first request, every subsequent request is sent to the same server, so long as it’s part of the same TCP connection. The load balancer essentially ignores the headers for the following requests. It’s no longer paying attention.
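The difference is easy to see in a toy simulation. This is only a sketch of the behavior described above, not how any particular load balancer is implemented; the pool names and the `pick_pool` rule are hypothetical.

```python
def pick_pool(path: str) -> str:
    """Hypothetical Layer 7 rule: images go to one pool, everything else to another."""
    return "image-servers" if path.endswith((".jpg", ".gif")) else "html-servers"


def route_connection(paths, inspect_every_request):
    """Return the pool chosen for each request on one TCP connection."""
    decisions = []
    for i, path in enumerate(paths):
        if i == 0 or inspect_every_request:
            current = pick_pool(path)  # the request is actually examined
        decisions.append(current)      # otherwise the earlier decision is reused
    return decisions


paths = ["/index.html", "/logo.gif", "/photo.jpg"]

short_span = route_connection(paths, inspect_every_request=False)
long_span = route_connection(paths, inspect_every_request=True)
print(short_span)  # ['html-servers', 'html-servers', 'html-servers']
print(long_span)   # ['html-servers', 'image-servers', 'image-servers']
```

With the short attention span, the images follow the HTML page to the same pool simply because they arrived on the same connection.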
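If you’re doing cookie persistence, where the load balancer uses a cookie to figure out where to send requests, this usually isn’t a problem: every request on the connection comes from the same client and carries the same cookie, so the first decision holds for the rest.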
However, if you’re trying to separate out traffic such as JPGs from HTML to send to different servers (web switching/Layer 7 switching), or if you’re trying to insert headers into every request (such as the true source IP address, or an SSL header), then this is a big problem: only the first request on each connection gets switched or modified, and the rest ride along untouched.
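Header insertion fails in the same way. In the sketch below, `X-Forwarded-For` stands in for the true source IP header (it’s a common convention, but your load balancer may use a different name), and the function names and the client IP address are made up for the example.

```python
def insert_header(request: bytes, name: str, value: str) -> bytes:
    """Add one header line right after the request line."""
    request_line, sep, rest = request.partition(b"\r\n")
    return request_line + sep + f"{name}: {value}\r\n".encode("ascii") + rest


def forward(requests, client_ip, every_request):
    """Return the requests as the backend server would see them."""
    out = []
    for i, req in enumerate(requests):
        # A short attention span means only the first request is touched.
        if i == 0 or every_request:
            req = insert_header(req, "X-Forwarded-For", client_ip)
        out.append(req)
    return out


requests = [
    b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n",
    b"GET /logo.gif HTTP/1.1\r\nHost: www.example.com\r\n\r\n",
]

seen = forward(requests, "203.0.113.7", every_request=False)
print([b"X-Forwarded-For" in r for r in seen])  # [True, False]
```

The server sees the client’s address on the first request only, which is exactly the kind of gap that shows up as missing IPs in the web logs.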
So what’s the fix? Well, there are two possible solutions, depending on your load balancer.
Load Balancer Solution
One solution, if your load balancer supports it, is to configure your load balancer to specifically pay attention to each request. Again, most load balancers by default ignore the subsequent requests. However, with some load balancers it’s possible to increase the attention span. Most of the higher-end enterprise-oriented load balancers (such as those from Cisco and F5) have this ability. With F5 and A10 Networks, this long attention span is enabled by default (on the F5 it can be turned off for performance). On Cisco, enabling a long attention span is an option known rather cryptically as “connection rebalance”. (If you know the option for the load balancer you use, feel free to add it to the comments section.)
This does cause the load balancer to do more work, so your overall capacity may go down, but it’s probably your best option if you need the load balancer to pay attention to all headers, not just the first.
Server Solution
Some load balancers simply don’t have the ability to pay attention to all requests in a TCP connection; they can only look at the first request. This is true for many of the value-market load balancers. If this is the case, your only solution is not on the load balancer, but on the servers themselves.
Virtually all web servers have the ability to turn off the “Keep-Alive” function, which is what allows multiple requests in a single TCP connection. Turning Keep-Alive off forces the web clients to make a separate TCP connection for each request. Since there’s a separate connection for each request, every request is the first request on its connection, so the load balancer will then pay attention to every header.
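On the wire, a server with Keep-Alive turned off answers with a `Connection: close` header, and the client has to open a new connection for the next object. Here’s a small sketch of that logic in Python. The function names are invented, and it assumes a single client fetching objects one after another (real browsers usually open a few connections in parallel).

```python
def connection_persists(http_version: str, headers: dict) -> bool:
    """Decide whether the TCP connection stays open after this exchange.

    Assumption: header names in the dict are already lower-cased.
    """
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True                 # HTTP/1.1 is persistent by default
    return "keep-alive" in tokens   # HTTP/1.0 has to opt in


def count_connections(num_requests: int, response_headers: dict) -> int:
    """Count the TCP connections needed to fetch num_requests objects in sequence."""
    connections, open_now = 0, False
    for _ in range(num_requests):
        if not open_now:
            connections += 1        # previous connection was closed, open a new one
        open_now = connection_persists("HTTP/1.1", response_headers)
    return connections


print(count_connections(21, {}))                       # 1
print(count_connections(21, {"connection": "close"}))  # 21
```

With 21 connections instead of one, every request arrives as the first request on its connection, which is all a short-attention-span load balancer needs.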
In the Apache configuration file (for several versions, including 2.2), there’s a single-line option called KeepAlive; setting it to Off disables persistent connections. Also, if you Google for Keep-Alive and your favorite web server and version, you should find plenty of HOW-TOs, such as this one for IIS 7.
The drawback is that you make the web server do some more work (opening and allocating resources for a TCP connection for each object), and you increase network utilization by a nominal amount. This can reduce your server’s overall performance/capacity, but you may not have any other choice. How much it hurts depends on the nature of your traffic.









