First Moments of an SSL Connection
I stumbled upon this link somewhere (Digg, I think?), and it goes into pretty good detail about the first moments of an SSL connection:
Most in IT are familiar with the concept of Moore’s Law, whereby processor capability tends to double about every two years. To a certain extent, this happens with networking equipment too, with capacity increasing at a steady rate, although probably not at the same rate as processors.
Benefiting from Moore’s Law to a great extent are load balancers/ADCs, where the lowest-end device from just about every vendor can handle traffic loads in the 50-100 Mbps range. Of course, throughput isn’t a great way of measuring the performance capability of a load balancer (100 Mbps of large file downloads is a heckuva lot easier than 100 Mbps of tiny file connections), but it does relate well to one very important factor in web site serving:
How big is your pipe?
Businesses of all sizes have seen steady increases in their available bandwidth, to be sure, but that growth tends to be much slower than the doubling every two years or so described by Moore’s Law. The result is that, increasingly, the lower-end offerings from vendors are more than sufficient to run a larger share of the web sites out there.
There are a number of reasons for this. First, we’re no longer experiencing the exponential across-the-board growth rates in users that occurred during the dot-com boom. While there are some sites going through a growth explosion, for most websites in this economic environment, growth rates of any kind are fairly extraordinary, so extra bandwidth isn’t in as high demand. Second, if you’re hosting your own data center, as many large businesses do, getting extra bandwidth is often time consuming. An order to move from a single DS3 (45 Mbps) to an OC-3 (155 Mbps) is going to take some time to get fulfilled.
Cisco’s ACE 4710 appliance comes with a default license of 1 Gbps of throughput. F5’s entry-level BIG-IP 1600 LTM maxes out at 1 Gbps. In the Enterprise market, 500-Mbps to 1 Gbps is about the rock bottom in terms of performance capability. Yet many of the high-end clients of these vendors don’t push nearly that much traffic.
Companies that aren’t media or mega-content providers (such as Google, YouTube, Yahoo!, Facebook, etc.) but that have web applications serving customers or businesses typically don’t go above 100-200 Mbps in traffic, and that includes some Fortune 500 companies. Of course, there are exceptions, and there are quite a few factors involved in determining the traffic characteristics of a site. Companies that offer media such as streaming video or audio often use third-party content providers, such as YouTube or Akamai, to keep that bandwidth off their own pipe.
We’ve got all this idle CPU time, so why not make use of it? That’s what many vendors are doing, in both the enterprise and value markets. With the steady rise in CPU power while bandwidth consumption lags behind, vendors are throwing more and more capabilities into these devices to take advantage of the unused CPU cycles, such as caching, compression, Layer-7 inspection, etc.
In the previous post, I talked about the o3 article, and where I think they may have gotten it wrong (but it’s impossible to tell, as he didn’t publish any details on his testing methodologies, which is pretty lame).
But the fact that he may have used the wrong terminology for the performance testing he did (saying it was TPS instead of HTTP requests per second) shows that there’s a lot of confusion about benchmark terminology, so I’m going to go over some of the basics.
In the load balancing world, TPS (transactions per second) refers to the number of new SSL connections initiated per second. The new part is important, because each new SSL connection requires a relatively CPU-expensive asymmetric encryption operation. This is why most load balancers that do SSL have a separate chip for SSL processing (SSL ASIC), which offloads the SSL functionality from the main CPU.
Once the first step of an SSL connection is completed, the encryption then shifts to the much more CPU-friendly symmetric encryption, which is often referred to as “bulk encryption”. Pushing bulk encryption throughput is relatively easy for a load balancer, even without an SSL accelerator chip.
However, HTTP/HTTPS typically involve short-lived connections, so there is relatively little throughput, and a lot of connection setup/teardown. Hence the need to know the TPS rating of a given device.
Many vendors will offer tiered licensing for SSL TPS. So keep in mind that when they say TPS, they usually mean *new* SSL connections per second.
There are two ways to measure “rates” with load balancers: Connection rate, and request rate. While they sound similar, when you get right down to it, they’re actually quite different.
Connection rate refers to the number of TCP connections per second a device can handle. HTTP request rate refers to the number of HTTP requests per second the device can handle. How are they different? You can have multiple HTTP requests in a single TCP connection.
When your browser goes to a web site, it first initiates a TCP connection to the server (or in our case, a load balancer load balancing traffic for servers). Your browser will then typically make several HTTP requests over that one TCP connection.
Making multiple requests over a single TCP connection is a lot easier than making a TCP connection for every single request. In fact, the original HTTP 1.0 specification required one TCP connection per request. The HTTP 1.1 specification fixed that by allowing multiple requests per TCP stream.
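To make the connection rate versus request rate distinction concrete, here is a rough sketch using only the Python standard library. It starts a throwaway local server that counts accepted TCP connections, then sends the same number of HTTP requests two ways. The names CountingServer and count_connections are my own, made up for illustration, and this says nothing about how any particular load balancer counts things.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingServer(ThreadingHTTPServer):
    """Throwaway local server that counts accepted TCP connections."""
    tcp_connections = 0

    def get_request(self):
        # Called once per accepted TCP connection, not once per HTTP request.
        self.tcp_connections += 1
        return super().get_request()


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(num_requests, reuse):
    """Send num_requests GETs and return how many TCP connections they used."""
    server = CountingServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            # HTTP 1.1 style: many requests over one TCP connection.
            conn = http.client.HTTPConnection(host, port)
            for _ in range(num_requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # HTTP 1.0 style: a fresh TCP connection for every request.
            for _ in range(num_requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/", headers={"Connection": "close"})
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.tcp_connections


if __name__ == "__main__":
    print("persistent:", count_connections(5, reuse=True))
    print("one per request:", count_connections(5, reuse=False))
```

Same five requests either way, but the device in front sees either one new connection or five, which is why the two rates have to be quoted separately.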
When a load balancer operates in Layer-4 mode, it’s functioning a lot like a router. In fact, it’s not doing much more than your wireless access point at home. Very little memory is consumed with each new connection, and only the TCP/IP header information is evaluated.
When a load balancer operates in Layer-7 mode, it’s functioning more like a server. The TCP session is terminated at the load balancer, and a new TCP connection is initiated to the server. HTTP requests are buffered in the load balancer’s memory in order to be evaluated. This requires a lot more processing power and a lot more memory.
Obviously, a load balancer can handle more Layer-4 workload than Layer-7, so it’s important to know which mode you plan on using when it comes to performance.
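Here is a toy sketch of why the two modes cost different amounts. The Layer-4 decision only looks at addresses and ports from the TCP/IP headers, while the Layer-7 decision needs the HTTP request in memory so it can be parsed. The server addresses, the /images/ rule, and the function names are all hypothetical; real devices are far more involved than this.

```python
import hashlib

# Hypothetical back-end addresses, for illustration only.
SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
STATIC_POOL = "10.0.0.21"
APP_POOL = "10.0.0.22"


def pick_layer4(src_ip, src_port, dst_ip, dst_port):
    """Layer 4: choose a server from TCP/IP header fields only."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


def pick_layer7(raw_request):
    """Layer 7: the request must be buffered and parsed before routing."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    method, path, version = request_line.split(" ")
    if path.startswith("/images/"):
        return STATIC_POOL
    return APP_POOL


if __name__ == "__main__":
    print(pick_layer4("192.0.2.10", 51000, "198.51.100.5", 80))
    print(pick_layer7(b"GET /images/logo.png HTTP/1.1\r\nHost: example.com\r\n\r\n"))
```

The Layer-4 function never touches the payload, so its cost per connection is small and fixed. The Layer-7 function cannot decide anything until it holds enough of the request to parse, which is where the extra memory and CPU go.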
A10 Networks recently released their new AX 5200 ADC, a 2U unit capable of pushing 40 Gbps of throughput and 3 million Layer 4 connections per second. Check the link for more details, but dayom, 40 Gbps in 2U? If it can really push that much, that’s a lot of gigs.

I’d read the O3 Magazine article by John Buswell with great interest, as well as Lori MacVittie’s response article. I thought they both made great points as I said in my previous post, and I was content to leave it at that.
Then I read the follow-up response over at the O3 blog. And I got that pang. You know the one.
Someone is wrong on the Internet.
He made a number of errors about F5’s capabilities, and they were cleared up by Lori in her response. But there are a couple of items I wanted to address. Take this quote from his blog post:
She claims that L7 is expensive, sure it takes extra processing, but if you read the article, you’d see that I drop hints that Nginx has a very superior way of handling I/O.
Maybe it’s just me, but “oh, I dropped hints!” seems to be talking down to your audience. At best, it’s condescending. At worst, it’s a shallow and transparent attempt to show a depth of knowledge in an area you actually know only superficially. Either detail the “very superior way of handling I/O”, or post a link to something that does, or get off the pot.
There’s also his claim of being able to do over 25,000 TPS, which Lori rightfully called into question.
My personal favourite is that she quotes a 3 year old article to try to claim that the Opteron can handle “around 1500″ 1024-bit RSA operations per second, I don’t think she understands what is written in that report, as she has mis-quoted it, and picked a report thats over 3 years old. Lets play far shall we, I know marketing people are used to trying to skew reports, but you try that with me, I’m going to call you on it. Yet we have a running test that shows that machine is handling requests on-par with the F5 solution.
I found a more recent article on a processor very similar to the one used in the original O3 article. For a 4096-bit key, the value was 343.12 RSA decrypt operations per second for 8 cores (dual quad-core Opteron 2356, a similar CPU to the one he used). 4096-bit RSA is of course much more expensive than 1024-bit, but by a predictable amount. Multiply by about 32, and you get the number of 1024-bit operations (the key size used in most SSL certificates). The value comes to about 11,000 per second for 8 cores, which is in line with what she states, and a lot less than his stated 26,000 connections per second. And as Lori pointed out, 11,000 would be possible only if the system were doing nothing but this SSL work. He also didn’t refute that he used 512-bit certs.
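The back-of-the-envelope math above, written out. The 343.12 figure comes from the benchmark quoted above and the multiplier of 32 is the rough rule of thumb used in this post, not a measured value, so treat the result as a ballpark only. The function name is made up for illustration.

```python
RSA4096_OPS_PER_SEC = 343.12  # 8 cores, from the benchmark quoted above
SCALE_4096_TO_1024 = 32       # rough multiplier used in this post (an approximation)


def estimate_1024_ops(ops_4096, scale=SCALE_4096_TO_1024):
    """Estimate 1024-bit RSA operations per second from a 4096-bit figure."""
    return ops_4096 * scale


estimate = estimate_1024_ops(RSA4096_OPS_PER_SEC)
print(round(estimate))  # 10980, i.e. about 11,000
```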
It’d be easy enough for him to test. The command to test the system’s SSL capabilities with 8 cores is:
openssl speed rsa -multi 8
So he either really did use 512-bit certs, or he’s actually not measuring TPS correctly. When an SSL vendor measures SSL, they typically use the term TPS, or transactions per second. This typically refers to the rate at which the system can accept new SSL sessions, with each new connection requiring an asymmetric encryption operation. SSL/TLS uses two encryption technologies: symmetric and asymmetric. The asymmetric encryption is relatively expensive on a general purpose CPU like an Opteron (about 1,000 times as CPU intensive as symmetric encryption). That’s why devices from F5 and other vendors include SSL accelerator cards, which are special processors that keep the encryption operations off the main CPU. A device with an SSL accelerator won’t “feel” the impact of an SSL connection any differently than a non-SSL connection.
What I’m guessing (and it’s just a guess, since he didn’t state how he did his tests) is that he measured HTTP requests per second, which is a bit different than TPS. If he used HTTP 1.1 connection persistence, he could do 10 or more requests with only one asymmetric operation. While that’s an absolutely fair measure of HTTP/S performance, it is not TPS, at least not in the generally accepted way. If you measured an F5 the same way (multiple HTTP requests through a single SSL connection), the F5 (or any other vendor advertising in TPS) would be able to push far beyond 25,000.
If that is the case, then he could do a comparative TPS test by using HTTP 1.0 mode, where each HTTP request requires its own TCP connection (and thus its own asymmetric operation).
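To show how far apart the two numbers can drift, here is the arithmetic as a small sketch. The function names are mine, and the 10 requests per connection is just the example figure from above, not something taken from his tests.

```python
def requests_per_second(new_ssl_connections_per_sec, requests_per_connection):
    """HTTPS requests per second for a given TPS and keep-alive depth."""
    return new_ssl_connections_per_sec * requests_per_connection


def handshakes_needed(target_requests_per_sec, requests_per_connection):
    """New SSL connections per second (TPS) needed to reach a request rate."""
    return target_requests_per_sec / requests_per_connection


if __name__ == "__main__":
    # 25,000 requests/s with 10 requests per connection needs only 2,500 TPS.
    print(handshakes_needed(25_000, 10))
    # With one request per connection (HTTP 1.0 style), the two rates are equal.
    print(handshakes_needed(25_000, 1))
```

So a box that tops out at a few thousand asymmetric operations per second could still post a 25,000 figure, as long as that figure is requests and not new SSL connections.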
And finally, we have this.
1. No offense but covering the beat doesn’t exactly equate to 9 years experience with the technology. Sure you look at the trends, products and evaluations, but this is within a sandbox, not day to day real world experience. I’ve got 2 years experience working with Alteon as a customer back in 1998, working on the bleeding edge at the time of L4 switching. I kept F5 out of the customer site where I was working, simply because Alteon offered a much better and more innovative hardware platform. The web guys liked F5 because it did fancy graphs, Alteon got the job done in terms of performance and scalability. Following that, I spent over 6 years working as a Sustaining Engineer for Nortel / Alteon, responsible for thousands of bug fixes, and beating F5 on many occasions. After that I spent about 18 months working on Open Source App Delivery before returning to Nortel to work on their next-gen platform and help Sustain the Application Switch line. So as you can see, not all experience is equal.
He should really have done more research on Lori. There aren’t a lot of people in the world with a wider breadth of knowledge in the Layer-7/ADC/load balancing world than Lori (and he ain’t among them). He makes the mistake of dismissing her as “covering the beat”, referring to her time at Network World Computing, as if her job there only involved sitting through PowerPoint presentations and hitting the occasional power switch. Just looking at her F5 DevCentral posts, she has an impressive knowledge of everything from HTTP security, application acceleration, and the finer aspects of the HTTP protocol to application-specific issues such as SOA, XML security, and APIs. I’ve never met her, but I’ve spoken with her on a number of occasions, and she’s the real deal.
This article represents the first in a series of reviews of a market segment known as link load balancing. Link load balancers are a class of device that allows multiple unrelated Internet connections to be shared, load balanced, and failed over, all without using a routing protocol. They can handle links from 56K lines all the way to Gigabit downlinks, and mix and match them to boot.
Traditionally, if you wanted an office to have multiple network connections that were both load balanced and redundant, you’d get a few T1 or Frame Relay lines and run a routing protocol such as OSPF with your ISP. If a link went down, the routing protocol would remove the bad link from routing tables, and traffic would proceed normally with hardly a blip. This would typically require using the same service provider, limiting redundancy.
Larger organizations would have full BGP peering with their ISPs as well as portable netblocks, allowing them a great amount of flexibility in how their various links are balanced and failed over. But of course few organizations today qualify for portable netblocks or have the budget for a staff that can handle that configuration.
Today, the T1 line as well as frame relay connections have fallen out of favor for most offices for Internet connectivity. Typically far less expensive, and higher capacity, are consumer-grade cable modem and DSL lines, offering bandwidth from several hundred kilobits to 50 Mbps and beyond with the new DOCSIS 3.0 cable modems. But most cable modem and DSL service providers won’t allow any kind of routing protocols, peering, or other load balancing/redundancy. And that’s where link load balancers come into play.
Known as traffic managers, link load balancers, and half a dozen other terms (sort of like the great server load balancer/application delivery controller debate), they allow you to utilize multiple links at the same time (sending some user requests out one link while others go out different links) and fail over to the remaining links in case of link failures.
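The basic idea can be sketched in a few lines: rotate outbound sessions across whichever links are currently up, and skip any link marked down. The LinkBalancer class and the link names are invented for illustration. Real products also do health checks, weighting by link speed, and session persistence, none of which is shown here.

```python
class LinkBalancer:
    """Toy round-robin link selector with failover."""

    def __init__(self, links):
        self._order = list(links)
        self.status = {name: True for name in self._order}  # True means up
        self._next = 0

    def mark(self, link, up):
        """Record the result of a (hypothetical) link health check."""
        self.status[link] = up

    def pick(self):
        """Return the next link that is up, skipping failed ones."""
        for _ in range(len(self._order)):
            link = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self.status[link]:
                return link
        raise RuntimeError("no links available")


if __name__ == "__main__":
    lb = LinkBalancer(["cable", "dsl"])
    print([lb.pick() for _ in range(4)])  # alternates between the two links
    lb.mark("cable", False)               # simulate a cable modem outage
    print([lb.pick() for _ in range(3)])  # everything goes out the DSL line
```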
I’ve been wanting to do a review of a product in this market segment, and Ecessa was kind enough to send me an evaluation of their ShieldLink 100 link load balancer.


As pointed out by Shawn Nunley on the lb-l mailing list, o3 Magazine did a piece on rolling your own SSL accelerator and load balancer. Lori MacVittie over at F5 (who shares my affinity for cat pictures with grammatically dubious captions) did a cautionary piece suggesting that rolling your own Layer-7 device has some drawbacks to consider.
My take is that yes, rolling your own can be a great way to save some money, and yes, as Lori said, rolling your own can end up not saving you as much money as you’d thought. Rolling your own requires two things that you may not have: talent and time. If you’ve got those, then awesome. If not, a pre-packaged solution may work for you. The merits and drawbacks depend on your situation, and are also things upon which reasonable people can disagree.
And if there’s anything the Internet is known for, it’s that it is full of reasonable people (as evidenced by the comments section in Lori’s post).
But the decision isn’t just between an expensive (yet impressive) BIG-IP LTM 6900 and a roll-your-own box. There’s a third option if you need Layer 7/SSL acceleration, and that’s the value market. Vendors like KEMP Technologies, Coyote Point Systems, and Barracuda make Layer 7 devices that are much simpler to configure than a roll-your-own box yet cost about the same. You can spend about $10,000 USD and get around 2,000 SSL connections per second, as well as around 200 Mbps of throughput.
There are certainly situations where I’ve had to/greatly benefited from rolling my own. My home data center/laundry room is a perfect example. However, there are many times when using a pre-packaged solution is way better, even if it costs more.
This morning A10 Networks sent me this press release, announcing the release of their new AX1000, an entry-level offering to the AX load balancer/ADC line.

I just got word that Radware has just completed their acquisition of Nortel’s Layer 4-7/Alteon business. In the press release, they reiterated their commitment to continuing development of the Alteon line, with the purchase not just being a customer list buy.
Final purchase price was around $18,000,000. Not a bad deal for Radware (considering Nortel bought Alteon for $7.8 billion in 2000).
Owen Garret, product manager for Zeus’s ZXTM load balancer/ADC, has announced on the load balancing mailing list a Zeus program that offers a free development license for the ZXTM on non-production systems.
Definitely worth checking out.