HAProxy

The Reliable, High Performance TCP/HTTP Load Balancer


Quick links

Quick News
Recent News
Description
Main features
Supported Platforms
Performance
Reliability
Security
Download
Documentation
Live demo
They use it!
Commercial Support
Add-on features
Other Solutions
Contacts
External links
Discussions
Mailing list
10GbE load-balancing (updated)
Contributions
Coding style
Known bugs

HATop: Ncurses Interface
Herald: load feedback agent
RHEL-based Docker images
Alpine-based Docker images
Debian/Ubuntu packages







Latest versions

Branch       Description   Last version  Released    Links            Notes
Development  1.8-dev       1.8-dev1      2017/04/03  git / web / dir  may be broken
1.7          1.7-stable    1.7.5         2017/04/03  git / web / dir  Stable version
1.6          1.6-stable    1.6.12        2017/04/04  git / web / dir  Stable version
1.5          1.5-stable    1.5.19        2016/12/25  git / web / dir  Stable version
1.4          1.4-stable    1.4.27        2016/03/14  git / web / dir  Critical fixes only
1.3          1.3-stable    1.3.28        2016/03/14  git / web / dir  Unmaintained
1.3.15       1.3.15-maint  1.3.28.14     2015/02/01  git / web / dir  Unmaintained
1.3.14       1.3.14-maint  1.3.28.14     2009/07/27  git / web / dir  Unmaintained
1.2          1.2-stable    1.2.18        2008/05/25  git / web / dir  Unmaintained
1.1          1.1-stable    1.1.34        2006/01/29  git / web / dir  Unmaintained
1.0          1.0-old       1.0.2         2001/12/30  git / web / dir  Unmaintained

Quick News

February 28th, 2017 : 1.7.3

December 25th, 2016 : 1.6.11 and 1.5.19

    There was nothing really critical on the 1.6 front, but a number of fixes were pending, among which some against painful bugs hit under out-of-memory conditions. 1.5 had been lagging for quite a bit longer and received fixes for a few risks of crashes when misusing sc_trackers, another rare crash in zlib which does not happen with slz, some situations where random connections could be frozen during a redispatch, and an issue with gcc 6 where the listening address could be ignored. For more info, you can check the full 1.6.11 announcement here and the 1.5.19 announcement here.

    Code and changelogs are available here for 1.6 and here for 1.5 as usual.

Recent news...

Description

HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP- and HTTP-based applications. It is particularly suited to very high traffic web sites and powers quite a number of the world's most visited ones. Over the years it has become the de facto standard open-source load balancer, is now shipped with most mainstream Linux distributions, and is often deployed by default in cloud platforms. Since it does not advertise itself, we only know it's used when the admins report it :-)

Its mode of operation makes its integration into existing architectures very easy and risk-free, while still offering the possibility of not exposing fragile web servers to the net.

We always support at least two active versions in parallel and an extra old one in critical fixes mode only. The currently supported versions are :

  • version 1.7 : added server hot reconfiguration, content processing agents, multi-type certs, ...
  • version 1.6 : added DNS resolution support, HTTP connection multiplexing, full stick-table replication, stateless compression, ...
  • version 1.5 : added SSL, IPv6, keep-alive, DDoS protection, ...
  • version 1.4 : the most stable version for people who don't need SSL; it still provides client-side keep-alive.
  • version 1.3 : the old stable version, for companies that cannot upgrade for internal policy reasons.

Main features

Each version brought its set of features on top of the previous one. Backward compatibility is a very important aspect of HAProxy, and even version 1.5 is able to run with configurations written for version 1.0, 13 years earlier. Version 1.6 dropped a few long-deprecated keywords and suggests alternatives. The most differentiating features of each version are listed below :

  • version 1.5, released in 2014. This version further expands 1.4 with 4 years of hard work : native SSL support on both sides with SNI/NPN/ALPN and OCSP stapling; IPv6 and UNIX sockets supported everywhere; full HTTP keep-alive for better support of NTLM and improved efficiency in static farms; HTTP/1.1 compression (deflate, gzip) to save bandwidth; PROXY protocol versions 1 and 2 on both sides; data sampling on everything in the request or response, including payload; ACLs able to use any matching method with any input sample; maps and dynamic ACLs updatable from the CLI; stick-tables with counters to track activity on any input sample; custom formats for logs, unique-id, header rewriting, and redirects; improved health checks (SSL, scripted TCP, check agent, ...); and a much more scalable configuration supporting hundreds of thousands of backends and certificates without sweating.
  • version 1.4, released in 2010. This version brought its share of new features over 1.3, most of which were long awaited : client-side keep-alive to reduce the time to load heavy pages for clients over the net; TCP speedups to help the TCP stack save a few packets per connection; response buffering for an even lower number of concurrent connections on the servers; RDP protocol support with server stickiness and user filtering; source-based stickiness to attach a source address to a server; a much better stats interface reporting tons of useful information; more verbose health checks reporting precise statuses and responses in stats and logs; traffic-based health to fast-fail a server above a certain error threshold; support for HTTP authentication for any request, including stats, with password encryption; server management from the CLI to enable/disable servers and change their weights without restarting haproxy; ACL-based persistence to maintain or disable persistence based on ACLs, regardless of the server's state; and a log analyzer able to generate fast reports from logs parsed at 1 GB/s.
  • version 1.3, released in 2006. This version brought a lot of new features and improvements over 1.2, among which content switching to select a server pool based on any request criteria; ACLs to write content switching rules; a wider choice of load-balancing algorithms for better integration; content inspection allowing unexpected protocols to be blocked; transparent proxying under Linux, allowing a direct connection to the server using the client's own IP address; kernel TCP splicing to forward data between the two sides without copying, in order to reach multi-gigabit data rates; a layered design separating sockets, TCP and HTTP processing for more robust, faster processing and easier evolutions; a fast and fair scheduler allowing better QoS by assigning priorities to some tasks; session rate limiting for colocated environments; etc.

Version 1.2 has been in production use since 2006 and provided an improved performance level on top of 1.1. It is not maintained anymore, as most of its users switched to 1.3 a long time ago. Version 1.1, which has been keeping critical sites online since 2002, is not maintained anymore either. Users of these versions should upgrade to 1.4 or 1.5.

Supported platforms

HAProxy is known to run reliably on Linux, FreeBSD, OpenBSD, Solaris and AIX.

Highest performance is achieved with modern operating systems supporting scalable polling mechanisms such as epoll on Linux 2.6/3.x or kqueue on FreeBSD and OpenBSD; this requires a haproxy version newer than 1.2.5. Fast data transfers are made possible on Linux 3.x using TCP splicing and haproxy 1.4 or 1.5. Forwarding rates of up to 40 Gbps have already been achieved on such platforms after very careful tuning. While Solaris and AIX are supported, they should not be used if extreme performance is required.

Current typical 1U servers equipped with a dual-core Opteron or Xeon generally achieve between 15000 and 40000 hits/s and have no trouble saturating 2 Gbps under Linux.

Performance

Well, since a user's testimony is better than a long demonstration, please take a look at Chris Knight's experience with haproxy saturating a gigabit fiber in 2007 on a video download site. Since then, the performance has significantly increased and the hardware has become much more capable, as my experiments with Myricom's 10-Gig NICs showed two years later. Now, as of 2014, 10-Gig NICs are themselves too limited and hardly suited for 1U servers, since they rarely provide enough port density to reach speeds above 40-60 Gbps per server. 100-Gig NICs are coming, and I expect to run new series of tests when they are available.

HAProxy involves several techniques commonly found in operating system architectures to achieve the absolute maximum performance :

  • a single-process, event-driven model considerably reduces the cost of context switches and the memory usage. Processing several hundred tasks in a millisecond is possible, and the memory usage is on the order of a few kilobytes per session, while memory consumed in preforked or threaded servers is more on the order of megabytes per process.
  • O(1) event checker on systems that allow it (Linux and FreeBSD), allowing instantaneous detection of any event on any connection among tens of thousands (a minimal epoll sketch follows this list).
  • Delayed updates to the event checker using a lazy event cache ensure that we never update an event unless absolutely required. This saves a lot of system calls (see the lazy event cache sketch below).
  • Single-buffering without any data copy between reads and writes whenever possible. This saves a lot of CPU cycles and useful memory bandwidth. Often, the bottleneck will be the I/O buses between the CPU and the network interfaces. At 10-100 Gbps, the memory bandwidth can become a bottleneck too.
  • Zero-copy forwarding is possible using the splice() system call under Linux, and results in real zero-copy starting with Linux 3.5 (see the splice() sketch below). This allows a small sub-3 Watt device such as a Seagate Dockstar to forward HTTP traffic at one gigabit/s.
  • MRU memory allocator using fixed-size memory pools for immediate memory allocation, favoring hot cache regions over cold cache ones (see the pool sketch below). This dramatically reduces the time needed to create a new session.
  • Work factoring, such as multiple accept() calls at once, and the ability to limit the number of accept() calls per iteration when running in multi-process mode, so that the load is evenly distributed among processes (see the batched accept() sketch below).
  • CPU affinity is supported when running in multi-process mode, or simply to adapt to the hardware and stay as close as possible to the CPU core managing the NICs without conflicting with it.
  • Tree-based storage, making heavy use of the Elastic Binary tree I have been developing for several years. This is used to keep timers ordered, to keep the run queue ordered, to manage round-robin and least-conn queues, and to look up ACLs or keys in tables, all with only an O(log(N)) cost.
  • Optimized timer queue : timers are not moved in the tree when they are postponed, because the likelihood that they will ever be met is close to zero since they are mostly used for timeout handling. This further optimizes the ebtree usage.
  • Optimized HTTP header analysis : headers are parsed and interpreted on the fly, and the parsing is optimized to avoid re-reading any previously read memory area. Checkpointing is used when an end of buffer is reached with an incomplete header, so that the parsing does not start again from the beginning when more data is read. Parsing an average HTTP request typically takes half a microsecond on a fast Xeon E5.
  • Careful reduction of the number of expensive system calls. Most of the work is done in user space by default, such as time reading, buffer aggregation, and file-descriptor enabling/disabling (see the cached-time sketch below).
  • Content analysis is optimized to carry only pointers to original data and never copy unless the data needs to be transformed. This ensures that very small structures are carried over and that contents are never replicated when not absolutely necessary.
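
To illustrate the O(1) event checker, here is a minimal sketch of an epoll-based event loop. It only shows the principle; HAProxy's real pollers sit behind a common abstraction and this code is not taken from them :

    /* Minimal sketch of an O(1) event checker built on Linux epoll.
     * epoll_wait() returns only the descriptors that are ready, so the
     * cost of a wakeup depends on the number of events, not on the
     * number of registered connections. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/epoll.h>

    #define MAX_EVENTS 256

    int main(void)
    {
        int ep = epoll_create1(0);
        if (ep < 0) { perror("epoll_create1"); return 1; }

        /* watch stdin for readability; a real proxy registers thousands
         * of connection file descriptors in exactly the same way */
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = 0 };
        if (epoll_ctl(ep, EPOLL_CTL_ADD, 0, &ev) < 0) { perror("epoll_ctl"); return 1; }

        struct epoll_event events[MAX_EVENTS];
        int n = epoll_wait(ep, events, MAX_EVENTS, 5000);  /* 5s timeout */
        for (int i = 0; i < n; i++)
            printf("fd %d is ready (events=0x%x)\n",
                   events[i].data.fd, events[i].events);
        close(ep);
        return 0;
    }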
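
The lazy event cache can be sketched in the same spirit : handlers only flip bits in memory, and the kernel is resynchronized once per loop iteration, so an event enabled then disabled within the same iteration costs no system call at all. The names and structure below are illustrative, not HAProxy's :

    #include <sys/epoll.h>

    struct fd_state {
        int fd;
        unsigned int cur_events;  /* what the kernel currently knows */
        unsigned int new_events;  /* what the handlers asked for */
    };

    /* cheap: flip a bit in memory, no system call at this point */
    static void want_recv(struct fd_state *fs) { fs->new_events |= EPOLLIN;  }
    static void stop_recv(struct fd_state *fs) { fs->new_events &= ~EPOLLIN; }

    /* called once per polling loop; assumes every fd was registered
     * with EPOLL_CTL_ADD when its connection was created */
    static void flush_fd_updates(int ep, struct fd_state *fds, int count)
    {
        for (int i = 0; i < count; i++) {
            if (fds[i].new_events == fds[i].cur_events)
                continue;   /* nothing changed: no system call */
            struct epoll_event ev = { .events = fds[i].new_events,
                                      .data.fd = fds[i].fd };
            epoll_ctl(ep, EPOLL_CTL_MOD, fds[i].fd, &ev);
            fds[i].cur_events = fds[i].new_events;
        }
    }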
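
Zero-copy forwarding with splice() moves data between two sockets through an intermediary kernel pipe, so the payload never crosses into user space. A reduced sketch, assuming pipefd was created beforehand with pipe() and both sockets are non-blocking :

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* forward up to len bytes from src_fd to dst_fd through a pipe;
     * returns the number of bytes moved, 0 on EOF, -1 on error */
    ssize_t splice_forward(int src_fd, int dst_fd, int pipefd[2], size_t len)
    {
        /* socket -> pipe: the kernel only moves page references */
        ssize_t in = splice(src_fd, NULL, pipefd[1], NULL, len,
                            SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
        if (in <= 0)
            return in;

        /* pipe -> socket: still no user-space copy */
        ssize_t out = 0;
        while (out < in) {
            ssize_t n = splice(pipefd[0], NULL, dst_fd, NULL, in - out,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (n <= 0)
                return -1;
            out += n;
        }
        return out;
    }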
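
The MRU pool principle fits in a few lines : freed objects are pushed onto the head of a free list and allocations pop from that same head, so the object whose memory is most likely still in the CPU caches is always handed out first. A deliberately simplified sketch of the idea, not HAProxy's actual pool code :

    #include <stdlib.h>

    struct pool {
        void  *free_list;  /* head = most recently freed object */
        size_t obj_size;   /* fixed size, must be >= sizeof(void *) */
    };

    static void *pool_alloc(struct pool *p)
    {
        void *obj = p->free_list;
        if (obj) {
            /* pop the hottest object: its first bytes link to the next one */
            p->free_list = *(void **)obj;
            return obj;
        }
        return malloc(p->obj_size);  /* pool empty: fall back to malloc */
    }

    static void pool_free(struct pool *p, void *obj)
    {
        /* push onto the head so this object is reused first (LIFO) */
        *(void **)obj = p->free_list;
        p->free_list = obj;
    }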
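
Batched accept() is just a drain loop with a limit; the limit is what keeps one busy process from starving its siblings in multi-process mode. A hypothetical sketch :

    #include <sys/socket.h>

    /* accept at most 'max' pending connections from a non-blocking
     * listener, so that the backlog is shared fairly between
     * processes; returns the number of connections accepted */
    int accept_batch(int listen_fd, int max)
    {
        int accepted = 0;
        while (accepted < max) {
            int cfd = accept(listen_fd, NULL, NULL);
            if (cfd < 0)
                break;  /* backlog drained (EAGAIN) or a real error
                         * that the caller can inspect via errno */
            /* a real proxy would create a session and register cfd
             * with the event checker here */
            accepted++;
        }
        return accepted;
    }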
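
Finally, time caching, one of the system-call reductions from the list above, simply means reading the clock once per loop iteration and letting every task of that iteration reuse the cached value. A sketch of the idea (HAProxy maintains a global date in a comparable way, though not with this exact code) :

    #include <sys/time.h>

    static struct timeval now;  /* global date, refreshed once per loop */

    /* the only time-related system call of the whole iteration */
    static void update_time(void)
    {
        gettimeofday(&now, NULL);
    }

    /* thousands of timeout computations per iteration read this for free */
    static const struct timeval *get_cached_time(void)
    {
        return &now;
    }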

All these micro-optimizations result in very low CPU usage even on moderate loads. And even at very high loads, when the CPU is saturated, it is quite common to observe figures like 5% user and 95% system, which means that the HAProxy process consumes about 20 times less CPU than its system counterpart. This explains why tuning the operating system is very important. It is also the reason why we ended up building our own appliances : to spare end users that complex and critical task.

In production, HAProxy has been installed several times as an emergency solution when very expensive, high-end hardware load balancers suddenly failed on Layer 7 processing. Some hardware load balancers still do not use proxies and process requests at the packet level, and have great difficulty supporting requests spread across multiple packets, or high response times, because they do no buffering at all. Software load balancers, on the other hand, use TCP buffering and are insensitive to long requests and high response times. A nice side effect of HTTP buffering is that it increases the server's connection acceptance rate by reducing session duration, which leaves room for new requests.

There are 3 important factors used to measure a load balancer's performance :

  • The session rate
    This factor is very important, because it directly determines when the load balancer will not be able to distribute all the requests it receives. It is