February 28th, 2017 : 1.7.3
HAProxy 1.7.3 was released on 2017-02-28. It fixes a few remaining bugs affecting 1.7, mostly related to DNS, Lua, header rewriting, and compression for the more serious ones. A few other minor issues were addressed. For the details, please check the announcement here. Code and changelog are available here as usual.
December 25th, 2016 : 1.6.11 and 1.5.19
HAProxy is a free, very fast and reliable solution offering high availability,
load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for very
high traffic web sites and powers quite a number of the world's most visited ones.
Over the years it has become the de facto standard open-source load balancer, is
now shipped with most mainstream Linux distributions, and is often deployed by
default in cloud platforms. Since it does not advertise itself, we only know it's
used when the admins report it :-)
Its mode of operation makes its integration into existing architectures very easy
and risk-free, while still offering the option of not exposing fragile web servers
to the net, such as below :
We always support at least two active versions in parallel and an extra old
one in critical fixes mode only. The currently supported versions are :
- version 1.7 : added server hot reconfiguration, content processing agents, multi-type certs, ...
- version 1.6 : added DNS resolution support, HTTP connection multiplexing, full stick-table replication, stateless compression, ...
- version 1.5 : added SSL, IPv6, keep-alive, DDoS protection, ...
- version 1.4 : the most stable version for people who don't need SSL. Still provides client-side keep-alive.
- version 1.3 : the old stable version for companies who cannot upgrade for internal policy reasons.
Each version brought its set of features on top of the previous one.
Upwards compatibility is a very important aspect of HAProxy, and even
version 1.5 is able to run with configurations made for version 1.0, 13
years earlier. Version 1.6 dropped a few long-deprecated keywords and
suggests alternatives. The most differentiating features of each version
are listed below :
- version 1.5, released in 2014
This version further expands 1.4 with 4 years of hard work :
native SSL support on both sides with SNI/NPN/ALPN and OCSP stapling,
IPv6 and UNIX sockets are supported everywhere,
full HTTP keep-alive for better support of NTLM and improved efficiency in static farms,
HTTP/1.1 compression (deflate, gzip) to save bandwidth,
PROXY protocol versions 1 and 2 on both sides,
data sampling on everything in request or response, including payload,
ACLs can use any matching method with any input sample,
maps and dynamic ACLs updatable from the CLI,
stick-tables support counters to track activity on any input sample,
custom format for logs, unique-id, header rewriting, and redirects,
improved health checks (SSL, scripted TCP, check agent, ...),
a much more scalable configuration that supports hundreds of thousands of backends and certificates without sweating.
- version 1.4, released in 2010
This version has brought its share of new features over 1.3, most of which were long awaited :
client-side keep-alive to reduce the time to load heavy pages for clients over the net,
TCP speedups to help the TCP stack save a few packets per connection,
response buffering for an even lower number of concurrent connections on the servers,
RDP protocol support with server stickiness and user filtering,
source-based stickiness to attach a source address to a server,
a much better stats interface reporting tons of useful information,
more verbose health checks reporting precise statuses and responses in stats and logs,
traffic-based health to fast-fail a server above a certain error threshold,
support for HTTP authentication for any request including stats, with support for password encryption,
server management from the CLI to enable/disable and change a server's weight without restarting haproxy,
ACL-based persistence to maintain or disable persistence based on ACLs, regardless of the server's state,
log analyzer to generate fast reports from logs parsed at 1 Gbyte/s.
- version 1.3, released in 2006
This version has brought a lot of new features and improvements over 1.2, among which :
content switching to select a server pool based on any request criteria,
ACL to write content switching rules,
wider choice of load-balancing algorithms for better integration,
content inspection allowing unexpected protocols to be blocked,
transparent proxy under Linux, which allows connecting directly to the server using the client's IP address,
kernel TCP splicing to forward data between the two sides without copy in order to reach multi-gigabit data rates,
layered design separating sockets, TCP and HTTP processing for more robust and faster processing and easier evolutions,
fast and fair scheduler allowing better QoS by assigning priorities to some tasks,
session rate limiting for colocated environments, etc...
Version 1.2 has been in production use since 2006 and provided an improved performance level
on top of 1.1. It is not maintained anymore, as most of its users switched to 1.3 a long
time ago. Version 1.1, which has been keeping critical sites online since 2002, is not
maintained anymore either. Users should upgrade to 1.4 or 1.5.
HAProxy is known to reliably run on the following OS/Platforms :
- Linux 2.4 on x86, x86_64, Alpha, Sparc, MIPS, PARISC
- Linux 2.6 / 3.x on x86, x86_64, ARM, Sparc, PPC64
- Solaris 8/9 on UltraSPARC 2 and 3
- Solaris 10 on Opteron and UltraSPARC
- FreeBSD 4.10 - 10 on x86
- OpenBSD 3.1 to -current on i386, amd64, macppc, alpha, sparc64 and VAX (check the ports)
- AIX 5.1 - 5.3 on Power™ architecture
Highest performance is achieved with modern operating systems supporting scalable polling mechanisms such as
epoll on Linux 2.6/3.x or kqueue
on FreeBSD and OpenBSD. This requires a haproxy version newer than 1.2.5. Fast data transfers are made possible
on Linux 3.x using TCP splicing and haproxy 1.4 or 1.5. Forwarding rates of up to 40 Gbps have already been
achieved on such platforms after very careful tuning. While Solaris and AIX are supported, they should not
be used if extreme performance is required.
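As a rough illustration of the event-driven model that epoll and kqueue make efficient, here is a minimal Python sketch. It is not HAProxy code. It relies on the standard `selectors` module, which uses epoll on Linux and kqueue on the BSDs when they are available; the function name `echo_once` and the local socket pair are assumptions made purely for the example.

```python
import selectors
import socket


def echo_once():
    """Run a tiny single-process event loop over one local socket pair."""
    # DefaultSelector picks the most scalable poller the OS offers:
    # epoll on Linux, kqueue on BSD/macOS, otherwise poll or select.
    sel = selectors.DefaultSelector()
    client, server = socket.socketpair()
    server.setblocking(False)
    received = []

    def on_readable(sock):
        # Called only when the poller reports the socket as readable.
        received.append(sock.recv(4096))

    # Register interest once; the callback travels with the registration.
    sel.register(server, selectors.EVENT_READ, on_readable)
    client.sendall(b"ping")

    # Bounded loop: wait for events and dispatch them to their callbacks.
    for _ in range(10):
        for key, _mask in sel.select(timeout=1.0):
            key.data(key.fileobj)
        if received:
            break

    poller_name = type(sel).__name__
    sel.unregister(server)
    sel.close()
    client.close()
    server.close()
    return poller_name, b"".join(received)


if __name__ == "__main__":
    print(echo_once())
```

The same loop would serve many sockets by registering each of them with its own callback, which is what keeps the per-connection cost low compared with one process or thread per connection.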
Current typical 1U servers equipped with a dual-core Opteron or Xeon generally
achieve between 15000 and 40000 hits/s and have no trouble saturating 2 Gbps.
Well, since a user's testimony is better than a long demonstration, please take a look at
Chris Knight's experience
with haproxy saturating a gigabit fiber in 2007 on a video download site. Since then,
the performance has significantly increased and the hardware has become much more capable, as
my experiments with
Myricom's 10-Gig NICs have shown two years later. Now as of
2014, 10-Gig NICs are too limited and are hardly suited for 1U servers since they rarely
provide enough port density to reach speeds above 40-60 Gbps in a 1U server. 100-Gig NICs
are coming and I expect to run new series of tests when they are available.
HAProxy involves several techniques commonly found in operating system
architectures to achieve the absolute maximal performance :
- a single-process, event-driven model considerably reduces the cost of context switches
and the memory usage. Processing several hundreds of tasks in a millisecond is
possible, and the memory usage is in the order of a few kilobytes per session
while memory consumed in preforked or threaded servers is more in the order of
megabytes per process.
- O(1) event checker on systems that allow it (Linux and FreeBSD)
allowing instantaneous detection of any event on any connection among tens of thousands.
- Delayed updates to the event checker using a lazy event cache ensure
that we never update an event unless absolutely required. This saves a lot of system calls.
- Single-buffering without any data copy between reads and writes whenever
possible. This saves a lot of CPU cycles and useful memory bandwidth. Often,
the bottleneck will be the I/O busses between the CPU and the network
interfaces. At 10-100 Gbps, the memory bandwidth can become a bottleneck too.
- Zero-copy forwarding is possible using the splice() system
call under Linux, and results in real zero-copy starting with Linux 3.5. This
allows a small sub-3 Watt device such as a Seagate Dockstar to forward HTTP
traffic at one gigabit/s.
- MRU memory allocator using fixed size memory pools for immediate memory
allocation favoring hot cache regions over cold cache ones. This dramatically
reduces the time needed to create a new session.
- Work factoring, such as multiple accept() at once, and
the ability to limit the number of accept() per iteration when
running in multi-process mode, so that the load is evenly distributed among processes.
- CPU-affinity is supported when running in multi-process mode, or simply
to adapt to the hardware and be the closest possible to the CPU core managing the
NICs while not conflicting with it.
- Tree-based storage, making heavy use of the Elastic Binary tree I have
been developing for several years. This is used to keep timers ordered, to keep
the runqueue ordered, to manage round-robin and least-conn queues, to look up ACLs
or keys in tables, with only an O(log(N)) cost.
- Optimized timer queue : timers are not moved in the tree if they are
postponed, because the likelihood that they are met is close to zero since they're
mostly used for timeout handling. This further optimizes the ebtree usage.
- optimized HTTP header analysis : headers are parsed and interpreted on
the fly, and the parsing is optimized to avoid re-reading any previously
read memory area. Checkpointing is used when an end of buffer is reached with
an incomplete header, so that the parsing does not start again from the
beginning when more data is read. Parsing an average HTTP request typically
takes half a microsecond on a fast Xeon E5.
- careful reduction of the number of expensive system calls. Most of the
work is done in user-space by default, such as time reading, buffer aggregation, ...
- Content analysis is optimized to carry only pointers to original data and
never copy unless the data needs to be transformed. This ensures that very
small structures are carried over and that contents are never replicated when
not absolutely necessary.
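The checkpointing idea from the header analysis item above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HAProxy's actual parser: the class name `HeaderParser`, its attributes and the sample request for `example.org` are all hypothetical. The point is that when a chunk ends in the middle of a header, the parser remembers where it stopped and resumes from there instead of rescanning the whole buffer.

```python
class HeaderParser:
    """Incremental HTTP request-header parser that resumes from a checkpoint."""

    def __init__(self):
        self.buf = bytearray()
        self.line_start = 0   # checkpoint: start of the first incomplete line
        self.scan_from = 0    # first byte not yet searched for CRLF
        self.start_line = None
        self.headers = []
        self.done = False

    def feed(self, data: bytes) -> bool:
        """Append data and parse as far as possible; True once headers end."""
        self.buf += data
        while not self.done:
            end = self.buf.find(b"\r\n", self.scan_from)
            if end < 0:
                # Incomplete line: remember how far we searched. The last
                # byte is kept in range in case it is a lone CR.
                self.scan_from = max(self.line_start, len(self.buf) - 1)
                break
            line = bytes(self.buf[self.line_start:end])
            self.line_start = self.scan_from = end + 2
            if self.start_line is None:
                self.start_line = line.decode("latin-1")
            elif not line:
                self.done = True   # empty line terminates the header block
            else:
                name, _, value = line.partition(b":")
                self.headers.append((name.decode("latin-1").strip().lower(),
                                     value.decode("latin-1").strip()))
        return self.done


if __name__ == "__main__":
    parser = HeaderParser()
    # The request arrives split at awkward places, including inside CRLF.
    for chunk in (b"GET / HTTP/1.1\r\nHo", b"st: example.org\r",
                  b"\nAccept: */*\r\n\r", b"\n"):
        parser.feed(chunk)
    print(parser.start_line, parser.headers, parser.done)
```

Keeping two offsets is a design choice for the sketch: `line_start` marks where the unfinished header begins, while `scan_from` avoids searching bytes that were already examined.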
All these micro-optimizations result in very low CPU usage even on moderate
loads. And even at very high loads, when the CPU is saturated, it is quite common
to note figures like 5% user and 95% system, which means that the
HAProxy process consumes about 20 times less than its system counterpart. This
explains why the tuning of the Operating System is very important. This
is the reason why we ended up building
our own appliances,
in order to spare the end user that complex and critical task.
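The "about 20 times" figure follows directly from the 5% user and 95% system split quoted above. A small Python sketch of the arithmetic, where the function name `system_to_user_ratio` is just an illustrative choice:

```python
def system_to_user_ratio(user_pct: float, system_pct: float) -> float:
    """Return how many times more CPU the system side uses than user space."""
    if user_pct <= 0:
        raise ValueError("user_pct must be positive")
    return system_pct / user_pct


if __name__ == "__main__":
    # 95 / 5 = 19, which is the "about 20 times" mentioned in the text.
    print(system_to_user_ratio(5, 95))
```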
In production, HAProxy has been installed several times as an emergency solution
when very expensive, high-end hardware load balancers suddenly failed on Layer 7
processing. Some hardware load balancers still do not use proxies and process requests
at the packet level, so they have great difficulty supporting
requests spanning multiple packets and high response
times because they do no buffering at all. On the
other hand, software load balancers use TCP buffering
and are insensitive to long requests and high response times. A
nice side effect of HTTP buffering is that it
increases the server's connection acceptance by reducing the
session duration, which leaves room for new requests.
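To get a feel for why shorter sessions leave room for new requests, one can use Little's law (average concurrency = arrival rate x average duration). The sketch below is only an illustration: the function name `concurrent_connections` and the figures of 1000 requests per second, 500 ms and 50 ms are hypothetical values chosen for the example, not measurements, and a steady state is assumed.

```python
def concurrent_connections(requests_per_second: int, session_ms: int) -> float:
    """Average number of connections held open, by Little's law."""
    return requests_per_second * session_ms / 1000


if __name__ == "__main__":
    # Hypothetical case: slow clients keep each server session open 500 ms.
    print(concurrent_connections(1000, 500))   # 500.0
    # With buffering in front, assume the server side is released after 50 ms.
    print(concurrent_connections(1000, 50))    # 50.0
```

Under these assumed numbers the server holds ten times fewer connections for the same request rate, which is the effect the paragraph above describes.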
There are 3 important factors used to measure a load balancer's performance :
- The session rate
This factor is very important, because it directly determines when the load
balancer will not be able to distribute all the requests it receives. It is