April, 1st, 2015 : April Fool's : Complete rewrite of HAProxy in Lua
As some might have noticed, HAProxy development is progressively slowing
down over time. I have analyzed the situation and came to the following
In parallel, I see that I'm getting old: I turned 40 last year and it's
obvious that I'm not as capable of optimizing code as I used to be.
I'm of the old school, still counting the CPU cycles it takes a function
to execute, the nanoseconds required to append an X-Forwarded-For header
or to parse a cookie. And all of this is totally wasted when people run
the software in virtual machines which only allocate portions of CPUs
(i.e. they switch between multiple VMs at a high rate), or install it in
front of applications which saturate at 100 requests a second.
Recently with the Lua addition, we found it to be quite fast. Maybe not
as fast as C, but Lua is improving and C skills are diminishing, so I
guess that in a few years the code written in Lua will be much faster
than the code we'll be able to write in C. Thus I found it wise to
declare a complete rewrite of HAProxy in Lua. It comes with many
First, Lua is easy to learn, we'll get many more developers and
contributors. One of the reasons is that you don't need to care about
resource allocation anymore. What's the benefit of doing an strdup() to
keep a copy of a string when you can simply do "a = b" without having to
care about the memory used behind the scenes? Machines are huge nowadays, much
larger than the old Athlon XP I was using 10 years ago.
Second, Lua doesn't require a compiler, so we'll save 30 minutes a day
per 200 builds, this will definitely speed up development for each
developer. And we won't depend on a given C compiler, won't be subject
to its bugs, and more importantly we'll be able to get rid of the few
lines of assembly that we currently have in some performance-critical
Third, the last version of HAProxy saw a lot of new sample fetch functions
and converters. This will not be needed anymore, because the code and
the configuration will be mixed together, just as everyone does with
Shell scripts. This means that any config will just look like an include
directive for the haproxy code, followed by some code to declare the
configuration. It will then be possible to create infinite combinations
of new functions, and the configuration will have access to anything
internal to HAProxy.
In the end, only the Lua engine will remain of the current HAProxy, and
probably by then we'll find even better ones so that haproxy will be
distributed as a Lua library to use anywhere, maybe even on IoT devices
if that makes sense (anyone ever dreamed of having haproxy in their
This step forward will save us from having to continue to do any code
versioning, because everyone will have his own fork and the code will
grow much faster this way. That also means that Git will become useless
for us. In terms of security, it will be much better as it will not be
possible to exploit a vulnerability common to all versions anymore since
each version will be different.
HAProxy Technologies is going to assign a lot of resources to this task.
Obviously all the development team will work on this full time, but we
also realize that since customers will not be interested in the C
version anymore after this public announcement, we'll train the sales people
to write Lua as well in order to speed up development.
We'll continue to provide an enterprise version forked from HAPEE that
we'll rename "Luapee". It will still provide all the extras that make
it a professional solution such as VRRP, SNMP etc and over the long term
we expect to rewrite all of these components in Lua as well.
The ALOHA appliances will change a little bit, they'll mostly be a Lua
engine to run all that code, so we'll probably rename them HALUA. And
given that the appliance's goal has always been to take advantage of the
hardware and kernel to further improve the capabilities, we'll have free
hands to port other performance-critical parts in Lua, including maybe
the currently aging Linux kernel which also happens to be written in C.
Once everything is ported, I intend to use my old skills in the domain
of microarchitecture to design a native Lua processor that will run in
our appliances so that all the code runs in silicon and ends up being
much faster than what we currently have in C.
I'm quite aware that some parts will be tedious. Rewriting OpenSSL in
Lua will neither be easy nor fun. But it's the price to pay to get fast
and affordable security.
Due to the huge amount of work, we'll postpone the 1.6 release to 1st
April 2016, which leaves us exactly 366 days to complete this task. I
hope everyone understands that we have no other choice.
- the code base is increasing and is becoming slower to build day
after day. Ten years ago, version 1.1.31 was only 6716 lines
everything included. Today, mainline is 108395 lines, or 16 times
- gcc is getting slower over time. Since version 2.7.2 I used to rely
on ten years ago, we've seen important slowdowns with v2.95, several
v3.x then v4.x. I'm currently on 4.7 and afraid to upgrade.
- while the whole code base used to build in less than a second ten
years ago on an Athlon XP-1800, now it takes about 10 seconds on a
core i5 at 3 GHz. Multiply this by about 200 builds a day and you
see that half an hour is wasted every single day dedicated to
development. That's about 1/4 of the available time if you count
the small amount of time available after processing e-mails.
- people don't learn C anymore at school and this makes it harder to
get new contributors. In fact, most of those who are proficient in C
already have a job and little spare time to dedicate to an
February, 1st, 2015 : 1.5.11, 1.4.26, 1.3.27, 184.108.40.206
There was nothing really important for 1.5, mostly small annoyances caused by improper behaviours. One of them was not exactly a bug since it used to work as documented, but as it was documented to work in a stupid and useless way I decided to backport the change anyway. It's the "http-request set-header" action, which used to remove the target header prior to computing the format string, making it impossible to append a value to an existing header without passing via a dummy header, adding to the complexity. Now the string is computed before removing the header so that there are no more insane tricks to go through. One important fix targets users running on 1.5.10 : the addition of "log-tag" uncovered a bug by which we can run with a null logger if no logger is declared. Since 1.5.10 (with log-tag), this could cause a crash upon startup, so this was fixed here.
On the 1.4 front, things had been stuck for several months due to the problems caused by "http-send-name-header" that managed to keep both Cyril and me busy for a while. No less than 3 bugs in direct relation with this feature were fixed, two of them capable of crashing the process under certain conditions. Another important bug in 1.4 was triggered when issuing "show sess" on the CLI. Other fixes are not really important and were accumulated over 10 months.
Having 1.4 ready was a great opportunity to issue another 1.3, so 1.3.27 backports the relevant fixes from 1.4. Considering that the last 1.3 was issued 3.5 years ago, I suspect that 1.3.27 will be the last 1.3, though it's still maintained 8 years after the first 1.3 was issued. 220.127.116.11 was released with pending fixes as well and now the 1.3.15 branch is closed and switches to the unmaintained state after 7 years of fixes.
Note: when pushing 1.3.27, I was unfortunate to discover that git.haproxy.org and the public master Git repository went out of sync, both forking after 1.3.26, so I had to perform a forced push on git.haproxy.org to resync them. Sorry for the inconvenience.
January, 1st, 2015 : Year of a changing Web
I'm always surprised to see how few people outside of the IETF HTTP working group are even aware of the fact that HTTP/2 is being worked on. At the time of writing, the draft is in the "Last Call" state which basically means that unless something critical is discovered, it will soon be adopted in its current form. Here "soon" means "around a few weeks".
What will this change ? Probably not much at the beginning, but a lot soon. It will take some time before web site operators figure out the performance benefits brought by HTTP/2, but the media will quickly boast its merits and the change can happen quickly, even if just to catch up with competing early adopters. A number of sites already support SPDY for the same reasons right now but SPDY is constantly evolving and requires more attention from users who have to update often. By being a new standard, HTTP/2 will not require that level of care, and it will be perceived as the direct descendant of HTTP/1.1, which is why it will be more widely adopted than SPDY.
All major browsers already support HTTP/2; two of them (Firefox and Chrome) will only support it for HTTPS. Internet Explorer will drop SPDY support once HTTP/2 is adopted. That logically means that a number of web sites will decide to enable HTTPS in order to support HTTP/2 for the users of the two aforementioned browsers. HTTPS comes with an extra round trip at the beginning of the connection, but HTTP/2 saves a lot of them during the transfers so in the end if there are at least a few tens of objects to retrieve, it will still be an improvement.
But this will cause a new issue : migrating to HTTPS will mean that the caches that are operated in universities, enterprises, all mobile phone operators and many ISPs will not be used anymore. This will immediately have two impacts : the first one is that the traffic on the internet will increase. Alarmists used to say that the 40 Tbps trans-atlantic total capacity is almost saturated and hard to upgrade, we'll see if that's true. The second effect is that origin servers will get a significant traffic increase, which is good for ADC vendors as well as for CDNs which will get many new customers and increase their revenue. Sadly, in a number of poorly connected countries where client-side caches are critical to the survival of the Internet, CDNs will not be able to help and the situation will get even worse. That's also the case for a number of mobile phone operators who can observe high cache hit ratios today.
What will very likely happen to address these situations is that ISPs and mobile phone operators will start to propose a faster Internet access to their customers in exchange for a root cert that they can happily install in their browser so that the operator can decipher SSL traffic on the fly and cache again. End users are already prepared to accept this because they don't care at all about their privacy when it comes to whatever they do with their smartphone, otherwise they would always close their apps and type a password to access their emails. And the next logical step is that mobile phones sold by these operators will already have the root cert pre-installed in order to save a complex operation from the end user.
And that will lead to an interesting situation. First, SSL offloading solution vendors will happily see their sales increase. But the counterpart is that the chain of trust of the SSL/TLS model will be definitely broken in that there will be no way for the end user to know if his data were safe or not. This chain is extremely fragile already and is regularly being abused, but now it could become the norm not to trust SSL anymore when rogue CAs become mandatory to access the net.
Fortunately, a few solutions are being worked on. On the HTTP working group they're called "Trusted Proxies" or "GET https://", as a reference to what an HTTPS request through an explicit proxy could look like. They consist in letting the end user choose what can be deciphered and what cannot. That allows proxy operators to let some trusted sites pass through and to decipher/inspect/cache contents for all other ones. That's how we could get a better Internet for everyone, with better caching and better privacy at the same time. Not sure it will happen by 2015 though, but we should do whatever we can for this to happen!
December, 31st, 2014 : 1.5.10 : Last release of the year!
Most of the fixes in this version are related to how we deal with out-of-memory situations. This normally interests nobody except those who run many instances on memory-bound servers. There was a very unlikely but possible case of crash when it was not possible to allocate a small chunk of memory (I managed to reproduce it after a long time during extremely aggressive tests). There are a few fixes on tcp-checks, one for a bug causing some random contents to be analysed, another one where quick acks were disabled when there was no data to send, causing 200ms delays when "option tcp-check" was specified alone. Another bug concerned proxies disabled in the configuration which could under some circumstances cause a segfault upon startup during the process mask propagation between frontends and backends. The rest is mostly harmless, so keep cool, no rush if you're already running 1.5.9. Code and changelog are available here as usual.
HAProxy is a free, very fast and reliable solution offering high availability,
load balancing, and proxying for TCP and HTTP-based applications. It is
particularly suited for very high traffic web sites and powers quite a number
of the world's most visited ones.
Over the years it has become the de-facto standard opensource load balancer, is
now shipped with most mainstream Linux distributions, and is often deployed by
default in cloud platforms. Since it does not advertise itself, we only know it's
used when the admins report it :-)
Its mode of operation makes its integration into existing architectures very easy
and riskless, while still offering the possibility not to expose fragile web servers
to the net, such as below :
We always support at least two active versions in parallel and an extra old
one in critical fixes mode only. The currently supported versions are :
- version 1.5 : the most featureful version, supports SSL, IPv6, keep-alive, DDoS protection, etc...
- version 1.4 : the most stable version for people who don't need SSL. Still provides client-side keep-alive
- version 1.3 : the old stable version for companies who cannot upgrade for internal policy reasons.
Each version brought its set of features on top of the previous one.
Upwards compatibility is a very important aspect of HAProxy, and even
the latest stable version (1.5) is able to run with configurations made
for version 1.0 13 years ago. The most differentiating features of each
version are listed below :
- version 1.5, released in 2014
This version further expands 1.4 with 4 years of hard work :
native SSL support on both sides with SNI/NPN/ALPN and OCSP stapling,
IPv6 and UNIX sockets are supported everywhere,
full HTTP keep-alive for better support of NTLM and improved efficiency in static farms,
HTTP/1.1 compression (deflate, gzip) to save bandwidth,
PROXY protocol versions 1 and 2 on both sides,
data sampling on everything in request or response, including payload,
ACLs can use any matching method with any input sample
maps and dynamic ACLs updatable from the CLI
stick-tables support counters to track activity on any input sample
custom format for logs, unique-id, header rewriting, and redirects,
improved health checks (SSL, scripted TCP, check agent, ...),
much more scalable configuration supports hundreds of thousands of backends and certificates without sweating
- version 1.4, released in 2010
This version has brought its share of new features over 1.3, most of which were long awaited :
client-side keep-alive to reduce the time to load heavy pages for clients over the net,
TCP speedups to help the TCP stack save a few packets per connection,
response buffering for an even lower number of concurrent connections on the servers,
RDP protocol support with server stickiness and user filtering,
source-based stickiness to attach a source address to a server,
a much better stats interface reporting tons of useful information,
more verbose health checks reporting precise statuses and responses in stats and logs,
traffic-based health to fast-fail a server above a certain error threshold,
support for HTTP authentication for any request including stats, with support for password encryption,
server management from the CLI to enable/disable and change a server's weight without restarting haproxy,
ACL-based persistence to maintain or disable persistence based on ACLs, regardless of the server's state,
log analyzer to generate fast reports from logs parsed at 1 Gbyte/s,
- version 1.3, released in 2006
This version has brought a lot of new features and improvements over 1.2, among which
content switching to select a server pool based on any request criteria,
ACL to write content switching rules, wider choice of
load-balancing algorithms for better integration,
content inspection allowing to block unexpected protocols,
transparent proxy under Linux, which allows to directly connect to
the server using the client's IP address, kernel TCP splicing to forward
data between the two sides without copy in order to reach multi-gigabit data rates,
layered design separating sockets, TCP and HTTP processing for more
robust and faster processing and easier evolutions, fast and fair scheduler
allowing better QoS by assigning priorities to some tasks, session rate limiting
for colocated environments, etc...
Version 1.2 has been in production use since 2006 and provided an improved performance level
on top of 1.1. It is not maintained anymore, as most of its users have switched to 1.3 a long
time ago. Version 1.1, which has been maintaining critical sites online since 2002, is not
maintained anymore either. Users should upgrade to 1.4 or 1.5.
HAProxy is known to reliably run on the following OS/Platforms :
- Linux 2.4 on x86, x86_64, Alpha, Sparc, MIPS, PARISC
- Linux 2.6 / 3.x on x86, x86_64, ARM, Sparc, PPC64
- Solaris 8/9 on UltraSPARC 2 and 3
- Solaris 10 on Opteron and UltraSPARC
- FreeBSD 4.10 - 10 on x86
- OpenBSD 3.1 to -current on i386, amd64, macppc, alpha, sparc64 and VAX (check the ports)
- AIX 5.1 - 5.3 on Power™ architecture
Highest performance is achieved with modern operating systems supporting scalable polling mechanisms such as
epoll on Linux 2.6/3.x or kqueue
on FreeBSD and OpenBSD. This requires haproxy version newer than 1.2.5. Fast data transfers are made possible
on Linux 3.x using TCP splicing and haproxy 1.4 or 1.5. Forwarding rates of up to 40 Gbps have already been
achieved on such platforms after a very careful tuning. While Solaris and AIX are supported, they should not
be used if extreme performance is required.
Current typical 1U servers equipped with a dual-core Opteron or Xeon generally
achieve between 15000 and 40000 hits/s and have no trouble saturating 2 Gbps
Well, since a user's testimony is better than a long demonstration, please take a look at
Chris Knight's experience
with haproxy saturating a gigabit fiber in 2007 on a video download site. Since then,
the performance has significantly increased and the hardware has become much more capable, as
my experiments with
Myricom's 10-Gig NICs have shown two years later. Now as of
2014, 10-Gig NICs are too limited and are hardly suited for 1U servers since they rarely
provide enough port density to reach speeds above 40-60 Gbps in a 1U server. 100-Gig NICs
are coming and I expect to run new series of tests when they are available.
HAProxy involves several techniques commonly found in Operating Systems
architectures to achieve the absolute maximal performance :
- a single-process, event-driven model considerably reduces the cost of context switches
and the memory usage. Processing several hundreds of tasks in a millisecond is
possible, and the memory usage is in the order of a few kilobytes per session
while memory consumed in preforked or threaded servers is more in the order of
megabytes per process.
- O(1) event checker on systems that allow it (Linux and FreeBSD)
allowing instantaneous detection of any event on any connection among tens of
- Delayed updates to the event checker using a lazy event cache ensures
that we never update an event unless absolutely required. This saves a lot of
- Single-buffering without any data copy between reads and writes whenever
possible. This saves a lot of CPU cycles and useful memory bandwidth. Often,
the bottleneck will be the I/O busses between the CPU and the network
interfaces. At 10-100 Gbps, the memory bandwidth can become a bottleneck too.
- Zero-copy forwarding is possible using the splice() system
call under Linux, and results in real zero-copy starting with Linux 3.5. This
allows a small sub-3 Watt device such as a Seagate Dockstar to forward HTTP
traffic at one gigabit/s.
- memory allocator using fixed size memory pools for immediate memory
allocation favoring hot cache regions over cold cache ones. This dramatically
reduces the time needed to create a new session.
- Work factoring, such as multiple accept() at once, and
the ability to limit the number of accept() per iteration when
running in multi-process mode, so that the load is evenly distributed among
- CPU-affinity is supported when running in multi-process mode, or simply
to adapt to the hardware and be the closest possible to the CPU core managing the
NICs while not conflicting with it.
- Tree-based storage, making heavy use of the Elastic Binary tree I have
been developing for several years. This is used to keep timers ordered, to keep
the runqueue ordered, to manage round-robin and least-conn queues, to look up ACLs
or keys in tables, with only an O(log(N)) cost.
- Optimized timer queue : timers are not moved in the tree if they are
postponed, because the likelihood that they are met is close to zero since they're
mostly used for timeout handling. This further optimizes the ebtree usage.
- optimized HTTP header analysis : headers are parsed and interpreted on
the fly, and the parsing is optimized to avoid any re-reading of any previously
read memory area. Checkpointing is used when an end of buffer is reached with
an incomplete header, so that the parsing does not start again from the
beginning when more data is read. Parsing an average HTTP request typically
takes half a microsecond on a fast Xeon E5.
- careful reduction of the number of expensive system calls. Most of the
work is done in user-space by default, such as time reading, buffer aggregation,
- Content analysis is optimized to carry only pointers to original data and
never copy unless the data needs to be transformed. This ensures that very
small structures are carried over and that contents are never replicated when
not absolutely necessary.
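The checkpointing idea from the header analysis item above can be illustrated with a minimal Python sketch. This is not HAProxy's parser (which is written in C and far more careful); the class name `HeaderParser` and its fields are hypothetical and only show how a saved scan position lets parsing resume where it stopped when more data arrives, instead of starting again from the beginning.

```python
class HeaderParser:
    """Incremental HTTP header parser that never re-reads scanned bytes."""

    def __init__(self):
        self.buf = bytearray()
        self.line_start = 0     # offset of the first line not yet parsed
        self.scan_pos = 0       # checkpoint: bytes before this were already scanned
        self.start_line = None  # request line, stored as-is
        self.headers = []       # list of (name, value) pairs
        self.done = False

    def feed(self, data):
        """Append newly read data; return True once the empty line is seen."""
        self.buf += data
        while not self.done:
            eol = self.buf.find(b"\r\n", self.scan_pos)
            if eol < 0:
                # Incomplete line: remember how far we scanned. The last byte
                # stays in scope because it may be the "\r" of a split CRLF.
                self.scan_pos = max(self.scan_pos, len(self.buf) - 1)
                return False
            line = bytes(self.buf[self.line_start:eol]).decode("latin-1")
            self.line_start = self.scan_pos = eol + 2
            if self.start_line is None:
                self.start_line = line
            elif line == "":
                self.done = True
            else:
                name, _, value = line.partition(":")
                self.headers.append((name.strip(), value.strip()))
        return True


# A request arriving in three reads, cut in the middle of a header and of a CRLF.
parser = HeaderParser()
for chunk in (b"GET / HTTP/1.1\r\nHost: exa", b"mple.org\r\nAccept: */*\r", b"\n\r\n"):
    complete = parser.feed(chunk)
print(complete, parser.headers)
```

Each call to `feed()` searches for the end of line only from `scan_pos` onwards, so bytes seen during a previous read are not scanned again.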
All these micro-optimizations result in very low CPU usage even on moderate
loads. And even at very high loads, when the CPU is saturated, it is quite common
to note figures like 5% user and 95% system, which means that the
HAProxy process consumes about 20 times less than its system counterpart. This
explains why the tuning of the Operating System is very important. This
is the reason why we ended up building
our own appliances,
in order to save that complex and critical task from the end-user.
In production, HAProxy has been installed several times as an emergency solution
when very expensive, high-end hardware load balancers suddenly failed on Layer 7
processing. Some hardware load balancers still do not use proxies and process requests
at the packet level and have great difficulty supporting
requests across multiple packets and high response
times because they do no buffering at all. On the
other side, software load balancers use TCP buffering
and are insensitive to long requests and high response times. A
nice side effect of HTTP buffering is that it
increases the server's connection acceptance by reducing the
session duration, which leaves room for new requests.
There are 3 important factors used to measure a load balancer's performance :
- The session rate
This factor is very important, because it directly determines when the load
balancer will not be able to distribute all the requests it receives. It is
mostly dependent on the CPU.
Sometimes, you will hear about requests/s or hits/s, and they are the same as
sessions/s in HTTP/1.0 or HTTP/1.1 with
keep-alive disabled. Requests/s with keep-alive enabled is generally much
higher (since it significantly reduces system-side work) but is often meaningless
for internet-facing deployments since clients often open a large number of connections
and do not send many requests per connection on average. This factor is
measured with varying object sizes, the fastest results generally coming from
empty objects (eg: HTTP 302, 304 or 404 response codes).
Session rates around 100,000 sessions/s can be achieved on Xeon E5
systems in 2014.
- The session concurrency
This factor is tied to the previous one. Generally, the session rate
will drop when the number of concurrent sessions increases (except with the
epoll or kqueue polling mechanisms). The slower
the servers, the higher the number of concurrent sessions for the same session rate.
If a load balancer receives 10000 sessions per second and the servers respond in
100 ms, then the load balancer will have 1000 concurrent sessions. This number is
limited by the amount of memory and the amount of file-descriptors the system can
handle. With 16 kB buffers, HAProxy will need about 34 kB per session, which
results in around 30000 sessions per GB of RAM. In practice, socket
buffers in the system also need some memory and 20000 sessions per GB of RAM is
more reasonable. Layer 4 load balancers generally announce millions of
simultaneous sessions because they need to deal with the TIME_WAIT sockets
that the system handles for free in a proxy. Also they don't process any data
so they don't need any buffer. Moreover, they are sometimes designed to be used
in Direct Server Return mode, in which the load balancer only sees forward
traffic, and which forces it to keep the sessions for a long time after their end
to avoid cutting sessions before they are closed.
- The data forwarding rate
This factor generally is at the opposite of the session rate. It is measured
in Megabytes/s (MB/s), or sometimes in Gigabits/s (Gbps). Highest data rates
are achieved with large objects to minimise the overhead caused by session
setup and teardown. Large objects generally increase session concurrency, and
high session concurrency with high data rate requires large amounts of memory
to support large windows. High data rates burn a lot of CPU and bus cycles on
software load balancers because the data has to be copied from the input
interface to memory and then back to the output device. Hardware load balancers
tend to directly switch packets from input port to output port for higher data
rate, but cannot process them and sometimes fail to touch a header or a cookie.
Haproxy on a typical Xeon E5 of 2014 can forward data up to about 40 Gbps.
A fanless 1.6 GHz Atom CPU is slightly above 1 Gbps.
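The session concurrency arithmetic above can be checked with a few lines of Python. This is only a worked example of the figures quoted in the text (10000 sessions/s, 100 ms response time, about 34 kB per session); the function names are hypothetical and a GB is assumed to be 1024^3 bytes.

```python
def concurrent_sessions(sessions_per_second, response_time_ms):
    """Average number of sessions open at once: arrival rate x session duration."""
    return sessions_per_second * response_time_ms / 1000


def sessions_per_gb(bytes_per_session, gb_bytes=1024 ** 3):
    """Number of sessions that fit in one GB for a given per-session footprint."""
    return gb_bytes // bytes_per_session


# 10000 sessions/s with servers responding in 100 ms
print(concurrent_sessions(10000, 100))

# about 34 kB per session with 16 kB buffers
print(sessions_per_gb(34 * 1024))
```

The second figure lands slightly above 30000, which matches the "around 30000 sessions per GB" estimate before socket buffers are taken into account.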
A load balancer's performance related to these factors is generally announced for
the best case (eg: empty objects for session rate, large objects for data rate).
This is not because of lack of honesty from the vendors, but because it is not
possible to tell exactly how it will behave in every combination. So when those 3
limits are known, the customer should be aware that it will generally perform below
all of them. A good rule of thumb on software load balancers is to consider an
average practical performance of half of maximal session and data rates for
average sized objects.
You might be interested in checking the 10-Gigabit/s page.
Being obsessed with reliability, I tried to do my best to ensure a total
continuity of service by design. It's more difficult to design something
reliable from the ground up in the short term, but in the long term it proves
easier to maintain than broken code which tries to hide its own bugs behind
respawning processes and tricks like this.
In single-process programs, you have no right to fail : the smallest bug
will either crash your program, make it spin like mad or freeze. There has not
been any such bug found in stable versions for the last 13 years, though
it happened a few times with development code running in production.
HAProxy has been installed on Linux 2.4 systems serving millions of pages
and which have only known one reboot in 3 years for a complete OS upgrade.
Obviously, they were not directly exposed to the Internet because they did not receive
any patch at all. The kernel was a heavily patched 2.4 with Robert Love's
jiffies64 patches to support time wrap-around at 497 days (which
happened twice). On such systems, the software cannot fail without being
immediately noticed !
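The 497-day figure can be reproduced with a short calculation. The sketch below assumes a 32-bit tick counter incremented 100 times per second (HZ=100, the usual value on x86 Linux 2.4); both values are assumptions of this example and the function name is hypothetical.

```python
def jiffies_wrap_days(hz=100, counter_bits=32):
    """Days before a tick counter of the given width wraps around at hz ticks/s."""
    seconds = 2 ** counter_bits / hz
    return seconds / 86400


wrap = jiffies_wrap_days()
print(int(wrap))            # whole days before the counter wraps
print(int(3 * 365 / wrap))  # number of wraps during 3 years of uptime
```

With these assumptions the counter wraps a little after 497 days, and twice within 3 years, which is consistent with the uptime described above.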
Right now, it's being used in many Fortune 500 companies around the world to
reliably serve billions of pages per day or relay huge amounts of money. Some
people even trust it so much that they use it as the default solution to solve
simple problems (and I often tell them that they do it the dirty way). Such
people sometimes still use versions 1.1 or 1.2 which see very limited evolutions
and which target mission-critical usages. HAProxy is really suited for such environments
because the indicators it returns provide a lot of valuable information about the application's
health, behaviour and defects, which are used to make it even more reliable.
Version 1.3 has now received far more testing than 1.1 and 1.2 combined, so
users are strongly encouraged to migrate to a stable 1.3 or 1.4 for mission-critical
As previously explained, most of the work is executed by the Operating System.
For this reason, a large part of the reliability involves the OS itself. Latest
versions of Linux 2.4 have been known for offering the highest level of stability
ever. However, it requires a bunch of patches to achieve a high level of performance,
and this kernel is really outdated now so running it on recent hardware will often
be difficult (though some people still do). Linux 2.6 and 3.x include the features
needed to achieve this level of performance, but old LTS versions only should be
considered for really stable operations without upgrading more than once a year.
Some people prefer to run it on Solaris (or do not have the choice). Solaris 8 and
9 are known to be really stable right now, offering a level of performance comparable
to legacy Linux 2.4 (without the epoll patch). Solaris 10 might show performances
closer to early Linux 2.6. FreeBSD shows good performance but pf (the firewall)
eats half of it and needs to be disabled to come close to Linux. OpenBSD sometimes
shows socket allocation failures due to sockets staying in FIN_WAIT2 state
when client suddenly disappears. Also, I've noticed that hot reconfiguration does
not work under OpenBSD.
The reliability can significantly decrease when the system is pushed to its
limits. This is why finely tuning the sysctls is important. There is no
general rule, every system and every application will be specific. However, it is
important to ensure that the system will never run out of memory and
that it will never swap. A correctly tuned system must be able to run for
years at full load without slowing down nor crashing.
Security is an important concern when deploying a software load balancer. It is
possible to harden the OS, to limit the number of open ports and accessible
services, but the load balancer itself stays exposed. For this reason, I have been
very careful about programming style. Vulnerabilities are very rarely encountered
on haproxy, and its architecture significantly limits their impact and often allows
easy workarounds. Its remotely unpredictable event processing makes it very hard to
reliably exploit any bug, and if the process ever crashes, the bug is discovered.
All of them were discovered by reverse-analysis of an accidental crash BTW.
Anyway, much care is taken when writing code to manipulate headers. Impossible
state combinations are checked and returned, and errors are processed from the
creation to the death of a session. A few people around the world have reviewed
the code and suggested cleanups for better clarity to ease auditing. By the way,
I routinely refuse patches that introduce suspect processing or in which not
enough care is taken for abnormal conditions.
I generally suggest starting HAProxy as root because it
can then jail itself in a chroot and drop all of its privileges
before starting the instances. This is not possible if it is not started as
root because only root can execute chroot(),
contrary to what some admins believe.
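The startup sequence described above can be sketched roughly as follows. This is only an illustration in Python (HAProxy itself is written in C). The function name `jail_and_drop` and the injectable `sys_os` parameter are hypothetical, added so the ordering can be dry-run with a stub instead of requiring root. The order matters: chroot() needs root, so it has to happen before the group and user IDs are dropped.

```python
import os


def jail_and_drop(jail_dir, uid, gid, sys_os=os):
    """Chroot into jail_dir, then drop root privileges.

    sys_os is a hypothetical injection point (defaults to the real os
    module) so the sequence can be exercised without being root.
    """
    if sys_os.geteuid() != 0:
        # chroot() is reserved to root, so fail early with a clear error.
        raise PermissionError("must be started as root to chroot")
    sys_os.chroot(jail_dir)  # confine the filesystem view to the jail
    sys_os.chdir("/")        # make sure the working directory is inside it
    sys_os.setgroups([])     # drop supplementary groups while still root
    sys_os.setgid(gid)       # drop the group first...
    sys_os.setuid(uid)       # ...then the user, which cannot be undone
```

Called as `jail_and_drop("/var/empty", 99, 99)` from a process started as root (the path and IDs are placeholders), it leaves the process unprivileged inside an empty directory.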
Logs provide a lot of information to help maintain a satisfying security
level. They are commonly sent over UDP because once chrooted, the
/dev/log UNIX socket is unreachable, and it must not be possible to
write to a file. The following information is particularly useful :
- the source IP and port of the requestor make it possible to find their origin
in firewall logs ;
- session set up date generally matches firewall logs, while tear
down date often matches proxies' dates ;
- proper request encoding ensures the requestor cannot hide
non-printable characters, nor fool a terminal ;
- arbitrary request and response header and cookie capture help to
detect scan attacks, proxies and infected hosts ;
- timers help to differentiate hand-typed requests from browsers'.
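The request-encoding point in the list above can be illustrated with a short Python sketch. It assumes a '#XX' hexadecimal escape for every byte outside the printable ASCII range, and for '#' itself so the output stays unambiguous. The function name `encode_for_log` is hypothetical and this is not HAProxy's actual code.

```python
def encode_for_log(raw: bytes, escape: str = "#") -> str:
    """Return a log-safe rendering of raw request bytes.

    Printable ASCII (32..126) is kept as-is, except the escape character;
    everything else becomes the escape character plus two hex digits.
    """
    out = []
    for byte in raw:
        ch = chr(byte)
        if 32 <= byte <= 126 and ch != escape:
            out.append(ch)
        else:
            out.append(f"{escape}{byte:02X}")
    return "".join(out)
```

With this kind of encoding, an escape sequence hidden in a request shows up as visible text in the log instead of being interpreted by the terminal used to read it.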
HAProxy also provides regex-based header control. Parts of the request, as
well as request and response headers, can be denied, allowed, removed, rewritten, or
added. This is commonly used to block dangerous requests or encodings (eg: the
Apache Chunk exploit), and to prevent accidental information leaks from the
server to the client.
Other features such as Cache-control checking ensure that no sensitive
information gets accidentally cached by an upstream proxy as a result of a bug in
the application server, for example.
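As a rough sketch of what regex-based header control amounts to, the Python below applies a list of rules to header lines. It does not use HAProxy's configuration syntax: the rule format, the action names and `filter_headers` are all hypothetical, and the example patterns are only illustrative.

```python
import re


def filter_headers(headers, rules):
    """Apply (action, pattern, replacement) rules to header lines.

    Returns the surviving header lines, or None if a "deny" rule matched.
    """
    kept = []
    for line in headers:
        drop = False
        for action, pattern, replacement in rules:
            if not pattern.search(line):
                continue
            if action == "deny":
                return None          # reject the whole message
            if action == "remove":
                drop = True          # silently delete this header
                break
            if action == "rewrite":
                line = pattern.sub(replacement, line)
        if not drop:
            kept.append(line)
    return kept


# Illustrative rules only: block one encoding, hide server details.
EXAMPLE_RULES = [
    ("deny", re.compile(r"^Transfer-Encoding:.*chunked", re.I), None),
    ("remove", re.compile(r"^X-Powered-By:", re.I), None),
    ("rewrite", re.compile(r"^Server:.*$", re.I), "Server: hidden"),
]
```

The last two rules show the information-leak case: version details sent by the server never reach the client.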
The source code is covered by GPL v2. Source code and pre-compiled binaries for
Linux/x86 and Solaris/Sparc can be downloaded right here :
- Development version (1.6) :
- Latest version (1.5) :
- Latest version (1.4) :
- Latest version (1.3) :
- Previous branch (1.2) :
- Various Patches :
- Browsable directory for other files (not only patches)
There are three types of documentation now : the Reference Manual which explains
how to configure HAProxy but which is outdated, the Architecture Guide which will
guide you through various typical setups, and the new Configuration Manual which
replaces the Reference Manual with a more explicit explanation of the configuration language. The
official documentation is the pure-text one provided with the sources. However, Cyril
Bonté's automated conversion to HTML is much easier to use and constantly up to date,
so it is the preferred one when available.
- Reference Manual for version 1.5 (stable) :
- Reference Manual for version 1.4 (stable) :
- Reference Manual for version 1.3 (stable) :
- Reference Manual for version 1.2 (old stable) :
- Reference Manual for version 1.1 (unmaintained) :
- architecture.txt : Architecture Guide
- Article on Load Balancing (HTML version) : worth reading for people who don't know what type of load balancer they need
In addition to Cyril's HTML converter above, an automated format converter is being developed by Pavel Lang. At the time of writing, it is able to produce a PDF from the documentation, and some heavy work is ongoing to support other output formats. Please consult the project's page for more information. Here's an example of what it is able to do on the version 1.5 configuration manual.
If you think you don't have the time and skills to set up and maintain a free load
balancer, or if you're seeking commercial support to satisfy your customers or
your boss, you have the following options :
- contact HAProxy Technologies
to hire some professional services or subscribe to a support contract ;
- install HAProxy Enterprise Edition (HAPEE),
which is a long-term maintained HAProxy package accompanied by a well-polished collection of software, scripts,
configuration files and documentation which significantly simplifies the setup and maintenance of a completely
operational solution ; it is particularly suited to Cloud environments where deployments must be fast ;
- try an ALOHA appliance
(hardware or virtual), which will even save you from having to worry about the system, hardware and from managing a Unix-like
system.
I also find it important to credit Loadbalancer.org. I am
not affiliated with them at all but like us, they have contributed a fair amount of time and money to the
project to add new features and they help users on the mailing list, so I have some respect for what they
do. They're a UK-based company and their load balancer also employs HAProxy, though it is somewhat different
from the ALOHA.
Some happy users have contributed code which may or may not be included. Others
spent a long time analysing the code, and there are some who keep ports up to
date. The most difficult internal changes have been contributed in the form of
paid time by some big customers who can afford to pay a developer for several
months working on an open source project. Unfortunately some of them do not want
to be listed, which is the case for the largest of them.
Some contributions were developed and not merged, most often for lack of
interest from the users or simply because they overlap with some pending changes
in a way that could make it harder to maintain future compatibility.
- Geolocation support
Quite some time ago now, Cyril Bonté contacted me about a very interesting
feature he had developed, initially for 1.4, and which now supports both 1.4
and 1.5. This feature is Geolocation, which many users have been asking for
for a long time, and this one does not require splitting the IP files by country
codes. In fact it's extremely easy and convenient to configure.
The feature has not been merged yet because it does for a specific purpose (GeoIP)
what we wanted to have for a more general use (map converters, session variables,
and use of variables in the redirect URLs), which will allow the same features to
be implemented with more flexibility (eg: extract the IP from a header, or pass
the country code and/or AS number to a backend server, etc...). Cyril was very
receptive to these arguments and agreed to maintain his patchset out of tree
while waiting for the features to be implemented (Update: 1.5-dev20 with
maps now makes this possible). Cyril's code is well maintained and used in
production so there is no risk in using it on 1.4, except the fact that the
configuration statements will change a bit once you upgrade to 1.5.
The code and documentation are available here : https://github.com/cbonte/haproxy-patches/wiki/Geolocation
- sFlow support
Neil Mckee posted a patch to the list in early 2013, and unfortunately this patch
did not receive any sign of interest or feedback, which is sad considering the
amount of work that was done. I personally am clueless about sFlow and expressed
my skepticism to Neil about the benefits of sampling some HTTP traffic when you
can get much more detailed information for free with existing logs.
Neil kindly responded with the following elements :
I agree that the logging you already have in haproxy is more flexible and detailed,
and I acknowledge that the benefit of exporting sFlow-HTTP records is not immediately
The value that sFlow brings is that the measurements are standard, and are designed to
integrate seamlessly with sFlow feeds from switches, routers, servers and applications to
provide a comprehensive end to end picture of the performance of large scale multi-tier
systems. So the purpose is not so much to troubleshoot haproxy in isolation, but to
analyze the performance of the whole system that haproxy is part of.
Perhaps the best illustration of this is the 1-in-N sampling feature.
If you configure sampling.http to be, say, 1-in-400 then you might
only see a handful of sFlow records per second from an haproxy
instance, but that is enough to tell you a great deal about what is
going on -- in real time. And the data will not bury you even if you
have a bank of load-balancers, hundreds of web-servers, a huge
memcache-cluster and a fast network interconnect all contributing
their own sFlow feeds to the same analyzer.
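The 1-in-N idea in the quote above can be sketched in a few lines of Python. This is a generic illustration of random sampling, not the sFlow agent's or the patch's actual code. The function name `sample_one_in_n` and its parameters are hypothetical.

```python
import random


def sample_one_in_n(events, n, rng=None):
    """Keep each event with probability 1/n, independently of the others."""
    if rng is None:
        rng = random.Random()
    # randrange(n) returns 0 once in n draws on average.
    return [event for event in events if rng.randrange(n) == 0]
```

With n set to 400, a stream of 40000 requests yields on the order of a hundred records, which is the kind of reduction the quote describes.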
Even after that explanation, no discussion emerged on the subject on the list, so
I guess there is little interest among users for now. I suspect that sFlow is
probably more widely deployed on network equipment than on application-layer equipment,
which could explain this situation. The code is large (not huge though) and I am not
convinced about the benefits of merging it and maintaining it if nobody shows even
a little bit of interest. Thus for now I prefer to leave it out of tree. Neil has
posted it on GitHub here :
Please, if you do use this patch, report your feedback to the mailing list, and invest
some time helping with the code review and testing.
This table enumerates all known significant contributions that
led to version 1.4, as well as proposed funding and features yet to be developed but
waiting for spare time. It is no longer up to date, though.
Some older code contributions which possibly do not appear in the table above are still listed here.
- Application Cookies
Aleksandar Lazic and Klaus Wagner implemented this feature which
was merged in 1.2. It allows the proxy to learn cookies sent by the server
to the client, and to find them again in the URL to direct the client to the right
server. The learned cookies are automatically purged after a period of inactivity.
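The learn, look up and purge cycle described above could look roughly like this in Python. It is a sketch under assumptions, not the merged implementation: the class name `AppCookieTable`, its methods, and the explicit `now` argument (used to make expiry easy to follow) are all hypothetical.

```python
import re
import time


class AppCookieTable:
    def __init__(self, cookie_name, timeout):
        self.timeout = timeout
        # Matches "NAME=value" in a Set-Cookie header or inside a URL.
        self._pattern = re.compile(re.escape(cookie_name) + r"=([^;&?\s]+)")
        self._learned = {}  # cookie value -> (server, last time seen)

    def learn(self, set_cookie_header, server, now=None):
        """Remember which server issued the cookie found in the header."""
        now = time.monotonic() if now is None else now
        match = self._pattern.search(set_cookie_header)
        if match:
            self._learned[match.group(1)] = (server, now)

    def lookup(self, url, now=None):
        """Return the server for the cookie found in the URL, if any."""
        now = time.monotonic() if now is None else now
        self.purge(now)
        match = self._pattern.search(url)
        if not match or match.group(1) not in self._learned:
            return None
        server, _ = self._learned[match.group(1)]
        self._learned[match.group(1)] = (server, now)  # refresh activity
        return server

    def purge(self, now):
        """Forget cookies that have been inactive longer than the timeout."""
        expired = [value for value, (_, seen) in self._learned.items()
                   if now - seen > self.timeout]
        for value in expired:
            del self._learned[value]
```

For example, after `learn("JSESSIONID=abc123; Path=/", "srv1")`, a request for `/app;JSESSIONID=abc123` is directed to `srv1` until the entry has been idle for longer than the timeout.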
- Least Connections load balancing algorithm
This patch for haproxy-1.2.14 was submitted by Oleksandr Krailo. It implements
a basic least connection algorithm. I have not merged this version into 1.3 because
of scalability concerns, but I'm leaving it here for people who are tempted to
include it in version 1.2, since the patch is really clean.
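For readers unfamiliar with the idea, a basic least connection choice can be sketched as below. This is a generic Python illustration, not the submitted patch. The class name `LeastConnBalancer` and its methods are hypothetical, and the linear scan over all servers is just the simplest way to write it.

```python
class LeastConnBalancer:
    def __init__(self, names):
        # Active connection count per server, in configuration order.
        self.active = {name: 0 for name in names}

    def acquire(self):
        """Pick the server with the fewest active connections."""
        # min() scans every server; ties go to the first one listed.
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        """Call when a connection to the named server ends."""
        self.active[name] -= 1
```

A new connection always goes to whichever server is currently the least busy, so a slow server that holds its connections longer naturally receives fewer new ones.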
- Soft Server-Stop
Aleksandar Lazic sent me this patch against 1.1.28 which in fact does two things.
The first interesting part allows one to write a file enumerating servers which
will have to be stopped, and then sending a signal to the running proxy to tell
it to re-read the file and stop using these servers. This will not be merged into
mainline because it has indirect implications on security since the running
process will have to access a file on the file-system, while the current version can
run in a chrooted, empty, read-only directory. What is really needed is a way to
send commands to the running process. However, I understand that some people
might need this feature, so it is provided here. The second part of the patch has
been merged. It allowed both an active and a backup server to share the same
cookie. This may sound obvious but it was not possible earlier.
Usage: Aleks says that you just have to write the server names that you
want to stop in the file, then kill -USR2 the running process. I have
not tested it though.
- Server Weight
Sébastien Brize sent me this patch against 1.1.27 which adds the
'weight' option to a server to provide smoother balancing between fast and slow
servers. It is available here because there may be other people looking for this
feature in version 1.1.
I did not include this change because it has a side effect that with
high or unequal weights, some servers might receive lots of consecutive
requests. A different concept to provide a smooth and fair
balancing has been implemented in 1.2.12, which also supports
weighted hash load balancing.
Usage: specify "weight X" on a server line.
Note: configurations written with this patch applied will normally still
work with future 1.2 versions.
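To make the "consecutive requests" side effect concrete: with weights 5, 1 and 1, a naive weighted rotation sends five requests in a row to the heaviest server. The Python sketch below shows one well-known way to interleave picks instead, often called smooth weighted round-robin. It is offered only as an illustration of the general idea and is not claimed to be the algorithm implemented in 1.2.12. The function name `smooth_weighted_rr` is hypothetical.

```python
def smooth_weighted_rr(weights, count):
    """Return `count` picks spread according to the given weights.

    weights maps a server name to a positive integer weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(count):
        # Every server earns credit in proportion to its weight...
        for name, weight in weights.items():
            current[name] += weight
        # ...the one with the most credit is picked and pays the total back.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


print(smooth_weighted_rr({"a": 5, "b": 1, "c": 1}, 7))
# prints ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```

Server "a" still receives five requests out of seven, but never more than two in a row.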
- IPv6 support for 1.1.27
I implemented IPv6 support on client side for 1.1.27, and merged it into
haproxy-1.2. Anyway, the patch is still provided here for people who want to
experiment with IPv6 on HAProxy-1.1.
- Other patches
Please browse the directory for other useful
If you don't need all of HAProxy's features and are looking for a simpler solution,
you may find what you need here :
Linux Virtual Servers (LVS)
Very fast layer 3/4 load balancing merged in Linux 2.4 and 2.6 kernels. Should
be coupled with Keepalived to monitor
servers. This generally is the solution embedded by default in most
IP-based load balancers.
Nginx ("engine X")
Nginx is an excellent piece of software. Initially it was a very fast and reliable
web server, but it has grown into a full-featured proxy which can also offer
load-balancing capabilities. Nginx's load balancing features are less advanced
than haproxy's but it can do extra things (eg: caching, running FCGI apps), which
explains why they are very commonly found together. I strongly recommend it to
whoever needs a fast, reliable and flexible web server !
Pound
Pound is very small and reasonably good. It aims at remaining small and auditable
before being fast. It supported SSL and keep-alive before HAProxy did. Its
configuration file is small and simple. It's thread-based, but can be a simpler
alternative to HAProxy for a small site when the flexibility and performance of
HAProxy are not required.
Pen
Pen is a very simple load balancer for TCP protocols. It supports source IP-based
persistence for up to 2048 clients, as well as IP-based ACLs. It uses select()
and supports higher loads than Pound but will not scale very well to thousands of
simultaneous connections. It's more versatile however, and could be considered as
the missing link between HAProxy and socat.
Feel free to contact me at for any questions or comments :
Some people regularly ask if it is possible to send donations, so I have set up a Paypal account for this.
Click here if you want to donate.
An IRC channel for haproxy has been opened on FreeNode (but don't look for me there, I'm not on it) :
Here are some links to possibly useful external contents I gathered on the net.
I have found most of them due to their link to haproxy's site ;-)