Without knowing where to look, it is hard to find information on performing large-scale load balancing without resorting to heavy, expensive commercial load balancers, or to small-scale home-brew load balancers based upon HAProxy and VRRP.
HAProxy is a layer 4/7 load-balancing service which looks into the packets to determine where to send sessions. For an application I'm working on, I don't need that type of heavy-duty evaluation. And as this service resides mostly in userland, there is additional overhead from userland/kernel transitions.
Load balancing based upon source IP and/or destination IP and/or port numbers is all I need. Linux Virtual Server (LVS) seems to fit the bill. It is kernel-resident, with userland tools for management. The web site doesn't appear to have had recent updates, so the first impression is that it is unmaintained code. But after digging into the mailing lists, it seems to be an actively used service. The wiki and mailing lists talk about many different active/active and active/passive scenarios, but none really discuss how to use a number of load balancers actively and simultaneously.
At some point, I did see a passing reference to using BGP as part of a load balancing solution. After looking further, I came across Day 11 - Turning off the Pacemaker: Load Balancing Across Layer 3, starting with the section on "Solving the HAProxy SPoF Problem". Now that is getting closer to a solution which makes sense to me. The article refers to using Bird or Quagga as BGP engines which can manipulate the FIB, and to using BGP AnyCast as part of the solution. The issue with this basic setup is that with a bunch of load balancers each running BGP, you run into the scaling issues of BGP's full-mesh requirement.
The common solution in the network world for full-mesh issues is to use route reflectors. Quagga or Bird could be used, but I came across ExaBGP. This is a Python-based application which knows how to talk BGP. It doesn't know how to manipulate FIBs; it was designed to fit the role of prefix injection. So it looks like the ideal candidate for acting as a route reflector, managing route injection to assign traffic to the LVS load balancers in a deterministic, resilient manner. There is a link which talks about High availability with ExaBGP. Exactly what I needed. Some additional background on the solution type: Stop Buying Load Balancers and Start Controlling Your Traffic Flow with Software.
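To make the route-injection idea concrete, here is a minimal sketch of the kind of health-check process ExaBGP can run. ExaBGP treats lines written to a configured process's stdout as API commands, so announcing or withdrawing an anycast prefix is just a matter of printing. The VIP, next-hop, and checked port below are hypothetical placeholders, not values from any of the linked articles.

```python
#!/usr/bin/env python3
# Minimal ExaBGP health-check process: announce the anycast VIP while the
# local service answers, withdraw it when the check fails. ExaBGP treats
# lines on this process's stdout as API commands. Addresses are placeholders.
import socket
import sys
import time

VIP = "192.0.2.53/32"       # hypothetical anycast service address
NEXT_HOP = "10.1.1.1"       # this load balancer's routable address
CHECK = ("127.0.0.1", 80)   # local service to health-check

def service_up(addr, timeout=2):
    """Return True if a TCP connect to the service succeeds."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

announced = False
while True:
    up = service_up(CHECK)
    if up and not announced:
        sys.stdout.write(f"announce route {VIP} next-hop {NEXT_HOP}\n")
        announced = True
    elif not up and announced:
        sys.stdout.write(f"withdraw route {VIP} next-hop {NEXT_HOP}\n")
        announced = False
    sys.stdout.flush()      # ExaBGP reads our stdout line by line
    time.sleep(5)
```

ExaBGP would be pointed at a script like this via a `process` section in its configuration; when the service dies, the withdraw propagates and traffic shifts to the next-best announcement.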
2018/01/08: as an update, rather than using the OSPF/BGP/route-reflector scenario for datacenter dynamic routing, it is possible to use a hierarchy of eBGP peers to handle interior routing: Use of BGP for routing in large-scale data centers. OSPF is not needed. Nor are route reflectors. Something like ExaBGP can still be used to perform loopback or dummy-interface route injection as services come and go. This handles the AnyCast routing scenario with aplomb.
Referring back to LVS for a second: LVS can load balance to services with LVS-DR (direct routing), LVS-TUN (a one-way tunnel to off-subnet services), or LVS-NAT. For me, Virtual Server via Direct Routing appears to be the way to go.
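As a rough illustration of what the LVS-DR setup amounts to, here is a sketch that drives ipvsadm from Python. The addresses are invented, and real servers in DR mode also need the VIP configured on a non-ARPing interface (typically a loopback alias), which this sketch does not cover.

```python
#!/usr/bin/env python3
# Sketch: configure an LVS-DR virtual service by driving ipvsadm.
# Addresses are invented. Real servers in DR mode must also carry the VIP
# on a non-ARPing interface (e.g. a loopback alias), which is not done here.
import subprocess

VIP = "192.0.2.53:80"                       # the virtual service (VIP:port)
REAL_SERVERS = ["10.1.2.10:80", "10.1.2.11:80"]

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create the virtual service; '-s sh' selects the source-hashing scheduler,
# matching the source-IP based balancing described earlier.
run(["ipvsadm", "-A", "-t", VIP, "-s", "sh"])

# Attach each real server with '-g' (gatewaying, i.e. direct routing):
# LVS rewrites only the destination MAC, and replies bypass the balancer.
for rs in REAL_SERVERS:
    run(["ipvsadm", "-a", "-t", VIP, "-r", rs, "-g"])
```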
Related links:
- Job Scheduling Algorithms in Linux Virtual Server
- Linux Virtual Server Tutorial
- feedbackd: provide dynamic feedback of server health
- Super Sparrow: global load balancing solution for Linux, but I don't think it is actively supported
- keepalived: implements a set of checkers to dynamically and adaptively maintain and manage a load-balanced server pool according to the servers' health
- Dedicated SSL-Cache Farm
- Building a Load Balancer with LVS - Linux Virtual Server
- mon: monitor hosts/services/whatever and alert about problems
- pacemaker -- Creating Active/Passive and Active/Active Clusters on Fedora
- Cluster Management Shell: mostly pacemaker related
- pacemaker controls, manages and recovers cluster resources. A resource can be anything from a virtual IP address, to a cluster-managed filesystem, to a complex application.
- Sébastien Han - Highly Available LVS: detailed examples with pictures
Load balancing works through the interaction of multiple sub-systems. The first level is round-robin DNS, which provides some flexibility in determining where services will be served from. A DNS name may resolve to one or more IP addresses. The next stage makes use of BGP. Each of those IP addresses can be assigned to one or more physical hosts, meaning each physical host advertises the IP address to the edge, each with a unique metric. The host with the best metric receives the traffic for that IP address. With multiple IP addresses in play, and multiple hosts each advertising a subset of those addresses, loads can be balanced across many hosts. If a host goes out of commission, each of its advertised IP addresses is withdrawn, manually or automatically, from the routing tables, and loads re-adjust to the still-active hosts, with the hosts holding the next-best metric picking up that load.
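Here is a sketch of that per-host metric scheme, again using ExaBGP's text API. The hostnames, VIPs, and MED values are invented for illustration: every host announces every shared VIP, but with a different MED, so the edge prefers one host per VIP and fails over to the next-best metric when an announcement is withdrawn.

```python
#!/usr/bin/env python3
# Sketch of the per-host metric scheme: every host announces every shared
# VIP, but with a different MED, so the edge prefers one host per VIP and
# fails over to the next-best metric on withdrawal. Hostnames, VIPs and
# MED values are invented for illustration; output uses ExaBGP's text API.
import sys

HOSTNAME = "lb1"        # set per host
NEXT_HOP = "10.1.1.1"   # this host's routable address

# Lower MED wins: lb1 is primary for .53 and backup for .54; lb2 the reverse.
MEDS = {
    "lb1": {"192.0.2.53/32": 100, "192.0.2.54/32": 200},
    "lb2": {"192.0.2.53/32": 200, "192.0.2.54/32": 100},
}

for vip, med in MEDS[HOSTNAME].items():
    sys.stdout.write(f"announce route {vip} next-hop {NEXT_HOP} med {med}\n")
sys.stdout.flush()
```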
ECMP might also be used for delivering and balancing traffic, though this gets into more esoteric traffic-management conditions (a sketch follows below). In any case, LVS resides on each host and is the final stage of load balancing, spreading the traffic across a number of services, each of which is a virtualized guest (or physical hosts on huge scale-outs).
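For the ECMP variant mentioned above, the edge router installs one multipath route per VIP instead of picking a single best metric. A minimal sketch using iproute2's multipath syntax, with placeholder addresses; on a real edge router the multipath route would more likely come from BGP multipath than from a static entry:

```python
#!/usr/bin/env python3
# Sketch: install an ECMP (multipath) route for a VIP across two next hops
# using iproute2. Addresses are placeholders; on a real edge router the
# multipath route would more likely come from BGP multipath than a static entry.
import subprocess

VIP = "192.0.2.53/32"
NEXTHOPS = ["10.1.1.1", "10.1.1.2"]   # the load-balancer hosts

cmd = ["ip", "route", "replace", VIP]
for nh in NEXTHOPS:
    cmd += ["nexthop", "via", nh, "weight", "1"]
subprocess.run(cmd, check=True)
```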
2016/12/3 update: even though the IPVS page is horribly out-of-date, IPVS appears to be alive and well, as there was a release of ipvsadm v1.29 today, with the userland tools available at IPVSADM git via git.kernel.org.
It has been far too long since the last ipvsadm release. Even though only two changes to the ipvsadm tool have happened since the last release, a release had to be made, as these features relate to kernel-side features.
Support for reading 64-bit stats has been available since kernel v4.1. The new attributes for the sync daemon were introduced in kernel v4.3, but got fixed in kernel v4.7.
2018/01/07: The article Day 11 - Turning off the Pacemaker: Load Balancing Across Layer 3 offers up a Python program for link testing, and uses BFD (Bidirectional Forwarding Detection) for advertising and withdrawing routes. The example script uses a separate BFD daemon. But as I use OVS, I think that I will give BFD in Open vSwitch a try via the event/monitoring interface. [this isn't really part of the load balancing tooling above, but is an interesting side project related to it]
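As a starting point for that experiment, a sketch that enables BFD on an OVS interface and watches its state. The interface name is hypothetical, and this polls the bfd_status column rather than using the OVSDB monitor interface mentioned above, which is where a real announce/withdraw hook would live:

```python
#!/usr/bin/env python3
# Sketch: enable BFD on an OVS interface and watch its state. The interface
# name is hypothetical; this polls the bfd_status column rather than using
# the OVSDB monitor interface, which is where a real hook would live.
import subprocess
import time

IFACE = "p0"   # hypothetical OVS interface facing the BFD peer

subprocess.run(["ovs-vsctl", "set", "interface", IFACE, "bfd:enable=true"],
               check=True)

while True:
    status = subprocess.run(
        ["ovs-vsctl", "get", "interface", IFACE, "bfd_status"],
        capture_output=True, text=True, check=True).stdout
    # bfd_status is a map like {forwarding=true, state=up, ...}; an
    # announce/withdraw hook would key off the forwarding flag here.
    print("BFD forwarding:", "forwarding=true" in status)
    time.sleep(1)
```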
2018/01/08: Someone suggested dnsdist for load balancing. It has an impressive rule set for customising how query responses are generated.
2018/02/01: In an earlier update, I mentioned BFD for updating routes. When only monitoring link state, something like ifplugd could be used. ifplugd is an Ethernet link-state monitoring daemon which can execute user-specified scripts to configure an Ethernet device when a cable is plugged in, or automatically unconfigure it when a cable is removed.
2018/09/01 - Anycast TCP
2018/10/09 - Pound is a reverse proxy, load balancer and HTTPS front-end for Web server(s). Pound was developed to enable distributing the load among several Web servers, and to allow for a convenient SSL wrapper for those Web servers that do not offer it natively.