Load Balancers: The Traffic Controllers

In October 2018, GitHub suffered its longest outage ever: roughly 24 hours of degraded service. During a brief network partition, an automated database failover left the two sides with divergent writes, a split-brain that no retry could heal. The painful lesson: infrastructure that keeps routing traffic without deep, application-aware health checks turns a local fault into a global incident. At the other end of the spectrum, Cloudflare processes over 50 million HTTP requests per second across 300+ data centers; without sophisticated load balancers, any one of those nodes failing would become a user-visible outage. The difference between a 5-second latency spike and a total outage is almost always the intelligence of your load balancing layer.

[!IMPORTANT] In this lesson, you will master:

  1. The Scale Imperative: Why DNS Round Robin is a poor substitute for a dedicated Load Balancer.
  2. Hardware vs. Software: The physical reality of ASICs vs. Kernel-level context switching.
  3. Global Resilience: Using Anycast and BGP to route every user to the nearest healthy data center.

1. The Problem: Success is Dangerous

Imagine you open a pizza shop. It becomes famous. Suddenly, 10,000 people want pizza at the same time. If you have only one chef (One Server), the kitchen catches fire. This is where Vertical Scaling hits its limit. You can only upgrade a single machine so much before it becomes prohibitively expensive or physically impossible.

Why not just use DNS?

You might think: “I’ll just add 10 servers and give their IPs to the DNS server. The DNS server will rotate them.” This is called DNS Round Robin, and it has a fatal flaw: DNS Caching.

  • The Scenario: Server 1 crashes.
  • The Problem: The user’s browser (or ISP) has cached the IP of Server 1 for 60 minutes (TTL).
  • The Result: For the next hour, that user sees a “Connection Refused” error, even though 9 other servers are healthy.
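The caching flaw above can be simulated in a few lines. This is a minimal sketch with made-up IPs and hypothetical names (`dns_resolve`, `CachingClient`): the DNS side rotates correctly, but the client keeps its cached answer for the full TTL, so it never learns that its server died.

```python
import itertools

# Hypothetical illustration: a round-robin DNS server and a client that
# caches the first answer for the whole TTL. All names and IPs are made up.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = itertools.cycle(SERVERS)

def dns_resolve():
    """Round-robin DNS: each *fresh* query gets the next IP."""
    return next(rotation)

class CachingClient:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.cached_ip = None

    def connect(self, healthy_servers):
        # Resolve only if nothing is cached (TTL not yet expired).
        if self.cached_ip is None:
            self.cached_ip = dns_resolve()
        # The client keeps using the cached IP even if that server died.
        return self.cached_ip in healthy_servers

client = CachingClient()
client.connect(healthy_servers=SERVERS)          # first request: OK
# Server 10.0.0.1 crashes; DNS would rotate, but the cache wins:
print(client.connect(healthy_servers=["10.0.0.2", "10.0.0.3"]))  # False
```

The DNS server did nothing wrong; the failure lives entirely in the client-side cache, which is exactly why the fix has to sit on the server side of the cache: a load balancer behind one stable IP.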

You need a smarter solution. You need a Manager at the counter who knows exactly which chef is cooking and which one is on a smoke break.

[!TIP] In System Design, the Load Balancer (LB) is that Manager. It is the single entry point (VIP) that distributes incoming network traffic across multiple backend servers.


2. Why Do We Need Them?

It’s not just about splitting work. The Load Balancer provides three critical superpowers:

1. Scalability (The “Elastic Waistband”)

You can add 100 more servers behind the scenes, and the user never knows. They just talk to the LB’s IP address. This decoupling allows you to scale up or down based on traffic demand (Auto-Scaling) without changing the client-side configuration.

2. Availability (The “Pulse Check”)

If Server 3 crashes, the LB detects it via Health Checks.

  • Action: The LB removes Server 3 from the rotation.
  • Result: The user never sees a 500 Internal Server Error. They are seamlessly routed to a healthy server.
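The remove-from-rotation logic is simple enough to sketch. This is an illustrative toy (the `Pool` class and server names are hypothetical, not any real LB's API): a probe callback marks nodes up or down, and the picker round-robins over healthy nodes only.

```python
# Minimal sketch (all names hypothetical): a backend pool that drops nodes
# failing their health check, so the LB only routes to live servers.
class Pool:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)

    def run_health_checks(self, probe):
        """probe(server) -> bool. Mark each node up or down."""
        for s in self.servers:
            if probe(s):
                self.healthy.add(s)
            else:
                self.healthy.discard(s)   # removed from rotation

    def pick(self):
        """Round robin over healthy nodes only."""
        live = [s for s in self.servers if s in self.healthy]
        if not live:
            raise RuntimeError("no healthy backends")
        server = live[0]
        self.servers.remove(server)       # rotate the chosen node to the back
        self.servers.append(server)
        return server

pool = Pool(["srv-1", "srv-2", "srv-3"])
pool.run_health_checks(probe=lambda s: s != "srv-3")   # srv-3 crashed
picks = [pool.pick() for _ in range(4)]
print(picks)   # ['srv-1', 'srv-2', 'srv-1', 'srv-2'] (srv-3 skipped)
```

Traffic simply flows around the dead node; from the user's side nothing happened.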

3. Security (The “Bouncer”)

The LB acts as a shield.

  • DDoS Protection: It can absorb SYN floods.
  • Hidden IPs: The world only sees the LB’s IP. Your backend servers stay in a private subnet, unreachable from the public internet.
  • TLS Termination: It decrypts incoming requests so your web servers don’t have to spend CPU cycles on expensive cryptographic math.

3. Types of Load Balancers

1. DNS Load Balancing (GeoDNS)

While standard DNS Round Robin is flawed, sophisticated GeoDNS (like Amazon Route 53) is powerful as a global traffic director.

  • Mechanism: It resolves the domain name to an IP address based on the user’s Geographic Location.
  • Use Case: User in London → UK LB IP. User in Tokyo → Japan LB IP.
  • Limit: It is still subject to caching issues, so it’s usually the first layer of defense, not the only one.

2. Hardware Load Balancers (F5 Big-IP, Citrix)

  • Mechanism: Proprietary physical appliances in your data center.
  • Pros: Extreme performance (ASICs), massive throughput.
  • Cons: Expensive ($), hard to automate (API limitations), rigid capacity (“I need to buy another box”).

3. Software Load Balancers (Nginx, HAProxy, Envoy)

  • Mechanism: Ordinary processes running on commodity servers, VMs, or containers.
  • Pros: Cheap, flexible, fully automatable (config files and APIs), and easy to scale horizontally by adding instances.
  • Cons: Consumes CPU/Memory on the host; throughput is bounded by the kernel’s network stack.

[!NOTE] Hardware-First Intuition: Hardware Load Balancers (like F5) use ASICs to process packets in hardware without ever touching a general-purpose CPU. Software LBs (like Nginx), however, rely on the Linux Kernel. Every packet triggers an Interrupt on the CPU, forcing it to stop what it’s doing and process the network stack. This is why high-traffic software LBs require tuning Receive Side Scaling (RSS) to spread interrupt load across multiple CPU cores.

Interview Insight: The Elephant Flow Problem. An “Elephant Flow” is a long-lived, high-bandwidth connection (like a large file backup). Many LBs and routers assign connections with a simple hash of the connection tuple (the same idea ECMP routers use). If the hash happens to land three Elephant Flows on the same backend while other nodes sit idle, you have a hot spot that neither hashing nor Round Robin will fix. Solution: use load-aware algorithms such as “Least Connections” or “Power of Two Choices” instead of pure hashing.
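“Power of Two Choices” is short enough to demonstrate directly. The sketch below uses illustrative numbers and hypothetical server names: for each new connection, sample two random backends and pick the less loaded one. Unlike a static hash, this cannot pin several heavy flows onto one node.

```python
import random

# Sketch of "Power of Two Choices" (numbers and names are illustrative):
# sample two random backends, send the connection to the less loaded one.
def pick_backend(loads, rng):
    """loads: dict of backend -> current connection count."""
    a, b = rng.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b

rng = random.Random(42)            # seeded for reproducibility
loads = {"srv-1": 0, "srv-2": 0, "srv-3": 0}
for _ in range(9000):
    loads[pick_backend(loads, rng)] += 1

# Each backend ends up very close to 9000 / 3 = 3000 connections.
print(loads)
```

The classic result is that "best of two random" keeps the maximum load exponentially closer to the average than purely random (or unlucky hashed) placement, at the cost of the LB having to track per-backend load.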


4. Deep Dive: Health Checks

How does the LB actually know a server is dead?

1. Shallow Checks (L3/L4)

The LB pings the IP or tries to open a TCP socket.

  • Check: “Is the machine ON?”
  • Flaw: The machine might be ON, but the application process has crashed or is in a deadlock. The LB thinks it’s healthy, but users get errors.

2. Deep Checks (L7)

The LB sends an HTTP request to a specific endpoint: GET /health.

  • Check: “Is the application functional?”
  • Implementation: The /health endpoint checks DB connectivity, Cache status, and Disk space.
  • Logic:
    • If DB is down → Return 503 Service Unavailable.
    • LB sees 503 → Marks node as Unhealthy.
    • Result: The LB stops sending traffic until the app recovers.
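A deep health endpoint boils down to a few dependency probes folded into one status code. This is a hypothetical sketch (`check_db`, `check_cache`, and the disk threshold are stand-ins for real probes), not any framework's actual API:

```python
# Hypothetical deep /health handler: each probe is a stand-in for a real
# dependency check. 200 keeps the node in rotation; 503 pulls it out.
def health_handler(check_db, check_cache, free_disk_mb, min_free_disk_mb=512):
    checks = {
        "database": check_db(),
        "cache": check_cache(),
        "disk": free_disk_mb >= min_free_disk_mb,
    }
    status = 200 if all(checks.values()) else 503
    return status, checks

ok = lambda: True
down = lambda: False

status, _ = health_handler(check_db=ok, check_cache=ok, free_disk_mb=4096)
print(status)                        # 200: node stays in rotation

status, detail = health_handler(check_db=down, check_cache=ok, free_disk_mb=4096)
print(status, detail["database"])    # 503 False: LB marks node Unhealthy
```

Returning the per-check detail alongside the status code is a common touch: the LB only reads the 503, but an on-call engineer hitting /health by hand sees exactly which dependency failed.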

[!WARNING] Flapping: A dangerous state where a server toggles rapidly between Healthy and Unhealthy. This usually happens when a server is overloaded; it passes a simple Health Check (idle) but fails actual traffic (load). Hysteresis (requiring X successes to be marked healthy again) solves this.
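Hysteresis is just a small state machine. The thresholds below are illustrative (HAProxy calls the same idea `rise` and `fall`): a node must fail several checks in a row to be marked Unhealthy, and pass several in a row to be marked Healthy again, so a single blip cannot flip it.

```python
# Sketch of hysteresis (thresholds illustrative): `fall` consecutive
# failures mark a node Unhealthy; `rise` consecutive passes restore it.
class HealthState:
    def __init__(self, fall=3, rise=5):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self.streak = 0                 # consecutive contrary results

    def observe(self, check_passed):
        contrary = check_passed != self.healthy
        self.streak = self.streak + 1 if contrary else 0
        if self.healthy and self.streak >= self.fall:
            self.healthy, self.streak = False, 0
        elif not self.healthy and self.streak >= self.rise:
            self.healthy, self.streak = True, 0
        return self.healthy

node = HealthState(fall=3, rise=5)
node.observe(False)        # one blip: still Healthy
node.observe(True)         # streak resets
for _ in range(3):
    node.observe(False)    # three failures in a row: now Unhealthy
print(node.healthy)        # False
```

Making `rise` larger than `fall` is deliberate: you want to eject a sick node quickly, but be conservative about trusting it again.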

3. Connection Draining (Graceful Shutdown)

What if you want to take a server offline for maintenance? You can’t just kill it, or you’ll drop active users.

  • Mechanism: You tell the LB to “Drain” Server A.
  • Action: The LB stops sending new requests to Server A but allows existing connections to finish (until a timeout, e.g., 30s).
  • Result: Zero downtime deployments.
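Draining reduces to two flags on the server's bookkeeping. A minimal sketch, with hypothetical names (`accept`, `safe_to_shutdown`) and no real networking:

```python
# Sketch of connection draining: a draining server refuses new requests
# but lets in-flight ones finish before it is taken offline.
class Server:
    def __init__(self, name):
        self.name = name
        self.draining = False
        self.in_flight = 0

    def accept(self):
        if self.draining:
            return False               # LB must route this request elsewhere
        self.in_flight += 1
        return True

    def finish_one(self):
        self.in_flight = max(0, self.in_flight - 1)

    def safe_to_shutdown(self):
        return self.draining and self.in_flight == 0

srv = Server("srv-a")
srv.accept(); srv.accept()             # two active requests
srv.draining = True                    # operator starts the drain
srv.accept()                           # refused: returns False
srv.finish_one(); srv.finish_one()     # existing users complete normally
print(srv.safe_to_shutdown())          # True
```

In production there is also the timeout mentioned above: if a stubborn connection outlives the drain window (say 30s), it is cut anyway so deploys cannot stall forever.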

5. System Walkthrough: A Request’s Journey

Let’s trace exactly what happens when you type www.example.com into your browser.

  1. DNS Resolution:
    • Browser asks DNS: “Where is example.com?”
    • DNS (GeoDNS) replies: “Go to IP 1.2.3.4 (The Load Balancer).”
  2. Connection Establishment:
    • Browser sends TCP SYN to 1.2.3.4.
    • LB accepts connection (Completes 3-way handshake).
  3. Load Balancing Decision:
    • LB looks at its pool of backend servers (10.0.0.1, 10.0.0.2).
    • LB applies Algorithm (e.g., Round Robin). “It’s 10.0.0.1’s turn.”
  4. Forwarding:
    • LB opens a connection to 10.0.0.1 (or reuses a pooled connection).
    • LB forwards the HTTP request.
  5. Response:
    • Server 10.0.0.1 processes request and sends HTML back to LB.
    • LB forwards HTML back to Browser.

6. High Availability of the LB itself

“But wait… if the LB is the Manager, what if the Manager has a heart attack?” The Load Balancer itself is a Single Point of Failure (SPOF).

To solve this, we use Redundancy:

Active-Passive (High Availability)

  • Setup: Two LBs. One is Active, the other is Passive (standby).
  • Mechanism: They talk using VRRP (Virtual Router Redundancy Protocol) and share a Floating IP (VIP).
  • Failover: If Active stops sending heartbeats, Passive takes over the VIP.
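The failover logic can be sketched with a logical clock and no real networking. This toy (timeout value and class name are made up, and real VRRP adds priorities and preemption) shows only the core rule: the standby claims the floating VIP when heartbeats stop arriving.

```python
# Toy VRRP-style failover (logical clock, no networking): the standby LB
# takes over the floating VIP after `timeout` seconds of heartbeat silence.
class PassiveLB:
    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.last_heartbeat = 0.0
        self.owns_vip = False

    def on_heartbeat(self, now):
        self.last_heartbeat = now      # the Active LB is still alive

    def tick(self, now):
        if now - self.last_heartbeat > self.timeout:
            self.owns_vip = True       # claim the floating IP
        return self.owns_vip

standby = PassiveLB(timeout=3.0)
standby.on_heartbeat(now=10.0)
standby.tick(now=12.0)     # heartbeat is fresh: stays passive
standby.tick(now=14.5)     # 4.5s of silence: failover
print(standby.owns_vip)    # True
```

Clients never notice because they only ever knew the VIP, not which physical box held it.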

High Level Design: Active-Passive HA

Diagram: the public user connects to a Floating VIP held by the Active LB (Master); the Active and Passive (Standby) LBs exchange a VRRP heartbeat, and the active one forwards traffic to the backend pool.

7. Deep Dive: Global Server Load Balancing (GSLB)

What if your users are in Tokyo, London, and New York? A single LB in Virginia is not enough. You need GSLB.

1. GeoDNS (The Phonebook Strategy)

  • Mechanism: The DNS server looks at the User’s IP.
  • Logic: “User is from Japan IP range? Return the IP of the Tokyo LB.”
  • Pros: Simple, supported by Route53, Cloudflare.
  • Cons: DNS Caching. If Tokyo goes down, users in Japan might still try to connect to the dead IP for 5 minutes (TTL).
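At its core, GeoDNS is a lookup from the client's source network to a regional VIP. The sketch below uses documentation-only IP ranges and invented region mappings; real GeoDNS services use commercial geo-IP databases, not hand-written tables:

```python
import ipaddress

# Sketch of GeoDNS resolution: map the client's network to the nearest
# regional LB VIP. Ranges and VIPs are made up (TEST-NET doc addresses).
REGIONS = {
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.10",  # "Tokyo" LB
    ipaddress.ip_network("192.0.2.0/24"):   "198.51.100.20",  # "London" LB
}
DEFAULT_VIP = "198.51.100.30"  # fallback region, e.g. "Virginia"

def geo_resolve(client_ip):
    addr = ipaddress.ip_address(client_ip)
    for net, vip in REGIONS.items():
        if addr in net:
            return vip
    return DEFAULT_VIP

print(geo_resolve("203.0.113.55"))  # 198.51.100.10 (routed to Tokyo)
```

Note what this cannot fix: once the answer is handed out, it sits in resolver caches for the TTL, which is exactly the failover lag described above.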

2. Anycast (The Magic IP)

  • The Unicast Problem: Standard networking (Unicast) maps one IP to one physical location, so distant users pay the full speed-of-light round-trip latency to that one location.
  • Mechanism: BGP (Border Gateway Protocol) routing.
  • Concept: Anycast breaks the one-to-one rule. You announce the SAME IP Address (e.g., 1.1.1.1) from multiple physical locations (Tokyo, London, NY).
  • Routing: The internet’s routers naturally send the user’s packet to the physically closest data center (fewest hops).
  • Pros: Instant Failover (BGP updates faster than DNS), zero DNS caching issues, and immunity to trans-oceanic latency.
  • Cons: Complex to set up (requires owning an ASN or using a provider like Cloudflare).

Staff Engineer Tip: Anycast + ECMP. How do multiple routers “agree” on where to send traffic for the same IP? They use ECMP (Equal-Cost Multi-Path). Within a data center, a router can see four paths to the same Anycast VIP. It hashes the “5-tuple” (Source IP, Source Port, Dest IP, Dest Port, Protocol) to pick a path. This ensures that all packets from a single TCP connection follow the same physical path, preventing out-of-order delivery that would destroy performance.
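The 5-tuple trick is easy to see in code. A minimal sketch (the hash function choice and path count are illustrative; real routers use hardware hash units, not SHA-256): because every packet of one TCP flow carries the same 5-tuple, the hash is deterministic and the flow never changes paths.

```python
import hashlib

# Sketch of ECMP path selection: hash the connection 5-tuple to pick one
# of N equal-cost paths. Same flow -> same hash -> same path, always.
def ecmp_path(src_ip, src_port, dst_ip, dst_port, proto, n_paths):
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths

# Two packets of the same TCP connection land on the same path:
p1 = ecmp_path("10.0.0.5", 51000, "1.1.1.1", 443, "tcp", n_paths=4)
p2 = ecmp_path("10.0.0.5", 51000, "1.1.1.1", 443, "tcp", n_paths=4)
assert p1 == p2
# A new connection (different source port) may hash to a different path.
```

This per-flow stickiness is also why adding or removing a path reshuffles some flows: the modulus changes, which is the same resizing problem consistent hashing exists to soften.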

Staff Engineer Tip: Direct Server Return (DSR). In standard load balancing, the LB is a bottleneck because it must process both the small incoming request and the massive outgoing response (e.g., a video file). With DSR, the LB only handles the request. The backend server is configured to respond directly to the client using the LB’s IP as the source. This allows a 10Gbps LB to handle 100Gbps of traffic because it never sees the “egress” data.

High Level Design: Global Load Balancing (GSLB)

1. GeoDNS (Stateful Caching)

Diagram: the user queries the DNS server, which returns the IP of either the L.A. or the London data center based on the user’s location.

2. Anycast (BGP / Magic IP)

Diagram: the user connects to 1.1.1.1, which is announced from both the Paris and London nodes; BGP carries the packet to the nearer one.

8. Observability: The RED Method

How do you know if your Load Balancer is healthy? We use the RED Method for monitoring microservices and LBs.

  • R - Rate: The number of requests per second (RPS).
    • Metric: http_requests_total
    • Alert: Sudden drop (outage) or spike (DDoS).
  • E - Errors: The number of requests failing.
    • Metric: http_requests_5xx
    • Alert: Error rate > 1%.
  • D - Duration: How long requests take.
    • Metric: http_request_duration_seconds
    • Alert: P99 latency > 500ms.
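The three RED numbers fall out of one pass over a window of request samples. A toy sketch with made-up data (real systems compute these from streaming metrics, e.g. Prometheus histograms, not in-memory lists):

```python
# Sketch of RED computation over a window of (status_code, duration) samples.
def red_metrics(samples, window_seconds):
    rate = len(samples) / window_seconds                      # R: requests/sec
    errors = sum(1 for status, _ in samples if 500 <= status <= 599)
    error_rate = errors / len(samples)                        # E: 5xx fraction
    durations = sorted(d for _, d in samples)
    p99 = durations[min(len(durations) - 1,                   # D: P99 latency
                        int(0.99 * len(durations)))]
    return rate, error_rate, p99

# 97 fast successes, 1 slow success, 2 server errors over a 10s window:
samples = [(200, 0.05)] * 97 + [(200, 0.9)] + [(503, 0.03)] * 2
rate, err, p99 = red_metrics(samples, window_seconds=10)
print(rate, err, p99)   # 10.0 0.02 0.9
```

Note how the single slow request dominates P99 while barely moving the average: that is exactly why the Duration alert is phrased in percentiles, not means.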

9. Interactive Demo: The Traffic Controller

[!TIP] Try it yourself: Test the resilience of a Load Balanced system.

  1. Start Traffic: Watch the LB distribute requests (Round Robin).
  2. Kill a Server: Click the “Power” button on a server to crash it. Watch the LB stop sending requests to it.
  3. Burst Mode: Simulate a sudden traffic spike.
  4. Drain: Gracefully remove a server without dropping connections.

10. Summary

  • Horizontal Scaling > Vertical Scaling.
  • Deep Health Checks ensure the application is logically working, not just “on”.
  • Connection Draining ensures smooth deployments.
  • GSLB uses Anycast and GeoDNS to balance traffic globally.

Mnemonic — “Health Checks Need Depth”: A load balancer that only pings a server is like a doctor who checks heartbeat but ignores blood pressure. Use deep checks (HTTP 200 from /health with DB connectivity verified) or you’ll route to a node whose CPU is 100% but whose TCP port is open.

Staff Engineer Tip: GSLB is Not Just for Disaster Recovery. Too many teams only use Global Server Load Balancing (GSLB) as DR failover. Use it with latency-based routing proactively: sending every user to the geographically nearest healthy cluster can cut global p99 latency dramatically (figures in the 40-60% range are commonly cited), before any disaster ever happens. You get performance and resilience from the same machinery.