Load Balancing Strategies

[!NOTE] This module explores the core principles of Load Balancing Strategies, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Traffic Cop of the Internet

Imagine you just launched an exclusive flash sale for highly anticipated concert tickets. Within seconds, a million users hit your domain. If all that traffic routes to a single server, its CPU will peg at 100%, memory will exhaust, and the server will inevitably crash, leaving users with error pages.

A load balancer is the solution. It sits between the clients and your backend servers, acting as a reverse proxy. It accepts incoming traffic and distributes it systematically across a pool of healthy backend servers, ensuring no single server is overwhelmed.

Core Responsibilities

  1. Distribution: Evenly (or intelligently) spread load across available servers.
  2. High Availability: Detect unhealthy servers and temporarily remove them from the routing pool.
  3. Scalability: Allow you to seamlessly add or remove servers without client disruption.

2. Layer 4 vs. Layer 7 Load Balancing

The OSI (Open Systems Interconnection) model describes network communication as seven stacked layers. Load balancers typically operate at either Layer 4 (Transport) or Layer 7 (Application). Understanding the distinction is critical for system design.

Layer 4: Transport Load Balancing (e.g., AWS NLB)

At Layer 4, the load balancer routes traffic based on network and transport layer data: IP addresses and TCP/UDP ports. It does not inspect the contents of the packet (the HTTP payload is invisible to it).

  • How it works: It uses Network Address Translation (NAT). When a packet arrives, the load balancer rewrites the destination IP to one of the backend servers and forwards it.
  • Pros: Extremely fast and consumes very little CPU. It can handle millions of packets per second because it forwards bytes without parsing the payload.
  • Cons: “Dumb” routing. It cannot make decisions based on the requested URL, cookies, or HTTP headers.
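
The NAT behaviour described above can be sketched as a small flow table. This is a minimal illustration, not how any particular product is implemented: the `Layer4Balancer` class, the flow key, and the backend addresses are all assumptions made for the example. It shows that the decision uses only addresses and ports, and that every packet of the same flow is rewritten to the same backend.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A flow is keyed by (client IP, client port, protocol).
# This key is a simplification chosen for illustration.
Flow = Tuple[str, int, str]


@dataclass
class Layer4Balancer:
    backends: List[str]                              # backend IPs (illustrative)
    flows: Dict[Flow, str] = field(default_factory=dict)
    _next: int = 0

    def forward(self, src_ip: str, src_port: int, proto: str,
                dst_port: int) -> Tuple[str, int]:
        """Return the rewritten (destination IP, destination port)."""
        key = (src_ip, src_port, proto)
        if key not in self.flows:
            # First packet of a flow: pick a backend and remember the choice
            # so later packets of the same connection land on the same server.
            self.flows[key] = self.backends[self._next % len(self.backends)]
            self._next += 1
        # Only the destination address changes; the payload is never parsed.
        return self.flows[key], dst_port


lb = Layer4Balancer(["10.0.0.1", "10.0.0.2"])
first = lb.forward("198.51.100.7", 50000, "tcp", 80)
again = lb.forward("198.51.100.7", 50000, "tcp", 80)
other = lb.forward("198.51.100.8", 50001, "tcp", 80)
```

Nothing in `forward` could route on a URL or cookie, because those fields are never read. That is the trade-off named in the Cons bullet.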

Layer 7: Application Load Balancing (e.g., AWS ALB, NGINX)

At Layer 7, the load balancer decrypts the traffic, inspects the HTTP/HTTPS/WebSocket payload, and makes intelligent routing decisions based on the actual content of the request.

  • How it works: It terminates the client’s TCP connection, reads the HTTP headers, looks at the routing rules (e.g., “if URL starts with /api, route to Server Pool A; if /images, route to Pool B”), establishes a new TCP connection to the chosen backend server, and forwards the request.
  • Pros: Highly intelligent routing. Supports microservices architectures, rate limiting per user, and cookie-based session stickiness.
  • Cons: Slower and more CPU-intensive due to TLS decryption, connection termination, and deep packet inspection.

| Feature | Layer 4 (Transport) | Layer 7 (Application) |
| --- | --- | --- |
| Visibility | IP and port only. | HTTP headers, cookies, URL path. |
| Speed | Very fast (no payload inspection). | Slower (requires TLS decryption and inspection). |
| Routing Logic | Simple: “Send Port 80 to Server Pool.” | Smart: “Send /images to Image Service.” |
| Connection Setup | One client connection; the LB rewrites addresses (NAT) and forwards packets to the backend. | Two connections: client to LB, then LB to backend. |
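
The Layer 7 routing rule quoted earlier (“if URL starts with /api, route to Server Pool A; if /images, route to Pool B”) can be sketched as follows. The pool names, the `RULES` list, and both helper functions are hypothetical. The parser handles only a simple request head and is not a complete HTTP implementation.

```python
from typing import Dict, List, Tuple

# Ordered (path prefix, pool name) rules; the first match wins.
# Pool names are illustrative placeholders.
RULES: List[Tuple[str, str]] = [
    ("/api", "pool_a"),
    ("/images", "pool_b"),
]
DEFAULT_POOL = "pool_default"


def parse_request(raw: bytes) -> Tuple[str, str, Dict[str, str]]:
    """Split a raw HTTP/1.1 request head into method, path, and headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


def choose_pool(path: str) -> str:
    """Pick a backend pool from the request path (naive prefix match)."""
    for prefix, pool in RULES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


raw = b"GET /api/users HTTP/1.1\r\nHost: example.com\r\n\r\n"
method, path, headers = parse_request(raw)
pool = choose_pool(path)
```

Unlike a Layer 4 balancer, this code must hold the decrypted request bytes before it can decide anything, which is where the extra CPU cost comes from.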

3. Load Balancing Algorithms

How does the load balancer decide which server gets the next request? There are several algorithms, ranging from naive to highly sophisticated.

Static Algorithms

These do not take the current state of the backend servers into account.

  1. Round Robin: Requests are distributed sequentially (Server 1, then 2, then 3, then 1…). Simple, but assumes all requests have equal processing cost and all servers have equal capacity.
  2. Weighted Round Robin: Administrators assign a “weight” to each server based on its hardware specs. A server with a weight of 2 receives twice as many requests as a server with a weight of 1. Useful for heterogeneous clusters.
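  3. IP Hash: The LB hashes the client’s IP address (or a specific HTTP header) to map the request to a specific server. As long as the pool is unchanged, a given client always hits the same server, enabling Session Stickiness. Consistent Hashing is a refinement of this idea: a plain hash-modulo scheme remaps most clients when a server is added or removed, whereas consistent hashing remaps only a small fraction of them.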
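
The three static algorithms can be sketched in a few lines each. This is a minimal illustration with hypothetical server names. The weighted variant simply repeats each server according to its weight, whereas production balancers often interleave the picks more smoothly. The hash variant uses plain modulo, not a consistent-hashing ring.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a fixed repeating order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield each server as many times per cycle as its weight."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to one server; stable while `servers` is unchanged."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]

rr = round_robin(servers)
rr_picks = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"s1": 2, "s2": 1, "s3": 1})
wrr_picks = [next(wrr) for _ in range(8)]

sticky_a = ip_hash("203.0.113.7", servers)
sticky_b = ip_hash("203.0.113.7", servers)
```

None of these functions looks at server load, which is what makes them static: a slow or overloaded server keeps receiving its full share.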

Dynamic Algorithms

These monitor the real-time health and load of the backend servers.

  1. Least Connections: Sends the next request to the server with the fewest active connections. Excellent for environments where request processing times vary wildly (e.g., heavy database queries vs. simple cache lookups).
  2. Least Response Time: Combines Least Connections with the lowest average response time. It aggressively routes traffic to the fastest, most idle server.
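
A rough sketch of both dynamic algorithms follows. The `Backend` class, the moving-average smoothing factor, and the way Least Response Time combines the two signals (connection count first, latency as the tie-breaker) are assumptions made for illustration. Real balancers combine these signals in different ways.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    active: int = 0        # requests currently in flight
    avg_ms: float = 0.0    # moving average of response time

    def start(self) -> None:
        """Call when a request is dispatched to this backend."""
        self.active += 1

    def finish(self, elapsed_ms: float, alpha: float = 0.2) -> None:
        """Call when the response completes; updates the moving average."""
        self.active -= 1
        if self.avg_ms == 0.0:
            self.avg_ms = elapsed_ms
        else:
            self.avg_ms = (1 - alpha) * self.avg_ms + alpha * elapsed_ms


def least_connections(pool: List[Backend]) -> Backend:
    """Pick the backend with the fewest in-flight requests."""
    return min(pool, key=lambda b: b.active)


def least_response_time(pool: List[Backend]) -> Backend:
    """Fewest in-flight requests first, lowest average latency as tie-break."""
    return min(pool, key=lambda b: (b.active, b.avg_ms))


pool = [Backend("a", active=3), Backend("b", active=1), Backend("c", active=2)]
lc_choice = least_connections(pool).name

timed = [Backend("a", 1, 50.0), Backend("b", 1, 20.0), Backend("c", 2, 5.0)]
lrt_choice = least_response_time(timed).name
```

The balancer has to call `start` and `finish` around every request so the counters stay accurate. That bookkeeping is the cost of a dynamic algorithm.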

4. Interactive: Balancing Algorithms in Action

Experience how different load balancing strategies route incoming traffic dynamically.

[Interactive Strategy Simulator: a client sends requests through a load balancer to three servers, Server 1 (weight 2), Server 2 (weight 1), and Server 3 (weight 1).]

5. Health Checks: Active vs. Passive

A load balancer is useless if it blindly routes traffic to a dead server. It uses health checks to maintain an accurate pool of viable backend instances.

  • Active Health Checks: The load balancer proactively probes the backend servers at fixed intervals (e.g., every 10 seconds).
    • TCP Check: “Can I open a TCP connection on port 80?” Fast, but doesn’t guarantee the application layer is actually responding.
    • HTTP/HTTPS Check: “Does a GET request to /health return an HTTP 200 OK?” This is more robust, as the backend server can run a local script to verify its database connectivity before returning 200.
  • Passive Health Checks: The load balancer observes real client traffic. If a server starts returning HTTP 500 Internal Server Error to actual clients, the LB detects the anomalous error rate and dynamically evicts the server without needing a dedicated /health ping.
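
Both kinds of health check can be sketched with one small state tracker per backend. The thresholds (three failed probes to evict, two successful probes to readmit, a 50% error rate over the last 20 responses) and all names are illustrative assumptions, not defaults of any real load balancer. `http_probe` is a hypothetical helper showing what an active HTTP check might look like.

```python
import http.client
import urllib.request
from collections import deque
from typing import Deque


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Active HTTP check: True only if the URL answers 200 in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and non-2xx HTTP errors.
        return False


class BackendHealth:
    def __init__(self, fail_threshold: int = 3, rise_threshold: int = 2,
                 window: int = 20, max_error_rate: float = 0.5) -> None:
        self.fail_threshold = fail_threshold
        self.rise_threshold = rise_threshold
        self.max_error_rate = max_error_rate
        self.healthy = True
        self._fails = 0
        self._passes = 0
        self._recent: Deque[bool] = deque(maxlen=window)  # True means 5xx

    def record_probe(self, ok: bool) -> None:
        """Feed in the result of one active check."""
        if ok:
            self._passes += 1
            self._fails = 0
            if not self.healthy and self._passes >= self.rise_threshold:
                self.healthy = True
                self._recent.clear()   # forget errors from before recovery
        else:
            self._fails += 1
            self._passes = 0
            if self._fails >= self.fail_threshold:
                self.healthy = False

    def record_response(self, status: int) -> None:
        """Passive check: observe the status of a real client response."""
        self._recent.append(status >= 500)
        window_full = len(self._recent) == self._recent.maxlen
        if window_full and sum(self._recent) / len(self._recent) > self.max_error_rate:
            self.healthy = False
            self._passes = 0


h = BackendHealth()
for _ in range(3):
    h.record_probe(False)       # three failed probes in a row
down_after_probes = not h.healthy
```

Requiring several consecutive results before changing state keeps a single dropped probe from evicting an otherwise healthy server.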

6. SSL/TLS Termination and Offloading

Establishing an encrypted HTTPS connection (the TLS handshake) is CPU-intensive, largely because of asymmetric (public-key) cryptography.

If every backend server has to handle its own TLS decryption, you are wasting valuable application CPU cycles. Instead, load balancers perform SSL Termination (Offloading).

  1. Client to LB: The client connects to the Load Balancer over a secure HTTPS connection. The LB holds the SSL certificate and decrypts the traffic.
  2. LB to Backend: The Load Balancer forwards the now-unencrypted payload to the backend servers over internal, private network HTTP.

Why is this safe? Because the traffic between the load balancer and the backend servers travels entirely within a secured Virtual Private Cloud (VPC), isolated from the public internet. Furthermore, managing one SSL certificate on the load balancer is vastly easier than rotating certificates across 500 backend nodes.

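(Note: In high-security environments such as banking, SSL Passthrough or SSL Bridging is used instead. With passthrough, the LB forwards the encrypted stream untouched and the application server decrypts it. With bridging, the LB decrypts and inspects the traffic, then re-encrypts it before sending it to the backend. Either way, nothing crosses the internal network in plaintext, which prevents internal snooping.)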

7. Case Study: High Concurrency Flash Sale (PEDALS)

Let’s apply the PEDALS framework to solve a real-world load balancing problem.

  • Problem: We are designing the infrastructure for a flash sale of 100,000 limited-edition sneakers. We expect 2 million concurrent users at exactly 12:00 PM.
  • Estimate: 2M concurrent connections. High read-to-write ratio initially, shifting to heavy writes (inventory decrement) as users check out.
  • Data Model: We need session state to track carts, but storing this in-memory on backend servers is dangerous.
  • Architecture:
    • We deploy an AWS Application Load Balancer (ALB) at Layer 7.
    • Because users will be rapidly adding items to their cart and refreshing, the first instinct is to configure Session Stickiness (on an ALB this is cookie-based rather than IP Hash). However, this creates a risk: if one server goes down, all users pinned to it lose their cart.
    • Architectural Pivot: Instead of stickiness, we use Round Robin or Least Connections, and store all session state externally in a distributed Redis cache. The LB remains stateless.
  • Localized Details: To prevent the thundering herd from overwhelming the database, the Layer 7 ALB is configured with a rate-limiting rule (WAF) to block IP addresses making more than 50 requests per second.
  • Scale: To handle the initial burst, we pre-warm the load balancer (requesting our cloud provider to allocate massive LB capacity ahead of time, as dynamic scaling takes minutes).
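
Two pieces of this design can be sketched in code: the pivot to external session state and the per-IP rate limit. This is a simplified illustration. `SessionStore` is an in-memory stand-in for the distributed Redis cache, and `AppServer` and `RateLimiter` are hypothetical names. The limiter uses a sliding one-second window, which is only one of several ways to enforce "50 requests per second".

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class SessionStore:
    """In-memory stand-in for an external store such as Redis."""

    def __init__(self) -> None:
        self._carts: Dict[str, List[str]] = {}

    def add_to_cart(self, session_id: str, item: str) -> None:
        self._carts.setdefault(session_id, []).append(item)

    def cart(self, session_id: str) -> List[str]:
        return list(self._carts.get(session_id, []))


class AppServer:
    """Stateless backend: all cart data lives in the shared store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add(self, session_id: str, item: str) -> None:
        self.store.add_to_cart(session_id, item)

    def view(self, session_id: str) -> List[str]:
        return self.store.cart(session_id)


class RateLimiter:
    """Allow at most `limit` requests per client IP per sliding window."""

    def __init__(self, limit: int = 50, window_s: float = 1.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()              # drop requests outside the window
        if len(hits) >= self.limit:
            return False                # over the limit: block this request
        hits.append(now)
        return True


store = SessionStore()
server_1 = AppServer("server-1", store)
server_2 = AppServer("server-2", store)

server_1.add("user-42", "sneaker")      # first request lands on server 1
cart_seen_by_2 = server_2.view("user-42")  # next request lands on server 2
```

Because the cart survives a switch between servers, the load balancer is free to use Round Robin or Least Connections, and losing one backend no longer loses anyone's cart.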

8. Review & Mastery

Layer 4 Load Balancing

Routes traffic based entirely on IP address and TCP/UDP ports. It is exceedingly fast but cannot read HTTP headers or URLs.

SSL Offloading

The load balancer handles the CPU-intensive TLS decryption and forwards plain, unencrypted HTTP traffic to backend servers within the private network.

Least Connections Algorithm

Dynamically routes the next request to the server with the fewest active connections. Ideal when requests have varying processing times.