In a system with 1,000 microservices, static IP addresses are impractical. Instances are created and destroyed constantly as services scale, deploy, and fail. Service Discovery is the mechanism that allows Service A to find the ever-changing address of Service B.


1. Client-Side vs. Server-Side Discovery

Server-Side Discovery (The Classic way)

The client talks to a fixed “Load Balancer” (like an AWS ALB). The LB queries a Service Registry and forwards the request.

  • Pros: Simple for the client.
  • Cons: The LB sits on the critical path of every request, so it must itself be made highly available, and it adds an extra “hop” (higher latency).

Client-Side Discovery (The Staff way)

The client queries the Service Registry directly, gets a list of healthy IPs, and performs the load balancing itself.

  • Pros: No extra hop (lower latency). Better visibility into backend health.
  • Cons: Client code becomes complex. You need to implement LB logic in every language your company uses (Go, Java, Python).
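To make the client-side flow concrete, here is a minimal sketch. The `ServiceRegistry` and `RoundRobinClient` classes, the service name, and the addresses are all hypothetical stand-ins; a real system would query something like Consul, etcd, or DNS instead of an in-memory dict.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> {address: is_healthy}."""
    instances: dict[str, dict[str, bool]] = field(default_factory=dict)

    def register(self, service: str, address: str, healthy: bool = True) -> None:
        self.instances.setdefault(service, {})[address] = healthy

    def healthy_addresses(self, service: str) -> list[str]:
        # Sorted so that every client sees the same ordering.
        return sorted(a for a, ok in self.instances.get(service, {}).items() if ok)


class RoundRobinClient:
    """Client that resolves addresses itself and rotates through them."""

    def __init__(self, registry: ServiceRegistry, service: str) -> None:
        self.registry = registry
        self.service = service
        self._counter = itertools.count()

    def pick(self) -> str:
        # Query the registry directly; there is no load balancer in the path.
        addresses = self.registry.healthy_addresses(self.service)
        if not addresses:
            raise LookupError(f"no healthy instances of {self.service}")
        return addresses[next(self._counter) % len(addresses)]


registry = ServiceRegistry()
registry.register("service-b", "10.0.0.1:8080")
registry.register("service-b", "10.0.0.2:8080")
registry.register("service-b", "10.0.0.3:8080", healthy=False)

client = RoundRobinClient(registry, "service-b")
picks = [client.pick() for _ in range(4)]
print(picks)  # alternates between the two healthy instances
```

Even this toy version shows the cost: the registry lookup, health filtering, and rotation logic all live in the client, and would need to be reimplemented per language.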

2. The gRPC Pattern: Lookaside Balancing

Staff engineers often use a hybrid called Lookaside Load Balancing.

  1. The gRPC client asks a “Load Balancing Server” (the Control Plane) for a list of targets.
  2. The Control Plane tells the client exactly where to go.
  3. The client sends the data directly to the target.

This keeps the client code thin while keeping the heavy lifting in a centralized (but out-of-band) service.
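The three steps above can be sketched roughly as follows. This is a simulation under stated assumptions, not gRPC's actual API: `LookasideBalancer`, `ThinClient`, and the load numbers are hypothetical, and the "RPC" is just a returned tuple.

```python
class LookasideBalancer:
    """Hypothetical control plane: knows backend load, hands out targets."""

    def __init__(self, load_by_backend: dict[str, int]) -> None:
        self.load_by_backend = dict(load_by_backend)
        self.lookups = 0  # counts target lookups only; payloads never pass through here

    def get_targets(self, limit: int = 2) -> list[str]:
        self.lookups += 1
        # The "heavy lifting": rank backends by load (ties broken by address).
        ranked = sorted(self.load_by_backend, key=lambda a: (self.load_by_backend[a], a))
        return ranked[:limit]


class ThinClient:
    """Client with no balancing policy of its own; it follows the control plane."""

    def __init__(self, balancer: LookasideBalancer) -> None:
        self.balancer = balancer
        self.targets: list[str] = []

    def send(self, payload: str) -> tuple[str, str]:
        if not self.targets:
            # Steps 1 and 2: ask out-of-band where to go, and cache the answer.
            self.targets = self.balancer.get_targets()
        # Step 3: the data goes directly to the chosen target.
        return (self.targets[0], payload)


balancer = LookasideBalancer({
    "10.0.0.1:50051": 7,
    "10.0.0.2:50051": 2,
    "10.0.0.3:50051": 4,
})
client = ThinClient(balancer)
results = [client.send(f"req-{i}") for i in range(3)]
print(results, balancer.lookups)
```

Note that the balancer is consulted once while three requests are sent: it is on the control path, not the data path.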


3. The Sidecar Pattern & Envoy

To avoid the “every language needs LB code” problem, we use Sidecars.

Your application (Service A) doesn’t talk to remote services directly. It talks to a local proxy (like Envoy) running on localhost. Envoy handles the retries, circuit breaking, and service discovery on the application’s behalf.
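To get a feel for the kind of logic the sidecar takes off your hands, here is a toy circuit breaker. It is an illustrative sketch only, not how Envoy implements circuit breaking; the class name, thresholds, and injectable `clock` parameter are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the upstream."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so the example is deterministic
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call ("half-open").
            # A single failure from here re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def flaky():
    raise ConnectionError("upstream unavailable")


now = [0.0]  # fake clock value, advanced by hand below
breaker = CircuitBreaker(max_failures=2, reset_after=5.0, clock=lambda: now[0])

outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

now[0] = 6.0  # pretend the cool-down has passed
outcomes.append(breaker.call(lambda: "ok"))
print(outcomes)
```

With a sidecar, none of this lives in Service A; the application makes a plain call to localhost and the proxy applies the policy.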

The xDS API: The Real Secret

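In a mesh, Envoy doesn’t rely on a static config file for its routing. A small bootstrap config points it at a management server, and the rest of its configuration arrives dynamically over the xDS APIs (the “x” Discovery Services):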

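  • LDS: Listener Discovery Service (which ports to listen on and how to process incoming traffic).
  • RDS: Route Discovery Service (which requests go to which cluster).
  • CDS: Cluster Discovery Service (which upstream services exist).
  • EDS: Endpoint Discovery Service (which IPs currently back each cluster).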

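Staff Insight: A “Service Mesh” like Istio is essentially a Control Plane that talks xDS to thousands of Envoy proxies. (Linkerd follows the same control-plane-plus-sidecar model, but uses its own proxy and API rather than Envoy and xDS.) When you add a new server, the Control Plane pushes an EDS update to the subscribed Envoys, and the fleet learns about the new IP quickly, typically within seconds rather than after a redeploy or a DNS TTL expiry.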
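The push model can be sketched as a simple publish/subscribe loop. This is a deliberately simplified simulation: real xDS runs over gRPC streams with acknowledgements, and the `ControlPlane` and `SidecarProxy` classes and their method names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SidecarProxy:
    """Stand-in for an Envoy instance holding its current endpoint table."""
    name: str
    version: int = 0
    endpoints: dict[str, list[str]] = field(default_factory=dict)

    def on_eds_update(self, version: int, snapshot: dict[str, list[str]]) -> None:
        if version <= self.version:
            return  # ignore stale or duplicate pushes
        self.version = version
        self.endpoints = {cluster: list(addrs) for cluster, addrs in snapshot.items()}


class ControlPlane:
    """Stand-in for a mesh control plane that owns the endpoint state."""

    def __init__(self) -> None:
        self.version = 0
        self.clusters: dict[str, list[str]] = {}
        self.proxies: list[SidecarProxy] = []

    def subscribe(self, proxy: SidecarProxy) -> None:
        self.proxies.append(proxy)
        proxy.on_eds_update(self.version, self.clusters)  # catch up on current state

    def add_endpoint(self, cluster: str, address: str) -> None:
        self.clusters.setdefault(cluster, []).append(address)
        self.version += 1
        # Push the new snapshot to every subscribed proxy.
        for proxy in self.proxies:
            proxy.on_eds_update(self.version, self.clusters)


control_plane = ControlPlane()
fleet = [SidecarProxy(f"envoy-{i}") for i in range(3)]
for proxy in fleet:
    control_plane.subscribe(proxy)

control_plane.add_endpoint("service-b", "10.0.0.1:8080")
control_plane.add_endpoint("service-b", "10.0.0.2:8080")

for proxy in fleet:
    print(proxy.name, proxy.version, proxy.endpoints["service-b"])
```

The proxies never poll and are never restarted; their view of `service-b` changes because the control plane told them so.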


4. Visualizing the Mesh

graph TD
    subgraph Control_Plane ["Mesh Control Plane (Istio/Custom)"]
        xDS[xDS API Server]
    end
    subgraph Node_1 ["Kubernetes Pod A"]
        App1[App Code] -- Localhost --> Sidecar1[Envoy Sidecar]
    end
    subgraph Node_2 ["Kubernetes Pod B"]
        Sidecar2[Envoy Sidecar] --> App2[App Code]
    end
    xDS -- "Push Config (EDS)" --> Sidecar1
    xDS -- "Push Config (EDS)" --> Sidecar2
    Sidecar1 -- "Load Balanced mTLS" --> Sidecar2

Staff Takeaway

A Staff engineer views networking as Software Defined (SDN).

  • Service Discovery isn’t just a list of IPs; it’s the foundation of routing.
  • xDS has become a de facto standard API for “Programmatic Infrastructure.”
  • Sidecars decouple your application logic from the “grungy” details of distributed networking (retries, timeouts, mTLS).