
gRPC Load Balancing at Scale

When building distributed systems with gRPC, load balancing works fine at small to medium scale. However, as your system grows, seemingly appropriate load balancing strategies can break in surprising ways due to the fundamental nature of how gRPC operates.

The Problem: Connection Multiplexing

Unlike traditional HTTP/1.1 where each request typically creates a new connection, gRPC uses HTTP/2 which multiplexes multiple streams over a single long-lived TCP connection. This creates a fundamental disconnect with traditional load balancing approaches.

[Diagram: the gRPC load balancing problem]

// Client establishing a single connection
conn, err := grpc.Dial("service-name:port",
    grpc.WithInsecure(),
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`))
if err != nil {
    log.Fatalf("failed to connect: %v", err)
}

// This single connection will handle multiple concurrent requests
client := pb.NewServiceClient(conn)

// These requests all go through the same connection
for i := 0; i < 10000; i++ {
    go func() {
        resp, err := client.SomeRPC(ctx, &request)
        if err != nil {
            // Handle error...
            return
        }
        _ = resp // Handle response...
    }()
}

Here's what happens at scale: the load balancer distributes connections, not requests. Each client's single long-lived connection lands on one backend, and every concurrent stream from that client follows it there, so a handful of instances absorb most of the traffic while the rest sit comparatively idle.
Why It's Hard to Detect

This problem remains hidden at low to medium scale because a single connection rarely approaches its concurrent-stream limit, and the imbalance between backends stays small enough to disappear in averaged metrics.

Often, this manifests as a "long tail" of latency. The p50 latency looks fine, but p95 and p99 latencies are significantly worse, indicating that some percentage of requests are being handled by overloaded instances.
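To see that tail directly, it helps to time every RPC on the client rather than rely on averaged dashboards. Below is a minimal sketch of a unary client interceptor that records per-call latency; the logging is purely illustrative (in practice you would feed the durations into per-method histograms), and imports are omitted to match the other snippets.

// latencyInterceptor times each unary RPC made through the connection.
func latencyInterceptor(
    ctx context.Context,
    method string,
    req, reply interface{},
    cc *grpc.ClientConn,
    invoker grpc.UnaryInvoker,
    opts ...grpc.CallOption,
) error {
    start := time.Now()
    err := invoker(ctx, method, req, reply, cc, opts...)
    log.Printf("rpc=%s duration=%s err=%v", method, time.Since(start), err)
    return err
}

// Installed at dial time:
// conn, err := grpc.Dial("service-name:port",
//     grpc.WithUnaryInterceptor(latencyInterceptor), ...)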

The Technical Culprits

Three specific mechanisms contribute to this problem:

1. The max_concurrent_streams parameter limits how many concurrent streams a single connection will handle.

2. HTTP/2 connection multiplexing creates uneven load, because streams are not evenly distributed across backends.

3. Load balancers operate on connections, not streams. An oversimplified config illustrates the mismatch:

loadBalancer:
  type: tcp  # operates at connection level
  algorithm: round_robin
  # This has no awareness of HTTP/2 streams

When a connection reaches its max_concurrent_streams limit (default 100 in many implementations), new requests are queued at the client. But the client has no way to know which server instances are less loaded, so it continues to queue requests to potentially overloaded servers.
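The cap itself is configurable on the server. Raising it only defers the client-side queuing rather than spreading the load, but it is worth knowing where the knob lives. A minimal grpc-go sketch (the value 100 mirrors the commonly cited default, not a recommendation):

// Explicitly set the per-connection stream cap on a grpc-go server.
// A higher cap lets more streams pile onto one connection; it does not balance them.
server := grpc.NewServer(
    grpc.MaxConcurrentStreams(100),
)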

Effective Solutions

After dealing with this issue across multiple projects, we've found several approaches that work effectively:

1. Client-side load balancing

// Client-side round-robin: resolve every backend address via DNS
// and hold a connection to each one, spreading RPCs across them
conn, err := grpc.Dial("dns:///service-name:port",
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithInsecure(),
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                10 * time.Second,
        Timeout:             2 * time.Second,
        PermitWithoutStream: true,
    }))
if err != nil {
    log.Fatalf("failed to connect: %v", err)
}

The client-side approach gives the client a connection to every backend and lets it spread RPCs across all of them, rather than pinning every stream to whichever single connection the target happened to resolve to.
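When DNS cannot return one record per backend (behind a ClusterIP Service, for instance), the client can be handed the backend list directly. Here is a minimal sketch using grpc-go's manual resolver; the scheme name and addresses are placeholders, and the resolver and resolver/manual packages are assumed to be imported.

// Register a manual resolver so round_robin has more than one address to work with.
r := manual.NewBuilderWithScheme("example")
r.InitialState(resolver.State{
    Addresses: []resolver.Address{
        {Addr: "10.0.0.1:9000"},
        {Addr: "10.0.0.2:9000"},
        {Addr: "10.0.0.3:9000"},
    },
})

conn, err := grpc.Dial("example:///grpc-service",
    grpc.WithResolvers(r),
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithInsecure())

Keeping that address list current then becomes your problem, which is why a DNS- or service-discovery-backed resolver is usually preferable when one is available.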

2. Using Envoy Proxy with proper HTTP/2 configuration

clusters:
  - name: grpc_service
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options:
            max_concurrent_streams: 100
    load_assignment:
      cluster_name: grpc_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: grpc-service
                port_value: 9000
                

Envoy understands HTTP/2 streams and can make load balancing decisions based on actual request load rather than just connection count.
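With the proxy in place, the application side gets simpler rather than smarter: the client dials its local Envoy listener and leaves the per-stream balancing to it. A sketch assuming a sidecar listening on localhost:15001 (the port is arbitrary here):

// Dial the local Envoy sidecar; Envoy fans the individual HTTP/2 streams
// out across the upstream grpc_service backends.
conn, err := grpc.Dial("localhost:15001", grpc.WithInsecure())
if err != nil {
    log.Fatalf("failed to connect to the envoy sidecar: %v", err)
}
client := pb.NewServiceClient(conn)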

3. Shorter connection lifetimes with proper cycling

// Server-side configuration to force connection cycling
server := grpc.NewServer(
    grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
        MinTime:             30 * time.Second, // reject clients that ping more often than this
        PermitWithoutStream: true,
    }),
    grpc.KeepaliveParams(keepalive.ServerParameters{
        MaxConnectionIdle:     2 * time.Minute,  // close connections with no active RPCs
        MaxConnectionAge:      10 * time.Minute, // force clients to reconnect (and re-balance) periodically
        MaxConnectionAgeGrace: 30 * time.Second, // let in-flight RPCs finish before closing
        Time:                  30 * time.Second,
        Timeout:               5 * time.Second,
    }),
)
                

When MaxConnectionAge is reached, the server sends a GOAWAY, the client transparently re-resolves and reconnects, and MaxConnectionAgeGrace gives in-flight RPCs time to complete. Cycling connections this way corrects long-term imbalances with minimal impact on application traffic.

When to Implement These Solutions

These approaches are worth considering when p95/p99 latencies diverge sharply from p50 while average utilization looks healthy, when per-backend load is visibly uneven, or when clients start queuing requests after hitting the concurrent-stream limit.

The conclusion: as scale increases, moving load balancing decisions closer to clients often works better than adding more intermediate proxies.

