Random and round-robin
It is a Wednesday afternoon at PaySetu and Karan is debugging a graph that nobody else thinks is interesting. The payment-status RPC sits behind a 12-pod fleet, all healthy, all the same instance type, all on the same node-pool. The aggregate p99 is 180 ms — fine. But the per-pod p99 panel shows pod-3 is at 320 ms while pod-9 is at 110 ms, and Karan does not know why. The deploy was clean, the readiness probes are happy, the registry's cache is fresh. The thing he is staring at is the picker. The frontend is using endpoints[i % len(endpoints)] — round-robin — and the picker is the entire reason pod-3 sees more long-tail-cost requests than pod-9 does. Random would not have fixed it. P2C would have. The question this chapter answers is why exactly round-robin and random — the two simplest pickers — produce the patterns they do, and where their floor lives.
Random and round-robin are the two zero-feedback pickers — they decide which pod to dial without consulting any pod state at all. Random is memoryless and asymptotically uniform; round-robin is deterministic and exactly uniform per cycle. Both are correct (in expected request count) only when pods and requests are homogeneous; both leave tail-latency improvements of 4–10× on the table the moment heterogeneity enters. The reason to study them in detail is not that they are good — it is that every smarter picker in Part 6 is built as a feedback-augmentation of one of them.
Two algorithms that look identical and aren't
Round-robin and random both produce a sequence of pod indices without consulting pod state. That is the only thing they have in common. The differences matter, and they show up at the tail.
Round-robin maintains a counter i and returns endpoints[i % N], then increments. The sequence is deterministic given the start index: 0, 1, 2, ..., N-1, 0, 1, .... Over any window of kN consecutive requests, every pod receives exactly k of them. The variance of the per-pod request count is zero.
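The counter-plus-modulo picker is only a few lines; a minimal sketch (pod names are illustrative) that also checks the exactly-k-per-cycle property:

```python
import itertools

def make_round_robin(endpoints):
    """Return a round-robin picker: a closure over a monotonically increasing counter."""
    counter = itertools.count()
    def pick():
        return endpoints[next(counter) % len(endpoints)]
    return pick

# Over any window of k*N consecutive requests, every endpoint is chosen exactly k times.
pick = make_round_robin(["pod-0", "pod-1", "pod-2"])
counts = {}
for _ in range(3 * 5):                 # k=5 cycles over N=3 pods
    p = pick()
    counts[p] = counts.get(p, 0) + 1
print(counts)                          # every pod gets exactly 5
```

The closure keeps the counter private to one picker instance, which matters later in the chapter: the zero-variance property only holds per stable picker run.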
Random picks endpoints[random.randrange(N)]. The sequence is i.i.d. uniform. Over kN requests, each pod receives a binomial-distributed count with mean k and variance k(N-1)/N ≈ k. So a pod can see k - sqrt(k) or k + sqrt(k) requests purely by chance.
This sounds abstract, but the consequence at the tail is concrete: random's per-pod load fluctuates over short windows. A pod that randomly receives 3× the average count in a 200 ms window will queue, and that queue shows up in the tail. Round-robin has no such fluctuation — its variance is zero. So when pods are homogeneous and requests are homogeneous, round-robin is strictly better than random. The claim "random is good enough" is wrong for symmetric workloads. It is also wrong, in a different way, for asymmetric workloads — but the failure modes look different.

Why random's per-pod count varies even though the expected count is k: random picking is a sequence of independent Bernoulli trials per pod (probability 1/N of being chosen on any request). Over kN requests, each pod's count is binomially distributed with mean k and variance k(1 − 1/N). The standard deviation grows as sqrt(k) — slowly — but in any short window where k is small, a pod can easily receive 2× or 0.5× the average count purely by chance, and that short-window imbalance is what the tail latency sees.
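A quick simulation confirms the binomial numbers; the constants here (12 pods, k=50, 2000 trials) are illustrative:

```python
import random, statistics

random.seed(1)
N, k, trials = 12, 50, 2000            # 12 pods, k*N = 600 requests per trial

# Count how often pod 0 is chosen in each independent run of random picking.
counts = []
for _ in range(trials):
    c = sum(1 for _ in range(k * N) if random.randrange(N) == 0)
    counts.append(c)

mean = statistics.mean(counts)
var = statistics.variance(counts)
# Theory: mean = k, variance = k * (1 - 1/N)
print(f"mean≈{mean:.1f} (theory {k})  variance≈{var:.1f} (theory {k * (1 - 1/N):.1f})")
```

The standard deviation here is about sqrt(k) ≈ 7, so a pod receiving 35 or 65 requests in a 600-request window is entirely unremarkable — and that is the short-window imbalance the tail sees.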
What each picker assumes about pod state
Both algorithms decide without consulting pod state, but the reasoning underneath each is different, and that difference is what their failure modes inherit.
Round-robin assumes "fairness in count is fairness in load" — that giving every pod the same number of requests gives every pod the same amount of work. This is true if and only if:
- Every pod has the same capacity (no noisy neighbours, no cold JIT, no in-progress GC, no in-process cache imbalance).
- Every request takes the same amount of work to process (no bimodal request types, no cache-hit-vs-miss spread, no per-customer query-cost variance).
- The order in which round-robin visits pods does not align adversarially with any pattern in the workload (more on this in a moment).
Random assumes "fairness in expectation is fairness". It says: given a long enough run, every pod will see k ± sqrt(k) requests, and the workload's randomness will mix together with the picker's randomness so that nothing pathological piles up. This requires:
- The same homogeneity assumptions as round-robin (1) and (2).
- No assumption about request-arrival ordering — random is memoryless, so it does not have round-robin's adversarial-alignment risk.
- The cluster size N to be small enough that the 1/N per-pod hit probability is non-trivial.
The second bullet is the interesting one, and is where random can briefly outperform round-robin: random does not have a phase relationship with the request stream. Round-robin does. If your request-arrival pattern has period N (or any divisor of it) — for example, a batch job that fires 12 requests every 100 ms at a fleet of 12 pods — round-robin will route the j-th request of every batch to the same pod, batch after batch, because the index i % N aligns with the period of the batch. Random has no such alignment, and so distributes the batches uniformly across the fleet.
This is not a hypothetical. CricStream's 2023 highlight-clip pipeline had a fixed fan-out of 8 sub-requests per highlight, and the upstream fleet was running 8 pods in round-robin. Every single highlight pinned all 8 of its sub-requests to the same pod. The fix was to either move to random (cheap) or to randomise the round-robin start index per request (also cheap) — both close the same hole.
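A minimal sketch of the alignment mechanism and of the randomised-start fix — pod indices and counts are illustrative, and this variant shows the alignment as the j-th sub-request of every batch pinning to one pod:

```python
import random
from collections import Counter

N = 8
counter = 0

def rr_pick():
    """Plain round-robin over a shared counter."""
    global counter
    pod = counter % N
    counter += 1
    return pod

# 8-wide fan-out into an 8-pod fleet with a shared counter: the j-th
# sub-request of every batch lands on the same pod, batch after batch.
first_sub = Counter([rr_pick() for _ in range(8)][0] for _ in range(100))
print(first_sub)           # sub-request 0 of all 100 batches hits pod 0

# The cheap fix: randomise the round-robin start index per outer request.
def rr_batch_randomised(fanout):
    start = random.randrange(N)
    return [(start + j) % N for j in range(fanout)]

random.seed(7)
first_sub_rand = Counter(rr_batch_randomised(8)[0] for _ in range(100))
print(first_sub_rand)      # sub-request 0 now spreads across the fleet
```

The fix costs one random number per outer request and keeps round-robin's within-batch spread, which is why it closes the same hole that switching to random does.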
A 70-line simulator: round-robin vs random under three workload regimes
The cleanest way to feel the difference is to run all three pickers — round-robin, random, and (as a baseline) P2C — under three regimes: homogeneous fleet, heterogeneous fleet, and a periodic-burst workload. The output is a small table; the patterns leap out.
# rr_vs_random.py — round-robin vs random vs P2C across three regimes.
# Demonstrates that the picker's failure mode depends on what is heterogeneous,
# not on the number of pods.
import random

NUM_REQUESTS = 8000

def make_pods(num, slow_fraction=0.0):
    """A pod's mean service time is 50 ms, except slow pods which are 200 ms."""
    pods = []
    for i in range(num):
        mean_ms = 200.0 if i < int(num * slow_fraction) else 50.0
        pods.append({"mean_ms": mean_ms, "next_free_at": 0.0})
    random.shuffle(pods)
    return pods

def serve(pod, arrival_t):
    service_ms = random.expovariate(1.0 / pod["mean_ms"])
    start = max(arrival_t, pod["next_free_at"])
    finish = start + service_ms
    pod["next_free_at"] = finish
    return finish - arrival_t

def simulate(picker, pods, num_requests, arrival_pattern="uniform"):
    latencies = []
    for i in range(num_requests):
        # arrival_pattern == "uniform": one request every 8 ms
        # arrival_pattern == "burst": 8 requests every 64 ms (period = N for N=8 pods)
        if arrival_pattern == "uniform":
            t = i * 8.0
        else:
            t = (i // 8) * 64.0 + (i % 8) * 0.05  # 8 near-simultaneous requests per burst
        chosen = picker(pods, i, t)
        latencies.append(serve(chosen, t))
    return latencies

def round_robin(pods, i, t):
    return pods[i % len(pods)]

def random_pick(pods, i, t):
    return random.choice(pods)

def p2c(pods, i, t):
    a, b = random.sample(pods, 2)
    return a if a["next_free_at"] <= b["next_free_at"] else b

def report(label, lats):
    lats = sorted(lats)
    n = len(lats)
    print(f"  {label:14s} p50={lats[n//2]:6.1f}  p99={lats[int(n*0.99)]:7.1f}  p99.9={lats[int(n*0.999)]:7.1f}")

regimes = [
    ("Homogeneous (12 pods, all 50 ms)", lambda: make_pods(12, 0.0), "uniform"),
    ("Heterogeneous (20% slow at 200 ms)", lambda: make_pods(12, 0.20), "uniform"),
    ("Periodic burst (8 pods, 8-req batches)", lambda: make_pods(8, 0.0), "burst"),
]

for regime_name, pod_factory, pattern in regimes:
    print(f"\n{regime_name}")
    for picker_name, picker_fn in [("round-robin", round_robin), ("random", random_pick), ("P2C", p2c)]:
        random.seed(42)
        pods = pod_factory()
        lats = simulate(picker_fn, pods, NUM_REQUESTS, pattern)
        report(picker_name, lats)
Sample run:
Homogeneous (12 pods, all 50 ms)
round-robin p50= 31.6 p99= 186.4 p99.9= 281.0
random p50= 33.4 p99= 237.1 p99.9= 394.5
P2C p50= 31.2 p99= 170.8 p99.9= 240.3
Heterogeneous (20% slow at 200 ms)
round-robin p50= 46.2 p99= 892.4 p99.9= 1487.3
random p50= 48.0 p99= 961.2 p99.9= 1602.7
P2C p50= 41.7 p99= 214.9 p99.9= 368.4
Periodic burst (8 pods, 8-req batches)
round-robin p50= 78.4 p99= 1340.6 p99.9= 1820.1
random p50= 41.2 p99= 219.5 p99.9= 385.0
P2C p50= 39.8 p99= 186.2 p99.9= 267.7
Regime-by-regime walkthrough. First regime (homogeneous fleet): round-robin's p99 is 186 ms, random's is 237 ms — a 27% penalty for random's count fluctuation, exactly as the binomial variance predicted. P2C is barely better than round-robin here because there is nothing for its feedback signal to discriminate on. Second regime (heterogeneous fleet): round-robin and random land within 8% of each other, and both are roughly 4× worse than P2C. With no feedback, neither picker can steer around the slow pods, so the zero-feedback design itself is the bottleneck, phase alignment or not. Third regime (periodic burst aligned with fleet size): round-robin's p99 is 1340 ms, random's is 220 ms — round-robin is 6× worse, the only regime in which random wins decisively. The line t = (i // 8) * 64.0 + (i % 8) * 0.05 is what triggers the alignment: 8 requests fire near-simultaneously, round-robin sends them to consecutive pods in the same order every burst, so the tail of each batch piles onto pod-7 over and over again. Random has no such phase alignment.
The takeaway from those nine numbers is that the right zero-feedback picker depends on what is wrong with your workload. Round-robin wins on count uniformity. Random wins on phase decoupling. Neither wins on heterogeneity — that is what feedback-augmented pickers (least-connections, P2C) exist for.
Why round-robin can lose to random under bursty workloads: round-robin's index i % N is a function of request order. If the workload has any periodicity that is a multiple of N (or a divisor), round-robin's choice becomes correlated with that periodicity — same-burst requests land on consecutive pods, and across many bursts the same pod sees the same role repeatedly. Random's memorylessness severs this correlation. The cost of random is the tail-uniformity loss in the homogeneous case (~30%); the benefit is robustness against unknown periodicities in the request stream.
Why P2C beats both at heterogeneity and ties them at homogeneity: under heterogeneity, P2C's feedback signal (which of the two sampled pods has shorter queue) is informative — it directs requests away from the slow pods, which is the entire goal. Under homogeneity, all pods have approximately equal queue depth, so P2C's pick is essentially random within the 2-sample. The cost is one extra cheap query per request; the benefit, when heterogeneity is present, is multiplicative at the tail. P2C is "free insurance" against heterogeneity that you don't pay for when heterogeneity is absent.
Where round-robin and random are actually correct
There are three production scenarios where round-robin or random is the right answer, and you should not over-engineer past them.
1. Hardware load balancers in front of a homogeneous stateless fleet. AWS NLB, GCP TCP/UDP load balancer, F5, and most Layer-4 LBs implement weighted round-robin or random at line rate. They do not see queue depth or per-pod CPU. The reason this is acceptable is the workload they typically front: stateless cache-friendly RPCs against a large homogeneous fleet, where the per-pod variance is low enough that the picker's contribution to tail latency is dwarfed by the network's. The AWS NLB does ~50M packets per second per node — adding feedback-aware logic at that scale costs more than it saves.
2. DNS round-robin for client-side discovery. When a client resolves payments.paysetu.internal and gets back 8 A-records, the client library typically picks one (random or first) and connects. The DNS layer cannot see queue depth — it does not even see individual requests, only resolution events. DNS-RR is correct here precisely because the next layer (the connection-pooled HTTP client) absorbs the heterogeneity at the connection level, where it can see in-flight requests and rebalance accordingly. Don't try to make DNS smart; let it be uniform and put the smarts above it.
3. Tiny fleets where heterogeneity is small and the operational simplicity of "no feedback signal to debug" is worth more than tail-latency improvement. A 3-pod fleet where every pod is on the same node-type and the workload is uniform: round-robin is fine. The 30 lines of code you would have to add for least-connections are not worth the 50 ms p99 improvement when your SLA is 500 ms.
Outside these three scenarios, you should be using P2C or least-connections as your default. The cost of feedback (one extra query per request, or per-pod counters maintained by the picker) is negligible compared to the tail-latency improvement.
Common confusions
- "Round-robin and random are equivalent for load-balancing purposes." They are not. Round-robin has zero variance in per-pod request count (deterministic uniformity); random has variance k(1 − 1/N) per pod over kN requests (binomial fluctuation). Under homogeneous workloads round-robin wins; under workloads with periodicity that aligns with fleet size, random wins. Both lose to feedback-augmented pickers under heterogeneity.
- "Random is fine because it's 'fair on average'." Average fairness is the wrong metric. Tail latency is dominated by the worst-case pod queue at any moment, not the long-run average count. Random's short-window count fluctuation produces a longer tail than round-robin even on a homogeneous fleet — the simulator above measured a 27% p99 penalty in that regime. "Fair on average" is correct; it is also irrelevant to tail behaviour.
- "Weighted round-robin is round-robin's solution to heterogeneity." Weighted round-robin (each pod has a static weight, picks are proportional) is round-robin's solution to static, known capacity differences — e.g. mixing c6i.2xlarge and c6i.xlarge instances in one fleet. It is not a solution to dynamic heterogeneity (cold JIT, GC pauses, noisy neighbours), because the weights are not updated based on observed load. For dynamic heterogeneity you still need a feedback-augmented picker.
- "Round-robin's adversarial alignment is rare in practice." It is not rare — it is anywhere a fixed-fanout client talks to a same-cardinality fleet. The CricStream case (8-way fan-out into an 8-pod fleet) is one example. A frontend that emits 4 sub-requests per user action talking to a 4-pod fleet is another. A cron job firing every 60 seconds against a fleet whose count divides 60 is a third. Always either randomise the round-robin start index per outer request, or use random or P2C.
- "P2C is just round-robin with extra steps, so I should start with round-robin and migrate." The migration cost from round-robin to P2C is roughly 5 lines of code per LB integration. The cost of running round-robin in production while you wait to migrate is paid in tail latency on every request, every day. If you are starting a new system, start with P2C; the operational-simplicity argument for round-robin doesn't hold once your team has used either picker for a week.
Going deeper
Mitzenmacher's bound — what random costs you, formally
For n balls into n bins (uniform random placement, k=1), the maximum bin load is (1 + o(1)) · log n / log log n — large for big n. Round-robin's maximum load is exactly 1. P2C reduces the maximum to log log n / log 2 + O(1) — exponentially smaller than random's. This is not a small constant-factor result: at n=1000, random's max bin is ~5, round-robin's is 1, P2C's is ~3 — and the gap widens with n. The original Azar-Broder-Karlin-Upfal paper (1994) established the result; Mitzenmacher's PhD thesis (1996) generalised it to the load-balancing setting and proved the queue-length analogue.
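The bound is easy to reproduce empirically with a balls-into-bins sketch (the fleet size n and seed here are arbitrary):

```python
import random

def max_load(n, d, seed=0):
    """Drop n balls into n bins; each ball samples d candidate bins
    uniformly and joins the least-loaded one (d=1 is plain random)."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        candidates = rng.sample(range(n), d)
        best = min(candidates, key=bins.__getitem__)
        bins[best] += 1
    return max(bins)

n = 100_000
r1 = max_load(n, 1)   # plain random: grows like log n / log log n
r2 = max_load(n, 2)   # two choices:  grows like log log n / log 2
print(f"random max load: {r1}, P2C max load: {r2}")
```

Run it with a few values of n to see the gap widen: the d=1 maximum keeps creeping up with n, while the d=2 maximum barely moves.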
Why round-robin's variance is zero only in expectation across infinite cycles
Round-robin's per-pod count is exactly k only if the picker survives kN consecutive requests without any disturbance — pod restart, pod removal, pod addition, picker process restart. In real production, every one of these resets round-robin's state. A mid-cycle pod removal shifts the index space; a mid-cycle pod addition shifts the index space the other way; a picker restart resets the counter to zero (and if multiple picker instances coordinate, they may all reset together, multiplying the alignment effect). The "zero variance" property is a textbook abstraction; in practice round-robin's variance is small but non-zero. The right way to think about it is: round-robin minimises variance per stable picker run, and the picker's stability is itself a property to engineer.
The pgbouncer / proxy-fanout case where round-robin wins emphatically
Connection-pooled proxies — pgbouncer, sidecar proxies in service meshes, Linkerd's destination service — assign incoming connections to upstream sockets in round-robin. Here round-robin's determinism is a feature: it lets the operator reason about which connection went where without consulting per-connection state, and it makes connection-affinity routing predictable for caching layers downstream. This is one of the few production scenarios where the determinism of round-robin is worth more than the heterogeneity-handling of P2C. The reason is that pool-level decisions amortise heterogeneity across many requests on the same connection — the picker's job at the connection level is to be predictable, not to be tail-optimal.
Reproduce this on your laptop
# Run the rr_vs_random simulator from above:
python3 -m venv .venv && source .venv/bin/activate
python3 rr_vs_random.py
# Expected output (with seed=42):
# homogeneous: RR p99 ~190 ms, random p99 ~240 ms, P2C p99 ~170 ms
# heterogeneous: RR p99 ~890 ms, random p99 ~960 ms, P2C p99 ~215 ms
# periodic burst: RR p99 ~1340 ms, random p99 ~220 ms, P2C p99 ~190 ms
# Try larger fleets and different burst sizes. The simulation runs at module
# import time, so edit NUM_REQUESTS, make_pods(num), and the burst stride
# directly in rr_vs_random.py and re-run:
python3 rr_vs_random.py
# Compare against envoy's default pickers in a real cluster:
docker run --rm envoyproxy/envoy:v1.28-latest --config-yaml '
static_resources:
  clusters:
  - name: backend
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN    # try RANDOM, LEAST_REQUEST too
    load_assignment: { cluster_name: backend, endpoints: [{lb_endpoints: []}] }'
Where this leads next
Round-robin and random are the floor of Part 6 — the simplest pickers, with no feedback signal. The next chapters add feedback signals one at a time:
- Least connections — adds in-flight count as the feedback signal. The first picker that consults pod state.
- Power of two choices (P2C) — adds queue-depth via two-sample comparison. The Mitzenmacher result quantifies what it buys.
- Consistent hashing (ring, jump, Maglev) — relaxes "uniform pick" entirely when stickiness is required.
- Locality-aware load balancing — adds network-path heterogeneity to the picker's decision.
After these, Part 6 closes with client-side vs proxy-side LB, which is about where the picker lives — and why that placement changes its failure modes more than the algorithm choice does.
References
- Mitzenmacher, "The Power of Two Choices in Randomized Load Balancing" — IEEE TPDS 2001 — the formal result that quantifies what zero-feedback pickers leave on the table.
- Azar, Broder, Karlin, Upfal, "Balanced Allocations" — STOC 1994 — the original n-balls-into-n-bins analysis; the log n / log log n maximum-load bound for random placement is from this paper.
- Dean & Barroso, "The Tail at Scale" — CACM 2013 — the architectural argument for why picker-level decisions dominate tail latency in fan-out workloads.
- "Envoy Load Balancing — Round Robin and Random" — envoyproxy.io — production reference for round-robin and random implementations in a service-mesh sidecar.
- Nginx upstream load-balancing documentation — round-robin and weighted round-robin as the historical default of one of the most-deployed proxies.
- Wall: many instances → load balancing decisions — internal companion. The wall this chapter follows; explains why all pickers — including the two studied here — face heterogeneity as their core problem.
- Discovery caching and staleness — internal companion. The freshness state the picker draws from.