Caching patterns and the thundering-herd problem
It is 11:59:59 PM on a Flipkart Big Billion Days sale. A million phones in Bengaluru are refreshing the same Redboard banner page. The cached HTML expires at exactly midnight, and in the next millisecond ten thousand application servers each see a MISS, each fire the same query at the same MySQL primary, and the database falls over before the first user has finished saying "add to cart". This chapter is about why that happens, and the four-line fix that prevents it.
In short
A cache sits between a fast caller and a slow source of truth, and four patterns describe how reads and writes flow through it: cache-aside, read-through, write-through, and write-behind. The dangerous failure mode is the thundering herd (or cache stampede): when one hot key expires, every concurrent request misses, recomputes, and slams the source. The cheap fix is a SETNX recompute lock; the cleaner fix is probabilistic early recompute so the cache renews itself before it expires.
The previous three chapters built up the in-memory store itself — Redis data structures, persistence, replication and Cluster, and the contrast with Memcached's minimalism. This chapter closes Build 22 by stepping back to the question every team that adopts a cache eventually asks: given that I have an in-memory store, what is the discipline of using it as a cache? The answer is four patterns, two failure modes, and a fix you can write in five lines of Python.
The four patterns: who reads, who writes, who owns the staleness
A cache is defined by three things — what triggers a read, what triggers a write, and who is responsible for keeping the cache and the source of truth in sync. Pick any reasonable answer to those three questions and you land on one of four canonical patterns. The names are old and the boundaries are fuzzy, but the trade-offs are sharp.
Two patterns deserve dwelling on, because the trade-offs decide the architecture.
Cache-aside is the default. Application code looks like v = cache.get(k); if v is None: v = db.read(k); cache.set(k, v, ttl=300); return v. The application owns both halves of the relationship: it reads from the cache, falls back to the database, and writes back to the cache. On the write path, the canonical idiom is write to the database, then delete the cache key — not cache.set(k, new_value), because two concurrent writers can race and leave the cache with the older value (write A writes DB, write B writes DB, write B sets cache, write A sets cache → cache holds A's value but DB holds B's value). Deleting the key forces the next reader to repopulate from the now-correct DB. Why "delete on write" beats "set on write": the database is the source of truth and serialises writes; the cache only needs to invalidate. A DEL is idempotent and cannot leave the cache in a state that contradicts the DB. A SET competes with concurrent SETs, and the last writer wins on the cache — which is the wrong winner if it is not the last writer on the DB.
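The whole pattern can be sketched end to end with two plain dictionaries standing in for Redis and MySQL (a toy illustration, not a real client; TTLs are omitted for brevity):

```python
# Minimal cache-aside sketch: two dicts stand in for Redis and MySQL.
db = {'sku:1': {'price': 100}}    # source of truth
cache = {}                        # the cache layer
reads_from_db = 0

def read(key):
    """Cache-aside read: try the cache, fall back to the DB, repopulate."""
    global reads_from_db
    if key in cache:
        return cache[key]
    reads_from_db += 1
    value = db[key]
    cache[key] = value
    return value

def write(key, value):
    """Cache-aside write: update the DB, then DELETE the cache key.
    Deleting (not setting) means the cache can never hold a value that
    contradicts what the DB's write serialisation decided."""
    db[key] = value
    cache.pop(key, None)

read('sku:1')                     # miss: DB query, cache populated
read('sku:1')                     # hit: DB untouched
write('sku:1', {'price': 90})     # DB updated, cache key invalidated
read('sku:1')                     # miss again: repopulates with the new value
```

After the write, the next reader pays one DB query and the cache is consistent again; a SET-on-write version of write() would reintroduce the last-writer race described above.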
Write-behind is the dangerous one. It looks attractive because the application's write returns the moment the cache acknowledges, and a background worker batches writes to the DB. The catch is that the cache is now the source of truth between the ack and the flush, and a Redis crash in that window erases real user data. The pattern is only safe when the cache has its own durability story (AOF with appendfsync everysec, replication to a synchronous follower) and when the application has accepted up to N seconds of write loss as a contract. In practice, very few teams should reach for write-behind; if you think you need it, you usually want a queue (Kafka, SQS) in front of the database instead, which gives you ordered, durable, replayable writes without pretending the cache owns them.
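The write-behind shape can be sketched in a few lines — single-threaded here for clarity, with illustrative names; a real implementation flushes from a background worker, which is exactly where the durability caveat bites:

```python
import collections

cache = {}                          # write-behind: the cache is written first
dirty = collections.OrderedDict()   # acked but not yet flushed to the DB
db = {}                             # the slow source of truth

def write_behind(key, value):
    """Ack the write as soon as the cache has it; flush to the DB later."""
    cache[key] = value
    dirty[key] = value    # everything in `dirty` is lost if the cache dies

def flush():
    """Background worker: batch the dirty keys into the DB, oldest first."""
    while dirty:
        key, value = dirty.popitem(last=False)
        db[key] = value

write_behind('cart:1', ['phone'])
write_behind('cart:2', ['case'])
# Between here and flush(), the DB has seen nothing: this window is the risk.
flush()
```

The window between write_behind() returning and flush() running is the span where the cache is the only copy of the data — the span the chapter warns about.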
The thundering herd: one expiry, ten thousand misses
Now the failure mode this chapter is named for. Pick any cache pattern. Set a TTL on a key. Make the key very popular. The moment the TTL expires, every concurrent request that was reading that key sees a miss at the same instant, and every request fires its own database query to refill the cache. If 10,000 requests/second were hitting the cache and the recompute takes 50 ms, you have just sent 500 simultaneous identical queries to a database that was sized for one. This is the thundering herd — also called cache stampede, dogpile effect, or cache miss storm.
The mechanism is mechanical, not exotic. While the cache held the value, the database saw 0 queries on that key. The instant the TTL fires, the database sees request_rate × recompute_latency queries on that key — possibly within a single millisecond if the cache delete was atomic. Every replica of the application sees the same miss, none of them coordinate, and they all race to refill. The first one that finishes wins (its SET populates the cache); the others finish a few milliseconds later and overwrite with the same value, having paid the database cost for nothing.
The textbook expiry is not the only trigger. The same pattern fires when a Redis node is rebooted (every key disappears), when FLUSHDB is run by accident, when a deploy bumps the cache-key namespace (every key effectively becomes new), when the cache evicts under memory pressure, or when an application bug invalidates too aggressively. The herd is a system-level property, not a property of any particular pattern.
The fix: SETNX recompute lock + early probabilistic recompute
Two fixes together kill the herd. The first is a recompute lock: when a request misses, only the first one is allowed to recompute; the rest either wait briefly and retry, or serve a stale value. The second is probabilistic early recompute: a small fraction of requests refresh the key before it actually expires, so the synchronous-miss event never happens for a hot key. Together they cap the database hit at exactly one per recompute interval, regardless of how popular the key is.
Here is the recompute lock in five lines, using Redis SET key value NX EX ttl (the modern atomic form of SETNX):
```python
import redis, json, time, random

r = redis.Redis(decode_responses=True)

def get_with_lock(key, loader, ttl=300, lock_ttl=10):
    """Cache-aside read with a SETNX recompute lock to prevent stampedes."""
    val = r.get(key)
    if val is not None:
        return json.loads(val)
    # Cache miss. Try to acquire the recompute lock.
    lock_key = f'lock:{key}'
    got_lock = r.set(lock_key, '1', nx=True, ex=lock_ttl)
    if got_lock:
        try:
            fresh = loader()                 # the slow recompute
            r.set(key, json.dumps(fresh), ex=ttl)
            return fresh
        finally:
            r.delete(lock_key)
    # Lost the race — wait briefly and retry the cache.
    for _ in range(20):
        time.sleep(0.05 + random.random() * 0.05)   # 50–100 ms jitter
        val = r.get(key)
        if val is not None:
            return json.loads(val)
    # Fallback: pay the DB cost ourselves (better than a 500 error).
    return loader()

def assemble_product(sku):
    """Pretend this hits MySQL + an inventory service — 50 ms."""
    time.sleep(0.05)
    return {'sku': sku, 'price': 19999, 'stock': 42, 'name': 'Redmi Note 13'}

# The herd test: 500 threads ask for the same key the moment after expiry.
# Without the lock: 500 calls to assemble_product (DB melts).
# With the lock: 1 call to assemble_product, 499 sleep-and-retry hits.
```
Run that under 500 concurrent requests and you will see exactly one assemble_product call; the other 499 sleep 50–100 ms, find the now-populated cache, and return. The database paid one query instead of 500.

Why SET ... NX EX 10 is the right primitive: SET with NX (only if not exists) is atomic on a single Redis instance — exactly one client among 500 simultaneous callers gets the True reply; every other caller gets None. The EX 10 TTL is a safety net: if the lock holder crashes mid-recompute, the lock auto-releases after 10 seconds and the next request retries. Without the TTL, a crashed worker would block every reader of that key until manual intervention.
That fixes the spike at expiry, but it leaves a smaller problem: the lock holder still pays the recompute synchronously, and during those 50 ms the 499 other readers are blocked. For very hot keys you want the recompute to happen before the key expires, so no reader ever sees the gap. This is probabilistic early expiration, the XFetch algorithm from Vattani et al. (2015):
```python
import math, random

def get_with_probabilistic_refresh(key, loader, ttl=300, beta=1.0):
    """Each reader independently rolls a die to refresh early."""
    pipe = r.pipeline()
    pipe.get(key)
    pipe.get(f'{key}:delta')    # how long the last recompute took (ms)
    pipe.ttl(key)
    val, delta_ms, remaining = pipe.execute()
    if val is None:
        return _recompute(key, loader, ttl)
    delta = float(delta_ms or 50) / 1000.0    # default 50 ms if unknown
    expiry = time.time() + (remaining or 0)
    # XFetch: refresh probability rises as TTL approaches zero.
    if time.time() - delta * beta * math.log(random.random()) >= expiry:
        return _recompute(key, loader, ttl)
    return json.loads(val)

def _recompute(key, loader, ttl):
    start = time.time()
    fresh = loader()
    elapsed_ms = int((time.time() - start) * 1000)
    pipe = r.pipeline()
    pipe.set(key, json.dumps(fresh), ex=ttl)
    pipe.set(f'{key}:delta', elapsed_ms, ex=ttl)
    pipe.execute()
    return fresh
```
The trick is the expression time.time() - delta * beta * math.log(random.random()). As the remaining TTL shrinks toward zero, the probability that this expression exceeds expiry grows; one of the in-flight readers will roll a small enough random number to trigger a refresh before the natural expiry. The hotter the key (more readers per second), the more likely some reader refreshes early, while cold keys experience zero overhead.

Why −log(random()) is the right shape: random() is uniform on (0, 1), so −log(random()) is exponentially distributed. Each individual read has a small, TTL-dependent trigger probability, so the aggregate chance that some reader triggers scales with the key's request rate — exactly the keys you most want refreshed early. Cold keys get refreshed only when they actually expire (one recompute, no stampede, because there is no herd). Hot keys get refreshed by a single early reader, with a smooth probability curve that has no sharp boundary anyone can synchronise against. Combined with the SETNX lock, you get the full fix: hot keys self-refresh before expiry, and the rare honest expiry triggers a single locked recompute instead of a herd.
There are two cheaper alternatives worth knowing. Stale-while-revalidate (the HTTP-cache pattern, also baked into CDNs like Cloudflare) serves the stale value from the cache while a single background worker refreshes it; the stampede becomes invisible because every reader gets a hit. TTL jitter — instead of SETEX 300, use SETEX (300 + random.randint(0, 60)) — desynchronises expiries across keys so a million keys created at deploy time do not all expire in the same second. Jitter alone does not save a single hot key from stampeding, but it saves the system from every key stampeding at the same minute.
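The stale-while-revalidate shape can be sketched with a soft TTL stored next to a hard one (the two-TTL layout and helper names below are illustrative assumptions, with a dict standing in for Redis; a real system would refresh in the background rather than inline):

```python
import time

store = {}    # key -> (value, fresh_until, hard_expiry); stands in for Redis

def swr_set(key, value, fresh_for=300, stale_grace=60):
    now = time.time()
    store[key] = (value, now + fresh_for, now + fresh_for + stale_grace)

def swr_get(key, loader):
    """Serve fresh hits directly; serve stale-but-present values instantly
    while triggering a refresh, so readers never block on a recompute."""
    now = time.time()
    entry = store.get(key)
    if entry is None:                 # hard miss: must pay the loader
        value = loader()
        swr_set(key, value)
        return value, 'miss'
    value, fresh_until, hard_expiry = entry
    if now < fresh_until:
        return value, 'fresh'
    if now < hard_expiry:             # stale but servable: the herd is invisible
        swr_set(key, loader())        # in production: hand off to a worker
        return value, 'stale'
    value = loader()                  # past the grace window: treat as a miss
    swr_set(key, value)
    return value, 'miss'

swr_set('banner', '<old html>', fresh_for=0, stale_grace=60)  # force staleness
value, state = swr_get('banner', lambda: '<new html>')
# The stale value is served immediately; the refresh repopulates the store.
```

Every reader during the grace window gets an instant (stale) hit, which is exactly why the stampede disappears: there is never a moment when the key is absent.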
Common confusions
- "Cache-aside and read-through are the same thing." They differ in who makes the loader call. In cache-aside the application has both cache.get and db.read in its own code; if the cache is down, reads still work (slow but functional). In read-through the cache library owns the loader, and a cache outage is a read outage because the application has no fallback path. Read-through is a Java-shop favourite (Caffeine, EhCache); cache-aside is the Python/Go default because there is no library — you write the eight lines yourself.
- "Write-through guarantees no stale reads." It guarantees no stale reads from this cache on this node after the write returns. It does not guarantee anything about other application servers using a different cache layer, about read replicas of the database lagging the primary, or about a CDN-cached page upstream. Write-through makes one cache layer consistent with one DB layer; full read-after-write across a distributed system requires more (consistent reads from the primary, cache-invalidation broadcast, or a single global cache).
- "Write-behind is faster, so it is better." Write-behind makes writes return faster at the cost of treating the cache as the source of truth between ack and flush. If the cache crashes in that window — Redis without persistence, a Memcached node OOM-killed — you lose the writes. The pattern is correct only when the cache itself has durability (AOF + replication) and when losing the unflushed window is an acceptable failure mode. Most teams reach for write-behind, hit a Redis crash, lose user data, and migrate to a queue + write-through.
- "SETNX key value is the same as SET key value NX EX ttl." Functionally, on the happy path, yes. But the no-TTL version is a lock that lasts forever if the holder crashes. The EX ttl variant auto-expires the lock so a crashed recomputer cannot block every future reader of that key. Always use the combined form (SET ... NX EX ...) for any lock you would not personally babysit; it has been atomic in one round trip since Redis 2.6.12 (2012).
- "Probabilistic early refresh is just TTL jitter with extra steps." They solve different problems. Jitter desynchronises which keys expire when, so a million keys created at the same moment do not all expire together. Probabilistic early refresh tackles the single hot key: even one popular key with no jitter would stampede on every expiry. You usually want both — jitter at SET time for the population, and early refresh at GET time for the hot tail.
- "MEMCACHED_DELETE_THEN_RECOMPUTE is what cache invalidation means." The stronger pattern is delete the cache key after the DB write commits, not before. Deleting before the commit means a concurrent read can repopulate the cache from the old DB state and overwrite your invalidation. The order is BEGIN; UPDATE row; COMMIT; cache.delete(key) — and even then the read-then-write race still exists in microscopic windows, which is why high-consistency systems use change-data capture or invalidation messages from the DB itself instead of relying on application code.
Going deeper
The math of −log(random())
The XFetch algorithm works because of one elegant property of the exponential distribution. If U is uniform on (0, 1), then X = −log(U) is Exponential(1). Multiplied by the recompute time delta, the random variable delta × β × X has mean delta × β and a long thin tail. The check now − delta × β × log(random()) ≥ expiry triggers a refresh when the rolled random sample is small enough; the probability of a small sample is calibrated by delta so the refresh window scales with the recompute cost. The full analysis is in Vattani, Chierichetti & Lowenstein (2015), "Optimal Probabilistic Cache Stampede Prevention" — a four-page VLDB paper that is unusually readable.
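A quick sanity check of that property (a simulation sketch, not from the paper; the parameter values are arbitrary):

```python
import math, random

random.seed(0)

# If U ~ Uniform(0, 1), then -log(U) ~ Exponential(1): mean 1, long thin tail.
samples = [-math.log(random.random()) for _ in range(200_000)]
mean = sum(samples) / len(samples)    # close to 1.0, the Exponential(1) mean

def trigger_prob(remaining, delta=0.05, beta=1.0, trials=100_000):
    """Fraction of readers whose XFetch roll bridges the remaining TTL."""
    return sum(
        delta * beta * -math.log(random.random()) >= remaining
        for _ in range(trials)
    ) / trials

# P(trigger) = exp(-remaining / (delta * beta)):
# negligible far from expiry, rising smoothly as expiry approaches.
p_far  = trigger_prob(remaining=1.0)    # 20 recompute-times out: ~exp(-20) ≈ 0
p_near = trigger_prob(remaining=0.01)   # 0.2 recompute-times out: ~exp(-0.2) ≈ 0.82
```

The smooth exponential curve is the point: there is no threshold instant at which every reader decides to refresh at once, so the refresh itself cannot stampede.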
Negative caching: cache the misses too
A subtle related stampede: someone keeps requesting product:99999 which does not exist. Every request misses the cache (correctly — the row is not there) and queries the database (which returns nothing, also correctly), but the database query is paid every time. The fix is to cache the negative result — r.set('product:99999', '__NOT_FOUND__', ex=60) — and check for the sentinel on read. The TTL is short to allow new rows to appear quickly; the value is small and tagged so it does not collide with real data. Without negative caching, a single bot scanning random IDs can DOS your database through a perfectly healthy cache.
Cache penetration, breakdown, avalanche
Three named failure modes from Chinese tech blogs (击穿 breakdown, 穿透 penetration, 雪崩 avalanche) that map cleanly to English: cache breakdown is the single-hot-key herd this chapter is built around. Cache penetration is the negative-caching problem above — keys that do not exist hammering the DB. Cache avalanche is many keys expiring simultaneously (often after a deploy, or a uniform TTL on bulk-loaded data); the fix is jitter at insert time. The three terms describe three distinct symptoms with three distinct fixes; muddling them produces the wrong fix and an outage that "looks like" the herd but is actually penetration.
Two-tier caching: L1 in-process + L2 Redis
For the very hottest keys (under a few thousand per app pod), an in-process cache (Caffeine in Java, cachetools.TTLCache in Python, lru-cache in Node) sits in front of Redis with its own short TTL. A read hits L1 (microseconds), falls back to Redis (sub-millisecond), falls back to the DB (tens of milliseconds). Two-tier caching takes pressure off Redis itself the same way Redis takes pressure off the DB; the cost is that L1 is per-pod so invalidation needs a pub/sub broadcast (or short L1 TTLs and acceptance of brief inconsistency). Facebook's TAO and Twitter's home-timeline cache both use multi-tier shapes.
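The two-tier read path can be sketched with dicts standing in for the in-process cache and Redis (a toy illustration; real L1s like Caffeine or cachetools.TTLCache add eviction and per-entry TTLs):

```python
l1 = {}           # in-process, per-pod cache (microseconds)
l2 = {}           # stands in for Redis, shared across pods (sub-millisecond)
db_queries = 0

def get(key, loader):
    """Two-tier read: L1 -> L2 -> DB, repopulating each layer on the way out."""
    global db_queries
    if key in l1:
        return l1[key]        # hottest path: no network hop at all
    if key in l2:
        l1[key] = l2[key]     # warm L1 from L2
        return l1[key]
    db_queries += 1
    value = loader()          # coldest path: pay the DB
    l2[key] = value
    l1[key] = value
    return value

get('sku:1', lambda: 'v1')    # cold: DB -> L2 -> L1
get('sku:1', lambda: 'v1')    # hot: served from L1
l1.clear()                    # simulate a pod restart: L1 gone, L2 survives
get('sku:1', lambda: 'v1')    # warm: served from L2, DB still untouched
```

Note what the sketch cannot show: the L1 on other pods does not see changes to L2, which is why two-tier deployments need a pub/sub invalidation broadcast or short L1 TTLs.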
CDN as a cache layer
Everything above also applies to the CDN layer (Cloudflare, Akamai, Fastly). The "Big Billion Days" stampede in the lead is, in production, mostly absorbed by the CDN edge — Cloudflare's cache-control: s-maxage plus stale-while-revalidate and stale-if-error directives implement the stale-while-revalidate pattern at the network edge so end users never see the herd. The application-level cache (Redis) catches what the CDN misses (logged-in pages, personalised data, dynamic API responses). The patterns are identical; only the layer and latency differ.
Where this leads next
This chapter closes Build 22 — In-memory databases. The next build moves up the stack to streaming systems where the "log is the source of truth" idea returns:
- Build 23 — Streaming databases & CDC. Kafka as a distributed log, change-data capture pulling DB writes into a stream, and stream-table joins. Many of the cache-invalidation patterns above are cleaner when invalidation messages flow through Kafka instead of being inferred by the application.
- For the data structures backing the cache, return to Redis data structures as the product — the SORTED SET sliding-window rate limiter and the STREAM consumer-group pattern both map onto cache-stampede-adjacent problems.
- For a related idea on the write side, see write-ahead logging — the durability trick; WAL is to durability what stale-while-revalidate is to availability — both decouple the slow path (commit, recompute) from the fast path (ack, serve).
- For the eviction story underneath every cache, see buffer pool design — LRU, CLOCK, 2Q; the same algorithms that decide which page to evict from a Postgres buffer pool decide which key to evict from a Redis running with maxmemory-policy allkeys-lru.
References
- Vattani, Chierichetti & Lowenstein (2015), "Optimal Probabilistic Cache Stampede Prevention" — the four-page paper that introduced the XFetch early-recompute algorithm with the −log(random()) distribution.
- AWS ElastiCache best practices: caching strategies — AWS's reference write-up of cache-aside, write-through, and TTL strategies for managed Redis/Memcached.
- Cloudflare: stale-while-revalidate — how Cloudflare implements the SWR pattern at the CDN layer; the same shape your application cache should have for hot keys.
- Facebook engineering: "Scaling Memcache at Facebook" (NSDI 2013) — the canonical paper on running large-scale memcached fleets, including lease-based stampede mitigation that predates SETNX locks.
- Redis docs: distributed locks with SET NX EX — the Redis manual's write-up of SET ... NX EX ... as a single-instance lock primitive (and the Redlock multi-node generalisation).
- Twitter engineering: caching at scale — Twitter's two-tier (in-process + memcached) cache architecture and the invalidation discipline that goes with it.
- /wiki/redis-data-structures-as-the-product — the data structures every pattern in this chapter is built on.
- /wiki/memcached-s-minimalist-design-what-redis-isn-t — the contrasting cache-only store, where every pattern here applies but SETNX is add and there is no Lua scripting.