In short

Memcached is what Redis would be if you stripped out everything except GET, SET, and DELETE on opaque bytes. No persistence, no replication, no types, no scripting — just a multi-threaded server with a slab allocator and a per-slab LRU, designed to be a stateless side-cache in front of your real database. Where Redis bets on rich server-side semantics, memcached bets on raw throughput and operational simplicity, and twenty years of Facebook, Wikipedia and Pinterest infrastructure have proven both bets right.

Open a terminal. Run docker run -d -p 11211:11211 memcached:1.6 -m 64 -t 4. That one-line command launches a 64 MB cache on four worker threads, with no config file, no dump.rdb, no appendfsync, no replication — and when you stop it every stored byte is gone, on purpose.

After three chapters of Redis (data structures, persistence, HA topology) the natural question is: what does the other big in-memory store look like? It looks like Redis with almost everything taken out, and the interesting part is figuring out why each removal makes some real production problem simpler.

The thesis: subtract until what is left is fast

Brad Fitzpatrick wrote the first memcached at LiveJournal in 2003. He had one problem: page renders were hitting the database too hard, and adding more web frontends did not help because every frontend re-fetched the same hot rows. He needed a process that web servers could ask "do you have user 42?" before going to the database, and which would happily forget anything when it ran out of room. He wrote that process in Perl in a weekend, then rewrote it in C, and the design has barely changed since.

The minimalism is the feature, not a limitation he ran out of time to fix. Every property memcached lacks is a property that would slow it down or complicate it operationally if it had it.

Redis versus memcached: where each one says yes and where each one says no

                 Redis                              memcached
data types       9 (string, list, hash, ...)        1 (opaque bytes, ≤ 1 MB)
persistence      RDB + AOF                          none
replication      async + Sentinel + Cluster         none (client-side hashing)
scripting        Lua + Functions                    none
transactions     MULTI / EXEC / WATCH               CAS only (compare-and-swap)
pub/sub          channels + streams                 none
expiry           explicit per key                   per key, lazy + LRU
eviction         opt-in: maxmemory + policy         always on, per-slab LRU
threading        single-threaded loop (mostly)      multi-threaded, per-thread shard
memory mgmt      jemalloc, fragmentation is real    slab allocator, near-zero frag

In one line each: Redis is "a database that happens to be fast", memcached is "a stateless cache that happens to be a server". Every "none" in the right column is a deliberate subtraction: fewer features, more raw GET/SET throughput per core.
Redis is a database that happens to live in RAM; memcached is a cache that happens to be a server. Every "none" in the right column makes the hot path shorter and the operational story simpler. Why subtraction is the design: each feature on the left side has an associated cost — persistence pays `fork()` and `fsync` overhead, scripting pays a Lua VM and a per-script lock, replication pays an output buffer and a replication backlog, rich types pay a type-tag dispatch on every command. Memcached drops all of them and keeps only the inner-most loop — `hash(key) → lookup → return value` — and runs it on every core in parallel. The result is substantially higher aggregate GET throughput per box than single-threaded Redis achieves, on the use cases that fit memcached's shape (small bytes-in-bytes-out values, no server-side computation, throw-away on restart).

The decision tree between the two stores is short. If your value is a typed structure (a leaderboard, a queue, a session you want to update one field of) you want Redis — the structure is the product, and serialising it through memcached's bytes-only interface throws that product away. If your value is an opaque blob you compute once and re-fetch many times (a rendered HTML fragment, a JSON-serialised database row, an ML-model output for a request fingerprint) memcached is the cleaner fit. The first case is what the previous three chapters covered. This chapter is the second case.

The hot path: hash, slab, LRU

The whole memcached server fits in your head. A single hash table maps keys to items. Items live inside fixed-size memory chunks called slabs. Each slab class holds chunks of one size; items get rounded up to the next slab class on set. A doubly-linked LRU list per slab class tracks recency. When a slab class runs out of free chunks, the LRU's tail item is evicted and its chunk reused. That paragraph is most of slabs.c, items.c, and assoc.c from the upstream source.

Memcached internal architecture: threads + hash table + slabs + per-slab LRU. N worker threads (configurable with -t), each running its own libevent loop and taking new connections from a shared listening socket via an accept mutex. One global hash table (assoc.c) maps key bytes to item pointers; it is power-of-two chained and expands online when the load factor passes 1.5. Each item struct carries its exptime, nbytes, refcount, LRU next/prev pointers, key and value, and lives inside a slab chunk. The slab allocator holds size classes of fixed chunk size (96 B, 192 B, 384 B, 768 B, 1536 B, ... in the illustration), each with its own doubly-linked LRU list. A SET rounds the value size up to the next slab class, grabs a free chunk there and links the item at that class's LRU head; when a class is full, the TAIL of that class's LRU is evicted, so a flood of large items cannot evict small ones.
One global hash table for the key → pointer lookup, but the memory underneath is sliced into **slab classes** of fixed chunk size with a separate LRU per class. A `set` of a 200-byte value rounds up to the 384 B slab class and links the new item at that class's LRU head. When the 384 B class fills up, only items in that class are evicted — a flood of large `page:*` blobs cannot push out small `sess:*` tokens. The threading model is one libevent loop per worker, with the listening socket guarded by an accept-mutex so exactly one thread takes each new connection.
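
The whole picture compresses into a toy model. This is an illustrative Python sketch (doubling chunk sizes as in the diagram, no locks, no real allocator, and a scan over classes where the real server uses one global hash table), but it exhibits the two load-bearing behaviours: size-class rounding on set and eviction that stays inside one class.

```python
# Toy model of the hot path: size-class rounding on set, a per-class
# LRU, eviction only inside the class that is actually full.
# Illustrative sketch only; the real server does this in C with locks
# and one global hash table instead of a scan over the classes.
from collections import OrderedDict

SLAB_CLASSES = [96, 192, 384, 768, 1536]      # chunk sizes, as in the diagram

class ToyCache:
    def __init__(self, chunks_per_class=3):
        # one OrderedDict per class: insertion order doubles as the LRU list
        self.lru = {size: OrderedDict() for size in SLAB_CLASSES}
        self.capacity = chunks_per_class

    @staticmethod
    def slab_class(nbytes):
        for size in SLAB_CLASSES:             # round the value size up
            if nbytes <= size:
                return size
        raise ValueError("value larger than the biggest slab class")

    def set(self, key, value):
        lru = self.lru[self.slab_class(len(value))]
        if key not in lru and len(lru) >= self.capacity:
            lru.popitem(last=False)           # class full: evict this class's tail
        lru[key] = value
        lru.move_to_end(key)                  # the fresh item becomes the head

    def get(self, key):
        for lru in self.lru.values():
            if key in lru:
                lru.move_to_end(key)          # a hit promotes to the head
                return lru[key]
        return None
```

Fill one class past capacity and only that class evicts: large page:* blobs land in the 1536 B class and never push a 50-byte sess:* token out of the 96 B class.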

Three design choices in this picture deserve their own sentence each, because they explain almost every operational property of memcached.

The slab allocator gives you near-zero external fragmentation. A general-purpose malloc/free allocator suffers external fragmentation when you allocate many small objects of varied sizes and free a random subset — over weeks of running, the heap becomes a Swiss cheese where the total free bytes are large but no single contiguous run is large enough for the next allocation. Memcached pre-cuts memory into fixed-size chunks per class, so allocation is "pop a chunk off the free list" (O(1), no external fragmentation) and freeing is "push it back". The cost is internal fragmentation: a 100-byte value does not fit in a 96 B chunk, so it rounds up to the next class and the difference is wasted (92 bytes per item in the doubling layout of the figure above). The default growth factor of 1.25 between slab classes keeps the worst-case waste at about 20 %. Why this matters more than it first looks: a long-running Redis instance with jemalloc and adversarial value sizes can hit 50 %+ fragmentation, which is why Redis added MEMORY PURGE and activedefrag. Memcached's slab allocator, by construction, cannot reach that state — there is no pathological workload that breaks it. The trade is that pages assigned to one class are not automatically given back, so a sudden shift in the value-size distribution can leave one class full while another has free chunks; this is what the slab_reassign and slab_automove features in modern memcached are for.

The LRU is per slab class, and that is a feature. A single global LRU would let a flood of large items evict all the small items, even though they live in different memory regions. Per-class LRU isolates eviction pressure: if your application suddenly starts caching a hundred 500 KB images, only the slab class that holds ~500 KB chunks fills up and only its LRU tail gets evicted. The 96 B class holding session tokens is untouched. This is a property Redis users have to engineer manually, with key prefixes and separate instances carrying different maxmemory policies (maxmemory-policy applies to a whole instance, not to a key prefix); memcached gives it to you for free.

The threading model is true multi-threading, with fine-grained locks rather than one global lock. Each worker thread has its own libevent loop and accepts a slice of incoming connections; lookups and updates serialise on the hash-bucket and per-slab-class LRU locks, not a single global lock. This is what lets memcached scale nearly linearly with core count on read-heavy workloads — memtier_benchmark against memcached on an 8-core box typically pulls 1.5–2 M ops/sec, where Redis 7 (single-threaded I/O before the I/O-threads feature) tops out around 600–800 K ops/sec on the same hardware. The cost is that command implementations have to be carefully concurrency-safe; that is a large part of why the command set is so small — fewer commands, fewer concurrency invariants to maintain.

A subtler property worth naming: the global hash table is a chained hash table that resizes online. When the load factor crosses 1.5, memcached spawns a maintenance thread that allocates a new (doubled) bucket array and migrates entries in small batches while the worker threads keep serving. During the migration each lookup checks both the old and the new array, paying a few extra cache lines per get. The migration finishes within seconds even on a 100-million-key keyspace. Compare with Redis's incremental rehashing — same idea, slightly different mechanics — and you see the convergence: any in-memory store that wants to stay sub-millisecond cannot afford a stop-the-world resize, and both projects independently arrived at the same online-rehash answer.
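
The two-array scheme is easy to sketch. Here is a single-threaded Python illustration of the idea (the real assoc.c runs the migration from a maintenance thread under a lock; this just shows why lookups must check both arrays while a resize is in flight):

```python
# Sketch of online hash-table resizing: keep old and new bucket arrays,
# migrate a few buckets per operation, check both arrays on lookup.
# Single-threaded illustration of the idea, not the real assoc.c.
class OnlineRehashTable:
    def __init__(self, nbuckets=4):
        self.old = [[] for _ in range(nbuckets)]
        self.new = None                      # non-None while migrating
        self.migrated = 0                    # buckets already moved

    def _bucket(self, arr, key):
        return arr[hash(key) % len(arr)]

    def put(self, key, value):
        self._step()                         # amortise migration over ops
        arr = self.new if self.new is not None else self.old
        b = self._bucket(arr, key)
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)
                return
        b.append((key, value))
        # load factor > 1.5 with no migration running: start doubling
        if self.new is None and sum(len(x) for x in self.old) > 1.5 * len(self.old):
            self.new = [[] for _ in range(2 * len(self.old))]

    def get(self, key):
        self._step()
        # during migration a key can live in either array; newest copy first
        for arr in (self.new, self.old):
            if arr is None:
                continue
            for k, v in self._bucket(arr, key):
                if k == key:
                    return v
        return None

    def _step(self):
        if self.new is None:
            return
        if self.migrated == len(self.old):   # migration done: swap arrays
            self.old, self.new, self.migrated = self.new, None, 0
            return
        for k, v in self.old[self.migrated]: # move one old bucket across
            self._bucket(self.new, k).append((k, v))
        self.old[self.migrated] = []
        self.migrated += 1
```

Every put/get pays one small migration step, so the resize cost is spread across normal traffic instead of landing as one stop-the-world pause.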

The thirty-line client: text protocol, in your face

You can talk to memcached without any library. Open a TCP socket to port 11211 and type. The text protocol was the original interface; the binary protocol came later for slightly lower overhead, and the meta-text protocol is the modern best-of-both. Here is a session against the docker container you started above.

$ telnet localhost 11211
set greeting 0 60 13
hello, dipti!
STORED
get greeting
VALUE greeting 0 13
hello, dipti!
END
incr counter 1
NOT_FOUND
set counter 0 0 1
0
STORED
incr counter 1
1
incr counter 1
2
delete counter
DELETED
quit

That is the entire client-facing surface. set <key> <flags> <exptime> <bytes> followed by the value bytes; get <key> returns the value or END; incr and decr for atomic 64-bit counters; add (set only if missing); replace (set only if present); cas (compare-and-swap with a version number, the only transactional primitive memcached has); delete; flush_all; stats; quit. Twenty-odd verbs total, against Redis's 240+. Some clients speak the binary protocol instead (a 24-byte header and length-prefixed key/value, avoiding the parser), but the shape of the command set is the same.

Now let us write an actual client in about thirty lines of Python so you can feel the protocol.

# tinymc.py — a minimal memcached client over the text protocol
import socket

class TinyMC:
    def __init__(self, host='localhost', port=11211):
        self.s = socket.create_connection((host, port))
        self.f = self.s.makefile('rwb')

    def set(self, key, value, exptime=0):
        # set <key> <flags> <exptime> <bytes>\r\n<value>\r\n
        v = value.encode() if isinstance(value, str) else value
        self.f.write(f"set {key} 0 {exptime} {len(v)}\r\n".encode())
        self.f.write(v + b"\r\n")
        self.f.flush()
        resp = self.f.readline().strip()
        return resp == b"STORED"

    def get(self, key):
        self.f.write(f"get {key}\r\n".encode()); self.f.flush()
        header = self.f.readline().strip()
        if header == b"END":
            return None
        # VALUE <key> <flags> <bytes>
        _, _, _, nbytes = header.split()
        data = self.f.read(int(nbytes) + 2)[:-2]  # strip trailing \r\n
        self.f.readline()                          # consume final END
        return data

    def delete(self, key):
        self.f.write(f"delete {key}\r\n".encode()); self.f.flush()
        return self.f.readline().strip() == b"DELETED"

if __name__ == "__main__":
    mc = TinyMC()
    mc.set("greeting", "namaste, riya", exptime=60)
    print(mc.get("greeting"))   # b'namaste, riya'
    print(mc.delete("greeting"))  # True
    print(mc.get("greeting"))   # None

socket.create_connection. We open one TCP connection and reuse it for every command. The real pymemcache and python-memcached clients hold a connection pool, but for one client there is no benefit — the protocol is request-response on a single stream.

f"set {key} 0 {exptime} {len(v)}\r\n". The 0 is the flags field — a 32-bit integer the client can use to encode "this value is a pickled Python object" or "this is gzipped JSON". Memcached itself never inspects flags; the server stores them with the item and returns them on get. exptime is seconds from now (or absolute Unix time if larger than 30 days); 0 means "no expiry".

self.f.read(int(nbytes) + 2)[:-2]. The VALUE header tells us the byte count. We read exactly that many bytes plus the trailing \r\n. This is a length-prefixed binary read inside a text protocol — a deliberate choice that lets values contain arbitrary bytes (even \r\n) without escaping. The memcached parser is one of the cleanest examples of a hybrid text/binary line protocol you will read.

exptime=60. Sixty seconds from now, the server marks this item as expired. The expiry check happens lazily on the next get and proactively when the LRU walks through expired items; there is no background thread sweeping the entire keyspace. Why lazy expiry is correct: a sweep over millions of keys to find expired ones would burn CPU and lock pages for no benefit if those keys are never accessed again. The lazy approach pays the check-cost only when someone asks. The LRU walker handles the case where items are set with a TTL and then never read — those would otherwise sit in memory until evicted, which is fine but slightly wasteful. The walker runs in tiny batches and never blocks the worker threads.
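
The lazy check is one comparison on the read path. A sketch of the idea, assuming each stored item carries its absolute expiry timestamp:

```python
# Lazy expiry: a single comparison on the read path, no sweeper thread.
# Sketch assuming each item is stored as (value, absolute_exptime).
import time

def lazy_get(table, key):
    item = table.get(key)                 # plain hash-table lookup
    if item is None:
        return None
    value, exptime = item                 # exptime 0 means "never expires"
    if exptime and exptime < time.time():
        del table[key]                    # reclaim on access, not by a sweep
        return None
    return value

table = {}
table["session"] = (b"riya", time.time() + 60)   # expires in 60 s
table["stale"] = (b"old", time.time() - 1)       # already expired
```

A get on "stale" behaves exactly like a miss and frees the slot on the spot; keys nobody asks about again are left to the LRU walker.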

Run the script, run a few set/get/delete cycles, and you have the entire memcached programming model. Compare with Redis's redis-cli: similar in spirit, but Redis has 240+ commands and you can spend a week learning them. Memcached has six you actually use. That is the entire point.
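
One verb worth feeling before moving on is cas, the only transactional primitive. Here is a sketch of the gets/cas pair as helpers over the same buffered socket file the client above uses (the helper names are mine; the wire format is the one in doc/protocol.txt):

```python
# gets/cas: memcached's only transactional primitive, sketched as
# helpers over the same buffered socket file TinyMC uses (you could
# paste these in as methods). gets returns a 64-bit version token;
# cas stores only if the token still matches the server's copy.
def mc_gets(f, key):
    """Return (value, cas_token), or (None, None) on a miss."""
    f.write(f"gets {key}\r\n".encode()); f.flush()
    header = f.readline().strip()
    if header == b"END":
        return None, None
    # gets reply: VALUE <key> <flags> <bytes> <cas-unique>
    _, _, _, nbytes, token = header.split()
    data = f.read(int(nbytes) + 2)[:-2]     # value bytes, strip trailing \r\n
    f.readline()                            # consume the final END
    return data, int(token)

def mc_cas(f, key, value, token, exptime=0):
    """Store only if the item is unchanged since mc_gets; False = lost the race."""
    v = value.encode() if isinstance(value, str) else value
    f.write(f"cas {key} 0 {exptime} {len(v)} {token}\r\n".encode())
    f.write(v + b"\r\n")
    f.flush()
    return f.readline().strip() == b"STORED"   # server answers EXISTS on conflict
```

The usage pattern is a retry loop: value, token = mc_gets(mc.f, key), compute the new value, and if mc_cas(mc.f, key, new_value, token) returns False, somebody else won the race and you start over.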

Benchmarking the throughput claim

Talk is talk. The "memcached is faster per core than Redis" line gets repeated everywhere; you should run it yourself before you believe it. Start two containers — docker run -d --name mc -p 11211:11211 memcached:1.6 -m 256 -t 4 and docker run -d --name rd -p 6379:6379 redis:7-alpine — and hit each with memtier_benchmark, the standard tool from Redis Labs (brew install memtier-benchmark on macOS, or build from source). On a 2024 M2 MacBook Air with the default 1:10 SET:GET ratio, 50 clients across 4 threads, 10 000 ops each:

$ memtier_benchmark -s localhost -p 11211 -P memcache_text \
    -t 4 -c 50 -n 10000 --ratio=1:10 --key-pattern=R:R --hide-histogram
... memcached ...
ALL STATS                              Ops/sec   Avg Latency
Sets                                  118,427      0.41 ms
Gets                                1,184,219      0.41 ms
Totals                              1,302,646      0.41 ms

$ memtier_benchmark -s localhost -p 6379 -P redis \
    -t 4 -c 50 -n 10000 --ratio=1:10 --key-pattern=R:R --hide-histogram
... Redis 7.2 ...
ALL STATS                              Ops/sec   Avg Latency
Sets                                   76,103      0.65 ms
Gets                                  761,041      0.65 ms
Totals                                837,144      0.65 ms

Memcached delivers roughly 1.55× the operations per second of single-threaded Redis on this workload, at about 1.6× lower average latency, because the four worker threads are genuinely running the protocol in parallel while Redis's single I/O thread serialises everything through one core. Why the gap shrinks in real deployments: production Redis usually runs with io-threads 4 or higher (a Redis 6+ feature), which moves socket reads and writes off the main thread and narrows the gap to about 1.2×. And almost any value size above 4 KB makes the network — not the server — the bottleneck, at which point both stores look identical from the client. The benchmark is most honest as a measurement of what the server's hot path can do when nothing else is in the way; in real apps the gap matters only on small-value, very-high-throughput workloads where you are already hitting CPU on the cache box.

The numbers worth remembering: a single memcached instance on a modern 8-core box pulls 1.5–2 M ops/sec on tiny GETs, scales near-linearly with worker thread count up to the core count, and uses about 30–40 % CPU at 1 M ops/sec — there is real headroom before you saturate. Single-threaded Redis on the same box tops out around 800 K–1 M ops/sec on one core, with the other seven cores idle as far as the main loop is concerned. The Redis Cluster answer to this is "shard horizontally to use more cores"; the memcached answer is "use the cores you already have on this box". Both are valid, and both can reach the same aggregate throughput; the Cluster route just pays for it with one extra layer of operational complexity.

Latency tail matters as much as throughput. Re-run memtier_benchmark with --print-percentiles=50,99,99.9 and you will see memcached's p99.9 latency hold under 2 ms even at 1 M ops/sec, while Redis's p99.9 spikes to 8–15 ms whenever a BGSAVE or AOF rewrite kicks off. The cause is fork() — even with copy-on-write, the parent stalls for tens of milliseconds while the kernel duplicates page tables for a multi-gigabyte process. Memcached has no fork() because it has no persistence; its tail latency is governed only by libevent, the OS scheduler, and the per-slab lock. For a Razorpay-style payments cache where p99.9 directly maps to user-visible failure, this is a bigger deal than the median throughput gap.

The fix on the Redis side is to disable persistence outright (no RDB snapshots, no AOF rewrites), but that gives up the durability guarantee that justified picking Redis in the first place; the choice keeps coming back to "what is your value's shape".

A second number that lands often: the slab efficiency under realistic value-size distributions. A common production benchmark fills a 1 GB memcached with values drawn from a Pareto distribution (mean 600 B, p99 8 KB — a typical "JSON-ish" cache shape). Memory utilisation comes in around 88–92 % of -m; the remaining 8–12 % is a mix of slab-class internal fragmentation and the small overhead each item carries for its next/prev LRU pointers and key bytes. Redis on the same workload, with jemalloc and no activedefrag, often holds 80–85 % utilisation after a few days of churn — close, but with a long tail of badly-fragmented runs that need a MEMORY PURGE or a restart to clean up. Memcached has no equivalent fragmentation knob because there is nothing to fragment.

When to pick memcached, when to pick Redis

The decision is rarely "which is faster" — both are fast enough that the network round trip dominates. The decision is shape.

Decision tree for picking memcached versus Redis — pick by shape, not by benchmark:

  1. Need typed values (list, hash, zset, ...)? Yes → Redis (leaderboard, queue, session, rate limiter, geo radius).
  2. Need persistence, so data survives a restart? Yes → Redis with RDB + AOF (a queue you cannot lose, a user-visible session).
  3. Need scripting or pub/sub (Lua, fan-out)? Yes → Redis (notifications, atomic Lua scripts).
  4. Otherwise — opaque, rebuildable blobs you compute once and refetch, where throughput and cores matter → memcached (rendered pages, ML-model outputs, DB row caches).

Two real-world deployments anchor the split: Facebook runs a memcached fleet sized in the hundreds of terabytes for fan-out caching alongside smaller typed stores for counters and sessions, while the Indian fintech Razorpay runs Redis Cluster for rate limiting and idempotency keys and uses memcached for HTTP response caching at the edge.
The decision is mostly mechanical. Typed values → Redis. Must survive restart → Redis. Server-side computation → Redis. Otherwise — opaque blobs, rebuildable from the source of truth, hot enough that throughput per core matters — memcached wins on simplicity and CPU utilisation.

The shape that fits memcached perfectly: a Razorpay HTTP response cache where each entry is the JSON body of /v1/payments/<id> for 30 seconds, keyed by the request fingerprint, sitting in front of a payment-status read replica. Each value is 1–4 KB of opaque JSON. There is no value in storing it as a HASH; you never HGET one field — you take the whole blob, deserialise on the client, return it. Memcached's slab allocator gives you predictable memory behaviour, the per-thread shard gives you 1.5 M GETs/sec on 8 cores, and when the box reboots the cache rebuilds from the source of truth — a desirable property because it also clears any incident-time bad cache entries.

The shape that fits Redis perfectly: a Zerodha rate limiter that allows 100 orders per second per user, where every order calls INCR rl:user:42:second:N and aborts if the result exceeds 100. The INCR is server-side, atomic, and you also need an EXPIRE to drop the key after the second is over. Doing this in memcached is technically possible — incr exists, and you can add a "marker" key with TTL — but it is awkward, racier than the Redis version, and you lose the natural Redis idiom. The right tool is the one that fits the operation, and ranking, counting, deduplicating, scheduling and queueing all want server-side primitives.
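
For contrast, the Redis side of that limiter is a few lines. A fixed-window sketch where `r` is any client exposing incr and expire (with redis-py you would pass a redis.Redis() instance); the `now` parameter is an addition of mine to make the window explicit and testable:

```python
# Fixed-window rate limiter: INCR + EXPIRE, server-side and atomic.
# Sketch: `r` is any client with incr/expire (e.g. redis-py's Redis).
import time

def allow_order(r, user_id, limit=100, now=None):
    """Allow at most `limit` orders per user per wall-clock second."""
    second = int(now if now is not None else time.time())
    key = f"rl:user:{user_id}:second:{second}"
    count = r.incr(key)              # atomic on the server: no read-modify-write race
    if count == 1:                   # first hit in this window sets the TTL
        r.expire(key, 2)             # the bucket evaporates once the second passes
    return count <= limit
```

The increment and the comparison happen without any client-side locking because INCR is atomic on the server; the memcached version has to fake the window key's lifecycle with a second marker key and loses that simplicity.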

The shape that fits both in different processes: Facebook's classic infrastructure (the famous "Scaling Memcache at Facebook" paper from 2013) ran a multi-petabyte memcached pool for HTML-fragment and database-row caching, alongside a smaller TAO and Redis-style typed store for the social graph and counters. The lesson is that "in-memory store" is not one tool; it is two tools that solve different problems and happily coexist.

A Swiggy menu cache: memcached for the JSON, Redis for the order queue

Imagine you are wiring a restaurant-menu service for Swiggy in Bengaluru. Each restaurant's menu is a 4–20 KB JSON blob assembled from MySQL plus an inventory service plus a pricing service; once assembled, the blob is identical for every customer who opens that restaurant for the next 30 seconds. Behind the menu is an order queue — every "place order" call lands in a queue that workers drain to call the kitchen API and the payments service.

Picking the right store for each side is mechanical. The menu is an opaque blob, identical across customers, expensive to compute, and totally rebuildable from MySQL — so you cache it in memcached:

def get_menu(restaurant_id):
    key = f"menu:{restaurant_id}"
    blob = mc.get(key)
    if blob is None:                           # cache miss
        blob = assemble_menu(restaurant_id)    # 200 ms across 3 services
        mc.set(key, blob, exptime=30)          # 30-second TTL
    return json.loads(blob)

The order queue is a typed structure: items must come out in FIFO order, you need atomic LPOP semantics with blocking, and losing the queue on restart is unacceptable because that is real customer money. So the queue lives in Redis with AOF persistence:

r.lpush("orders:bengaluru", json.dumps({"order_id": 7723, "user": "rahul", ...}))
# worker:
order = r.brpop("orders:bengaluru", timeout=10)  # blocking pop

Two stores, two shapes, one app. Most production Indian-scale services look exactly like this: memcached for the rebuildable cache, Redis for the typed primitives that need to survive a node death.

Common confusions

Going deeper

The slab growth_factor and how it set internal fragmentation

Memcached starts with a smallest-class chunk size (defaults to 96 B) and each successive class is growth_factor times larger (default 1.25). With 1.25× growth the worst-case waste for any value is about 20 % (a value that is 1 byte larger than class N rounds up to class N+1 which is 1.25× as big). Setting -f 1.07 gives 7 % worst-case waste at the cost of roughly 4× as many slab classes (more LRU lists, more lock pressure, more memory in metadata). Operators with a known narrow value-size distribution sometimes tune this; most leave it alone.
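
The ladder is easy to compute yourself. A sketch of the sizing rule (grow by the factor, round up to 8-byte alignment as upstream does; real class sizes also fold in item-header overhead and the -n setting, so treat the numbers as approximate):

```python
# Approximate memcached slab-class ladder for a given growth factor.
# Sketch only: real sizes also include the item header, so the exact
# numbers differ; the point is how the factor trades class count
# against worst-case internal fragmentation.
def slab_ladder(smallest=96, factor=1.25, largest=1024 * 1024):
    sizes, size = [], smallest
    while size < largest:
        sizes.append(size)
        nxt = ((int(size * factor) + 7) // 8) * 8   # 8-byte alignment
        size = max(size + 8, nxt)                   # always grow a full step
    if sizes[-1] != largest:
        sizes.append(largest)                       # final class: item limit
    return sizes

def worst_case_waste(sizes):
    # worst fit: a value 1 byte bigger than class N lands in class N+1
    return max(1 - (a + 1) / b for a, b in zip(sizes, sizes[1:]))

for f in (1.07, 1.25, 2.0):
    ladder = slab_ladder(factor=f)
    print(f"factor {f}: {len(ladder)} classes, "
          f"worst-case waste ~{worst_case_waste(ladder):.0%}")
```

Running it shows the trade directly: shrinking the factor multiplies the number of classes (each with its own LRU and lock) in exchange for a tighter worst-case fit.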

Slab calcification, slab_reassign, and slab_automove

If your application starts by storing many 200 B values (filling the 192 B / 384 B classes) and then shifts to storing 10 KB values (which need a ~12 KB class), the small classes are full and the large classes are empty — and without rebalancing, memcached cannot take pages from one class and give them to another. This is called slab calcification and was the most-cited operational pain point through the 2010s. The fix landed in the 1.4.x series: slab_reassign lets you manually move a slab page from class A to class B, and slab_automove lets the server detect the imbalance and move pages automatically (off in older releases; modern memcached enables automatic rebalancing by default). Read the doc/slab-reassign.txt upstream document for the gritty details — it is one of the better-documented edge cases in any database.

Extstore: SSDs as a slow tail for large items

Modern memcached (since 2018) has an optional extstore mode that stores small "headers" in RAM for every item but pushes the value to SSD if it is larger than a threshold and infrequently accessed. This sounds like persistence — it is not. Extstore data is lost on restart just like the in-RAM cache; the SSD is purely an extension of RAM, used to fit a larger working set than the RAM budget alone. Pinterest publicly described running extstore-backed memcached fleets where 80 % of the data lived on NVMe SSDs and the in-RAM portion was just the index. The latency goes from ~50 µs to ~150 µs for an SSD hit, still much faster than the source database.

The Facebook lease and "stale reads as a feature"

The 2013 Facebook paper introduced the lease: when a get misses, the server hands the client a 64-bit lease token. Other clients that miss for the same key during a short window get told "wait, someone is fetching it" instead of stampeding the database. The first client returns from the database, calls set with its lease token, and the cache fills. This is the thundering-herd mitigation that the next chapter covers, and it is one of the few server-side semantics memcached added beyond the cache primitive — because the cache primitive without it cost Facebook real money on cold starts.
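
The flow is worth sketching even though stock memcached does not ship it (Facebook ran a modified server; the modern meta protocol offers similar miss-handling flags). Here is an in-memory stand-in for the server-side logic, with an invented LeaseCache class that exists only for this illustration:

```python
# The lease flow from the Facebook paper, sketched against an
# in-memory stand-in. `LeaseCache` is invented for illustration --
# stock memcached does not expose this API.
import itertools

class LeaseCache:
    def __init__(self):
        self.data = {}               # key -> cached value
        self.leases = {}             # key -> outstanding lease token
        self._tokens = itertools.count(1)

    def get(self, key):
        """(value, None) on a hit; (None, token) when the caller should
        fill from the database; (None, None) when another client is
        already filling and this one should wait instead of stampeding."""
        if key in self.data:
            return self.data[key], None
        if key in self.leases:
            return None, None        # someone else holds the fill lease
        token = next(self._tokens)
        self.leases[key] = token     # first misser wins the right to fill
        return None, token

    def set(self, key, value, token):
        """The fill succeeds only with the still-valid lease token."""
        if self.leases.get(key) != token:
            return False             # lease invalidated, e.g. by a delete
        del self.leases[key]
        self.data[key] = value
        return True
```

One client per key pays the database round trip; every concurrent misser is told to back off and retry, which is exactly the property that stops a cold key from becoming a database incident.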

Why the binary protocol exists if nobody uses it

The binary protocol (introduced in 2008) replaces text parsing with a fixed 24-byte header plus length-prefixed key/value. It saves about 1–2 µs per command and a small amount of CPU; on a 1 M ops/sec server that is real money. Most language clients support both. The reason text dominates in 2024 is that operators value the ability to telnet localhost 11211 and debug a live cache by hand, and 2 µs in a 200 µs network round trip is not worth the lost transparency. The newer meta-text protocol (a 2020s addition) adds new flag-based options for things like get-and-touch, atomic-with-cas, and recache-on-miss while keeping the human-typeable shape.

Client-side consistent hashing and the "gutter pool"

Memcached has no server-side replication and no clustering. Horizontal scale comes from the client maintaining a hash ring of server addresses; on every operation the client hashes the key onto the ring and routes the command to one server (consistent hashing rather than a naive hash(key) % N, so a membership change remaps only the dead server's keys instead of reshuffling everything). When a server dies the ring rebalances and the keys it held simply disappear — every client immediately misses on those keys and refills from the source of truth. This is dramatically simpler than Redis Cluster's gossip protocol and resharding ceremony, but it has one operational sharp edge: if the source of truth (your database) cannot absorb the miss-storm, the dying memcached node takes the database down with it. The Facebook paper's solution, the gutter pool, is a small backup memcached fleet that the client switches to on per-server failure — still no server-side coordination, just a second hash ring to absorb the spike while the primary recovers.
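
The ring itself fits in a screenful. A ketama-style sketch with virtual nodes and md5, the standard technique rather than any particular client library's implementation:

```python
# Client-side consistent hash ring with virtual nodes: when a server
# dies, only the keys that hashed to its arc remap. Sketch of the
# standard ketama-style technique, not a specific client library.
import bisect
import hashlib

class HashRing:
    def __init__(self, servers, vnodes=100):
        # vnodes points per server smooth out the key distribution
        self.ring = []                       # sorted list of (point, server)
        for s in servers:
            for i in range(vnodes):
                self.ring.append((self._point(f"{s}#{i}"), s))
        self.ring.sort()

    @staticmethod
    def _point(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def server_for(self, key):
        # walk clockwise to the first server point at or after the key
        i = bisect.bisect(self.ring, (self._point(key),))
        return self.ring[i % len(self.ring)][1]
```

Removing a server from the ring leaves every other server's keys exactly where they were; only the dead server's share starts missing and refilling, which is the whole argument for the ring over hash(key) % N.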

Why memcached refuses to add replication

Multiple times since 2010, contributors have proposed adding native replication to memcached. The maintainers have rejected each proposal. The reasoning is the same every time: replication implies a consistency model (sync? async? quorum?), a failure-detection layer, an authoritative state machine, and operational complexity that is exactly what users came to memcached to escape. If you want replication, you want Redis or Aerospike or you put memcached behind a write-through layer that handles it. Memcached's contract is "I will be a fast, dumb, in-RAM bag of bytes; if you want me to be more, use a different tool". Twenty years in, that opinionated minimalism has aged remarkably well.

When the cache outlives the database

A pattern you only meet at scale: a memcached fleet that is sized larger than the source database can serve under cold-cache load. If the entire fleet restarts simultaneously (a network event, a coordinated config push gone wrong), the database faces a thundering herd of misses it cannot survive — every web frontend asks for every hot row at the same instant. Facebook's answer in the 2013 paper was regional warmups: cold pools are pre-populated by streaming data from a warm pool in another region before going live. Pinterest in 2020 described a slow-restart mode where memcached extends its TTLs and the application gradually trickles requests in. The lesson is operational: when memcached is doing meaningful work, your database can no longer survive without it, and the "rebuildable from source of truth" mental model needs careful warmup choreography to actually be true. Chapter 173 on cache patterns covers the request-coalescing and stampede-prevention machinery that grew out of these incidents.

Where this leads next

References

  1. Brad Fitzpatrick, Distributed caching with memcached (Linux Journal, 2004) — the original article. linuxjournal.com/article/7451.
  2. Rajesh Nishtala et al., Scaling Memcache at Facebook (NSDI 2013) — the canonical paper on running memcached at multi-petabyte scale, leases, regional replication, and the gutter pool. usenix.org/conference/nsdi13/scaling-memcache-facebook.
  3. memcached upstream documentation — doc/protocol.txt, doc/slab-reassign.txt, and doc/storage.txt. github.com/memcached/memcached/tree/master/doc.
  4. dormando (Alan Kasindorf), Extstore: hybrid memory/SSD memcached (memcached blog, 2018) — design notes on the SSD-tier storage engine. memcached.org/blog/extstore-cloud/.
  5. Pinterest Engineering, Improving distributed caching performance and efficiency at Pinterest (2020) — production deployment notes on extstore and memcached tuning at scale. medium.com/pinterest-engineering.
  6. Salvatore Sanfilippo, Clarifications about Redis and Memcached (antirez blog, 2010) — the Redis author's own framing of where the two stores diverge. antirez.com/news/94.
  7. Redis: data structures as the product — internal reference for the Redis side of the comparison.