In short
A single Redis is a single point of failure — one process, one box, one VM that the cloud can retire without warning. Redis ships three answers to that, in order of increasing weight. Replication (master + N replicas with async command propagation) gives you read scale-out and a warm spare for manual failover; the master is still the only writer, so write throughput does not scale. Sentinel layers automatic failover on top: a fleet of 3+ sentinel processes pings the master, agrees by quorum that it is dead, elects a replica as the new master, reconfigures the others, and tells clients the new address — typical recovery is 10–30 seconds with no human involvement. Redis Cluster shards the keyspace across N master shards using 16,384 hash slots (CRC16(key) mod 16384 picks the slot, slots are assigned to shards), with each shard carrying its own replicas and its own automatic failover; this scales both reads and writes horizontally, at the cost that multi-key commands (MSET, MGET, transactions, Lua scripts that touch multiple keys) only work when all the keys live in the same slot — which is what hash tags {...} exist for (every key matching cart:{user42}:* hashes to the same slot, so a user's keys stay together). The decision matrix is short: dev/test → single instance; read scale → replication; production HA without sharding → Sentinel; production HA and horizontal write scaling → Cluster. Most managed offerings (AWS ElastiCache, Google Memorystore, Azure Cache for Redis) hide all three modes behind a knob: "single node" / "replication group with multi-AZ" (Sentinel-shaped under the hood) / "cluster mode enabled" (Cluster).
Real systems in the family: Redis (canonical), Valkey (Linux Foundation fork, identical replication/cluster story), AWS ElastiCache (managed Redis or Valkey), Redis Enterprise (commercial with active-active geo replication), KeyDB (a fork that adds an experimental multi-master mode where two masters accept writes and merge — simpler than Cluster, but with last-write-wins conflict resolution that you must accept).
By the time you finish this chapter, you will have stopped thinking about Redis as a single process and started thinking about it as a topology. Two chapters back (data structures as the product) gave you the substrate; the previous chapter (persistence) made sure that substrate survives a restart. This chapter makes sure it survives a node death, a network partition, and the day your traffic outgrows what one machine can write.
The framing that helps: do not pick a topology by reading a feature checklist. Pick it by writing down what fails when one machine dies, and what your write throughput ceiling is. Those two answers map almost mechanically to the three rungs of the ladder.
The thesis: a single Redis is a single point of failure
A standalone Redis instance, even one with both RDB snapshots and AOF persistence configured, fails for a long list of reasons that have nothing to do with Redis itself: the EC2 instance gets retired, the underlying physical host has a memory error, the disk fills up because someone forgot to rotate the AOF, the kernel OOM-killer kills the process, an fsync storm causes the AOF rewrite to stall reads for 90 seconds, a BGSAVE fork doubles the resident memory and the box swaps. Each of those is a real outage you will see at least once if you operate Redis for a year. Persistence solves "I lost my data" — it does not solve "my service is down for the 4 minutes it takes to spin up a new box, restore the AOF, warm the kernel page cache, and re-point my application at it".
The high-availability story is a separate story from the durability story. You need both. Persistence is what you reach for so that data survives a planned restart or a clean crash. Replication and its descendants are what you reach for so that the service survives an unplanned hardware failure with bounded downtime. Why these are independent: a perfectly persistent single Redis still has zero availability while you wait for the box to come back. A perfectly replicated Redis with no persistence loses data the moment the entire replica set crashes simultaneously (a power-grid event, a regional outage). The two failure modes are orthogonal — you defend against them separately, with different mechanisms.
There is also the second motivation, write throughput. A single Redis instance, single-threaded by design, tops out somewhere between 100K and 1M operations per second depending on command mix, payload size, and CPU. That is enormous for most use cases — a cart store for a mid-sized Indian e-commerce can run on one Redis box for years. But there are workloads where it is not enough: a real-time bidding ad exchange doing 5M lookups/second, a session store for a 200M-user app, a chat fan-out service. For those, vertical scale runs out and you need to shard the keyspace across multiple processes that each handle a fraction of the writes.
Replication and Sentinel handle the first problem (HA). Cluster handles both (HA and write scale by sharding). The next three sections walk each rung.
Rung 1: replication — master, replicas, async command stream
The simplest topology is one master and one or more replicas. The master accepts both reads and writes. Each replica connects to the master, performs an initial full sync (the master takes a snapshot, ships it to the replica, the replica loads it), and then receives every subsequent write command as a stream over the same connection. Replicas serve reads — clients can be configured to send read traffic to any replica — and the master is the single source of truth for writes.
The replication protocol has two phases. Initial sync (called a full resync in Redis terminology) happens when a replica connects for the first time or after losing the connection for too long: the master forks a child process to write an RDB snapshot, ships the snapshot to the replica, and the replica loads it. Partial resync is the optimised path for short disconnections: the master keeps a circular replication backlog of recent commands; if the replica reconnects within the backlog window, the master replays only the missed commands instead of doing a full snapshot. Why the backlog matters: a full resync of a 50 GB Redis instance takes minutes, doubles the master's memory during the fork, and saturates the network — you do not want one to fire because of a 30-second network blip. Tuning repl-backlog-size to roughly bandwidth × max-disconnect-window (e.g., 100 MB for a few minutes of catch-up at moderate write rates) is the standard ops move.
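The sizing rule above is simple enough to encode. A minimal sketch — the function name and the 2× safety factor are mine, not anything in Redis:

```python
def repl_backlog_bytes(write_mb_per_s: float, max_disconnect_s: int,
                       safety_factor: float = 2.0) -> int:
    """Size repl-backlog-size as write bandwidth x worst-case disconnect window,
    padded so a replica reconnecting at the edge of the window still gets a
    partial resync instead of triggering a full RDB transfer."""
    return int(write_mb_per_s * 1024 * 1024 * max_disconnect_s * safety_factor)

# ~0.5 MB/s of write traffic, tolerate a 2-minute network blip:
size = repl_backlog_bytes(0.5, 120)   # 125829120 bytes, ~120 MB
```

Apply the result via `CONFIG SET repl-backlog-size <bytes>` or in redis.conf; the default (1 MB) is almost always too small for any instance with real write traffic.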
Replication is asynchronous by default, which is the central trade-off. The master acknowledges a write to the client before the replica has received the command. If the master dies one millisecond after acknowledging a write but before the command made it to any replica, that write is lost — the new master (whichever replica gets promoted) will not have it. This is acceptable for most cache and session use cases (you would rather lose a session than wait for a quorum write); it is unacceptable for, say, financial state, which is why Redis is rarely the system of record for money. Redis offers WAIT n timeout_ms to block until at least n replicas have acknowledged, which gives you a synchronous-ish mode at the cost of latency, but it is rarely used in production.
The use case for plain replication (no Sentinel, no Cluster) is a small or medium workload where you want read scale and a warm spare but accept that failover is a manual operation. A typical setup: 1 master + 2 replicas, all in the same AZ for low replication lag; reads from a Python pool that round-robins across the replicas; the application keeps the master address in config and an on-call engineer flips it on failover. This is fine for an internal dashboard cache. It is not fine for a customer-facing flash sale.
Rung 2: Sentinel — automatic failover via quorum
Sentinel is a small, separate Redis-shipped daemon (redis-sentinel) whose only job is to monitor a replication topology and perform automatic failover when the master dies. You run an odd number of sentinels — typically three or five — across separate failure domains (different AZs in a cloud, different racks on-prem). Each sentinel pings every Redis node every second, gossips with the other sentinels, and votes when something looks broken.
The reason for an odd number of sentinels with a majority quorum is the same reason every distributed-consensus system uses odd numbers: you want to tolerate split-brain. With three sentinels and a quorum of two, a network partition that isolates one sentinel still leaves a majority on the other side that can make decisions; the lone sentinel cannot, so two competing failovers are impossible. With five sentinels and a quorum of three, you tolerate two simultaneous sentinel failures. Why this matters operationally: never run two sentinels — a 1-1 split has no quorum, so no failover ever happens. Never run sentinels co-located with the Redis nodes only — a host failure takes down both a Redis node and its sentinel, halving your visibility. Spread sentinels across AZs/racks the same way you spread the Redis nodes themselves.
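The arithmetic behind "odd count, majority quorum" is worth making explicit. A two-line sketch (function names are mine):

```python
def majority(n_sentinels: int) -> int:
    """Smallest group that can outvote the rest. At most one side of any
    network partition can contain a majority, which is what rules out two
    competing failovers (split-brain)."""
    return n_sentinels // 2 + 1

def tolerated_failures(n_sentinels: int) -> int:
    """Sentinels you can lose while a majority can still reach quorum."""
    return (n_sentinels - 1) // 2

# 3 sentinels -> quorum 2, tolerates 1 failure; 4 -> quorum 3, still only 1.
# Even counts buy you nothing over the odd count below them.
```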
The client side of Sentinel is the part most people get wrong. A Sentinel-aware client does not connect directly to a Redis address from config; it connects to the sentinel pool, asks SENTINEL get-master-addr-by-name mymaster to discover the current master, and then opens a connection to that address. After a failover, the connection breaks; the client retries via the sentinels, gets the new master address, and reconnects. Every mainstream Redis client library (redis-py, lettuce for Java, ioredis for Node, go-redis) ships a SentinelConnectionPool or equivalent. If you skip this and just put the master IP in config, your application will not recover when the master moves, defeating the entire point of Sentinel.
A typical recovery timeline: the master goes down at T+0; sentinels notice within down-after-milliseconds (default 30s, often tuned down to 5s); quorum is reached within another second or two; the leader election takes another second; the failover itself (promote, reconfigure replicas) takes a few seconds. End-to-end: 10–30 seconds of write unavailability with default tuning, which is acceptable for most production caching and session workloads but not for hard real-time systems.
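The timeline above turns into a back-of-envelope budget. The phase estimates below are this chapter's ballpark figures, not Sentinel constants:

```python
def failover_budget_s(down_after_ms: int = 5000,
                      agreement_s: float = 2.0,        # quorum + leader election
                      promotion_s: float = 3.0,        # promote + reconfigure replicas
                      client_reconnect_s: float = 2.0) -> float:
    """Rough end-to-end write unavailability for one Sentinel failover."""
    return down_after_ms / 1000 + agreement_s + promotion_s + client_reconnect_s

# default 30s detection -> ~37s of write outage; tuned to 5s -> ~12s
```

The lesson the arithmetic makes obvious: detection dominates, so tuning down-after-milliseconds is the lever that matters — at the price of more false positives on a flaky network.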
Sentinel does not shard. You still have one master taking all the writes — it just gets replaced automatically when it dies. Sentinel is the right answer when you need automatic HA and one master can comfortably handle your write rate. If your write rate outgrows one master, you need Cluster.
Rung 3: Redis Cluster — sharding via 16,384 hash slots
Redis Cluster is what you reach for when one master cannot keep up with the write load, or when you want HA and sharding without running both Sentinel and a separate sharding proxy. Cluster shards the keyspace into exactly 16,384 hash slots. Each key is mapped to a slot via CRC16(key) mod 16384. Slots are distributed across N master shards (e.g., with three shards, master A owns slots 0–5460, master B owns 5461–10922, master C owns 10923–16383). Each master shard runs its own replica set (1+ replicas) for shard-level HA. The cluster does its own automatic failover inside each shard — a failed master is replaced by one of its replicas without any Sentinel needed.
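The slot split is just integer division with a remainder. A sketch — exact boundaries vary slightly by tool (`redis-cli --cluster create` does its own rounding, and the ranges in this chapter place the extra slot differently), so treat this as illustrative:

```python
def slot_ranges(n_shards: int, total_slots: int = 16384) -> list:
    """Split the 16,384 hash slots into n contiguous ranges, handing the
    remainder slots to the earliest shards."""
    base, rem = divmod(total_slots, n_shards)
    ranges, start = [], 0
    for i in range(n_shards):
        size = base + (1 if i < rem else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# 3 shards -> [(0, 5461), (5462, 10922), (10923, 16383)]
```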
The 16,384-slot count is a fixed Redis design constant (chosen because it fits in a 2 KB bitmap that the cluster gossips between nodes — small enough that every node can carry the full slot map, large enough to allow fine-grained rebalancing across hundreds of shards). You do not configure it; you configure how slots are assigned to shards. Adding a fourth shard means migrating roughly a quarter of the slots — about 1,365 from each of A, B, and C — to the new master D; the cluster supports this online, key by key, while serving traffic. Why fixed 16,384 and not "consistent hashing" with a ring: a fixed slot count makes the routing table tiny (16K entries) and exact (every key maps to exactly one slot) instead of approximate. Clients can cache the full slot-to-shard map in a few KB and route every request directly to the correct shard with no extra hop. Migration is just "move slot N from shard X to shard Y" — no reshuffle, no replication factor maths.
The single biggest gotcha with Cluster is multi-key commands. MSET k1 v1 k2 v2 k3 v3 works only if all three keys hash to the same slot — which, for unrelated keys, almost never happens. The same is true for transactions (MULTI/EXEC), Lua scripts that read/write multiple keys, and the SINTERSTORE / ZUNIONSTORE-family commands. The fix is hash tags: if a key contains a substring inside {...}, only that substring is hashed. So cart:{user42}:items and cart:{user42}:total both hash on user42 and land on the same slot, and you can run MULTI / HGETALL cart:{user42}:items / GET cart:{user42}:total / EXEC atomically. The convention is "wrap the entity ID in braces in every key for that entity", which collocates everything for a given user/order/session on one shard.
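Both the routing rule and the hash-tag rule are small enough to reimplement for intuition. The CRC16 variant Cluster uses is CRC16-CCITT/XModem, and the hash-tag rule is: hash only the substring between the first `{` and the first `}` after it, unless that substring is empty. A pure-Python sketch:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT/XModem (poly 0x1021, init 0) -- the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 hash slots, honouring {hash tags}."""
    s = key.find('{')
    if s != -1:
        e = key.find('}', s + 1)
        if e != -1 and e != s + 1:        # non-empty tag: hash only the tag
            key = key[s + 1:e]
    return crc16(key.encode()) % 16384

# Tagged keys for one user collapse onto one slot:
key_slot('cart:{user42}:items') == key_slot('cart:{user42}:total')   # True
```

This is the whole routing story: a client that caches the slot-to-shard map runs `key_slot` locally and sends the command straight to the owner of that slot.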
Cluster's failover is not Sentinel-shaped but is conceptually similar: each master is monitored by the other masters via gossip; when a majority of masters agrees a master is down, that master's replicas stand for election — each requests votes from the remaining masters, and the replica with the freshest data gets a head start, so it usually wins — and the winner is promoted. No external sentinels are needed because the cluster nodes themselves play that role.
When you should not use Cluster: you are still small enough that one master is plenty (the operational overhead of Cluster is real — more nodes, slot management, hash-tag discipline in your code), or you have a workload that genuinely needs cross-key transactions across unrelated keys (rare in practice for cache/session work, common in OLTP — which is why you would not use Redis Cluster as a primary OLTP store anyway).
A worked example: cart store on the three rungs
Concrete is better than abstract. Here is the same Indian e-commerce cart store walked through all three rungs as it grows.
kirana.in cart store: from one Redis to a 3-shard Cluster
You operate the cart store for kirana.in. The cart for each user is one Redis HASH (cart:user42 → {SKU101: 2, SKU207: 1}), with a 7-day TTL. Reads on the cart page, writes on add-to-cart and quantity changes. Peak load: 50,000 writes/second across all users during a Republic Day flash sale.
Stage 1 — single Redis instance. One redis-server on a c6i.2xlarge (8 vCPU, 16 GB RAM). Persistence: RDB snapshot every 5 minutes, AOF with appendfsync everysec. Latency: p99 around 0.8 ms inside the VPC. Throughput at peak: 50K writes/s — comfortable; the box is at maybe 30% of one core. Failure mode: if the box dies, the cart service is down until a replacement is launched, the AOF restored, the IP re-pointed — about 4 minutes on a good day, 20 on a bad one. For a non-flagship store this is tolerable; on the day of the sale it costs lakhs of rupees per minute. Why one box still works at 50K/s: each cart op is a single HINCRBY or HGETALL on a small hash; per-op CPU cost is microseconds. The bottleneck is not Redis; it is the network card and the per-connection RTT, and you have plenty of headroom on both at this rate.
Stage 2 — Sentinel + 3 replicas. You add two replicas in the other two AZs and stand up three sentinels (one per AZ). The application is updated to use redis-py's Sentinel class:
```python
from redis.sentinel import Sentinel

sentinel = Sentinel([('sent-a', 26379), ('sent-b', 26379), ('sent-c', 26379)],
                    socket_timeout=0.1)
master = sentinel.master_for('cart-master', socket_timeout=0.1)
master.hincrby(f'cart:{user_id}', sku, qty)
```
Now if the master EC2 instance is retired, sentinels detect it within 5 seconds (with down-after-milliseconds 5000), agree on the failure within a second or two, elect a replica, and the application reconnects — total cart-write outage about 10–15 seconds. Read traffic (the cart page loads) can also be sent to the replicas via sentinel.slave_for('cart-master'), freeing up master CPU. The peak write rate is still 50K/s — the master is still the only writer; replicas help reads, not writes. You bought availability, not write scale.
Stage 3 — Redis Cluster, 3 shards. The store grows. Diwali peak is now 150K writes/second, and one master cannot do that without queueing under bursts. You move to Redis Cluster with 3 master shards (each on a c6i.2xlarge), each shard with one replica in a different AZ. The slots split roughly: A owns 0–5460, B owns 5461–10922, C owns 10923–16383. Cart key changes from cart:user42 to cart:{user42} so every key for a given user lives on the same shard:
```python
from redis.cluster import RedisCluster, ClusterNode

cluster = RedisCluster(startup_nodes=[ClusterNode('shard-a', 7000),
                                      ClusterNode('shard-b', 7001),
                                      ClusterNode('shard-c', 7002)])
# doubled braces produce the literal {} hash tag in the f-string
cluster.hincrby(f'cart:{{{user_id}}}', sku, qty)
```
Each user's cart and any auxiliary keys you introduce later (cart:{user42}:meta, cart:{user42}:applied_coupons) all hash on user42, all land on the same shard, all participate in the same MULTI/EXEC if you need atomicity. Peak write capacity: roughly 3× the single-shard rate, so about 150K/s comfortably with headroom. Each shard handles its own replica failover internally; no Sentinel runs.
The three rungs in numbers:
| Stage | Topology | Write capacity | Failover time | Operational complexity |
|---|---|---|---|---|
| 1 | single instance | 50K/s | manual, 4–20 min | low |
| 2 | Sentinel + 3 replicas | 50K/s (master-bound) | automatic, 10–30 s | medium |
| 3 | 3-shard Cluster | 150K/s (3× single-shard) | automatic, shard-local | high |
The progression is the canonical one. You do not start at Stage 3; you grow into it when the data tells you to. Stage 1 lasts as long as you can tolerate manual failover. Stage 2 lasts as long as one master can handle peak writes. Stage 3 is where you live once you outgrow that. Most products never need Stage 3, and that is a good thing: Cluster has real cost in code (hash tags), in ops (more nodes, slot map drift), and in observability (per-shard dashboards). Use it when you need it, not before.
The decision matrix
The whole chapter collapses into four bullets you can paste into a design doc:
- Single instance — use for dev, test, internal tools, anything where 5–20 minutes of downtime is acceptable. Persistence (RDB + AOF) protects data; nothing protects availability.
- Replication (master + N replicas, no Sentinel) — use when you need read scale-out and a warm spare but accept a manual flip on master death. Common for analytics-style read-heavy caches where writes are infrequent.
- Sentinel — use for production HA when one master is enough to handle peak write throughput. The default for the vast majority of customer-facing Redis workloads in 2026: cart stores, session stores, leaderboards, rate limiters serving a few hundred thousand to a few million users.
- Cluster — use when one master is not enough, when you need to scale writes horizontally and you can accept the multi-key constraint (or design around it with hash tags). The default for Redis at the scale of large social, ad-tech, or chat platforms.
A short note on managed services because most teams use them rather than rolling their own. AWS ElastiCache offers all three modes under different names: "cluster mode disabled with replication group" is Sentinel-shaped under the hood (multi-AZ, automatic failover, primary endpoint that updates on failover), and "cluster mode enabled" is Redis Cluster with sharding. Google Memorystore and Azure Cache for Redis offer the same shape. Redis Enterprise (the commercial product from Redis Inc.) adds active-active geo-replication using CRDT semantics — useful if you need multi-region writes, which neither Sentinel nor open-source Cluster provide. Valkey has the same replication and Cluster semantics as Redis (since Valkey forked from Redis 7.2.4); managed Valkey on ElastiCache is, for replication purposes, indistinguishable from managed Redis.
KeyDB deserves a mention because it solves a slightly different problem. KeyDB has an experimental active-replica mode where two nodes both accept writes and continuously replicate to each other; conflicts are resolved last-write-wins. The benefit is simpler ops (no master/replica role, both nodes can take writes) and effectively zero-downtime failover because there is nothing to fail over to. The cost is the conflict-resolution semantics — if two clients update the same cart on the two nodes within the replication window (milliseconds), one update is silently dropped. That is fine for some workloads (idempotent counters, last-write-wins caches) and dangerous for others (anything where you need to read-modify-write a structured value). Most production deployments stick with the standard master-replica model and pay the failover cost rather than accept the lost-write risk.
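The lost-write scenario is easy to see in miniature. A toy last-write-wins merge — purely illustrative, not KeyDB's actual implementation — where each node stores `key -> (timestamp, value)` and a merge keeps the newer timestamp:

```python
def lww_merge(a: dict, b: dict) -> dict:
    """Merge two node states; for conflicting keys the later timestamp wins
    and the other write is silently dropped."""
    merged = dict(a)
    for key, (ts, val) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

# Both nodes update the same cart key inside the replication window:
node_a = {'cart:42:qty': (1000, 2)}   # client on node A set qty=2 at t=1000
node_b = {'cart:42:qty': (1001, 5)}   # client on node B set qty=5 at t=1001
lww_merge(node_a, node_b)             # qty=5 survives; A's write is gone
```

Note that neither client got an error: both writes were acknowledged, and one of them simply never happened as far as the merged state is concerned. That silence is what makes LWW dangerous for read-modify-write workloads.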
What's next
You have now seen all the pieces. Persistence (chapter 170) keeps a single Redis from losing data; replication, Sentinel, and Cluster (this chapter) keep the service available across hardware and zone failures and let you scale writes horizontally. The next chapter, Memcached's minimalist design, looks at the road not taken — what Redis gives up by being feature-rich, and where the simpler ancestor is still the right tool. After that, caching patterns closes Build 22 with the patterns themselves: cache-aside, write-through, write-behind, and the thundering-herd problem that bites every team eventually.
By the end of Build 22, you will be able to look at any caching requirement and say, in two minutes, which topology to pick, which persistence mode to enable, which eviction policy fits, and which cache pattern your application should use. That is the level of fluency this build is aiming for.
References
- Redis replication — the canonical reference for master-replica topology, partial resync, and the replication backlog.
- Redis Sentinel documentation — quorum, failover protocol, client discovery (SENTINEL get-master-addr-by-name), and operational tuning of down-after-milliseconds.
- Redis Cluster specification — the 16,384-slot design, CRC16 routing, hash tags {...}, slot migration, and shard-level failover.
- Valkey project home — the Linux Foundation fork; replication and cluster semantics are identical to Redis 7.2.
- AWS ElastiCache replication and failover — how the managed offering exposes single-instance, replication-group (Sentinel-shaped), and cluster-mode topologies.
- KeyDB active-replica mode — the alternative multi-master design, with last-write-wins conflict resolution.