Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
In short
Redis ladders availability and scale in three rungs. Replication gives read scale-out and a manual-failover warm spare. Sentinel adds quorum-based automatic failover with 10–30 s recovery, but writes still go to one master. Cluster shards the keyspace across 16,384 hash slots so reads and writes scale horizontally — at the cost that multi-key commands only work within one slot, which is what hash tags {...} exist for.
By the time you finish this chapter, you will have stopped thinking about Redis as a single process and started thinking about it as a topology. The previous chapter (data structures as the product) gave you the substrate; the chapter before this (persistence) made sure that substrate survives a restart. This chapter makes sure it survives a node death, a network partition, and the day your traffic outgrows what one machine can write.
The framing that helps: do not pick a topology by reading a feature checklist. Pick it by writing down what fails when one machine dies, and what your write throughput ceiling is. Those two answers map almost mechanically to the three rungs of the ladder.
The thesis: a single Redis is a single point of failure
A standalone Redis instance, even one with both RDB snapshots and AOF persistence configured, fails for a long list of reasons that have nothing to do with Redis itself: the EC2 instance gets retired, the underlying physical host has a memory error, the disk fills up because someone forgot to rotate the AOF, the kernel OOM-killer kills the process, an fsync storm causes the AOF rewrite to stall reads for 90 seconds, a BGSAVE fork doubles the resident memory and the box swaps. Each of those is a real outage you will see at least once if you operate Redis for a year. Persistence solves "I lost my data" — it does not solve "my service is down for the 4 minutes it takes to spin up a new box, restore the AOF, warm the kernel page cache, and re-point my application at it".
The high-availability story is a separate story from the durability story. You need both. Persistence is what you reach for so that data survives a planned restart or a clean crash. Replication and its descendants are what you reach for so that the service survives an unplanned hardware failure with bounded downtime. Why these are independent: a perfectly persistent single Redis still has zero availability while you wait for the box to come back. A perfectly replicated Redis with no persistence loses data the moment the entire replica set crashes simultaneously (a power-grid event, a regional outage). The two failure modes are orthogonal — you defend against them separately, with different mechanisms.
There is also the second motivation, write throughput. A single Redis instance, single-threaded by design, tops out somewhere between 100K and 1M operations per second depending on command mix, payload size, and CPU. That is enormous for most use cases — a cart store for a mid-sized Indian e-commerce can run on one Redis box for years. But there are workloads where it is not enough: a real-time bidding ad exchange doing 5M lookups/second, a session store for a 200M-user app, a chat fan-out service. For those, vertical scale runs out and you need to shard the keyspace across multiple processes that each handle a fraction of the writes.
Replication and Sentinel handle the first problem (HA). Cluster handles both (HA and write scale by sharding). The next three sections walk each rung.
Rung 1: replication — master, replicas, async command stream
The simplest topology is one master and one or more replicas. The master accepts both reads and writes. Each replica connects to the master, performs an initial full sync (the master takes a snapshot, ships it to the replica, the replica loads it), and then receives every subsequent write command as a stream over the same connection. Replicas serve reads — clients can be configured to send read traffic to any replica — and the master is the single source of truth for writes.
The replication protocol has two phases. Initial sync (called a full resync in Redis terminology) happens when a replica connects for the first time or after losing the connection for too long: the master forks a child process to write an RDB snapshot, ships the snapshot to the replica, and the replica loads it. Partial resync is the optimised path for short disconnections: the master keeps a circular replication backlog of recent commands; if the replica reconnects within the backlog window, the master replays only the missed commands instead of doing a full snapshot. Why the backlog matters: a full resync of a 50 GB Redis instance takes minutes, doubles the master's memory during the fork, and saturates the network — you do not want one to fire because of a 30-second network blip. Tuning repl-backlog-size to roughly bandwidth × max-disconnect-window (e.g., 100 MB for a few minutes of catch-up at moderate write rates) is the standard ops move.
Replication is asynchronous by default, which is the central trade-off. The master acknowledges a write to the client before the replica has received the command. If the master dies one millisecond after acknowledging a write but before the command made it to any replica, that write is lost — the new master (whichever replica gets promoted) will not have it. This is acceptable for most cache and session use cases (you would rather lose a session than wait for a quorum write); it is unacceptable for, say, financial state, which is why Redis is rarely the system of record for money. Redis offers WAIT n timeout_ms to block until at least n replicas have acknowledged, which gives you a synchronous-ish mode at the cost of latency, but it is rarely used in production.
The use case for plain replication (no Sentinel, no Cluster) is a small or medium workload where you want read scale and a warm spare but accept that failover is a manual operation. A typical setup: 1 master + 2 replicas, all in the same AZ for low replication lag; reads from a Python pool that round-robins across the replicas; the application keeps the master address in config and an on-call engineer flips it on failover. This is fine for an internal dashboard cache. It is not fine for a customer-facing flash sale.
Rung 2: Sentinel — automatic failover via quorum
Sentinel is a small, separate Redis-shipped daemon (redis-sentinel) whose only job is to monitor a replication topology and perform automatic failover when the master dies. You run an odd number of sentinels — typically three or five — across separate failure domains (different AZs in a cloud, different racks on-prem). Each sentinel pings every Redis node every second, gossips with the other sentinels, and votes when something looks broken.
The reason for an odd number of sentinels with a majority quorum is the same reason every distributed-consensus system uses odd numbers: you want to tolerate split-brain. With three sentinels and a quorum of two, a network partition that isolates one sentinel still leaves a majority on the other side that can make decisions; the lone sentinel cannot, so two competing failovers are impossible. With five sentinels and a quorum of three, you tolerate two simultaneous sentinel failures. Why this matters operationally: never run two sentinels — a 1-1 split has no quorum, so no failover ever happens. Never run sentinels co-located with the Redis nodes only — a host failure takes down both a Redis node and its sentinel, halving your visibility. Spread sentinels across AZs/racks the same way you spread the Redis nodes themselves.
The client side of Sentinel is the part most people get wrong. A Sentinel-aware client does not connect directly to a Redis address from config; it connects to the sentinel pool, asks SENTINEL get-master-addr-by-name mymaster to discover the current master, and then opens a connection to that address. After a failover, the connection breaks; the client retries via the sentinels, gets the new master address, and reconnects. Every mainstream Redis client library (redis-py, lettuce for Java, ioredis for Node, go-redis) ships a SentinelConnectionPool or equivalent. If you skip this and just put the master IP in config, your application will not recover when the master moves, defeating the entire point of Sentinel.
A typical recovery timeline: the master goes down at T+0; sentinels notice within down-after-milliseconds (default 30s, often tuned down to 5s); quorum is reached within another second or two; the leader election takes another second; the failover itself (promote, reconfigure replicas) takes a few seconds. End-to-end: 10–30 seconds of write unavailability with default tuning, which is acceptable for most production caching and session workloads but not for hard real-time systems.
Sentinel does not shard. You still have one master taking all the writes — it just gets replaced automatically when it dies. Sentinel is the right answer when you need automatic HA and one master can comfortably handle your write rate. If your write rate outgrows one master, you need Cluster.
Rung 3: Redis Cluster — sharding via 16,384 hash slots
Redis Cluster is what you reach for when one master cannot keep up with the write load, or when you want HA and sharding without running both Sentinel and a separate sharding proxy. Cluster shards the keyspace into exactly 16,384 hash slots. Each key is mapped to a slot via CRC16(key) mod 16384. Slots are distributed across N master shards (e.g., with three shards, master A owns slots 0–5460, master B owns 5461–10922, master C owns 10923–16383). Each master shard runs its own replica set (1+ replicas) for shard-level HA. The cluster does its own automatic failover inside each shard — a failed master is replaced by one of its replicas without any Sentinel needed.
The 16,384-slot count is a fixed Redis design constant (chosen because it fits in a 2 KB bitmap that the cluster gossips between nodes — small enough that every node can carry the full slot map, large enough to allow fine-grained rebalancing across hundreds of shards). You do not configure it; you configure how slots are assigned to shards. Adding a fourth shard means migrating, say, slots 0–4095 from masters A/B/C to the new master D — the cluster supports this online, key by key, while serving traffic. Why fixed 16,384 not "consistent hashing" with a ring: a fixed slot count makes the routing table tiny (16K entries) and exact (every key maps to exactly one slot) instead of approximate. Clients can cache the full slot-to-shard map in a few KB and route every request directly to the correct shard with no extra hop. Migration is just "move slot N from shard X to shard Y" — no reshuffle, no replication factor maths.
The single biggest gotcha with Cluster is multi-key commands. MSET k1 v1 k2 v2 k3 v3 works only if all three keys hash to the same slot — which, for unrelated keys, almost never happens. The same is true for transactions (MULTI/EXEC), Lua scripts that read/write multiple keys, and the SINTERSTORE / ZUNIONSTORE-family commands. The fix is hash tags: if a key contains a substring inside {...}, only that substring is hashed. So cart:{user42}:items and cart:{user42}:total both hash on user42 and land on the same slot, and you can run MULTI / HGETALL cart:{user42}:items / GET cart:{user42}:total / EXEC atomically. The convention is "wrap the entity ID in braces in every key for that entity", which collocates everything for a given user/order/session on one shard.
Cluster's failover is not Sentinel-shaped but is conceptually similar: each master is monitored by the other masters via gossip; when a quorum of masters thinks a master is down, the failed master's replicas hold an election among themselves (the one with the freshest data wins) and one is promoted. No external sentinels are needed because the cluster nodes themselves play that role.
When you should not use Cluster: you are still small enough that one master is plenty (the operational overhead of Cluster is real — more nodes, slot management, hash-tag discipline in your code), or you have a workload that genuinely needs cross-key transactions across unrelated keys (rare in practice for cache/session work, common in OLTP — which is why you would not use Redis Cluster as a primary OLTP store anyway).
A worked example: cart store on the three rungs
Concrete is better than abstract. Here is the same Indian e-commerce cart store walked through all three rungs as it grows.
kirana.in cart store: from one Redis to a 3-shard Cluster
You operate the cart store for kirana.in. The cart for each user is one Redis HASH (cart:user42 → {SKU101: 2, SKU207: 1}), with a 7-day TTL. Reads on the cart page, writes on add-to-cart and quantity changes. Peak load: 50,000 writes/second across all users during a Republic Day flash sale.
Stage 1 — single Redis instance. One redis-server on a c6i.2xlarge (8 vCPU, 16 GB RAM). Persistence: RDB snapshot every 5 minutes, AOF with appendfsync everysec. Latency: p99 around 0.8 ms inside the VPC. Throughput at peak: 50K writes/s — comfortable; the box is at maybe 30% of one core. Failure mode: if the box dies, the cart service is down until a replacement is launched, the AOF restored, the IP re-pointed — about 4 minutes on a good day, 20 on a bad one. For a non-flagship store this is tolerable; on the day of the sale it costs lakhs of rupees per minute. Why one box still works at 50K/s: each cart op is a single HINCRBY or HGETALL on a small hash; per-op CPU cost is microseconds. The bottleneck is not Redis; it is the network card and the per-connection RTT, and you have plenty of headroom on both at this rate.
Stage 2 — Sentinel + 3 replicas. You add two replicas in the other two AZs and stand up three sentinels (one per AZ). The application is updated to use redis-py's Sentinel class:
from redis.sentinel import Sentinel
sentinel = Sentinel([('sent-a', 26379), ('sent-b', 26379), ('sent-c', 26379)],
socket_timeout=0.1)
master = sentinel.master_for('cart-master', socket_timeout=0.1)
master.hincrby(f'cart:{user_id}', sku, qty)
Now if the master EC2 instance is retired, sentinels detect it within 5 seconds (with down-after-milliseconds 5000), agree on the failure within a second or two, elect a replica, and the application reconnects — total cart-write outage about 10–15 seconds. Read traffic (the cart page loads) can also be sent to the replicas via sentinel.slave_for('cart-master'), freeing up master CPU. The peak write rate is still 50K/s — the master is still the only writer; replicas help reads, not writes. You bought availability, not write scale.
Stage 3 — Redis Cluster, 3 shards. The store grows. Diwali peak is now 150K writes/second, and one master cannot do that without queueing under bursts. You move to Redis Cluster with 3 master shards (each on a c6i.2xlarge), each shard with one replica in a different AZ. The slots split roughly: A owns 0–5460, B owns 5461–10922, C owns 10923–16383. Cart key changes from cart:user42 to cart:{user42} so every key for a given user lives on the same shard:
from redis.cluster import RedisCluster, ClusterNode
cluster = RedisCluster(startup_nodes=[ClusterNode('shard-a', 7000),
ClusterNode('shard-b', 7001),
ClusterNode('shard-c', 7002)])
cluster.hincrby(f'cart:{{{user_id}}}', sku, qty) # double braces escape Python f-string
Each user's cart and any auxiliary keys you introduce later (cart:{user42}:meta, cart:{user42}:applied_coupons) all hash on user42, all land on the same shard, all participate in the same MULTI/EXEC if you need atomicity. Peak write capacity: roughly 3× the single-shard rate, so about 150K/s comfortably with headroom. Each shard handles its own replica failover internally; no Sentinel runs.
The three rungs in numbers:
| Stage | Topology | Write capacity | Failover time | Operational complexity |
|---|---|---|---|---|
| 1 | single instance | 50K/s | manual, 4–20 min | low |
| 2 | Sentinel + 3 replicas | 50K/s (master-bound) | automatic, 10–30 s | medium |
| 3 | 3-shard Cluster | 150K/s (3× shards) | automatic, shard-local | high |
The progression is the canonical one. You do not start at Stage 3; you grow into it when the data tells you to. Stage 1 lasts as long as you can tolerate manual failover. Stage 2 lasts as long as one master can handle peak writes. Stage 3 is where you live once you outgrow that. Most products never need Stage 3, and that is a good thing: Cluster has real cost in code (hash tags), in ops (more nodes, slot map drift), and in observability (per-shard dashboards). Use it when you need it, not before.
The decision matrix
The whole chapter collapses into four bullets you can paste into a design doc:
- Single instance — use for dev, test, internal tools, anything where 5–20 minutes of downtime is acceptable. Persistence (RDB + AOF) protects data; nothing protects availability.
- Replication (master + N replicas, no Sentinel) — use when you need read scale-out and a warm spare but accept a manual flip on master death. Common for analytics-style read-heavy caches where writes are infrequent.
- Sentinel — use for production HA when one master is enough to handle peak write throughput. The default for the vast majority of customer-facing Redis workloads in 2026: cart stores, session stores, leaderboards, rate limiters serving a few hundred thousand to a few million users.
- Cluster — use when one master is not enough, when you need to scale writes horizontally and you can accept the multi-key constraint (or design around it with hash tags). The default for Redis at the scale of large social, ad-tech, or chat platforms.
A short note on managed services because most teams use them rather than rolling their own. AWS ElastiCache offers all three modes under different names: "cluster mode disabled with replication group" is Sentinel-shaped under the hood (multi-AZ, automatic failover, primary endpoint that updates on failover), and "cluster mode enabled" is Redis Cluster with sharding. Querion Memorystore and Azure Cache for Redis offer the same shape. Redis Enterprise (the commercial product from Redis Inc.) adds active-active geo-replication using CRDT semantics — useful if you need multi-region writes, which neither Sentinel nor open-source Cluster provide. Valkey has the same replication and Cluster semantics as Redis (since Valkey forked from Redis 7.2.4); managed Valkey on ElastiCache is, for replication purposes, indistinguishable from managed Redis.
KeyDB deserves a mention because it solves a slightly different problem. KeyDB has an experimental active-replica mode where two nodes both accept writes and continuously replicate to each other; conflicts are resolved last-write-wins. The benefit is simpler ops (no master/replica role, both nodes can take writes) and effectively zero-downtime failover because there is nothing to fail over to. The cost is the conflict-resolution semantics — if two clients update the same cart on the two nodes within the replication window (milliseconds), one update is silently dropped. That is fine for some workloads (idempotent counters, last-write-wins caches) and dangerous for others (anything where you need to read-modify-write a structured value). Most production deployments stick with the standard master-replica model and pay the failover cost rather than accept the lost-write risk.
What's next
You have now seen all the pieces. Persistence (chapter 170) keeps a single Redis from losing data; replication, Sentinel, and Cluster (this chapter) keep the service available across hardware and zone failures and let you scale writes horizontally. The next chapter, Memcached's minimalist design, looks at the road not taken — what Redis gives up by being feature-rich, and where the simpler ancestor is still the right tool. After that, caching patterns closes Build 22 with the patterns themselves: cache-aside, write-through, write-behind, and the thundering-herd problem that bites every team eventually.
By the end of Build 22, you will be able to look at any caching requirement and say, in two minutes, which topology to pick, which persistence mode to enable, which eviction policy fits, and which cache pattern your application should use. That is the level of fluency this build is aiming for.
Common confusions
- "Replication makes Redis durable, so I can skip persistence." Replication and persistence solve different problems. A correlated failure — power-grid event, regional outage, an
FLUSHALLtyped on the master — wipes every replica too. Persistence (RDB + AOF) is your line of defence against that; replication is your line of defence against single-node hardware loss. Run both. - "Sentinel shards the data." Sentinel does not shard. The whole keyspace still lives on one master; Sentinel only automates the choice of which node is the master. If your write rate exceeds one box, you need Cluster, not more sentinels.
- "Two sentinels are better than one." Two sentinels are worse than one. A 1-1 split has no majority, so no failover ever happens. Always run an odd number (3 or 5) across distinct failure domains so a quorum survives any single-AZ outage.
- "Redis Cluster is just consistent hashing with a different name." It is not a hash ring. Cluster uses a fixed table of 16,384 slots that every client and node caches in full; routing is exact, not probabilistic. Migration is "move slot N from shard X to Y", not a reshuffle of a virtual-node ring.
- "
MULTI/EXECworks in Cluster like it does on a single node." Only if every key in the transaction hashes to the same slot. Two unrelated keys almost always live on different shards, and the transaction is rejected withCROSSSLOT. Wrap the entity ID in{...}to force collocation. - "Async replication means I never lose acknowledged writes." You can lose them. The master acks the client before shipping the command to replicas; if the master dies in that window, the new master never sees the write. Use
WAIT n timeoutto block on replica acks if you cannot tolerate that, but understand the latency cost.
Going deeper
The replication backlog and partial resync
The replication backlog is a circular in-memory buffer on the master that holds the most recent stream of commands sent to replicas. When a replica disconnects briefly — a 10-second network blip, a process restart — it reconnects with its last replication offset; if that offset is still inside the backlog, the master replays only the missed commands (a partial resync). If the offset has fallen off the back of the ring, the master falls back to a full resync: fork, RDB snapshot, ship the whole dataset over the wire. Why sizing matters: a full resync of a 50 GB instance forks the master (RSS doubles temporarily), saturates the NIC for minutes, and stalls under heavy write load. The fix is repl-backlog-size = sustained_write_bandwidth × max_expected_disconnect_window. For a master writing 20 MB/s and replicas that may be gone for two minutes, that is 2.4 GB of backlog — cheap memory, expensive to skip.
Failover safety: min-replicas-to-write
By default a Redis master keeps accepting writes even when all its replicas are gone, which means a partition that isolates the master from its replicas can collect writes that will be lost when the partition heals and the master is demoted. The setting min-replicas-to-write 1 plus min-replicas-max-lag 10 flips this: the master refuses writes unless at least one replica is connected and within 10 seconds of lag. You trade availability for safety — during a network partition, the isolated side becomes read-only. Most production Sentinel setups enable this; it is the closest thing Redis has to a CP-mode toggle.
Hash tags and the cross-shard transaction problem
{...} is more than a convenience — it is a contract you embed in your key schema. The rule: pick one entity ID per logical group, wrap it in braces, and reuse the same wrapped tag across all keys for that entity. cart:{u42}:items, cart:{u42}:total, cart:{u42}:coupons all hash on u42 and land on the same shard, so a single MULTI/EXEC over them works. The anti-pattern is cart:{user42}:items and wishlist:{user42}:items — these also collocate, which sounds fine until shard balance goes lopsided because user 42 is also a power user with 200 wishlist entries. Hot-key skew is the cluster operator's recurring pain: one tenant's keys all land on shard B and shard B is now at 90% CPU while A and C are idle. The fix is per-feature hash tags (cart:{u42} for cart, wishlist:{u42:wl} for wishlist) so the load spreads.
Slot migration and online rebalancing
Adding a fourth shard to a 3-shard cluster is a slot migration, not a reshuffle. The cluster admin runs CLUSTER SETSLOT <slot> MIGRATING <target> on the source and CLUSTER SETSLOT <slot> IMPORTING <source> on the target; keys in that slot are then moved one by one with MIGRATE. During migration, requests for a not-yet-moved key get an ASK redirect (the next request only) and clients learn the new owner with a MOVED redirect once the slot is fully migrated. The whole dance happens online — clients see microseconds of extra latency on migrating slots and nothing on the rest of the keyspace. Production rebalances at PaisaBridge-scale Redis Cluster deployments routinely move thousands of slots per hour without an outage, which is the operational payoff for the fixed-slot design.
Active-active and multi-region writes
Open-source Redis Cluster is single-region: every write goes to the master of the relevant shard, and there is no built-in cross-region write merge. Redis Enterprise adds active-active using CRDTs (conflict-free replicated data types) where every region accepts writes locally and merges asynchronously; counters, sets, and last-write-wins strings have well-defined merge rules. KeyDB's active-replica mode is a simpler take — two nodes, both accept writes, last-write-wins on conflict — which is fine for idempotent counters and dangerous for read-modify-write patterns on structured values. If you genuinely need multi-region writes, evaluate the conflict semantics carefully; for most Indian e-commerce, single-region with cross-region async replication for DR is enough.
References
- Redis replication — the canonical reference for master-replica topology, partial resync, and the replication backlog.
- Redis Sentinel documentation — quorum, failover protocol, client discovery (
SENTINEL get-master-addr-by-name), and operational tuning ofdown-after-milliseconds. - Redis Cluster specification — the 16,384-slot design, CRC16 routing, hash tags
{...}, slot migration, and shard-level failover. - Valkey project home — the Linux Foundation fork; replication and cluster semantics are identical to Redis 7.2.
- AWS ElastiCache replication and failover — how the managed offering exposes single-instance, replication-group (Sentinel-shaped), and cluster-mode topologies.
- KeyDB active-replica mode — the alternative multi-master design, with last-write-wins conflict resolution.