In short
CAP is misquoted everywhere. The bumper-sticker version — "a distributed system can have at most two of consistency, availability, and partition tolerance" — is not what Brewer conjectured and not what Gilbert and Lynch proved. The honest statement is narrower: in the presence of a network partition, a distributed system can guarantee at most one of consistency or availability. When the network is healthy, you do not pick two of three; you get all three. CAP is a statement about what happens during a failure, not a menu you order from at design time.
The reason "pick two of three" is incoherent: P is not optional. If the network between your nodes never partitions, you do not have a distributed system — you have one node, which trivially gives you C and A. The instant you span more than one machine across more than one rack, partitions are a fact of life: a switch reboots, a fibre is cut, a kernel hangs for 30 seconds and the rest of the cluster declares it dead. So you do not choose P; the network chooses for you. The only choice you actually make is what your code does while a partition is happening.
PACELC (Daniel Abadi, 2012) names the other axis everyone secretly cares about. Even in the Else case (no partition), every system trades Latency against Consistency. Strong consistency requires a quorum write or a Paxos round, which costs a network round-trip. Weak consistency lets a single replica respond instantly. So a real classification has two letters: what you do under partition (PA or PC), and what you do otherwise (EL or EC). Spanner is PC/EC — it gives up availability under partition and pays the latency for strong reads always. Cassandra (default) is PA/EL — it stays available always and serves reads from the nearest replica. Most modern OLTP systems are PC/EL — strict during partition, fast otherwise.
The map of real systems is messier than a clean dichotomy. Strong/CP: Spanner, CockroachDB, FoundationDB, etcd, ZooKeeper, FaunaDB, MongoDB v4+ defaults. Eventual/AP: Cassandra, DynamoDB classic, Riak. Tunable per query or per table: DynamoDB global tables, Cosmos DB (5 levels), Cassandra (consistency level per statement), CockroachDB (AS OF SYSTEM TIME for stale reads). And underneath every "strong" system are pragmatic relaxations — bounded staleness for follower reads, read-your-writes per session, causal consistency, hinted handoff — that make strict CP/AP labels mostly fiction.
The decision framework, after all the theory, is one question: what does your application actually need during a five-minute network blip? Most consumer apps tolerate staleness fine — Twitter showing slightly old likes, Swiggy showing a price that updates a second late, WhatsApp delivering messages out of order for a moment. A bank's ledger does not. So you mix: a Spanner-class store for the ledger, a Cassandra-class store for the feed. CAP and PACELC give you the vocabulary to make those choices honestly. This chapter teaches you the vocabulary.
What CAP actually says
The CAP theorem was a conjecture by Eric Brewer in 2000, proved in a 2002 paper by Seth Gilbert and Nancy Lynch. The three letters in the proof have precise meanings, and they are not the meanings most engineers use.
Consistency (C) in CAP means linearizability — every read sees the result of the most recently completed write, as if the database were a single register that processes one operation at a time. This is the strictest meaning of "consistent" in the distributed systems literature; it is not the C in ACID, which is about preserving application-level invariants.
Availability (A) means every non-failing node responds to every request with a non-error response in finite time. Not "highly available" in the operational sense (99.99% uptime); a stronger, mathematical sense — no node refuses any request, ever.
Partition tolerance (P) means the system continues to operate when the network drops messages between nodes. Not "the system handles partitions gracefully"; the literal property that some messages between honest nodes can be lost without the system crashing.
The theorem is then: no system can guarantee all three of C, A, and P at the same time. Equivalently — and this is the form that matters in practice — when a partition occurs, a system must give up either C or A.
Why the bottom edge is not a real option: a "CA system" would be one that has both consistency and availability, and gives up partition tolerance. But "giving up partition tolerance" means assuming the network never partitions — which means you have one node, or you crash the entire cluster the instant any link flickers. Single-node Postgres is CA in this sense; so is your laptop's SQLite. Neither is what people mean by a distributed database. The instant you have two nodes that need to agree, the network can partition them, and CAP fires.
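The fork in the road can be made concrete with a toy model (purely illustrative; the class and method names here are invented, not any real database's API): one register replicated on two nodes, a partition flag, and two policies for a write that cannot reach the peer.

```python
class Replica:
    def __init__(self):
        self.value = None

class TwoNodeStore:
    """Toy model: one register replicated on two nodes."""
    def __init__(self, policy):
        self.a, self.b = Replica(), Replica()
        self.partitioned = False
        self.policy = policy  # "CP" or "AP"

    def write(self, node, value):
        if self.partitioned:
            if self.policy == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise RuntimeError("unavailable: cannot replicate during partition")
            node.value = value  # AP: accept locally, let replicas diverge
        else:
            self.a.value = self.b.value = value  # healthy: replicate to both

    def read(self, node):
        return node.value

cp = TwoNodeStore("CP"); cp.write(cp.a, 1)
cp.partitioned = True
try:
    cp.write(cp.a, 2)
except RuntimeError:
    pass                      # CP gave up availability...
assert cp.read(cp.b) == 1     # ...but stayed consistent

ap = TwoNodeStore("AP"); ap.write(ap.a, 1)
ap.partitioned = True
ap.write(ap.a, 2)             # AP stayed available...
assert ap.read(ap.b) == 1     # ...but reads on the far side are stale
```

While the network is healthy, both policies behave identically; the letters only diverge once `partitioned` flips, which is exactly the theorem's point.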
Why "pick two of three" is wrong
The phrase you have heard a hundred times — "you can have two of CAP, but not all three" — frames CAP as a buffet. It implies that some systems pick C and A, others pick C and P, others pick A and P. This is wrong on its own terms.
The theorem says no system can have all three. It does not say every system must have exactly two. A buggy system can have zero. More importantly, the framing pretends P is a design choice, when it is not. Partitions happen whether or not you plan for them. The only design choice is the response to a partition.
Brewer himself wrote a twelve-years-later reflection in 2012 clarifying that the original "2 of 3" was a simplification that "has been seen as misleading", and that in real systems "the choice between C and A can occur many times within the same system at very fine granularity — not only can subsystems make different choices, but the choice can change according to the operation or even the specific data or user involved." The right mental model is per-operation, not per-system.
Two concrete misuses to retire from your vocabulary:
"NoSQL is AP, SQL is CP." False on every axis. MongoDB v4+ runs replica sets on a Raft-like consensus protocol; with majority write and read concerns it behaves as CP. Cassandra can be configured per query with consistency levels from ONE (AP-ish) to ALL (CP-ish). MySQL with async replication is closer to AP (you can read stale data from a replica). PostgreSQL with synchronous replication and a single writeable primary is CP. The classification is per-system, per-config, per-operation; "NoSQL vs SQL" is a category error.
"MongoDB chose A in CAP." That was a marketing line from circa 2010, and it has been overtaken by reality. MongoDB v4+ uses Raft for replica-set consensus; writes with writeConcern: majority and reads with readConcern: majority give you linearizable behaviour at the cost of write availability during a partition that loses the majority. It is, by any honest reading, CP for its default workload. You can dial it down per-query if you want, but the default is no longer "available at all costs".
"CAP is dead." Not dead, just often misapplied. The theorem is a worst-case statement about a single failure mode (network partition). Most production systems most of the time are not partitioned, and the interesting trade-off they face minute-to-minute is the one PACELC names: latency vs consistency. CAP is necessary, not sufficient.
PACELC: the axis you actually feel every day
Daniel Abadi noticed in a 2010 blog post and 2012 IEEE Computer paper that CAP misses the day-to-day trade-off. If there is a Partition, you choose A or C. Else (no partition), you choose Latency or Consistency. Hence PACELC: PA/EL, PA/EC, PC/EL, PC/EC.
The reason latency and consistency trade off even without partitions is structural. To guarantee a read sees the latest write, the read must either go to the same node that took the write (a single primary, which then becomes a bottleneck) or to a quorum of nodes that collectively agree on the latest value (a Paxos / Raft round, which costs at least one network RTT, often more). Either way, strong consistency costs network round-trips. Weak consistency lets you serve a read from any local replica, possibly stale, in a few hundred microseconds.
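The quorum-intersection argument behind this is small enough to check exhaustively. A sketch with N = 5 replicas: if W + R > N, every read quorum overlaps every write quorum, so at least one replica in any read set holds the latest write; with W = R = 1, some quorum pairs never meet.

```python
import itertools

N = 5  # replicas

def quorums(size):
    """All possible quorums of `size` nodes out of N."""
    return [set(q) for q in itertools.combinations(range(N), size)]

# Strong: W + R > N means every read quorum intersects every write quorum,
# so at least one replica in the read set has applied the latest write.
W, R = 3, 3
assert all(w & r for w in quorums(W) for r in quorums(R))

# Weak/fast: with W = R = 1, some read quorum never meets some write quorum,
# so a read can miss the latest write entirely.
assert any(not (w & r) for w in quorums(1) for r in quorums(1))
```

The round-trip cost follows directly: a majority write must wait for W - 1 remote acknowledgements, while a W = 1 write returns after touching only the local replica.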
Why PA/EC is rare: it would mean "stay available during partition (so accept conflicting writes) but somehow give strong consistency in the normal case." That requires you to merge divergent writes when the partition heals AND give linearizable reads when it does not — the merge logic forces you to weaken consistency in the normal case too, because a read might see a value that has not yet been merged with a concurrent partitioned write. Most systems that try this end up at PA/EL in practice. Riak's strong-consistency mode is the closest production example, and it is rarely used.
The map of real systems
The cleanest way to organise actual production databases is two axes: default consistency (eventual ↔ strong) and availability behaviour during partition (CP — sacrifices A vs AP — sacrifices C). Plot the systems and you see clusters.
A few of the entries deserve commentary because their classification is contested:
MongoDB. Pre-v3.2 it was AP-leaning (eventually consistent secondaries, ad-hoc elections). v3.2 moved replica-set elections to a Raft-like protocol; v3.6 added causal-consistency sessions; v4.0 added multi-document ACID transactions; v4.2 extended them across shards. With writeConcern: majority (the server-wide default since v5.0) plus readConcern: majority, a replica set is CP — a partition that loses the majority will refuse writes. You can still configure weaker guarantees per operation, but the majority path is CP for primary writes.
DynamoDB. Classic single-region DynamoDB is AP — eventual reads by default, strongly consistent reads available as an option (which cost twice the read capacity units). Global tables (multi-region) are AP because of the cross-region merge. The transactional API (TransactWriteItems) gives CP within a region for the items in the transaction. So DynamoDB is plural, not singular.
Cassandra. Defaults are AP/EL with consistency level = ONE for reads and writes. Set CL = QUORUM on both and it behaves like a CP system on top of a Dynamo-style ring; set CL = ALL and it is even stricter (and even less available). The classification is per-query, not per-cluster.
MySQL / PostgreSQL with replication. A single primary with synchronous replication to one replica is approximately CP — writes block if the replica is unreachable. Asynchronous replication is approximately AP — writes succeed even if the replica is gone, and a failover can lose the unreplicated tail. Group replication / multi-primary modes complicate this further. The honest answer is "it depends on what you set in my.cnf or postgresql.conf."
The pragmatic relaxations no theorem captures
Pure CP / pure AP is mostly fiction in production. Real systems layer in pragmatic guarantees that fall between strict linearizability and pure eventual consistency. Knowing these by name is half of senior distributed-systems engineering.
Bounded staleness. A read sees data that is at most T seconds old. CockroachDB's AS OF SYSTEM TIME '-5s' is the explicit form; DynamoDB's eventually consistent reads are a looser cousin with no stated bound. For most user-facing reads — a Twitter timeline, a product page, a transaction history — five seconds of staleness is invisible to humans and saves you a round-trip to the leaseholder.
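The routing decision can be sketched in a few lines (invented names; real systems derive a follower's freshness from replication timestamps rather than wall clocks): serve from the follower only when its lag is within the bound, else fall back to the leader.

```python
import time

STALENESS_BOUND_S = 5.0  # illustrative bound, in the spirit of AS OF SYSTEM TIME '-5s'

def read_bounded(key, leader, follower, now=None):
    """Serve from the follower only if it is provably fresh enough."""
    now = time.time() if now is None else now
    if now - follower["applied_at"] <= STALENESS_BOUND_S:
        return follower["data"].get(key)   # cheap local read
    return leader["data"].get(key)         # too stale: fall back to the leader

leader = {"data": {"balance": 700}, "applied_at": time.time()}
follower = {"data": {"balance": 500}, "applied_at": time.time() - 60}  # a minute behind

assert read_bounded("balance", leader, follower) == 700  # too stale: leader answers
follower["data"]["balance"] = 700
follower["applied_at"] = time.time()
assert read_bounded("balance", leader, follower) == 700  # fresh: follower answers
```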
Read-your-writes. A session always sees its own previous writes, even if other sessions might see slightly older state. Implemented by routing the same session's reads to the replica that just took the write, or by stamping reads with the session's last write timestamp. Most apps need this for correctness ("after I post a tweet, my own page must show it") but do not need full linearizability.
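The timestamp-stamping variant is a short sketch (illustrative; the session and routing helpers are invented): the session records the timestamp of its last write, and reads only use the replica once it has applied at least that timestamp.

```python
class Session:
    """Tracks the highest write timestamp this session has produced."""
    def __init__(self):
        self.last_write_ts = 0

def session_write(primary, session, key, value, ts):
    primary[key] = (value, ts)      # primary applies immediately
    session.last_write_ts = ts      # stamp the session

def read_your_writes(session, primary, replica):
    """Return a reader that consults the replica only once it has caught up."""
    def get(key):
        value, applied_ts = replica.get(key, (None, 0))
        if applied_ts >= session.last_write_ts:
            return value                          # replica has our write: cheap read
        return primary.get(key, (None, 0))[0]     # else pay for the primary read
    return get

primary, replica = {}, {}
session = Session()
session_write(primary, session, "tweet:1", "hello", ts=10)  # replica not caught up yet
get = read_your_writes(session, primary, replica)
assert get("tweet:1") == "hello"        # own write always visible
replica["tweet:1"] = ("hello", 10)      # replication catches up
assert get("tweet:1") == "hello"        # now served from the replica
```

Note the guarantee is per session, not global: another session with `last_write_ts = 0` would happily read the stale replica, which is exactly the trade.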
Causal consistency. If write A causally precedes write B (on any session, anywhere), all readers that see B also see A. Strictly weaker than linearizability, much cheaper, sufficient for most messaging systems and collaborative editors. MongoDB sessions provide this.
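The guarantee reduces to a happens-before check, which vector clocks capture. A sketch (illustrative, two nodes): a store is causally consistent when it never exposes B without A whenever A happens-before B; truly concurrent writes may surface in either order.

```python
def happened_before(a, b):
    """Vector-clock happens-before: a precedes b iff a <= b elementwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

A = (1, 0)   # a write on node 0
B = (1, 1)   # node 1 saw A, then wrote: A causally precedes B
C = (0, 1)   # a write on node 1 that never saw A: concurrent with A

assert happened_before(A, B)        # a causal store must order A before B
assert not happened_before(A, C)    # A and C are concurrent:
assert not happened_before(C, A)    # readers may see them in either order
```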
Hinted handoff and read-repair. AP systems heal: if a node was down during a write, a hint is left for it on a peer; when it comes back, the hint is replayed. Reads that detect divergence between replicas trigger background repair. The system is eventually consistent in a real, observable sense — usually within seconds.
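The handoff mechanism can be sketched as follows (invented names; in Cassandra the coordinator stores the hint, within a configurable time window): a write destined for a down replica is parked on a live peer and replayed when the replica returns.

```python
class Node:
    def __init__(self):
        self.up = True
        self.data = {}
        self.hints = []   # (target_node, key, value) parked for a down peer

def replicated_write(nodes, key, value):
    """Write to every live replica; park a hint for each down one."""
    live = [n for n in nodes if n.up]
    for n in live:
        n.data[key] = value
    for down in (n for n in nodes if not n.up):
        live[0].hints.append((down, key, value))   # first live peer holds the hint

def replay_hints(holder):
    """When peers come back, hand the parked writes off to them."""
    remaining = []
    for target, key, value in holder.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining

a, b, c = Node(), Node(), Node()
c.up = False
replicated_write([a, b, c], "k", "v")   # c misses the write; a parks a hint
assert "k" not in c.data
c.up = True                             # c recovers
replay_hints(a)
assert c.data["k"] == "v"               # convergence: observably eventual
```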
Pat Helland's Memories, Guesses, and Apologies frames the whole thing brilliantly: most distributed systems give you guesses (your local replica's current best knowledge) and an apology mechanism for when the guess turns out wrong. CAP and PACELC tell you what kind of guess and what kind of apology you have signed up for.
The decision framework
After all the theory, the actionable question reduces to:
During a five-minute network blip between my regions, do my users prefer (a) the app refuses writes, or (b) the app accepts writes and might show inconsistent state?
For most consumer apps, the honest answer is (b). A WhatsApp message sent during a partition between two cell towers should still go through, even if the recipient sees it slightly out of order. A Swiggy order placed while the menu replica was momentarily stale should still succeed; you can refund or re-quote rather than refuse. Hard CP for these would mean a worse user experience for an event your users do not even perceive as a failure.
For ledgers and identity, the answer is (a). A bank cannot debit an account if it cannot also credit the counterparty atomically; an authorisation system cannot grant access if it cannot verify revocation. CP, with the resulting refusals during partition, is the only honest choice.
Most architectures are both, layered. The next worked example shows what that looks like.
Worked example: a UPI payments app
You are designing the backend for an Indian UPI payments app — think PhonePe, Google Pay, Paytm. The app does many things: holds the ledger, shows transaction history, runs search, displays balance, sends notifications, runs fraud detection. Each surface has a different CAP / PACELC profile, and a single-store design will either be too strict (slow everywhere) or too loose (incorrect on the ledger).
The ledger — the actual debit and credit. This is the irreducible CP/EC core. When you tap Pay ₹500 to Ravi, two things must happen atomically: ₹500 leaves your account, ₹500 arrives in Ravi's account. If a partition splits the cluster mid-transaction, the system must refuse the write rather than risk a debit-without-credit (or, almost worse, a credit-without-debit that you would discover only on monthly reconciliation). Every regulator, NPCI included, audits for this. Pick a Spanner-class store: Spanner if you are willing to pay GCP, CockroachDB or YugabyteDB if you want to self-host, FoundationDB if you want the bare KV with transactions and roll your own SQL on top. Default to SERIALIZABLE isolation; do not be tempted by READ COMMITTED here.
Transaction history view. Users open the app to see "what did I pay last week". This is read-heavy, append-only from the ledger's perspective, and a few hundred milliseconds of staleness is fine — if you just paid Ravi five seconds ago, the row appearing in your history a second later is invisible. Pick a Cassandra-class store, populated by an asynchronous CDC stream from the ledger. PA/EL, eventual consistency, served from the nearest region's replica for sub-50 ms p99 reads. If a partition cuts off Mumbai from Bengaluru, both regions keep serving history reads from their local copy.
Balance display. This is the trickiest. The user wants to see their current balance, but "current" can mean different things. Strict CP would route every balance query to the ledger, paying a quorum read every time the user opens the app — expensive, and unnecessary. The pragmatic answer is bounded staleness with read-your-writes: serve from a cache or follower replica that is at most a few seconds behind the ledger, and if this session has written a transaction in the last few seconds, route the read to a replica that has applied it. CockroachDB's follower reads with stickiness, or a Redis cache populated by ledger CDC with a "last write wins for this session" override. This gives the user their actual balance after their own payments, and a slightly stale (but converging) balance otherwise.
Search across counterparties. Type "Ravi" to find Ravi's UPI ID. This is approximate, eventual, and tolerates significant staleness. Pick Elasticsearch / OpenSearch, fed asynchronously from the user table. Pure AP/EL.
Fraud detection. Real-time scoring on each transaction. Needs some consistency (you do not want to skip a check because the model's feature store was stale and missed a flagged account), but not strict linearizability. Bounded staleness on the feature store (a few seconds), with a fallback rule that when the feature store is unreachable, the transaction is held for manual review. PC for the blocking path (the actual hold), EL for the feature read.
Notifications. "Ravi received ₹500 from you." Pure AP/EL — the message can arrive a second late, and if it never arrives, the user can pull-to-refresh the history. A queue (Kafka, SQS) with at-least-once delivery is the right tool; you handle duplicates with idempotency keys.
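Duplicate suppression is a few lines (illustrative sketch; a production dedup store would be persistent and TTL'd rather than an in-memory set): the consumer records each idempotency key it has processed and silently drops redeliveries.

```python
import uuid

class NotificationConsumer:
    """At-least-once consumer made effectively-once with idempotency keys."""
    def __init__(self):
        self.seen = set()      # in production: a durable, TTL'd store
        self.delivered = []

    def handle(self, message):
        key = message["idempotency_key"]
        if key in self.seen:
            return             # duplicate delivery: drop silently
        self.seen.add(key)
        self.delivered.append(message["text"])

msg = {"idempotency_key": str(uuid.uuid4()),
       "text": "Ravi received ₹500 from you."}
consumer = NotificationConsumer()
consumer.handle(msg)
consumer.handle(msg)           # the queue redelivered the same message
assert consumer.delivered == ["Ravi received ₹500 from you."]
```

The producer mints the key once per logical event (here, per transaction), so however many times the queue retries, the user sees one notification.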
The picture, then. Your single "UPI app" is, behind the curtain, five or six different stores, each chosen for the consistency / availability / latency profile of one operation. Spanner-class for the ledger, Cassandra-class for history, a Redis or follower-read layer for balance, OpenSearch for search, a feature store for fraud, a queue for notifications. The CAP / PACELC trade-off is made per-operation, not per-app.
The pattern generalises: find the smallest core of operations that genuinely require strict consistency, isolate them in a CP store, and let everything else live in cheaper AP-or-tunable stores fed from that core. The ledger is small (a few writes per second per user). The fan-out — history, search, notifications, analytics — is huge. Letting the fan-out run on a CP store would be paying linearizability tax on every query when the user does not need it; letting the ledger run on an AP store would be lying to your users about their money.
Stop arguing about CAP
After ten years of "CAP is dead" posts and another ten of "actually it is misunderstood" rebuttals, the productive position is: CAP is a precise theorem about a precise corner case (network partition). It does not tell you what to build. It tells you what trade-off appears at one specific failure mode. PACELC tells you what trade-off appears the rest of the time. Together they give you vocabulary for the actual conversation, which is per-operation: what does this query, this write, this surface need?
The next time someone tells you "X is a CP system" or "Y chose A over C", the right reply is: "Per which operation? At which consistency level? With which read concern? In which configuration?" The answer is almost never one letter. The answer for the system you build will not be one letter either. That is fine — that is the discipline you have learned.
References
- Brewer, E. (2000). Towards Robust Distributed Systems. PODC keynote — the original conjecture.
- Gilbert, S. & Lynch, N. (2002). Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. SIGACT News — the formal proof.
- Brewer, E. (2012). CAP Twelve Years Later: How the "Rules" Have Changed. IEEE Computer — Brewer's own clarification of the "2 of 3" framing.
- Abadi, D. (2012). Consistency Tradeoffs in Modern Distributed Database System Design. IEEE Computer — the PACELC paper.
- Helland, P. (2015). Memories, Guesses, and Apologies — the "guess and apologise" mental model that frames CAP pragmatically.