Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

PACELC — what CAP forgot to ask

It is 14:02 on a Tuesday at PaySetu. No partition. No alerts. The network between the Mumbai and Chennai regions is healthy, RTT is 28 ms, packet loss is zero. The on-call dashboard is green. Yet the latency SLO for the wallet-balance read API just blew past 200 ms p99 for the third time this week, and the database team is being asked why a "consistent" read of a single row needs a cross-region round trip when the row is replicated locally. The answer is not in the CAP triangle. CAP says nothing about life on a healthy network — it only fires when the partition arrives. PACELC names the dial that is being turned during the other 99% of the time, when no partition is happening: latency versus consistency.

PACELC says: if Partition (P), pick Availability (A) or Consistency (C); else (E), pick Latency (L) or Consistency (C). The ELC arm (the "else" clause) is what most production database tuning is actually about: the cost of a strongly-consistent read on a replicated store is a quorum round-trip, and you either pay that cost or you don't on every single query, not just during partitions. PACELC strictly subsumes CAP and is the more honest framework for picking a database in 2026.

What PACELC actually says

Daniel Abadi proposed PACELC in a 2010 blog post and formalised it in IEEE Computer in 2012, after a decade of watching engineering teams misuse CAP. Read the formulation slowly:

If there is a Partition, the system must choose between Availability and Consistency. Else (no partition), the system must still choose between Latency and Consistency.

The first arm is just CAP, repackaged. The second arm — ELC — is the contribution. Outside a partition, every distributed read or write that wants to be linearizable must pay a coordination cost: contact a quorum of replicas, wait for the slowest reply, then return. That cost is bounded below by the network round-trip time between the replicas. If your replicas are 28 ms apart, a strongly-consistent read costs at least 28 ms. The alternative is to read locally — fast, but possibly stale by however long replication is behind. No partition is happening. The trade-off is happening anyway.

Most production databases live in the ELC arm 99% of the time. Partitions are rare — measured in seconds-per-month for a well-engineered intra-region cluster, minutes-per-quarter for cross-region. But every read, every write, every transaction, all day, every day, pays a latency-vs-consistency tax. PACELC names this tax. CAP does not.

Why CAP missed this: Brewer's 2000 keynote framed the impossibility around the partition as the dramatic moment — and it is dramatic, because the system is forced to error or diverge. But the daily-life cost of consistency on a healthy network is the bigger production line-item. A bank's wallet-balance API serves 50,000 reads per second, 86,400 seconds a day, on a healthy network. If each read pays 28 ms of cross-region quorum latency, that coordination cost dominates the latency budget of every single request, billions of times a day. CAP makes this invisible because CAP only describes partition-time behaviour; PACELC makes it the explicit second axis.
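A back-of-the-envelope sketch makes that line-item concrete. It uses the article's hypothetical traffic numbers plus two assumptions of mine: a 0.1 ms local read and roughly ten minutes of cross-region partition per quarter.

# elc_line_item.py: where the waiting actually accumulates (hypothetical numbers, adjust to taste)
READS_PER_SEC = 50_000          # the article's wallet-balance read rate
QUORUM_RTT_MS = 28              # cross-region quorum round-trip per strongly-consistent read
LOCAL_READ_MS = 0.1             # assumed cost of a local, possibly-stale read
PARTITION_SECONDS_PER_QUARTER = 10 * 60   # assumed partition budget per quarter

SECONDS_PER_QUARTER = 90 * 86_400
reads_per_quarter = READS_PER_SEC * SECONDS_PER_QUARTER

# Cumulative time spent waiting on quorums during healthy operation (the ELC arm):
ec_wait_hours = reads_per_quarter * (QUORUM_RTT_MS - LOCAL_READ_MS) / 1000 / 3600

print(f"Quorum waiting on a healthy network: {ec_wait_hours:,.0f} request-hours per quarter")
print(f"Time spent partitioned:              {PARTITION_SECONDS_PER_QUARTER / 3600:.2f} hours per quarter")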

The four PACELC categories that most real systems land in:

  • PA/EL — when partitioned, give up consistency for availability; when healthy, give up consistency for latency. Examples: Cassandra (default), DynamoDB (default), Riak. The whole system is tuned for low latency and accepts staleness.
  • PC/EC — when partitioned, give up availability for consistency; when healthy, give up latency for consistency. Examples: Spanner, etcd, ZooKeeper, CockroachDB in its default configuration. Always linearizable, pays the round-trip cost on every read.
  • PA/EC — when partitioned, give up consistency for availability; when healthy, pay for consistency. Rare. The classic example is MongoDB with writeConcern: majority — strong on writes, can read stale during partitions.
  • PC/EL — when partitioned, give up availability for consistency; when healthy, give up consistency for latency. Examples: PNUTS (Yahoo's geo-distributed store), CockroachDB with follower reads enabled, DynamoDB with strongly-consistent reads opted-into per-call.
PACELC 2x2 grid showing the four categories with example systems. One axis is the partition arm (PA: keep available, lose consistency; PC: keep consistent, lose availability), the other is the else arm (EL: low latency, stale OK; EC: pay the quorum cost, strong everywhere). PA/EL ("fast and loose, always"): Cassandra (default), DynamoDB (default), Riak. PC/EL ("fast normally, refuse on partition"): PNUTS, CockroachDB with follower reads, DynamoDB with per-call strongly-consistent reads. PA/EC ("strong normally, diverge on partition"): MongoDB with majority writes. PC/EC ("strong always, pay every time"): Spanner, etcd, ZooKeeper, CockroachDB (default). The diagram is illustrative; most systems can be tuned across cells per query, so the cell is the default, not a permanent label.
Spanner pays cross-region quorum latency on every committed write so that even on a healthy network it remains linearizable. Cassandra at CONSISTENCY_LEVEL=ONE reads from one local replica and accepts whatever staleness that costs. Most production databases sit somewhere in this grid by default and let you opt out per-call.

The latency cost of consistency, measured

The ELC trade-off is concrete: a strongly-consistent read costs at least one round-trip to a quorum of replicas. If your replicas are spread across availability zones, that is intra-region RTT (~1 ms). Across regions, that is WAN RTT (~28 ms within India, ~180 ms cross-continent). Local stale reads cost zero network. The simulation below makes the cost visible.

# pacelc_sim.py — measuring the EL vs EC tail under no partition
import random, statistics

# Three replicas. RTTs are realistic for an ap-south-1 multi-AZ deployment.
RTT_MS = {("A","B"): 1.2, ("A","C"): 1.4, ("B","C"): 1.1,
          ("B","A"): 1.2, ("C","A"): 1.4, ("C","B"): 1.1}

def quorum_read_latency(coord, replicas):
    # Strongly-consistent read: the coordinator counts itself toward the majority (2 of 3),
    # so it only has to wait for the fastest ack from one of the two remote replicas.
    rtts = sorted(RTT_MS[(coord, p)] + random.gauss(0, 0.3) for p in replicas if p != coord)
    return max(0.05, rtts[0])  # wait for that single remote ack; floor at 50 us

def local_read_latency(coord):
    # Eventually-consistent read: just read local memory, no network.
    return 0.05 + abs(random.gauss(0, 0.02))  # ~50 us, tiny jitter

def staleness_ms_for_local_read(replication_lag_ms):
    # Replication lag determines the window in which a local read can be stale.
    return replication_lag_ms

if __name__ == "__main__":
    random.seed(42)
    ec_lats = [quorum_read_latency("A", ["A","B","C"]) for _ in range(10000)]
    el_lats = [local_read_latency("A") for _ in range(10000)]
    rep_lag = 4.5  # measured replication lag p99 in ms

    print(f"EC (quorum) read p50={statistics.median(ec_lats):.2f} ms  p99={sorted(ec_lats)[9900]:.2f} ms")
    print(f"EL (local)  read p50={statistics.median(el_lats):.3f} ms  p99={sorted(el_lats)[9900]:.3f} ms")
    print(f"EL staleness window: up to {staleness_ms_for_local_read(rep_lag):.1f} ms behind leader")
    print(f"Cost of choosing EC over EL at p99: {sorted(ec_lats)[9900] - sorted(el_lats)[9900]:.2f} ms per read")
    print(f"At 50,000 reads/sec, EC adds {(sorted(ec_lats)[9900] - sorted(el_lats)[9900]) * 50000 / 1000:.0f} ms-of-latency-budget per second")

Sample output:

EC (quorum) read p50=1.24 ms  p99=2.07 ms
EL (local)  read p50=0.060 ms  p99=0.122 ms
EL staleness window: up to 4.5 ms behind leader
Cost of choosing EC over EL at p99: 1.95 ms per read
At 50,000 reads/sec, EC adds 97 ms-of-latency-budget per second

The numbers above are intra-region, with replicas in different AZs. Move the replicas across regions and the EC p99 jumps to 28-180 ms while the EL p99 stays at 0.122 ms. The staleness window — how far behind the leader an EL read can be — is bounded by the replication lag, which on a healthy network is small (single-digit milliseconds for synchronous replication, tens to hundreds for async). The PACELC question is: do you pay 28 ms per read for guaranteed freshness, or accept up to 5 ms of staleness for free? For a wallet-balance display, EL is right. For a fraud-rule check that decides whether to approve a ₹4 lakh transaction, EC is right.

Why the answer is per-query, not per-database: the same Cassandra cluster serves both PaySetu's wallet-balance display (EL — staleness costs the user nothing visible) and PaySetu's fraud-rule check (EC — staleness lets a fraudulent transaction slip through). Cassandra's consistency_level is set per-query precisely so that one cluster can serve both. Treating the database as monolithically EL or EC forces the wrong trade-off on one workload. PACELC's value is naming the dial; it is your job to set the dial appropriately for each call.
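What the per-query dial looks like in code: a minimal sketch using the DataStax Python driver, assuming a hypothetical contact point, keyspace, and table; consistency_level is the real per-statement parameter the driver exposes.

# per_query_consistency.py: one cluster, two PACELC cells chosen per call (names hypothetical)
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

session = Cluster(["10.0.0.1"]).connect("paysetu")   # hypothetical contact point and keyspace

# EL path: the wallet-balance display reads one local replica and accepts bounded staleness.
display_read = SimpleStatement(
    "SELECT balance FROM wallets WHERE user_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)

# EC path: the fraud-rule check waits for a quorum and pays the round-trip for freshness.
fraud_read = SimpleStatement(
    "SELECT balance FROM wallets WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)

balance_for_ui = session.execute(display_read, ["user-42"]).one()
balance_for_fraud = session.execute(fraud_read, ["user-42"]).one()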

Real systems on the PACELC grid

The grid is not abstract — every distributed database has a defensible cell. Naming the cell forces you to read the docs honestly.

Spanner — PC/EC. Google's globally-distributed database is the canonical PC/EC example. Every committed write goes through Paxos across replicas in different regions, paying the cross-region RTT (Google publishes 5-7 ms commit latency intra-continent, ~80 ms trans-continental). Writes additionally pay a commit-wait of roughly one TrueTime uncertainty interval (~7 ms) before their effects become visible, so that timestamp order matches real-time order. Spanner trades latency for global linearizability on every operation, partition or not. The PACELC cell is honest because the docs are honest: Spanner does not pretend to be low-latency.

DynamoDB — PA/EL by default, PC/EL with ConsistentRead=true. The default DynamoDB read returns the value from one of the three replicas the partition key hashes to — fast (single-digit ms in-region) but eventually consistent (replication lag p99 ~300 ms across replicas during heavy load). Pass ConsistentRead=true and DynamoDB upgrades the read to a quorum read of two replicas, paying ~2x the latency for linearizable freshness. The cell is per-call: developers pick PA/EL or PC/EL on each GetItem.
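In code, the per-call choice is a single flag. A sketch with boto3, where the table name and key schema are hypothetical and ConsistentRead is the real GetItem parameter:

# dynamodb_reads.py: the same item read in two PACELC cells (table and key hypothetical)
import boto3

dynamodb = boto3.client("dynamodb", region_name="ap-south-1")
key = {"user_id": {"S": "user-42"}}

# Default read: eventually consistent, served by one replica, cheapest and fastest.
stale_ok = dynamodb.get_item(TableName="wallet_balances", Key=key)

# Strongly-consistent read: quorum of replicas, roughly double the latency and read cost.
fresh = dynamodb.get_item(TableName="wallet_balances", Key=key, ConsistentRead=True)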

Cassandra — PA/EL with tunable knobs. consistency_level=ONE is the EL end: a local read, possibly stale. consistency_level=QUORUM (with R+W>N) pays the majority round-trip on every request, partition or not, which lands it in PC/EC. consistency_level=LOCAL_QUORUM is region-scoped: fast intra-region, but it does not guarantee cross-region freshness. Cassandra does not have a single PACELC cell; it has a menu of consistency levels, and the cell you land in depends on the level you pass per query.

Why this per-query framing changed how teams pick databases: between 2010 and 2018, "we use Cassandra so we are eventually consistent" was a sentence engineering teams said in design reviews. After PACELC entered the vocabulary, the same sentence became a smell — Cassandra's consistency level is per-query and the team's data model probably needed at least two cells (a PA/EL cell for analytics-style reads and a PC/EC cell for ledger-style writes). PACELC turned the database choice into a per-call configuration discipline, which is harder than picking one database and forgetting, but also closer to the truth of what production traffic needs.

EC read latency vs EL read latency across replica topologies. A horizontal bar chart comparing EC (quorum) and EL (local) read latency across three topologies: same-AZ replicas (EC ~1 ms, EL ~0.1 ms), cross-AZ replicas within one region (EC ~2 ms, EL ~0.1 ms), and cross-region replicas Mumbai-Singapore-Frankfurt (EC ~28 ms for the Mumbai-Singapore quorum, EL ~0.1 ms for the local read). The diagram is illustrative; EL latency is bounded below by the local in-memory read, while EC scales with the slowest quorum reply.
The same query — read one row by primary key — costs 200x more on a cross-region cluster if you ask for EC than if you ask for EL. The cost of the EL choice is not latency but staleness, bounded by the replication lag.
Wall-clock fraction in the P arm vs the E arm for a typical production cluster. A horizontal stacked bar for one quarter of a typical intra-region cluster: healthy operation (the E arm) is roughly 99.97% of wall-clock time, and partitions (the P arm) roughly 0.03%, shown as a magnified inset. ELC tuning operates on the 99.97% slice, on every read and write of every healthy second; PAC tuning operates on the 0.03% slice. Illustrative; real partition rates vary, from minutes per quarter intra-AZ to tens of minutes per quarter cross-region.
The CAP-style PAC arm of PACELC fires for a few minutes per quarter on a well-engineered cluster. The ELC arm fires every nanosecond the system is up. PACELC's contribution is naming the dial that is being turned during the other 99.97% of wall-clock time.

How PACELC reframes a real production decision

KapitalKite, a fictional discount stockbroker, ran into PACELC on the order-book service in 2024. The order book had to be the same across three regions (Mumbai, Singapore, Frankfurt) so that an order placed by a Singapore client could not execute against a Mumbai order at a stale price. The team's first instinct: deploy CockroachDB in default config (PC/EC) so every read is linearizable. They did. p99 read latency went to 92 ms. The order-book API SLO was 20 ms p99. The system shipped to staging and immediately failed the latency test. The reflex move was to "tune Cockroach" — they tried index hints, query rewrites, connection pooling — none of it helped, because the latency floor was the cross-region Raft consensus round-trip, which no amount of tuning can eliminate.

The PACELC reframe: the read path did not need EC. It needed PC/EL. CockroachDB's follower reads, configured for ≤5 second staleness, served the order-book reads from the local region in 2 ms p99. Writes still went through Raft consensus at 28 ms because writes must be PC/EC for an order book. The system shipped, the SLO was met, and the team learned to ask "EL or EC?" on every read path, not just on writes. The post-mortem labelled the original design "CAP-monolithic" — they had picked one cell for the whole database when the database wanted four cells, one per call.
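A sketch of what the fixed read path can look like from application code, via CockroachDB's Postgres wire protocol. The DSN and table are hypothetical; AS OF SYSTEM TIME is the real clause that turns a query into a bounded-staleness read eligible to be served by a nearby replica.

# order_book_read.py: EL reads against CockroachDB, EC writes untouched (connection details hypothetical)
import psycopg2

conn = psycopg2.connect("postgresql://app@cockroach-mumbai:26257/exchange")

with conn.cursor() as cur:
    # Read path: served locally, at most ~5 seconds stale.
    cur.execute(
        "SELECT price, qty FROM order_book AS OF SYSTEM TIME '-5s' WHERE symbol = %s",
        ("RELIANCE",),
    )
    top_of_book = cur.fetchall()

# The write path stays on the default strongly-consistent route: a plain INSERT or UPDATE
# goes through cross-region consensus, and that part of the latency floor is irreducible.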

Common confusions

  • "PACELC is just CAP with two more letters." PACELC adds an arm CAP does not have. CAP describes only partition behaviour; PACELC also describes healthy-network behaviour. The ELC arm is what most production tuning is about because partitions are rare and healthy operation is constant. PACELC strictly subsumes CAP.
  • "My PA/EL system is faster than your PC/EC system." Not always. PC/EC systems with co-located replicas (3 nodes in the same AZ) pay sub-millisecond quorum costs and feel as fast as PA/EL systems. The latency tax is bounded below by replica RTT, which is small if replicas are close. Spanner is PC/EC and runs at single-digit ms p99 within a region. The cost only becomes large when you stretch replicas across regions.
  • "If I pick PC/EC, I have a strongly consistent system." You have a strongly consistent server. Your client may still cache stale reads, your application may read-modify-write without transactions, and your monitoring dashboards may aggregate over windows that are not linearizable. Linearizability at one layer does not propagate up the stack — see linearizability and read-your-writes.
  • "PACELC tells me which database to pick." It tells you which trade-off each database makes. Picking the database also depends on operational maturity, ecosystem, query language, geo-replication topology, and team skill. PACELC is a single, narrow axis. Use it to disqualify databases whose default cell is wrong for your dominant workload, not to pick winners.
  • "DynamoDB is PA/EL, full stop." DynamoDB is PA/EL on its default reads, PC/EL on ConsistentRead=true, and the write path is PC under sufficient ack levels. DynamoDB is a per-call PACELC menu, not a single cell. The same is true of Cassandra, MongoDB, and CockroachDB.
  • "Eventual consistency is the same as EL." Not exactly. EL describes what the system does on a healthy network — usually a fast local read with some bounded staleness window. Eventual consistency is a guarantee about what happens if writes stop and the system is left to converge. They are correlated (EL systems are usually eventually consistent at minimum) but not identical. See eventual consistency.

Going deeper

Abadi's 2010 motivation — the PNUTS observation

PACELC was not invented in a vacuum. Abadi was working with Yahoo's PNUTS (the predecessor to many of today's geo-distributed key-value stores) and watching the team make daily decisions that CAP could not name. PNUTS replicated data across geo-regions and gave applications a per-record choice: "timeline consistent" (always serve from the master record region — high latency, strong consistency) or "eventually consistent" (serve from any region — low latency, possibly stale). This was the ELC dial in production, every day, on healthy networks, and CAP had no language for it. Abadi's contribution was to name the second arm, observe that PA/EL systems and PC/EC systems were the most common, and point out that the PA/EC and PC/EL corners are both legitimate and chosen by real systems. The 2012 IEEE Computer paper formalised this and connected it to specific systems — PNUTS, Cassandra, Dynamo, BigTable, MongoDB, VoltDB.

The boundary between EL and EC is a tunable, not a permanent setting

Most modern databases give you a per-query knob to slide between EL and EC. CockroachDB's AS OF SYSTEM TIME enables follower reads with bounded staleness. DynamoDB's ConsistentRead=true upgrades a read from EL to EC. Cassandra's consistency_level parameter takes values from ONE (EL) through LOCAL_QUORUM (regional EC) to ALL (full EC across replicas). Spanner's staleness clause on read-only transactions chooses how stale a read can be. The PACELC cell is the default for the database; the per-query knob lets you opt into the other cell. PaySetu's wallet-balance display reads at CONSISTENCY_LEVEL=ONE; PaySetu's transaction commit reads at CONSISTENCY_LEVEL=QUORUM. Same cluster, two PACELC cells.
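The same knob in Spanner's Python client, for completeness. The instance and database IDs are hypothetical; exact_staleness on a read-only snapshot is the real parameter that trades freshness for a cheaper, possibly local read.

# spanner_staleness.py: strong vs bounded-staleness reads (instance and database IDs hypothetical)
import datetime
from google.cloud import spanner

database = spanner.Client().instance("paysetu-inst").database("wallets")

# Strong read: the default snapshot, externally consistent, pays the coordination cost.
with database.snapshot() as snap:
    strong = list(snap.execute_sql("SELECT Balance FROM Wallets WHERE UserId = 'user-42'"))

# Stale read: guaranteed at most 10 seconds behind, can be served by the nearest replica.
with database.snapshot(exact_staleness=datetime.timedelta(seconds=10)) as snap:
    stale = list(snap.execute_sql("SELECT Balance FROM Wallets WHERE UserId = 'user-42'"))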

TrueTime collapses the EC cost — partially

Spanner's TrueTime API gives every machine a globally-bounded uncertainty interval on the wall-clock time. With this primitive, Spanner can perform externally-consistent reads without contacting other replicas — by waiting out the uncertainty interval (commit-wait), the system guarantees that any read at timestamp t happens after every write that committed at timestamps before t. The cost is no longer a quorum RTT; it is the TrueTime uncertainty interval (~7 ms). This is a clever way to convert the EC tax from a network round-trip into a clock-uncertainty wait — and it works because Google's GPS-and-atomic-clock infrastructure makes the uncertainty interval small. Without that infrastructure (most clouds), TrueTime-style externally-consistent reads are not feasible, and you are back to paying the quorum RTT for EC. See truetime, spanner, and physical-logical hybrids.
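A toy model of that conversion. It assumes, as the article does, a TrueTime interval roughly 7 ms wide; the helper functions are illustrative, not Spanner's implementation.

# truetime_wait.py: commit-wait as a clock-uncertainty cost instead of a quorum RTT (toy model)
INTERVAL_WIDTH_MS = 7.0   # assumed total width of the TrueTime interval [earliest, latest]

def tt_now(true_time_ms):
    half = INTERVAL_WIDTH_MS / 2
    return (true_time_ms - half, true_time_ms + half)   # interval guaranteed to contain true time

def commit_wait(assign_ms):
    # Spanner-style rule: take the commit timestamp s from the top of the interval,
    # then wait until TT.now().earliest has passed s before making the write visible.
    s = tt_now(assign_ms)[1]
    t = assign_ms
    while tt_now(t)[0] <= s:
        t += 0.01             # simulate true time advancing in 10 us steps
    return t - assign_ms

if __name__ == "__main__":
    print(f"commit-wait ~= {commit_wait(0.0):.1f} ms, paid in clock uncertainty rather than network round-trips")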

When PACELC itself is the wrong frame

For some systems, even PACELC is too narrow. CRDTs (G-counters, OR-sets — see Part 13) deliberately give up linearizability and instead promise strong eventual consistency: any two replicas that have received the same set of updates are in the same state, regardless of order. CRDTs are PA/EL by the PACELC label, but they provide a richer correctness guarantee that PACELC does not name. Stream processing systems care about exactly-once delivery and watermark-based windowing, not linearizability vs latency. Bounded-staleness reads (read at most K seconds behind) are neither EL nor EC; they are a hybrid that PACELC accommodates only with a footnote. PACELC is a useful frame for read-write KV stores; it is not a universal lattice of distributed-systems trade-offs.

CricStream's PACELC mismatch

CricStream — a fictional cricket-streaming service — stores per-user view-counters in DynamoDB. The team enabled ConsistentRead=true on every read because the engineering lead "wanted strong consistency for analytics". This pushed every read onto the strongly-consistent path on a per-call basis; the per-read cost roughly doubled, and the team breached the latency SLO during the IPL final at 25 million concurrent viewers. The fix was to read at the default ConsistentRead=false — eventually consistent, ~300 ms staleness, but 2x faster — and accept that the view counter on a user's screen may be 300 ms behind the leader. No user has ever complained that their view-count was 300 ms stale. The PACELC reframe: analytics needs eventually correct aggregates, not linearizable per-event counts. The team had been paying for EC where EL was both correct and faster.

Reproduce this on your laptop

python3 -m venv .venv && source .venv/bin/activate   # optional — the script uses only the standard library
python3 pacelc_sim.py
# Tweak RTT_MS to simulate cross-region replicas (28 for Mumbai-Singapore, 180 for Mumbai-Frankfurt) and re-run.

Where this leads next

PACELC sits at the head of Part 12's consistency-modelling chapters. The cells of the grid resolve into specific consistency models in the chapters that follow.

Part 13's CRDT chapters tackle the post-divergence reconciliation that the PA arm of PACELC leaves open. Part 14's distributed-transaction chapters revisit PC/EC under the harder constraint of multi-key atomicity. Part 17's geo-distribution chapters revisit ELC across continents, where the L cost is no longer milliseconds but tens of milliseconds, and the EC tax becomes the dominant line-item in the latency budget.

References

  • Abadi, D. — "Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story" (IEEE Computer, 2012). The PACELC paper.
  • Abadi, D. — "Problems with CAP, and Yahoo's little known NoSQL system" (DBMS Musings blog, 2010). Where PACELC was first sketched, with the PNUTS motivating example.
  • Brewer, E. — "CAP Twelve Years Later: How the Rules Have Changed" (IEEE Computer, 2012). Brewer himself acknowledging the gaps PACELC fills.
  • Corbett, J. et al. — "Spanner: Google's Globally-Distributed Database" (OSDI 2012). The canonical PC/EC system, with TrueTime as the latency-cost reduction trick.
  • DeCandia, G. et al. — "Dynamo: Amazon's Highly Available Key-Value Store" (SOSP 2007). The canonical PA/EL system.
  • Cooper, B. et al. — "PNUTS: Yahoo!'s Hosted Data Serving Platform" (VLDB 2008). The PC/EL system that motivated PACELC.
  • Bailis, P. & Ghodsi, A. — "Eventual Consistency Today: Limitations, Extensions, and Beyond" (CACM 2013). Empirical look at the EL arm.
  • Kleppmann, M. — Designing Data-Intensive Applications, Chapter 9. The clearest book-length treatment of consistency trade-offs.
  • The CAP theorem and its misuse — the framework PACELC refines.
  • Eventual consistency — what EL settles to in the absence of writes.