Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
Wall: CRDTs don't solve transactions
Aditi at PaySetu is six weeks into a CRDT migration. The wallet-balance service, previously backed by a single-leader Postgres, now runs as a pure PN-Counter on a five-region Riak-style store. Convergence works. Latency dropped from 38 ms p99 to 4 ms p99 because reads are local. Then a fraud-ops engineer files a ticket: a user with a balance of ₹200 made two withdraw(₹150) calls — one from a Mumbai region edge, one from a Singapore region edge — and both succeeded. The user walked away with ₹300 of value backed by ₹200 of money. The PN-Counter shows -100. The CRDT converged. The accounting did not. Aditi is staring at the screen wondering what part of "conflict-free replicated data type" was supposed to prevent this.
CRDTs guarantee that concurrent writes to a single object converge to a deterministic value without coordination. They do not guarantee invariants that span multiple objects, conditional writes ("only if balance ≥ ₹500"), or atomicity across multiple operations. The wall is structural: any property that requires you to reject a write — overdraft prevention, uniqueness constraints, double-spend protection — is exactly what a CRDT's join-irreducibility forbids. You cannot retrofit transactions onto a CRDT; you have to add coordination back, and once you do, you have given up the very thing CRDTs were chosen for.
The CRDT contract — convergence, not correctness
Shapiro et al.'s 2011 paper defined a CRDT as a join-semilattice with a commutative, associative, idempotent merge. The contract is precisely: for any replicas with states s_A, s_B, s_C, merge(s_A, s_B) = merge(s_B, s_A), merge(s_A, merge(s_B, s_C)) = merge(merge(s_A, s_B), s_C), and merging the same state twice is the same as merging it once. That contract gives you convergence: replicas that exchange enough messages eventually agree on the value of the object.
Read the contract again. It says nothing about what they agree on, only that they agree. A G-Counter that started at 0, received +5 on replica A and +3 on replica B, will converge to 8 everywhere. The contract is satisfied. If the application semantic was "this counter must never exceed 7", the CRDT just violated it — but the CRDT didn't promise to enforce that. It promised convergence, and it delivered.
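The G-Counter scenario above fits in a few lines. This is a minimal sketch for illustration, not any particular library's API; the "never exceed 7" invariant is the hypothetical application rule from the text:

```python
from collections import defaultdict

class GCounter:
    """Grow-only counter: per-replica counts, entry-wise max merge, sum for the value."""
    def __init__(self):
        self.counts = defaultdict(int)

    def incr(self, replica: str, amount: int = 1):
        self.counts[replica] += amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter"):
        for r, v in other.counts.items():
            self.counts[r] = max(self.counts[r], v)

A, B = GCounter(), GCounter()
A.incr("A", 5)          # concurrent +5 at replica A
B.incr("B", 3)          # concurrent +3 at replica B
A.merge(B); B.merge(A)  # anti-entropy in both directions
assert A.value() == B.value() == 8
# Converged to 8 everywhere: the contract is satisfied, and the
# application rule "must never exceed 7" is silently violated.
```

The merge never consults the invariant; it cannot, because the invariant exists only in the application's head, not in the lattice.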
Why convergence is fundamentally weaker than correctness: convergence is a property of the mathematics — for any two states in the lattice, there is a unique join. Correctness is a property of the application — "the balance must not go negative", "two users cannot have the same username", "the transfer must move money atomically". An application invariant is a constraint on the set of acceptable states; the CRDT's lattice is the set of all states the merge can produce. Unless the application invariant happens to coincide with the lattice's join structure (rare), the CRDT can converge to a state the application would reject. The CRDT's convergence is necessary for AP systems, but not sufficient for transactional ones.
The three things a transaction is, and why none of them are CRDT-able
A transaction in the database sense (ACID's first three letters, ignoring durability) gives you three properties that a CRDT structurally cannot:
Atomicity across multiple objects. "Decrement account X by ₹500 and increment account Y by ₹500 — do both, or neither." A CRDT operates on one object's state; there is no join function that spans X and Y. You can pretend by wrapping both in a "transfer" CRDT, but now you've redefined the granularity, and any other operation on X or Y is outside your transaction's reach. The composition does not survive.
Conditional writes (Consistency in ACID terms — invariant preservation). "Withdraw ₹150 only if the balance is ≥ ₹150." This is a read-modify-write with a guard. CRDTs are designed to absorb writes unconditionally — the merge is total over the lattice. Conditional rejection means some merges produce ⊥ ("aborted") and the lattice is no longer a join-semilattice. The whole reason CRDTs are coordination-free is that you never need to ask another replica "is this write OK?". Once you do, you've reintroduced the round-trip you were trying to remove.
Isolation across concurrent transactions. "If two transfers run concurrently, the result is as if one ran fully before the other." Serialisability is a property of an execution order; CRDTs are deliberately order-independent. Two concurrent withdrawals on the same balance, from different replicas, in a CRDT system, will both execute fully on their own replica and the merge will reconcile. There is no scheduler that says "run withdraw_A first, then check, then run withdraw_B". The lack of a scheduler is the feature, and isolation requires a scheduler.
# A PN-Counter "withdraw" that ignores the balance — and what goes wrong.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class PNCounter:
    pos: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    neg: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def value(self) -> int:
        return sum(self.pos.values()) - sum(self.neg.values())

    def deposit(self, replica: str, amount: int):
        self.pos[replica] += amount

    def withdraw(self, replica: str, amount: int):
        # The CRDT's increment on the negative side is unconditional.
        # There is no place to put a "check balance first" guard that
        # works across replicas — by the time you check, another replica
        # may already have spent the money.
        self.neg[replica] += amount

    def merge(self, other: "PNCounter"):
        for r, v in other.pos.items():
            self.pos[r] = max(self.pos[r], v)
        for r, v in other.neg.items():
            self.neg[r] = max(self.neg[r], v)

# Wallet starts at ₹200 (a single deposit on replica A).
A = PNCounter(); B = PNCounter()
A.deposit("A", 200); B.merge(A)
print(f"start: A={A.value()} B={B.value()}")

# Two concurrent withdrawals — neither replica can see the other's withdrawal yet.
A.withdraw("A", 150)  # at A: balance "appears" 50 (uses local view only)
B.withdraw("B", 150)  # at B: balance "appears" 50 (uses local view only)
print(f"after concurrent withdraws (pre-merge): A={A.value()} B={B.value()}")

# Anti-entropy reconciles. The PN-Counter's merge is correct — it converges.
A.merge(B); B.merge(A)
print(f"converged: A={A.value()} B={B.value()}")
print(f"user spent: ₹300 of services backed by: ₹200 in the wallet")
Output:
start: A=200 B=200
after concurrent withdraws (pre-merge): A=50 B=50
converged: A=-100 B=-100
user spent: ₹300 of services backed by: ₹200 in the wallet
Why the CRDT is doing exactly what it promised: the merge takes the per-replica max of positives and per-replica max of negatives, then sums. Replica A's negative is 150, replica B's negative is 150, the sum of negatives is 300, the sum of positives is 200, the value is -100. The math is right; it's the invariant ("balance ≥ 0") that's wrong to expect the CRDT to enforce. A local guard would not have saved you either: at the moment each replica fired its withdrawal, its local value() still read ₹200 — a view that did not reflect the other replica's in-flight withdrawal — so a check like value() >= 150 would have passed on both sides. There is no atomic check-and-decrement in a coordination-free system. That's the wall.
The per-line walkthrough:
- self.neg[replica] += amount in withdraw is unconditional. A guard like if self.value() >= amount: ... would only consult the local view — and that's the bug we just demonstrated. There is no global view at write time.
- merge takes per-replica max because each replica tracks its own monotonically-growing neg count. This is the join structure of the PN-Counter lattice. It is correct; it converges; it just doesn't know about ₹0 floors.
- The output -100 is not a CRDT failure mode — it's the CRDT working perfectly. The application's invariant assumed "concurrent withdrawals will be serialised somewhere"; the CRDT made no such promise.
The escape patterns — and what each one costs
If you must enforce an invariant, you have four roads and you must pick one. Each gives back the property you want by giving up the property CRDTs gave you for free.
Escape 1 — coordinate the conflicting operations
Run withdrawals through a leader (Raft, Paxos, single-leader replication for that key). Reads stay CRDT and stay fast; writes that could violate the invariant pay the consensus round-trip. PaySetu eventually did this for the wallet: balances are CRDT-readable from any region (4 ms p99), but withdrawals route to the user's home-region leader (38 ms p99). The home region acts as the serialisation point. The cost: you've reintroduced a single point of write availability per user, and you've given up "any region can write" — the original CRDT win.
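The serialisation point can be sketched in-process. This is an illustration of the shape of Escape 1, not PaySetu's actual code: a lock stands in for the per-user home-region leader's single-threaded log, which in a real deployment would be a Raft or Paxos group.

```python
import threading

class LeaderWallet:
    """Escape 1 sketch: every invariant-bearing write goes through one
    serialisation point (the user's home-region leader). The lock models
    the leader's ordered log; reads elsewhere can stay CRDT-fast."""
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount: int) -> bool:
        with self._lock:  # check-and-decrement is now one atomic step
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False  # the second concurrent withdrawal is rejected here

w = LeaderWallet(200)
print(w.withdraw(150))  # True
print(w.withdraw(150))  # False: the overdraft from the opening story is impossible
print(w.balance)        # 50
```

The cost shows up outside this sketch: reaching the leader from a remote region is the consensus round-trip, and if the leader's region is partitioned away, writes block.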
Escape 2 — escrow / reservation
Reserve the resource before spending it. Each region pre-claims a slice of the user's balance — say ₹50 each — and can spend within that slice without coordination. Slices replenish in the background via a slow consensus loop. This works when the workload's per-write size is much smaller than the resource pool, and when you can tolerate "this region temporarily can't spend more than its escrow even though there's idle balance elsewhere". KapitalKite uses this pattern for daily trading limits: each region holds 1/N of the user's daily limit; if a region runs out, it pulls more from the central limit pool over a 200 ms RPC.
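A toy version of the reservation idea. EscrowWallet and the slice arithmetic are illustrative, not KapitalKite's implementation; the coordinated replenishment loop is elided:

```python
class EscrowWallet:
    """Escape 2 sketch: the balance is pre-split into per-region slices.
    Each region spends only its own slice, so no per-spend coordination
    is needed; pulling more escrow is the slow, coordinated step (elided)."""
    def __init__(self, total: int, regions: list[str]):
        slice_amount, remainder = divmod(total, len(regions))
        self.escrow = {r: slice_amount for r in regions}
        self.escrow[regions[0]] += remainder  # hand any remainder to one region

    def withdraw(self, region: str, amount: int) -> bool:
        # Local check against the local slice only -- safe without
        # coordination, because no other region can touch this slice.
        if self.escrow[region] >= amount:
            self.escrow[region] -= amount
            return True
        return False  # would need a coordinated escrow top-up to proceed

w = EscrowWallet(200, ["mumbai", "singapore"])
print(w.withdraw("mumbai", 150))    # False: mumbai's slice is only 100
print(w.withdraw("mumbai", 80))     # True: within the slice
print(w.withdraw("singapore", 80))  # True: within its own slice
```

Total spend can never exceed ₹200, which is the invariant back; the last line of the first print shows the price: a region can refuse a spend even while idle balance sits in another region's slice.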
Escape 3 — bounded counter / probabilistically-bounded staleness
Allow the invariant to be violated by a bounded amount, then reconcile. "Balance can go up to ₹100 negative; the system will detect and pause the user when they reach -₹100." This is acceptable for, say, a metered-usage service where the dispute window is short and the customer relationship is recoverable. It is not acceptable for fraud-prone or regulated balances. The bound is a deliberate slack — the application accepts being wrong by a little to be coordination-free.
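The reconciliation policy behind Escape 3 is small enough to sketch. The -₹100 bound and the action names are illustrative policy, not a real API:

```python
# Escape 3 sketch: the invariant "balance >= 0" may be violated by up to
# |BOUND| between anti-entropy rounds; reconciliation detects and reacts.
BOUND = -100  # deliberate slack accepted in exchange for zero coordination

def reconcile(converged_balance: int) -> str:
    """Runs after merge, on the converged value -- never on a local view."""
    if converged_balance <= BOUND:
        return "pause_user"       # stop further spends, begin recovery
    if converged_balance < 0:
        return "flag_for_review"  # within the accepted slack
    return "ok"

print(reconcile(-100))  # pause_user
print(reconcile(-40))   # flag_for_review
print(reconcile(25))    # ok
```

The key design choice is that the check runs on the converged value after merge, so it is eventually reliable; what it cannot do is stop the violation before it happens.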
Escape 4 — give up CRDT for that key
Decide that this particular state is transactional and put it in a serialisable store. Cart contents stay CRDT (genuinely commutative); the wallet balance moves to a single-leader Spanner-style row. Two stores, two consistency models, one application. This is the most common production answer once teams admit the CRDT can't carry the invariant. The cost is operational: now you have two systems to monitor, two failure profiles to reason about, two consistency stories to explain to the on-call team.
When the invariant is lattice-compatible — the rare happy case
Some invariants do live inside a CRDT lattice naturally. Set membership ("a username, once registered, stays registered") is just an OR-Set's add — there's no operation that violates it because there's no operation that removes a username forcibly. Non-decreasing counters ("total page views over time") fit a G-Counter. Append-only logs ("once written, immutable") are an OR-Set of (timestamp, payload) pairs. CRDT-native data products — collaborative editors, offline-first todo apps, presence indicators, Liveblocks-style cursors — were designed around what CRDTs can natively express, and they avoid the wall by design.
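The username example can be verified mechanically. A minimal sketch (names are illustrative) using plain sets to stand in for a grow-only set, whose join is union:

```python
# Monotone invariants survive the join: "a registered username stays
# registered" holds because the grow-only-set merge is set union,
# and union only ever adds elements.
A = {"aditi", "ravi"}   # usernames registered via replica A
B = {"aditi", "meera"}  # usernames registered via replica B

merged = A | B  # the G-Set join

assert A <= merged and B <= merged  # nothing a replica registered disappears
print(sorted(merged))               # ['aditi', 'meera', 'ravi']
```

Contrast this with the wallet: here the invariant ("once in, always in") is exactly the monotonicity the lattice already has, so convergence and correctness coincide.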
The wall hits when an existing transactional model (banking, inventory, booking, voting, anything regulated or anything with conservation laws) is retrofitted onto a CRDT because someone read about Riak in a conference talk. The application's invariants were not designed to be lattice-compatible. Trying to make them lattice-compatible after the fact almost always fails.
Production stories — what the wall actually looks like
PaySetu's wallet, the rollback. Aditi's team rolled back the CRDT wallet four months in. The fix was Escape 1 (per-user home-region leader). The rollback cost ₹2.1 crore in engineering time and three months of dual-write reconciliation to repair balances that had drifted negative during the CRDT period. The lesson: never put a conservation law (sum of all wallets must equal sum of deposits minus withdrawals) on a CRDT.
RailWala's seat inventory, the near-miss. RailWala (train booking, Tatkal-hour spike of 80K req/s) considered a CRDT for seat inventory: each seat is a single bit (taken / free), so naively it looks set-like. The proof-of-concept failed catastrophically at 12 simulated bookings on the same seat — all 12 booking flows succeeded against their local replicas, and the convergence step had to "decide" who actually got the seat. There is no fair tie-breaker that doesn't require coordination. They went back to a single-leader Postgres for inventory and only used CRDTs for the read-only seat-availability heatmap.
MealRush's restaurant capacity, the bounded escape. MealRush uses Escape 3 (bounded violation) for restaurant order capacity during lunch-hour surges. Each restaurant is configured with max_concurrent_orders=120 but each region's edge-CRDT can over-accept by up to 5 before the central reconciliation kicks in. They deliberately accept up to 5 over-orders per restaurant per region per minute as the cost of zero-coordination booking. The over-orders are absorbed by kitchen surge buffer; if a restaurant exceeds its surge buffer, the next region's capacity is throttled. Kitchen ops manually flagged 0.4% of orders as "we couldn't fulfil" during last Diwali — bounded, manageable, recoverable.
Discord's per-channel message log. Discord (foreign company, OK to name) explicitly does not use CRDTs for the message history. They use a Cassandra-based scribe log with single-shard ordering per channel. CRDTs in their stack are limited to ephemeral state — typing indicators, presence, voice-channel participation — where order-of-arrival doesn't matter. Their engineering blog is explicit: "we considered CRDTs and rejected them for any state that has an audit trail or a regulator". The wall is not a Discord-specific finding; it's the structural one.
Common confusions
- "CRDTs solve the consistency problem; you just have to pick the right CRDT." No CRDT solves multi-object atomicity. You can pick a CRDT that handles single-object convergence in a way that fits your data — an OR-Set for set-of-things, an LWW-Register for last-write semantics — but no CRDT exists that lets you say "decrement X and increment Y together, or neither". That property is definitionally outside the CRDT model.
- "PN-Counters can prevent overdraft if you check the value before the increment." The check is local-only. Two concurrent local checks can both pass and both result in the increment. The CRDT's value() reflects what this replica has merged in so far; it does not reflect concurrent writes happening elsewhere. The wall is structural, not a missing helper method.
- "You can layer 2PC on top of CRDTs." You can layer 2PC (or Paxos Commit) using CRDTs as the underlying store, but the 2PC layer itself reintroduces all the coordination CRDTs were avoiding — leader, prepare round-trip, blocking on coordinator failure. At that point the CRDT is a side detail; the coordination protocol is doing the work.
- "Operation-based CRDTs let you do conditional ops." They let you encode the operation (e.g., withdraw_if_balance_geq), but the conditionality is checked per-replica when the op is applied locally. A replica that applies withdraw_if_balance_geq(150) and sees local balance ≥ 150 will accept; another replica concurrently applying the same op will also accept. The convergence rule (commutative-associative-idempotent) doesn't know that "both succeeding" violates the intended semantic.
- "Spanner uses CRDTs." Spanner uses 2PC over Paxos groups with TrueTime to bound clock skew. It does not use CRDTs. Newer cousins like CockroachDB and YugabyteDB use various consensus + MVCC schemes; none of them call themselves CRDT systems for transactional state.
- "If I never have concurrent writes, CRDTs work like a regular database." True, but if you never have concurrent writes you didn't need a CRDT — a single-leader system would have given you the same answers and stronger semantics. CRDTs only earn their complexity in the presence of concurrent writes; that's also when the wall matters.
Going deeper
Bailis, Fekete et al. — invariant-based coordination avoidance
The 2014 paper Coordination Avoidance in Database Systems (Bailis et al.) formalises exactly the question of which invariants can be enforced without coordination. They define invariant confluence (I-confluence): an invariant I over a set of operations is I-confluent if any two states satisfying I, when merged, also satisfy I. CRDT lattices are I-confluent for trivial invariants (like "values are integers"); they are not I-confluent for conservation laws or uniqueness constraints. The paper gives a decision procedure — if you can prove your invariant is I-confluent under your operations, you can use a CRDT-style merge; if not, you need coordination. Most real-world transactional invariants fail the test.
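For toy state spaces the I-confluence test can be brute-forced. This sketch simplifies a two-replica PN-Counter state to a four-tuple (pos_A, pos_B, neg_A, neg_B); it illustrates the definition, not the paper's full formalism or decision procedure:

```python
from itertools import product

def merge(s: tuple, t: tuple) -> tuple:
    # PN-Counter-style join: entry-wise max over (pos_A, pos_B, neg_A, neg_B).
    return tuple(max(a, b) for a, b in zip(s, t))

def value(s: tuple) -> int:
    pos_a, pos_b, neg_a, neg_b = s
    return (pos_a + pos_b) - (neg_a + neg_b)

def i_confluent(invariant, states) -> bool:
    # I-confluence: every merge of two I-satisfying states satisfies I.
    ok = [s for s in states if invariant(s)]
    return all(invariant(merge(s, t)) for s, t in product(ok, ok))

states = list(product(range(3), repeat=4))  # tiny exhaustive state space

# Conservation law "balance >= 0": NOT I-confluent. A witness pair:
# (2,0,2,0) and (2,0,0,2) both have value 0, but their merge
# (2,0,2,2) has value -2.
print(i_confluent(lambda s: value(s) >= 0, states))  # False

# A per-entry monotone bound IS I-confluent: max preserves it.
print(i_confluent(lambda s: max(s) <= 2, states))    # True
```

The False result is the wallet story in miniature: two individually legal states merge into an illegal one, which is exactly the shape of invariant the paper says needs coordination.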
The "geo-replicated bank" thought experiment
A standard exercise in the consensus / CRDT literature: simulate a bank with two regions, an initial balance, and a stream of deposits and withdrawals. Run it under (a) single-leader Raft, (b) PN-Counter CRDT, (c) escrow / reservation. Measure: (i) latency p99, (ii) availability under partition, (iii) accounting correctness. The Raft setup is correct but slow under partition (writes to the partitioned region fail). The CRDT setup is fast and partition-tolerant but produces overdrafts. The escrow setup is fast and partition-tolerant within escrow bounds but stranded capacity in the wrong region during burst. There is no fourth column where all three are achieved — the CAP-PACELC tax is real, and the invariant determines where the tax is paid.
Reproduce the overdraft on your laptop
# Reproduce: two concurrent withdrawals on a PN-Counter cause overdraft.
# No third-party dependencies; stdlib only.
from collections import defaultdict
import random

class PNCounter:
    def __init__(self):
        self.p = defaultdict(int); self.n = defaultdict(int)
    def value(self): return sum(self.p.values()) - sum(self.n.values())
    def dep(self, r, a): self.p[r] += a
    def wd(self, r, a):
        # Guarded withdraw -- but the guard only sees the local view.
        if self.value() >= a:
            self.n[r] += a; return True
        return False
    def merge(self, o):
        for r, v in o.p.items(): self.p[r] = max(self.p[r], v)
        for r, v in o.n.items(): self.n[r] = max(self.n[r], v)

random.seed(42)
overdrafts = 0; trials = 1000
for _ in range(trials):
    A = PNCounter(); B = PNCounter()
    A.dep("A", 200); B.merge(A)
    # Both replicas attempt withdraw concurrently with local-only view.
    A.wd("A", 150); B.wd("B", 150)
    A.merge(B); B.merge(A)
    if A.value() < 0: overdrafts += 1
print(f"overdrafts: {overdrafts}/{trials} = {100*overdrafts/trials:.1f}%")
Output:
overdrafts: 1000/1000 = 100.0%
Every trial overdrafts because the two local views never see each other before the withdrawal fires. Why 100% and not less: the experiment runs both wd calls before either merge. There is no quiet period between the writes during which a delta could propagate. In production the rate is lower (most users don't double-spend in the same 80 ms window), but the failure mode is structural — any time two replicas take concurrent withdrawals against a balance that one alone could service but two together cannot, an overdraft happens. The base rate is workload-dependent; the existence of the failure mode is invariant under workload.
Why delta-state CRDTs don't help here either
You might hope that the delta-state optimisation from chapter 89 would mitigate the wall by getting deltas to peers faster. It doesn't. Delta-state CRDTs reduce the size of anti-entropy traffic, not the latency between concurrent writes and merge. Two concurrent withdrawals at t=0 and t=0.1s on different replicas hit local state immediately; the delta still takes 80 ms to propagate. The wall is about what happens during that propagation window, and shrinking the bytes-on-wire from 4 MB to 30 KB doesn't change the 80 ms. The delta-state work fixes a different problem (cost), not this one (correctness under concurrency).
Where this leads next
- /wiki/state-based-vs-operation-based-crdts — the convergence model that this wall sits on top of.
- /wiki/two-phase-commit-and-its-blocking-problem — when you accept the wall and reach for coordination, 2PC is the next stop.
- /wiki/spanner-and-truetime — how Google bought multi-key transactions back at the cost of an atomic-clock fleet.
The next chapter pivots to introducing distributed transactions properly: the failure modes of coordination, the cost of rollback versus the cost of blocking, and why every "exactly-once" claim either relies on a transactional substrate or is an idempotency-key argument in disguise.
References
- Conflict-free Replicated Data Types (Shapiro et al., SSS 2011) — the foundational CRDT paper that defines the convergence contract this article shows the limits of.
- Coordination Avoidance in Database Systems (Bailis et al., VLDB 2014) — the I-confluence framework that formalises which invariants are CRDT-able.
- Eventually Consistent (Vogels, CACM 2009) — the eventual-consistency paper whose "convergence to a value" silence motivates this whole part.
- Spanner: Google's Globally-Distributed Database (Corbett et al., OSDI 2012) — the canonical answer to "how do you keep multi-key transactions when CRDTs can't".
- Don't Settle for Eventual Consistency (Bailis & Ghodsi, CACM 2013) — companion essay on the limits of eventual consistency, with several worked invariant examples.
- Riak Data Types documentation — production CRDT implementation; read the "limitations" sections, not just the API.
- /wiki/deltas-and-optimizations — the delta-state optimisation that handles the cost dimension this wall does not solve.
- /wiki/the-cap-theorem-and-its-misuse — the broader CP-vs-AP framing this wall sits inside.