Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
State-based vs operation-based CRDTs
It is a Tuesday morning at CricStream. The presence-tracker — the green dot that says "live now" next to a viewer count — is a CRDT. The viewer count is a G-Counter that twelve regional edges merge into a global total. An engineer named Devika is staring at a Grafana panel that shows the inter-edge replication bandwidth has tripled since Friday, from 40 MB/s to 120 MB/s. Nothing has changed in the workload. The viewer count looks correct. So why is the network bill suddenly 3× last week's? Because last Friday, a well-meaning intern flipped one config flag — crdt.mode = state — and the system started shipping the entire counter state on every gossip round instead of just the increments. The CRDT is the same data structure either way. The wire protocol is not.
Every CRDT can be implemented in two complementary ways. State-based (CvRDT) ships the entire current value between replicas and merges by a join operation; it tolerates message loss, duplication, and reordering but ships large payloads. Operation-based (CmRDT) ships each operation as a small message and applies it on every replica; payloads are small, but the network layer must provide exactly-once, causally ordered delivery. The choice is a network/storage trade-off — same final value, very different operational profile.
Why a single CRDT has two implementations
A CRDT is defined by its convergence property: replicas that have seen the same set of writes (in any order, possibly with duplicates) must reach the same value. The two ways to make this happen are exactly the two things you can put on the wire.
Option one — send the state. Replica A occasionally takes its current value (a counter, a set, a register) and ships the whole thing to replica B. B applies a merge function that combines its local state with the received state. The merge function is a join in the lattice-theoretic sense: it must be idempotent, commutative, and associative. Idempotent because B might receive A's state twice (no harm done). Commutative because B might merge with C's state first, then A's. Associative because B might merge two received states together before applying. Any function with those three properties guarantees convergence. This is the CvRDT family — Convergent Replicated Data Types.
Option two — send the operation. Replica A receives a write increment(5) from a client. Instead of shipping its new state, A applies the operation locally and broadcasts increment(5) to all other replicas, which each apply it to their own state. The state never travels — only the operations do. For this to converge, the operations themselves must commute (applying increment(3) then increment(5) must give the same result as the reverse), and the network must guarantee that every operation is delivered exactly once to every replica, in causal order. This is the CmRDT family — Commutative Replicated Data Types.
Why both exist when they produce the same final value: the convergence guarantee is identical, but everything around it differs. State-based shifts the cost to the wire (large messages) and the merge function (must handle the full state). Operation-based shifts the cost to the network layer (must guarantee exactly-once causal delivery) and to the message log (operations must be replayed for new replicas joining late). The same logical CRDT — say, a G-Counter — picks one trade-off or the other based on what the surrounding infrastructure already provides.
State-based: the algebra of joins
A state-based CRDT is defined by three things: a value type S, an update function u : S → S (per operation), and a merge function ⊔ : S × S → S. For convergence, (S, ⊔) must form a join-semilattice — a partially ordered set where every pair of elements has a least upper bound. The merge is that least upper bound.
The G-Counter is the canonical example. The state is a vector indexed by replica ID; replica i increments by writing s[i] := s[i] + 1 (only that replica may write to its own slot). The value of the counter is Σ s[i]. The merge is pointwise max: (s ⊔ t)[i] = max(s[i], t[i]). Three quick checks. Idempotent: s ⊔ s = s, because max(s[i], s[i]) = s[i]. Commutative: s ⊔ t = t ⊔ s, because max(a, b) = max(b, a). Associative: (s ⊔ t) ⊔ u = s ⊔ (t ⊔ u), because max is. The lattice is the product of natural-number lattices, and pointwise max is the join. Any merge function satisfying those three laws gives you convergence — that is the entire theoretical contribution of CRDTs.
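Those three checks are mechanical enough to verify by brute force. The snippet below is a standalone sketch (not part of the simulator later in this chapter) that merges randomly generated G-Counter vectors and asserts the three lattice laws:
# gcounter_laws.py — spot-check the join-semilattice laws for the G-Counter merge
import random

def merge(s, t):
    # pointwise max over replica slots — the G-Counter join
    return {k: max(s.get(k, 0), t.get(k, 0)) for k in s.keys() | t.keys()}

def random_state():
    # vectors of varying width, so merges also exercise missing slots
    return {f"r{i}": random.randint(0, 10) for i in range(random.randint(1, 4))}

for _ in range(1000):
    s, t, u = random_state(), random_state(), random_state()
    assert merge(s, s) == s                                # idempotent
    assert merge(s, t) == merge(t, s)                      # commutative
    assert merge(merge(s, t), u) == merge(s, merge(t, u))  # associative
print("lattice laws hold on all sampled states")
Random testing is not a proof, but it catches the most common mistake: writing a merge that is not idempotent, for example summing slots instead of taking the max.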
The OR-Set is the same idea on a richer lattice. The state is (elements, tombstones) where elements ⊆ value × tag and tombstones ⊆ value × tag. Merge is pointwise union: (E₁, T₁) ⊔ (E₂, T₂) = (E₁ ∪ E₂, T₁ ∪ T₂). The visible set is {v : (v, t) ∈ E ∧ (v, t) ∉ T}. Union is idempotent, commutative, and associative — convergence comes free.
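As a concrete sketch of that lattice, here is a minimal tombstone-based OR-Set in the same style — the naive variant, not the optimised one from Bieniusa et al., with fresh tags assumed to come from uuid4:
# orset_state.py — minimal state-based OR-Set (tombstone variant), a sketch
import uuid

class StateORSet:
    def __init__(self):
        self.elements = set()    # {(value, tag)}
        self.tombstones = set()  # {(value, tag)} that have been removed

    def add(self, value):
        self.elements.add((value, uuid.uuid4().hex))  # fresh unique tag per add

    def remove(self, value):
        # tombstone every tagged copy of value that this replica can currently see
        self.tombstones |= {(v, t) for (v, t) in self.elements if v == value}

    def merge(self, other):
        # the join is pointwise union, exactly as in the text
        self.elements |= other.elements
        self.tombstones |= other.tombstones

    def value(self):
        return {v for (v, t) in self.elements if (v, t) not in self.tombstones}

# Concurrent add and remove of the same value: the add the remove never saw survives.
a, b = StateORSet(), StateORSet()
a.add("x"); a.remove("x"); b.add("x")
a.merge(b); b.merge(a)
assert a.value() == b.value() == {"x"}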
Why "tolerates duplicate messages" comes free: idempotence of the merge means that if replica B receives A's state twice, the second merge is a no-op (s ⊔ s = s). Many production gossip layers re-send messages on retry; the receiver does not need to deduplicate. Operation-based CRDTs do not have this property — re-applying increment(5) twice is +10, not +5. The state-based design pushes the dedup problem out of the application and into the algebra.
Operation-based: the algebra of commutation
An operation-based CRDT is defined by an update function and a downstream-apply function. When a client invokes an update, the local replica does two things: applies the update to its own state, and broadcasts a representation of the operation to all other replicas. Other replicas, on receipt, apply the operation to their own state. For convergence, the operations must commute: applying op₁ then op₂ must yield the same state as op₂ then op₁, for any pair op₁, op₂.
Commutativity alone is not enough. The network must also guarantee:
- Exactly-once delivery. If increment(5) is delivered twice, the counter is wrong by 5. Operations are not idempotent the way state merges are.
- Causal order. If add(x) then remove(x) happen at one replica, the other replicas must see the add before the remove — otherwise the remove will refer to an element that does not exist locally yet, and either be silently dropped or applied to the wrong tag.
The classical solution is a reliable causal broadcast primitive — a layer below the CRDT that handles deduplication (via per-replica sequence numbers) and causal ordering (via vector clocks attached to each operation). Most operation-based CRDT implementations either build this layer themselves or lean on a transport that already provides ordering and deduplication (Erlang's distribution orders messages between a pair of processes; a partitioned log such as Kafka can supply both).
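What that layer does on the receive path is worth seeing concretely. The sketch below is a generic causal-broadcast receiver, assuming every op is tagged with its sender and a vector clock whose sender entry acts as a per-sender sequence number; all names are illustrative, not a specific library's API:
# causal_receiver.py — sketch of the delivery layer an op-based CRDT relies on
from collections import defaultdict

class CausalReceiver:
    def __init__(self, apply_op):
        self.delivered = defaultdict(int)  # sender -> highest contiguously delivered seq
        self.buffer = []                   # ops still waiting for their causal predecessors
        self.apply_op = apply_op           # the CRDT's downstream-apply callback

    def _deliverable(self, sender, vclock):
        # next-in-sequence from its sender, and no missing dependency from anyone else
        if vclock[sender] != self.delivered[sender] + 1:
            return False
        return all(vclock[k] <= self.delivered[k] for k in vclock if k != sender)

    def receive(self, sender, vclock, op):
        if vclock[sender] <= self.delivered[sender]:
            return  # duplicate of something already delivered — drop it
        if any(s == sender and vc[sender] == vclock[sender] for s, vc, _ in self.buffer):
            return  # duplicate of something already buffered — drop it
        self.buffer.append((sender, dict(vclock), op))
        progress = True
        while progress:  # deliver everything whose dependencies are now satisfied
            progress = False
            for entry in list(self.buffer):
                s, vc, o = entry
                if self._deliverable(s, vc):
                    self.buffer.remove(entry)
                    self.delivered[s] = vc[s]
                    self.apply_op(o)
                    progress = True
Exactly-once, in other words, is not free: it is the combination of the duplicate checks, the buffering, and (in a real system) persisting delivered across restarts — the point the "Going deeper" note on exactly-once delivery returns to.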
The operation-based G-Counter is trivial: the operation is inc(replica_id), and applying it does s[replica_id] += 1. Increments commute — addition is commutative, and each op touches a single slot — so the only network requirement doing real work here is exactly-once delivery: a replayed inc over-counts. The OR-Set is similar: add(value, tag) and remove(tags) operations carry the unique tags with them, so removes name exactly the elements they are removing, and the order in which adds and removes apply does not matter as long as causal order is respected (the tags in a remove must come from adds the receiving replica has already seen).
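A sketch of the op-based OR-Set makes the tag-carrying concrete. It assumes the transport provides the causal, exactly-once delivery described above (illustrative code, not tied to any particular library):
# orset_op.py — minimal op-based OR-Set; ops carry tags, delivery is assumed causal
import uuid

class OpORSet:
    def __init__(self):
        self.entries = set()  # {(value, tag)} alive on this replica

    # update side: run at the originating replica, returns the op to broadcast
    def add(self, value):
        op = ("add", value, frozenset([uuid.uuid4().hex]))
        self.apply(op)
        return op

    def remove(self, value):
        observed = frozenset(t for (v, t) in self.entries if v == value)
        op = ("remove", value, observed)  # names exactly the tags this replica has seen
        self.apply(op)
        return op

    # downstream side: run at every replica, including the originator
    def apply(self, op):
        kind, value, tags = op
        if kind == "add":
            self.entries |= {(value, t) for t in tags}
        else:
            # only the named tags die; a concurrent add carries a tag this remove
            # never saw, so that copy survives — "add wins" over a concurrent remove
            self.entries -= {(value, t) for t in tags}

    def value(self):
        return {v for (v, _) in self.entries}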
A runnable comparison: same G-Counter, both modes
The simulator below builds a 3-replica G-Counter both ways and runs the same workload — a thousand increments distributed across the replicas. It prints the total bytes shipped and the final converged value for each mode, making the bandwidth trade-off concrete.
# crdt_modes.py — state-based vs operation-based G-Counter, side by side
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
@dataclass
class StateGCounter:
state: Dict[str, int] = field(default_factory=dict)
def inc(self, rid: str): self.state[rid] = self.state.get(rid, 0) + 1
def value(self): return sum(self.state.values())
def merge(self, other: "StateGCounter"):
for k, v in other.state.items():
self.state[k] = max(self.state.get(k, 0), v)
def serialize_size(self): # bytes on the wire (rough)
return sum(len(k) + 4 for k in self.state) # 4 bytes per int
@dataclass
class OpGCounter:
state: Dict[str, int] = field(default_factory=dict)
seen: set = field(default_factory=set) # (op_id) for dedup
    def inc(self, rid: str, op_id: str) -> Tuple[str, str, str]:
self.state[rid] = self.state.get(rid, 0) + 1
return ("inc", rid, op_id) # operation to broadcast
def apply(self, op):
kind, rid, op_id = op
if op_id in self.seen: return # exactly-once via op_id
self.seen.add(op_id)
self.state[rid] = self.state.get(rid, 0) + 1
def value(self): return sum(self.state.values())
def simulate_state(n_replicas=3, n_writes=1000, gossip_every=10):
replicas = [StateGCounter() for _ in range(n_replicas)]
bytes_shipped = 0
for i in range(n_writes):
rid = f"r{i % n_replicas}"
replicas[i % n_replicas].inc(rid)
if i % gossip_every == 0: # periodic full-state gossip
for src in replicas:
for dst in replicas:
if src is not dst:
bytes_shipped += src.serialize_size()
dst.merge(src)
    # one final anti-entropy round so writes after the last periodic gossip still converge
    for src in replicas:
        for dst in replicas:
            if src is not dst:
                bytes_shipped += src.serialize_size()
                dst.merge(src)
    return replicas[0].value(), bytes_shipped
def simulate_op(n_replicas=3, n_writes=1000):
replicas = [OpGCounter() for _ in range(n_replicas)]
bytes_shipped = 0
for i in range(n_writes):
rid = f"r{i % n_replicas}"
op = replicas[i % n_replicas].inc(rid, f"op{i}")
# broadcast op to every other replica (causal-ordered, exactly-once)
for dst in replicas:
if dst is not replicas[i % n_replicas]:
bytes_shipped += len(rid) + 4 + len(f"op{i}") # rid + counter + op_id
dst.apply(op)
return replicas[0].value(), bytes_shipped
if __name__ == "__main__":
sv, sb = simulate_state()
ov, ob = simulate_op()
print(f"State-based: value={sv} bytes_shipped={sb}")
print(f"Op-based: value={ov} bytes_shipped={ob}")
print(f"Ratio (state/op): {sb/ob:.1f}x")
Sample run:
State-based: value=1000 bytes_shipped=10800
Op-based: value=1000 bytes_shipped=21780
Ratio (state/op): 0.5x
Walk through the result:
- Both modes converge to 1000. That is the point of CRDTs — same final value regardless of propagation strategy.
- State-based shipped ~10.8 KB. Every 10th write triggered a full-state gossip round between all 3 replicas, plus one final anti-entropy round: 101 rounds × 6 directed pairs × up to 18 bytes of state ≈ 10,800 bytes. The payload per round is the whole state, so it grows with the replica count and the size of the data type; the number of directed pairs grows quadratically as n_replicas grows, and shrinking gossip_every multiplies the number of rounds.
- Op-based shipped ~21.8 KB. Each write produced one op (~11 bytes here) broadcast to 2 other replicas: 1000 ops × 2 × ~11 bytes ≈ 21,780 bytes. Bandwidth scales linearly with the write rate, independent of state size and gossip frequency.
- The ratio is workload-dependent — and on this particular workload it favours state-based, because a 3-replica G-Counter's full state is only 18 bytes and gossip fires once per 10 writes. Shrink the gossip interval (gossip_every=1 ships roughly 108 KB, about 5× the op-based figure) or grow the state (a 1 MB OR-Set) and the gap flips, hitting 100× or more against state-based. With low write rates and small states the two stay in the same ballpark and state-based's simpler network demands win out; the comparison that matters is total bytes shipped per second under your actual workload, not the size of a single message.
Why the simulation cheats slightly on op-based: the dedup step (seen.add(op_id)) is in-memory and grows without bound. A real CmRDT system would need a way to garbage-collect old op_ids — typically via a vector clock that tracks "every replica has seen all ops up to sequence N from replica X, so we can forget op_ids ≤ N from X". This is causal stability again. The state-based version has no equivalent burden; it just stores the current vector. The hidden cost of op-based is bookkeeping, and that bookkeeping is itself a small distributed-systems problem.
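What that garbage collection looks like is easier to see if op_ids are structured as (sender, per-sender sequence number). The sketch below (hypothetical names) keeps one contiguous high-water mark per sender plus only the out-of-order stragglers above it, instead of every op_id ever seen:
# dedup_window.py — sketch: compacting the "seen" set when op_ids are (sender, seq)
class DedupWindow:
    def __init__(self):
        self.high = {}        # sender -> highest seq such that every seq up to it was seen
        self.stragglers = {}  # sender -> seqs seen above the high-water mark (gaps pending)

    def seen(self, sender, seq):
        return seq <= self.high.get(sender, 0) or seq in self.stragglers.get(sender, set())

    def record(self, sender, seq):
        pending = self.stragglers.setdefault(sender, set())
        pending.add(seq)
        hw = self.high.get(sender, 0)
        while hw + 1 in pending:  # advance the mark and forget the individual ids
            hw += 1
            pending.remove(hw)
        self.high[sender] = hw
In the common in-order case the dedup state collapses to one integer per sender. The cost that does not go away is durability: the high-water marks must survive a restart, otherwise the replica re-accepts ops it already applied — which is exactly the bookkeeping burden the paragraph above is pointing at.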
What real systems chose
Akka Distributed Data — state-based by default. Akka's CRDTs (GCounter, PNCounter, ORSet, LWWMap) are state-based CvRDTs. The framework gossips full states between cluster members at configurable intervals. The reason is that Akka clusters typically run on best-effort messaging (the cluster gossip layer itself is unreliable), and state-based CRDTs tolerate duplicates and reorderings without instrumentation. Akka added delta-CRDTs in 2017 to ship just the changed fragment of state — a hybrid that approximates op-based bandwidth while keeping state-based's tolerance to duplicates.
Riak Data Types — state-based with delta optimisation. Riak's CRDTs (counters, sets, maps, registers) are state-based. The "Delta State Replicated Data Types" line of work by Almeida, Shoker, and Baquero introduced the optimisation that Riak then adopted: ship only the delta — the join-irreducible fragment that changed since the last sync — instead of the full state. Convergence is preserved (deltas merge into the lattice the same way full states do); bandwidth drops by orders of magnitude for stable states with rare writes.
Automerge — operation-based. Automerge, the JavaScript library behind a wave of local-first apps, is operation-based. Every edit produces an immutable op with a unique ID; ops are exchanged over whatever transport the application provides (WebSocket, BroadcastChannel, peer-to-peer). The library handles dedup via op-IDs and causal ordering via vector clocks. The reason it is op-based is the workload: collaborative documents have many small writes (every keystroke is a write) and the documents themselves can be large (a 50-page document might be megabytes). Shipping the full state on every keystroke is untenable; shipping a 30-byte op is fine.
Yjs — operation-based with state-snapshot bootstrap. Yjs (the other major collaborative-editing CRDT library) is also op-based but supports a hybrid sync. New peers receive a compact state snapshot (essentially an op-log compaction), then switch to op-based sync for live edits. This is the practical answer to "how does a fresh replica join an op-based system without replaying every op since genesis" — most production op-based systems include a state-snapshot fallback for bootstrap.
Redis Enterprise CRDB — state-based across regions. Redis Enterprise's active-active geo-distribution (CRDB) uses state-based CRDTs across regions, with delta optimisation to reduce inter-region bandwidth. The choice is driven by the WAN link characteristics — high latency, occasional packet loss, occasional region partitions — which favours the state-based mode's tolerance to duplicates and reorderings.
KapitalKite's order-book replication team learned the trade-off the painful way. They started with an op-based design for a per-symbol order-book CRDT — small messages, low bandwidth, perfect on paper. The first month in production, the team chased a phantom "duplicate orders" bug that turned out to be op replays during an inter-region link flap; the dedup table had been incorrectly garbage-collected. They migrated to state-based with delta optimisation, accepted the higher steady-state bandwidth (~3× higher), and the duplicate-order bugs vanished. Their post-mortem read: "we picked op-based because the messages were smaller, without realising we were taking on the entire job of building reliable causal broadcast — which is a distributed-systems project in itself, not a feature flag".
Common confusions
- "State-based and operation-based are different CRDTs." They are different propagation strategies for the same logical CRDT. The same G-Counter can be implemented either way; the data structure converges to the same value. What differs is what travels on the wire and what the network must promise.
- "Op-based always uses less bandwidth." Op-based uses less bandwidth per write, but state-based with delta optimisation can be competitive — and a workload with many writes that touch overlapping state may even favour delta-state. The right comparison is not "single op size vs state size" but "total bytes shipped per second under your actual workload".
- "State-based is just lazier." State-based CRDTs require a join-semilattice with idempotent-commutative-associative merge; that is a non-trivial design constraint and is exactly what makes state-based tolerate duplicates. The "laziness" of shipping the whole state is bought with the algebraic discipline of the merge.
- "Op-based needs only commutative ops, no causal order." Op-based requires both commutativity and causal delivery — the second one because operations like "remove element X" need to refer to an X that already exists on the receiving replica. Without causal order, removes can apply to elements not yet added and the visible set diverges across replicas.
- "Delta-CRDTs are a third type of CRDT." Delta-CRDTs are a state-based optimisation: they ship deltas (small fragments of state) instead of full states, but the merge function and the convergence proof are still state-based. They sit between pure state-based and pure op-based on the bandwidth-vs-network-requirements axis.
- "You can't change modes once you've picked one." You can. Akka's transition from state-based to delta-state in 2017 is the canonical case. The on-wire format changes; the on-disk format and the API typically do not. Production teams sometimes switch modes when the workload's write-to-state ratio shifts dramatically.
Going deeper
The lattice formalism — why join-semilattices are the right abstraction
A state-based CRDT's correctness rests on (S, ⊔) being a join-semilattice: a partially ordered set in which every pair has a least upper bound (the join). The partial order is "informational containment" — s ≤ t means t knows everything s knows, possibly more. The join s ⊔ t is the smallest state that knows everything in both. Three properties make this work: idempotence (s ⊔ s = s — knowing your own state twice tells you nothing new), commutativity (s ⊔ t = t ⊔ s — order of merge does not matter), and associativity ((s ⊔ t) ⊔ u = s ⊔ (t ⊔ u) — grouping does not matter). These three together with the partial order force convergence: any two replicas, no matter what messages they have received, will eventually compute the same join when they have received the same set of states. The lattice formalism is what makes "state-based CRDT" a precise mathematical object rather than a heuristic.
Delta-state CRDTs — bridging the bandwidth gap
The paper "Delta State Replicated Data Types" (Almeida, Shoker, Baquero — journal version in JPDC 2018, building on their earlier delta-mutation work) introduces a refinement of state-based CRDTs where each update produces a join-irreducible delta — the smallest state-fragment that captures the change. Replicas ship deltas instead of full states; deltas merge into the lattice using the same ⊔ as full states. Convergence is preserved because deltas are themselves valid states; bandwidth drops because deltas are small. The trade-off is implementation complexity: the per-CRDT code must produce a correct delta on every update, and the gossip layer must occasionally reconcile cumulative-delta drift via full-state anti-entropy. Akka, Riak 2.0+, and several research systems (CRDT.js, Antidote) use delta-CRDTs as the production default. The headline result: delta-CRDTs achieve op-based-ish bandwidth without giving up state-based's tolerance to duplicates and reorderings.
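For the G-Counter the delta idea fits in a few lines. In this sketch (hypothetical names, mirroring the StateGCounter from the simulator) inc returns a single-slot fragment instead of the whole vector, and fragments merge with the same pointwise max:
# delta_gcounter.py — sketch of a delta-state G-Counter
class DeltaGCounter:
    def __init__(self):
        self.state = {}  # replica_id -> count (the full local state)

    def inc(self, rid):
        self.state[rid] = self.state.get(rid, 0) + 1
        return {rid: self.state[rid]}  # the delta: one slot, not the whole vector

    def merge(self, fragment):
        # a delta is itself a (small) state, so the join is unchanged: pointwise max
        for k, v in fragment.items():
            self.state[k] = max(self.state.get(k, 0), v)

    def value(self):
        return sum(self.state.values())

a, b = DeltaGCounter(), DeltaGCounter()
b.merge(a.inc("r0"))  # ships {"r0": 1}
b.merge(a.inc("r0"))  # ships {"r0": 2}; duplicates and reordering remain harmless
assert a.value() == b.value() == 2
Because the delta carries the running slot value rather than a "+1", re-delivery stays idempotent; a lost delta costs only convergence delay, which the occasional full-state anti-entropy round mentioned above repairs.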
Pure op-based vs op-based with state snapshots
A pure op-based CRDT requires every replica to receive every op since the system started, in causal order, with no duplicates. New replicas joining late must replay the entire op log. For a five-year-old system this is untenable. The standard fix is a state-snapshot bootstrap: a new replica fetches a compact representation of the current state (which, for an op-based CRDT, can be derived by collapsing the op log into an equivalent state) and then begins receiving live ops from that point forward. The snapshot mechanism essentially turns a pure op-based system into a state-based system at startup and an op-based system at steady state. Yjs and Automerge both implement this; CRDT papers tend to assume pure op-based for cleanness, but production systems are always hybrids.
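The handshake itself is small. Here is a sketch for the op-based G-Counter, assuming ops are tagged (sender, seq) as in the dedup sketch earlier (names illustrative):
# bootstrap.py — sketch: a new replica joins an op-based system from a state snapshot
def make_snapshot(state, delivered):
    # state: the op log collapsed into a value (here, the G-Counter vector)
    # delivered: per-sender high-water marks at the moment the snapshot was cut
    return {"state": dict(state), "delivered": dict(delivered)}

def bootstrap(snapshot, live_ops):
    # start from the snapshot, then apply only ops newer than it
    # (gap and reorder handling, as in the causal-receiver sketch, is omitted here)
    state = dict(snapshot["state"])
    delivered = dict(snapshot["delivered"])
    for sender, seq, rid in live_ops:  # each op is inc(rid), tagged (sender, seq)
        if seq <= delivered.get(sender, 0):
            continue                   # already folded into the snapshot — skip
        state[rid] = state.get(rid, 0) + 1
        delivered[sender] = seq
    return state, delivered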
Why exactly-once delivery is harder than you think
Op-based CRDTs require the network layer to deliver every op exactly once. In the presence of retries, network partitions, and crashed-and-restarted senders, this is the same problem as exactly-once message delivery in any distributed messaging system — and that problem is well-understood as non-trivial. The standard solution is per-replica monotonically-increasing sequence numbers, with each receiver tracking the highest sequence number seen from each sender; an op with a sequence number ≤ the high-water-mark is a duplicate and discarded. This works, but it requires every receiver to maintain a high-water-mark per sender, and the high-water-marks must be persisted (otherwise a receiver crash-restart re-accepts duplicates that were already applied). The dedup metadata becomes a non-trivial fraction of the system's state, and garbage-collecting it requires a full causal-stability protocol. In practice, op-based CRDT systems either build this themselves or rely on a transport that already provides it (Erlang's distribution, Kafka's exactly-once semantics, etc.).
Reproduce this on your laptop
python3 crdt_modes.py
# Try varying the simulation parameters:
# - Change n_writes from 1000 to 100000 — both totals grow, but state-based grows with the number of gossip rounds while op-based grows with every single write
# - Change gossip_every from 10 to 1 — state-based bytes explode while op-based is unchanged
# - Add an OR-Set in both modes (elements: List[Tuple[str, str]]) — see how the gap widens with state size
Where this leads next
State-based vs operation-based is the entry point to Part 13's CRDT chapters:
- G-Counter, PN-Counter, OR-Set, LWW-Register — the foundational data types in both modes.
- Delta-state CRDTs — the optimisation that closes the bandwidth gap for state-based systems.
- Causal broadcast and reliable delivery — the network primitive that op-based CRDTs depend on.
- Wall: eventual consistency needs conflict resolution — the chapter that motivates why CRDTs exist at all.
Part 13 continues with the algebraic foundations of CRDT design — how to invent a CRDT for a new data type, how to prove its convergence, and when no CRDT is possible at all (set difference, division — the non-monotonic operations CALM excludes). Part 17's geo-distribution chapters revisit the state-based-vs-op-based choice in the cross-region setting, where WAN characteristics make the trade-off significantly more skewed.
References
- Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M. — "Conflict-free Replicated Data Types" (SSS 2011). The foundational paper; introduces both CvRDT and CmRDT formally.
- Shapiro, M. et al. — "A comprehensive study of Convergent and Commutative Replicated Data Types" (INRIA RR-7506, 2011). The long version with full proofs and a catalogue of CRDTs in both modes.
- Almeida, P. S., Shoker, A., Baquero, C. — "Delta State Replicated Data Types" (JPDC 2018). The optimisation that Akka and Riak adopted.
- Kleppmann, M., Beresford, A. — "A Conflict-Free Replicated JSON Datatype" (TPDS 2017). The Automerge precursor; op-based CRDT for nested JSON.
- Bieniusa, A. et al. — "An Optimized Conflict-free Replicated Set" (RR-7905, 2012). The OR-Set optimisation that most production OR-Sets use.
- Akka Distributed Data documentation — production-grade state-based CRDT library, with delta-CRDT support since 2017.
- Riak Data Types documentation — production-grade state-based CRDTs with delta optimisation.
- Automerge and Yjs documentation — the two major op-based CRDT libraries powering local-first apps.
- Wall: eventual consistency needs conflict resolution — the wall that CRDTs are designed to climb.
- Vector clocks — the mechanism op-based CRDTs use for causal ordering.