Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

G-Counter and PN-Counter

CricStream's "viewers watching now" badge during a cricket final is a single number — 27,418,302 — but no single machine ever computes it. Twelve regional edges each track their own viewer counts, gossip them every 250 ms, and the badge shows the sum of the latest known values from each edge. There is no leader. There is no Raft group. If the Mumbai edge briefly partitions from the rest of the world, its number keeps climbing inside Mumbai, and the moment connectivity returns, every other edge folds Mumbai's number back in without dropping a single increment. The data structure that makes this work — a counter that survives partitions, reorderings, and duplicate messages without ever counting the same viewer twice — is the G-Counter, and its bidirectional cousin the PN-Counter. They are the simplest CRDTs you will ever meet, and the entire field of conflict-free replicated data types builds on the trick they pull.

A G-Counter (grow-only counter) replicates an integer that only goes up. The state is a vector indexed by replica ID; replica i only writes its own slot; the value is the sum of the vector; merge is pointwise max. Convergence falls out for free because pointwise max is idempotent, commutative, and associative. A PN-Counter (positive-negative counter) supports decrements by pairing two G-Counters, one for increments and one for decrements, and reading the difference. Together they show the entire CRDT trick: cast the problem as a join-semilattice and convergence becomes algebra, not coordination.

Why a global increment is harder than it looks

Start with the question CricStream's engineers actually faced: how do you count viewers across twelve regional edges with no central database? The naive design — a Redis INCR against one cluster — works until the Singapore→Mumbai link flaps, at which point Singapore's edge cannot increment the counter, the badge freezes for users routed through Singapore, and the on-call gets paged. Add a second Redis cluster for failover and you now have two counters that disagree by however many increments happened during the partition; reconciling them by MAX(a, b) loses increments and reconciling by a + b double-counts them.

The deep problem is that integer increment is not commutative across replicas with naive merging. If replica A sees value 100 and increments to 101, and replica B sees value 100 and increments to 101, then a merge of "101" and "101" cannot tell whether the two increments are the same event seen twice (true value: 101) or two distinct events on different replicas (true value: 102). The integer has lost the information needed to merge correctly. You can fix this by sending operations instead of state — but then the network must guarantee exactly-once causal delivery, which is itself a distributed-systems problem (covered in state-based vs operation-based CRDTs).
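A two-line sketch makes the ambiguity concrete (hypothetical values, plain Python):

# Both scenarios leave the two replicas in identical scalar states:
#   (a) one increment, gossiped to both replicas     -> true value 101
#   (b) two concurrent increments, one per replica   -> true value 102
a, b = 101, 101
print(max(a, b))      # 101: right for (a), under-counts (b)
print(a + b - 100)    # 102: right for (b), double-counts (a)
# The scalars carry no replica identity, so no merge rule is right for both.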

The G-Counter sidesteps this by giving each replica its own slot in a vector. Replica A only writes vec[A]; replica B only writes vec[B]. Increments on A and B target disjoint memory, so they cannot collide. To read the global counter, sum the vector. To merge two vectors, take pointwise max — because each replica's slot is monotonically increasing on its owner, the larger value is always the more recent one for that slot.

Why pointwise-max works where plain-max on a single integer fails: the per-replica partitioning carries identity information that the integer threw away. When replica B receives vec_A = [5, 0, 0] and its local state is vec_B = [0, 3, 0], the merged [5, 3, 0] records "A has done 5 increments, B has done 3" — that is more information than the scalar 8 would carry, and it survives any number of duplicate or out-of-order messages.
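The same merge, written out with plain dicts in the style of the full implementation further down:

vec_a = {"A": 5, "B": 0, "C": 0}    # replica A: 5 local increments
vec_b = {"A": 0, "B": 3, "C": 0}    # replica B: 3 local increments
merged = {k: max(vec_a.get(k, 0), vec_b.get(k, 0)) for k in set(vec_a) | set(vec_b)}
print(merged)                  # {'A': 5, 'B': 3, 'C': 0} (key order may vary)
print(sum(merged.values()))    # 8; merging the same states again changes nothing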

[Figure: three-replica G-Counter through partition and heal. Before the partition (t=0) every replica holds [3,2,1], sum 6. During the partition (t=1..5) A increments three times to [6,2,1], B twice to [3,4,1], C once to [3,2,2]. After heal (t=6) pointwise max merges every pair to [6,4,2] on all replicas: A's slot max(6,3,3)=6, B's slot max(2,4,2)=4, C's slot max(1,1,2)=2, sum 12 = 6 at the start + 6 during the partition. Illustrative, not measured.]
The G-Counter survives partition + heal because per-replica slots are monotonically increasing and disjoint. Pointwise max is the join in the product-of-naturals lattice, and the three lattice laws (idempotence, commutativity, associativity) imply convergence regardless of message ordering or duplication.

The G-Counter as a join-semilattice

Formally, the G-Counter's state space is S = ℕ^n, the set of n-tuples of natural numbers indexed by replica ID. The partial order is componentwise ≤: s ≤ t iff s[i] ≤ t[i] for every i. The join is pointwise max: (s ⊔ t)[i] = max(s[i], t[i]). This is the canonical product-of-naturals lattice.

Three laws fall out of max:

  1. Idempotent: max(a, a) = a, so s ⊔ s = s. A replica that receives the same gossip message twice computes the same merge — no duplicate-detection needed.
  2. Commutative: max(a, b) = max(b, a), so s ⊔ t = t ⊔ s. The order in which a replica receives gossip from different peers does not matter.
  3. Associative: max(max(a, b), c) = max(a, max(b, c)), so (s ⊔ t) ⊔ u = s ⊔ (t ⊔ u). Grouping does not matter — a replica can merge two received states together before merging with its own.

These three together imply convergence: if every replica eventually receives every other replica's state at least once, every replica computes the same join. Why this is the entire theoretical contribution of state-based CRDTs: the convergence proof reduces to "max is idempotent, commutative, associative" — three one-line algebraic facts. There is no protocol, no quorum, no leader, no timeout, no log replication. The ordering problem that plagues distributed integer increment is dissolved by partitioning the integer into per-replica slots and using max as the merge.
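The three laws are cheap to check mechanically. A property-test sketch over random vectors (the helper names here are ours, not from any library):

import random

def join(s, t):
    return {k: max(s.get(k, 0), t.get(k, 0)) for k in set(s) | set(t)}

def rand_vec():
    return {r: random.randint(0, 100) for r in "ABCD"}

for _ in range(10_000):
    s, t, u = rand_vec(), rand_vec(), rand_vec()
    assert join(s, s) == s                                 # idempotent
    assert join(s, t) == join(t, s)                        # commutative
    assert join(join(s, t), u) == join(s, join(t, u))      # associative
print("lattice laws hold on 10,000 random triples")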

The increment operation is the simplest possible: replica i does s[i] := s[i] + 1. Only replica i ever writes s[i], so this is a local operation requiring no coordination. The read operation is value(s) = Σ s[i] — sum across all slots. Both operations are O(n) where n is the number of replicas, which for a cluster of 12 edges is trivial.

PN-Counter: encoding decrements with two G-Counters

The G-Counter only goes up. But many real workloads — vote tallies, account balances, reward-currency balances — need decrements too. The naive idea of allowing replica i to write s[i] := s[i] - 1 breaks the lattice: if A's slot reads 5 and you receive a gossip message saying A's slot is 3, you cannot tell whether A decremented (so the new value is correct) or you received an out-of-order older message (so the old value is correct). Pointwise max would silently keep the larger, older value, losing the decrement.

The PN-Counter — Positive-Negative Counter — solves this by pairing two G-Counters, one tracking only positive increments (P) and one tracking only negative increments (N). Both counters are grow-only, so both are valid G-Counters with all the lattice properties intact. The visible value of the PN-Counter is value = sum(P) - sum(N). To decrement, replica i increments its slot in N. To merge two PN-Counters, merge P-vectors pointwise-max and N-vectors pointwise-max independently.

Why this trick generalises: any operation that can be decomposed into "monotonic actions on two grow-only side records" can be turned into a CRDT by pairing two G-Counter-like structures. The OR-Set (observed-remove set) uses the same idea — one grow-only set of (element, tag) pairs for additions, one grow-only set of tags for tombstones, with the visible set being the difference. The PN-Counter is the simplest example of this "pair two monotonic structures" pattern, which appears in the LWW-Register, the OR-Set, the OR-Map, and most production CRDTs.
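A minimal observed-remove set in the same spirit shows the pattern at work: one grow-only set of (element, tag) pairs, one grow-only set of removed tags, visible value the difference. This is an illustrative sketch, not Riak's or Akka's OR-Set:

import uuid

class ORSet:
    def __init__(self):
        self.added = set()      # grow-only: (element, unique tag) pairs
        self.removed = set()    # grow-only: tags that were observed, then removed
    def add(self, e):
        self.added.add((e, uuid.uuid4().hex))    # fresh tag per add
    def remove(self, e):
        # tombstone every tag we have observed for e (hence "observed-remove")
        self.removed |= {t for x, t in self.added if x == e}
    def value(self):
        return {x for x, t in self.added if t not in self.removed}
    def merge(self, other):
        self.added |= other.added        # set union is the join here; like max,
        self.removed |= other.removed    # it is idempotent, commutative, associative

As with the PN-Counter, the visible value is the difference of two monotonic structures, and each structure merges by its own join.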

The cost is double the storage and double the wire bytes — every PN-Counter ships two vectors instead of one. For a typical 5–20 replica cluster this is negligible; for a 1000-node mesh the doubling adds up.

[Figure: PN-Counter as two paired G-Counters. Replica A: P=[5,0,0], N=[1,0,0], value 5−1=4 (5 incs, 1 dec on A). Replica B: P=[0,3,0], N=[0,1,0], value 3−1=2. Replica C: P=[0,0,2], N=[0,0,0], value 2−0=2. After gossip, P and N merge independently by pointwise max: P=[5,3,2], N=[1,1,0], value = (5+3+2) − (1+1+0) = 10 − 2 = 8 = 4+2+2. Decrements never get lost. Illustrative.]
The PN-Counter pairs two grow-only G-Counters, one for increments and one for decrements. Each P and N vector merges by pointwise max independently; the value is the difference of sums. Decrements are encoded as growth in N, preserving the lattice property.

A runnable G-Counter and PN-Counter

The simulator below implements both counters in pure Python and runs a three-replica scenario with a partition. It demonstrates: (a) increments on isolated replicas during partition, (b) gossip-based pointwise-max merge after heal, (c) the PN-Counter handling decrements correctly even when delivered out of order.

# crdt_counters.py — G-Counter and PN-Counter with partition + heal
from dataclasses import dataclass, field
from typing import Dict, Optional
import copy

@dataclass
class GCounter:
    rid: str                 # this replica's ID
    state: Dict[str, int] = field(default_factory=dict)
    def inc(self, n: int = 1):
        self.state[self.rid] = self.state.get(self.rid, 0) + n
    def value(self) -> int:
        return sum(self.state.values())
    def merge(self, other: "GCounter") -> None:
        for k, v in other.state.items():
            self.state[k] = max(self.state.get(k, 0), v)

@dataclass
class PNCounter:
    rid: str
    P: Optional[GCounter] = None  # filled in __post_init__
    N: Optional[GCounter] = None
    def __post_init__(self):
        if self.P is None: self.P = GCounter(self.rid)
        if self.N is None: self.N = GCounter(self.rid)
    def inc(self, n: int = 1): self.P.inc(n)
    def dec(self, n: int = 1): self.N.inc(n)
    def value(self) -> int: return self.P.value() - self.N.value()
    def merge(self, other: "PNCounter") -> None:
        self.P.merge(other.P)
        self.N.merge(other.N)

def gossip(replicas, edges):
    # Each pair of connected replicas exchanges full state (state-based gossip).
    for a, b in edges:
        snap_a = copy.deepcopy(replicas[a])
        snap_b = copy.deepcopy(replicas[b])
        replicas[a].merge(snap_b)
        replicas[b].merge(snap_a)

if __name__ == "__main__":
    # PN-Counter scenario: viewer count where users can also leave (decrements).
    A, B, C = PNCounter("A"), PNCounter("B"), PNCounter("C")
    replicas = {"A": A, "B": B, "C": C}
    fully_connected = [("A","B"), ("A","C"), ("B","C")]
    partition_AB = [("A","B")]   # C is isolated

    # t=0: everyone in sync
    A.inc(3); B.inc(2); C.inc(1)
    gossip(replicas, fully_connected)
    print(f"t=0  A={A.value()} B={B.value()} C={C.value()} (start)")

    # t=1..3: PARTITION — A,B stay connected, C is alone
    A.inc(5); B.inc(2); B.dec(1); C.inc(4); C.dec(2)
    gossip(replicas, partition_AB)
    print(f"t=3  A={A.value()} B={B.value()} C={C.value()} (during partition)")

    # t=4: heal — full gossip resumes
    gossip(replicas, fully_connected)
    print(f"t=4  A={A.value()} B={B.value()} C={C.value()} (after heal)")

    # Sanity: every replica should converge to (3+2+1)+(5+2-1+4-2) = 6+8 = 14
    assert A.value() == B.value() == C.value() == 14
    print("converged: OK")

Sample run:

t=0  A=6 B=6 C=6 (start)
t=3  A=12 B=12 C=8 (during partition)
t=4  A=14 B=14 C=14 (after heal)
converged: OK

Walk through the four key moments:

  • t=0 baseline. All three replicas gossip and converge on value = 6 (3+2+1 increments). Both P and N vectors are [3,2,1] and [0,0,0] respectively on every replica.
  • During the partition (t=1..3). A and B can still gossip with each other; C is alone. A increments by 5, B does +2 and −1 (net +1), C does +4 and −2 (net +2). After the A↔B-only gossip, A and B both see each other's writes (A.value() == B.value() == 12), but neither sees C's. C cannot see A or B either, so it sits at 8 (its starting 6 plus its local net +2).
  • After heal (t=4). A full-gossip round merges all three replicas. The P vector becomes [8, 4, 5] (3+5 on A, 2+2 on B, 1+4 on C); the N vector becomes [0, 1, 2] (0 decs on A, 1 dec on B, 2 decs on C); value = (8+4+5) − (0+1+2) = 17 − 3 = 14. Why the merge gets the right answer despite gossip arriving in arbitrary order: every increment is recorded in exactly one slot (the originating replica's slot) and only ever grows that slot. Pointwise max picks the largest value seen for each slot, which, by the disjoint-writes invariant, is the latest value on the originating replica. The post-heal sum is therefore identical to the local-only sum each replica would have computed if it had observed every write directly.
  • Idempotence in action. Run the final gossip(replicas, fully_connected) ten more times and the values do not change; s ⊔ s = s is enforced by the max in every merge. The snippet after this list checks both the vectors and the idempotence claim.
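Both checks are a few lines, appended to the end of crdt_counters.py (the field names match the implementation above):

# Inspect the merged vectors on any replica after heal:
print("P:", dict(sorted(A.P.state.items())))   # {'A': 8, 'B': 4, 'C': 5}
print("N:", dict(sorted(A.N.state.items())))   # {'B': 1, 'C': 2} (A never decremented)
# Idempotence: re-gossiping a converged cluster changes nothing.
for _ in range(10):
    gossip(replicas, fully_connected)
assert A.value() == B.value() == C.value() == 14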

The same code with PNCounter swapped for GCounter (and dec() calls removed) gives the grow-only variant. The minimal divergence between the two implementations — the second G-Counter for the negative side — is the entire conceptual leap from "increments only" to "increments and decrements".

Production stories: who actually uses these

Akka Distributed Data — GCounter and PNCounter are first-class. Akka's CRDT library, used by Java/Scala teams running Akka clusters, ships both. The default merge interval is 100 ms; the cluster uses delta-CRDT optimisation (introduced in 2017) so each gossip round only ships the changed slot, not the full vector. A 100-node Akka cluster running a PNCounter for cross-region rate-limiting metrics typically uses 200–400 bytes per gossip round per replica pair, dominated by the ActorRef keys, not the counter state itself.

Riak's riak_dt_pncounter — Basho's CRDT library, productionised by Riak 2.0 in 2014. Riak made one notable refinement: the per-replica slot key is not the literal node name (which can change if a node is replaced) but a stable actor ID persisted to disk at first registration. This sidesteps the "node renamed → its slot is now a fresh zero" pitfall, where a replica restart could create a fresh empty slot and lose its history.

Redis Enterprise CRDB uses PN-Counters under the hood for INCR/DECRBY operations across active-active geo-distributed replicas. The user-facing API looks like a regular Redis counter; underneath, every INCR writes to the local region's slot, every DECRBY writes to the local region's negative slot, and inter-region replication merges via pointwise max.

KapitalKite's order-counter saga. KapitalKite — fictional discount stockbroker — built an internal "open-orders" counter to surface a per-customer dashboard tile ("you have 3 open orders"). The first version used a single Postgres counter with row-level locking. During the 9:15 AM market open, the counter became the hottest row in the database; p99 latency on the dashboard hit 2.4 seconds. The team rewrote the counter as a regional PN-Counter — three regions, one slot per region, gossip every 500 ms — and watched the p99 drop to 35 ms. The trade-off: the dashboard might briefly show 3 when the truth is 4 if a write just happened in another region. Their PM accepted that staleness window because "no customer can see another customer's open-orders count anyway, so the cross-region staleness for a customer's own count is bounded by their own region's gossip cadence — sub-second worst case".

The viewer-counter on CricStream during the 2025 final. Twelve edges, each running a G-Counter (no decrements — viewers leaving was modelled by separate "active connection" tracking), gossiping every 250 ms. Peak measurement: 27.4M concurrent viewers, with edge-to-edge convergence under 800 ms p99 even when the Mumbai↔Singapore link suffered three brief outages totalling 14 seconds. Zero double-counted viewers. The badge update lag was bounded entirely by the gossip cadence, not by any consensus protocol.

Common confusions

  • "A G-Counter is just a Redis INCR." It is not. A Redis INCR is a serialised operation against a single replica; if that replica is down or partitioned, the increment cannot happen. A G-Counter is a vector replicated across N nodes; each node accepts increments locally with no coordination, and the value is the sum after gossip-based merge. The G-Counter survives partitions; INCR against a single Redis node does not.
  • "Pointwise-max loses information." It does, but only information that the lattice considers irrelevant. If s ≤ t (s is dominated by t), then s ⊔ t = t discards s — but s carries no new information beyond what t already has. The lattice formalism is precisely the framework that lets you prove the discarded information is redundant.
  • "PN-Counters can go negative, so they're a generalisation of integers." PN-Counters can represent any integer, but their state is still grow-only — both P and N only increase. The visible difference can be negative, but you cannot "undo" a previous dec by issuing an inc in a way that cancels memory; both increments are retained in their respective vectors.
  • "You can shrink a G-Counter when an old replica leaves the cluster." Naively, no. Removing replica R's slot from every replica's state is not a lattice operation; it cannot be expressed as a join. Real systems handle this via a separate "tombstoned replicas" mechanism (Riak's riak_dt_pncounter does this), or accept the unbounded growth.
  • "G-Counters are commutative because addition is commutative." Subtle but wrong. The G-Counter's increment operation is commutative because each replica writes only its own slot — disjoint writes commute trivially. The merge operation is commutative because max is. The two commutativities are different — and the fact that both hold is what makes G-Counters work.
  • "G-Counter and PN-Counter need vector clocks." They do not. Each slot is a plain integer that only its owning replica increments, so the per-slot ordering is local-monotonic without any logical-clock infrastructure. Vector clocks become necessary in op-based CRDTs (where ops must be causally ordered) and in CRDTs that need to detect concurrent updates to the same logical element (like LWW-Registers and OR-Sets).

Going deeper

The product-of-naturals lattice and why max is the join

A G-Counter's state space ℕ^n is the n-fold product of the natural-numbers lattice. The natural-numbers lattice has ≤ as its partial order and max as its join — max(a, b) is the least upper bound of a and b in (ℕ, ≤). Products of lattices are themselves lattices, with the join applied componentwise. So pointwise max is the join in ℕ^n. The fact that the join is computable in O(n) (just walk the vector) is what makes the G-Counter cheap. More elaborate CRDTs (the OR-Set, the OR-Map) are joins on richer lattices — (P(V × T), ⊆) for the OR-Set's elements set — but the principle is identical: pick a lattice, define the merge as the join, and convergence falls out.
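"Least upper bound" is itself checkable: the join must dominate both inputs, and any other common upper bound must dominate the join. A quick randomized check (helper names ours):

import random

def leq(s, t):   # componentwise <= over the union of keys
    return all(s.get(k, 0) <= t.get(k, 0) for k in set(s) | set(t))

def join(s, t):
    return {k: max(s.get(k, 0), t.get(k, 0)) for k in set(s) | set(t)}

for _ in range(10_000):
    s = {r: random.randint(0, 9) for r in "AB"}
    t = {r: random.randint(0, 9) for r in "AB"}
    u = {r: random.randint(0, 12) for r in "AB"}
    j = join(s, t)
    assert leq(s, j) and leq(t, j)    # j is an upper bound of s and t
    if leq(s, u) and leq(t, u):       # whenever u is also an upper bound...
        assert leq(j, u)              # ...it dominates j, so j is least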

Delta-state G-Counters — paying only for the changes

The pure state-based G-Counter ships the entire vector on every gossip round, costing O(n) bytes per round. For a 1000-node cluster gossiping every 100 ms, each message carries a 1000-slot vector (roughly 8 KB at, say, 8 bytes per slot), shipped ten times per second to every gossip peer: easily hundreds of KB/sec per node before the counter has done anything useful. The delta-state CRDT work of Almeida, Shoker, and Baquero (first circulated in 2014) introduced an optimisation: each update produces a delta — the join-irreducible fragment that captures the change — and only the delta is shipped. For a G-Counter, the delta is {rid: new_value} (a single-key dict) rather than the full vector. Convergence is preserved because deltas merge into the lattice using the same join. Akka adopted delta-CRDTs in 2017 as the default; a delta-G-Counter ships single-slot updates instead of full vectors, dropping bandwidth by 1–2 orders of magnitude in steady state.
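A minimal delta-flavoured variant of the GCounter above, following the paper's idea; the delta shape and the inline usage are ours, and real libraries batch and acknowledge deltas rather than shipping them one at a time:

class DeltaGCounter:
    def __init__(self, rid):
        self.rid = rid
        self.state = {}
    def inc(self, n=1):
        self.state[self.rid] = self.state.get(self.rid, 0) + n
        return {self.rid: self.state[self.rid]}   # delta: one slot, not the vector
    def merge(self, delta):
        # deltas and full states merge identically: pointwise max is still the join
        for k, v in delta.items():
            self.state[k] = max(self.state.get(k, 0), v)
    def value(self):
        return sum(self.state.values())

a, b = DeltaGCounter("A"), DeltaGCounter("B")
d = a.inc(3)            # d == {'A': 3}, a few bytes regardless of cluster size
b.merge(d)              # applying the delta converges b's 'A' slot
assert b.value() == 3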

The slot-explosion problem in dynamic clusters

Real clusters add and remove replicas. Every new replica gets a new slot in the vector. Removed replicas' slots stick around (their values cannot be safely reduced — removing them would not be a join operation). Over months, a cluster that has cycled through 10,000 short-lived nodes accumulates a 10,000-slot vector per counter, even though only 100 nodes are active at any time. Riak handles this via actor compaction: periodically, an authoritative coordinator collapses tombstoned-actor slots into a single "compacted" slot, after a delay that ensures every live replica has merged in those values. The compaction is a non-CRDT operation (it does require coordination), but it runs rarely. Without compaction, G-Counter and PN-Counter state grows unbounded in churning clusters.
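A toy version of that compaction pass, under strong assumptions: it runs on a single coordinator, only after every live replica has provably merged the retired slots, and the reserved key name is our invention, not Riak's:

RETIRED = {"old-node-1", "old-node-2"}   # actor IDs known to be permanently gone
COMPACTED = "__compacted__"              # reserved slot key (hypothetical convention)

def compact(state: dict) -> dict:
    # NOT a lattice join: this needs coordination. Safe only once every live
    # replica has merged all retired slots; run it early and counts are lost.
    folded = sum(v for k, v in state.items() if k in RETIRED)
    out = {k: v for k, v in state.items() if k not in RETIRED}
    out[COMPACTED] = out.get(COMPACTED, 0) + folded   # value (the sum) is preserved
    return out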

Slot collisions under poor randomness

Some implementations let replicas "claim" a slot by hashing their own ID. If two replicas hash to the same slot (a collision), both write to that slot, breaking the disjoint-writes property: concurrent increments on the two machines overwrite each other under max, and the counter silently loses counts. Production CRDT libraries (Akka, Riak) use UUIDs as actor IDs precisely to make collisions astronomically unlikely. A homegrown CRDT that uses hostname.split('.')[0] as the slot key will collide the moment two different nodes happen to share a hostname prefix.
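The failure is easy to reproduce with the GCounter from the simulator by giving two distinct machines the same replica ID (deliberately broken, for illustration):

x = GCounter("edge1")          # two different machines...
y = GCounter("edge1")          # ...that collided on the same slot key
x.inc(5); y.inc(3)             # 8 real increments across the two machines
x.merge(y); y.merge(x)
print(x.value(), y.value())    # 5 5: y's 3 increments silently absorbed by max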

Reproduce this on your laptop

python3 -m venv .venv && source .venv/bin/activate
pip install simpy   # not strictly needed for this script, but useful to extend
python3 crdt_counters.py
# Try the variations:
# - Add a fourth replica D mid-run, increment on it, gossip — watch the slot appear
# - Run gossip 100 times in a row at the end — observe value never changes (idempotence)
# - Swap PNCounter for GCounter and remove dec() calls — same convergence pattern

Where this leads next

G-Counter and PN-Counter are the two simplest CRDTs and the gateway to everything that follows:

Part 13 continues with richer CRDTs (sets, maps, sequences, JSON), the bounds of what CRDTs can model (no general set-difference, no division), and how production systems like Riak, Akka, Automerge, and Yjs ship them. Part 17's geo-distribution chapters revisit G-Counter and PN-Counter in the cross-region setting, where WAN gossip cadence and inter-region partition windows make the convergence-time analysis much more interesting.

References

  • Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M. — "Conflict-free Replicated Data Types" (SSS 2011). The foundational paper; G-Counter and PN-Counter are introduced in §3.
  • Shapiro, M. et al. — "A comprehensive study of Convergent and Commutative Replicated Data Types" (INRIA TR-7506, 2011). Long version with full convergence proofs and a CRDT catalogue.
  • Almeida, P. S., Shoker, A., Baquero, C. — "Delta State Replicated Data Types" (JPDC 2018; first circulated as "Efficient State-based CRDTs by Delta-Mutation", 2014). The optimisation Akka and Riak adopted to ship deltas instead of full state.
  • Akka Distributed Data documentation — production-grade GCounter and PNCounter with delta-CRDT support.
  • Riak Data Types documentation — riak_dt_gcounter, riak_dt_pncounter; the actor-ID stability and compaction story.
  • Roh, H.-G. et al. — "Replicated Abstract Data Types: Building Blocks for Collaborative Applications" (JPDC 2011). Counters and registers in collaborative-editing context.
  • Bieniusa, A. et al. — "Brief Announcement: Semantics of Eventually Consistent Replicated Sets" (DISC 2012). Set CRDTs and how the PN-Counter's "pair two monotonic structures" pattern generalises.
  • State-based vs operation-based CRDTs — the two propagation modes for these counters.
  • Eventual consistency — the consistency model G-Counter and PN-Counter satisfy.