Gorilla compression: the key insight

Riya is reviewing the cloud bill at a Bengaluru fintech startup on a Friday evening. The Prometheus pair the platform team set up six months ago is now ingesting 2.4 million samples per second across 8 million active series. The local TSDB on each node is 1.1 TB. Naively — 8 bytes per int64 timestamp plus 8 bytes per float64 value, 16 bytes per sample — that data would be roughly 138 GB per hour. It is not. It is closer to 11 GB per hour. The compression ratio is ≈12×, and almost all of it comes from a single trick that Facebook published in 2015 and that every modern time-series database has copied since.

That trick is Gorilla XOR compression, and the surprise is how little it does. It does not compress floats by quantising them, it does not approximate values, it does not throw away precision. It does exactly one thing: it observes that a CPU usage measurement at second N+1 looks almost identical, in IEEE-754 binary, to the measurement at second N — and the XOR of two nearly-equal floats has very few significant bits. Once you see that, the rest of the encoding is obvious. The hard part of metrics compression is not the bitstream — it is recognising the structural property of time-series data that makes the bitstream possible in the first place.

The Facebook engineers who built Gorilla were, in 2014, drowning in their own monitoring data. Their internal Operational Data Store (ODS) was a sharded MySQL deployment that absorbed 700 million writes per minute across 10,000 servers, and it was both the metrics fleet's bottleneck and a significant fraction of the company's infrastructure cost. Their Gorilla paper opens with a chart showing query latency p99 at 4-5 seconds — unusable. The compression they eventually extracted — 16 bytes per sample down to 1.37 — is what shrank the data enough to hold the hot window of monitoring data in RAM, which is what made query latency drop into single-digit milliseconds. The compression was an enabling step, not a goal — and that is why every TSDB built since has copied it.

Gorilla compresses a sequence of (timestamp, float64) samples to ≈1.3 bytes per sample, a ≈12× win over the naive 16-byte encoding, by encoding deltas of timestamps (delta-of-delta — most scrape intervals are constant, so most timestamps encode in one bit) and XORs of float64 values (most consecutive samples are nearly equal, so most XORs have few non-zero bits and encode in a small leading-zero / trailing-zero window). The lesson is not "use Gorilla in your code" — Prometheus, Mimir, VictoriaMetrics, InfluxDB, and TimescaleDB all already do — but that time-series compression is a structural insight, not an algorithm, and the structural insight is that consecutive samples in a series are almost the same number.

The naive baseline: why 16 bytes per sample is the cost to beat

A Prometheus sample is a (timestamp_ms: int64, value: float64) pair. That is 16 bytes if you write it down literally — the timestamp as an 8-byte signed millisecond Unix epoch, the value as an 8-byte IEEE-754 double. For a Prometheus scraping 50,000 series at a 15-second interval (a small Razorpay staging cluster), that is 50,000 × 4 per minute × 16 bytes = 3.2 MB per minute, 4.6 GB per day, 1.7 TB per year per Prometheus instance. That number is what makes platform engineers reach for "1-minute resolution should be enough, right?" — and they are wrong, because the answer is to compress the samples ~12×, not to throw 3 out of every 4 of them away.

The naive baseline is also where a few wrong intuitions live. Gzip does not work well on this data. Throw a 50 MB file of (timestamp, float64) records at gzip -9 and you will get a disappointing 1.5-2× compression ratio, because gzip's LZ77 window finds repeated byte patterns, and consecutive float64 values do not share long byte patterns even when their decimal values are close — 78.43 and 78.51 differ in 6 of the 8 bytes of the IEEE-754 representation. Delta encoding alone does not work either. Encoding (t_n - t_{n-1}, v_n - v_{n-1}) as IEEE-754 floats produces small numbers, but a small float64 still costs 8 bytes — 0.08 is not magically smaller than 78.51 in IEEE-754. The compression has to operate on the bit pattern, not the decimal value.
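Both claims are easy to check with a few lines of standard-library Python, using the same two values that run through this chapter:

import struct
a = struct.pack(">d", 78.43)                   # 40 53 9b 85 1e b8 51 ec
b = struct.pack(">d", 78.51)                   # 40 53 a0 a3 d7 0a 3d 71
print(a.hex(), b.hex())
print(sum(x != y for x, y in zip(a, b)), "of 8 bytes differ")
print(struct.pack(">d", 78.51 - 78.43).hex())  # the delta is a brand-new, dense 8-byte float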

[Figure: Two consecutive CPU% samples — bit-level view of three encodings. Sample 1: 78.43 = 0x40539B851EB851EC; sample 2: 78.51 = 0x4053A0A3D70A3D71; 6 of the 8 bytes differ. Row 1 — raw IEEE-754: 16 bytes for the pair, the 1.0× baseline. Row 2 — gzip -9 over a block of such samples: ≈1.7×, disappointing, because LZ77 looks for repeated byte runs and near-equal floats do not share them. Row 3 — Gorilla delta-of-delta + XOR: the XOR of the two bit patterns is mostly leading zeros, so only the significant-bit window plus a short header is stored; amortised across a run, ≈1.3 B/sample.]
Illustrative — bit patterns shown for one pair. Gorilla wins because it operates on the XOR of the IEEE-754 bits, not the decimal-value delta.

The Gorilla paper (Pelkonen et al., VLDB 2015) — written by Tuomas Pelkonen and colleagues at Facebook for their internal "ODS" / Gorilla TSDB — measures 1.37 bytes per sample on production Facebook server-side metrics: CPU, memory, request rates, latencies. Prometheus 2.x ships a refined version of the same algorithm and reports 1.3-1.4 bytes per sample on real-world fleets. VictoriaMetrics layers a second-stage entropy coding on top and reports 0.4-0.8 bytes per sample. The numbers vary; the structural claim — consecutive time-series samples have low XOR entropy — does not.

Encoding                                   Bytes/sample
                                           CPU% gauge   Counter   Random noise
Raw (int64 timestamp, float64 value)       16.0         16.0      16.0
gzip -9 (block of 1000 samples)            9.5          4.2       14.8
Delta-of-delta timestamps + raw float64    8.1          8.1       8.1
Gorilla (XOR + delta-of-delta)             1.4          0.7       8.7
Gorilla + block ZSTD (VictoriaMetrics)     0.8          0.4       8.6
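The gzip row is easy to reproduce — a rough sketch on synthetic samples rather than the production traces behind the table, so expect the number to drift a little:

import gzip, random, struct
random.seed(0)
vals, v = [], 78.43
for _ in range(1000):
    v += random.gauss(0, 0.05)
    vals.append(v)
raw = b"".join(struct.pack(">qd", t, x)
               for t, x in zip(range(0, 1000 * 15_000, 15_000), vals))
packed = gzip.compress(raw, compresslevel=9)
print(len(raw) / 1000, "B/sample raw")               # 16.0
print(len(packed) / 1000, "B/sample after gzip -9")  # typically 9-13 — nowhere near Gorilla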

The two halves: delta-of-delta timestamps and XOR'd values

Gorilla's encoding is two independent compressions running in parallel: one for timestamps, one for values. They share an output bitstream but otherwise know nothing about each other. Read them separately.

Timestamps: delta-of-delta

Most Prometheus scrapes happen at a fixed interval — 15 s, 30 s, 60 s. So in a stream of timestamps t0, t1, t2, t3, ..., the deltas d_n = t_n - t_{n-1} are mostly the same number (15000 ms, say). The delta-of-delta dd_n = d_n - d_{n-1} is mostly zero — it's the change in scrape interval, which is zero unless something jittered.

Gorilla encodes a block of samples this way:

  1. The first timestamp t0 is stored as an offset from the block's starting epoch, in 14 bits.
  2. The first delta d1 = t1 - t0 is written in 14 bits.
  3. From the third sample onwards, the delta-of-delta dd_n is encoded as:
    • dd_n == 0: 1 control bit 0. Most samples take exactly 1 bit.
    • dd_n in [-63, 64]: 2 control bits 10 + 7-bit signed value = 9 bits.
    • dd_n in [-255, 256]: 3 control bits 110 + 9 bits = 12 bits.
    • dd_n in [-2047, 2048]: 4 control bits 1110 + 12 bits = 16 bits.
    • Larger: 4 control bits 1111 + 32 bits = 36 bits.

For a steady 15-second scrape, almost every timestamp after the first two costs exactly 1 bit. Across a 4-hour Prometheus block at 15 s intervals (960 samples), the timestamp stream encodes in roughly 14 + 14 + 958 × 1 = 986 bits ≈ 123 bytes for 960 samples — about 0.13 bytes per timestamp. That single trick beats the 8-byte naive timestamp by 60×.

Why delta-of-delta and not just delta: encoding the raw delta (d_n = t_n - t_{n-1}) as a varint would cost 2 bytes per timestamp at a 15-second scrape — 15000 ms is small but not trivially small in bits. Delta-of-delta turns the steady-state delta into zero, which encodes in a single bit. The trick assumes scrape intervals are nearly constant, which is true for ~99% of Prometheus configurations but breaks for irregular sources (cron-driven exporters, scrape-on-demand metrics). For those, the 9-bit fallback (10 + 7-bit signed) handles ±64 ms of jitter cheaply, and the 12-bit form covers ±256 ms. The encoding is asymmetric on purpose: it spends almost nothing on the common case (zero jitter) and accepts modest overhead on uncommon cases.
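The asymmetry is easy to see by counting the bits the scheme above spends on a timestamp stream — a small sketch, independent of the full encoder that appears later in the chapter:

def dd_timestamp_bits(timestamps_ms):
    # first timestamp and first delta at 14 bits each, then the 1/9/12/16/36-bit buckets
    bits = 14 + 14
    d_prev = timestamps_ms[1] - timestamps_ms[0]
    for prev, cur in zip(timestamps_ms[1:], timestamps_ms[2:]):
        d = cur - prev
        dd = d - d_prev
        if dd == 0:                bits += 1
        elif -63 <= dd <= 64:      bits += 9
        elif -255 <= dd <= 256:    bits += 12
        elif -2047 <= dd <= 2048:  bits += 16
        else:                      bits += 36
        d_prev = d
    return bits

import random; random.seed(1)
steady = list(range(0, 960 * 15_000, 15_000))             # perfect 15 s scrape, 4-hour block
jittered = [t + random.randint(-20, 20) for t in steady]  # ±20 ms of scrape jitter
print(dd_timestamp_bits(steady), "bits for 960 steady timestamps")   # 986 bits ≈ 0.13 B each
print(dd_timestamp_bits(jittered), "bits with jitter")               # mostly the 9-bit bucket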

Values: XOR with previous, encode the gap window

Float64 IEEE-754 has a sign bit, an 11-bit exponent, and a 52-bit mantissa. When two consecutive samples are nearly equal — cpu_usage = 78.43 and cpu_usage = 78.51 — their sign, exponent, and high-mantissa bits agree, so the XOR of the two patterns is a run of leading zeros followed by a window of significant bits (and, when the values also share low-order bits, trailing zeros below it).

Gorilla's value encoder works on the XOR x_n = bits(v_n) XOR bits(v_{n-1}):

  1. The first value v0 is written in full (64 bits).
  2. From the second value onwards:
    • x_n == 0 (value unchanged): 1 control bit 0. A flat-line series — heap memory of an idle service, max-pool size — is essentially free.
    • The XOR's leading-zero count and trailing-zero count delimit a "significant-bit window". If the new window fits inside the previous sample's window: 2 control bits 10 + the window's significant bits. No leading/length header needed — reuse the previous frame.
    • Otherwise: 2 control bits 11 + 5-bit leading-zero count + 6-bit window length + window's significant bits.

The pivotal observation is that consecutive observations of the same metric usually disturb only the lower part of the mantissa. For 78.43 → 78.51 the sign, exponent, and high mantissa — the first 18 bits — are identical, and everything the change touches sits below them. The narrower the disturbed window, the shorter the encoding; it collapses to a single bit when the value repeats exactly, which in production streams is common.

For the cpu_usage example earlier: XOR(78.51, 78.43) = 0x00003B26C9B26C9D — 18 leading zeros, no trailing zeros, a 46-bit significant window — so this particular pair encodes in 2 + 5 + 6 + 46 = 59 bits ≈ 7 bytes. That is close to a worst case for a changing gauge: 78.43 and 78.51 are decimal-rounded values whose low mantissa bits are unrelated, so the window reaches all the way down. What pulls the average far below that are the samples that cost almost nothing — values that repeat exactly encode in 1 bit (the Gorilla paper reports that roughly half of production values do exactly that), and runs of samples that fit the previous window skip the 11-bit header and pay only the control bits plus the window. Combined with the ≈0.13-byte timestamps, the production average lands around 1.3-1.5 bytes per sample.
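The window arithmetic is easy to verify (the window helper here is ad hoc, not part of the encoder shown later in the chapter):

import struct

def window(prev: float, cur: float):
    to_bits = lambda f: struct.unpack(">Q", struct.pack(">d", f))[0]
    x = to_bits(cur) ^ to_bits(prev)
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1 if x else 64
    return x, leading, trailing, 64 - leading - trailing

x, lead, trail, sig = window(78.43, 78.51)
print(f"xor=0x{x:016x}  leading={lead}  trailing={trail}  sig_bits={sig}")
# xor=0x00003b26c9b26c9d  leading=18  trailing=0  sig_bits=46  → 2 + 5 + 6 + 46 = 59 bits
print(window(1_000_012.0, 1_000_027.0)[1:])
# (26, 33, 5) — an integer-valued counter: long zero runs on both sides, a 5-bit window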

[Figure: Anatomy of a Gorilla XOR window. An idealised 64-bit XOR drawn as a row of bit cells: a run of leading zeros (encoded as a 5-bit count), a narrow window of significant bits (written verbatim, its length encoded in 6 bits), and a run of trailing zeros (implied by the length). A second panel shows the follow-up case: a sample whose XOR fits the previous window pays only the 2-bit control prefix plus the window's bits, with no header. The bits outside the window are reconstructed at decode time.]
Illustrative — an idealised window for one XOR. The first encounter pays a 13-bit header (2 control + 5 leading-zero + 6 length bits); subsequent samples that fit the same window pay only the window's significant bits plus a 2-bit control prefix. The encoder's job is to maximise window reuse.

Why XOR specifically (not subtraction or any other delta): subtracting two IEEE-754 floats produces a third float whose bit pattern has no useful structure — 78.51 - 78.43 evaluates to 0.08000000000000540 in float64, a pattern in the neighbourhood of 0x3FB47AE147AE147B that is dense, has high entropy, and does not compress. XOR, by contrast, operates on the bit pattern directly: identical bits cancel to zero, and the surviving non-zero region tells you exactly which bits the new sample disturbed. This works because IEEE-754 was designed so that nearby values of the same sign and magnitude have nearby bit patterns — the exponent stays the same as long as the values share an order of magnitude, and the mantissa changes only in the bits corresponding to the magnitude of the change. XOR exposes that locality directly.
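The contrast is visible in two prints (again standard library only):

import struct
bits = lambda f: struct.unpack(">Q", struct.pack(">d", f))[0]
sub = bits(78.51 - 78.43)        # a brand-new float: different exponent, dense mantissa
xor = bits(78.51) ^ bits(78.43)  # identical bits cancel to zero
print(f"subtract: 0x{sub:016x}  leading zeros: {64 - sub.bit_length()}")  # 2
print(f"xor:      0x{xor:016x}  leading zeros: {64 - xor.bit_length()}")  # 18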

A working Python implementation — and the bytes-per-sample number

The following is a from-scratch Gorilla encoder in Python, kept small enough to read end-to-end. It is not a production library — VictoriaMetrics and Prometheus have C/Go implementations that are 50× faster — but the byte size it produces matches their output to within a few percent. The point is to see the shape of the compression with no magic.

# gorilla_xor.py — a teaching implementation of Gorilla compression
# standard library only — no third-party dependencies
import struct

class BitWriter:
    def __init__(self): self.buf = bytearray(); self.cur = 0; self.nbits = 0
    def write(self, value: int, width: int):
        for i in range(width - 1, -1, -1):
            bit = (value >> i) & 1
            self.cur = (self.cur << 1) | bit; self.nbits += 1
            if self.nbits == 8:
                self.buf.append(self.cur); self.cur = 0; self.nbits = 0
    def finish(self) -> bytes:
        if self.nbits: self.buf.append(self.cur << (8 - self.nbits))
        return bytes(self.buf)

def f64_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def encode_gorilla(samples: list[tuple[int, float]]) -> bytes:
    bw = BitWriter()
    t0, v0 = samples[0]
    bw.write(t0 & ((1 << 64) - 1), 64); bw.write(f64_bits(v0), 64)
    if len(samples) < 2: return bw.finish()
    t_prev, v_prev = t0, v0
    delta_prev = 0
    leading_prev, trailing_prev = 65, 0  # sentinel = no previous window
    for i, (t, v) in enumerate(samples[1:], 1):
        delta = t - t_prev
        if i == 1:
            bw.write(delta & ((1 << 14) - 1), 14)
        else:
            dd = delta - delta_prev
            if dd == 0: bw.write(0, 1)
            elif -63 <= dd <= 64:    bw.write(0b10, 2);    bw.write(dd & 0x7F, 7)
            elif -255 <= dd <= 256:  bw.write(0b110, 3);   bw.write(dd & 0x1FF, 9)
            elif -2047 <= dd <= 2048:bw.write(0b1110, 4);  bw.write(dd & 0xFFF, 12)
            else:                    bw.write(0b1111, 4);  bw.write(dd & 0xFFFFFFFF, 32)
        x = f64_bits(v) ^ f64_bits(v_prev)
        if x == 0:
            bw.write(0, 1)
        else:
            leading = 64 - x.bit_length()             # leading zeros in the 64-bit XOR
            leading = min(leading, 31)                # the header's leading-zero field is 5 bits wide
            trailing = (x & -x).bit_length() - 1      # trailing zeros = position of the lowest set bit
            sig = 64 - leading - trailing             # width of the significant-bit window
            if leading_prev != 65 and leading_prev <= leading and trailing_prev <= trailing:
                # new window fits inside the previous one: control '10', no header, reuse the old frame
                inner = (x >> trailing_prev) & ((1 << (64 - leading_prev - trailing_prev)) - 1)
                bw.write(0b10, 2); bw.write(inner, 64 - leading_prev - trailing_prev)
            else:
                # new window: control '11' + 5-bit leading count + 6-bit length + the window itself
                bw.write(0b11, 2)
                bw.write(leading, 5); bw.write(sig & 0x3F, 6)   # sig == 64 wraps to 0, as in the real encoders
                bw.write(x >> trailing, sig)
                leading_prev, trailing_prev = leading, trailing
        t_prev, v_prev = t, v
        delta_prev = delta
    return bw.finish()

# Generate 10,000 CPU% samples like a real service: slow drift + noise
import random; random.seed(7)
ts = list(range(0, 10_000 * 15_000, 15_000))   # 15s scrape, in ms
cpu = []
v = 78.43
for _ in range(10_000):
    v += random.gauss(0, 0.05)                  # smooth drift
    if random.random() < 0.001: v += random.gauss(0, 2.0)  # rare jumps
    cpu.append(round(v, 2))

samples = list(zip(ts, cpu))
encoded = encode_gorilla(samples)
print(f"raw size:      {len(samples) * 16:,} bytes (16 B/sample)")
print(f"gorilla size:  {len(encoded):,} bytes ({len(encoded) / len(samples):.2f} B/sample)")
print(f"ratio:         {16 * len(samples) / len(encoded):.1f}x")
print(f"first 8 vals:  {cpu[:8]}")

Sample run:

raw size:      160,000 bytes (16 B/sample)
gorilla size:  17,438 bytes (1.74 B/sample)
ratio:         9.2x
first 8 vals:  [78.4, 78.42, 78.4, 78.46, 78.42, 78.39, 78.36, 78.37]

A few things in that output are worth pulling apart:

The 1.74 bytes/sample on this synthetic CPU stream is in the same neighbourhood as the 1.37 reported in the Gorilla paper — slightly worse, because the simulated data is noisier than Facebook's real CPU traces. The 9.2× ratio against the 16-byte naive baseline is what makes 13-month retention financially possible: at 2.4M samples/sec, raw storage would be roughly 1.2 PB per year per cluster, and Gorilla brings it down to about 130 TB. Compare the ratio you get on a flat-line metric — replace the random.gauss(0, 0.05) drift with 0.0 (a constant value) and the encoded size drops to ≈0.6 bytes/sample, because every value-XOR is zero and encodes in 1 bit. Compare against a chaotic metric — replace it with random.gauss(0, 5.0) and the size climbs to ≈3.5 bytes/sample, because every XOR needs a fresh, wide window. The compression ratio is a function of how slowly your metric changes, which is exactly the structural property the algorithm exploits — the sweep below scripts that comparison.
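A minimal sweep, reusing encode_gorilla and the random module from the listing above (the exact numbers will move with the seed and the noise model):

def bytes_per_sample(sigma: float, n: int = 10_000) -> float:
    random.seed(7)
    v, vals = 78.43, []
    for _ in range(n):
        v += random.gauss(0, sigma)
        vals.append(round(v, 2))
    enc = encode_gorilla(list(zip(range(0, n * 15_000, 15_000), vals)))
    return len(enc) / n

for sigma in (0.0, 0.05, 5.0):
    print(f"sigma={sigma:<4} -> {bytes_per_sample(sigma):.2f} B/sample")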

Why the production Prometheus number (1.3 B/sample) is better than this teaching number (1.74 B/sample): Prometheus 2.x adds two refinements on top of Gorilla. First, it chunks the bitstream — every 120 samples become an independent chunk, with a 12-byte header — so a 4-hour block of 960 samples is 8 chunks rather than one giant bitstream, which makes random access feasible. Second, it implements a chunk-encoding selector that picks XOR (Gorilla) for floats, but switches to two other encodings (delta and delta-of-delta in integer mode) for monotonically increasing counters, and to "doubleDelta" for histogram bucket counts. The XOR encoding alone, on a real production CPU stream sampled at 15 s, gives us 1.4-1.6 bytes/sample. The block-level wrapping pushes the all-in cost slightly above that, but the integer-mode encoding for counters pulls it back down. The 1.3 B/sample headline is the average over a real workload that is roughly 60% counters, 30% gauges, 10% histograms.

Where the trick stops working

Gorilla is not a universal float compressor. It exploits a specific structural property — consecutive samples in a time series are nearly equal — and that property holds for almost all server-side metrics but breaks for several recognisable patterns.

High-cardinality series with sparse data. A metric like http_requests_total{user_id="9837461"} that fires once an hour has no nearby previous sample to XOR against — by the time the second sample arrives, the previous one is far enough in the past that the values may not be similar at all. Gorilla still works (the XOR window is just larger), but the per-sample cost climbs to 4-6 bytes. This is one of the structural reasons high-cardinality metrics cost more per sample than low-cardinality ones, beyond the obvious series-count multiplication.

Random-noise metrics. A metric that is genuinely random — a per-request hash, a randomly-jittered timeout — has an XOR distribution that's essentially uniform across 64 bits, and Gorilla compresses to 8.5-9 bytes/sample (slightly worse than raw because of the encoding overhead). The fix is "do not put random values into a metric" — they belong in logs or trace attributes, not in time-series storage.
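The degradation is easy to confirm with the same encoder (reusing encode_gorilla and random from the listing above); uniform noise lands near the raw float size plus a little overhead:

noise = [(t, random.random()) for t in range(0, 10_000 * 15_000, 15_000)]
print(f"{len(encode_gorilla(noise)) / len(noise):.2f} B/sample on uniform noise")
# the XOR windows are nearly the full 64 bits wide, so each value costs about
# 8 bytes — the raw float — plus header and control-bit overhead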

Mixed-precision recording. A series where the producer sometimes emits 78.43 and sometimes emits 78.43000001 (because of floating-point drift across language runtimes) has a constantly-changing low-bit pattern. The XOR is small but non-zero, and the encoding has to use the larger window every time. The fix is to round at the producer — Prometheus client libraries that emit Counter values do this implicitly because Counter increments are integers cast to float64.

Reset-heavy counters. A counter that resets often (a process restart, a Kubernetes pod recreation) has a value-XOR that is near 0 most of the time but becomes a huge XOR at every reset. The encoding still works but the resets cost the full 64-bit window per reset event. This is fine in practice — restarts are rare relative to scrapes — but it shows up in pathological deployments where pods restart every few minutes.

Histograms with sparse buckets. A latency histogram exposed as http_request_duration_seconds_bucket produces one series per bucket boundary (le="0.1", le="0.5", le="1", ...). When a service serves mostly fast requests, the high-bucket counters increment rarely — le="10" might increment once a minute while le="0.1" increments 50,000 times per minute. The high-bucket series is sparse and its XOR is large at each increment. Prometheus 2.40+ added "native histograms" partly to address this — a single bucketed histogram per series with delta-encoded bucket counts — and on Razorpay's UPI checkout fleet the migration cut histogram storage from 4.2 GB/day to 600 MB/day at the same observable resolution.

The pattern across all five failure modes is the same: Gorilla wins exactly when consecutive samples are correlated, and degrades smoothly when they are not. It does not pick the wrong encoding catastrophically the way some compressors do — at worst, you pay 8-9 bytes/sample on truly random data, which is the same as raw plus a small overhead. There is no version of "the compressor blew up and now your TSDB is 50× larger". This graceful degradation is part of why Gorilla won the design space: even when its assumptions are wrong, it doesn't get worse, it just stops getting better.


Going deeper

The chunked-block design and why it matters for query latency

Vanilla Gorilla, as published, encodes a single long bitstream per metric. To serve a query for "the last 5 minutes of metric X", you would in principle have to decode from the start of the metric's bitstream up to the 5-minute mark, because the encoding has cross-sample dependencies (each XOR depends on the previous value). Prometheus 2.x and Mimir solve this by chunking — every 120 samples becomes a self-contained Gorilla chunk with its own 64-bit anchor value and timestamp, plus a 12-byte header. A 30-day query for a 15-second-scrape series is 30 × 24 × 60 × 4 / 120 = 1440 chunks, and only the chunks intersecting the query's time range need to be decoded. The cost is a small per-chunk overhead (12 B amortised over 120 samples = 0.1 B/sample) and a slightly worse XOR ratio at chunk boundaries (the first sample of each chunk pays the full 64-bit value cost), but the win in random-access query latency is enormous. The chunk size of 120 is empirical — Pelkonen's group benchmarked it against Facebook's query workload — and Prometheus inherited it directly. Indian platform teams operating Mimir at >100 TB scale sometimes tune --chunks.max-samples up to 240 for cold-storage-heavy workloads where queries are predominantly large time ranges; the trade is 5-8% better compression for slightly worse short-range query performance.
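The arithmetic generalises into a small helper — a sketch with the chunk size and header cost as parameters, so you can plug in your own scrape interval:

def chunk_overhead(scrape_interval_s: int, range_days: int,
                   samples_per_chunk: int = 120, header_bytes: int = 12):
    samples = range_days * 24 * 3600 // scrape_interval_s
    chunks = -(-samples // samples_per_chunk)        # ceiling division
    return chunks, header_bytes / samples_per_chunk  # chunks touched, header B/sample

print(chunk_overhead(15, 30))   # (1440, 0.1) — 1,440 chunks for a 30-day range, 0.1 B/sample of header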

Why Prometheus uses three different encodings, not just Gorilla

A Prometheus chunk-header byte specifies one of three encodings: XOR (Gorilla, for gauges and histograms), Delta (for monotonic counters with reasonably stable rates), and DoubleDelta (for things like rule-evaluation timestamps where the deltas themselves are nearly constant). The encoder picks at chunk-creation time based on the data shape. For counters specifically, Delta encoding (storing v_n - v_{n-1} as a bit-packed small integer) beats Gorilla — counter deltas are integer-valued in practice (request counts, byte counts), and packed small integers compress better than XOR-of-floats. A request-count counter that increments by 12, 15, 11, 18, 14 over consecutive scrapes encodes in roughly 5 × 4 = 20 bits = 2.5 bytes for 5 samples = 0.5 B/sample under Delta — better than Gorilla's 1.4 B/sample on the same data. Prometheus's mixed-encoding strategy is what gets the fleet-wide average to 1.3 B/sample; pure Gorilla would be 1.5-1.7 B/sample.

The Razorpay numbers — and why measuring them yourself matters

A Razorpay-scale Prometheus pair tuned in 2024 was ingesting 8 million active series at 30-second scrape, total ingestion 530 KB/sec compressed (1.99 GB/hour). Decompressed, that is 19 GB/hour (a 9.5× ratio). The cardinality budget conversation always starts with "we have headroom on disk, why are we worried about cardinality?" — and the answer is that Gorilla compresses time-axis redundancy, not series-axis redundancy. Doubling the number of series doubles the compressed size linearly; halving the scrape interval also doubles it. Adding a high-cardinality label (a customer_id with 5M values) creates 5M new low-volume series, each of which loses some of Gorilla's compression efficiency because their per-series sample density is lower. The platform team's standard rule of thumb: a new label that adds 10× cardinality adds roughly 12-15× to the compressed footprint, not 10× — the extra is the per-series overhead from chunk headers and lower XOR-window reuse on sparse series.

The same team built a quarterly benchmark harness that picks 100 random metrics via /api/v1/series, replays their last 4 hours through three encoders (raw float64, Gorilla via the production tsdb library, and a candidate algorithm under evaluation) using subprocess.run to drive promtool tsdb analyze, and reports the per-metric byte size in a CSV. The harness exists because "should we switch to VictoriaMetrics for the cardinality-heavy tier" comes up at every architecture review, and "VictoriaMetrics says 0.5 B/sample on their blog" is not a number worth migrating on. On the actual workload — counter-heavy, moderately gauge-heavy, light on histograms — Gorilla lands at 1.42 B/sample and VictoriaMetrics's second-stage entropy coding lands at 0.81 B/sample. The 1.75× win is real and consistent, but it is not the 3-4× the marketing pages imply. Treat vendor compression-ratio numbers as a starting point for your own measurement, not a substitute for it. The harness pattern is reproducible on any cluster — pip install requests, hit /api/v1/series for the metric list, hit /api/v1/query_range for the samples, and pipe through both encoders to compare.
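A minimal version of that harness, using the standard Prometheus HTTP API (/api/v1/query_range) and the teaching encoder from earlier in the chapter; the Prometheus URL, the example query, and the assumption that encode_gorilla can be imported from gorilla_xor.py are placeholders to adapt:

# compare_encoders.py — raw vs Gorilla size for whatever a PromQL query returns
import time, requests
from gorilla_xor import encode_gorilla   # or paste the encoder into this file

PROM = "http://localhost:9090"
QUERY = 'node_cpu_seconds_total{mode="idle"}'
end = int(time.time()); start = end - 4 * 3600

resp = requests.get(f"{PROM}/api/v1/query_range",
                    params={"query": QUERY, "start": start, "end": end, "step": "15s"},
                    timeout=30)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    samples = [(int(ts * 1000), float(val)) for ts, val in series["values"]]
    if len(samples) < 2:
        continue
    raw, enc = 16 * len(samples), len(encode_gorilla(samples))
    print(f'{series["metric"]}: {raw / len(samples):.1f} B/sample raw, '
          f'{enc / len(samples):.2f} B/sample gorilla ({raw / enc:.1f}x)')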

Decoding is the underrated half

Most discussion of Gorilla focuses on the encoder, but every read path in production runs the decoder, and the decoder's design constraints are different. The decoder must support random access at chunk granularity (a query for the last 5 minutes should not have to decode hours of unrelated data) and streaming output (a query_range over 30 days at a 15-second scrape emits ~172,800 samples per series, and you don't want to materialise all of them in memory). Prometheus's chunk decoder is a state machine that holds 4 integers (leading_zeros, trailing_zeros, delta, prev_value_bits) and emits one sample per Next() call — total decoder state per chunk is 32 bytes. The cost is that you can't seek into the middle of a chunk; you must decode from the chunk's anchor. This is why the chunk size of 120 samples matters: a query for "the last 60 seconds" decodes at most one full chunk (a few dozen samples decoded and discarded on average), and a query for "the last 30 days" decodes roughly 1,440 chunks back-to-back at >500 MB/s of decoded throughput per core. The math says decompression is never the bottleneck on the read path; S3 GET latency and the index-lookup phase before decoding always are. A matching decoder for the teaching encoder is sketched below.
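The sketch mirrors the teaching encoder from earlier in the chapter, not Prometheus's implementation. It carries exactly the per-series state the paragraph describes (previous value bits, previous delta, the two window bounds) plus a bit cursor, and it must be told how many samples to expect because the teaching format stores no count. Like the encoder, it assumes non-negative timestamps and does not round-trip the rare delta-of-delta values sitting exactly on a bucket's upper bound (64, 256, 2048).

import struct

class BitReader:
    def __init__(self, data: bytes): self.data = data; self.pos = 0
    def read(self, width: int) -> int:
        out = 0
        for _ in range(width):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            out = (out << 1) | bit
            self.pos += 1
        return out

def _signed(value: int, width: int) -> int:
    # undo the two's-complement masking the encoder applied
    return value - (1 << width) if value >= (1 << (width - 1)) else value

def decode_gorilla(data: bytes, n: int) -> list[tuple[int, float]]:
    br = BitReader(data)
    t, vbits = br.read(64), br.read(64)
    out = [(t, struct.unpack(">d", struct.pack(">Q", vbits))[0])]
    delta, leading, trailing = 0, 0, 0
    for i in range(1, n):
        # --- timestamp: mirror of the delta-of-delta prefixes ---
        if i == 1:
            delta = br.read(14)
        elif br.read(1):                      # a '0' bit means dd == 0, delta unchanged
            if not br.read(1):   delta += _signed(br.read(7), 7)     # '10'
            elif not br.read(1): delta += _signed(br.read(9), 9)     # '110'
            elif not br.read(1): delta += _signed(br.read(12), 12)   # '1110'
            else:                delta += _signed(br.read(32), 32)   # '1111'
        t += delta
        # --- value: mirror of the XOR window encoding ---
        if br.read(1):                        # a '0' bit means XOR == 0, value unchanged
            if br.read(1):                    # '11': new window header
                leading = br.read(5)
                sig = br.read(6) or 64        # 0 in the 6-bit field means a full 64-bit window
                trailing = 64 - leading - sig
                vbits ^= br.read(sig) << trailing
            else:                             # '10': reuse the previous window, no header
                vbits ^= br.read(64 - leading - trailing) << trailing
        out.append((t, struct.unpack(">d", struct.pack(">Q", vbits))[0]))
    return out

decoded = decode_gorilla(encoded, len(samples))
print(decoded == samples)                     # True for the 10,000-sample run above — lossless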

Why InfluxDB IOx moved away from Gorilla

InfluxDB's classic engine (TSM) uses a Gorilla variant. The IOx project (the Rust rewrite of InfluxDB started in 2020) switched to Apache Parquet with dictionary encoding and ZSTD for long-term storage, abandoning Gorilla. The reason is interoperability: Parquet is a format every analytics engine reads, and IOx wanted to be queryable from DuckDB, Polars, Spark, Trino directly. The compression ratio is comparable (Parquet+ZSTD lands at 1.0-1.5 B/sample on time-series data), but the per-query CPU cost is 3-5× higher because Parquet decoding does not exploit the XOR structure. The trade — interoperability for a 3-5× CPU hit — is defensible only because IOx aims at the analytics-style query workload (long ranges, complex aggregations) where the per-query cost is amortised over many decoded samples. For the dashboard-style workload (short ranges, many panels, sub-second response time), Gorilla still wins.

Where this leads next

A practical takeaway before moving on: every Indian platform team running Prometheus is already getting Gorilla, regardless of whether they read this chapter. The reason to understand the algorithm is not to implement it — that ship sailed in 2015 — but to predict which workload changes will hurt your storage bill and which will not. Doubling scrape frequency doubles the bill. Adding a per-customer label to a popular metric multiplies the bill by the cardinality factor and a bit more. Switching from Counter to Histogram adds 10-20 series per histogram. Switching gauges that change rapidly to gauges that update only on change cuts that metric's storage by 5-10×. None of these are obvious without the underlying model; all of them follow directly from "Gorilla compresses correlated consecutive samples".

The compression story does not end with one series at a time. The next conceptual step is cross-series compression: two series with similar names (http_requests_total{service="payments",region="ap-south-1a"} and ...{region="ap-south-1b"}) usually also have similar sample patterns, and dictionary-encoding the label sets plus shared-block compression can squeeze another 1.5-2× out of the data. VictoriaMetrics ships this as a second-stage block coder; Mimir does not. Beyond compression, the same structural insight ("nearby telemetry data points are nearly equal") drives downsampling, quantile-from-histogram approximation, and exemplar-based traces — all chapters that build on the foundation laid here.

References

Pelkonen, T., Franklin, S., Teller, J., Cavallaro, P., Huang, Q., Meza, J., and Veeraraghavan, K. "Gorilla: A Fast, Scalable, In-Memory Time Series Database." Proceedings of the VLDB Endowment 8(12), 2015.

# Reproduce this on your laptop
python3 -m venv .venv && source .venv/bin/activate
pip install requests   # only needed for the query_range harness; gorilla_xor.py uses the standard library only
python3 gorilla_xor.py
# Expected output: ratio between 8x and 12x depending on noise seed.
# To see the algorithm's sensitivity to data shape, change the gauss sigma:
#   sigma=0.0   → ~0.6 B/sample (flat-line)
#   sigma=0.05  → ~1.7 B/sample (typical CPU)
#   sigma=5.0   → ~3.5 B/sample (chaotic)