Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
In short
A time-series workload is millions of append-only writes per second with monotone timestamps, no updates, and queries that are always range-bucketed aggregates on the last hour or last day. Postgres and MongoDB lose because their indexes outgrow the data, while Snowflake and BigQuery lose because they were built for infrequent huge scans, not fifteen-second dashboard refreshes. Dedicated TSDBs exploit the workload pattern with time-partitioned columnar chunks, Gorilla compression, continuous aggregates, and tiered retention — typically shrinking storage 10–30× and speeding range aggregates 50–100× over a general-purpose database.
You have spent the previous twenty builds learning the shape of two database families. The OLTP family — Postgres, MySQL, Aurora — is optimised for point queries and small transactions: read this row by primary key, update this account balance, insert this order. The OLAP family — Snowflake, BigQuery, ClickHouse — is optimised for scans and aggregates over wide tables: total revenue by region by quarter, joining a fact table to four dimensions.
Time-series is a third shape. It looks superficially like OLTP because the writes arrive constantly in small pieces, and superficially like OLAP because the queries are always aggregates. But it is neither. The writes are append-only and never updated, which OLTP storage engines do not exploit. The queries always filter on a single column (the timestamp) in a tight range, which OLAP planners do not specifically optimise for. The data shape — a few tag columns repeating for billions of rows, plus one or two narrow value columns — is wasteful in row format and only makes sense in column format with heavy compression.
The thesis of this chapter, and of Build 21 as a whole, is that the workload pattern is so peculiar and so common that it deserves its own engine. You will see that engine come together in the next four chapters. First you have to see why the obvious choice — "just use Postgres, we already know it" — falls apart at scale.
The shape of a time-series workload
Three properties define a time-series workload. Together they are exhaustive: if all three hold, you have a time-series workload, and a TSDB will beat a general-purpose database. If any one is missing, you probably do not have a time-series workload, and a TSDB might be the wrong tool.
Property 1 — write-heavy and append-only. The writes vastly outnumber the reads. A single Kubernetes cluster with five hundred pods, each exporting two hundred metrics every fifteen seconds, generates 500 × 200 / 15 ≈ 6,667 writes per second from one cluster alone. Multiply by the dozens of clusters a serious engineering org runs and the number is in the hundreds of thousands per second. The writes are pure appends: the timestamp on each new row is, barring clock skew, the largest the system has ever seen. There is no contention on hot rows because there are no hot rows; every write goes to the end. Why this matters: an MVCC engine like Postgres pays per-row bookkeeping on every insert. That means a tuple header carrying xmin/xmax visibility information, a WAL record, and an entry in every B-tree index, all so that concurrent transactions can later update, delete, and read rows consistently. If your workload guarantees that rows are never updated or contended, you are paying for protection you do not need. A TSDB drops the protection and gets 10–30× more inserts per CPU second.
Property 2 — queries are range-scans on the timestamp. Every interesting query has a WHERE timestamp BETWEEN x AND y predicate, and the range is usually short (the last hour, the last day) relative to the table (which holds years). Crucially, the query almost never asks for individual rows — it asks for an aggregate over a bucketed window: avg(cpu_pct) GROUP BY time_bucket('1 minute', ts). Why this matters: the planner can prune to the chunks that overlap the time range without consulting any other index. A B-tree on (host, timestamp) works, but the TSDB approach — partition the table by time, then keep only a small per-chunk min/max index — is dramatically cheaper because the index itself shrinks from gigabytes to kilobytes.
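Here is a minimal Python sketch of that pruning step. It assumes a hypothetical in-memory catalogue that records only each chunk's minimum and maximum timestamp. `ChunkMeta` and `prune` are illustrative names, not any engine's API. The planner needs nothing more than this to skip chunks that cannot match:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ChunkMeta:
    """The only per-chunk 'index' needed for time pruning."""
    name: str
    min_ts: datetime  # earliest timestamp the chunk can hold (inclusive)
    max_ts: datetime  # latest timestamp the chunk can hold (inclusive)


def prune(chunks: list[ChunkMeta], start: datetime, end: datetime) -> list[ChunkMeta]:
    """Keep only chunks whose [min_ts, max_ts] overlaps the query range [start, end]."""
    return [c for c in chunks if c.min_ts <= end and c.max_ts >= start]


# Hypothetical catalogue: seven daily chunks.
day0 = datetime(2025, 3, 1, tzinfo=timezone.utc)
chunks = [
    ChunkMeta(f"chunk_{i}",
              day0 + timedelta(days=i),
              day0 + timedelta(days=i + 1) - timedelta(microseconds=1))
    for i in range(7)
]

# "Last hour" at 10:30 on the final day touches a single chunk.
now = day0 + timedelta(days=6, hours=10, minutes=30)
print([c.name for c in prune(chunks, now - timedelta(hours=1), now)])  # ['chunk_6']
```

A range that straddles midnight returns two chunks. That is still a constant-size check per chunk, with no per-row index involved.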
Property 3 — repetitive, low-cardinality tags and a stable schema. Each row is (timestamp, host, region, metric_name, value) or similar. The tag columns (host, region, metric_name) repeat for billions of rows but take only a few hundred or a few thousand distinct values across the dataset. The value columns are usually one or two doubles. The schema barely changes; you add a new metric every few weeks at most. Why this matters: dictionary encoding turns the tag columns from 32-byte strings into 1- or 2-byte integers. Run-length encoding, once rows within a chunk are ordered by series and then by time, compresses runs of the same tag value to single entries. Together they shrink the on-disk footprint by 10–50× compared to a row-oriented Postgres table with the same rows.
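Here is a minimal sketch of the two tag encodings in Python. It assumes rows inside a chunk are ordered by series and then by time, which is what produces long runs. `dictionary_encode` and `run_length_encode` are hypothetical helpers, not a library API.

```python
from itertools import groupby


def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Map each distinct string to a small integer code, in first-seen order."""
    dictionary: list[str] = []
    index: dict[str, int] = {}
    codes: list[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def run_length_encode(codes: list[int]) -> list[tuple[int, int]]:
    """Collapse runs of identical codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


# A host tag column after sorting by series, then time (hypothetical data).
hosts = ["web-mumbai-37"] * 4 + ["web-mumbai-38"] * 3 + ["web-chennai-02"] * 5
dictionary, codes = dictionary_encode(hosts)
runs = run_length_encode(codes)
print(dictionary)  # ['web-mumbai-37', 'web-mumbai-38', 'web-chennai-02']
print(runs)        # [(0, 4), (1, 3), (2, 5)]
```

Twelve strings become three dictionary entries and three (code, length) pairs. A real engine would bit-pack the codes; the sketch keeps plain lists for readability.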
Those three properties add up to the entire mental model of a TSDB. Writes pour in monotonically and are appended to today's chunk. Storage keeps recent chunks uncompressed and writeable, compresses older chunks aggressively, and ships truly old chunks to object storage. Reads almost always touch one or two chunks, scan a small window, and return a small aggregate.
The data shape: timestamp + value, with tags
Compare a row from a time-series table with a row from an OLTP table. The OLTP row — say, an order in an e-commerce system — has fifteen columns of varied types: customer ID, items, totals, addresses, statuses, timestamps, references to other rows. Each column is meaningful, each row is unique, each row may be updated as the order moves through fulfilment. Storing it row-oriented in Postgres is correct because every read or write touches the whole logical entity.
A time-series row is the opposite. It has, conceptually, two parts: the measurement (a timestamp and one or two numeric values) and the series identity (a small set of tag columns identifying which sensor or host or device produced the measurement). The measurement changes every row. The series identity repeats unchanged across millions of consecutive rows.
In Postgres you would store this as one wide row per measurement: (ts, host_str, region_str, env_str, metric_str, value). The string columns alone consume 30–60 bytes each, so a 16-byte logical measurement balloons to 100–200 bytes on disk, repeated for every sample. Across a billion samples that is 100–200 GB, most of it the same tag strings over and over. Why this matters: row format stores every repeated value in full, so its waste grows with how repetitive the columns are. Time-series data is deeply repetitive in tags and minimally repetitive in values. Column format with dictionary + RLE on tags and Gorilla on values closes most of that gap, typically reaching 1–2 bytes per measurement on disk.
A TSDB exploits this by separating the two halves at the storage layer. InfluxDB and Prometheus store a series as a unit: one entry per (host, region, env, metric) combination, identified by a small integer ID, plus an array of (timestamp, value) measurements. TimescaleDB compresses chunks into arrays per column so the tag column is effectively a single dictionary entry per chunk plus a tiny RLE bitmap. The two-half structure is the data model that the storage layout follows; you will see the layout in detail in chapter 165.
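The following toy version of the two-half model assumes nothing about any real engine's format. Each distinct label set is interned once to a small integer series ID, and measurements are appended to per-series arrays. `SeriesStore` is a hypothetical name.

```python
from collections import defaultdict


class SeriesStore:
    """Toy two-half layout: series identity stored once, measurements as arrays."""

    def __init__(self) -> None:
        self.series_ids: dict[frozenset, int] = {}              # label set -> series id
        self.timestamps: dict[int, list[int]] = defaultdict(list)
        self.values: dict[int, list[float]] = defaultdict(list)

    def series_id(self, labels: dict[str, str]) -> int:
        key = frozenset(labels.items())  # label order must not matter
        if key not in self.series_ids:
            self.series_ids[key] = len(self.series_ids)
        return self.series_ids[key]

    def append(self, labels: dict[str, str], ts: int, value: float) -> None:
        sid = self.series_id(labels)     # identity resolved once per write
        self.timestamps[sid].append(ts)  # measurement half: two narrow arrays
        self.values[sid].append(value)


store = SeriesStore()
labels = {"host": "web-mumbai-37", "region": "ap-south-1",
          "env": "production", "metric": "cpu_pct"}
for i in range(3):
    store.append(labels, ts=1_700_000_000 + 15 * i, value=40.0 + i)
print(len(store.series_ids), store.timestamps[0], store.values[0])
# 1 [1700000000, 1700000015, 1700000030] [40.0, 41.0, 42.0]
```

The tag strings are stored exactly once, no matter how many samples arrive. Per-sample cost is one timestamp and one value.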
Why general-purpose databases lose
Engineers who first hit time-series traffic usually reach for one of three database families, and each fails in its own way.
Postgres. A single Postgres node on modern hardware sustains roughly 80–120K row inserts per second on a wide table with two indexes, before WAL fsyncs and B-tree page splits start dominating CPU. You can push to 200K with synchronous_commit = off, larger WAL buffers, and unlogged tables, at the cost of durability. Past that you must shard, which means an application-layer router, schema discipline across shards, and pain whenever a query needs cross-shard aggregation. The deeper problem is that the indexes grow faster than the table. A 24-byte row plus a 32-byte B-tree entry per index means two indexes weigh about 2.7× as much as the data. By month three the database is mostly indexes, and the index pages dominate the buffer cache, evicting the data pages that queries actually want. Why this matters: Postgres' page-based storage and B-tree indexes are designed for read-mostly OLTP workloads where a few percent of rows are hot. Time-series has no hot rows in that sense; every row is written once, never updated, and read mostly while it is recent. The indexes are pure overhead, because a time-partitioned chunk plus a min/max index gives the same pruning benefit at one-thousandth the space.
MongoDB. Document stores look attractive because "my measurement is just a small JSON object". But a BSON document of {"ts": ..., "host": "web-mumbai-37", "region": "ap-south-1", "metric": "cpu_pct", "value": 41.2} is about 110 bytes on the wire and 130 bytes on disk, plus the _id index (24 bytes) and any secondary indexes. BSON stores every field name inside every document, so the repeated key names alone ("host", "region", "metric", "value") consume about 30 bytes per document. Multiply by a billion documents and you have 30 GB of repeated key names. MongoDB 5.0 added time-series collections, which internally bucket measurements to shrink this overhead. That is an implicit acknowledgment that the plain document model loses on this workload.
Snowflake / BigQuery. These are excellent for analytical batch queries on terabytes of data. They are not built for sustained millions-of-rows-per-second ingest — Snowflake recommends micro-batches via Snowpipe with seconds-to-minutes of latency, BigQuery has a streaming insert quota measured in MB/sec per table. They also charge per query, which makes the "every dashboard refreshes every fifteen seconds" pattern of a TSDB workload economically painful. Why this matters: a TSDB serves dashboards and alerts; the queries are tiny (one minute of data) but extremely frequent. An OLAP warehouse serves analysts; the queries are huge (a quarter of data) but infrequent. Pricing models follow the workload — and the wrong tool for one becomes ruinously expensive at scale.
The TSDB optimisations, in one preview
Every dedicated TSDB — Prometheus, InfluxDB, TimescaleDB, ClickHouse, QuestDB, VictoriaMetrics — exploits the three workload properties via the same five mechanisms. The next four chapters of Build 21 dig into each in turn; here is the map.
Time partitioning into chunks. Split the logical table into physical pieces, one per day or week. The planner prunes the irrelevant chunks immediately on the WHERE ts BETWEEN ... predicate, without consulting any index — just a chunk-level min/max metadata entry. Inserts always go to the most recent chunk, which fits in memory and accepts writes at memory speed. Old chunks are read-only and can be aggressively compressed. Chapter 165 builds this.
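Here is a sketch of the routing half of time partitioning. It assumes Unix-second timestamps and fixed one-day chunks; `PartitionedTable`, `chunk_key`, and `seal_before` are hypothetical names. Inserts are routed by flooring the timestamp to its chunk start, and closed chunks become read-only so they can be compressed:

```python
from collections import defaultdict

CHUNK_SECONDS = 86_400  # one chunk per UTC day (assumed interval)


class PartitionedTable:
    """Hypothetical time-partitioned table: one list of rows per chunk."""

    def __init__(self) -> None:
        self.chunks: dict[int, list[tuple[int, float]]] = defaultdict(list)
        self.sealed: set[int] = set()  # chunk keys closed for writes

    @staticmethod
    def chunk_key(ts: int) -> int:
        """Floor a Unix timestamp to the start of its chunk interval."""
        return ts - ts % CHUNK_SECONDS

    def insert(self, ts: int, value: float) -> int:
        key = self.chunk_key(ts)
        if key in self.sealed:
            raise ValueError(f"chunk {key} is sealed; late writes need separate handling")
        self.chunks[key].append((ts, value))  # append-only: no lookup of existing rows
        return key

    def seal_before(self, ts: int) -> list[int]:
        """Close every chunk that starts before the chunk containing ts; return their keys."""
        cutoff = self.chunk_key(ts)
        newly_sealed = sorted(k for k in self.chunks if k < cutoff and k not in self.sealed)
        self.sealed.update(newly_sealed)
        return newly_sealed


table = PartitionedTable()
day = 1_740_787_200  # 2025-03-01T00:00:00Z
table.insert(day + 10, 41.2)
table.insert(day + 86_400 + 5, 40.9)        # lands in the next day's chunk
print(table.seal_before(day + 86_400 + 5))  # [1740787200]
```

Rejecting writes into a sealed chunk is the simplest possible policy. The section on clock skew and out-of-order writes covers gentler ones.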
Columnar layout inside each chunk. Once a chunk is closed for writing, store each column as a contiguous array. A query that touches cpu_pct reads only the cpu_pct column, not the seven other metric columns in the row. Combine this with dictionary encoding on tag columns (the four hosts that appear in this chunk become integers 0..3) and Gorilla XOR encoding on the value column (consecutive doubles are usually close, so their XOR is mostly zero bits). The on-disk size then shrinks 10–30× from the row format. Chapter 165 again, second half.
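The next sketch shows why XOR works on doubles, using only the standard library. It reinterprets two doubles as 64-bit integers and measures the leading and trailing zeros of their XOR. It illustrates only the bit-level intuition, not the full Gorilla bit layout with its control bits.

```python
import struct


def float_bits(x: float) -> int:
    """Reinterpret a double as its raw 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_profile(prev: float, cur: float) -> tuple[int, int, int]:
    """Return (leading_zeros, meaningful_bits, trailing_zeros) of prev XOR cur."""
    x = float_bits(prev) ^ float_bits(cur)
    if x == 0:
        return 64, 0, 0  # identical value: Gorilla spends a single '0' bit
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1  # index of the lowest set bit
    return leading, 64 - leading - trailing, trailing


print(xor_profile(41.0, 41.0))  # (64, 0, 0)
print(xor_profile(41.0, 42.0))  # (15, 2, 47)
```

For 41.0 followed by 42.0, only 2 of the 64 bits differ. An XOR-based encoder therefore stores a couple of meaningful bits plus a small header instead of a full 8-byte value.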
Continuous aggregates. Most dashboard queries ask for minute averages, not raw second-by-second points. Pre-compute the per-minute, per-five-minute, and per-hour aggregates as the data arrives, store them alongside the raw chunks, and route each query to whichever resolution matches its time range. A 24-hour query reads 1,440 minute-aggregates instead of 86,400 raw points. That is 60× fewer rows, paid for once at ingest time instead of on every dashboard refresh. Chapter 166.
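Here is a minimal sketch of an incrementally maintained rollup. It assumes integer Unix-second timestamps; `Rollup` and `ContinuousAggregate` are hypothetical names, not TimescaleDB's implementation. Each insert updates its minute bucket, and range queries read buckets instead of raw points:

```python
from dataclasses import dataclass


@dataclass
class Rollup:
    """Mergeable per-bucket state: keep count and sum, derive the average."""
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    @property
    def avg(self) -> float:
        return self.total / self.count


class ContinuousAggregate:
    """Hypothetical rollup maintained on every insert."""

    def __init__(self, bucket_seconds: int = 60) -> None:
        self.bucket_seconds = bucket_seconds
        self.buckets: dict[int, Rollup] = {}

    def on_insert(self, ts: int, value: float) -> None:
        start = ts - ts % self.bucket_seconds  # same flooring idea as time_bucket('1 minute', ts)
        self.buckets.setdefault(start, Rollup()).add(value)

    def query(self, start: int, end: int) -> list[tuple[int, float]]:
        """Per-bucket averages for buckets starting in [start, end)."""
        return [(b, r.avg) for b, r in sorted(self.buckets.items()) if start <= b < end]


# One raw sample per second for 24 hours (synthetic values).
agg = ContinuousAggregate()
for ts in range(86_400):
    agg.on_insert(ts, float(ts % 60))
rows = agg.query(0, 86_400)
print(len(rows), rows[0])  # 1440 (0, 29.5)
```

Storing count and sum rather than a finished average is the design choice that matters. Minute buckets can then be merged into five-minute or hour buckets, and late samples can be folded in, without revisiting raw data.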
Tiered retention. Define a policy such as: keep raw data for 7 days, minute-aggregates for 90 days, hour-aggregates for 5 years, and drop everything older. The TSDB enforces this automatically by dropping old chunks whole. That is much cheaper than DELETE WHERE ts < ..., because it removes entire files instead of deleting rows one at a time and cleaning up afterwards. Optionally tier truly old chunks to S3, where they stay queryable, just slower. Chapter 167.
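The policy can be sketched as a per-resolution lookup. The sketch assumes a hypothetical layout where each resolution lives in its own chunks, uses the example windows from the text, and measures age from a chunk's newest row (a chunk can be dropped only once all of its rows have expired). `chunks_to_drop` and `finest_available` are hypothetical helpers:

```python
# Retention windows from the example policy, one per stored resolution.
KEEP_DAYS = {"raw": 7, "minute": 90, "hour": 5 * 365}  # 5 years ~ 1825 days


def chunks_to_drop(chunks: list[tuple[str, str, float]]) -> list[str]:
    """Return chunks whose newest row is older than their resolution's window.

    Each chunk is (name, resolution, age_of_newest_row_in_days). Dropping is
    whole-chunk: no row-by-row DELETE is ever issued.
    """
    return [name for name, resolution, age in chunks if age >= KEEP_DAYS[resolution]]


def finest_available(age_days: float) -> str:
    """Finest resolution a query can still get for data this old."""
    for resolution in ("raw", "minute", "hour"):
        if age_days < KEEP_DAYS[resolution]:
            return resolution
    return "none"


catalogue = [("raw_0301", "raw", 8.0), ("raw_0307", "raw", 2.0),
             ("minute_2024_11", "minute", 120.0), ("hour_2021", "hour", 1500.0)]
print(chunks_to_drop(catalogue))  # ['raw_0301', 'minute_2024_11']
print([finest_available(a) for a in (3, 30, 400, 2000)])  # ['raw', 'minute', 'hour', 'none']
```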
Alerting integration. A TSDB sits next to an alerting engine. Prometheus evaluates alerting rules itself and routes the resulting alerts through Alertmanager, Grafana has Grafana Alerting, and InfluxDB has Kapacitor. The alerting engine runs the same time-bucketed aggregate queries every fifteen seconds, compares the results against thresholds, and fires notifications. The TSDB and alerting are co-designed because the query patterns are identical. Chapter 168.
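The next sketch shows threshold evaluation with a hold-down. It is loosely modelled on the `for` clause of Prometheus alerting rules, which keeps an alert pending until its condition has held for the configured duration. `ThresholdRule` is a hypothetical name, and the sketch counts the duration in evaluations rather than wall-clock time to stay short:

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """Hypothetical alert rule: fire after `for_evals` consecutive breaches."""
    threshold: float
    for_evals: int
    breaches: int = 0

    def evaluate(self, bucket_value: float) -> str:
        if bucket_value > self.threshold:
            self.breaches += 1
            return "firing" if self.breaches >= self.for_evals else "pending"
        self.breaches = 0  # condition cleared: reset the hold-down
        return "inactive"


# p95 latency (ms) per 15-second evaluation; fire after 3 breaches (~45 s).
rule = ThresholdRule(threshold=500.0, for_evals=3)
print([rule.evaluate(v) for v in (120, 640, 700, 710, 300)])
# ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

The hold-down is what stops a single noisy bucket from paging someone.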
An Indian fintech monitors transaction latency
You are running the payments backend for a Bengaluru-based fintech that processes 100,000 UPI-style transactions per second at peak (Diwali week — every shop is doing card-on-delivery refunds, every gig worker is settling end-of-day, every parent is sending money to children studying abroad). For each transaction you want to record the latency: how many milliseconds between the API gateway accepting the request and the database commit returning. That is 100,000 (timestamp, latency_ms) rows per second, plus a few tag columns: endpoint, merchant_category, region, payment_method.
You start on Postgres. The schema is obvious:
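```sql
-- One wide row per measurement; tags stored as repeated strings.
CREATE TABLE tx_latency (
    ts                TIMESTAMPTZ NOT NULL,
    endpoint          TEXT,
    merchant_category TEXT,
    region            TEXT,
    payment_method    TEXT,
    latency_ms        DOUBLE PRECISION
);
-- Time-range scans and per-region dashboards.
CREATE INDEX ON tx_latency (ts);
CREATE INDEX ON tx_latency (region, ts);
```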
You load-test at 100K inserts/sec. Two things happen:
- CPU sits at 50%, all of it on insert overhead. Profiling the backends shows 30% on the INSERT itself (tuple construction, MVCC bookkeeping), 12% on B-tree maintenance for the two indexes, and 8% on WAL fsync. Aggregate scans on the latency dashboard (the SRE team's "p95 latency by region, last 1 hour" query) take 18 seconds because Postgres reads 360 million rows from the table heap to compute the percentile.
- Disk usage grows 240 GB per day. Of that, roughly 90 GB is the table heap (24 bytes per row average, after the wide tag strings) and 150 GB is the two indexes. Two weeks in, you have 3.4 TB of database and you have not even configured retention.
You switch to TimescaleDB — same Postgres, just the extension installed and the table converted to a hypertable:
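```sql
-- Convert the existing table into a hypertable with one chunk per day.
-- migrate_data => true is required if tx_latency already holds rows.
SELECT create_hypertable('tx_latency', 'ts',
                         chunk_time_interval => INTERVAL '1 day',
                         migrate_data => true);
-- Enable native compression; rows are grouped by the segmentby columns.
ALTER TABLE tx_latency SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'region, payment_method'
);
-- Compress chunks once they are older than two days.
SELECT add_compression_policy('tx_latency', INTERVAL '2 days');
```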
Three things change:
- Insert throughput climbs to 500K rows/sec on the same hardware. The hypertable routes inserts to today's chunk, which is small enough to fit in memory. B-tree depth on a per-chunk index is 2 instead of 5, and WAL volume drops because index maintenance touches far fewer pages.
- The 1-hour p95 query drops from 18 seconds to 0.18 seconds. The query planner prunes to the single chunk that overlaps the last hour, which holds 360M rows compressed to ~12 GB columnar. Reading the latency_ms column alone (one of six) and the region segmentby column gives the percentile from a 2 GB scan instead of an 80 GB scan.
- Disk usage drops to 30 GB per day after compression kicks in. Chunks older than 48 hours compress to roughly 8× smaller. After three months with a retention policy of 90 days, total storage is 2.7 TB instead of 21.6 TB.
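The scenario's headline numbers can be rechecked with a few lines of arithmetic. Every input below is taken from the text above, and nothing here measures a real system:

```python
# Back-of-envelope check of the scenario; all inputs come from the text.
PEAK_ROWS_PER_SEC = 100_000
POSTGRES_GB_PER_DAY = 240      # heap + two indexes
TIMESCALE_GB_PER_DAY = 30      # after compression
RETENTION_DAYS = 90

rows_last_hour = PEAK_ROWS_PER_SEC * 3600
postgres_two_weeks_tb = POSTGRES_GB_PER_DAY * 14 / 1000
postgres_90d_tb = POSTGRES_GB_PER_DAY * RETENTION_DAYS / 1000
timescale_90d_tb = TIMESCALE_GB_PER_DAY * RETENTION_DAYS / 1000
ratio = POSTGRES_GB_PER_DAY / TIMESCALE_GB_PER_DAY

print(f"{rows_last_hour:,} rows in the last hour at peak")
print(f"Postgres: {postgres_two_weeks_tb:.2f} TB after 14 days, "
      f"{postgres_90d_tb:.1f} TB after 90 days")
print(f"TimescaleDB: {timescale_90d_tb:.1f} TB after 90 days ({ratio:.0f}x smaller)")
```

The 14-day Postgres figure comes out at 3.36 TB. The 90-day comparison is 21.6 TB against 2.7 TB, the same 8× ratio as the daily compression.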
The Postgres schema, the SQL queries, the connection driver, the Grafana dashboard: all unchanged. The optimisation lives entirely in the storage layer. It is purely a matter of recognising that this workload is time-series-shaped and treating it accordingly.
When you outgrow a single TimescaleDB node, the next steps are: VictoriaMetrics or Mimir for horizontal-scale Prometheus-style metrics, ClickHouse for SQL queries on tens of TB, or InfluxDB Cloud's clustered tier. The optimisations are the same; the scale-out story differs.
Real systems and where to use them
The TSDB ecosystem in 2026 is mature but fragmented. Brief field guide:
- Prometheus — the de-facto standard for metrics in cloud-native stacks. Pull-based scraping, single-binary deploy, PromQL query language, retention typically 15–30 days on local disk. Scales vertically to a few million active series per node, not designed to scale horizontally on its own.
- VictoriaMetrics, Mimir, Thanos — horizontal-scale Prometheus-compatible engines. Use when you outgrow a single Prometheus node or need multi-tenancy.
- InfluxDB — older, commercial-friendly, custom Flux query language (or InfluxQL/SQL in newer versions). Good for IoT and DevOps overlap.
- TimescaleDB — Postgres extension. Ideal when the team already runs Postgres and wants to keep relational habits, JOINs, full SQL, and pgAdmin. The cleanest migration path from a Postgres-based monolith.
- ClickHouse — general-purpose OLAP that is exceptionally good at time-series at huge scale. Used by Cloudflare, Glydex, and many ad-tech firms for trillion-row time-series workloads. It needs lower-level tuning, and retention is configured with table TTL rules rather than a dedicated time-series retention policy.
- QuestDB — Java-based, SQL surface, single-binary, optimised for high-cardinality ingest. Strong on financial tick data.
The right choice depends on volume, ecosystem, and team familiarity. For most startups the answer is "Prometheus until it hurts, then TimescaleDB or VictoriaMetrics". For trillion-row scale the answer is "ClickHouse". For an existing Postgres shop the answer is "TimescaleDB from day one".
Common confusions
- "A time-series workload is just OLTP with a timestamp column." No. The defining property is not the column, it is the access pattern. OLTP queries are point reads (WHERE order_id = 47291), and OLTP storage engines pay heavily for MVCC, row-level locking, and B-tree maintenance to support concurrent point updates. Time-series writes are pure appends with no contention, and time-series reads are almost always range-bucketed aggregates. A timestamp on an orders table does not turn it into a time-series table. What matters is whether your queries do GROUP BY time_bucket(...) over recent ranges. If they do, you have time-series. If they do not, a regular B-tree on (customer_id, ts) is the right answer.
- "COUNT(*) and avg() already work in Postgres, so I do not need a TSDB." They work, just not at the speed a dashboard demands. A 24-hour aggregate over 86 million rows on Postgres takes 15–30 seconds. Postgres reads every row in the range from the heap, computes the bucket per row, hashes it into a group table, and aggregates. The same query on TimescaleDB with continuous aggregates takes 50 ms because the rollup is already computed; the query touches 1,440 pre-aggregated rows, not 86 million raw ones. It is still ordinary SQL (you query the aggregate view rather than the raw table). The speedup comes from the storage layer and from work done at ingest time.
- "High cardinality means many rows." No. In TSDB jargon, cardinality is the number of distinct series, where a series is a unique combination of tag values (host=web-mumbai-37, region=ap-south-1, env=production, metric=cpu_pct). Adding a million rows to one series does not increase cardinality. Adding a request_id tag with millions of distinct values does. High cardinality is a real problem for TSDBs because the per-series metadata explodes; Prometheus famously slows down past about 10 million active series per node. Beware of putting unbounded values (user IDs, request IDs, full URLs) into tag columns. They belong in a separate columnar store, not in your metrics tags.
- "InfluxDB / Prometheus / TimescaleDB / ClickHouse are interchangeable." They share the five optimisations, but their surfaces and operational models are very different. Prometheus is pull-based and single-node: you point it at HTTP endpoints and it scrapes them. InfluxDB and TimescaleDB are push-based with SQL-ish surfaces: your applications write into them. ClickHouse is push-based, full SQL, and tuned for trillions of rows. However, it expresses retention as table TTL rules rather than TSDB-style retention policies, and it has no built-in alerting. Picking by "which one is fastest" misses the point. Pick by "what does my ingest pipeline already speak, and what does my query layer (Grafana, custom dashboards) expect?"
- "Compression and downsampling are the same thing." Compression preserves every original sample exactly: a chunk of 86,400 raw points might shrink 10× on disk, but decompressing it gives back exactly the same 86,400 points. Downsampling is lossy. It replaces 86,400 raw points with 1,440 minute-aggregates and discards the originals. Most production deployments use both: keep raw points compressed for 7 days, minute-aggregates for 90 days, and hour-aggregates for 5 years. Confusing the two leads to people running SELECT raw_value on data older than the raw retention window and getting nothing back.
- "I will just store metrics in S3 as Parquet; that is basically a TSDB." Parquet on S3 is excellent for batch analytics on cold time-series data. Several TSDBs keep their cold tier in object storage: TimescaleDB's tiered storage writes Parquet to S3, and Mimir and Thanos keep Prometheus TSDB blocks in object storage. But the hot path does not fit on S3. That is the last hour of data, ingested at millions of rows/sec and queried by dashboards every fifteen seconds. S3 request and listing latencies are measured in tens of milliseconds, some storage classes bill a minimum object size, and per-PUT costs make 100K writes/sec ruinously expensive. The TSDB architecture is a hot tier on local SSD and a cold tier on S3, with the engine knowing how to query both. Skipping the hot tier is what fails.
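The following calculation gives a rough sense of the PUT-cost problem. It assumes a list price of USD 0.005 per 1,000 PUT requests, the S3 Standard price in us-east-1 at the time of writing; check current pricing before relying on it. `daily_put_cost_usd` is a hypothetical helper:

```python
# Assumption: USD 0.005 per 1,000 PUT requests (S3 Standard, us-east-1, at the
# time of writing). Prices change; treat this as an order-of-magnitude input.
PUT_PRICE_PER_1000_USD = 0.005
SECONDS_PER_DAY = 86_400


def daily_put_cost_usd(samples_per_sec: float, samples_per_object: int = 1) -> float:
    """Daily PUT cost if samples are written as objects of the given batch size."""
    puts_per_day = samples_per_sec / samples_per_object * SECONDS_PER_DAY
    return puts_per_day / 1000 * PUT_PRICE_PER_1000_USD


print(f"{daily_put_cost_usd(100_000):,.0f} USD/day, one sample per object")
print(f"{daily_put_cost_usd(100_000, 100_000):,.2f} USD/day, one object per second")
```

At 100,000 single-sample objects per second this comes to about USD 43,200 per day. Batching one second of samples into each object brings it under a dollar. That batching is exactly what a hot tier does before anything reaches S3.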
Going deeper
If you wanted only the workload model and the menu of optimisations, you have them. The rest of this section walks the edge cases that the working TSDB engineer encounters in production: what kills cardinality, why exemplars matter, how clock skew breaks the monotone-timestamp assumption, and what the published numbers from real deployments look like.
The cardinality wall
Every TSDB has a soft ceiling on active series count, where active means "has received a sample in the last few hours." Prometheus' practical limit on a 32 GB node is around 5–10 million active series. VictoriaMetrics pushes this to 50 million. ClickHouse can hold a billion if you tune it. The reason this ceiling exists at all is that each series carries a metadata entry — a label set, an inverted index entry, a write buffer — and the per-series overhead, even when minimised, is hundreds of bytes. Multiply by tens of millions of series and the metadata alone consumes the node's RAM.
The classic way to blow past the ceiling is unintentional: someone adds request_id or user_id as a label, thinking "more dimensions are better". Each unique value spawns a new series. A few hours later the TSDB has 100 million active series, the index does not fit in memory, ingest stalls, queries time out, and the on-call engineer wakes up. Why this matters: TSDBs assume tag values are categorical — drawn from a small fixed alphabet. Continuous or unbounded values violate the assumption, and the engine's data structures are not designed for them. The fix is to put unbounded values into a separate columnar store (ClickHouse, BigQuery) and keep the TSDB for low-cardinality dimensions.
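This short sketch shows why one unbounded label is so damaging, using hypothetical label counts. `series_upper_bound` multiplies the distinct values per label, while `observed_series` counts the label sets actually seen:

```python
from math import prod


def series_upper_bound(distinct_values: dict[str, int]) -> int:
    """Worst case: every combination of label values appears."""
    return prod(distinct_values.values())


def observed_series(samples: list[dict[str, str]]) -> int:
    """Actual cardinality: number of distinct label sets seen."""
    return len({frozenset(s.items()) for s in samples})


# Hypothetical fleet: 500 hosts x 4 regions x 200 metric names.
print(series_upper_bound({"host": 500, "region": 4, "metric": 200}))  # 400000

# 1,000 requests spread over 10 hosts, one latency sample each.
base = [{"host": f"web-{i % 10}", "metric": "latency_ms"} for i in range(1000)]
with_request_id = [dict(s, request_id=f"req-{i}") for i, s in enumerate(base)]
print(observed_series(base), observed_series(with_request_id))  # 10 1000
```

Without the request_id label, a thousand samples fall into 10 series. With it, every request creates a new series that will likely never receive a second sample.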
Pull-based engines like Prometheus partly mitigate this by refusing oversized scrapes. If a target exposes more samples than the scrape config's sample_limit, Prometheus fails the entire scrape instead of ingesting it. Push-based engines like InfluxDB cannot refuse a scrape in the same way, so they need explicit cardinality limits on the write path. InfluxDB 1.x, for example, has max-series-per-database and max-values-per-tag settings.
Exemplars — why the average hides what you need
A dashboard that shows avg(latency_ms) per minute is misleading by construction. The average smooths over outliers, and outliers are exactly what you want to see during an incident. The fix is to record exemplars — for each minute bucket, store one or two raw samples (with their full tag set) alongside the aggregate, so the dashboard can show "the average was 80 ms, but here is one sample at 4,200 ms from host=web-mumbai-9 at 10:23:14".
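Here is a minimal sketch of a per-minute bucket that carries an exemplar. It assumes the exemplar policy is "keep the slowest sample in the bucket", which is one reasonable choice among several. `BucketWithExemplar` is a hypothetical name:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BucketWithExemplar:
    """Per-minute aggregate plus one raw sample kept for diagnosis."""
    count: int = 0
    total_ms: float = 0.0
    exemplar: Optional[tuple[float, dict[str, str]]] = None  # (latency_ms, labels)

    def add(self, latency_ms: float, labels: dict[str, str]) -> None:
        self.count += 1
        self.total_ms += latency_ms
        # Exemplar policy (assumed): keep the slowest sample seen in this bucket.
        if self.exemplar is None or latency_ms > self.exemplar[0]:
            self.exemplar = (latency_ms, labels)

    @property
    def avg_ms(self) -> float:
        return self.total_ms / self.count


bucket = BucketWithExemplar()
for _ in range(99):
    bucket.add(80.0, {"host": "web-mumbai-3"})
bucket.add(4200.0, {"host": "web-mumbai-9"})
print(round(bucket.avg_ms, 1), bucket.exemplar)
# 121.2 (4200.0, {'host': 'web-mumbai-9'})
```

The average alone says "121 ms"; the exemplar points straight at the host that produced the 4,200 ms outlier.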
OpenTelemetry standardised exemplars in 2022, and Prometheus, Grafana, and TimescaleDB now all support them. The cost is small — one exemplar per bucket adds 50–100 bytes — but the diagnostic value is large. When the average suddenly spikes, the exemplar tells you which request caused the spike, not just that the average moved. Production TSDBs that omit exemplars force operators to fall back to log search, which is orders of magnitude more expensive.
Clock skew and out-of-order writes
The whole storage architecture assumes timestamps are monotone — every new write has a larger timestamp than every previous write. In practice, clocks drift, NTP slews, and a write from a slightly-behind clock arrives after a write from a slightly-ahead one. The TSDB has to handle this without rejecting valid data and without breaking the columnar layout that assumes sorted timestamps.
Three strategies exist:
- Reject far-out-of-order writes. By default, Prometheus rejects any sample older than the newest sample it already holds for that series. Since v2.39, an opt-in out_of_order_time_window setting accepts late samples within a configured window.
- Buffer and reorder before flushing. Most engines hold the most recent N minutes of data in a row-format buffer, sort it on flush, then write the columnar chunk. This handles small reorderings (seconds) cheaply.
- Write late data to a separate "out-of-order" area. TimescaleDB handles late-arriving data for compressed chunks by keeping a small uncompressed buffer alongside each compressed chunk and merging on read.
The trade-off is compression efficiency vs. tolerance for late data.
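The buffer-and-reorder strategy can be sketched with a min-heap. The sketch assumes integer timestamps and a fixed lateness window. `ReorderBuffer` is a hypothetical name, and a production version would also need durability and memory limits:

```python
import heapq
from typing import Optional


class ReorderBuffer:
    """Hypothetical buffer that absorbs small reorderings before a sorted flush."""

    def __init__(self, window: int) -> None:
        self.window = window                      # max lateness accepted, in seconds
        self.heap: list[tuple[int, float]] = []   # min-heap keyed on timestamp
        self.max_seen: Optional[int] = None
        self.rejected = 0

    def write(self, ts: int, value: float) -> bool:
        """Buffer a sample, or reject it if it is more than `window` seconds late."""
        if self.max_seen is not None and ts < self.max_seen - self.window:
            self.rejected += 1
            return False
        heapq.heappush(self.heap, (ts, value))
        if self.max_seen is None or ts > self.max_seen:
            self.max_seen = ts
        return True

    def flush(self) -> list[tuple[int, float]]:
        """Pop, in timestamp order, every sample that no accepted write can precede."""
        if self.max_seen is None:
            return []
        horizon = self.max_seen - self.window
        out = []
        while self.heap and self.heap[0][0] < horizon:
            out.append(heapq.heappop(self.heap))
        return out


buf = ReorderBuffer(window=5)
for ts, v in [(100, 1.0), (103, 2.0), (101, 3.0), (110, 4.0), (102, 5.0)]:
    buf.write(ts, v)
print(buf.flush(), buf.rejected)  # [(100, 1.0), (101, 3.0), (103, 2.0)] 1
```

A sample more than `window` seconds late is rejected. As a result, nothing already flushed can be preceded by a newly accepted sample, so every flushed run can be appended to a sorted columnar chunk.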
A note on UPI scale: India's UPI processed roughly 13 billion transactions in March 2025 alone, an average of nearly 5,000 transactions per second and considerably more at peak. The clocks across NPCI's data-centres in Hyderabad, Mumbai, and Chennai can drift apart by tens of milliseconds. A transaction recorded at the Mumbai gateway can therefore arrive at the central TSDB after one recorded later at the Chennai gateway. The metrics infrastructure has to absorb this without losing samples or fragmenting chunks. Out-of-order tolerance is not a luxury; it is the cost of doing business across multiple sites.
Published numbers from real deployments
To anchor the magnitude:
- Cloudflare's ClickHouse cluster ingests around 10 million rows/sec of HTTP request metrics across 200+ data-centres globally, holding hundreds of trillions of rows total. Their published blog post on the architecture describes the schema and partitioning.
- Glydex's M3 + Prometheus stack stores around 10 billion metrics per second across the company, with M3 acting as the long-term storage tier behind Prometheus. The M3DB paper is the reference.
- Grafana Mimir, used internally at Grafana Labs and by many Mimir-Cloud customers, has demonstrated a single tenant holding 1 billion active series on a horizontally-scaled cluster — an order of magnitude beyond a single Prometheus node.
- InfluxDB Cloud Enterprise customers regularly run 5+ million writes/sec on a clustered tier.
These numbers tell you what the upper end looks like. Most engineering organisations operate at 0.1–1% of these scales — tens of thousands to a few hundred thousand writes per second — which a single TimescaleDB node or a small Prometheus + Mimir cluster handles comfortably. The engineering work is not in chasing Cloudflare's scale; it is in matching your actual workload to the simplest TSDB that handles it.
The Gorilla paper, in one paragraph
Sociogram's Gorilla paper (VLDB 2015) is the foundational result that made modern TSDB compression possible. Its first idea is delta-of-delta encoding for timestamps. Most metrics are scraped on a regular interval, so the difference between consecutive intervals is almost always zero and can be encoded in 1 bit. Its second idea is XOR encoding for double-precision values. Consecutive measurements of CPU usage or network bytes are almost identical at the bit level, so their XOR is mostly zeros, and only the meaningful bits between the leading and trailing zeros need to be stored. Together these two tricks compress (timestamp, double) pairs to 1.37 bytes on average, roughly a 12× shrinkage from the naive 16-byte representation. Every modern TSDB uses some variant of Gorilla: Prometheus' TSDB, InfluxDB's TSM engine, TimescaleDB's columnar compression, M3DB. Chapter 165 walks through a Python implementation of the encoder and the decoder.
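The timestamp half of Gorilla can be sketched as a size calculation. The bucket boundaries below follow the paper's delta-of-delta scheme. For simplicity, the first timestamp and first delta are costed as plain 64-bit integers rather than the paper's more compact header. `dod_bits` and `encode_timestamps` are hypothetical helpers:

```python
def dod_bits(dod: int) -> int:
    """Bits for one delta-of-delta value, using the Gorilla paper's buckets."""
    if dod == 0:
        return 1                 # '0'
    if -63 <= dod <= 64:
        return 2 + 7             # '10' + 7-bit value
    if -255 <= dod <= 256:
        return 3 + 9             # '110' + 9-bit value
    if -2047 <= dod <= 2048:
        return 4 + 12            # '1110' + 12-bit value
    return 4 + 32                # '1111' + 32-bit value


def encode_timestamps(ts: list[int]) -> tuple[list[int], int]:
    """Return the delta-of-delta sequence and the total encoded size in bits.

    Simplification: the first timestamp and the first delta are costed as raw
    64-bit integers instead of the paper's compact header.
    """
    if len(ts) < 2:
        return [], 64 * len(ts)
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return dods, 64 + 64 + sum(dod_bits(d) for d in dods)


# A 15-second scrape with one second of jitter on the fifth sample.
jittered = [1_700_000_000 + s for s in (0, 15, 30, 45, 61, 75, 90)]
print(encode_timestamps(jittered))  # ([0, 0, 1, -2, 1], 157)

# 100 perfectly regular samples: 128 header bits + 98 one-bit zeros.
print(encode_timestamps([15 * i for i in range(100)])[1])  # 226
```

The regular series needs 226 bits where naive 64-bit timestamps would need 6,400. A single second of scrape jitter pushes three samples into the 9-bit bucket.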
What Build 21 builds next
You now have the workload model: write-heavy, append-only, range-scan-with-aggregate, two-halves data shape. You also have the high-level menu of optimisations: time partitioning, columnar inside chunks, compression, continuous aggregates, retention. The remaining four chapters of Build 21 turn the menu into an engine.
- Chapter 165 — Time-partitioned columnar layout. How the chunk boundary is chosen, how rows are written to today's chunk in row format and re-encoded into columnar arrays when the chunk closes, the dictionary + RLE + Gorilla compression pipeline that gives 10–30× shrinkage. You will end the chapter with a working Python prototype that ingests 1M rows/sec into row-format chunks and compresses closed chunks to columnar.
- Chapter 166 — Continuous aggregates and downsampling. How to maintain per-minute, per-five-minute, per-hour rollups incrementally as data arrives, how the query planner picks the right resolution for a given time range, how late-arriving data is handled. The trick is that the rollups are themselves hypertables stored in the same engine.
- Chapter 167 — Retention and tiered storage. Dropping old chunks atomically, downsampling-in-place, shipping cold chunks to S3 while keeping them queryable. The economics: hot SSD storage at ₹8/GB/month vs. S3 at ₹2/GB/month makes tiering pay for itself within weeks.
- Chapter 168 — Alerting integration. How Prometheus' Alertmanager and Grafana Alerting evaluate rules over the same time-bucketed aggregates, the deduplication and grouping logic that keeps you from being woken up 200 times by one outage, the multi-window multi-burn-rate SLO alerts that Querion's SRE handbook recommends.
By the end of Build 21 you will have built, conceptually, the storage and query path of TimescaleDB plus the alerting path of Prometheus — the full observability stack that every modern engineering org runs.
References
- TimescaleDB blog — Time-series data: Why and how to use a relational database instead of NoSQL
- InfluxDB documentation — Key concepts and terminology
- Akhanda, Mohammad Sazid. Time-Series Databases: Concepts, Design and Implementation. 2023.
- Prometheus documentation — Storage
- OutSystems blog — What is a Time-Series Database?
- Pelkonen, Franklin, Teller, et al. Gorilla: A Fast, Scalable, In-Memory Time Series Database. VLDB 2015