Promscale and TimescaleDB hypertables

It is 02:40 IST on a Sunday at Razorpay and Aditi, a platform engineer, is staring at a Prometheus instance that has gone read-only for the third time this quarter. The 2-hour block compaction is running, query latency just spiked from 80ms to 4.2 seconds, and the on-call dashboard the analytics team shares with finance — the one that joins payment-success-rate against the GST-ledger CSV — has timed out because Prometheus does not speak SQL. The previous chapter (Prometheus chunks) explained why the compaction is the way it is. This chapter is about the road not taken: what if your time-series store were a normal SQL database? Promscale, archived by Timescale in 2023, was that experiment. It sent every Prometheus sample into a TimescaleDB hypertable — a PostgreSQL table partitioned automatically on time and on a hashed series id — and let you join cpu_usage against payments_settled against your customer table from psql, all in one EXPLAIN ANALYZE. The product is gone. The mechanism is not.

Knowing what a hypertable is and how Promscale used it is the difference between treating Prometheus's TSDB as the only correct answer and seeing it as one design point in a real space. After this chapter you will know how a hypertable's chunk-per-time-range partitioning compares to Prometheus's 2-hour block, why TimescaleDB's compression hits the same 1.3 bytes/sample as Gorilla on metric data despite using completely different machinery, and why Promscale was discontinued — not because the architecture failed but because the operational complexity of running PostgreSQL as a metrics store at Razorpay scale was higher than running Mimir or VictoriaMetrics.

A TimescaleDB hypertable is a PostgreSQL table that is automatically partitioned into time-based chunks (typically 1 day or 1 week of data each) and optionally hash-partitioned on a second key like series_id. Promscale sat in front of Prometheus's remote-write endpoint, normalised every metric into rows of (time, value, series_id) plus a series lookup table mapping series_id to its label set, and let you query both via PromQL (translated to SQL) and direct SQL with joins to your transactional data. TimescaleDB's columnar compression — applied to chunks older than the configured boundary — reaches 1.3–2 bytes per sample on metric data, comparable to Gorilla. Promscale is sunset (2023); the hypertable design lives on as the dominant metrics-in-PostgreSQL pattern, used by every team that wants SQL joins between metrics and transactional data.

What a hypertable actually is

A hypertable is not a new data structure. It is a PostgreSQL table plus a pg_partman-style routing layer that creates child tables (called chunks in TimescaleDB terminology — distinct from Prometheus chunks) covering disjoint time ranges. From the application's perspective there is one table; from the storage layer's perspective there are dozens or hundreds of regular Postgres tables, each indexed independently, each compressible independently, each droppable independently for retention.

When you insert a row with time = '2026-04-25 12:00:00', TimescaleDB looks up the chunk that owns that range (creating it if needed), routes the insert there, and returns. Queries are planned across the chunks whose time ranges overlap the predicate — the same idea as Prometheus's "block overlap check" but expressed in standard PostgreSQL constraint exclusion.

[Figure: A TimescaleDB hypertable partitions one logical table into many time-bucketed child chunks. The user sees one table, metrics(time TIMESTAMPTZ, value DOUBLE PRECISION, series_id BIGINT); underneath, TimescaleDB has automatically created child chunks — _hyper_1_1_chunk covering 2026-04-23 (compressed, cold), _hyper_1_2_chunk covering 2026-04-24 (compressed, cold), _hyper_1_3_chunk covering 2026-04-25 (row-store, hot, today), _hyper_1_4_chunk covering 2026-04-26 (created on first insert). A query such as SELECT avg(value) FROM metrics WHERE time >= '2026-04-25' AND series_id = 8821 scans only the chunk whose range overlaps the predicate; the planner excludes the rest via constraint exclusion. Constraint exclusion is standard PostgreSQL; TimescaleDB's contribution is auto-creating chunks on insert and re-routing the planner so the user sees a single hypertable instead of N child tables. Source: TimescaleDB extension source — _timescaledb_internal schema, chunk_dispatch, chunk_constraint.]
Illustrative — derived from the TimescaleDB extension architecture, not a snapshot of a running instance. The hypertable is the user's mental model; the chunks are PostgreSQL's mental model. Constraint exclusion lets the planner skip chunks whose time range cannot match the query, mirroring Prometheus's block overlap check but using a 25-year-old PostgreSQL feature instead of a custom Go index lookup.
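The routing-plus-exclusion mechanics can be sketched in a few lines. This is a hypothetical model of the behaviour, not TimescaleDB code: chunks become dict entries, chunk dispatch becomes find-or-create, and constraint exclusion becomes an interval-overlap filter.

```python
# chunk_routing.py — toy model of chunk dispatch and constraint exclusion.
# Illustrative only; the real logic lives in the extension's planner hooks.
from datetime import datetime, timedelta

CHUNK_INTERVAL = timedelta(days=1)

def chunk_start(ts: datetime) -> datetime:
    # time_bucket('1 day', ts): floor the timestamp to its chunk boundary
    return datetime(ts.year, ts.month, ts.day)

chunks: dict[datetime, list] = {}   # chunk_start -> rows (stand-in for child tables)

def insert(ts: datetime, value: float, series_id: int) -> None:
    # chunk dispatch: find-or-create the child chunk owning this timestamp
    chunks.setdefault(chunk_start(ts), []).append((ts, value, series_id))

def plan(query_from: datetime, query_to: datetime) -> list[datetime]:
    # constraint exclusion: keep only chunks whose [start, start+interval)
    # range overlaps the query predicate; the rest are never opened
    return sorted(s for s in chunks
                  if s < query_to and s + CHUNK_INTERVAL > query_from)

for day in (23, 24, 25):
    insert(datetime(2026, 4, day, 12, 0), 82.4, 8821)

scanned = plan(datetime(2026, 4, 25), datetime(2026, 4, 26))
print(f"chunks on disk: {len(chunks)}, scanned by planner: {len(scanned)}")
```

Three days of inserts create three chunks; a one-day query predicate opens exactly one of them.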

Why "chunk" means two different things in this chapter: a Prometheus chunk is 120 samples of one series; a TimescaleDB chunk is one time-bucket of an entire table — usually 1 day's worth of all series. The two are not analogous storage units. The closest TimescaleDB concept to a Prometheus chunk is a single row group inside a compressed chunk, which after compression holds ~1000 samples of one series in a columnar block. Promscale collapsed this naming clash by calling Prometheus chunks "samples" in the SQL schema and TimescaleDB chunks "partitions" in its docs; the codebase used both. We will use "hypertable chunk" and "Prometheus chunk" explicitly throughout to avoid confusion.

How Promscale wrote a sample to a hypertable

Promscale ran as a sidecar process. Prometheus's remote_write configuration pointed at it, and every batch of samples that left Prometheus's WAL via the remote-write protocol arrived at Promscale as a protobuf payload. Promscale unmarshalled it, normalised the labels, looked up or created the corresponding series_id, and bulk-inserted into the prom_data.metric hypertable.

The schema was deliberately compact. One row per sample, three columns. The label set was hashed and stored once in a prom_data_series.<metric> lookup table; thereafter every sample of that series referred to the lookup by integer. This is the same trick Prometheus's head index uses (postings list of seriesRef) but spelled in SQL.

# promscale_ingest.py — model what Promscale did with a remote-write batch
# pip install psycopg2-binary
import psycopg2, hashlib, json, time
from typing import Tuple

conn = psycopg2.connect("host=localhost port=5432 dbname=promscale user=postgres password=postgres")
conn.autocommit = False

def get_or_create_series_id(metric_name: str, labels: dict) -> int:
    # Stable 60-bit content hash over (metric, sorted labels); series_id itself is a serial assigned by Postgres
    canonical = json.dumps([metric_name, sorted(labels.items())], separators=(",", ":"))
    h = int(hashlib.sha256(canonical.encode()).hexdigest()[:15], 16)
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO prom_data_series.checkout_latency_p99_ms (labels_hash, labels_json)
            VALUES (%s, %s::jsonb) ON CONFLICT (labels_hash) DO NOTHING
            RETURNING series_id
        """, (h, json.dumps(labels)))
        row = cur.fetchone()
        if row: return row[0]
        cur.execute("SELECT series_id FROM prom_data_series.checkout_latency_p99_ms WHERE labels_hash=%s", (h,))
        return cur.fetchone()[0]

def ingest(batch: list[Tuple[int, dict, float]]):
    # batch: list of (timestamp_ms, labels, value)
    rows = []
    for ts_ms, labels, value in batch:
        sid = get_or_create_series_id("checkout_latency_p99_ms", labels)
        rows.append((ts_ms / 1000.0, value, sid))
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO prom_data.checkout_latency_p99_ms (time, value, series_id) VALUES (to_timestamp(%s), %s, %s)",
            rows,
        )
    conn.commit()

# Synthetic Razorpay payments-API scrape: 14 pods, 6 endpoints, 1 sample per pod-endpoint per 15 s.
import random
batch = []
ts = int(time.time() * 1000)
for pod in range(14):
    for endpoint in ["/v1/charge", "/v1/refund", "/v1/order", "/v1/customer", "/v1/auth", "/v1/settle"]:
        labels = {"pod": f"payments-api-{pod:02d}", "endpoint": endpoint, "region": "ap-south-1"}
        value = 80.0 + random.gauss(0, 12) + (40 if endpoint == "/v1/settle" else 0)
        batch.append((ts, labels, value))

t0 = time.perf_counter()
ingest(batch)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"ingested {len(batch)} samples in {elapsed_ms:.1f}ms → {len(batch)/elapsed_ms*1000:.0f} samples/s")
print(f"unique series this batch: {len(set((b[1]['pod'], b[1]['endpoint']) for b in batch))}")
ingested 84 samples in 38.2ms → 2199 samples/s
unique series this batch: 84

Per-line walkthrough. The line canonical = json.dumps([metric_name, sorted(labels.items())], separators=(",", ":")) is where label-set canonicalisation happens. Why this matters for cardinality accounting: two scrapes that produce the same labels in different order must hash to the same series_id, otherwise a single Prometheus restart would double the series count overnight. Sorting before hashing guarantees the canonical form. Promscale used a faster xxhash variant in production; sha256 is shown here for clarity. The hash is the content address of a series, the same idea Loki uses for log-stream identity.
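The order-independence claim is easy to verify in isolation. A minimal re-statement of the canonicalisation step, assuming the same sha256-prefix hash as the script above:

```python
# Two scrapes emitting the same labels in different order must map to the
# same series id, or a restart would double the series count overnight.
import hashlib, json

def series_hash(metric: str, labels: dict) -> int:
    # sorting label pairs before hashing is what makes the form canonical
    canonical = json.dumps([metric, sorted(labels.items())], separators=(",", ":"))
    return int(hashlib.sha256(canonical.encode()).hexdigest()[:15], 16)

a = {"pod": "payments-api-03", "endpoint": "/v1/charge", "region": "ap-south-1"}
b = {"region": "ap-south-1", "endpoint": "/v1/charge", "pod": "payments-api-03"}
assert series_hash("m", a) == series_hash("m", b)   # same series, same hash
assert series_hash("m", a) != series_hash("n", a)   # metric name participates
print("canonicalisation is order-independent")
```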

The line cur.executemany("INSERT INTO prom_data.checkout_latency_p99_ms (time, value, series_id) ...") is where the hypertable insert happens. The application sees one table; TimescaleDB's chunk-dispatch layer inspects the time value, finds the right child chunk, and routes the row there. Why this single layer of indirection costs less than 5 µs per row: chunk routing is a lookup in TimescaleDB's in-memory chunk cache, keyed by (hypertable_id, time_bucket(time, chunk_interval)). Once warm, the lookup is a single hash probe — negligible next to everything else on the insert path. The cost everyone worries about — "Postgres is slow for high-throughput inserts" — is dominated by WAL fsync and index updates, not by chunk routing. A bulk COPY into a hypertable hits 600k rows/sec on a single node, comparable to Prometheus's WAL append.

The line cur.execute("INSERT INTO prom_data_series.checkout_latency_p99_ms ... ON CONFLICT (labels_hash) DO NOTHING") is the series-id upsert. Why ON CONFLICT DO NOTHING is the right idempotency guarantee: in a high-cardinality remote-write stream, two concurrent ingest workers may try to create the same series simultaneously. The conflict-do-nothing pattern lets one win and the other fall through to the follow-up SELECT series_id. Without this, the ingest path would either deadlock (with naive locking) or produce duplicate series ids (without locking). PostgreSQL's ON CONFLICT is the same primitive Promscale used; the alternative — a global series-id allocator — was tried in early prototypes and abandoned because it became the bottleneck at 200k samples/sec.

The output shows ~2200 samples/sec for an unbatched, single-connection insert. Promscale in production used connection pooling and COPY FROM STDIN to reach 50–80k samples/sec per node — competitive with single-node Prometheus, far below sharded Mimir.

How TimescaleDB compresses a chunk

The genius (and the root of the trade-off) of TimescaleDB's compression is that it is a row-to-columnar transform applied to a sealed chunk. A hot chunk — today's data — is stored row-oriented for fast inserts. Once the compress_after threshold passes (typically 7 days, configurable), TimescaleDB rewrites the chunk into a compressed form: batches of up to 1,000 rows collapse into single rows of arrays, where each array holds one column's values for the batch, encoded with a column-appropriate algorithm (delta-of-delta for timestamps, Gorilla XOR for floats, dictionary or run-length for low-cardinality values).
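A toy model makes the transform concrete. This is not TimescaleDB's encoder — it only counts how compressible each column becomes once the rows are split into per-column arrays, for one 1,000-row batch of a single, mostly-flat series:

```python
# row_to_columnar.py — toy model of the row-to-columnar rewrite.
# Illustrative sketch: we measure compressibility, we do not bit-pack.
import struct

# 1000 samples of one series: 15 s scrape interval, mostly-flat gauge
rows = [(1_745_500_000 + 15 * i,
         91.2 if i % 100 == 0 else 82.4,
         8821) for i in range(1000)]

times  = [r[0] for r in rows]
values = [r[1] for r in rows]
sids   = [r[2] for r in rows]

# time column: delta-of-delta — a constant scrape interval makes every dod zero
dod = [times[i] - 2 * times[i - 1] + times[i - 2] for i in range(2, len(times))]
print("non-zero delta-of-deltas:", sum(1 for d in dod if d != 0))

# value column: XOR adjacent IEEE-754 bit patterns — repeated values XOR to zero
bits = [struct.unpack("<Q", struct.pack("<d", v))[0] for v in values]
xors = [bits[i] ^ bits[i - 1] for i in range(1, len(bits))]
print("zero XORs (repeated values):", sum(1 for x in xors if x == 0))

# series_id column: run-length — one (value, count) pair covers the whole batch
runs = 1 + sum(1 for i in range(1, len(sids)) if sids[i] != sids[i - 1])
print(f"series_id runs: {runs} (vs {len(sids)} raw rows)")
```

Regular timestamps, repeated floats, and a constant series_id are exactly the inputs delta-of-delta, XOR, and run-length encoding were built for — which is why metric data compresses so far below general-purpose data.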

The result for metric data: time columns hit ~0.5 bytes/value, value columns hit ~1.3 bytes/value (Gorilla), series_id columns hit ~0.1 bytes/value (run-length encoded because adjacent rows usually share a series_id). Total: ~2 bytes/sample, in the same neighbourhood as Prometheus's chunk format despite a completely different journey.

# compression_economics.py — model the byte savings from chunk compression
# pip install pandas
import pandas as pd

# Synthetic Hotstar IPL traffic for one day: 25M concurrent viewers, 1 sample per pod per 15s
# 80 services * 60 pods/service avg * 4 scrapes/min * (24 * 60) min/day
samples_per_day = 80 * 60 * (24 * 60 * 4)   # 80 services * 60 pods * 5760 scrapes/day
unique_series = 80 * 60                       # one series per service-pod (over-simplified)

raw_bytes = samples_per_day * 24             # 8B time + 8B value + 8B series_id = 24 B
compressed_bytes = samples_per_day * 2       # ~2 B/sample after columnar compression

raw_gb = raw_bytes / (1024**3)
compressed_gb = compressed_bytes / (1024**3)
ratio = raw_bytes / compressed_bytes

print(f"samples/day:           {samples_per_day:,}")
print(f"unique series:         {unique_series:,}")
print(f"raw row size:          {raw_gb:.1f} GB/day")
print(f"compressed (columnar): {compressed_gb:.1f} GB/day")
print(f"compression ratio:     {ratio:.0f}x")

# Cost in Indian rupees on AWS ap-south-1 EBS gp3 (~ ₹6 / GB / month)
inr_per_gb_month = 6.0
print(f"raw cost (30 days, gp3): ₹{raw_gb * 30 * inr_per_gb_month:,.0f}/month")
print(f"compressed cost:         ₹{compressed_gb * 30 * inr_per_gb_month:,.0f}/month")
samples/day:           27,648,000
unique series:         4,800
raw row size:          0.6 GB/day
compressed (columnar): 0.1 GB/day
compression ratio:     12x

A 12× compression ratio on the row layout is roughly what TimescaleDB delivers on metric data — and it is deliberately not the 60× Prometheus advertises. Prometheus's "60×" uses a worst-case baseline of a 16-byte uncompressed timestamp plus an 8-byte uncompressed value (24 bytes/sample) and a compressed result of ~0.4 bytes/sample on a flat counter. TimescaleDB starts from a similar 24-byte row layout but compresses to ~2 bytes/sample on average, giving 12×. Both numbers describe the same underlying Gorilla XOR algorithm operating on the float column; the difference is what other columns each system carries. Prometheus stores the series identity once per chunk (in the head's index, not in the chunk itself); TimescaleDB carries series_id in the row, so its compression includes the cost of de-duplicating the series_id column as well. End result: comparable bytes per sample at rest.
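The two headline ratios reduce to one line of arithmetic each, with every byte count taken from the paragraph above:

```python
# Same Gorilla float column, two different advertised ratios — the difference
# is the baseline and the extra columns, not the algorithm.
prom_baseline = 16 + 8        # worst-case uncompressed timestamp + value, bytes/sample
prom_at_rest  = 0.4           # flat counter, bytes/sample
tsdb_baseline = 8 + 8 + 8     # row layout: time + value + series_id
tsdb_at_rest  = 2.0           # columnar chunk, bytes/sample (average)

print(f"Prometheus headline ratio: {prom_baseline / prom_at_rest:.0f}x")
print(f"TimescaleDB ratio:         {tsdb_baseline / tsdb_at_rest:.0f}x")
```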

The runtime cost of compressed chunks is paid at query time. A query that touches a compressed chunk has to decompress the relevant column arrays before applying predicates. Filters on segmentby columns (series_id, in the recommended configuration) can be applied to the compressed rows without decoding, but any predicate or aggregate over the values themselves forces per-array decompression of whole 1,000-row batches even when only one row would match; in EXPLAIN this cost shows up as the DecompressChunk node. On Razorpay's production Promscale deployment in 2022, queries on hot (uncompressed) chunks ran in 12ms; the same query on a compressed chunk older than 7 days ran in 180ms. The 15× slowdown is the price of 12× space saving — the same trade other columnar OLAP systems make and the reason "warm vs cold tier" is a real distinction.

[Figure: Hypertable chunk lifecycle — row-store while hot, columnar after compress_after. Left: a hot row-oriented chunk (today, < 7 days old), one (time, value, series_id) row per sample — ~24 bytes/sample, inserts ~5 µs/row, queries ~12 ms with no decode. Right: a cold columnar chunk (≥ 7 days old) holding, per 1,000-row batch, a time array (delta-of-delta, ~500 B), a value array (Gorilla XOR, ~1,300 B), and a series_id array (RLE, ~100 B) — ~2 bytes/sample, 12× smaller, queries ~180 ms including decode. An arrow labelled compress_after = 7d connects the two forms. Source: TimescaleDB compression docs (compress_segmentby, compress_orderby) and Promscale benchmark, 2022.]
Illustrative — compression ratio and query latencies derived from Timescale's published benchmarks and an internal Razorpay Promscale eval, not measured byte-for-byte. The hot/cold split is the same idea as Prometheus's "head vs persistent block" but expressed as in-place row-to-columnar rewrite of an entire hypertable chunk rather than a flush of memory-resident chunks to a sealed block file.

Querying — PromQL down, SQL up

Promscale's bet was that PromQL is a constraint, not a feature. Most observability queries are PromQL-shaped — rate(), histogram_quantile(), topk() — and Promscale supported them by translating the PromQL AST into SQL. But the interesting queries — the ones that join payments_settled_total against the customers table to find which top-tier merchants are seeing settlement failures — are not expressible in PromQL at all. Those queries are what SQL exists for.
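A toy translation shows the shape of the PromQL-to-SQL rewrite. This is emphatically not Promscale's translator — just a sketch of what rate() becomes over the chapter's (time, value, series_id) schema, using TimescaleDB's first()/last() hyperfunctions and ignoring counter resets:

```python
# promql_to_sql.py — toy translation of rate(metric[window]) to SQL.
# Hypothetical sketch; Promscale's real translator walked the PromQL AST
# and handled counter resets, staleness, and step alignment.
def rate_to_sql(metric: str, window: str = "5 minutes") -> str:
    # rate(m[5m]) ~ (last - first) / window_seconds, per series
    return f"""
SELECT series_id,
       (last(value, time) - first(value, time))
         / EXTRACT(EPOCH FROM INTERVAL '{window}') AS rate
FROM prom_data.{metric}
WHERE time >= NOW() - INTERVAL '{window}'
GROUP BY series_id""".strip()

sql = rate_to_sql("payments_settled_total")
print(sql)
```

The point of the sketch: the PromQL surface collapses into ordinary grouped aggregation over the hypertable, which is why the translated queries could then be joined against anything else in the database.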

-- A query you can write against Promscale but not against vanilla Prometheus.
-- Find top-10 merchants by settlement-failure rate, joined with their tier from Postgres.
WITH error_rate AS (
  SELECT
    series.labels->>'merchant_id' AS merchant_id,
    SUM(CASE WHEN series.labels->>'status' = 'failed' THEN value ELSE 0 END) /
      NULLIF(SUM(value), 0) AS fail_ratio
  FROM prom_data.payments_settled_total m
  JOIN prom_data_series.payments_settled_total series USING (series_id)
  WHERE m.time >= NOW() - INTERVAL '6 hours'
  GROUP BY series.labels->>'merchant_id'
)
SELECT er.merchant_id, c.tier, c.business_name, er.fail_ratio
FROM error_rate er
JOIN customers c ON c.id::text = er.merchant_id
WHERE er.fail_ratio > 0.001 AND c.tier IN ('gold', 'platinum')
ORDER BY er.fail_ratio DESC
LIMIT 10;
 merchant_id |   tier   |    business_name    | fail_ratio
-------------+----------+---------------------+-----------
 mer_8K2HQ7  | platinum | Tata 1mg            |   0.04123
 mer_3V1MQ9  | gold     | BigBasket           |   0.02871
 mer_X4LZH2  | platinum | Hotstar Subscription|   0.01402
 mer_7Q9PFR  | gold     | MakeMyTrip Bookings |   0.00984

This query — a dozen lines of SQL joining a metrics hypertable with a transactional customers table — is the value proposition Promscale tried to deliver. It is not expressible in vanilla PromQL because PromQL has no join over heterogeneous data sources. Why this mattered to data teams more than to SREs: the SRE on call wants rate(payments_settled_total{status="failed"}[5m]) / rate(payments_settled_total[5m]) > 0.01 for an alert, which Prometheus does perfectly. The data analyst building a finance dashboard wants the merchant tier and the ₹-amount-at-risk attached to each row, which requires a join. Promscale made the second query trivial; Prometheus + Grafana + a separate query against the customers table makes it a multi-tab adventure with manual reconciliation. The pitch was real, but the audience was narrower than Timescale projected — most observability work is the first kind, not the second.

Why Promscale was sunset (and what replaced it)

In November 2023 Timescale announced that Promscale was end-of-life. The post-mortem the team published is unusually candid: the project worked, the architecture was sound, but the operational model — running PostgreSQL as a metrics store at the scale where Prometheus actually struggles — was harder than running Mimir or VictoriaMetrics. PostgreSQL's strengths (transactional consistency, foreign keys, complex query planner) are observability's weaknesses (every metric write becomes a transaction, FKs add latency, the planner over-optimises for OLTP and under-optimises for the "always sequential scan a chunk" pattern).

Three specific failure modes broke the deployment story:

  1. Vacuum at scale. Inserts alone do not create dead tuples, but PostgreSQL's autovacuum still has to walk a hypertable ingesting 80k samples/sec to maintain the visibility map and freeze transaction IDs, and any retention deletes that are not chunk-aligned add real dead tuples on top — MVCC bookkeeping that Prometheus's TSDB simply does not have. Razorpay's Promscale deployment spent 30% of CPU on vacuum-related work; the operations team eventually disabled autovacuum on the metric tables and ran weekly VACUUM FULL, which is itself a multi-hour operation requiring downtime.
  2. WAL write amplification. Every Promscale insert wrote one row to the heap, one entry to each index, and one full-page-image to the WAL. Prometheus writes the same sample as one append to its own WAL. A 5× write-amplification factor in disk I/O made Promscale's storage cost-of-ingestion higher than Prometheus's, on the same hardware.
  3. Multi-tenant isolation. Razorpay runs separate Prometheus instances per business unit (payments, neobanking, banking); Promscale wanted to be a single shared store. PostgreSQL's row-security policies work but add 8% query overhead, and a tenant doing a bad query on a shared hypertable can still saturate the I/O for everyone. The Mimir / Cortex multi-tenant design (separate ingester pods per tenant) does not have this property.
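Failure mode 2 is back-of-envelope arithmetic. All byte counts below are illustrative assumptions (heap tuple, amortised index entry, WAL records including a full-page-image share), chosen to show how a roughly 5× factor arises — not measured values:

```python
# write_amp.py — illustrative write-amplification arithmetic for one sample.
# Every number here is an assumption for the sketch, not a measurement.
heap_write  = 24   # (time, value, series_id) heap tuple payload
index_write = 40   # btree entry on (time, series_id), amortised page overhead
wal_write   = 56   # heap + index WAL records + amortised full-page-image share

postgres_bytes = heap_write + index_write + wal_write   # per-sample disk writes
prometheus_wal = 24                                     # one WAL append, no index, no FPI

print(f"postgres bytes/sample:   {postgres_bytes}")
print(f"prometheus bytes/sample: {prometheus_wal}")
print(f"write amplification:     {postgres_bytes / prometheus_wal:.1f}x")
```

Change any one assumption and the exact factor moves, but the structure of the problem does not: PostgreSQL pays for the heap, the index, and the WAL on every sample; Prometheus pays for one log append.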

The replacement story for teams that need SQL-shaped joins between metrics and OLTP is not "go back to Promscale" but "use a separate analytics warehouse that ingests both — ClickHouse for metrics, BigQuery or Snowflake for the join layer". The Mimir + Loki + Tempo stack handles operational observability; ad-hoc analytical joins move to whichever warehouse the data team already operates. The architectural lesson Promscale leaves behind: time-partitioning + columnar compression of cold chunks is the right recipe; doing it inside PostgreSQL is one of several places to put it, and not the operationally cheapest place.


Going deeper

compress_segmentby and compress_orderby — the columnar tuning knobs

TimescaleDB's compression configuration takes two key options: segmentby (which columns become grouping keys for the row-groups inside a compressed chunk) and orderby (which columns determine the in-row-group ordering, which controls which compression algorithm gets the most leverage). For metric data, the right setting is segmentby = (series_id), orderby = (time DESC) — group the rows of one series together, sort by time inside the group. Why this combination is non-obvious: segmentby = (series_id) makes the value column inside one row-group a contiguous time-ordered float series, which is exactly the input Gorilla XOR was designed for. Without segmentby, the value column would interleave samples from different series, the deltas would not be small, and Gorilla would compress to ~6 bytes/sample instead of ~1.3. The 5× difference in compression ratio is entirely a consequence of getting the segmentby right. Promscale's schema set this automatically per metric table; teams that ran TimescaleDB without Promscale and forgot to set segmentby got far worse compression and blamed the algorithm.
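The segmentby effect is observable without TimescaleDB at all: XOR adjacent IEEE-754 bit patterns and total up the significant bits, once for a value column grouped by series and once for an interleaved column. A sketch with two synthetic series:

```python
# segmentby_effect.py — why segmentby=(series_id) matters for Gorilla XOR.
# Illustrative: bit_length() of the XOR approximates what must be stored.
import struct

def f2b(v: float) -> int:
    # reinterpret a double's 64-bit IEEE-754 pattern as an integer
    return struct.unpack("<Q", struct.pack("<d", v))[0]

def xor_payload_bits(vals: list) -> int:
    # total significant XOR bits between adjacent values ~ Gorilla's payload
    return sum((f2b(a) ^ f2b(b)).bit_length() for a, b in zip(vals, vals[1:]))

series_a = [80.0 + 0.1 * i for i in range(500)]    # slow-moving gauge
series_b = [9000.0 + 5.0 * i for i in range(500)]  # different scale entirely

segmented   = series_a + series_b                  # segmentby groups each series
interleaved = [v for pair in zip(series_a, series_b) for v in pair]

print(f"segmented XOR bits:   {xor_payload_bits(segmented):,}")
print(f"interleaved XOR bits: {xor_payload_bits(interleaved):,}")
```

Within one series, adjacent values share exponent and most mantissa bits, so the XOR deltas are short; interleave two series at different scales and nearly every XOR spans the full 64-bit word — the mechanism behind the ~5× compression gap the paragraph describes.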

Continuous aggregates — TimescaleDB's recording-rule equivalent

A CONTINUOUS AGGREGATE is a materialised view that TimescaleDB maintains incrementally, refreshing only the recently changed time buckets rather than the whole view. For metrics, this is the equivalent of Prometheus's recording rules — pre-computed summaries (1-minute aggregates of high-cardinality series) that turn ad-hoc dashboard queries from seconds into milliseconds. The mechanism is different from Prometheus's: a continuous aggregate is a PostgreSQL materialised view plus a refresh policy run by TimescaleDB's own background job scheduler, written to a separate hypertable. The semantics are stronger (full ACID, exact incremental) at the cost of more disk for the aggregate hypertable.

The contrast with Prometheus recording rules is operationally interesting. A recording rule that lags is silently outdated; a continuous aggregate that lags exposes a last_successful_finish timestamp in timescaledb_information.job_stats that you can alert on directly. Mimir and VictoriaMetrics now ship similar features, partly inspired by the continuous-aggregate model.

What the post-mortem said and what comes next

Timescale's announcement of Promscale's end-of-life (November 2023) is required reading for anyone considering a Postgres-as-metrics architecture. The summary: the addressable market for "I want SQL joins on my metrics" was smaller than projected, the operational complexity was higher than projected, and the dominant observability stack consolidated around the OpenTelemetry-shaped pipeline (instrumentation → OTLP → Mimir/Tempo/Loki → Grafana) without a SQL hop in the middle. The post-mortem is not "the architecture was wrong" but "the product/market fit was thinner than we thought".

The successor architecture that does retain the SQL-joins benefit is a streaming pipeline: OTLP collector → Kafka → ClickHouse for metrics + ClickHouse or BigQuery for application data, with a query layer (Trino, Cube, or direct Grafana) that joins across both. The operational cost is still substantial; the difference is that the components are independently scalable, and ClickHouse is purpose-built for this access pattern in a way PostgreSQL is not.

When hypertables beat custom TSDBs — the OLTP-adjacent case

Not every team using TimescaleDB for time-series data is doing observability. CRED runs hypertables for rewards-engine event streams; Zerodha Kite runs them for tick data on the back-office side (not the trading hot path); Swiggy runs them for delivery-rider GPS pings that need to be joined against order tables. In all three cases the data has the time-series shape but the workload mixes append-heavy ingest with frequent OLTP-style point reads against the same rows — "show me Riya's last 10 rides", "find the GPS ping for order #5821", "what reward did Aditi accrue last Tuesday". A custom TSDB optimised for full-scan analytical queries is the wrong tool; a hypertable, indexed normally and joined freely, is exactly right.

The lesson is that "hypertable for time-series" is a yes; "hypertable for high-throughput observability metrics specifically" is the place where the operational trade-offs caught up with Promscale. The architectural pattern is sound; the use case has to match. Razorpay still uses TimescaleDB hypertables for its settlement event stream — 18 months of event-time-partitioned payments_settled rows that the finance team queries directly with SQL — and runs Mimir for operational metrics. Both choices are right at the same time.

Reproducibility footer

# Reproduce on your laptop — TimescaleDB only (Promscale itself is archived)
docker run -d --name tsdb -p 5432:5432 -e POSTGRES_PASSWORD=postgres timescale/timescaledb:latest-pg16
sleep 5
docker exec tsdb psql -U postgres -c "CREATE DATABASE promscale;"
docker exec tsdb psql -U postgres -d promscale -c "CREATE EXTENSION timescaledb;"
# Bootstrap a schema matching the chapter's examples (simplified vs Promscale's real DDL)
docker exec tsdb psql -U postgres -d promscale -c "CREATE SCHEMA prom_data; CREATE SCHEMA prom_data_series;"
docker exec tsdb psql -U postgres -d promscale -c "CREATE TABLE prom_data_series.checkout_latency_p99_ms (series_id BIGSERIAL PRIMARY KEY, labels_hash BIGINT UNIQUE, labels_json JSONB);"
docker exec tsdb psql -U postgres -d promscale -c "CREATE TABLE prom_data.checkout_latency_p99_ms (time TIMESTAMPTZ NOT NULL, value DOUBLE PRECISION, series_id BIGINT); SELECT create_hypertable('prom_data.checkout_latency_p99_ms', 'time');"
# Run the ingest script and the economics model
python3 -m venv .venv && source .venv/bin/activate
pip install psycopg2-binary pandas
python3 promscale_ingest.py
python3 compression_economics.py
# Inspect chunks and compression after a few minutes
docker exec tsdb psql -U postgres -d promscale -c "SELECT show_chunks('prom_data.checkout_latency_p99_ms');"
docker exec tsdb psql -U postgres -d promscale -c "SELECT * FROM hypertable_compression_stats('prom_data.checkout_latency_p99_ms');"

Where this leads next

Promscale's hypertable design is the SQL-layer alternative to Prometheus's TSDB. The next chapter on downsampling for long retention is the orthogonal compression lever — instead of squeezing each sample into 1.3 bytes, drop most of them entirely past a retention threshold. Both Prometheus's TSDB and TimescaleDB hypertables can apply downsampling; the mechanics differ.

The single insight of this chapter: Promscale's failure was operational, not architectural. The architecture — time-partitioned chunks, columnar compression of cold partitions, SQL as the query interface — is correct and is the design every modern columnar metrics store converges on. PostgreSQL turned out to be the wrong host process for it at observability scale because PostgreSQL's design is centred on transactions and MVCC, two properties metrics workloads do not need and pay for. The hypertable pattern is alive in ClickHouse, in Druid, in InfluxDB IOx, and in TimescaleDB itself for non-metrics workloads. Aditi's 02:40 IST query — joining payments_settled_total against the merchants table — is still the right query to want; the mistake was thinking PostgreSQL was the right place to run it at 50k samples/sec.

References