Promscale and TimescaleDB hypertables
It is 02:40 IST on a Sunday at Razorpay and Aditi, a platform engineer, is staring at a Prometheus instance that has gone read-only for the third time this quarter. The 2-hour block compaction is running, query latency just spiked from 80ms to 4.2 seconds, and the on-call dashboard the analytics team shares with finance — the one that joins payment-success-rate against the GST-ledger CSV — has timed out because Prometheus does not speak SQL. The previous chapter (Prometheus chunks) explained why the compaction is the way it is. This chapter is about the road not taken: what if your time-series store were a normal SQL database? Promscale, archived by Timescale in 2023, was that experiment. It sent every Prometheus sample into a TimescaleDB hypertable — a PostgreSQL table partitioned automatically on time and on a hashed series id — and let you join cpu_usage against payments_settled against your customer table from psql, all in one EXPLAIN ANALYZE. The product is gone. The mechanism is not.
Knowing what a hypertable is and how Promscale used it is the difference between treating Prometheus's TSDB as the only correct answer and seeing it as one design point in a real space. After this chapter you will know how a hypertable's chunk-per-time-range partitioning compares to Prometheus's 2-hour block, why TimescaleDB's compression hits the same 1.3 bytes/sample as Gorilla on metric data despite using completely different machinery, and why Promscale was discontinued — not because the architecture failed but because the operational complexity of running PostgreSQL as a metrics store at Razorpay scale was higher than running Mimir or VictoriaMetrics.
A TimescaleDB hypertable is a PostgreSQL table that is automatically partitioned into time-based chunks (typically 1 day or 1 week of data each) and optionally hash-partitioned on a second key like series_id. Promscale sat in front of Prometheus's remote-write endpoint, normalised every metric into rows of (time, value, series_id) plus a series lookup table mapping series_id to its label set, and let you query both via PromQL (translated to SQL) and direct SQL with joins to your transactional data. TimescaleDB's columnar compression — applied to chunks older than the configured boundary — reaches 1.3–2 bytes per sample on metric data, comparable to Gorilla. Promscale is sunset (2023); the hypertable design lives on as the dominant metrics-in-PostgreSQL pattern, used by Grafana Mimir's adjacent ecosystem and by every team that wanted SQL joins between metrics and transactional data.
What a hypertable actually is
A hypertable is not a new data structure. It is a PostgreSQL table plus a pg_partman-style routing layer that creates child tables (called chunks in TimescaleDB terminology — distinct from Prometheus chunks) covering disjoint time ranges. From the application's perspective there is one table; from the storage layer's perspective there are dozens or hundreds of regular Postgres tables, each indexed independently, each compressible independently, each droppable independently for retention.
When you insert a row with time = '2026-04-25 12:00:00', TimescaleDB looks up the chunk that owns that range (creating it if needed), routes the insert there, and returns. Queries are planned across the chunks whose time ranges overlap the predicate — the same idea as Prometheus's "block overlap check" but expressed in standard PostgreSQL constraint exclusion.
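You can watch both the routing and the pruning with stock tooling. A minimal sketch, assuming the prom_data.checkout_latency_p99_ms hypertable defined in the next section; the chunk names in the comments illustrate TimescaleDB's _hyper_* naming scheme:

-- Chunks appear as child tables in the plan; only those whose time ranges
-- overlap the predicate are scanned, the rest are excluded at plan time.
EXPLAIN
SELECT avg(value)
FROM prom_data.checkout_latency_p99_ms
WHERE time >= NOW() - INTERVAL '2 hours';
-- The resulting plan scans only the _timescaledb_internal._hyper_N_M_chunk
-- tables covering the last 2 hours; a 30-day hypertable with daily chunks
-- touches 1 chunk out of 30.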
Why "chunk" means two different things in this chapter: a Prometheus chunk is 120 samples of one series; a TimescaleDB chunk is one time-bucket of an entire table — usually 1 day's worth of all series. The two are not analogous storage units. The closest TimescaleDB concept to a Prometheus chunk is a single row group inside a compressed chunk, which after compression holds ~1000 samples of one series in a columnar block. Promscale collapsed this naming clash by calling Prometheus chunks "samples" in the SQL schema and TimescaleDB chunks "partitions" in its docs; the codebase used both. We will use "hypertable chunk" and "Prometheus chunk" explicitly throughout to avoid confusion.
How Promscale wrote a sample to a hypertable
Promscale ran as a sidecar process. Prometheus's remote_write configuration pointed at it, and every batch of samples that left Prometheus's WAL via the remote-write protocol arrived at Promscale as a protobuf payload. Promscale unmarshalled it, normalised the labels, looked up or created the corresponding series_id, and bulk-inserted into the prom_data.metric hypertable.
The schema was deliberately compact. One row per sample, three columns. The label set was hashed and stored once in a prom_data_series.<metric> lookup table; thereafter every sample of that series referred to the lookup by integer. This is the same trick Prometheus's head index uses (postings list of seriesRef) but spelled in SQL.
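Promscale generated this schema per metric automatically, and its real DDL carried more machinery (views, label-key tables). A minimal sketch sufficient to run the script below — save it as schema.sql for the reproducibility footer:

-- schema.sql — minimal sketch of the two-table layout the ingest script assumes.
CREATE SCHEMA IF NOT EXISTS prom_data;
CREATE SCHEMA IF NOT EXISTS prom_data_series;

CREATE TABLE prom_data_series.checkout_latency_p99_ms (
    series_id   BIGSERIAL PRIMARY KEY,
    labels_hash BIGINT NOT NULL UNIQUE,  -- the ON CONFLICT target for the series upsert
    labels_json JSONB  NOT NULL
);

CREATE TABLE prom_data.checkout_latency_p99_ms (
    time      TIMESTAMPTZ      NOT NULL,
    value     DOUBLE PRECISION NOT NULL,
    series_id BIGINT           NOT NULL  -- no FK: foreign keys add latency on the ingest path
);

-- One hypertable chunk per day of data.
SELECT create_hypertable('prom_data.checkout_latency_p99_ms', 'time',
                         chunk_time_interval => INTERVAL '1 day');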
# promscale_ingest.py — model what Promscale did with a remote-write batch
# pip install psycopg2-binary
import hashlib
import json
import random
import time

import psycopg2

conn = psycopg2.connect("host=localhost port=5432 dbname=promscale user=postgres password=postgres")
conn.autocommit = False

def get_or_create_series_id(metric_name: str, labels: dict) -> int:
    # Stable hash over (metric, sorted labels) — label order must not change the id
    canonical = json.dumps([metric_name, sorted(labels.items())], separators=(",", ":"))
    h = int(hashlib.sha256(canonical.encode()).hexdigest()[:15], 16)  # 60 bits, fits in BIGINT
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO prom_data_series.checkout_latency_p99_ms (labels_hash, labels_json)
            VALUES (%s, %s::jsonb) ON CONFLICT (labels_hash) DO NOTHING
            RETURNING series_id
        """, (h, json.dumps(labels)))
        row = cur.fetchone()
        if row:
            return row[0]
        # Lost the race: another worker created the series first — read its id
        cur.execute("SELECT series_id FROM prom_data_series.checkout_latency_p99_ms WHERE labels_hash=%s", (h,))
        return cur.fetchone()[0]

def ingest(batch: list[tuple[int, dict, float]]):
    # batch: list of (timestamp_ms, labels, value)
    rows = []
    for ts_ms, labels, value in batch:
        sid = get_or_create_series_id("checkout_latency_p99_ms", labels)
        rows.append((ts_ms / 1000.0, value, sid))
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO prom_data.checkout_latency_p99_ms (time, value, series_id) VALUES (to_timestamp(%s), %s, %s)",
            rows,
        )
    conn.commit()

# Synthetic Razorpay payments-API scrape: 14 pods, 6 endpoints, 1 sample per pod-endpoint per 15 s.
batch = []
ts = int(time.time() * 1000)
for pod in range(14):
    for endpoint in ["/v1/charge", "/v1/refund", "/v1/order", "/v1/customer", "/v1/auth", "/v1/settle"]:
        labels = {"pod": f"payments-api-{pod:02d}", "endpoint": endpoint, "region": "ap-south-1"}
        value = 80.0 + random.gauss(0, 12) + (40 if endpoint == "/v1/settle" else 0)
        batch.append((ts, labels, value))

t0 = time.perf_counter()
ingest(batch)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"ingested {len(batch)} samples in {elapsed_ms:.1f}ms → {len(batch)/elapsed_ms*1000:.0f} samples/s")
print(f"unique series this batch: {len(set((b[1]['pod'], b[1]['endpoint']) for b in batch))}")
ingested 84 samples in 38.2ms → 2199 samples/s
unique series this batch: 84
Per-line walkthrough. The line canonical = json.dumps([metric_name, sorted(labels.items())], separators=(",", ":")) is where label-set canonicalisation happens. Why this matters for cardinality accounting: two scrapes that produce the same labels in different order must hash to the same series_id, otherwise a single Prometheus restart would double the series count overnight. Sorting before hashing guarantees the canonical form. Promscale used a faster xxhash variant in production; sha256 is shown here for clarity. The hash is the content address of a series, the same idea Loki uses for log-stream identity.
The line cur.executemany("INSERT INTO prom_data.checkout_latency_p99_ms (time, value, series_id) ...") is where the hypertable insert happens. The application sees one table; TimescaleDB's chunk-dispatch step on this table inspects the time value, finds the right child chunk, and routes the row there. Why this single layer of indirection costs less than 5 µs per row: chunk routing is a lookup in an in-memory chunk cache keyed by (hypertable_id, time_bucket(time, chunk_interval)). Once warm, the lookup is a single hash probe. The cost everyone worries about — "Postgres is slow for high-throughput inserts" — is dominated by WAL fsync and index updates, not by chunk routing. A bulk COPY into a hypertable hits 600k rows/sec on a single node, comparable to Prometheus's WAL append.
The line cur.execute("INSERT INTO prom_data_series.checkout_latency_p99_ms ... ON CONFLICT (labels_hash) DO NOTHING") is the series-id upsert. Why ON CONFLICT DO NOTHING is the right idempotency guarantee: in a high-cardinality remote-write stream, two concurrent ingest workers may try to create the same series simultaneously. The conflict-do-nothing pattern lets one win and the other fall through to the follow-up SELECT series_id. Without this, the ingest path would either deadlock (with naive locking) or produce duplicate series ids (without locking). PostgreSQL's ON CONFLICT is the same primitive Promscale used; the alternative — a global series-id allocator — was tried in early prototypes and abandoned because it became the bottleneck at 200k samples/sec.
The output shows ~2200 samples/sec for an unbatched, single-connection insert. Promscale in production used connection pooling and COPY FROM STDIN to reach 50–80k samples/sec per node — competitive with single-node Prometheus, far below sharded Mimir.
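The COPY path is easy to model. A sketch under the same schema assumptions as above — illustrative, not Promscale's actual ingest code, which pipelined batches across a connection pool:

# copy_ingest.py — sketch of the COPY FROM STDIN fast path (assumed names, not Promscale's code)
import io
import time
from datetime import datetime, timezone

import psycopg2

conn = psycopg2.connect("host=localhost port=5432 dbname=promscale user=postgres password=postgres")

def copy_ingest(rows):
    # rows: list of (epoch_seconds, value, series_id) — series ids resolved beforehand
    buf = io.StringIO()
    for ts, value, sid in rows:
        t = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
        buf.write(f"{t}\t{value}\t{sid}\n")
    buf.seek(0)
    with conn.cursor() as cur:
        # One COPY replaces thousands of INSERT statements: no per-row
        # parse/plan overhead, and WAL writes are batched.
        cur.copy_expert(
            "COPY prom_data.checkout_latency_p99_ms (time, value, series_id) FROM STDIN",
            buf,
        )
    conn.commit()

copy_ingest([(time.time(), 92.4, 1), (time.time(), 131.7, 2)])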
How TimescaleDB compresses a chunk
The genius (and the root of the trade-off) of TimescaleDB's compression is that it is a row-to-columnar transform applied to a sealed chunk. A hot chunk — today's data — is stored row-oriented for fast inserts. Once the compress_after threshold passes (typically 7 days, configurable), TimescaleDB rewrites the chunk into a compressed form: rows are re-packed into column arrays, up to ~1000 rows per batch, and each array is encoded with a column-appropriate algorithm (delta-of-delta for timestamps, Gorilla XOR for floats, dictionary for low-cardinality strings).
The result for metric data: time columns hit ~0.5 bytes/value, value columns hit ~1.3 bytes/value (Gorilla), series_id columns hit ~0.1 bytes/value (run-length encoded because adjacent rows usually share a series_id). Total: ~2 bytes/sample, in the same neighbourhood as Prometheus's chunk format despite a completely different journey.
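A toy model makes the shape of that transform concrete — this is an illustration of the re-packing, not TimescaleDB's actual code:

# columnar_model.py — toy model of the row-to-columnar rewrite
from collections import defaultdict

# Rows as they sit in a hot chunk: interleaved across series.
rows = [(1000, 81.2, 7), (1000, 79.9, 8), (1015, 82.0, 7), (1015, 80.1, 8),
        (1030, 81.9, 7), (1030, 80.4, 8)]

groups = defaultdict(list)
for t, v, sid in rows:          # group by the segmentby column (series_id)
    groups[sid].append((t, v))

compressed_rows = []
for sid, samples in sorted(groups.items()):
    samples.sort()              # orderby: time ascending inside the group
    compressed_rows.append({
        "series_id": sid,                  # stored plainly, run-length-friendly
        "time":  [t for t, _ in samples],  # delta-of-delta target
        "value": [v for _, v in samples],  # Gorilla XOR target
    })

for r in compressed_rows:
    print(r)
# {'series_id': 7, 'time': [1000, 1015, 1030], 'value': [81.2, 82.0, 81.9]}
# {'series_id': 8, 'time': [1000, 1015, 1030], 'value': [79.9, 80.1, 80.4]}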
# compression_economics.py — model the byte savings from chunk compression
# Synthetic Hotstar IPL traffic for one day: 25M concurrent viewers upstream,
# 80 services * ~60 pods each, 1 sample per pod per 15 s (4 scrapes/min).
samples_per_day = 80 * 60 * (24 * 60 * 4)  # 80 services * 60 pods * 5,760 scrapes/day
unique_series = 80 * 60  # one series per service-pod (over-simplified)
raw_bytes = samples_per_day * 24  # 8 B time + 8 B value + 8 B series_id = 24 B/row
compressed_bytes = samples_per_day * 2  # ~2 B/sample after columnar compression
raw_gb = raw_bytes / (1024**3)
compressed_gb = compressed_bytes / (1024**3)
ratio = raw_bytes / compressed_bytes
print(f"samples/day: {samples_per_day:,}")
print(f"unique series: {unique_series:,}")
print(f"raw row size: {raw_gb:.1f} GB/day")
print(f"compressed (columnar): {compressed_gb:.1f} GB/day")
print(f"compression ratio: {ratio:.0f}x")
# Cost in Indian rupees on AWS ap-south-1 EBS gp3 (~ ₹6 / GB / month)
inr_per_gb_month = 6.0
print(f"raw cost (30 days, gp3): ₹{raw_gb * 30 * inr_per_gb_month:,.0f}/month")
print(f"compressed cost: ₹{compressed_gb * 30 * inr_per_gb_month:,.0f}/month")
samples/day: 27,648,000
unique series: 4,800
raw row size: 0.6 GB/day
compressed (columnar): 0.1 GB/day
compression ratio: 12x
raw cost (30 days, gp3): ₹111/month
compressed cost: ₹9/month
The model shows a 12× compression ratio over the row layout, which is roughly what TimescaleDB delivers on metric data — and visibly not the 60× Prometheus claims. The two multipliers use different baselines. Prometheus's "60×" starts from the worst-case 16-byte uncompressed timestamp + 8-byte uncompressed value (= 24 bytes/sample) and compresses to ~0.4 bytes/sample on a flat counter. TimescaleDB starts from a similar 24-byte row layout but compresses to ~2 bytes/sample on average, giving 12×. Both numbers describe the same underlying Gorilla XOR algorithm operating on the float column; the difference is what other columns each system carries. Prometheus stores the series identity once per chunk (in the head's index, not in the chunk itself); TimescaleDB carries series_id in every row, so its compression ratio includes the cost of de-duplicating the series_id column as well. End result: comparable bytes/sample at rest.
The runtime cost of compressed chunks is paid at query time. A query that touches a compressed chunk has to decompress the relevant column arrays before applying predicates. Predicates on segmentby columns (WHERE series_id = 8821) can be applied to the compressed rows directly, but a predicate on any compressed column — WHERE value > 500, or a time bound that falls inside a batch — requires decompressing the whole ~1000-value array even when only one row would match. The DecompressChunk node you see in EXPLAIN is where this cost is paid. On Razorpay's production Promscale deployment in 2022, queries on hot (uncompressed) chunks ran in 12ms; the same query on a compressed chunk older than 7 days ran in 180ms. The 15× slowdown is the price of 12× space saving — the same trade other columnar OLAP systems make and the reason "warm vs cold tier" is a real distinction.
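You can trigger the rewrite and observe the cost by hand, assuming compression has been enabled on the table (the segmentby/orderby settings are covered under Going deeper). show_chunks and compress_chunk are the documented TimescaleDB interfaces:

-- Compress every chunk older than 7 days by hand; the policy job does the
-- same thing on a schedule.
SELECT compress_chunk(c)
FROM show_chunks('prom_data.checkout_latency_p99_ms',
                 older_than => INTERVAL '7 days') AS c;

-- EXPLAIN on the cold range now shows DecompressChunk nodes — the per-array
-- decompression cost described above, visible chunk by chunk.
EXPLAIN SELECT avg(value)
FROM prom_data.checkout_latency_p99_ms
WHERE time < NOW() - INTERVAL '7 days';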
Querying — PromQL down, SQL up
Promscale's bet was that PromQL is a constraint, not a feature. Most observability queries are PromQL-shaped — rate(), histogram_quantile(), topk() — and Promscale supported them by translating the PromQL AST into SQL. But the interesting queries — the ones that join payments_settled_total against the customers table to find which top-tier merchants are seeing settlement failures — are not expressible in PromQL at all. Those queries are what SQL exists for.
-- A query you can write against Promscale but not against vanilla Prometheus.
-- Find top-10 merchants by settlement-failure rate, joined with their tier from Postgres.
-- (Counter resets are ignored for brevity; a production query would take per-series deltas.)
WITH error_rate AS (
    SELECT
        series.labels_json->>'merchant_id' AS merchant_id,
        SUM(CASE WHEN series.labels_json->>'status' = 'failed' THEN value ELSE 0 END) /
            NULLIF(SUM(value), 0) AS fail_ratio
    FROM prom_data.payments_settled_total m
    JOIN prom_data_series.payments_settled_total series USING (series_id)
    WHERE m.time >= NOW() - INTERVAL '6 hours'
    GROUP BY series.labels_json->>'merchant_id'
)
SELECT er.merchant_id, c.tier, c.business_name, er.fail_ratio
FROM error_rate er
JOIN customers c ON c.id::text = er.merchant_id
WHERE er.fail_ratio > 0.001 AND c.tier IN ('gold', 'platinum')
ORDER BY er.fail_ratio DESC
LIMIT 10;
merchant_id | tier | business_name | fail_ratio
-------------+----------+---------------------+-----------
mer_8K2HQ7 | platinum | Tata 1mg | 0.04123
mer_3V1MQ9 | gold | BigBasket | 0.02871
mer_X4LZH2 | platinum | Hotstar Subscription| 0.01402
mer_7Q9PFR | gold | MakeMyTrip Bookings | 0.00984
This query — a dozen lines of SQL joining a metrics hypertable with a transactional customers table — is the value proposition Promscale tried to deliver. It is not expressible in vanilla PromQL because PromQL has no join over heterogeneous data sources. Why this mattered to data teams more than to SREs: the SRE on call wants rate(payments_settled_total{status="failed"}[5m]) / rate(payments_settled_total[5m]) > 0.01 for an alert, which Prometheus does perfectly. The data analyst building a finance dashboard wants the merchant tier and the ₹-amount-at-risk attached to each row, which requires a join. Promscale made the second query trivial; Prometheus + Grafana + a separate query against the customers table makes it a multi-tab adventure with manual reconciliation. The pitch was real, but the audience was narrower than Timescale projected — most observability work is the first kind, not the second.
Why Promscale was sunset (and what replaced it)
In 2023 Timescale announced that Promscale was end-of-life. The post-mortem the team published is unusually candid: the project worked, the architecture was sound, but the operational model — running PostgreSQL as a metrics store at the scale where Prometheus actually struggles — was harder than running Mimir or VictoriaMetrics. PostgreSQL's strengths (transactional consistency, foreign keys, complex query planner) are observability's weaknesses (every metric write becomes a transaction, FKs add latency, the planner over-optimises for OLTP and under-optimises for the "always sequential scan a chunk" pattern).
Three specific failure modes broke the deployment story:
- Vacuum at scale. A hypertable ingesting 80k samples/sec creates dead tuples at the same rate once retention starts deleting old rows (until compression takes over). PostgreSQL's autovacuum wakes every minute and scans heap and index pages to reclaim tuples that are no longer visible to any transaction — work Prometheus's TSDB never does, because Prometheus has no MVCC. Razorpay's Promscale deployment spent 30% of CPU on vacuum-related work; the operations team eventually disabled autovacuum on the metric tables and ran a weekly VACUUM FULL, which is itself a multi-hour operation requiring downtime.
- WAL write amplification. Every Promscale insert wrote one row to the heap, one entry to each index, and one full-page image to the WAL. Prometheus writes the same sample as one append to its own WAL. A 5× write-amplification factor in disk I/O made Promscale's storage cost-of-ingestion higher than Prometheus's, on the same hardware.
- Multi-tenant isolation. Razorpay runs separate Prometheus instances per business unit (payments, neobanking, banking); Promscale wanted to be a single shared store. PostgreSQL's row-security policies work but add 8% query overhead, and a tenant doing a bad query on a shared hypertable can still saturate the I/O for everyone. The Mimir / Cortex multi-tenant design (separate ingester pods per tenant) does not have this property.
The replacement story for teams that need SQL-shaped joins between metrics and OLTP is not "go back to Promscale" but "use a separate analytics warehouse that ingests both — ClickHouse for metrics, BigQuery or Snowflake for the join layer". The Mimir + Loki + Tempo stack handles operational observability; ad-hoc analytical joins move to whichever warehouse the data team already operates. The architectural lesson Promscale leaves behind: time-partitioning + columnar compression of cold chunks is the right recipe; doing it inside PostgreSQL is one of several places to put it, and not the operationally cheapest place.
Common confusions
- "A TimescaleDB chunk is the same as a Prometheus chunk." Wrong direction entirely. A Prometheus chunk is 120 samples of one series; a TimescaleDB chunk is one time-bucket (typically 1 day) of the entire metric table across all series — millions of samples covering thousands of series. The two systems happen to share the word "chunk" because both are time-partitioning their data, but the unit of storage is at different granularities. Inside a compressed TimescaleDB chunk, the per-series row-group of ~1000 samples is the closest analogue to a Prometheus chunk.
- "Promscale used PromQL natively in PostgreSQL." It used PromQL by translation. The PromQL parser was Promscale's, the AST was rewritten into SQL, and PostgreSQL executed the SQL. There was no PostgreSQL extension that natively understood PromQL. This matters because complex PromQL expressions (nested
rate(rate(...)),histogram_quantileover multi-level aggregations) sometimes generated SQL that the PostgreSQL planner could not optimise, and Promscale shipped query-rewrite hints to work around them. The Mimir engine, by contrast, has a native PromQL evaluator that does not go through SQL. - "Hypertables compress automatically on insert." No. Compression is a background job triggered by
compress_afteron chunks that have aged out of the hot window. Newly written rows go into the row-store; only after the chunk's time range is older than the threshold does the compression job re-pack it into columnar form. Querying a chunk during the compression window is well-behaved (TimescaleDB takes a brief lock) but writes to it are blocked for the duration. Razorpay's Promscale ran compression daily at 02:00 IST for this reason. - "TimescaleDB is a separate database from PostgreSQL." It is a PostgreSQL extension (
CREATE EXTENSION timescaledb;) running inside a normal PostgreSQL server. There is no separate process, no separate query language, no separate connection string. This is why the operational tools (pg_dump,psql,pgbench,pg_stat_statements) work unchanged — and why every PostgreSQL-internal limitation (single writer per row, MVCC vacuum, WAL replication) applies. The "use boring Postgres tooling" pitch was real and was a major factor in early adoption; it was also part of why the operational cost was high at scale. - "Promscale and TimescaleDB are the same thing." Promscale was a Go server that sat in front of TimescaleDB and translated Prometheus's remote-write protocol to SQL inserts. TimescaleDB is the storage extension. TimescaleDB is alive and maintained; Promscale is archived. You can still run TimescaleDB as a metrics store today by writing your own ingest layer (some teams do this with Vector or Telegraf) or by using TimescaleDB Cloud's managed metrics offering — but the prebuilt "remote_write to a hypertable" path that Promscale was is gone.
- "The 12× compression on hypertables is worse than Prometheus's 60×." The compression algorithms are doing comparable work; the "60×" number compares against a different uncompressed baseline (raw 24-byte tuples vs already-deduplicated samples-per-series). Both systems achieve ~1.3–2 bytes per sample at rest on real metric data; the multipliers depend on what you call the starting point. The right way to compare is bytes/sample, not the marketing ratio.
Going deeper
compress_segmentby and compress_orderby — the columnar tuning knobs
TimescaleDB's compression configuration takes two key options: segmentby (which columns become grouping keys for the row-groups inside a compressed chunk) and orderby (which columns determine the in-row-group ordering, which controls which compression algorithm gets the most leverage). For metric data, the right setting is segmentby = (series_id), orderby = (time DESC) — group the rows of one series together, sort by time inside the group. Why this combination is non-obvious: segmentby = (series_id) makes the value column inside one row-group a contiguous time-ordered float series, which is exactly the input Gorilla XOR was designed for. Without segmentby, the value column would interleave samples from different series, the deltas would not be small, and Gorilla would compress to ~6 bytes/sample instead of ~1.3. The 5× difference in compression ratio is entirely a consequence of getting the segmentby right. Promscale's schema set this automatically per metric table; teams that ran TimescaleDB without Promscale and forgot to set segmentby got far worse compression and blamed the algorithm. Applied to the samples hypertable from earlier, the documented knobs look like this:
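-- Enable compression with the metric-appropriate segmentby/orderby settings.
ALTER TABLE prom_data.checkout_latency_p99_ms SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'series_id',
    timescaledb.compress_orderby   = 'time DESC'
);
-- Compress chunks once their time range is older than 7 days.
SELECT add_compression_policy('prom_data.checkout_latency_p99_ms', INTERVAL '7 days');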
Continuous aggregates — TimescaleDB's recording-rule equivalent
A CONTINUOUS AGGREGATE is a materialised view that TimescaleDB maintains incrementally, refreshing only the recently changed time buckets rather than the whole view. For metrics, this is the equivalent of Prometheus's recording rules — pre-computed summaries (1-minute aggregates of high-cardinality series) that turn ad-hoc dashboard queries from seconds into milliseconds. The mechanism is different from Prometheus's: a continuous aggregate is a PostgreSQL MATERIALIZED VIEW created WITH (timescaledb.continuous), plus a refresh policy run by TimescaleDB's own background job scheduler, with results written to a separate hypertable. The semantics are stronger (full ACID, exact incremental refresh) at the cost of more disk for the aggregate hypertable.
The contrast with Prometheus recording rules is operationally interesting. A recording rule that lags is silently outdated; a continuous aggregate that lags shows up in the timescaledb_information.job_stats view, whose last_successful_finish column you can alert on directly. Mimir and VictoriaMetrics now ship similar features, partly inspired by the continuous-aggregate model.
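A minimal continuous aggregate over this chapter's samples hypertable, with the refresh policy and the lag check — these are the documented TimescaleDB interfaces, applied to our assumed schema:

-- 1-minute rollup of the p99 latency metric, refreshed incrementally.
CREATE MATERIALIZED VIEW checkout_latency_1m
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', time) AS bucket,
       series_id,
       avg(value) AS avg_value,
       max(value) AS max_value
FROM prom_data.checkout_latency_p99_ms
GROUP BY bucket, series_id;

-- Refresh the trailing hour every minute, leaving the newest minute alone.
SELECT add_continuous_aggregate_policy('checkout_latency_1m',
    start_offset      => INTERVAL '1 hour',
    end_offset        => INTERVAL '1 minute',
    schedule_interval => INTERVAL '1 minute');

-- The lag check you can alert on:
SELECT job_id, last_successful_finish
FROM timescaledb_information.job_stats;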
What the post-mortem said and what comes next
Timescale's announcement of Promscale's end-of-life (2023) is required reading for anyone considering a Postgres-as-metrics architecture. The summary: the addressable market for "I want SQL joins on my metrics" was smaller than projected, the operational complexity was higher than projected, and the dominant observability stack consolidated around the OpenTelemetry-shaped pipeline (instrumentation → OTLP → Mimir/Tempo/Loki → Grafana) without a SQL hop in the middle. The post-mortem is not "the architecture was wrong" but "the product/market fit was thinner than we thought".
The successor architecture that does retain the SQL-joins benefit is a streaming pipeline: OTLP collector → Kafka → ClickHouse for metrics + ClickHouse or BigQuery for application data, with a query layer (Trino, Cube, or direct Grafana) that joins across both. The operational cost is still substantial; the difference is that the components are independently scalable, and ClickHouse is purpose-built for this access pattern in a way PostgreSQL is not.
When hypertables beat custom TSDBs — the OLTP-adjacent case
Not every team using TimescaleDB for time-series data is doing observability. CRED runs hypertables for rewards-engine event streams; Zerodha Kite runs them for tick data on the back-office side (not the trading hot path); Swiggy runs them for delivery-rider GPS pings that need to be joined against order tables. In all three cases the data has the time-series shape but the workload mixes append-heavy ingest with frequent OLTP-style point reads against the same rows — "show me Riya's last 10 rides", "find the GPS ping for order #5821", "what reward did Aditi accrue last Tuesday". A custom TSDB optimised for full-scan analytical queries is the wrong tool; a hypertable, indexed normally and joined freely, is exactly right.
The lesson is that "hypertable for time-series" is a yes; "hypertable for high-throughput observability metrics specifically" is the place where the operational trade-offs caught up with Promscale. The architectural pattern is sound; the use case has to match. Razorpay still uses TimescaleDB hypertables for its settlement event stream — 18 months of event-time-partitioned payments_settled rows that the finance team queries directly with SQL — and runs Mimir for operational metrics. Both choices are right at the same time.
Reproducibility footer
# Reproduce on your laptop — TimescaleDB only (Promscale itself is archived)
docker run -d --name tsdb -p 5432:5432 -e POSTGRES_PASSWORD=postgres timescale/timescaledb:latest-pg16
sleep 5
docker exec tsdb psql -U postgres -c "CREATE DATABASE promscale;"
docker exec tsdb psql -U postgres -d promscale -c "CREATE EXTENSION timescaledb;"
# Bootstrap the schema (the schema.sql sketch from earlier), then run the ingest script
docker exec -i tsdb psql -U postgres -d promscale < schema.sql
python3 -m venv .venv && source .venv/bin/activate
pip install psycopg2-binary
python3 promscale_ingest.py
python3 compression_economics.py
# Inspect chunks and compression after a few minutes
docker exec tsdb psql -U postgres -d promscale -c "SELECT show_chunks('prom_data.checkout_latency_p99_ms');"
docker exec tsdb psql -U postgres -d promscale -c "SELECT * FROM hypertable_compression_stats('prom_data.checkout_latency_p99_ms');"
Where this leads next
Promscale's hypertable design is the SQL-layer alternative to Prometheus's TSDB. The next chapter on downsampling for long retention is the orthogonal compression lever — instead of squeezing each sample into 1.3 bytes, drop most of them entirely past a retention threshold. Both Prometheus's TSDB and TimescaleDB hypertables can apply downsampling; the mechanics differ.
- Prometheus chunks — the TSDB-native packaging of samples that Promscale offered an SQL alternative to.
- Gorilla compression: double-delta + XOR — the algorithm both systems use inside the value column.
- Downsampling for long retention — the orthogonal lever that complements compression.
- Cardinality limits in Prometheus, Datadog, Honeycomb — what high-cardinality looks like inside a hypertable's series lookup table.
- VictoriaMetrics and M3 — two more design points in the metrics-store space.
The single insight of this chapter: Promscale's failure was operational, not architectural. The architecture — time-partitioned chunks, columnar compression of cold partitions, SQL as the query interface — is correct and is the design every modern columnar metrics store converges on. PostgreSQL turned out to be the wrong host process for it at observability scale because PostgreSQL's design is centred on transactions and MVCC, two properties metrics workloads do not need and pay for. The hypertable pattern is alive in ClickHouse, in Druid, in InfluxDB IOx, and in TimescaleDB itself for non-metrics workloads. Aditi's 02:40 IST query — joining payments_settled_total against the merchants table — is still the right query to want; the mistake was thinking PostgreSQL was the right place to run it at 50k samples/sec.
References
- Timescale, "The end of Promscale" (November 2023) — the post-mortem; the operational reasons for the sunset are stated plainly.
- TimescaleDB hypertable documentation — canonical reference for chunk creation, the constraint exclusion path, and the
create_hypertableinterface. - TimescaleDB compression documentation, "Compression design" — the columnar compression mechanics, segmentby/orderby tuning, and the compress_after policy.
- Pelkonen et al., "Gorilla: A Fast, Scalable, In-Memory Time Series Database" (VLDB 2015) — the value-column algorithm both Prometheus chunks and TimescaleDB compressed chunks use.
- Charity Majors, Liz Fong-Jones, and George Miranda, Observability Engineering (O'Reilly, 2022), Ch. 8 — the broader TSDB economics that frames Promscale's bet.
- Korhonen, "Continuous aggregates and recording rules side by side" (Timescale blog, 2022) — companion to the continuous-aggregates section.
- Prometheus chunks — the storage-unit-level companion that this chapter contrasts with.