In short

You have decided you need a time-series database. The four serious open-source choices are InfluxDB, TimescaleDB, QuestDB, and VictoriaMetrics — and they are not interchangeable. InfluxDB is a purpose-built TSDB written in Go with its own storage engine (TSM) and its own query language (Flux, with the older InfluxQL still supported). It is the most mature TSDB-first product, but the non-SQL surface is a real learning cost, and v3 is a major rewrite that is still settling. TimescaleDB is a Postgres extension. You install it into an existing Postgres database, run CREATE EXTENSION timescaledb, convert a regular table to a hypertable with SELECT create_hypertable(...), and keep using full standard SQL — joins, foreign keys, window functions, the lot. The price is that you inherit Postgres's single-node insert ceiling (50–120K rows/sec depending on hardware and tuning). QuestDB is a Java engine with off-heap columnar storage that posts the highest single-node write throughput of the four — comfortably in the millions of rows per second on a single box — and speaks SQL plus the InfluxDB Line Protocol (ILP) so existing InfluxDB clients can write to it unchanged. Smaller ecosystem, clustering still maturing. VictoriaMetrics is a Go engine that pretends to be Prometheus on the wire (same scrape, same /api/v1/query, plus PromQL and a superset MetricsQL) but with much better compression, horizontal scaling, and retention measured in years instead of weeks. It is the obvious answer when your existing observability stack is already Prometheus + Grafana and you have outgrown vanilla Prometheus's single-node store. The decision tree is short: already on Postgres? → TimescaleDB. Already on Prometheus? → VictoriaMetrics. Need maximum single-node throughput? → QuestDB. Greenfield, want the most established TSDB-first product? → InfluxDB. This chapter walks through the four side by side, then runs an Indian fintech case study (200K writes/sec of transaction telemetry, 30 days hot + 1 year aggregated, dashboards in Grafana) and picks one with reasoning you can re-apply to your own problem.

You have spent four chapters of Build 21 building a mental model of how a time-series database actually works — time-partitioned chunks, columnar layout inside chunks, delta + Gorilla compression, continuous aggregates, retention tiers ending in S3. That model is the foundation. But none of it tells you which TSDB to actually install on Monday morning. The four serious open-source contenders solve overlapping problems with very different design philosophies, and the right answer depends as much on what your team already knows and runs as it does on the raw numbers in a benchmark.

This chapter is the one that closes Build 21. It gives you the four products side by side, the criteria that should drive the choice, and a worked example of an Indian fintech doing the evaluation end to end. By the end you should be able to walk into a room, listen to your team describe their workload and existing stack for ten minutes, and confidently say "that one, here is why."

The four contenders, at a glance

The four products exist because they each made a different bet about which axis matters most. Understanding the bet is more important than memorising feature lists, because the bet determines what is hard to change later.

Four TSDBs, four design bets

  • InfluxDB — Go, custom TSM engine (v3 = Apache Arrow). Bet: a purpose-built TSDB beats general-purpose. Query: Flux / InfluxQL (SQL in v3). Throughput: ~500K rows/sec/node. Hosted: InfluxDB Cloud. Sweet spot: greenfield TS workloads.
  • TimescaleDB — Postgres extension, hypertables + chunks. Bet: keep SQL + the Postgres ecosystem, add chunks. Query: full standard SQL + time_bucket(). Throughput: ~50-120K rows/sec/node. Hosted: Timescale Cloud. Sweet spot: mixed OLTP + TS.
  • QuestDB — Java, off-heap memory, columnar files on disk. Bet: extreme single-node throughput wins. Query: SQL (TS extensions) + InfluxDB Line Protocol. Throughput: 1-4M rows/sec/node. Hosted: QuestDB Cloud. Sweet spot: trading, IoT at scale.
  • VictoriaMetrics — Go, columnar mergeset, Prometheus-compatible. Bet: be Prometheus, but cheap and at scale. Query: PromQL + MetricsQL (no SQL). Throughput: 800K-2M samples/sec/node. Hosted: Managed VM (third-party). Sweet spot: Prometheus shops scaling up.

The summary above captures the four design bets in one frame. Read the "Bet" entry for each product carefully — it is the most important one, because a product's bet decides the shape of every later trade-off. Why the bet matters more than the feature list: features can be added, but a foundational bet cannot. InfluxDB chose its own query language because it bet that a TSDB-native syntax would be more expressive than SQL for time math; you cannot turn that bet into "we are SQL-first" without rewriting the engine. TimescaleDB chose to ride Postgres because it bet that SQL familiarity outweighs raw throughput; that bet caps single-node ingest at Postgres's ceiling forever. QuestDB chose Java with off-heap memory and tight CPU loops because it bet that single-node throughput is the differentiator; that bet pays off until you outgrow one box. VictoriaMetrics chose Prometheus wire-compatibility because it bet the metrics ecosystem had already standardised; that bet locks it out of mixed workloads where you also want to query orders, users, accounts.

InfluxDB — the established TSDB-first product

InfluxDB is the database most people think of first when they hear "time-series database", because it has been carrying that flag since 2013. It is written in Go, ships as a single binary, and is engineered around a custom storage engine called TSM (Time-Structured Merge tree) that is essentially an LSM with time as the primary sort key, plus aggressive Gorilla-style compression on the value columns.

The query language is Flux, a functional pipeline language that looks like a Unix pipeline: from(bucket:"telemetry") |> range(start: -1h) |> filter(fn: (r) => r._measurement == "cpu") |> aggregateWindow(every: 1m, fn: mean). It is genuinely expressive — once you internalise the pipeline model, certain queries that are clumsy in SQL become natural — but it is also genuinely a new language with its own learning curve. The older InfluxQL is still supported and looks SQL-ish, but it is a subset and not the recommended path going forward. InfluxDB v3 (the current major version, released over the last two years) is a substantial rewrite — in Rust, on top of Apache Arrow and the DataFusion query engine — and it adds proper SQL alongside InfluxQL, though the migration story for organisations on v1/v2, and Flux's place in v3, is still settling.
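To make the v3 shift concrete, here is roughly how the one-hour CPU rollup from the Flux pipeline above reads on v3's SQL surface. This is a sketch, not lifted from the InfluxDB docs: the cpu measurement and usage field are assumed for illustration, and the bucketing uses DataFusion's date_bin().

```sql
-- Sketch: roughly the same rollup as the Flux pipeline above, in InfluxDB v3 SQL.
-- The "cpu" measurement and "usage" field are illustrative assumptions.
SELECT
  date_bin(INTERVAL '1 minute', time) AS minute,   -- 1-minute buckets
  avg(usage)                          AS mean_usage
FROM cpu
WHERE time >= now() - INTERVAL '1 hour'
GROUP BY 1
ORDER BY 1;
```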

Best for: greenfield time-series workloads on a team that has no SQL incumbency to preserve and is willing to invest in learning Flux for the expressive payoff. Companies that already use InfluxDB for IoT, DevOps metrics, or sensor telemetry rarely regret the choice. Companies that come from a Postgres background often do regret it within six months because Flux feels alien.

TimescaleDB — Postgres with time-series superpowers

TimescaleDB takes the opposite philosophical stance from InfluxDB: instead of building a new database, build a Postgres extension that adds the time-series-specific machinery to a database your team already knows. You install the extension, run CREATE EXTENSION timescaledb, and turn a regular table into a hypertable with SELECT create_hypertable('metrics', 'ts'). From the outside, the table looks identical — same INSERT, same SELECT, same joins to other tables, same foreign keys, same Postgres operators. Underneath, TimescaleDB transparently splits the table into time-partitioned chunks, adds chunk-level pruning to the planner, exposes time_bucket() and continuous aggregates, and (in newer versions) compresses old chunks columnar-style.
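Concretely, the whole setup is a handful of statements. The sketch below assumes a metrics table keyed on a ts timestamp column; the table shape, the seven-day compression policy, and the rollup query are illustrative placeholders rather than recommendations.

```sql
-- Sketch: turning a plain Postgres table into a TimescaleDB hypertable.
-- Table name, columns, and intervals are illustrative.
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE metrics (
  ts     TIMESTAMPTZ      NOT NULL,
  device TEXT             NOT NULL,
  cpu    DOUBLE PRECISION NOT NULL
);

-- The table now behaves like any Postgres table, but is chunked by time underneath.
SELECT create_hypertable('metrics', 'ts');

-- Columnar-compress chunks once they are a week old (newer TimescaleDB versions).
ALTER TABLE metrics SET (timescaledb.compress);
SELECT add_compression_policy('metrics', INTERVAL '7 days');

-- time_bucket() is the extension's workhorse for rollups.
SELECT time_bucket('1 minute', ts) AS minute, device, avg(cpu) AS avg_cpu
FROM metrics
WHERE ts > now() - INTERVAL '1 hour'
GROUP BY minute, device
ORDER BY minute;
```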

The advantage is enormous and obvious: you keep everything Postgres gives you. Window functions, recursive CTEs, JSONB columns, PostGIS for geospatial, foreign data wrappers, pg_stat_statements, pgBackRest, your existing connection poolers, your existing migration tools, your existing ORMs. You can join the time-series hypertable to a regular relational accounts table in one query without an ETL hop. Why this matters in real life: most "time-series workloads" in the enterprise are actually mixed — telemetry that needs to be joined to user metadata, transactions that need to be joined to merchants, sensor readings that need to be joined to device inventories. A pure TSDB forces you to denormalise heavily or to join in application code; TimescaleDB lets you write the natural SQL.
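As a sketch of what that looks like in practice — the devices inventory table and its columns are hypothetical, and the point is simply that it is one ordinary SQL statement, no ETL hop:

```sql
-- Sketch: joining the hypertable to an ordinary relational table in one query.
-- The "devices" inventory table and its columns are hypothetical.
SELECT d.site,
       time_bucket('5 minutes', m.ts) AS bucket,
       avg(m.cpu)                     AS avg_cpu
FROM metrics m
JOIN devices d ON d.device_id = m.device
WHERE m.ts > now() - INTERVAL '1 day'
GROUP BY d.site, bucket
ORDER BY bucket;
```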

The disadvantage is the inherited ceiling. Postgres on a well-tuned single node tops out somewhere between 50K and 120K row inserts per second, depending on hardware, fsync settings, batch size, and how aggressive you are with synchronous_commit. TimescaleDB inherits that ceiling because every insert still goes through Postgres's MVCC machinery, WAL, and visibility map. You can scale by sharding (Timescale's multi-node feature) or by separating writes from reads with replicas, but the operational complexity grows quickly.
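The two levers most teams reach for first are batching and relaxing synchronous_commit. A sketch of both, with the usual caveat spelled out in the comments:

```sql
-- Sketch: the usual ingest levers on Postgres/TimescaleDB.
-- With synchronous_commit off, a crash can lose the most recently acknowledged
-- writes that were not yet flushed to the WAL; committed data is not corrupted.
SET synchronous_commit = off;

-- Batch many rows per statement (or use COPY) instead of row-at-a-time inserts.
INSERT INTO metrics (ts, device, cpu) VALUES
  (now(), 'terminal-001', 0.42),
  (now(), 'terminal-002', 0.57),
  (now(), 'terminal-003', 0.13);
```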

Best for: teams already running Postgres as their primary OLTP store, with mixed workloads where the time-series data needs to be joined to relational data, and where the throughput requirement is in the tens of thousands of writes per second rather than the millions.

QuestDB — the throughput champion

QuestDB is the youngest of the four and the one with the most aggressive performance bet. It is written in Java but uses off-heap memory and tight, branch-prediction-friendly inner loops to bypass most of the JVM's garbage collector overhead. The on-disk layout is straightforward columnar — one file per column per partition — and the ingest path is engineered to convert incoming rows into columnar appends with minimal copying. The published single-node benchmarks regularly post 1–4 million rows per second on a single commodity server — an order of magnitude or more above TimescaleDB's ceiling and roughly 2–8× what InfluxDB does on equivalent hardware.

The query language is SQL with time-series extensions like SAMPLE BY 1m FILL(LINEAR) and LATEST ON ts PARTITION BY symbol, which are concise and well-designed for time-series patterns. Crucially, QuestDB also speaks the InfluxDB Line Protocol (ILP) on its ingest port, so existing InfluxDB clients (Telegraf, the official InfluxDB SDKs) can write to QuestDB unchanged. That is a deliberate "easy switch from InfluxDB" story.
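To make the SQL extensions concrete, here is a sketch against a hypothetical trades table, where ts is assumed to be the designated timestamp column; check the QuestDB docs for the exact dialect in your version.

```sql
-- Sketch against a hypothetical "trades" table (ts, symbol, price),
-- where ts is the designated timestamp column.

-- 1-minute average price over the last hour, gaps filled by interpolation.
SELECT ts, avg(price) AS avg_price
FROM trades
WHERE ts > dateadd('h', -1, now())
SAMPLE BY 1m FILL(LINEAR);

-- Most recent row per symbol.
SELECT *
FROM trades
LATEST ON ts PARTITION BY symbol;
```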

The trade-offs are real. The ecosystem is smaller — fewer third-party connectors, fewer Stack Overflow answers, fewer hosted offerings outside QuestDB's own cloud. Clustering and replication are still maturing; for production high-availability you usually run a single primary with read replicas and accept some operational overhead. The community is active and growing fast, but it is not yet the size of Influx's or Timescale's.

Best for: workloads where single-node throughput is the deciding factor — financial market data (ticks, order books, trade execution telemetry), industrial IoT at scale, or anywhere a single node is being asked to absorb hundreds of thousands of writes per second, such as an Indian fintech ingesting telemetry from a fleet of payment terminals.

VictoriaMetrics — Prometheus, but cheap and at scale

VictoriaMetrics solves a very specific problem: Prometheus is wonderful, but its single-node TSDB ships with a 15-day default retention for a reason — push retention much past that and disk and RAM get painful — and it does not shard. VictoriaMetrics is a Go engine that speaks the same wire protocols (the /api/v1/write remote-write endpoint, the /api/v1/query query endpoint, the same scrape format), supports PromQL plus a strict superset called MetricsQL, but uses a much more aggressive columnar storage layout with better compression — typically 10× smaller on disk than Prometheus for the same data. It runs as either a single binary (the simplest deployment of the four) or as a cluster of three components (vmstorage, vminsert, vmselect) for horizontal scaling.

The ergonomics for an existing Prometheus shop are essentially zero-friction. You point Prometheus's remote_write at VictoriaMetrics, change Grafana's data source from Prometheus to VictoriaMetrics, and every existing dashboard and alert keeps working. Why this matters: in practice, the painful part of switching databases is rarely the database itself — it is the migration of dashboards, alert rules, runbooks, and operator muscle memory. VictoriaMetrics deliberately makes that migration cost zero by being indistinguishable from Prometheus on the wire. That is its single biggest selling point and the reason adoption has been so fast in observability shops.

The trade-off is that VictoriaMetrics is metrics-focused. There is no SQL surface; you cannot easily store and query non-metric time-series data (logs, traces, business events). You cannot join to relational tables. The data model is the Prometheus data model — a name, a set of labels, a sequence of (timestamp, double) samples — and that is it. Within that model VictoriaMetrics is excellent. Outside it, you need a different tool.

Best for: organisations that have already standardised on Prometheus + Grafana for observability and have outgrown vanilla Prometheus's single-node store, either because retention needs are now in months/years or because the volume needs sharding.

The feature matrix

The previous four sections gave you the prose. The matrix below gives you the at-a-glance summary you can hand to a sceptical colleague.

Feature matrix — open-source TSDBs (columns: InfluxDB / TimescaleDB / QuestDB / VictoriaMetrics)

  • SQL support: partial (v3) / full standard SQL / SQL + ILP / none (PromQL only)
  • Write throughput (1 node): ~500K rows/s / ~100K rows/s / 1-4M rows/s / 1-2M samples/s
  • Query latency (1h scan): tens of ms / tens of ms / single-digit ms / tens of ms
  • Hosted offering: InfluxDB Cloud / Timescale Cloud / QuestDB Cloud / third-party only
  • Retention controls: bucket TTL / policies + S3 tier / partition drop / retention + downsample
  • Compression ratio (typical): 8-12x / 10-20x / 5-10x / 15-70x
  • Joins to relational data: no / yes (native SQL) / limited / no

Numbers are typical — your hardware, schema, and tuning will move them by 2-3x in either direction.

A few things in this matrix deserve commentary because they are easy to misread. Compression ratio is wildly schema-dependent — VictoriaMetrics's 70× is achievable only on stationary, low-cardinality metrics where Gorilla compression eats the value column whole; on high-churn, high-cardinality data the ratio collapses toward the others. Write throughput is per node and assumes batching is enabled — single-row inserts halve every number in that row. Query latency for the "1h scan" assumes the chunks are warm in OS page cache; cold-cache numbers are 5–10× worse for all four.

The decision tree

The matrix tells you what each system can do. The tree below tells you which one to pick. It is short on purpose.

Which TSDB? — a 90-second decision tree

  • Start: you need a TSDB. What does your stack already run?
  • Already on Postgres, with a mixed OLTP + TS workload? → TimescaleDB — keep SQL, add hypertables.
  • Already on Prometheus, pure metrics + Grafana? → VictoriaMetrics — a drop-in Prometheus replacement.
  • Neither — greenfield. What dominates, throughput or ecosystem maturity? Throughput → QuestDB; maturity → InfluxDB.

The tree makes the implicit assumption that the cheapest migration is the one you do not have to do. That is why the first two questions are about what you already run. Why this is the right framing: the database itself is rarely the bottleneck. The bottleneck is dashboards, alert rules, on-call runbooks, integration code, the team's mental model. A database that is technically 30% faster but requires you to throw away three years of dashboards is a worse choice than a slightly slower database that lets you keep them. Choose the database that minimises total migration cost, not just storage cost.

For completeness, here are the others you might see in a comparison post that did not make the four-way cut: Prometheus itself (the original; great for what it does, but limited storage), Apache IoTDB (IoT-focused, popular in China, niche elsewhere), ClickHouse (general-purpose OLAP that happens to be excellent at time-series, often the right answer when your TS volume is genuinely huge or you also want general analytics), and AWS Timestream (managed-only, AWS-only, useful when you are deeply committed to AWS and want zero ops).

Worked example — an Indian fintech makes the call

A worked example is worth more than another paragraph of abstract advice. The fintech below is composite — a stand-in for several real companies — and the numbers are realistic for an Indian payments company at Series B/C scale.

Company: A Bengaluru-based fintech running a payments rail used by 40,000 small merchants across India. They process about 8 million transactions per day at peak, with the usual long tail of telemetry: per-transaction latency, fraud-score, payment-method, region, terminal-id, success/failure reason, retries, downstream timeouts.

Workload: 200,000 writes per second sustained, peaking at 350,000 during the 12-hour daily peak window (10am to 10pm IST). Retention requirement: 30 days hot (full per-transaction granularity for SRE debugging and fraud investigation) plus 1 year of aggregated data (1-minute rollups for trend analysis, board reporting, capacity planning). Dashboards live in Grafana and the SRE team is already paged from Prometheus alert rules. The data must stay in India — they have hosted everything in Mumbai (ap-south-1) and cannot ship to a US-only managed service.

The team's existing stack:

  • Application: Go services on EKS, talking to RDS Postgres for OLTP.
  • Observability: Prometheus scraping host metrics via node_exporter and the Go services via the official Prometheus Go client library (client_golang), with Grafana on top. They have ~400 dashboards and ~150 alert rules, written and maintained over three years.
  • Pain point: Prometheus is hitting its single-node ceiling. Retention is 14 days because the disk fills up otherwise. The SRE team is constantly being asked "can you go back further?" and the answer is always no.

The evaluation — they tried all four for two weeks each.

TimescaleDB. They spun up Postgres with the TimescaleDB extension on an m5.4xlarge (RDS does not offer the extension, so this meant self-managing the instance), created a hypertable on transaction_events, and pointed a fan-out from the Go services at it. Sustained ingest topped out at 47,000 writes/sec before WAL became the bottleneck and the replication lag started climbing. To hit 200K they would need to shard across 4–5 Postgres nodes manually, run partition routing in the application, and accept that joins to the merchants table now require fan-out. "We could make this work, but the operational burden is real, and we lose the simplicity that was supposed to be the point." Verdict: viable for the OLTP slice but not for the high-throughput telemetry.

InfluxDB. Spun up InfluxDB v3 on an r5.4xlarge. Sustained ingest hit 220,000 writes/sec comfortably with batched line-protocol writes from a Telegraf pipeline. Storage compression was solid — about 9× on their schema. The blocker was the migration: 400 Grafana dashboards in PromQL, 150 alert rules, all of which would have to be rewritten in Flux (or InfluxQL). They estimated a 3-month migration, with a high risk of subtle behaviour changes in alert thresholds during the cutover. "Technically capable, but the migration cost is enormous and the team has no Flux experience."

QuestDB. Spun up QuestDB on an r5.4xlarge with the ILP ingest port enabled. Sustained ingest hit 1.4 million writes/sec in the load test — they ran out of test traffic before they ran out of QuestDB. SQL queries on the dashboards felt natural; the engineers who knew Postgres SQL could write QuestDB SQL on day one. But the same migration problem applied: Grafana dashboards in PromQL would need to be rewritten as SQL, and the Prometheus alert rules would need to be migrated to a separate alerting system (Grafana alerting against QuestDB, or a custom rule engine). They estimated a 1-month migration — faster than InfluxDB because at least the query language was SQL — but still a real cost.

VictoriaMetrics. Spun up the single-binary victoria-metrics (no cluster components yet) on an r5.2xlarge and pointed Prometheus's remote_write at it. Zero changes to the Go services, the scrape configs, the Grafana data sources (just changed the URL), the dashboards, or the alert rules. Sustained ingest hit 240,000 samples/sec with headroom. Compression was 22× on their schema — they immediately got 6 months of retention on the same disk that had been holding 14 days for Prometheus. Total integration work: 2 weeks, almost all of it spent verifying that every existing dashboard and alert rule behaved identically.

The decision: VictoriaMetrics. The reasoning is not that VictoriaMetrics is the technically best of the four — QuestDB has higher raw throughput, TimescaleDB has better SQL, InfluxDB has more time-series-specific features. The reasoning is that the existing stack is Prometheus + Grafana, the existing team knows PromQL, and the existing 400 dashboards plus 150 alert rules represent three years of accumulated institutional knowledge that VictoriaMetrics preserves at zero cost. For the long-tail per-transaction debugging that needs to be joined to merchant metadata, they keep a parallel TimescaleDB instance at lower throughput (the slice is well under TimescaleDB's ceiling once it is separated from the high-volume metric stream).

The migration shipped in 17 days, including the post-cutover bake. Six months later, retention is at 9 months and the SRE team has stopped being asked "can you go back further?"

The case study illustrates the most important lesson of the chapter: the best TSDB for your team is almost always the one that requires you to throw away the least. A 4× throughput advantage is wonderful in a benchmark and irrelevant if it costs three months of dashboard rewrites.

Going deeper

The four-way comparison above is the practical answer for most teams. The deeper question — why have we ended up with four serious open-source TSDBs all built on different bets? — is worth a few paragraphs because it tells you something about how the time-series space is likely to evolve.

The convergence problem

Look at the four products with a five-year horizon and a striking pattern emerges: they are converging. InfluxDB v3 added SQL on top of Apache Arrow, eroding TimescaleDB's "we have SQL, they don't" advantage. TimescaleDB added columnar compression on old chunks, eroding InfluxDB's "we are columnar, they aren't" advantage. QuestDB added an InfluxDB Line Protocol ingest port, eroding InfluxDB's "we own the metrics-ingest protocol" advantage. VictoriaMetrics added MetricsQL extensions that reach toward what SQL gives you, eroding the "you can't do complex analytics in PromQL" objection.

This convergence is rational — each product is racing toward the union of features the others have — but it creates a problem for the buyer: the products are getting harder to differentiate on raw capability, which means the differentiation is shifting toward ecosystem and ergonomics. That is exactly why the decision tree above is keyed on what you already run rather than on raw feature comparison. The capability gaps are closing; the ecosystem gaps are not.

When the answer is "none of the above"

There are three workload shapes where none of the four is the right choice, and you should know them so you do not reach for a TSDB by reflex.

Shape 1: heavy multi-dimensional analytical queries on time-series data. If your queries look like "top 10 merchants by failure rate, grouped by region and payment method, over the last 90 days, with year-on-year comparison" — that is fundamentally an OLAP workload that happens to have a timestamp. ClickHouse will outperform any of the four TSDBs on this shape because its query planner is built for wide aggregates, not for the narrow time-range scans the TSDBs optimise for.
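Written out, that query is ordinary wide-aggregate SQL. The sketch below uses a hypothetical transactions table and ClickHouse's countIf() conditional aggregate (the year-on-year comparison is omitted for brevity); nothing about its shape touches the narrow time-range scan path a TSDB is tuned for.

```sql
-- Sketch of the OLAP-shaped query from the text, against a hypothetical
-- "transactions" table. countIf() is ClickHouse's conditional aggregate.
SELECT
  merchant_id,
  region,
  payment_method,
  countIf(status = 'failed') / count() AS failure_rate
FROM transactions
WHERE ts >= now() - INTERVAL 90 DAY
GROUP BY merchant_id, region, payment_method
ORDER BY failure_rate DESC
LIMIT 10;
```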

Shape 2: very low write rates with very high cardinality. If you are storing per-user activity timelines for a billion users with sparse writes per user, the TSDB design assumptions (high write rate, low-to-medium cardinality on tags) work against you. The chunk count balloons, compression underperforms, and you would be better served by a wide-column store like Cassandra or ScyllaDB with explicit time partitioning at the application level.

Shape 3: transactional time-series. If your "time-series" data needs full ACID transactions across multiple rows — for example, a financial settlement system where a batch of related ticks must be inserted atomically and visible to readers atomically — the append-first, non-transactional ingest paths of most TSDBs are wrong: rows become visible as they land, with no multi-row atomicity or isolation. TimescaleDB is the only one of the four that gives you full Postgres ACID semantics, and even there you have to be careful about chunk boundaries.
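To make "atomically visible" concrete: the sketch below is plain Postgres transaction semantics, which TimescaleDB inherits. The settlement_ticks table and its values are hypothetical.

```sql
-- Sketch: a settlement batch inserted atomically. Readers see all three
-- rows or none of them. Table name and values are hypothetical.
BEGIN;
INSERT INTO settlement_ticks (ts, batch_id, symbol, price) VALUES
  ('2024-06-03 10:00:00+05:30', 42, 'INFY', 1512.40),
  ('2024-06-03 10:00:00+05:30', 42, 'TCS',  3890.10),
  ('2024-06-03 10:00:00+05:30', 42, 'HDFC', 1670.25);
COMMIT;
```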

The managed vs self-hosted question

A separate axis the chapter has touched on but not stressed: every one of these four has a hosted offering, and for many teams the hosted offering is the right answer regardless of the technical comparison. InfluxDB Cloud runs on AWS, Azure, and GCP with both pay-per-use and dedicated tiers. Timescale Cloud runs on AWS only but is the most mature of the managed offerings (Timescale the company has been operating it for years). QuestDB Cloud is the newest, AWS-only at the time of writing, with a smaller catalogue of regions. VictoriaMetrics does not have a first-party cloud — managed VictoriaMetrics comes from third-party providers, so check region coverage and support terms carefully.

For an Indian team the managed-vs-self-hosted question often gets resolved by data residency: if the data must stay in India for regulatory reasons (RBI-regulated financial data, healthcare data under DISHA), and the managed offering does not have a Mumbai region, the question answers itself. Self-hosted on EKS in ap-south-1 is the only option. Why this matters more in India than in the US: Indian data residency regulations have tightened steadily over the last five years, and the cost of a regulatory misstep — RBI restrictions on a payments company, for example — dwarfs any operational saving from a managed service. The default for any Indian fintech evaluating a TSDB should be "self-hosted in ap-south-1 unless we have an explicit reason to do otherwise." That immediately favours TimescaleDB, QuestDB, and VictoriaMetrics, all of which run cleanly on EKS, and slightly disfavours managed-only offerings like AWS Timestream that have historically had patchy India coverage.

The closing thought of Build 21

You have spent five chapters of Build 21 building up the picture: what makes time-series workloads different (ch.164), how time-partitioned columnar storage exploits that (ch.165), how downsampling and continuous aggregates make dashboards fast (ch.166), how retention tiers push old data to S3 cheaply (ch.167), and now how to pick the actual product (ch.168). The throughline is that time-series is a third database shape — neither OLTP nor OLAP — and the tools that win do so by exploiting the workload's peculiarities ruthlessly. Whether you pick InfluxDB, TimescaleDB, QuestDB, or VictoriaMetrics, you are picking a different combination of those exploits, packaged with a different ecosystem and a different operational model. Pick on ecosystem first, capabilities second, and you will rarely regret it.

Build 22 starts next with a different beast entirely: in-memory databases, where the storage hierarchy is inverted and Redis sits at the centre.

References

  1. InfluxDB documentation — official docs covering the storage engine, Flux query language, and v3 architecture.
  2. TimescaleDB documentation — official docs on hypertables, continuous aggregates, and compression.
  3. QuestDB benchmarks and documentation — official docs and the published time-series benchmark suite results.
  4. VictoriaMetrics documentation — official docs on the storage engine, MetricsQL, and the cluster vs single-node deployment models.
  5. DB-Engines ranking — Time Series DBMS — independent popularity tracker that gives a useful proxy for ecosystem size and momentum.
  6. Time-series benchmark suite (TSBS) — the open-source benchmark harness that most of the published TSDB comparisons use; you can reproduce the numbers in this chapter on your own hardware.