Polyglot persistence: picking the right DB per workload
Open the architecture page of any company you have heard of — Razorpay, Swiggy, Zerodha, Meesho — and count the databases on it. Five is normal. Seven is common. One is a lie. This chapter is about why no single engine survives a real workload, and how to read the trade-off matrix you have been quietly building since Build 1.
A real system runs three to seven different databases at once because workloads have wildly different shapes — and every storage engine you have built in this track makes the opposite trade-off from at least one other engine. You do not pick a database. You pick a portfolio. The skill is reading a workload's access pattern, latency budget, and consistency need, and matching it to the engine whose trade-offs that pattern can afford.
The one-database fantasy
Class assignments have one database. Production has many. The reason is not laziness or hype — it is geometry. Every storage engine in this track sits on a trade-off triangle whose corners cannot all be reached at once.
Recall the walls. B-trees give fast point reads and reasonable range scans, at the cost of write amplification on update. LSM-trees flip that — fast writes, painful read amplification, and tail-latency spikes during compaction. Column stores like Parquet or ClickHouse make a billion-row aggregate fly, but want nothing to do with a single-row update. Inverted indexes answer "find me every doc containing the word refund" in milliseconds and answer "give me row 4892's email" by re-scanning. Hash-based caches serve a hundred thousand reads per second per core, then forget everything when the box reboots. Vector indexes find the ten closest embeddings to a query in 5 ms — and cannot tell you which row was updated yesterday.
Every engine wins at one workload by losing at another. That is the whole game.
Why a single engine cannot win every corner: the on-disk layout that makes a column store fast (sorted, compressed, per-column files of millions of values) is the same layout that makes a single-row update brutal — you have to rewrite the whole column chunk. The inverted-list layout that makes Elasticsearch fast at "every doc with refund" is the same layout that makes "give me row 4892" expensive. There is no neutral data structure. The shape of the bytes on disk is the trade-off.
Reading a workload before you pick
Before you pick a database, write down four numbers and one shape. Without these, the choice is religion.
- Read-to-write ratio. Is this 99% read and 1% write (a product catalogue), 50/50 (a chat app), or 10% read and 90% write (a metrics pipeline)?
- Latency budget. What is the p99 your application can tolerate? 1 ms (cache layer), 50 ms (user-facing API), 500 ms (back-office screen), 30 s (analyst dashboard)?
- Working-set size. How much hot data does this workload touch in the last hour? 1 GB (fits in RAM), 100 GB (fits in NVMe), 50 TB (one column-store cluster), 10 PB (only object storage scales)?
- Consistency need. Does the read have to see the last write? (Bank balance: yes. Trending products: no.) This collapses to the CAP/PACELC decision.
- Access shape. Point lookup by primary key? Range scan? Full-text? K-nearest-neighbour vector? Aggregate over millions of rows? Graph traversal of friend-of-friend?
The shape is the most important. If you cannot describe the access shape in one sentence, you do not yet understand the workload, and any database you pick is a coin flip.
Five workloads inside one Razorpay request
A merchant opens the Razorpay dashboard at 11am on the 1st of the month — settlement day. A single page load fans out into five very different queries:
| # | Query the page asks | Read shape | Engine that fits |
|---|---|---|---|
| 1 | "Show me my total volume yesterday" | aggregate of ~50k rows | column store / OLAP |
| 2 | "List my last 20 captured payments" | range scan by merchant_id, sorted by created_at | B-tree on Postgres |
| 3 | "Search for customer 'rahul@gmail.com'" | inverted index on email field | Elasticsearch |
| 4 | "What is my available balance right now?" | single-row lookup, must reflect the most recent write | Postgres OLTP |
| 5 | "Show me three suggested actions you think I'll click" | nearest-neighbour over a feature vector | vector index (HNSW) |
If you tried to answer all five from one Postgres instance: query 1 takes 8 seconds because the row store has to read every payment column to sum one of them; query 3 either does a LIKE '%rahul%' table scan or you bolt on pg_trgm and pray; query 5 needs pgvector, which works but hits its limits well before you reach 1 lakh merchants. By the third such page-load per second, the database is on fire.
The fix is not "buy a bigger Postgres." The fix is to send each query to the engine its shape was designed for, and to keep all of them in sync from the same source of truth.
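The routing decision itself is small enough to sketch as a dispatch table. This is a toy, with illustrative shape names and engine labels, not anyone's production router:

```python
# Hypothetical sketch: send each query to the engine whose trade-offs
# its access shape can afford. Shape names and engine labels are
# illustrative, not a real architecture.

ROUTES = {
    "aggregate": "clickhouse",      # column store: scan millions, sum one column
    "range_scan": "postgres",       # B-tree: sorted range by (merchant_id, created_at)
    "full_text": "elasticsearch",   # inverted index: token lookup
    "point_strict": "postgres",     # OLTP: must reflect the last write
    "knn_vector": "vector_index",   # HNSW: approximate nearest neighbours
}

def route(access_shape: str) -> str:
    """Return the engine for a query's access shape; unknown shapes
    fall back to the OLTP source of truth."""
    return ROUTES.get(access_shape, "postgres")
```

The fallback matters: a shape nobody has classified yet should land on the source of truth, which is slow but correct, rather than on a derived store that may not even have the data.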
The portfolio, not the pick
Once you accept that one engine cannot serve every workload, the architecture you end up with looks remarkably consistent across companies. There is an OLTP engine that owns the truth — usually Postgres or MySQL, sometimes a NewSQL like CockroachDB. Around it sits a halo of derived stores, each indexing the truth differently for one specific access shape.
This is the inside-out architecture from chapter 181, but viewed from the operations side. Build 23 told you how the log fans out to many systems. Build 24 starts with which systems, and why these and not others.
The discipline is not to add stores randomly — it is to add a store only when an existing store cannot meet the workload's latency or shape. Each store earns its place by answering one question better than any incumbent. Add Elasticsearch only when free-text search becomes a real workload. Add ClickHouse only when analytics queries hit Postgres hard enough to threaten OLTP latency. Add the vector index only when "find similar items" is a product feature.
Worked example: Riya picks the stack for a small wallet app
Riya runs a four-engineer team in Pune building a wallet-style app — call it walletkart. They have 50,000 active users, growing at 20% a month, and a runway pressure that says do not buy a database you do not have to. She runs through the workloads on paper, in order of priority:
```python
# walletkart_workloads.py — workload catalogue, in plain Python dicts
workloads = [
    {
        "name": "wallet balance lookup",
        "ops_per_sec": 800,
        "p99_latency_ms": 50,
        "consistency": "strict — must reflect last write",
        "shape": "point lookup by user_id",
        "fits": "OLTP B-tree (Postgres)",
    },
    {
        "name": "transaction history",
        "ops_per_sec": 200,
        "p99_latency_ms": 100,
        "consistency": "read-your-writes",
        "shape": "range scan: last 30 transactions for user_id, by created_at",
        "fits": "Postgres with index on (user_id, created_at)",
    },
    {
        "name": "merchant search",
        "ops_per_sec": 60,
        "p99_latency_ms": 200,
        "consistency": "stale by 30s OK",
        "shape": "free-text on merchant name + category",
        "fits": "Elasticsearch (or Postgres tsvector for now)",
    },
    {
        "name": "homepage carousel: 'recent merchants like yours'",
        "ops_per_sec": 800,
        "p99_latency_ms": 30,
        "consistency": "stale by hours OK",
        "shape": "point lookup of precomputed list, by user_id",
        "fits": "Redis (cache, recomputed nightly)",
    },
    {
        "name": "monthly settlement aggregates for ops team",
        "ops_per_sec": 0.01,  # 1 query every 100 s during business hours
        "p99_latency_ms": 5000,
        "consistency": "yesterday's data is fine",
        "shape": "GROUP BY merchant SUM amount over 30M rows",
        "fits": "Postgres for now, ClickHouse if it gets slow",
    },
]

for w in workloads:
    print(f"{w['name']:<40} {w['ops_per_sec']:>6} ops/s p99 {w['p99_latency_ms']}ms "
          f"shape={w['shape'][:40]} → {w['fits']}")
```

Output (abridged to fit the page):

```
wallet balance lookup            800 ops/s p99 50ms   shape=point lookup by user_id → OLTP B-tree (Postgres)
transaction history              200 ops/s p99 100ms  shape=range scan: last 30 transactions... → Postgres index
merchant search                   60 ops/s p99 200ms  shape=free-text on merchant name... → Elasticsearch (or tsvector)
homepage carousel                800 ops/s p99 30ms   shape=point lookup of precomputed list... → Redis (cache)
monthly settlement aggregates   0.01 ops/s p99 5000ms shape=GROUP BY merchant SUM amount... → Postgres for now, ClickHouse later
```
Riya's day-1 stack is Postgres + Redis, with three workloads sitting on Postgres and one on Redis. The merchant search is being answered by a Postgres tsvector column — slower than Elasticsearch but inside the latency budget, and one fewer system to operate. Settlement aggregates run against a read replica because the volume is currently fine.
Why she did not start with five databases: every additional engine costs an on-call rotation, a backup pipeline, a monitoring dashboard, and a CDC connector. A four-engineer team buying its sixth database is a team that has stopped shipping product. The right rule is "the smallest portfolio that answers the workloads inside their latency budgets, plus one engine of headroom for the workload most likely to outgrow its current home."
Six months later, search traffic crosses 500 ops/s and tsvector p99 climbs to 500 ms. Now Elasticsearch earns its place — it is solving a real problem that Postgres has stopped solving. Twelve months later, settlement queries take a minute and start blocking other replica reads. Now ClickHouse earns its place. The portfolio grows by demand, not by aspiration.
Why the order of additions matters: the first additional engine is almost always a cache (Redis), because cache misses on hot read paths are the cheapest workload to peel off. The second is almost always either a search engine or an analytics column store, depending on which workload outgrows Postgres first. The third is whatever the product roadmap demands next — vector index for ML features, graph store for fraud rings, time-series for metrics. Adding them in this order means each addition replaces a clearly visible bottleneck in the system you have, rather than answering a problem you do not yet have.
The picking matrix, made concrete
Builds 1 through 23 have given you the full inventory. This is the one-page lookup table that compresses what each build taught into a routing decision. Treat the rows as access shapes, not as products — the products listed are the canonical engines for that shape, and you should be willing to swap them for an equivalent.
| Access shape | Latency budget | Working set | Canonical engine | Where in this track |
|---|---|---|---|---|
| Point lookup, strict consistency | 5–50 ms | up to 10 TB / shard | Postgres / MySQL B-tree | Build 4 |
| Write-heavy point + range | 1–20 ms | TB to PB, shardable | Cassandra / RocksDB LSM | Build 3 |
| Sub-millisecond hot read | 0.2–2 ms | fits in RAM | Redis / Memcached | Build 21 |
| Free-text search | 10–200 ms | TB | Elasticsearch / OpenSearch | Build 17 |
| K-nearest-neighbour vectors | 5–50 ms | GB to TB | pgvector / Pinecone / Weaviate | Build 18 |
| Graph traversal | 10–500 ms | GB to TB | Neo4j / Memgraph | Build 19 |
| Time-series ingest + rollup | 10–100 ms | TB, append-mostly | Timescale / InfluxDB / VictoriaMetrics | Build 20 |
| Aggregate over millions | 1–10 s | TB to PB | ClickHouse / DuckDB | Build 16 |
| Lakehouse archive + ad-hoc | seconds to minutes | PB | S3 + Iceberg/Delta + Trino | Build 16 |
| Distributed strong-consistency OLTP | 10–100 ms | TB to PB | CockroachDB / Spanner / TiDB | Build 12 |
Two warnings about reading this table. First, the engines have meaningful overlap — Postgres can do point lookup, free-text, vector, and time-series, just worse than the specialist at high scale. The decision rule is "use the specialist when the cheaper general-purpose engine is no longer fast enough at this workload's scale," not "use the specialist on day one because the table says so." Second, latency budgets are p99 targets in steady state; tail latency under load is much worse and is the number you should actually engineer to.
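The first warning's decision rule can be made mechanical. A sketch, with illustrative ceilings (the real numbers depend on hardware, data size, and query mix, so treat every threshold below as an assumption):

```python
# Sketch of "specialist only when the generalist fails". Shape names,
# engine labels, and all thresholds are invented for illustration.

SPECIALIST = {
    "point_strict": "Postgres/MySQL B-tree",
    "write_heavy":  "Cassandra/RocksDB LSM",
    "hot_read":     "Redis/Memcached",
    "full_text":    "Elasticsearch/OpenSearch",
    "knn_vector":   "pgvector at small scale, Pinecone/Weaviate beyond",
    "aggregate":    "ClickHouse/DuckDB",
}

# ops/s below which the generalist (Postgres) is still fast enough.
GENERALIST_CEILING = {
    "full_text": 100,              # tsvector holds up at low QPS
    "knn_vector": 50,              # pgvector before HNSW latency degrades
    "aggregate": 1,                # occasional replica queries
    "point_strict": float("inf"),  # Postgres IS the specialist here
}

def pick(shape: str, ops_per_sec: float) -> str:
    """Prefer the generalist while it is fast enough; only then reach
    for the specialist engine the table names for this shape."""
    if ops_per_sec <= GENERALIST_CEILING.get(shape, 0):
        return "Postgres (generalist, for now)"
    return SPECIALIST.get(shape, "Postgres (source of truth)")
```

Note how this reproduces Riya's timeline: merchant search at 60 ops/s stays on tsvector, and the same workload at 500 ops/s earns Elasticsearch.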
What forces a new engine into the stack
There are exactly four signals that justify adding a new database. If you cannot point to at least one of them for a proposed addition, the addition is premature.
- An access shape the existing engines do not natively support. Free-text search, k-nearest-neighbour vectors, graph traversals, time-series rollups. You can fake some of these in Postgres for a while; eventually the fake hits a wall and a specialised engine pays its keep.
- A latency budget the existing engines cannot meet. A 1 ms p99 read forces a cache. A 5-second analytics query becomes unacceptable when the dashboard is on the customer-facing path. The trade-off matrix says "this engine cannot meet that budget at this scale" — and that is when you reach for the next engine.
- A working set that has outgrown the existing engine's capacity. 50 TB of historical events does not belong in Postgres no matter how many indexes you add — it belongs in object storage with a lakehouse table format. Trying to keep it in the OLTP engine wastes IOPS and crowds out hot data.
- A workload whose consistency need is genuinely weaker than what the OLTP engine provides, and where the throughput cost of synchronous OLTP writes is not affordable. This is the cache case (eventual is fine) and the analytics case (yesterday is fine). The point of the polyglot stack is that you spend strong consistency only where you need it.
If none of these four hold, you do not need a new engine. You need a better index on the one you have.
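The four signals can be written down as a checklist. A hedged sketch — the field names, the incumbent's numbers, and the example workload are all invented for illustration:

```python
# Sketch of the four-signal test for adding a new engine. An empty
# result means "better index, not new engine". All fields hypothetical.

def justifies_new_engine(workload: dict, incumbent: dict) -> list[str]:
    """Return the list of signals that fire for a proposed addition."""
    signals = []
    if workload["shape"] not in incumbent["native_shapes"]:
        signals.append("unsupported access shape")
    if workload["p99_budget_ms"] < incumbent["best_p99_ms"]:
        signals.append("latency budget unmeetable")
    if workload["working_set_gb"] > incumbent["capacity_gb"]:
        signals.append("working set outgrown")
    if workload["consistency"] == "eventual" and incumbent["sync_write_cost_high"]:
        signals.append("paying for consistency it does not need")
    return signals

# Illustrative incumbent: a Postgres that serves point and range reads
# at ~5 ms p99 and can hold ~10 TB of hot data.
postgres = {"native_shapes": {"point", "range"}, "best_p99_ms": 5,
            "capacity_gb": 10_000, "sync_write_cost_high": True}

# A 1 ms cache-shaped read: two signals fire, so a cache is justified.
hot_read = {"shape": "point", "p99_budget_ms": 1,
            "working_set_gb": 2, "consistency": "eventual"}
```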
Common confusions
- "More databases is more sophisticated." No. More databases is more operational surface. Every store needs a backup, a recovery runbook, a monitoring dashboard, a schema-migration plan, and people on call. A polyglot stack is justified by workload diversity, not by resume-driven design. The right number of databases is the smallest number that meets the workloads' latency and shape budgets.
- "NewSQL (CockroachDB, Spanner, TiDB) makes polyglot persistence obsolete." It does not. NewSQL solves OLTP at scale — strong consistency across many shards. It does not turn into Elasticsearch, ClickHouse, Redis, or a vector index. NewSQL replaces your sharded MySQL, not your search and analytics tier. Spanner-shaped systems still ship CDC into a Kafka log so search and analytics can subscribe.
- "Postgres can do everything: JSON, full-text, vector, time-series." Postgres can do some of everything: jsonb, tsvector, pgvector, the TimescaleDB extension, PostGIS. For a small team in the early stage, this is correct and good. But each extension is a 10x weaker version of the specialist engine. tsvector is not Elasticsearch. pgvector's HNSW caps out where Pinecone or Weaviate are warming up. The "Postgres for everything" rule is a staging rule, not an end state.
- "The OLTP engine should be the single source of truth and write-master forever." Mostly true, but not always. For event-sourced systems (Kafka-first architectures), the log is the source of truth, and Postgres itself is one of the derived views. Razorpay's payments table is the master because the WAL is the log; LinkedIn's member-update stream is the master because it predates any single database. The "source of truth" is whichever store guarantees the durable, ordered facts, and that is sometimes a log rather than a row store.
- "Cache invalidation is so hard, just run everything against the OLTP engine." This is appealing until 1 lakh dashboard requests per second arrive at 11am IST and Postgres burns. The cache exists because the OLTP engine cannot meet the latency budget at that throughput. Yes, cache invalidation is hard — and that is why the cache should be fed by the same CDC stream that feeds search and analytics, so invalidation is not a manual step but a subscription.
- "Polyglot is just microservices for databases, and the same critique applies." They are not the same thing. Microservices is about who owns the code; polyglot persistence is about who owns the access shape. You can run a single monolithic service that talks to seven databases — that is fine. You can run thirty microservices each owning its own Postgres — that is the database-per-service problem and is much worse. Polyglot is about workload-engine fit; service decomposition is orthogonal.
Going deeper
The Fowler / Sadalage frame and where it came from
The phrase "polyglot persistence" was coined by Pramod Sadalage and Martin Fowler around 2011, in the run-up to the NoSQL Distilled book. Their case study was a fictional retail application broken into eight services, each picking a database from {RDBMS, key-value, document, column-family, graph} based on access shape. At the time it was a counter to two extremes — the classical "everything in Oracle" monolith, and the over-corrective "MongoDB for everything" early NoSQL fashion. The lasting contribution is the frame: stop asking "which database?" and start asking "which access shape, and what does it want?"
The frame survived because the trade-offs survived. Fifteen years later the engines are mostly different — RocksDB instead of Voldemort, ClickHouse instead of HBase, Pinecone instead of nothing — but the corners of the trade-off space have not moved. A scan-friendly engine is still bad at point updates; a write-fast engine still pays at read time; a cache is still volatile. The names change; the geometry does not.
CDC as the connective tissue
Polyglot persistence only works if the derived stores stay in sync with the OLTP engine. The naive approach — application code writes to Postgres, then writes to Elasticsearch, then writes to Redis — is the dual-write problem, and it is a bug factory. If the application crashes between the Postgres commit and the Elasticsearch write, you have permanent drift and no clean way to detect it.
The clean solution is change data capture. The application writes only to Postgres. Postgres's WAL emits a change record. Debezium (or equivalent) reads the WAL and publishes to Kafka. The other engines subscribe. Every derived store is now a function of the same log, and "which engines are in sync" is no longer a per-write question — it is "where is each subscriber's offset?" Drift becomes lag, lag becomes a number you can graph, and you can reset a derived store by deleting its data and replaying the topic from the beginning.
The CDC pattern is what makes polyglot survive at scale. Without it, polyglot is dual-write hell and most teams revert to "everything in one DB" within a year.
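A toy model makes the "drift becomes lag" claim concrete: one append-only log, and every derived store is a subscriber with its own offset. (Real systems use the Postgres WAL, Debezium, and Kafka; this in-memory stand-in only shows the shape of the guarantee.)

```python
# Toy CDC fan-out: a single ordered log, subscribers apply changes at
# their own pace, and "is the search index in sync?" becomes "what is
# its offset lag?" -- a number you can graph and alert on.

class ChangeLog:
    def __init__(self):
        self.entries = []            # ordered, durable facts

    def append(self, change):
        self.entries.append(change)

class Subscriber:
    def __init__(self, name, apply_fn):
        self.name, self.apply, self.offset = name, apply_fn, 0

    def poll(self, log, max_batch=10):
        # Apply the next batch of changes and advance the offset.
        for change in log.entries[self.offset:self.offset + max_batch]:
            self.apply(change)
            self.offset += 1

    def lag(self, log):
        return len(log.entries) - self.offset

log = ChangeLog()
search_index, cache = {}, {}
es = Subscriber("search", lambda c: search_index.update({c["id"]: c}))
redis = Subscriber("cache", lambda c: cache.update({c["id"]: c["balance"]}))

for i in range(25):                       # 25 committed writes
    log.append({"id": i, "balance": i * 100})

es.poll(log, max_batch=25)                # caught up: lag 0
redis.poll(log, max_batch=10)             # lagging: lag 15
```

Resetting a derived store is now just `offset = 0` plus replay, which is the operational property the naive dual-write approach can never give you.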
The reverse anti-pattern: too many engines
The Fowler frame is older than its current critique. Around 2017, several large engineering blogs (Airbnb, Uber, Stitch Fix) wrote post-mortems on polyglot stacks that had grown to fifteen, twenty, or thirty databases — every team picking its own — and the operational cost crushed the company. Each engine needed its own runbook, on-call rotation, backup pipeline, schema-evolution story, and security review. The teams that survived consolidated back down to a portfolio of five to seven engines, with strong defaults that every team had to justify deviating from.
The healthy rule is "right tool for the job, but the smallest set of tools that does all the jobs." Five Postgres clusters with different schemas is fine. Five different kinds of database, each new and unfamiliar, is fragile. Add a new engine when an existing one fails a workload, not when an engineer wants to learn it.
The CAP/PACELC layer underneath
The polyglot decision has a consistency dimension that the workload table does not always make explicit. Different stores in the same stack make different CAP/PACELC choices. The OLTP store is usually CP and PC/EC — strict consistency, accept latency cost. The cache is CP-shaped during normal operation but degrades to AP under partition (you serve stale, you keep going). The search index is AP, near-eventual. The lakehouse is asynchronous, hours-stale, no consistency guarantee at all.
This is fine — and being explicit about it keeps the polyglot stack honest. The application has to know which store to ask for which question. Asking the cache for an exact balance is a bug. Asking the OLTP store for a top-1000 aggregate is a different bug. The architecture diagram and the developer documentation must list each store's consistency level, so that engineers do not silently downgrade the system by routing the wrong question to the wrong place.
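One way to keep that documentation from drifting is to encode each store's consistency level and check it at query-routing time. A sketch, with illustrative store names and levels:

```python
# Sketch: refuse to answer a strict question from a stale store.
# Store names and staleness levels are illustrative assumptions.

STORE_CONSISTENCY = {
    "postgres": "strict",            # reflects the last committed write
    "redis": "stale_seconds",        # CDC-fed cache, seconds behind
    "elasticsearch": "stale_seconds",
    "lakehouse": "stale_hours",
}

RANK = {"strict": 2, "stale_seconds": 1, "stale_hours": 0}

def ask(store: str, required: str) -> str:
    """Route a question to a store, failing loudly if the store's
    consistency level is weaker than the question requires."""
    if RANK[STORE_CONSISTENCY[store]] < RANK[required]:
        raise ValueError(f"{store} cannot answer a {required} question")
    return f"routed to {store}"
```

With this guard, "asking the cache for an exact balance" is no longer a silent downgrade; it is an exception in the logs.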
When polyglot is wrong
A useful corrective: polyglot is overkill if your workload has only one shape. A pure event-sourced log analytics tool — say, a tiny startup whose only product is "ingest webhooks, run aggregations, expose a dashboard" — does not need OLTP, cache, or search. ClickHouse alone covers it. A pure transactional ledger app with no search and no analytics can run on Postgres alone for years.
The decision rule is honest: count distinct access shapes in the workload. One shape, one engine. Three shapes, two or three engines. Seven shapes, you are running Razorpay and you have a portfolio.
Where this leads next
- How to read a database paper — chapter 183. The portfolio gets richer as you learn to read primary sources, not blog summaries.
- Benchmarking honestly: TPC-C, TPC-H, YCSB, and lies — chapter 184. Your portfolio decisions are only as good as the benchmarks behind them.
- Running a database in production: the parts they don't teach — chapter 185. Every engine in the portfolio carries an operations cost, and that cost dominates the picking decision.
- The 30-year arc and where databases go next — chapter 186, the closer of Build 24.
- Where this is all going: the database as a materialized view — chapter 181, the conceptual prequel to this chapter.
References
- Pramod Sadalage and Martin Fowler, NoSQL Distilled, ch. 13 Polyglot Persistence (Addison-Wesley, 2012) — the book that introduced the term and its discipline. martinfowler.com/bliki/PolyglotPersistence.html.
- Martin Kleppmann, Designing Data-Intensive Applications, ch. 1 Reliable, Scalable, Maintainable, and ch. 11 Stream Processing (O'Reilly, 2017) — the modern treatment of "many specialised stores, one log of facts." dataintensive.net.
- Jay Kreps, The Log: What every software engineer should know about real-time data's unifying abstraction (LinkedIn Engineering, 2013) — argues that CDC + a shared log is the architectural backbone of polyglot persistence. engineering.linkedin.com.
- Mark Cavage, There's Just No Getting around It: You're Building a Distributed System (ACM Queue, 2013) — on the operational cost of every additional store; the case against premature polyglot. queue.acm.org.
- Razorpay Engineering blog, How we handle 5000 TPS at peak (2023) — one public Indian-fintech walk-through of an OLTP + Kafka + ClickHouse + Elasticsearch + Redis stack. engineering.razorpay.com.
- Adrian Colyer, The morning paper on polyglot persistence and Lambda Architecture — multiple summaries of the foundational Fowler/Kleppmann arguments. blog.acolyer.org.
- padho-wiki: B-tree vs LSM: when to pick which and Lakehouse vs warehouse: the real tradeoffs — the per-engine trade-off chapters this picking framework is built on.