In short
Most apps do not need a distributed database. A single managed Postgres or MySQL instance — a $200/month RDS box, the kind every cloud sells from a dropdown — handles roughly ninety percent of real production workloads, indefinitely, with one connection string and one backup script. Fourteen builds into a distributed database series is a strange place to admit it, but the honest closer to Build 14 is the chapter that tells you not to use any of the things you just learned to build.
The three questions that decide whether you genuinely need distribution are sharp and quantitative. Volume: is your data above one terabyte and growing? Below that, a single beefy node holds it in RAM-warm SSD with room to spare. Write throughput: are you sustaining more than ten thousand writes per second? Below that, a tuned single node plus read replicas serves you for years. Geo-latency: do users span multiple regions and need writes acknowledged within a hundred milliseconds regardless of region? Below that, a single primary in Mumbai serves users in Singapore, London and California at 200-300 ms, which most apps tolerate fine. If you cannot say yes to at least one of these, distributed is the wrong answer.
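The three thresholds are mechanical enough to encode as a checklist. A minimal sketch, with the cutoffs taken directly from the numbers above; the function name and signature are illustrative, not from any real library:

```python
def needs_distribution(data_tb: float, writes_per_sec: float,
                       multi_region_writes_under_100ms: bool) -> bool:
    """Return True only if at least one threshold is crossed.

    Thresholds from the chapter: more than 1 TB of growing data, more than
    10K sustained writes/sec, or a hard multi-region write-latency need.
    """
    return (
        data_tb > 1.0
        or writes_per_sec > 10_000
        or multi_region_writes_under_100ms
    )

# A typical seed-stage SaaS workload: 100 GB, 50 writes/sec, single region.
print(needs_distribution(0.1, 50, False))   # False: stay on Postgres
print(needs_distribution(2.0, 500, False))  # True: volume crosses 1 TB
```

The point of writing it down as code is that each argument must be a measured number, not a feeling about future scale.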
The reason this matters is that distribution is not free; it is one of the most expensive trade-offs in systems engineering. Operational cost scales superlinearly: three to five times more nodes to keep online, more dashboards, more alert routes, harder backups (you cannot just pg_dump), harder schema migrations (every range, every replica, online), more failure modes (split-brain, slow follower, hot range, gossip storm). Latency cost is structural: every write becomes a Raft or Paxos round, cross-shard transactions add another round-trip via two-phase commit, even single-key reads pay the WAN penalty if your gateway is not local. Monetary cost is direct: 3× the hardware, Enterprise licensing on top (Spanner is a Google bill, CockroachDB Enterprise is per-vCPU), at least one extra ops engineer to babysit the cluster. Cognitive cost is what you do not see in the budget: developers must reason about transaction restarts (CockroachDB will retry your BEGIN ... COMMIT block if the timestamp pushes), HLC clock skew, hot ranges, range splits, the difference between leader reads and follower reads. Debugging cost is the last surprise: distributed traces, eventual-consistency anomalies, partial failures that look like bugs.
The cheaper path — the one most successful Indian SaaS companies actually took — is the Postgres scaling ladder. Step 1, single node with default settings, handles about a thousand writes per second on a $50/month box. Step 2, tuned single node with proper indexes and connection pooling, gets you to five thousand. Step 3, add read replicas, gives you effectively unlimited reads. Step 4, vertically scale to a beefy db.r6i.16xlarge or equivalent, gets you to thirty thousand writes per second. Step 5, sharding with Citus or application-layer partitioning by tenant, takes you to millions of writes per second. Only after step 5 does it make sense to consider CockroachDB, Spanner, or TiDB. Razorpay, Freshworks and Postman ran on managed Postgres for years before adding sharding, and only at scales most startups never reach.
There are real cases for distributed databases — global e-commerce at Amazon scale, geo-distributed payments at PhonePe scale during a Diwali surge, multi-region banking with strict consistency, hyperscale ad platforms. The point of this chapter is not that distributed is bad; it is that distributed is for measured, specific scaling problems that simpler approaches genuinely cannot solve. Premature distribution — picking Spanner-class on day 1 of a startup that has a thousand users and fifty writes per second — is one of the most expensive engineering mistakes you can make, and it is mostly invisible to the people making it because the system works; it just costs five to ten times more than necessary, takes longer to ship, and is harder to debug. This chapter closes Build 14 with the framework to avoid that mistake.
The cost-benefit shape, drawn honestly
Before any decision tree, draw the shape. Distributed databases pay off above some scale threshold; below it, they cost more than they earn. The shape is not "linear, slightly worse" — it is "fixed overhead, then crossover, then dominant".
Why distributed has a high fixed minimum: a CockroachDB or Spanner cluster cannot run on one node and call itself distributed — you need at least three nodes for a Raft majority of any single range, and in production you typically run nine or more for tolerance across availability zones. That is three to nine times the per-month VM bill before you serve a single request. Add Enterprise licensing (CockroachDB charges per vCPU once you cross feature gates; Spanner is straight Google Cloud billing per node-hour) and on-call rotation for a system whose failure modes are not Postgres failure modes, and the floor sits at roughly $2,500 per month for the smallest production-credible cluster.
Why single-node is so cheap at the low end: a single managed Postgres on AWS RDS or GCP Cloud SQL at the db.t4g.medium tier — 2 vCPU, 4 GB RAM — runs about $50/month and handles a thousand writes per second sustained, with backups, point-in-time recovery, and automated minor-version upgrades included. You pay for one box, you back up with one snapshot policy, you alert on one CloudWatch dashboard. There is no fan-out cost, no consensus latency, no cross-AZ replication tax. The reason the single-node cost curve stays so flat at the start is that everything below thirty thousand writes per second fits inside the operational envelope of a single managed instance.
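The crossover shape can be made concrete with a back-of-the-envelope model. The floor values ($50 single-node, ~$2,500 distributed) come from the text; the per-write scaling coefficients are illustrative assumptions, chosen only so the curves land near the real-world price points mentioned elsewhere in the chapter:

```python
def single_node_monthly(writes_per_sec: float) -> float:
    """Rough single-Postgres bill: $50 floor, vertical scaling to ~30K w/s."""
    if writes_per_sec > 30_000:
        return float("inf")          # beyond one box: shard or go distributed
    return max(50.0, writes_per_sec * 0.15)   # assumed linear hardware cost

def distributed_monthly(writes_per_sec: float) -> float:
    """Rough distributed-SQL bill: ~$2,500 floor, then near-linear in load."""
    return 2_500.0 + writes_per_sec * 0.05    # assumed marginal cost per w/s

for wps in (100, 5_000, 30_000):
    s, d = single_node_monthly(wps), distributed_monthly(wps)
    print(f"{wps:>6} w/s  single=${s:>7.0f}  distributed=${d:>7.0f}")
```

Under these assumptions the curves cross around 25,000 writes per second, just before the single-node ceiling — which is the chapter's argument in one picture: below the crossover, the distributed floor is pure overhead.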
The decision tree — three questions
Before the architecture diagram, run the three questions. They are not subjective. Each has a number, and the number was chosen because it corresponds to a real point at which a single tuned Postgres node stops being enough.
Why these particular numbers: 1 TB is roughly the point at which a single Postgres instance with NVMe storage stops keeping the working set in OS page cache, so query latency stops being predictable. 10K writes per second is roughly where a tuned Postgres with synchronous WAL fsync on a single NVMe drive starts queueing — you can push higher with synchronous_commit=off or batched writes, but ten thousand is the conservative honest number. 100 ms is the threshold at which UPI, web checkout flows, and most user-perceived "instant" actions start to feel sluggish; if your Singapore user must wait 300 ms for a write to clear in a Mumbai primary, you have either a UX problem or you need a regional write footprint.
The hidden costs nobody puts in the slide deck
When a team proposes a distributed database, the slide deck shows scale numbers, partition tolerance, and "future-proofing." It rarely shows the costs. Here are the five categories of cost you actually pay.
Operational cost
A three-node CockroachDB cluster has roughly three times the operational surface area of a single Postgres instance, but the cost is not 3×; it is more like 5×, because the failure modes multiply rather than scale.
You now have to watch range distribution — are any ranges hotter than others, are splits happening cleanly, has any node fallen behind on Raft replication. You watch HLC clock skew, because a node whose clock drifts past the cluster's --max-offset will kill itself, taking its ranges' leadership offline temporarily. You watch range leases — leadership transfers happen continuously and slow leases cascade into latency spikes. You watch gossip, the cluster's metadata propagation layer, because gossip storms are real.
Backups are no longer pg_dump. They are cluster-aware snapshots that have to coordinate consistent timestamps across all nodes, usually using BACKUP SQL extensions or vendor tools. Restoring a backup to a development cluster is a multi-step ceremony; restoring a single table from yesterday's snapshot — trivial in Postgres with pg_restore -t — is a different operation in CockroachDB or Spanner.
Schema migrations become online distributed operations. Adding an index in Postgres is CREATE INDEX CONCURRENTLY and a few hours of background work. In CockroachDB it is the same syntax but the implementation has to coordinate across every range that holds rows of the table, build the index in parallel, and swap atomically. Most of the time it works. Sometimes a migration runs for days because one range is hot and back-pressure stalls the scan.
Latency cost
Every write in a distributed database is a quorum operation. In a three-node Raft group, the leader must persist locally and have at least one follower acknowledge before the write returns. That is one network round-trip minimum. Across availability zones, that is 1-2 ms in the same region; across regions, 50-200 ms.
Cross-shard transactions add another round-trip via two-phase commit (or its CockroachDB-style equivalent: a transaction record on a primary range, then intent resolution). A single insert that touches two ranges costs more than an insert that touches one. A transaction that crosses three ranges costs more again. The cost is not in the SQL you wrote; it is in the physical layout the system chose for your data, which you do not directly control.
Even reads pay a tax. A single-key read on a single shard in the local region is roughly as fast as a Postgres point lookup. A read that needs to consult the leader (because you want strong consistency, not a stale follower read) pays an extra hop to find the leader. A multi-key read that scans across ranges pays per-range overhead. The local-fast-path that single-node databases give you for free is something you have to engineer for in distributed systems.
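The quorum cost of writes is easy to model. A sketch of Raft write latency as leader fsync plus the slowest follower ack the leader must wait for; the numbers in the example are the same-region and cross-region RTTs quoted above, and the function is an illustration, not any real system's internals:

```python
def raft_write_latency_ms(leader_fsync_ms: float,
                          follower_rtts_ms: list[float]) -> float:
    """Leader persists locally, then waits for a majority of the group.

    For a group of n = len(follower_rtts_ms) + 1 replicas, the leader
    needs acks from the (n // 2) fastest followers to reach a majority.
    """
    needed = (len(follower_rtts_ms) + 1) // 2     # acks beyond the leader
    slowest_needed = sorted(follower_rtts_ms)[needed - 1]
    return leader_fsync_ms + slowest_needed

# 3-node group, same region: millisecond-scale follower RTTs.
print(raft_write_latency_ms(0.5, [1.2, 1.8]))    # waits for 1 follower

# 3-node group across regions: the WAN round-trip dominates everything.
print(raft_write_latency_ms(0.5, [60.0, 180.0]))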
Monetary cost
Three nodes minimum, often nine, for a production-credible cluster. Across availability zones, often across regions. CockroachDB Enterprise licensing per vCPU once you cross certain feature gates — backup tooling, change data capture, geo-partitioning. Spanner is direct Google Cloud billing per node-hour, and the per-node-hour price assumes you reserved capacity ahead of time; spot pricing does not exist.
Add at least one extra ops engineer salary to maintain the cluster. In Bengaluru that is ₹25-40 lakh per year fully loaded; in San Francisco it is $250K. Distributed systems debugging is its own skill set, and you cannot cross-train your existing Postgres DBA into it in a quarter.
Cognitive cost
Developers must learn that BEGIN ... COMMIT blocks may need to be retried. CockroachDB pushes timestamps when it detects conflicts, and your transaction may receive a 40001 serialization_failure error — an error Postgres raises only under its stricter isolation levels, which most applications never run. You wrap every transaction in retry logic. You do this even for transactions you "know" do not conflict, because you cannot know which range a key will live on after the next split.
You learn to think about hot ranges. If your primary key is monotonically increasing — a typical id BIGSERIAL or a timestamp-prefixed UUID — every new write goes to the same range, the range that owns the highest key. That range's leader becomes a hotspot, its disk fills, the cluster splits the range, the new range inherits the same problem. You either change your key design (use UUIDs, hash prefixes, or Spanner-style reverse-bit timestamps) or you accept the bottleneck.
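The hash-prefix fix mentioned above is a one-liner, but it changes your key space permanently. A sketch of the idea, assuming a string-sorted key space; the bucket count and key format are illustrative choices:

```python
import hashlib

def shard_prefixed_key(user_id: int, num_prefixes: int = 16) -> str:
    """Prefix a monotonically assigned id with a stable hash bucket.

    Sequential ids all sort to the highest range and hotspot its leader;
    a short hash prefix scatters inserts across num_prefixes key ranges.
    """
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    bucket = int(digest, 16) % num_prefixes
    return f"{bucket:02d}:{user_id:012d}"

# Sequential ids now scatter across buckets instead of piling on one range.
for uid in range(1000, 1004):
    print(shard_prefixed_key(uid))
```

The trade you make: range scans ordered by id now have to fan out across all prefixes, which is exactly the kind of non-obvious consequence the cognitive-cost section is about.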
You learn the PACELC trade-offs of your chosen system. You learn what "follower read" means and when it is safe. You learn the difference between strong reads and stale reads. None of these were in your Postgres mental model.
Debugging cost
A query that used to be one row in pg_stat_activity is now a distributed trace with spans across three nodes. A correctness anomaly that looks like a bug — "this read returned the old value, even though the write completed" — may be eventual consistency catching up, or it may be a real bug, and telling them apart takes hours and tools. Partial failures — one node up, one node slow, one node returning errors — produce behaviour that no single-node test suite ever surfaced.
The Postgres scaling ladder
Most apps never reach the top of this ladder. The ones that do tend to know it. Each step has a clear signal that it is time to move to the next.
Step 1: Single node, default settings. A db.t4g.medium on RDS, 2 vCPU, 4 GB RAM, ~$50/month. Handles roughly 1,000 writes per second sustained for typical OLTP workloads — small inserts, indexed lookups. Almost every B2B SaaS startup with a few thousand users lives here for the first year. Razorpay's earliest billing system, Postman's early sync layer, every CRUD-shaped app you have ever shipped: this is the floor.
Step 2: Tuned single node. Same box, better choices: connection pooler (PgBouncer in transaction mode), proper composite indexes, EXPLAIN-driven query review, work_mem and shared_buffers tuned to your workload. Same ~$50-100/month. Now you handle 5,000 writes per second. The signal you are bottlenecked here is pg_stat_statements showing your top queries are doing sequential scans, or your connection count is hitting the pooler limit during peak.
Step 3: Read replicas. One primary, two or three async replicas, route reads to replicas via a DNS-based router or your application. Costs scale linearly per replica, but reads scale near-linearly with replica count. The signal: your primary's CPU is dominated by reads, not writes. Caveat: replica lag is real (typically 10-100 ms), and reading from a replica means accepting that occasional staleness — Werner Vogels' Eventually Consistent essay is the right mental model.
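The standard mitigation for replica lag at step 3 is read-your-writes session pinning: after a session writes, route its reads to the primary for a short window. A sketch of the pattern; the class, the one-second window, and the routing strings are all illustrative assumptions:

```python
import time

class ReplicaRouter:
    """Route reads to a replica unless the session wrote very recently.

    With async replication a replica may lag 10-100 ms behind the primary,
    so a session that just wrote must read from the primary to see its
    own write. The pin window is an assumed bound on worst-case lag.
    """
    def __init__(self, pin_seconds: float = 1.0):
        self.pin_seconds = pin_seconds
        self.last_write_at = {}            # session_id -> monotonic time

    def record_write(self, session_id: str) -> None:
        self.last_write_at[session_id] = time.monotonic()

    def target_for_read(self, session_id: str) -> str:
        t = self.last_write_at.get(session_id)
        if t is not None and time.monotonic() - t < self.pin_seconds:
            return "primary"               # replica may not have it yet
        return "replica"

router = ReplicaRouter()
print(router.target_for_read("s1"))        # replica: no recent write
router.record_write("s1")
print(router.target_for_read("s1"))        # primary: pinned after write
```

This is the cheapest form of the staleness bargain: sessions that never write get unlimited replica reads, and only writers briefly pay primary load.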
Step 4: Vertical scaling. A db.r6i.16xlarge on RDS — 64 vCPU, 512 GB RAM, NVMe SSD — runs about $4,000-5,000/month and handles 30,000 writes per second sustained for OLTP workloads. The signal: even with replicas, your primary is CPU-bound on writes, or your working set no longer fits in RAM and IO latency is your limit. Vertical scaling is unfashionable in a cloud-native era, but the truth is that one big machine is operationally simpler and often cheaper than a sharded cluster of small ones.
Step 5: Application sharding or Citus. Once you exceed what one box can do, you partition the data — by tenant ID, by user ID, by region — across multiple Postgres instances. Citus is the most popular extension that does this transparently; many teams roll their own. Sharded Postgres can serve millions of writes per second, with the cost being that cross-shard transactions are now your problem to design around (or to avoid entirely, the Helland approach).
Step 6: THEN distributed. If your sharded Postgres has become unwieldy — too many shards to manage, cross-shard transactions are killing you, you genuinely need geo-distributed strong consistency — now it is time to consider CockroachDB, Spanner, or TiDB. You will have specific, measured requirements at this point that justify the operational tax.
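The ladder reads as a lookup from measured load to the cheapest adequate step. A sketch using the chapter's write thresholds; the read-side thresholds are illustrative assumptions, since real read capacity depends on row size, indexes, and cache hit rate:

```python
def ladder_step(writes_per_sec: float, reads_per_sec: float) -> str:
    """Map a measured workload to the cheapest ladder step that fits it."""
    if writes_per_sec <= 1_000 and reads_per_sec <= 5_000:
        return "1: single node, default settings"
    if writes_per_sec <= 5_000 and reads_per_sec <= 20_000:
        return "2: tuned single node + connection pooling"
    if writes_per_sec <= 5_000:
        return "3: add read replicas"      # reads scale with replica count
    if writes_per_sec <= 30_000:
        return "4: vertical scaling"
    return "5+: shard with Citus, then consider distributed SQL"

print(ladder_step(50, 200))        # the worked example later in the chapter
print(ladder_step(3_000, 50_000))  # write-light, read-heavy dashboard app
```

Feeding in the startup workload from the case study below (50 writes/sec, 200 reads/sec) lands on step 1, which is the whole argument in one function call.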
Most apps never get past Step 3.
When distributed actually wins
Distributed databases are not a mistake. They exist because the workloads at the top of the ladder are real. The cases where distributed genuinely earns its cost are well-defined.
Hyperscale e-commerce. Amazon at peak handles tens of millions of orders per hour during Prime Day; Flipkart during Big Billion Days does similar. Single-region Postgres simply does not run at that scale; even sharded Postgres becomes operationally untenable when you need the same data accessible with low latency from twenty regions.
Geo-distributed payments at scale. PhonePe and Paytm during a Diwali payments surge do tens of thousands of UPI transactions per second across multiple Indian regions, with strict consistency requirements (no double spends, no lost reversals). Spanner-class systems exist exactly for this — strong consistency across geographic distance with bounded latency.
Multi-region apps with strict consistency. A multinational bank operating across India, the EU, and the US, with regulators in each jurisdiction demanding that local data stay local but global accounting be consistent, is a textbook fit for distributed databases with geo-partitioning. CockroachDB's row-level region pinning and Spanner's region constraints exist for exactly this case.
Search and ad platforms at hyperscale. Google's ad platform, Facebook's news feed ranking, Bing's index — these are at scales where distribution is the only architecture that works. They are also at scales where they often build their own distributed databases rather than buying.
The common thread: specific, measured, large-scale problems with no simpler answer. Not "we might need to scale someday." Not "everyone uses Spanner now." Not "the CTO read a blog post."
The startup that almost bought CockroachDB on day one
A two-person founding team is building a B2B SaaS product — a customer support tool. They have raised a small seed round. The CTO, who has read a lot of distributed systems blog posts, is enthusiastic about CockroachDB: "we need to be ready for scale, we need to support multi-region from day one, we need strong consistency for billing." The proposal is to start on a 3-node CockroachDB cluster with Enterprise licensing.
Let us walk through the math.
The actual workload, day one through year one. A thousand users. Each user generates roughly fifty support tickets per month, each ticket has perhaps ten interactions. That is roughly 600,000 tickets per year, six million interactions, plus auxiliary data — customer profiles, audit logs, attachments metadata. Total data: 100 GB. Write throughput at peak: roughly 50 writes per second; average is more like 5. Reads: dominated by dashboard queries, perhaps 200 per second at peak.
Plain Postgres cost. A db.t4g.medium on RDS in ap-south-1 costs about $50/month. Adding a read replica for the dashboard reads brings it to $100. Backups are automatic. Point-in-time recovery is automatic. The team operates this in roughly four hours per quarter — read a CloudWatch alert, click "apply minor version update," go back to building the product. Year one cost: $1,200, plus zero engineering hours beyond setup.
CockroachDB cost. Three db.r6i.large instances across three availability zones in ap-south-1: ~$500/month. CockroachDB Enterprise licensing for backup and CDC features: ~$1,000/month at the base tier. A part-time DRE (database reliability engineer) consultant on retainer to set up monitoring, debug the first hot range incident, and review the schema for distributed-friendly key design: ~$2,000/month conservatively. Year one cost: ~$42,000, plus engineering hours from the founders that could have gone to product.
Cost ratio. $42,000 / $1,200 = 35× more expensive, for zero observable benefit at this scale. CockroachDB will not be faster (it will in fact be slower per query, because every write is a quorum). It will not be more reliable at this scale (RDS Multi-AZ Postgres has higher uptime in practice for small workloads than self-managed distributed clusters, because the failure modes are simpler). It will not "future-proof" anything that cannot be future-proofed by a one-week migration when the team actually needs distribution.
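The arithmetic is worth making explicit, since the monthly line items hide how fast the gap compounds over a year. Nothing here is new data; it simply totals the bills quoted above:

```python
# Year-one bills from the worked example (all figures USD per month).
postgres_monthly = 50 + 50            # db.t4g.medium primary + read replica
crdb_monthly = 500 + 1_000 + 2_000    # 3 nodes + Enterprise + DRE retainer

postgres_year_one = postgres_monthly * 12
crdb_year_one = crdb_monthly * 12

print(postgres_year_one)                   # 1200
print(crdb_year_one)                       # 42000
print(crdb_year_one / postgres_year_one)   # 35.0
```

And this is before counting the founders' hours, which is the cost the next paragraph argues dominates everything else.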
The opportunity cost. The CTO who spent the first quarter setting up CockroachDB monitoring and learning HLC tuning did not spend that quarter shipping the integrations that would have driven the next funding round. This is the cost no spreadsheet captures.
The honest year-one architecture. Single managed Postgres in ap-south-1, with one async read replica. PgBouncer in front for connection pooling. Daily snapshot to S3 with seven-day retention. Sentry for error tracking, CloudWatch for the database metrics. Total monthly bill: $150 fully loaded. Total time spent operating it: maybe two hours per month.
When to revisit. When the team hits 50,000 users, or 1 TB of data, or 5,000 sustained writes per second, or genuine multi-region requirements from a customer contract. At that point, run the decision tree again. It is a one-week migration to a sharded Postgres with Citus, and a one-quarter migration to CockroachDB if the metrics genuinely demand it. The cost of moving later is much smaller than the cost of starting too big.
This story is roughly true for the early years of Razorpay, Freshworks, Postman, Zerodha, and most other Indian SaaS companies that are now at significant scale. They started on managed Postgres or MySQL, scaled vertically, added read replicas, eventually sharded — and reached the distributed-database conversation only when the metrics genuinely demanded it. The companies that picked Spanner-class on day one are mostly not around to tell the story.
The premature distribution timeline
The anti-pattern is so common it has a shape across three years. Year one: the team adopts a distributed database "to be ready for scale" and spends a quarter on cluster setup, key design, and retry logic instead of product. Year two: feature velocity slows, on-call pages arrive for hot ranges and slow followers, and every incident takes longer to diagnose than it would have on Postgres. Year three: the measured load is still a fraction of what one node could carry, and the team rewrites onto plain Postgres.
Why the rewrite usually wins on every axis: distributed databases are optimised for scale that the rewriting team does not have, and the operations they perform inefficiently — small transactions, small reads, simple joins — are exactly the operations Postgres has been optimised for since the 1990s. Removing the distributed layer removes the per-write quorum cost, removes the gateway-to-leader hop on every read, removes the need for transaction-restart logic in application code, and removes the per-shard fan-out cost on multi-row queries. p99 latency typically drops 30-50% for OLTP workloads, infra cost drops 5-10×, and on-call quietens dramatically because Postgres failure modes are familiar to every engineer in the building.
The Indian SaaS pattern
The Indian SaaS scene has produced a striking number of companies that scaled to hundreds of millions of users on managed Postgres or MySQL before adopting any kind of distribution. The pattern is consistent enough to be a useful prior.
Razorpay ran on managed Postgres for years, scaling vertically and adding read replicas, before introducing application-layer sharding for the highest-volume tables. Even today, much of the platform is sharded Postgres rather than a distributed SQL database — they reach for the distributed option only where the workload genuinely demands it (cross-region settlement reconciliation, certain audit pipelines).
Freshworks built much of its CRM and helpdesk product on MySQL, scaling vertically and across read replicas for years. The shift to sharded MySQL came late, and only for tenants whose data exceeded what a single shard could hold.
Postman ran its sync layer on Postgres for a long time, adding caching and read replicas before any sharding. The team's public talks have repeatedly returned to the theme: simpler is better, distributed is a last resort.
Zerodha famously runs much of its trading infrastructure on simple, single-region setups with aggressive use of Postgres, kdb+ for tick data, and minimal distributed-system complexity. The trading day has a known peak and known shape, and a sized-correctly single-node deployment outperforms a misconfigured distributed cluster handily.
Zoho has built a multi-billion-dollar SaaS business with a deliberately conservative database strategy — heavy use of MySQL, careful sharding where needed, zero rush to distributed SQL. They are profitable, they ship reliably, and they have not paid Spanner's bill once.
The lesson is not "Indian SaaS is afraid of distributed databases." The lesson is "Indian SaaS is good at engineering economics." Distribution is for the workloads that need it. Most workloads do not.
A decision framework, in one sentence
Start simple. Move to distributed only when you have a measured, specific scaling problem that simpler approaches genuinely cannot solve.
That is the entire framework. Everything else in this chapter is supporting evidence. The framework is hard to follow not because it is intellectually difficult but because it cuts against the gravitational pull of resume-driven development, conference-talk-driven architecture, and the cultural prestige of "scale-ready" choices. Resist all three. Ship the simple thing. Measure. When the simple thing breaks at a place you can point to with a number, replace just that piece.
The fourteen builds in this series — through Raft and Paxos, MVCC and snapshot isolation, Spanner and TrueTime, CockroachDB and HLC, Calvin's deterministic ordering, the PACELC trade-offs — were not designed to push you toward distributed databases. They were designed to give you the literacy to know when to choose one, and when not to. The closer for Build 14 is the chapter that closes the loop: most of the time, the answer is "not yet, maybe never, definitely not on day one."
If you remember one number from this chapter, remember the cost ratio: a managed Postgres at $200/month versus a distributed cluster at $2,500-$5,000/month plus engineering tax. That is roughly 12-25× the cost. You owe that ratio a real reason before you cross it.
References
- Pat Helland — Life Beyond Distributed Transactions: An Apostate's Opinion (CIDR 2007). The classic case for designing applications that avoid distributed transactions, from one of the architects of the field.
- Marc Brooker — You Probably Don't Need a Distributed Database. The blog post that names the pattern this chapter elaborates.
- Werner Vogels — Eventually Consistent. The Amazon CTO's foundational essay on consistency models and the costs of strong consistency.
- Citus Data — Distributed Postgres goes full open source. The most popular Postgres-sharding extension and a worked example of step 5 on the scaling ladder.
- PgBouncer — Project documentation. Connection pooling, the unsexy single most impactful Postgres scaling intervention.
- Eric Brewer — CAP Twelve Years Later. Brewer's own retraction of the "two of three" framing, useful context for why distributed-by-default is the wrong default.