In short

Document databases are sold on one line: no migrations needed. Want a new field? Just write it. Want to remove one? Stop writing it. Want to change a type? Write the new type. The database does not care; the database has no schema to change. Compared to a relational ALTER TABLE on a billion-row table — minutes of write blocking, hours of online migration, the whole on-call ceremony — this looks like freedom.

It is freedom for the first six months. Then the bill arrives.

The bill arrives because schemas do not actually vanish — they relocate. Every field still has a meaning; every field still has a type; every read still has to know which version of the field a document carries. When the database stops enforcing this, your application code does. And your application code is not one place. It is the mobile API service, the analytics ETL, the fraud-scoring batch job, the customer-support admin panel, and the half-finished migration script someone wrote two quarters ago. Each of those places independently decides what shape a document has, and they drift. Three months in, you find if (doc.amount) {} else if (doc.amount_v2) {} else if (doc.amountInPaise) {} in the hottest read path. A year in, the codebase has five variants of every field. Two years in, you run a forced backfill — exactly the migration you were avoiding — except now it is ten times more painful because the data is messier and the application paths that depend on each variant are scattered.

This chapter is the honest accounting. The promised flexibility is real; so is the hidden cost. We walk through the four compounding traps (multi-version queries, inconsistent enforcement across services, delayed-but-still-needed backfills, hard-to-detect drift), then through the four mitigation patterns that real teams adopt ($jsonSchema validators, versioned schema_version fields, code-side schema classes like Pydantic and Zod, explicit periodic backfills), and end with the comparison nobody puts on a marketing slide: relational databases enforce schema discipline through pain now; document databases distribute that pain over time. Pick the trade-off your team can sustain — but pick it knowingly.

The thesis

A relational database is loud about its schema. ALTER TABLE orders ADD COLUMN refund_amount DECIMAL(10,2) is a statement that touches every row, that the planner notices immediately, that a code reviewer can find by grep-ing migrations, and that lives forever in the audit log. It is annoying. It is also extremely visible.

A document database is quiet. db.orders.insertOne({..., refund_amount: 0}) is a single insert. The first time you do it, exactly one document in the entire collection has a refund_amount field. There is no migration; there is no audit; there is no announcement. The other four hundred million documents in the collection do not have the field, and the database does not care. Why this matters: visibility is what makes schema changes safe. A change you can grep for is a change a teammate can find when they write code that depends on it. A change buried in the body of an insertOne call in a microservice is invisible to everyone who is not currently editing that line.

The thesis of this chapter is simple. Schema enforcement is a constant cost. It does not go away when the database stops paying it. It just moves. It moves into the application — and the application is the worst possible place to centralise schema rules, because the application is many programs, many languages, many teams, and many releases. The flexibility you gained in the database is paid back in distributed schema chaos in the code.

This is not an argument against document databases. They are the right answer for a meaningful set of problems. It is an argument against the marketing of document databases — the implicit claim that schema-on-write is pure overhead and schema-on-read is pure win. The truth is that schema-on-read is overhead too; it is just charged on a different credit card.

The promised flexibility, exactly as advertised

Before the critique, give the promise its full due. There are four operations a document database lets you perform with literally zero ceremony:

Add a field. You start writing it in new documents. That is the entire change. Old documents do not have the field; new documents do. Reads that ask for it on old documents get a missing-field response (undefined in the mongo shell and the Node driver; an absent key in most other drivers). No ALTER TABLE, no online migration, no downtime, no schema-evolution review meeting.

Remove a field. You stop writing it in new documents. Old documents still have it; new ones do not. The database does not care. If a query happened to filter on the field, the old documents still match and the new ones do not — which is sometimes what you want and sometimes a bug, but either way it is your application's problem, not the database's.

Change a type. You start writing the new type. Old documents have the old type; new documents have the new type. A field can be a number in some documents and a string in others, in the same collection. MongoDB will dutifully store both and return both.

Rename a field. Just use the new name. Reads that look for the old name keep finding it in old documents. Reads that look for the new name find it in new ones. The two names coexist forever — until you decide to do something about it.
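To make the zero-ceremony point concrete, here is a minimal sketch (pymongo; collection and field names are illustrative) of how far the database will let you go — four mutually incompatible shapes in one collection, all accepted without complaint:

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]   # illustrative connection

db.orders.insert_one({"order_id": 1, "amount": 500})           # a number, in rupees
db.orders.insert_one({"order_id": 2, "amount": "500.00"})      # same field, now a string
db.orders.insert_one({"order_id": 3, "amount_paise": 50000})   # renamed, different unit
db.orders.insert_one({"order_id": 4})                          # field dropped entirely

# All four inserts succeed; the database enforces nothing.
# Every reader now has to cope with all four shapes.
for doc in db.orders.find():
    print(doc.get("amount"), doc.get("amount_paise"))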

In a relational database, each of these operations is a migration. On a small table the migration is cheap; on a large table it is anywhere from "kind of annoying" to "we will deploy on a Sunday at 3am with a rollback plan and three engineers on call". For an early-stage product where the data model genuinely is changing every week, removing that ceremony is an enormous productivity win. The first ten product iterations cost weeks less in document land than in relational land.

This is real. This is genuinely useful. This is also the part everybody talks about. Now we talk about the part nobody talks about.

The false promise: documents look easy to evolve until they don't

[Figure: The false promise — clean writes early, messy reads forever.]

Month 1: looks easy.

// schema iteration v1
db.txn.insert({ amount: 500, currency: "INR" })
// read code: doc.amount — done

Year 3: the reality.

function getAmount(doc) {
  // v1 (rupees)
  if (doc.amount != null) return doc.amount * 100;
  // v2 (paise, regulator change)
  if (doc.amount_paise != null) return doc.amount_paise;
  // v3 (multi-currency rollout)
  if (doc.amount_minor != null) return doc.amount_minor;
  // v4 (a different team's path)
  if (doc.amountInPaise != null) return doc.amountInPaise;
  // v5 (subdoc, ledger redesign)
  if (doc.money && doc.money.value) return doc.money.value;
  throw new Error("unknown shape");
}

Five variants, in every read path, in every service; bugs hide in the gaps between branches. No ALTER TABLE, no downtime, no migration ceremony — it looks like a free win, because the cost is invisible at write time. It only shows up at read time: forever, in every reader, in every service.

Look at the diagram for a moment. The left side is what the document-database pitch shows you on day one: db.txn.insert({amount: 500, currency: "INR"}), four lines, no schema. The right side is what nobody shows you on day one: a getAmount(doc) function with five branches, one per historical schema variant, written by five different engineers across three teams over three years. Each branch was a "no migration needed!" moment at the time it was added. The cumulative effect is a function that nobody fully trusts.

This is the false promise. Each individual schema change is easy. The aggregate of schema changes, applied without migration discipline, is not easy — it is a slow-growing tax on every read in the system, and the tax compounds because each new variant adds another branch every reader has to handle.

The hidden costs, in detail

There are four costs, and they compound. Let us take them one at a time.

Multi-version queries

Every read path has to handle every historical shape of every field. The getAmount example above is unfortunately realistic — at a fintech of any size, fields acquire variants every quarter, and the reader code grows monotonically. The branches do not get removed because removing them requires proving no document in production still uses the old shape, and proving that requires a full collection scan, and a full collection scan on the prod cluster is its own ceremony.

The deeper problem is that each if/else branch is a place where a bug can hide. If branch v3 has a subtle off-by-100 (return doc.amount_minor / 100 vs return doc.amount_minor), that bug fires only on documents that match exactly v3 — not v1, not v2, not v4. Detecting it requires a test fixture that exercises the v3 shape specifically, which most test suites do not have because they were written when v3 did not exist. Production bugs in this space are notoriously hard to reproduce.
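The cheapest defence is a test fixture per historical shape, so that every branch of the reader is exercised explicitly. A sketch (pytest; the five shapes mirror the getAmount variants above, and the expected value is arbitrary):

import pytest

def get_amount(doc):
    # the multi-branch reader under test (branch order mirrors the figure)
    if doc.get("amount") is not None:
        return doc["amount"] * 100        # v1: rupees
    if doc.get("amount_paise") is not None:
        return doc["amount_paise"]        # v2: paise
    if doc.get("amount_minor") is not None:
        return doc["amount_minor"]        # v3: generic minor units
    if doc.get("amountInPaise") is not None:
        return doc["amountInPaise"]       # v4: another team's spelling
    if doc.get("money"):
        return doc["money"]["value"]      # v5: sub-document
    raise ValueError("unknown shape")

# one fixture per historical variant, all meaning 35,000 paise
VARIANTS = [
    {"amount": 350},
    {"amount_paise": 35000},
    {"amount_minor": 35000},
    {"amountInPaise": 35000},
    {"money": {"value": 35000, "scale": 2}},
]

@pytest.mark.parametrize("doc", VARIANTS)
def test_every_variant_normalises_to_paise(doc):
    assert get_amount(doc) == 35000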

Inconsistent enforcement across services

Two services write to the same collection. Service A is the mobile API, written in Node by one team; it calls the field amount. Service B is the back-office settlement job, written in Python by another team; it calls the field amount_inr. Neither team knows the other exists, because in a microservice architecture nobody owns the database — they own their service.

Both writes succeed. The collection now contains documents with amount, documents with amount_inr, and documents with both (when the same transaction is touched by both services). A query that filters on amount > 1000 returns the mobile-originated documents and misses the settlement-originated ones. A dashboard built on the query is silently wrong.

Why this happens: the database is not the schema authority. Each service decides its own field names, and there is no central place that says "in the transactions collection, the amount field is named X". In a relational world, the column name is in the schema, and any service that uses a different name gets a SQL error on the first insert. In the document world, the database accepts both and you discover the divergence at query time, often months later.
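A sketch of the failure mode with pymongo (service and field names as in the scenario above) — both writes succeed, and the dashboard query silently misses half the data:

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["payments"]   # illustrative connection

# Service A (mobile API, Node) and Service B (settlement, Python) never coordinated
db.transactions.insert_one({"txn_id": "t1", "amount": 2500})       # A's field name
db.transactions.insert_one({"txn_id": "t2", "amount_inr": 2500})   # B's field name

# A dashboard query written against A's spelling misses B's documents entirely
big = list(db.transactions.find({"amount": {"$gt": 1000}}))
print(len(big))   # 1, not 2 — the dashboard is silently wrong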

Backfills are still needed — they are just delayed

The honest dirty secret of document databases is that you still end up running migrations, just later, and under more pressure. Sooner or later a query needs to assume a uniform shape — because the analytics team needs a clean dataset, or because a new feature requires a new index, or because a regulator audit needs reconciled records — and at that point you have to backfill every old document to the new shape.

The backfill is exactly the migration the relational database would have made you do upfront. Except now: (a) the data has more variants, because more time has passed; (b) the application code has accreted more if/else branches that all have to be removed once the backfill completes, and removing them safely requires verifying no service still depends on the old shape; (c) the backfill itself is operationally riskier because the collection is now ten times bigger than it would have been on day one when the migration could have been a five-minute affair.

The accounting is brutal. Upfront migration: one engineer-day. Delayed backfill: ten engineer-days, plus a year of accumulated reader-code complexity that you now have to clean up.

Schema drift is hard to detect

In a relational database, \d+ orders in psql tells you every column, every type, every index. The schema is a queryable, auditable artefact. In a document database, there is no INFORMATION_SCHEMA for the collection. You can run db.orders.findOne() and see one document's shape, but that tells you nothing about whether other documents in the collection have different shapes.

Tools exist to mitigate this — MongoDB Compass has a "Schema" tab that samples documents and reports field-presence ratios, and there are open-source schema-inference tools — but they all sample, and sampling misses the long tail. The document with the weird historical shape that breaks the report is exactly the one a sample is unlikely to catch.
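If you are willing to pay for the full scan, an exhaustive field census is one aggregation away. A sketch with pymongo — $objectToArray enumerates each document's keys, so nothing in the long tail is missed:

# db: the pymongo database handle, as in the earlier sketches
pipeline = [
    # enumerate every field name in every document
    {"$project": {"fields": {"$map": {
        "input": {"$objectToArray": "$$ROOT"},
        "as": "kv",
        "in": "$$kv.k"}}}},
    {"$unwind": "$fields"},
    # count how many documents carry each field
    {"$group": {"_id": "$fields", "docs": {"$sum": 1}}},
    {"$sort": {"docs": -1}},
]

for row in db.transactions.aggregate(pipeline, allowDiskUse=True):
    print(row["_id"], row["docs"])   # e.g. amount, amount_paise, amountInPaise, money, ...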

The practical effect is that new engineers do not know what fields a document has until they query for them and see what comes back. Onboarding becomes spelunking. Documentation becomes wishful thinking, because nobody can keep it in sync with a schema that has no canonical form.

A worked example: an Indian fintech and the compounding bill

The transaction collection that ate three engineering quarters

A Bengaluru-based payments startup — call it PayBharat — launches in early 2024. The transaction model is simple: every payment is a document.

// 2024-Q1: the launch schema, MongoDB
db.transactions.insertOne({
  _id: ObjectId(),
  user_id: "u_8a7f...",
  merchant: "Big Bazaar",
  amount: 500,           // rupees
  currency: "INR",
  status: "success",
  created_at: ISODate()
})

Six months in (2024-Q3), the RBI's revised reporting framework requires amounts to be reported in paise (1 INR = 100 paise) for reconciliation. The mobile API team adds a new field:

// 2024-Q3: regulator change
db.transactions.insertOne({
  _id: ObjectId(),
  user_id: "u_3b2c...",
  merchant: "BookMyShow",
  amount_paise: 35000,   // ₹350.00 in paise
  currency: "INR",
  status: "success",
  created_at: ISODate()
})

No migration. Old documents still have amount (rupees); new documents have amount_paise (paise). Reader code grows a branch:

function amountInPaise(doc) {
  if (doc.amount_paise != null) return doc.amount_paise;
  if (doc.amount != null) return doc.amount * 100;
  throw new Error("transaction missing amount");
}

So far, the cost is one helper function. Manageable.

Late 2024-Q4, the international team launches USD support. They write a different shape because they were not aware of the paise convention:

// 2024-Q4: international team, unaware of paise convention
db.transactions.insertOne({
  _id: ObjectId(),
  user_id: "u_intl_...",
  merchant: "AWS",
  amount_minor: 4999,    // $49.99 in cents
  currency: "USD",
  status: "success",
  created_at: ISODate()
})

amount_minor is the new generic name; amount_paise is the old INR-specific name. They mean the same thing for INR documents. But now the helper has three branches:

function amountInMinor(doc) {
  if (doc.amount_minor != null) return doc.amount_minor;
  if (doc.amount_paise != null) return doc.amount_paise;
  if (doc.amount != null) return doc.amount * 100;
  throw new Error("transaction missing amount");
}

By 2025-Q2, the ledger team redesigns transactions to support split payments. They wrap money in a sub-document:

// 2025-Q2: ledger redesign
db.transactions.insertOne({
  _id: ObjectId(),
  user_id: "u_split_...",
  merchant: "Zomato",
  money: { value: 89900, currency: "INR", scale: 2 },
  status: "success",
  splits: [
    { wallet: "cashback", amount: 4500 },
    { wallet: "card",     amount: 85400 }
  ],
  created_at: ISODate()
})

Now getAmount has four branches. Plus the splits introduce a new question: which is the canonical amount, the top-level money.value or the sum of splits[].amount? Reader code disagrees across services. The fraud team sums splits; the analytics team reads money.value. They mostly agree, but for documents where the splits do not exactly equal the total (rounding, currency conversion), they produce different fraud signals and different revenue numbers. This is found by the CFO during quarterly close. Three engineers spend two weeks on the reconciliation.
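The reconciliation that surfaced at quarterly close could have been a standing check. A sketch (field names as in the documents above) that flags every document where the two "canonical" readings disagree:

def reconcile(doc):
    """Difference (in minor units) between the two readings, if both exist."""
    top = (doc.get("money") or {}).get("value")
    splits = doc.get("splits") or []
    if top is None or not splits:
        return None                       # nothing to compare on this shape
    return top - sum(s["amount"] for s in splits)

mismatches = [
    (doc["_id"], diff)
    for doc in db.transactions.find({"splits": {"$exists": True}})
    if (diff := reconcile(doc)) not in (None, 0)
]
print(f"{len(mismatches)} documents where money.value != sum(splits[].amount)")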

By mid-2026, the bill comes due. The risk team needs a clean transactions dataset for a regulator audit. The query needs uniform shape. The CTO approves a forced backfill:

# 2026-Q3: the migration we avoided in 2024-Q3
for doc in db.transactions.find():
    minor = compute_canonical_amount(doc)  # the four-branch helper
    db.transactions.update_one(
        {"_id": doc["_id"]},
        {"$set": {"money": {"value": minor, "currency": doc.get("currency", "INR"), "scale": 2}},
         "$unset": {"amount": "", "amount_paise": "", "amount_minor": ""}}
    )

On a 400 million document collection, this is a ten-day operation: write throttling, batched updates to avoid replication lag spikes, dry-runs on a snapshot, careful sequencing with the application services that still read the old fields, then a coordinated deploy to remove the four if/else branches from every reader. Total cost: 10 engineer-days of focused work, plus six weeks of calendar time for the coordination.
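For completeness, a sketch of the batched, throttled shape the loop above actually has to take in production (batch size and sleep interval are illustrative):

import time
from pymongo import UpdateOne

BATCH = 1_000
ops = []

# only documents that have not yet been migrated to the money sub-document
for doc in db.transactions.find({"money": {"$exists": False}}, no_cursor_timeout=True):
    minor = compute_canonical_amount(doc)          # the four-branch helper, as above
    ops.append(UpdateOne(
        {"_id": doc["_id"]},
        {"$set": {"money": {"value": minor,
                            "currency": doc.get("currency", "INR"),
                            "scale": 2}},
         "$unset": {"amount": "", "amount_paise": "", "amount_minor": ""}}))
    if len(ops) >= BATCH:
        db.transactions.bulk_write(ops, ordered=False)
        ops = []
        time.sleep(0.5)                            # crude throttle: let replication catch up

if ops:
    db.transactions.bulk_write(ops, ordered=False)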

The relational counterfactual: in 2024-Q3, when the regulator change came, run ALTER TABLE transactions ALTER COLUMN amount TYPE BIGINT USING (amount * 100)::bigint; ALTER TABLE transactions RENAME COLUMN amount TO amount_paise;. On a 50-million-row table at the time, that is a few hours of background work — a single table rewrite in Postgres, or an online schema-change tool like pt-online-schema-change or gh-ost if you are on MySQL. Total cost: 1 engineer-day. Every subsequent reader works against a single, canonical schema. The drift never happens because the database refuses to let it happen.

The flexibility was worth one day. The bill was ten. Why the multiplier is so big: the cost is not just the backfill. It is the two years of reader-code complexity, the bugs that hid in the branches, the reconciliations that were silently wrong, the engineer-hours spent debugging shape mismatches, the onboarding time for new engineers who could not figure out what a transaction looked like. The migration deferred is not the migration averted; it is the migration with compound interest.

The cost over time: integrating the area under the curve

[Figure: Cost over time — relational pays upfront, document pays forever (and the integral grows). X-axis: time, from launch through 6 months, year 1, year 2, year 3. Y-axis: cost. The relational curve (blue) is a series of discrete migration spikes — visible and planned. The document curve (red) is compounding drift — a continuous tax on every read whose integral grows over time, ending in a forced backfill that arrives anyway. The blue spikes are painful but bounded; the red curve is gentler, but its integral — the total cost — keeps growing.]

Look at the shape of the two curves. The relational curve (blue) is a series of sharp spikes — each ALTER TABLE is a planned, discrete event with a clear before-and-after. Between migrations the cost is approximately zero: the schema is stable, every reader works against the same shape, and the database catches violations.

The document curve (red) is a slow climb. There is no spike on day one — flexibility looks free. But every schema variant added by every team without a migration adds a tiny ongoing tax on every read in the system. The tax is small per read but applied billions of times per day across thousands of reader code paths. The integral — the area under the curve, which is the total engineering cost — grows monotonically. And the forced backfill at the end, when it finally happens, is itself a spike, except now layered on top of three years of accumulated debt.

Why this is the most important diagram in the chapter: the human brain is bad at integrating slow-growing costs. A spike is visible — you remember the weekend you spent on the Q3 migration. A slow climb is invisible — you do not remember the thousand small if/else branches you added one at a time. By the time the integral is obviously larger than the spike-sum would have been, you have already paid it.

Mitigation: how real teams actually live with documents

Document databases are not unworkable. Teams ship them at massive scale — Uber, Adobe, and Toyota among them — and the Indian Aadhaar system has reportedly used MongoDB for parts of its identity infrastructure. They do it by rebuilding schema discipline at a different layer. There are four patterns, and a healthy production system uses three of them simultaneously.

Pattern 1: JSON Schema validation at the database

MongoDB, since version 3.6, supports $jsonSchema validators on collections. You define a schema as a JSON Schema document, attach it to the collection, and the database rejects writes that do not conform.

db.createCollection("transactions", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["user_id", "money", "currency", "status", "created_at"],
      properties: {
        user_id:     { bsonType: "string", pattern: "^u_" },
        money:       { bsonType: "object",
                       required: ["value", "scale"],
                       properties: {
                         value: { bsonType: "long" },
                         scale: { bsonType: "int", minimum: 0, maximum: 8 }
                       } },
        currency:    { enum: ["INR", "USD", "EUR", "GBP"] },
        status:      { enum: ["pending", "success", "failed", "reversed"] },
        created_at:  { bsonType: "date" }
      }
    }
  },
  validationLevel: "strict",
  validationAction: "error"
})

This recovers most of the schema discipline a relational database gives you for free. Inconsistent writes are rejected at the database boundary; field types are enforced; required fields cannot be omitted. See the MongoDB schema validation documentation for the full feature surface.

The catch is validationLevel. "strict" enforces validation on every write, including updates. "moderate" only enforces it on documents that already conform, leaving non-conforming legacy documents untouched. Most teams introducing validators to existing collections start with "moderate" because "strict" would reject updates to old documents that violate the new schema — and the only fix for that is the backfill you were trying to defer.
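Attaching a validator to an already-populated collection is a collMod command. A sketch of the usual cautious rollout — start moderate and warn-only, tighten to strict and error once the backfill has caught up (the trimmed validator here stands in for the full one above):

# attach the validator to the existing collection without rejecting legacy writes yet
db.command(
    "collMod", "transactions",
    validator={"$jsonSchema": {
        "bsonType": "object",
        "required": ["user_id", "money", "currency", "status", "created_at"],
    }},
    validationLevel="moderate",    # leave non-conforming legacy documents alone
    validationAction="warn",       # log violations instead of rejecting the write
)
# After the backfill has run and the warnings have gone quiet:
# validationLevel="strict", validationAction="error"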

Pattern 2: versioned schemas inside the document

Include an explicit schema_version field in every document. Migrators handle older versions transparently:

def normalise(doc):
    v = doc.get("schema_version", 1)
    if v == 1:
        doc = migrate_v1_to_v2(doc)
        v = 2
    if v == 2:
        doc = migrate_v2_to_v3(doc)
        v = 3
    return doc

Every read goes through normalise. Every write writes the latest version with schema_version: N set. Old documents stay on disk until you choose to backfill them, but reader code only deals with the canonical latest shape after normalise runs.
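The migrators themselves are small, pure functions. A sketch of what migrate_v1_to_v2 and migrate_v2_to_v3 might look like for the PayBharat shapes (hypothetical helpers, matching the fields used earlier):

def migrate_v1_to_v2(doc):
    # v1 stored rupees in `amount`; v2 stores paise in `amount_paise`
    out = dict(doc)
    out["amount_paise"] = out.pop("amount") * 100
    out["schema_version"] = 2
    return out

def migrate_v2_to_v3(doc):
    # v3 wraps the amount in a money sub-document with an explicit scale
    out = dict(doc)
    out["money"] = {"value": out.pop("amount_paise"),
                    "currency": out.get("currency", "INR"),
                    "scale": 2}
    out["schema_version"] = 3
    return out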

This is a pattern from the document-modelling literature — it appears as the "Schema Versioning Pattern" in MongoDB's data modelling guide and as a recurrent theme in critiques of naive document usage like Sarah Mei's influential 2013 essay. It works. Its cost is that every read pays a small CPU tax for the migrators, and the migrators themselves accumulate over time — eventually you do want to garbage-collect old versions, which means... a backfill. The pattern delays the bill but does not cancel it.

Pattern 3: code-side schema classes

Tools like Pydantic (Python), Zod (TypeScript), and Joi (JavaScript) let you declare schemas in code and have all reads and writes pass through them. Every write serialises from a typed object; every read deserialises into a typed object; schema violations throw at the application boundary.

from pydantic import BaseModel, Field
from typing import Literal
from datetime import datetime

class Money(BaseModel):
    value: int  # in minor units
    scale: int = Field(ge=0, le=8)

class Transaction(BaseModel):
    user_id: str = Field(pattern=r"^u_")
    money: Money
    currency: Literal["INR", "USD", "EUR", "GBP"]
    status: Literal["pending", "success", "failed", "reversed"]
    created_at: datetime
    schema_version: int = 3

# Every write goes through this
def write_transaction(txn: Transaction):
    db.transactions.insert_one(txn.model_dump())

# Every read goes through this
def read_transaction(txn_id) -> Transaction:
    raw = db.transactions.find_one({"_id": txn_id})
    return Transaction.model_validate(normalise(raw))

This is the pattern that scales best in practice, because the schema lives next to the application code that uses it, and language-level type-checking catches violations at compile time (or at least at PR review time). The catch is that it only enforces discipline within services that use the same Pydantic model. Two services with two different model files can still drift — which means the model itself has to be a shared library, owned by one team, versioned and released like any other dependency.

Pattern 4: explicit periodic backfills

Accept that backfills will happen, and plan for them. Schedule them quarterly or annually as part of a "schema-debt sprint". Migrate the long tail of old documents to the latest shape; remove the legacy if/else branches from reader code; tighten the validators. This is the migration discipline a relational database imposes — voluntarily applied. Teams that do this stay sane. Teams that do not become the multi-version-query case study.

The real comparison

[Figure: Schema-on-read — the application becomes the schema authority. Left panel, schema-on-write (relational: PostgreSQL / MySQL): one schema, owned by the database; services A, B, and C each write through validation into the DB; all services see one shape, drift is impossible, onboarding means reading the schema. Right panel, schema-on-read (document: MongoDB / Couchbase / DynamoDB): N schemas, one per service, no central authority; services A, B, and C each write their own shape into the DB; every read needs validation, drift is the default, onboarding means spelunking. The fix: rebuild the schema authority outside the database — $jsonSchema validators, versioned documents, shared Pydantic/Zod models, periodic backfills. Do all four and you recover most of what schema-on-write gave you for free; do none and you pay the integral.]

The honest comparison, free of marketing in either direction:

Relational databases (Postgres, MySQL, SQL Server, Oracle) pay the schema cost upfront. Every change is a migration; every migration is a planned event; every reader works against one canonical schema. The pain is concentrated in the migration moments — and is annoying enough that it nudges teams toward designing schemas carefully, because the cost of getting it wrong is visible. The DB enforces discipline.

Document databases (MongoDB, Couchbase, DynamoDB, AWS DocumentDB) defer the schema cost. Every change is a write; every reader has to handle every variant; the canonical schema lives in the application code, distributed across services, often inconsistently. The pain is spread out — a thin tax on every read forever — and is invisible enough that teams underweight it, because no single moment is acutely painful. The application enforces discipline (or fails to).

Both are valid. Document databases are the right choice when the data model is genuinely still changing every week, when one small team owns both the writers and the readers, when the records really are self-contained documents that are read and written whole, and when shipping the next ten product iterations matters more than a clean historical dataset.

Relational databases are the right choice when many services and teams touch the same data, when analytics, reporting, or regulators need a single canonical shape, and when the model has stabilised enough that migrations can be rare, planned events.

Postgres JSONB is a hybrid. Postgres lets you have rectangular columns and a JSONB column for the variable bits. This is increasingly the most pragmatic answer for teams that want most of their data validated rigorously and a few fields evolving freely — see the PostgreSQL documentation on JSON types for the engineering trade-offs. The price is some query complexity (->>, @>, GIN indexes on JSONB paths) and slightly less efficient storage for the JSONB column, but you get to pay schema-enforcement costs only where you want them.
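A sketch of the hybrid from Python via psycopg2 (table, column, and JSONB key names are illustrative): the rectangular parts get real columns and constraints, the variable tail lives in one indexed JSONB column.

import psycopg2

conn = psycopg2.connect("dbname=payments")   # illustrative DSN
cur = conn.cursor()

# Rectangular, validated columns for the stable core; one JSONB column for the variable tail
cur.execute("""
    CREATE TABLE IF NOT EXISTS transactions (
        id           BIGSERIAL   PRIMARY KEY,
        user_id      TEXT        NOT NULL,
        amount_minor BIGINT      NOT NULL,
        currency     TEXT        NOT NULL,
        created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
        extras       JSONB       NOT NULL DEFAULT '{}'
    );
    CREATE INDEX IF NOT EXISTS transactions_extras_gin
        ON transactions USING GIN (extras);
""")
conn.commit()

# Column filters and JSONB operators mix freely in one query
cur.execute("""
    SELECT id, extras->>'campaign' AS campaign
    FROM transactions
    WHERE currency = 'INR' AND extras @> '{"channel": "upi"}'
""")
print(cur.fetchall())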

DynamoDB is a special case. It is schema-free at the database, but its access patterns are so constrained (single-table design, predetermined partition keys) that you end up imposing a tight schema in your application out of operational necessity. The lesson generalises: even when the database lets you be flexible, scale eventually forces discipline. The only question is who enforces it.

Pat Helland's classic essay If You Have Too Much Data, then "Good Enough" Is Good Enough is a useful frame for thinking about this. At small scales, schema discipline is cheap and worth it. At very large scales, perfect schema discipline becomes impossible regardless of database choice — you accept some drift and design systems that tolerate it. The interesting middle scale, where most teams live, is where the document-vs-relational choice matters most.

What we learned

A summary, for the engineer who needs to make this choice next quarter:

  1. The "no migrations" promise is real, but partial. You skip the database migration. You do not skip the schema work — it just relocates to your application code.

  2. The hidden cost compounds. Multi-version queries, inconsistent enforcement, delayed backfills, and undetected drift all grow over time. The integral of small ongoing costs eventually exceeds the spike of an upfront migration.

  3. Mitigation patterns exist and they work. $jsonSchema validators, versioned documents, code-side schema classes, periodic backfills. Use three of these four, and you recover most of what schema-on-write gave you for free.

  4. Pick based on what your team can sustain. A small team with one service can ride the document-flexibility curve for years. A 50-engineer organisation with 30 services touching the same collection cannot — they need the central schema authority a relational database provides, or they need the disciplined application-side patterns to substitute for it.

  5. Postgres JSONB is often the right answer. Most data is rectangular. Some data is variable. JSONB lets you be honest about which is which.

The next chapter — chapter 140: the aggregation pipeline — turns to MongoDB's answer to GROUP BY: composable stages that build query trees out of small declarative steps, and that have to do their work without the type information a relational planner gets for free.

References

  1. Pat Helland — If You Have Too Much Data, then "Good Enough" Is Good Enough — the canonical essay on schema discipline at scale; lossy formats, eventual consistency, and the limits of perfect data hygiene.
  2. Sarah Mei — Why You Should Never Use MongoDB — the influential 2013 critique of naive document modelling; the social-network case study that taught a generation what not to do with embedding.
  3. MongoDB documentation — Schema Validation — the canonical reference for $jsonSchema validators, validationLevel, and validationAction.
  4. MongoDB documentation — Building with Patterns: The Schema Versioning Pattern — the official treatment of versioned-document migration; one of twelve patterns in MongoDB's data-modelling guide.
  5. PostgreSQL documentation — JSON Types and JSONB — the canonical reference for JSONB storage, operators, and the GIN indexing strategy that makes Postgres a pragmatic relational+document hybrid.
  6. Martin Kleppmann — Designing Data-Intensive Applications, Chapter 2: Data Models and Query Languages — the deepest single treatment of the document-vs-relational tradeoff; the schema-on-read vs schema-on-write framing this chapter borrows.