Schema evolution across the CDC boundary

A Razorpay backend engineer ships a migration on Tuesday at 3 p.m. that adds a gst_invoice column to public.payments. The migration runs in production in 40 milliseconds — a clean ALTER TABLE. Nothing pages. The engineer goes home. By Thursday morning the analytics team finds that the Iceberg payments table in the lakehouse hasn't received a new row in 38 hours, the Snowflake mirror is showing rows where gst_invoice is silently missing, and a downstream Flink job for fraud detection has been crash-looping. None of those three breakages is a bug in the database. They are all the same bug: a schema change crossed the CDC boundary and the pipeline didn't know what to do with it.

A CDC pipeline has three places where schema lives — the source database, the change-event payload, and the sink table — and a schema change in the source races against all three. Every CDC system either freezes the schema (rejects incompatible changes), evolves it forward (adds columns, never removes), or breaks. Production-grade pipelines pin the source-of-truth in a schema registry, classify every change as backward-compatible or breaking, and route incompatible changes through a manual upgrade gate.

The three places schema lives

In a non-CDC world, schema lives in one place — the database that owns the table. The application reads INFORMATION_SCHEMA, the ORM regenerates types, life is fine. CDC adds two more places where the same schema must agree, and those places do not update atomically.

Three schema locations and how they drift
[Diagram: where schema lives in a CDC pipeline. Left, the source database (Postgres / MySQL / Mongo), whose payments table (id bigint, amount numeric, status text, created_at timestamptz) gains gst_invoice atomically with the ALTER. Middle, the change-event schema in the registry (Avro subject razorpay.public.payments-value, v1 {id, amount, status, created_at}, v2 adds gst_invoice after Debezium decodes the WAL); the race window here is milliseconds to seconds and depends on the next write. Right, the sink table (Iceberg / Snowflake / BigQuery analytics.payments), which has no gst_invoice yet and updates only on its next merge.]
Three places schema lives. The source updates instantly. The registry updates only after a row is written to the changed table (Debezium learns the new schema from the next decoded event). The sink updates only when its writer next runs a MERGE. Between them sits a race window measured in seconds to hours.

The source database is the only place that updates atomically with the ALTER. Everything downstream is eventually consistent, which on a quiet table can mean eventually is hours away. This is fundamental, not solvable by being clever: a CDC connector cannot decode WAL or binlog entries that reference a column the connector hasn't yet learned about. Debezium's Postgres connector maintains an in-memory schema cache built from the relation-metadata messages that logical decoding emits ahead of each table's row events (Postgres does not expose DDL through logical decoding; MySQL's binlog, by contrast, carries the ALTER statements themselves). If no row is written after the ALTER, the connector never decodes a WAL entry that mentions the new column, so the registry is not updated. The window is not a bug — it's the geometry of replication-log decoding.

The three locations need to agree on three things: the set of columns, the type of each column, and the nullability of each column. The taxonomy of schema changes follows directly: you can change the set, you can change the types, you can change nullability, and any combination of these.
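Those three dimensions are mechanical enough to check in code. A minimal sketch, assuming schemas are lists of field dicts with hypothetical name/type/nullable keys (this is not a registry API, just the comparison the text describes):

```python
# Classify what changed between two schema versions along the three dimensions
# the pipeline cares about: column set, column types, nullability.

def diff_schemas(old, new):
    old_by = {f["name"]: f for f in old}
    new_by = {f["name"]: f for f in new}
    shared = old_by.keys() & new_by.keys()
    return {
        "added":      sorted(set(new_by) - set(old_by)),
        "removed":    sorted(set(old_by) - set(new_by)),
        "retyped":    sorted(n for n in shared
                             if old_by[n]["type"] != new_by[n]["type"]),
        "renullabled": sorted(n for n in shared
                              if old_by[n]["nullable"] != new_by[n]["nullable"]),
    }

v1 = [{"name": "id", "type": "long", "nullable": False},
      {"name": "status", "type": "string", "nullable": False}]
v2 = [{"name": "id", "type": "long", "nullable": False},
      {"name": "status", "type": "string", "nullable": True},
      {"name": "gst_invoice", "type": "string", "nullable": True}]
print(diff_schemas(v1, v2))
# {'added': ['gst_invoice'], 'removed': [], 'retyped': [], 'renullabled': ['status']}
```

Every entry in the taxonomy below is some combination of these four buckets.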

The compatibility taxonomy

Not every schema change is created equal. The schema-registry world (Confluent's, Apicurio's, AWS Glue's — they all converge on the same model) classifies changes into compatibility modes that determine whether a change can flow through a CDC pipeline without breaking consumers.

Change                          Backward   Forward   Full
Add column with a default       OK         OK        OK
Add column without default      Break      OK        Break
Drop column                     OK         Break     Break
Rename column                   Break      Break     Break
Widen type (int32 → int64)      OK         Break     Break
Narrow type (int64 → int32)     Break      OK        Break
Change NOT NULL to nullable     OK         Break     Break
Change nullable to NOT NULL     Break      OK        Break

Backward compatible means new readers can read old data. This is the default in most schema registries: an Iceberg table whose schema added gst_invoice last week can still read parquet files written before the column existed (those rows just get NULL). Forward compatible means old readers can read new data — a frozen Flink job with last week's schema can still parse a payload that has the new column (it ignores the unknown field). Full compatibility is both — required when a pipeline has heterogeneous consumer versions running simultaneously.
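The taxonomy is small enough to encode as a lookup, scored the way Avro-style schema resolution scores it: a defaulted add is the only change safe in every mode, a bare drop is backward-compatible but breaks forward readers. The change names here are my own labels, not registry constants:

```python
# (backward_ok, forward_ok) per change; FULL requires both.
COMPAT = {
    "add_column_with_default": (True,  True),
    "add_column_no_default":   (False, True),
    "drop_column":             (True,  False),
    "rename_column":           (False, False),
    "widen_type":              (True,  False),
    "narrow_type":             (False, True),
    "not_null_to_nullable":    (True,  False),
    "nullable_to_not_null":    (False, True),
}

def allowed(change, mode):
    backward, forward = COMPAT[change]
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    raise ValueError(f"unknown mode {mode}")

print(allowed("add_column_with_default", "FULL"))  # True
print(allowed("drop_column", "FORWARD"))           # False
```

Note the symmetry: every row except the defaulted add and the rename is compatible in exactly one direction, which is why heterogeneous consumers force you into FULL.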

The one change that breaks every mode is the rename. Drop column is sneakier: a BACKWARD-only registry accepts it (new readers simply no longer have the field), yet every downstream query built on the dropped column dies anyway. Renames are particularly nasty because to the database they are atomic — ALTER TABLE payments RENAME COLUMN status TO state is one DDL statement — but to the CDC system they look like a drop-and-add, and worse, the connector emits the new column with all-NULL priors because it has never seen state before. This is why renames are the biggest source of "the pipeline silently lies" incidents: the connector cannot tell semantically that the new column's data should come from the old column. It treats them as unrelated. Snowflake, Iceberg, and Delta all silently accept the new schema. Then dashboards built on status quietly stop receiving rows. The bug surfaces when someone notices "wait, why has status been NULL for everyone for the past three days".
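A pipeline cannot undo a rename automatically, but it can flag the telltale drop-plus-add pattern before dashboards go quiet. A hedged heuristic sketch (the field shape and function are hypothetical, not a connector API):

```python
# A rename reaches CDC as a simultaneous drop + add. Flag any drop and add of
# the same type in one schema diff as a suspected rename and page a human.

def suspected_renames(old_fields, new_fields):
    old = {f["name"]: f["type"] for f in old_fields}
    new = {f["name"]: f["type"] for f in new_fields}
    dropped = {n: t for n, t in old.items() if n not in new}
    added = {n: t for n, t in new.items() if n not in old}
    return [(d, a) for d, dt in dropped.items()
                   for a, at in added.items() if dt == at]

v1 = [{"name": "id", "type": "long"}, {"name": "status", "type": "string"}]
v2 = [{"name": "id", "type": "long"}, {"name": "state", "type": "string"}]
print(suspected_renames(v1, v2))  # [('status', 'state')]
```

False positives are fine here; the cost of a spurious alert is minutes, the cost of a silent rename is the three-day NULL incident above.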

The four strategies a CDC pipeline picks from

Once you accept that schema changes happen and the boundary is racy, the question becomes: how does the pipeline respond? Every production CDC system in 2026 picks one of four strategies, often a different strategy per sink.

Strategy 1 — Freeze. Reject any schema change at the registry boundary. The producer side runs with auto-registration disabled (auto.register.schemas=false in Confluent's serializers, or the subject locked read-only), so the first unregistered schema causes Debezium (or the upstream producer) to fail loudly: schema not found, cannot be registered. The DBA's ALTER TABLE is not blocked at the source — the source is still happily mutating — but the CDC connector enters a failed state. This forces a human to intervene. In a high-trust org with strong DBA-data-engineer collaboration, this is the right strategy: it makes schema change a planned event, not a surprise. Stripe famously runs this way.

Strategy 2 — Auto-evolve forward only. The pipeline accepts add-column changes and refuses everything else. This is what schema.compatibility = FULL plus a sink writer like Iceberg's merge-schema=true gives you: defaulted adds pass in both directions, while drops, renames, and type narrowings trigger a hard failure at the registry. New columns appear in the sink as NULL-filled extensions; old data remains queryable. This is the default for the Razorpay/Flipkart/Swiggy class of org — most schema changes in their experience are additive (new feature ships → new column).
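Strategy 2 reduces to a small predicate. A sketch assuming Avro-style field dicts like the demo script below uses; gate_add_only is a hypothetical helper, not a Debezium or registry API:

```python
# Accept a new schema only if every change is an added, defaulted column.
# Anything else raises, which in production parks the connector and pages.

def gate_add_only(old_fields, new_fields):
    old_by = {f["name"]: f for f in old_fields}
    new_by = {f["name"]: f for f in new_fields}
    dropped = set(old_by) - set(new_by)
    if dropped:
        raise RuntimeError(f"dropped/renamed columns {sorted(dropped)}: manual cutover")
    for name, old_f in old_by.items():
        if old_f["type"] != new_by[name]["type"]:
            raise RuntimeError(f"type changed on {name}: manual cutover")
    for name in set(new_by) - set(old_by):
        if "default" not in new_by[name]:
            raise RuntimeError(f"new column {name} has no default: not auto-evolvable")
    return True

v1 = [{"name": "id", "type": "long"}, {"name": "status", "type": "string"}]
v2 = v1 + [{"name": "gst_invoice", "type": ["null", "string"], "default": None}]
print(gate_add_only(v1, v2))  # True: safe to auto-evolve the sink
```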

Strategy 3 — Replay through a transformation. Instead of evolving the sink, ship every row through a translation layer that maps the source schema to a stable contract schema. The contract sits in a separate registry; the source can churn but the contract evolves on its own (much slower) cadence. This is what data contracts (chapter 33) and the dbt source layer do. The cost is one more hop and the discipline of maintaining the contract; the benefit is that 90% of schema changes never surface to consumers.
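The contract layer is also simple at its core: an allow-list of fields plus an explicit alias map that absorbs source renames. All names here are hypothetical, a sketch of the translation hop rather than any particular contract tool:

```python
# Map a churning source event onto a stable contract schema. The contract names
# the fields consumers may rely on; everything else is dropped, and a source
# rename (status -> state) is absorbed by the alias map instead of leaking out.

CONTRACT_FIELDS = ["id", "amount", "status", "created_at"]
ALIASES = {"state": "status"}  # source renamed the column; the contract did not

def to_contract(source_event):
    renamed = {ALIASES.get(k, k): v for k, v in source_event.items()}
    return {f: renamed.get(f) for f in CONTRACT_FIELDS}

print(to_contract({"id": 1, "amount": 49900, "state": "captured",
                   "created_at": 1700000000, "gst_invoice": "GST-42"}))
# {'id': 1, 'amount': 49900, 'status': 'captured', 'created_at': 1700000000}
```

The alias map is the human-maintained piece: it is exactly the semantic knowledge ("state used to be status") that no connector can infer.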

Strategy 4 — Break, alert, fix. No automatic evolution at all. Every schema change requires a manual cutover: stop the connector, stop downstream consumers, update sink schema, restart everything. Used by orgs that run CDC for compliance/audit (every change must be approved by a human) or by teams whose data lake schema is itself a compliance artefact subject to quarterly review. Slow, careful, expensive — but the schema state is always known.

Four CDC schema-evolution strategies on a tradeoff plane
[Diagram: the strategy plane. X-axis: engineering rigour required; y-axis: surprise breakage. Break & fix plots at low rigour and high breakage; Auto-evolve (add-only, break on the rest) sits in the middle; Freeze (connector dies on incompatible change) and the Contract layer (stable downstream, extra hop) plot at higher rigour and lower breakage.]
Four strategies. Most production pipelines mix two: auto-evolve for the lakehouse sink (cheap, additive changes are the common case), contract layer for the metric store and serving systems (stability matters more than freshness).

Build it: a Debezium + Schema Registry pipeline that survives an ALTER TABLE

Let's wire the moving parts together. A small Python script that talks to Confluent's Schema Registry, registers a baseline schema, simulates a Debezium-style update with an added column, and shows what the registry does at each compatibility setting.

# schema_evolution_demo.py — exercise the schema registry boundary that sits
# between a Debezium connector emitting Razorpay payments events and the Iceberg
# sink that consumes them. Uses confluent-kafka-python's SchemaRegistryClient.

import json
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

sr = SchemaRegistryClient({"url": "http://schema-registry:8081"})
subject = "razorpay.public.payments-value"

# Set the compatibility mode for this subject. FULL means a new schema must be
# readable by old consumers and vice versa. Plain BACKWARD would actually let a
# drop-column through (deleting fields is backward compatible), so it is not a
# strong enough gate for the demo below.
sr.set_compatibility(level="FULL", subject_name=subject)

# v1: the schema as it stands on Tuesday morning. Fields match the Postgres table
# the Debezium connector is mirroring.
v1_schema = {
    "type": "record", "name": "Payment", "namespace": "razorpay.public",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "long"},                 # amount in paise
        {"name": "status", "type": "string"},               # captured / failed / refunded
        {"name": "created_at", "type": "long"},             # epoch micros
    ]
}
v1_id = sr.register_schema(subject, Schema(json.dumps(v1_schema), "AVRO"))
print(f"v1 registered as schema id {v1_id}")

# Tuesday 3 p.m. — DBA runs ALTER TABLE payments ADD COLUMN gst_invoice TEXT;
# The next write produces a Debezium event with the new field. Connector
# computes a new Avro schema and asks the registry to register it.

# CASE A — compatible in both directions: the new field carries a default. Old
# consumers reading v2 events ignore the unknown field; new consumers reading
# old v1 events fill in the default for the missing field.
v2_compatible = {
    "type": "record", "name": "Payment", "namespace": "razorpay.public",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "long"},
        {"name": "status", "type": "string"},
        {"name": "created_at", "type": "long"},
        {"name": "gst_invoice", "type": ["null", "string"], "default": None},
    ]
}
v2_id = sr.register_schema(subject, Schema(json.dumps(v2_compatible), "AVRO"))
print(f"v2 (add-with-default) registered as schema id {v2_id}")

# CASE B — incompatible: drop a required column. Old readers still expect
# status and there is no default to fill it, so the registry rejects.
v3_breaking = {
    "type": "record", "name": "Payment", "namespace": "razorpay.public",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "long"},
        # status removed — incompatible because old readers expect it
        {"name": "created_at", "type": "long"},
    ]
}
try:
    sr.register_schema(subject, Schema(json.dumps(v3_breaking), "AVRO"))
except Exception as e:
    print(f"v3 (drop-column) REJECTED: {e}")

A representative run:

v1 registered as schema id 4471
v2 (add-with-default) registered as schema id 4472
v3 (drop-column) REJECTED: Schema being registered is incompatible
  with an earlier schema for subject "razorpay.public.payments-value", details:
  [{errorType:'READER_FIELD_MISSING_DEFAULT_VALUE', description:'The field
  status at path /fields/2 in the old schema has no default value and is
  missing in the new schema'}]

Walking through the lines that earn their place: the set_compatibility call pins the subject's mode before anything registers, so the gate exists before the first event flows. The "default": None on the union type is what makes CASE A pass — without an explicit default, even an add is rejected. And the try/except around CASE B is the gate itself; in production that except branch fires an alert and parks the connector instead of printing.

The script is ~50 lines. The same structure scales to a real pipeline: replace the in-script schema with the schema Debezium emits, replace the print statements with metrics, and you have the schema-evolution gate of a production CDC pipeline.

What sinks actually do when the schema changes

The registry decision is half the story. The other half is what the sink writer does with a new schema. Different sinks have radically different defaults.

The mismatch between source and sink schema-change semantics is where most "subtle drift" bugs live. A Postgres source supports RENAME COLUMN. The Iceberg sink supports RENAME COLUMN (it is metadata only). But the path between them — Debezium → Kafka → Iceberg writer — has no concept of "rename"; it sees a drop and an add. So a rename on the source becomes a drop and an add on the sink. Reports built on the old name return zeros.

Common confusions

Going deeper

The schema-history topic and decoding old events

Debezium's per-connector schema-history topic stores every DDL change keyed by LSN (or binlog position). When a consumer needs to decode a six-month-old event from the change-events topic, it reads the schema id from the first bytes of the message payload, fetches the full Avro schema from the registry, and decodes. This works as long as the registry has retained that schema version — which it does forever by default. The schema-history topic, in contrast, exists for the connector itself to rebuild its in-memory state on restart; consumers never read it. The two are often confused. Read /wiki/debeziums-architecture for where each lives.
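The payload framing is worth seeing concretely. With Confluent's Avro serializer, every record starts with one magic byte (0x00) and a four-byte big-endian schema id; a consumer peels those five bytes off before it touches the registry. A minimal sketch of that wire format:

```python
# Parse the Confluent wire format: magic byte + 4-byte schema id + Avro payload.
import struct

def parse_wire_format(raw: bytes):
    magic, schema_id = struct.unpack(">bI", raw[:5])
    if magic != 0:
        raise ValueError("not Confluent wire format")
    # Look schema_id up in the registry, then decode raw[5:] with that schema.
    return schema_id, raw[5:]

frame = b"\x00" + struct.pack(">I", 4472) + b"<avro payload>"
print(parse_wire_format(frame)[0])  # 4472
```

This is why a six-month-old message stays decodable: the id travels with every record, and the registry never forgets a version.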

How Razorpay rolls schema changes in production

The pattern most Indian payments-scale orgs converge on, learned from outages: all schema changes are rolled out in stages. Step 1, the backend team adds the new column with a default; the migration ships during a low-traffic window. Step 2, the data-platform team validates that the registry has accepted the v2 schema and the sink has evolved (this is automatic for Iceberg, manual for ClickHouse). Step 3, the backend team starts writing data to the new column. Step 4, downstream consumers are notified via the data-contract registry that v2 is available. The whole rollout takes 7–10 days for a column that the backend could have shipped on Tuesday afternoon. The friction is intentional — it's what prevents the Thursday-morning fire-drill.

Schema evolution under exactly-once with transactional sinks

Build 9's exactly-once story interacts with schema evolution in a subtle way. A transactional Kafka sink commits a batch atomically — but what if half the batch is encoded under schema v1 and half under v2 (because the schema rotated mid-batch)? The Kafka transactional protocol handles this fine because Avro's schema-id is per-record, not per-batch. But a Flink job's checkpoint that covers a state migration across the schema change is harder: the operator state has to know how to upgrade itself when it reads a v2 record using v1's deserialiser. Flink's TypeSerializerSnapshot is the mechanism; it is one of the gnarliest parts of Flink to get right. See /wiki/checkpointing-the-consistent-snapshot-algorithm.

Why most production teams pin the Debezium schema-name

By default, Debezium derives the Avro schema name from the source table — razorpay.public.Payment. If the source table is renamed (say, payments → txns), the schema name changes and the registry treats it as a brand-new subject. Every downstream consumer breaks. The mitigation is transforms.unwrap.add.headers plus an explicit schema.name.adjustment.mode=avro and a fixed topic.naming.strategy. Boring but load-bearing config.
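What that pinning looks like in practice, sketched as the JSON body you would POST to Kafka Connect. The key names are Debezium's; hosts and names are placeholders, and the available options vary by Debezium version:

```python
# A hedged sketch of a pinned Debezium Postgres connector config.
import json

connector_config = {
    "name": "razorpay-payments-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "topic.prefix": "razorpay",                 # fixed topic namespace
        "table.include.list": "public.payments",
        # Pin Avro-legal schema names so a table rename can't silently mint
        # a brand-new registry subject.
        "schema.name.adjustment.mode": "avro",
        # Flatten the Debezium envelope; keep op + source timestamp as headers.
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "transforms.unwrap.add.headers": "op,source.ts_ms",
    },
}
print(json.dumps(connector_config, indent=2))
```

Treat this as a starting point to diff against your connector's reference docs, not a drop-in config.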

Where this leads next

The next chapter, /wiki/snapshot-cdc-the-bootstrapping-problem, takes the snapshot phase from a different angle: not "how does Debezium do it" but "what does a sink need to receive to safely apply both snapshot rows and streaming rows without double-counting". Schema evolution interacts with bootstrapping in one specific way — the snapshot reads under the current schema, but the streaming events from before the snapshot (if you bootstrap from a position with WAL retention) are encoded under the historical schema. The registry holds both.

After that, Build 12 picks up the lakehouse story: /wiki/iceberg-internals-snapshot-isolation-on-s3 shows what schema evolution looks like inside Iceberg's metadata tree — every schema is a versioned manifest, every snapshot points to a specific schema-id.

For the contract-layer strategy, /wiki/data-contracts-the-producer-consumer-boundary has the full design: how to define a contract that's stable even when the source schema churns daily.

References