MySQL binlog format and replication protocol

The previous chapter built a Postgres logical-decoding consumer in 30 lines. MySQL has the same primitive but a different shape: instead of an in-process plugin emitting decoded events, MySQL writes a binary file called the binlog and asks consumers to pretend they are a replica. Every CDC tool you have heard of — Debezium MySQL, AWS DMS, Maxwell, Canal at Alibaba — is doing the same thing: opening a TCP connection, sending COM_BINLOG_DUMP_GTID, and parsing the bytes that come back. This chapter writes that connection by hand.

The MySQL binlog is an append-only file of typed events — WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT, DELETE_ROWS_EVENT, plus TABLE_MAP_EVENT and QUERY_EVENT. A CDC consumer connects on the standard MySQL port, identifies as a replica, and the server streams events from a position (filename + offset, or a GTID set). The choice of binlog_format — STATEMENT, ROW, or MIXED — decides whether the events are SQL strings or before/after images. ROW is the only format that supports CDC correctly.

What the binlog actually is

MySQL has had a binlog since the 3.23 series (circa 2000). It exists for two original purposes — point-in-time recovery and replication to a replica — and one accidental purpose that the rest of the industry built an entire market on top of: change data capture. The mechanism is the same in all three cases. Every transaction that commits writes a sequence of events to a file in datadir, named mysql-bin.000001, mysql-bin.000002, and so on. Each event is a typed binary record with a header (timestamp, type, server id, length) and a body whose layout depends on the type.

The events you care about for CDC are five:

Anatomy of a binlog transaction — the events on the wire. One UPDATE transaction produces five events, in order:

  GTID_EVENT         server_uuid:42 — assigns the transaction's GTID    (65 bytes)
  QUERY_EVENT        "BEGIN" — opens the transaction                    (52 bytes)
  TABLE_MAP_EVENT    tbl_id=119, schema="razorpay", table="orders", cols=[BIGINT, VARCHAR, INT, ENUM]    (88 bytes)
  UPDATE_ROWS_EVENT  tbl_id=119, before={id=98421, status="created", amt=4999}, after={id=98421, status="paid", amt=4999}    (142 bytes)
  XID_EVENT          xid=2841 — commit; everything before this is now durable    (31 bytes)

Total wire bytes: 378; the row you actually changed accounts for 142 of them. A consumer sees these in order, parses each in turn, and emits one logical UPDATE event after the XID arrives.
Five events for one row change. The TABLE_MAP must be parsed before the UPDATE_ROWS — the row event references the table by id, not by name, and the column type metadata only lives in TABLE_MAP.

The crucial structural point: row events do not carry column names. They carry a table id and a bitmap of the columns present in this image, plus the values, encoded according to the types declared in the most recent TABLE_MAP for that id. Why this matters for CDC correctness: if the consumer drops a TABLE_MAP for any reason — a restart, a network blip, a bug — the next row event arrives uninterpretable, because the type information has been lost. Consumers must replay the binlog from the start of the transaction's containing event group, never from "the next row event". Debezium, Maxwell, and Canal all enforce this.
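The dependency can be made concrete with a toy table-map registry — a sketch of the bookkeeping every consumer carries, not any particular library's API:

```python
class TableMapRegistry:
    """Tracks the most recent TABLE_MAP per table id. Row events are
    uninterpretable without a prior map, so the registry refuses to guess."""

    def __init__(self):
        self.maps = {}

    def on_table_map(self, table_id, schema, table, col_types):
        # TABLE_MAP events overwrite any earlier mapping for this id.
        self.maps[table_id] = (schema, table, col_types)

    def resolve(self, table_id):
        if table_id not in self.maps:
            # A dropped TABLE_MAP means the type info is gone; the only
            # safe move is to replay from the start of the event group.
            raise LookupError(f"no TABLE_MAP seen for table id {table_id}; "
                              "replay the containing event group")
        return self.maps[table_id]

reg = TableMapRegistry()
reg.on_table_map(119, "razorpay", "orders", ["BIGINT", "VARCHAR", "INT", "ENUM"])
schema, table, types = reg.resolve(119)
```

Resolving an id that never had a TABLE_MAP raises instead of returning a partial answer — the same refuse-to-guess posture Debezium, Maxwell, and Canal take.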

STATEMENT, ROW, MIXED — the format choice that decides everything

binlog_format is a server-wide setting. Three values:

  1. STATEMENT — logs the SQL text of each statement. Compact, but anything non-deterministic (UUID(), RAND(), LIMIT without ORDER BY) can produce different rows when re-executed on a replica.
  2. ROW — logs before/after images of every changed row. Larger, but deterministic: the consumer applies values, not logic.
  3. MIXED — STATEMENT by default, with automatic fallback to ROW for statements the server judges unsafe.

In production, every shop running CDC at scale runs binlog_format = ROW. Razorpay, Flipkart, Dream11, BookMyShow — all of them. The cost is real: ROW-format binlogs are 5–20× larger than STATEMENT for the same workload. A high-write database that fit in 100 MB of binlog per hour under STATEMENT will produce 1–2 GB per hour under ROW. Why this trade-off is forced: the consumer needs deterministic, side-effect-free events, and STATEMENT is fundamentally non-deterministic on the consumer side, because the consumer is not re-executing the original query plan against the original data. ROW pays size for correctness, and the size cost is paid by the source's disk, not the consumer's.
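The non-determinism is easy to demonstrate in miniature. The sketch below stands in for a classic unsafe statement — DELETE … LIMIT 1 with no ORDER BY — where "the first matching row" depends on physical row order, which differs between source and replica:

```python
def apply_delete_limit1(rows):
    """Stand-in for: DELETE FROM orders WHERE status='created' LIMIT 1
    with no ORDER BY. Deletes the first matching row in physical order."""
    for i, r in enumerate(rows):
        if r["status"] == "created":
            return rows[:i] + rows[i + 1:]
    return rows

source  = [{"id": 1, "status": "created"}, {"id": 2, "status": "created"}]
replica = list(reversed(source))  # same data, different physical order

# Re-executing the statement deletes id=1 on the source but id=2 on the
# replica: silent divergence. A ROW event would instead carry the exact
# image of the deleted row, and both sides would delete the same one.
```

This is precisely the class of statement MySQL itself flags as "unsafe" and force-switches to ROW under MIXED.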

Two related settings the production checklist insists on:

  1. binlog_row_image = FULL — row events carry every column, not just the changed ones, so consumers always get complete before/after images.
  2. gtid_mode = ON (with enforce_gtid_consistency = ON) — positions survive failover; the GTID section later in this chapter makes the argument in detail.

The replication handshake — what's actually on the wire

A CDC consumer is, from the server's perspective, a replica. The handshake is the same as a real MySQL replica's:

  1. TCP connect to port 3306. Standard.
  2. Authentication handshake — same as any client. The consumer's user account needs REPLICATION SLAVE and REPLICATION CLIENT privileges. (MySQL 8.0 also has a REPLICATION_SLAVE_ADMIN dynamic privilege, but it governs replication control statements such as START REPLICA, not binlog streaming.)
  3. Server identification — the consumer sends COM_REGISTER_SLAVE declaring its server-id, which must be unique in the cluster. Two consumers sharing a server-id will collide and one will be disconnected.
  4. Dump request — the consumer sends COM_BINLOG_DUMP_GTID (modern) or COM_BINLOG_DUMP (legacy, by filename + offset). The packet carries the position the consumer wants to start from.
  5. Streaming — the server begins writing events to the socket. There is no acknowledgement protocol from the consumer back to the server for events; the TCP connection alone keeps the stream alive. If the consumer disconnects, the server stops sending; on reconnection, the consumer resumes from the next event after its last-processed GTID.
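Step 4's legacy packet is small enough to build by hand. A sketch of the COM_BINLOG_DUMP payload, per the client/server protocol's documented layout (the GTID variant, COM_BINLOG_DUMP_GTID, adds an encoded GTID set and is messier):

```python
import struct

COM_BINLOG_DUMP = 0x12  # legacy dump-by-filename+offset command byte

def build_binlog_dump(server_id, log_file, log_pos, non_block=False):
    """Payload of COM_BINLOG_DUMP:
         1 byte  command (0x12)
         4 bytes binlog position, little-endian
         2 bytes flags (0x01 = BINLOG_DUMP_NON_BLOCK)
         4 bytes this consumer's server-id
         N bytes binlog filename, to end of packet
       The 4-byte MySQL packet header (length + sequence id) is added by
       the transport layer and omitted here."""
    flags = 0x01 if non_block else 0x00
    return (struct.pack("<BIHI", COM_BINLOG_DUMP, log_pos, flags, server_id)
            + log_file.encode("ascii"))

# Resume from byte 4 (just past the magic header) of mysql-bin.000003:
payload = build_binlog_dump(9001, "mysql-bin.000003", 4)
```

Eleven fixed bytes plus the filename — the entire "please stream me the binlog" request.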

The asymmetry with Postgres is worth naming: Postgres requires the consumer to send send_feedback(flush_lsn=...) to advance the slot. MySQL has no such mechanism. The consumer's progress is its own state; if the consumer crashes mid-event, on restart it must look up its last durably-persisted GTID and ask the server to resume there. Why this matters for at-least-once delivery: between the consumer reading a row event and durably writing it to its sink, a crash means the consumer must replay from before the crash. Replays are inevitable. Every CDC consumer needs a sink that is idempotent on (server_uuid, gtid) or on (binlog_file, position) — the next chapter on dedup hashing in CDC pipelines covers the patterns.
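That idempotency contract can be sketched in a few lines, with an in-memory dict standing in for the sink's durable store (names hypothetical, not any library's API):

```python
class IdempotentSink:
    """Replay-tolerant sink sketch: applies a row event at most once per
    GTID, and persists progress only after the event is durably applied.
    A crash between apply and checkpoint causes a harmless duplicate
    apply on restart, never a lost event."""

    def __init__(self):
        self.applied = {}        # gtid -> event (stand-in for durable storage)
        self.checkpoint = None   # last durably-processed GTID

    def apply(self, gtid, event):
        if gtid in self.applied:      # replay after a crash: skip, no-op
            return False
        self.applied[gtid] = event    # 1. durable write to the sink
        self.checkpoint = gtid        # 2. only then persist progress
        return True

sink = IdempotentSink()
sink.apply("uuid:18", {"action": "INSERT", "id": 98421})
sink.apply("uuid:18", {"action": "INSERT", "id": 98421})  # replay: no-op
```

The ordering in apply() is the whole point: flipping steps 1 and 2 would turn at-least-once into at-most-once, and a crash in the gap would drop an event.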

Replica handshake — the four packets every CDC tool sends. A CDC consumer pretends to be a MySQL replica; the sequence between consumer (Debezium) and MySQL server is:

  1. TCP connect to :3306
  2. Auth (user with REPLICATION SLAVE)
  3. COM_REGISTER_SLAVE (server_id=42)
  4. COM_BINLOG_DUMP_GTID (gtid_set=...)
  → continuous binlog event stream from server to consumer; no per-event ack — TCP keeps the stream alive, and the consumer tracks its GTID itself.
The four packets above are exactly what a real MySQL replica sends. A CDC consumer differs only in what it does with the bytes after they arrive.

Build it: a binlog consumer in Python

The library python-mysql-replication (pip install mysql-replication) speaks the wire protocol. The example below opens a binlog stream and prints events as they arrive — exactly what a one-host CDC pipeline at a Bengaluru SaaS startup would have run before paying for Debezium.

import json
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent,
)
from pymysqlreplication.event import QueryEvent, GtidEvent, XidEvent

# Razorpay-shaped local dev: replication user has GRANTed
#   REPLICATION SLAVE, REPLICATION CLIENT on *.* and SELECT on razorpay.*
SOURCE = {"host": "127.0.0.1", "port": 3306,
          "user": "cdc_reader", "passwd": "dev-only"}

# server_id must be unique across the entire MySQL cluster, including
# real replicas. Picking 9001 to stay out of the way.
stream = BinLogStreamReader(
    connection_settings=SOURCE,
    server_id=9001,
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent,
                 QueryEvent, GtidEvent, XidEvent],
    only_schemas=["razorpay"],
    only_tables=["orders", "payments"],
    blocking=True,           # wait for new events; do not exit at EOF
    resume_stream=True,      # continue from the requested position, not the stream start
    auto_position=None,      # set to a GTID set string to resume by GTID
    fail_on_table_metadata_unavailable=True,  # safer than silent skip
)

last_gtid = None
events_seen = 0
try:
    for ev in stream:
        if isinstance(ev, GtidEvent):
            last_gtid = ev.gtid
            continue
        if isinstance(ev, XidEvent):
            # Commit boundary — checkpoint here in production.
            print(f"  COMMIT  xid={ev.xid:>6}  last_gtid={last_gtid}")
            continue
        if isinstance(ev, QueryEvent):
            sql = ev.query.strip()
            if sql.upper() not in ("BEGIN", "COMMIT"):
                print(f"  DDL     {sql[:80]!r}")
            continue
        # Row events — emit one logical event per row in the image.
        for row in ev.rows:
            if isinstance(ev, WriteRowsEvent):
                action, image = "INSERT", row["values"]
            elif isinstance(ev, DeleteRowsEvent):
                action, image = "DELETE", row["values"]
            else:
                action = "UPDATE"
                image = {"before": row["before_values"],
                         "after":  row["after_values"]}
            print(f"  {action:<6}  {ev.schema}.{ev.table}  "
                  f"@{ev.packet.log_pos}  {json.dumps(image, default=str)[:120]}")
            events_seen += 1
finally:
    stream.close()
    print(f"stream closed; processed {events_seen} row events; last_gtid={last_gtid}")

A representative run, after a separate session does INSERT, UPDATE, DELETE against razorpay.orders:

  INSERT  razorpay.orders  @98421  {"id": 98421, "merchant_id": "razorpay-test", "amount": 4999, "status": "created"}
  COMMIT  xid=  2841  last_gtid=3E11FA47-71CA-11E1-9E33-C80AA9429562:18
  UPDATE  razorpay.orders  @98598  {"before": {"id": 98421, "status": "created"}, "after": {"id": 98421, "status": "paid"}}
  COMMIT  xid=  2842  last_gtid=3E11FA47-71CA-11E1-9E33-C80AA9429562:19
  DELETE  razorpay.orders  @98775  {"id": 98421, "merchant_id": "razorpay-test", "amount": 4999, "status": "paid"}
  COMMIT  xid=  2843  last_gtid=3E11FA47-71CA-11E1-9E33-C80AA9429562:20
^C
stream closed; processed 3 row events; last_gtid=3E11FA47-71CA-11E1-9E33-C80AA9429562:20

A walkthrough of the lines that matter:

  1. server_id=9001 — must be unique across the whole cluster, real replicas included (step 3 of the handshake); a clash disconnects one of the two consumers.
  2. only_events / only_schemas / only_tables — the filtering is client-side. The server streams the full binlog regardless, so a noisy neighbouring table still costs network bytes.
  3. The GtidEvent arrives before the row events it governs; the consumer records it and attaches it to the logical events at commit.
  4. The XidEvent is the commit boundary and the only safe checkpoint — resuming mid-transaction after a crash would split an event group.

The code is ~50 lines. Debezium MySQL is ~400,000 lines. The difference is everything around the parsing — schema history (so the consumer can reconstruct the table layout at any point in the past, even after later DDL), reconnection / retry, Kafka transactional sinks, snapshot bootstrap. The wire protocol itself is small.

Operating MySQL CDC at scale — what breaks

The textbook issues are well-documented. The three that actually page on-call at Indian shops are:

1. Schema change (DDL) without coordination. A backend engineer at Cred runs ALTER TABLE rewards ADD COLUMN voucher_id BIGINT at 3 a.m. The CDC consumer parses the QUERY_EVENT carrying the SQL string and either understands the change (Debezium with schema history enabled) or silently produces wrong-shaped events for every subsequent row write. Mitigation: enable schema history, and have a contract that DDL goes through a process that notifies the CDC team. Several Bengaluru shops keep a migrations Slack channel that is mirrored to a dashboard the data platform team watches.

2. Long-running transactions blocking checkpoint. MySQL buffers a transaction's events in the binlog cache and writes them to the binlog only at commit, so a 30-minute transaction surfaces as one large, contiguous event group. A correct consumer cannot checkpoint inside that group — no XID has arrived — which means a crash mid-group replays the entire transaction on restart, and the transaction's changes reach the sink no sooner than 30 minutes after they were made. Postgres logical decoding shares the commit-order-emit problem, but Postgres 14+ added streaming = on to emit in-progress transactions; MySQL has no equivalent. Long transactions on a CDC source are a hard latency floor. The fix is application-side: keep transactions short on tables the pipeline captures.

3. Binlog purge running ahead of the consumer. MySQL deletes binlog files based on expire_logs_days (deprecated) or binlog_expire_logs_seconds (8.0+), or an explicit PURGE BINARY LOGS. If the consumer falls behind by more than the retention window, the binlog files it needs are gone, and the consumer must rebootstrap from a fresh snapshot of every table — a 10 crore-row snapshot on a payments database can take 6 hours. The standard production setting is binlog_expire_logs_seconds = 604800 (7 days), and an alert that fires when consumer lag exceeds 24 hours, well before the 7-day cliff.
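The checkpoint constraint in item 2 — emit and checkpoint only at the XID — can be seen in a toy consumer over synthetic (kind, payload) tuples; a sketch, not the library's API:

```python
def consume(events):
    """Toy commit-boundary consumer. Row events are buffered; only an
    XID flushes them and advances the checkpoint. A crash before the
    XID loses the buffer, so the whole transaction replays on restart."""
    emitted, checkpoint, buffer, gtid = [], None, [], None
    for kind, payload in events:
        if kind == "GTID":
            gtid = payload                # remember the transaction's id
        elif kind == "ROW":
            buffer.append(payload)        # cannot checkpoint here
        elif kind == "XID":
            emitted.extend(buffer)        # transaction is now durable
            buffer, checkpoint = [], gtid # safe to checkpoint
    return emitted, checkpoint

# A transaction that never commits contributes nothing to the checkpoint:
done, ckpt = consume([("GTID", "uuid:1"), ("ROW", 1), ("ROW", 2), ("XID", 7),
                      ("GTID", "uuid:2"), ("ROW", 3)])  # second txn still open
```

The second transaction's row is buffered but never emitted, and the checkpoint stays at uuid:1 — the in-miniature version of the 30-minute latency floor.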

A monitoring query for replica lag against the source's current position:

-- Run on the source. Compare against the consumer's last-processed GTID.
SELECT @@global.gtid_executed AS source_gtid_set,
       @@global.binlog_expire_logs_seconds / 3600.0 AS retention_hours;

-- Approximate disk pressure: list binlog files and sizes; sum File_size
-- client-side (binlog files do not appear in information_schema.files).
SHOW BINARY LOGS;

A consumer one binlog file behind is fine. A consumer 100 MB of binlog behind needs investigation. A consumer 1 GB behind on a 10 GB-per-day database is an active incident.
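The 24-hour alert from item 3 is simple arithmetic; a sketch of the headroom calculation, with the thresholds from the text (function names hypothetical):

```python
def retention_headroom_hours(consumer_lag_seconds, binlog_expire_logs_seconds):
    """Hours of retention left before the consumer's next-needed binlog
    file becomes eligible for purge."""
    return (binlog_expire_logs_seconds - consumer_lag_seconds) / 3600.0

def should_page(consumer_lag_seconds, alert_after_seconds=24 * 3600):
    """Page at 24 h of lag — well before the 7-day purge cliff."""
    return consumer_lag_seconds >= alert_after_seconds

# 7-day retention, consumer 24 h behind: 144 h of headroom, and paging.
headroom = retention_headroom_hours(24 * 3600, 604800)
```

The gap between "page" (24 h) and "data loss" (168 h) is deliberate: it leaves six days to fix the consumer before the rebootstrap-from-snapshot cliff.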

Common confusions

Going deeper

How the binlog file format actually frames events

Every event on disk is a [19-byte header][variable-length body][optional 4-byte checksum]. The header carries: 4-byte timestamp (seconds since epoch), 1-byte event type, 4-byte server-id, 4-byte total event length, 4-byte next-event-position, 2-byte flags. The body varies by type. The checksum (CRC32, since 5.6) is enabled by binlog_checksum = CRC32 and detects corruption from disk-level errors — rare but high-impact when it happens. Disabling it is one of those "do not do this" foot-guns; the saved bytes are not worth the silent corruption risk.
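The 19-byte header unpacks in a few lines; a sketch against a synthetic event, using the field layout just described (all integers little-endian):

```python
import struct
from collections import namedtuple

Header = namedtuple("Header", "timestamp type_code server_id event_len next_pos flags")

def parse_header(raw):
    """[4 timestamp][1 type][4 server-id][4 event length][4 next position][2 flags]"""
    return Header(*struct.unpack("<IBIIIH", raw[:19]))

# Synthetic XID_EVENT header (type code 16), 31 bytes long, from server-id 1,
# with the next event starting at byte 98775 of the file:
raw = struct.pack("<IBIIIH", 1_700_000_000, 16, 1, 31, 98_775, 0)
h = parse_header(raw)
```

next_pos is what makes the file skippable without parsing bodies: a reader can hop header-to-header, which is exactly how mysqlbinlog seeks to a --start-position.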

TABLE_MAP and the column-type encoding

TABLE_MAP carries column types as a byte string in MySQL's internal type-id format (MYSQL_TYPE_LONG = 3, MYSQL_TYPE_VARCHAR = 15, etc.). For variable-length types, additional metadata (max length for VARCHAR, precision/scale for DECIMAL, character set) is packed after the type bytes. The row events then encode values using these types — e.g. a VARCHAR is length-prefixed, an INT is little-endian fixed-width, a DATETIME is a packed bit format. A correct parser must mirror MySQL's serialisation logic exactly, which is why writing a binlog parser from scratch is a 6-month project; using python-mysql-replication or Debezium is what every production team does.
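Two of the simplest encodings can be decoded by hand. A sketch for a little-endian MYSQL_TYPE_LONG and a length-prefixed VARCHAR — real VARCHAR prefixes are one or two bytes depending on the declared max length from TABLE_MAP metadata; this sketch assumes the one-byte case:

```python
import struct

def decode_long(buf, offset):
    """MYSQL_TYPE_LONG (type id 3): 4-byte signed little-endian INT."""
    return struct.unpack_from("<i", buf, offset)[0], offset + 4

def decode_varchar1(buf, offset):
    """VARCHAR with declared max length < 256: 1-byte length prefix,
    then that many bytes of string data."""
    n = buf[offset]
    return buf[offset + 1:offset + 1 + n].decode("utf8"), offset + 1 + n

# A synthetic two-column row image: INT 4999, then VARCHAR "paid".
img = struct.pack("<i", 4999) + bytes([4]) + b"paid"
amount, pos = decode_long(img, 0)
status, pos = decode_varchar1(img, pos)
```

Each decoder returns the value and the next offset — the threading that a full row parser repeats for every column the presence bitmap marks as present, in TABLE_MAP order.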

GTID vs file+position — the failover argument in detail

Razorpay published an architecture note in 2023 (referenced below) describing a primary-replica failover where the file+position-based CDC consumer pointed at a binlog file that didn't exist on the new primary. The consumer treated this as an invalid position and rebootstrapped from a snapshot — which took 4 hours, during which 4 hours of payment events accumulated unprocessed in Kafka. After the incident they migrated to GTID auto-position. The lesson is generic: any production MySQL ≥ 5.6.5 with replicas should run gtid_mode = ON, and any CDC consumer against it should use auto_position. The cost is one config change; the avoided cost is one outage.
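Auto-position works because GTID sets turn "have I already applied this transaction?" into set membership. A minimal sketch of parsing a single-UUID set like server_uuid:1-20 and testing membership — real gtid_executed values can carry many comma-separated UUIDs, which this sketch deliberately skips:

```python
def parse_gtid_set(s):
    """'uuid:1-20:25' -> ('uuid', [(1, 20), (25, 25)])"""
    uuid, *intervals = s.split(":")
    out = []
    for part in intervals:
        lo, _, hi = part.partition("-")
        out.append((int(lo), int(hi or lo)))   # bare number => 1-txn interval
    return uuid.lower(), out

def contains(gtid_set, gtid):
    """True if the single transaction `gtid` ('uuid:n') is in `gtid_set`."""
    set_uuid, intervals = parse_gtid_set(gtid_set)
    txn_uuid, n = gtid.rsplit(":", 1)
    if txn_uuid.lower() != set_uuid:
        return False
    return any(lo <= int(n) <= hi for lo, hi in intervals)

executed = "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-20"
```

On reconnect after failover, the server computes the complement of the consumer's sent set against its own gtid_executed and streams exactly the missing transactions — no filename involved, which is why the file-that-doesn't-exist failure mode disappears.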

Multi-source replication and CDC at SaaS scale

A common pattern at Bengaluru SaaS shops with one MySQL per tenant: the analytics warehouse needs CDC from 200 tenants. Running 200 separate consumers is operationally expensive; multi-source replication (MySQL 5.7.6+) lets one MySQL replica subscribe to many sources. The CDC consumer then reads from the aggregating replica's binlog instead of from each tenant. The trade-off is that the aggregator becomes a single point of failure and a per-tenant lag amplifier — a slow tenant slows everyone. Most teams settle on per-shard CDC with an automation layer rather than per-tenant; the right granularity is the same one your application sharding chose.

What changes between MySQL 5.7, 8.0, and 8.4

5.7 introduced binlog_row_image and made GTID production-ready. 8.0 added invisible columns (must be handled in TABLE_MAP parsing), JSON column improvements, and binlog_transaction_compression (8.0.20+) which can reduce binlog size 70% on text-heavy workloads. 8.4 (LTS, 2024) made GTID the default and removed several legacy replication options. For new CDC pipelines in 2026, target 8.0.34 or 8.4.0 minimum; avoid 5.7 because its end-of-life passed in October 2023.

Where this leads next

The next chapter, /wiki/snapshot-cdc-the-bootstrapping-problem, pairs binlog streaming with a consistent snapshot of every table at the binlog start position, so the consumer can join an existing source rather than starting from "everything that committed today". After that, /wiki/debezium-as-a-cdc-platform is the production wrapper around both Postgres and MySQL CDC primitives.

Two crosslinks worth noting now:

By the end of Build 11, MySQL and Postgres are both first-class CDC sources. Whichever your application database is, the pipeline pattern is the same: identify as a replica, stream events, deduplicate downstream, persist position before sink commit.

References