In short

A relational table is a rectangle. Every row has the same columns, the same types, the same width. That assumption breaks the moment your data is naturally a tree — an Order with a variable number of line items, a Customer with optional loyalty-tier metadata, a Product whose attributes depend on its category. Document databases (MongoDB, Couchbase, AWS DocumentDB, Azure Cosmos DB) drop the rectangle and let each record carry its own structure. The on-disk and on-the-wire format that makes this practical at scale is BSON — Binary JSON.

BSON looks like JSON when you print it ({"name": "Riya", "age": 25}) but is encoded as a length-prefixed binary stream of (type_code, key, value) tuples. The length prefix lets a parser skip subtrees it does not care about without reading their bytes. The type tags add what JSON lacks — native ObjectId, Date, Decimal128, Binary, Int32 vs Int64 — so you do not lose precision parsing 123456789012345678 into a JavaScript double. Field names live inside every document, which is a real cost (10 GB of {"price": ...} carries 600 MB of "price" strings), but in exchange you get per-document schema freedom: doc 1 can have fields A, B, C; doc 2 can have A, C, D, E; doc 3 can have just B — and they all coexist in the same collection with no NULL columns, no migration, no ALTER TABLE.

This chapter opens Build 17 by walking the BSON wire format byte for byte, encoding and decoding a real document with the Python bson library, and modelling an Indian e-commerce catalogue where shoes (size, colour, sole-type), electronics (voltage, brand, warranty), and stationery (pack-size) sit in one collection without a schema. By the end you will know exactly why MongoDB picked binary over text, what each type tag costs in bytes, and where the trade-off — flexibility for storage, simplicity for query-planner intelligence — bites in production.

The rectangle problem

A row store like Postgres or MySQL is built on one assumption: a table has a fixed schema, and every row is a fixed-width (or fixed-shape) tuple of those columns. That assumption underwrites almost every optimisation in the storage engine — page layout, B-tree key encoding, statistics, query plans. It is also wrong about most real data.

Consider an Indian e-commerce catalogue — call it Bharat Bazaar. The product table needs to hold footwear with a size, colour, and sole type; electronics with a voltage, brand, and warranty; and stationery with little more than a pack size.

A relational designer has three bad options:

  1. One wide table with every possible attribute as a nullable column. Most cells are NULL. Adding a new product category means ALTER TABLE (which on a billion-row table can take hours and a downtime window).
  2. Sub-tables per category with a join (products + product_shoes, product_electronics, product_stationery). Adding a category means a new table and an update to every query path that selects products. Cross-category queries become UNION ALLs of N tables.
  3. EAV (Entity–Attribute–Value) — one giant (product_id, attribute_name, attribute_value) table. Loses all type safety, kills query-planner statistics, and turns every product fetch into a self-join.

Each option pays for the rectangle in a different currency: storage, schema-change pain, or query complexity. Why none of them feels right: the data simply is not rectangular. Forcing it into rows is an impedance mismatch — the application thinks in objects (a Product is a tree of fields, some optional), the database thinks in tuples (a row is a flat record of typed columns), and the gap is bridged by ORM code that nobody enjoys writing.

A document database starts from the other end. Each record is an arbitrary JSON-shaped tree, and the database stores it as-is. There is no schema; there are no NULL placeholders for fields a document does not have; there is no migration when a new field appears. The trade-off is that the database now knows almost nothing about the shape of your data, which has real costs we will get to in chapter 139. Today's chapter is about the encoding that makes this practical.

Why not just store JSON?

If you are going to store schema-free trees, why not just INSERT INTO products(doc) VALUES('{"name": ...}') and call it a day? Postgres has had a JSONB column type for over a decade. The answer comes down to three things text JSON cannot do well: parse speed, type richness, and partial reads.

[Figure: the same document in two encodings — text JSON vs binary BSON]

JSON (text, UTF-8): {"name":"Riya","age":25} — 24 bytes of quotes, braces, and commas. Parse cost: tokenise every byte (find the quotes, find the colons, find the commas), guess types (is 25 an int or a float?), and you cannot skip a value without reading it.

BSON (binary): [1D 00 00 00] total length = 29 · [02] string type | "name\0" | [05 00 00 00] "Riya\0" · [10] int32 type | "age\0" | [19 00 00 00] = 25 · [00] document terminator — 29 bytes, similar size, different shape. Length-prefixed: a reader skips a whole subtree by adding its length to the offset; the type tag tells you the shape directly; no tokenisation, no type-guessing; a reader that wants only doc.age hops over the "Riya" value by its length prefix instead of scanning it.

Bottom line: BSON is rarely smaller than JSON — but it is 2–5× faster to parse and supports types JSON cannot express. For one-document-at-a-time random reads on a server fielding 100k QPS, that parse-speed gap is the entire ball game.

Faster parsing. A JSON parser has to tokenise — find quotes, find colons, find commas, decide if 25 is integer or float, decide if "true" is string or boolean. That is byte-by-byte scanning with branch-heavy state machines. BSON skips all of that: a 4-byte total-length prefix tells you the document size, and then each field is (1-byte type code, NUL-terminated field name, value of known shape). The type tag tells the parser how many bytes the value occupies without looking at the value. Why this matters at scale: a MongoDB shard fielding 50k QPS spends real CPU on parsing every document it reads from disk and every document it sends on the wire. A 2–5× parse-time win, multiplied by 50k operations per second per node, multiplied by however many nodes — that is enough CPU to fund the binary-format complexity ten times over.
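You can eyeball that gap on your own machine. The sketch below decodes the same mid-sized document repeatedly through json and through bson; the absolute numbers are illustrative only, since they depend on document shape and on whether PyMongo's C extensions are installed:

import json
import timeit

import bson

# A mid-sized document: 50 fields of mixed strings and ints.
doc = {f"field_{i}": (i if i % 2 else f"value_{i}") for i in range(50)}

as_json = json.dumps(doc).encode()   # text bytes
as_bson = bson.encode(doc)           # binary bytes

n = 20_000
t_json = timeit.timeit(lambda: json.loads(as_json), number=n)
t_bson = timeit.timeit(lambda: bson.decode(as_bson), number=n)
print(f"json.loads : {t_json:.3f}s for {n} decodes")
print(f"bson.decode: {t_bson:.3f}s for {n} decodes")

Measure before trusting any ratio: with the C extensions, bson.decode is typically competitive or faster; in pure Python it can lose, and the 2–5× figure above describes optimised server-side parsers, not this toy.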

Richer types. JSON has six types: object, array, string, number, boolean, null. That is not enough. A number is "some IEEE 754 double", which silently truncates 123456789012345678 to a nearby double, losing trailing digits. There is no native Date, no native Binary (you base64-encode and pretend), no distinction between 32-bit and 64-bit integers, and crucially no Decimal for money — you cannot represent ₹19.99 exactly in IEEE 754. BSON adds all of these as first-class types: 0x07 ObjectId (12-byte unique ID), 0x09 UTC datetime (64-bit milliseconds), 0x05 Binary (length-prefixed bytes with subtype), 0x10 int32, 0x12 int64, 0x13 Decimal128 (IEEE 754-2008 decimal floating point, the right type for money). The full list is in the BSON specification.
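Here is that type richness from Python. A small sketch: Int64 lives in the bson.int64 module, and Decimal128 is the same class the catalogue example later in this chapter uses:

from datetime import datetime, timezone

import bson
from bson import Decimal128
from bson.int64 import Int64

doc = {
    "big": Int64(123456789012345678),                  # forces the 0x12 int64 tag
    "price": Decimal128("19.99"),                      # 0x13: exact decimal, safe for money
    "at": datetime(2026, 3, 14, tzinfo=timezone.utc),  # 0x09: int64 UTC milliseconds
}
back = bson.decode(bson.encode(doc))
print(back["big"])    # 123456789012345678, every digit intact
print(back["price"])  # Decimal128('19.99')

# For contrast, the double-only world: IEEE 754 cannot hold that integer.
print(int(float(123456789012345678)))  # 123456789012345680, the tail is gone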

Skippable subtrees. Because every value's byte length is either fixed by its type tag (int32 is always 4 bytes) or stored in a length prefix at the start of the value (strings, arrays, embedded documents), a parser that wants only doc.customer.address.pincode can walk the top-level fields, skip the ones it does not need by adding their lengths to the read offset, descend into customer, skip again, descend into address, and read the pincode — without ever touching the bytes of fields it skipped. JSON parsing is monolithic: you cannot tell where "orders": [...] ends without reading every character of every order until the matching ]. For partial-document reads, BSON's structural skips can be 10–100× faster.
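To make the skip mechanics concrete, here is a toy walker. It understands only the three type codes it needs (string, int32, embedded document), so treat it as a sketch of the technique, not a real parser:

import struct

import bson

def find_top_level_int32(data, wanted):
    # Walk top-level fields, skipping values by their known or prefixed sizes.
    pos = 4                                  # skip the 4-byte document length
    end = len(data) - 1                      # last byte is the 0x00 terminator
    while pos < end:
        type_code = data[pos]; pos += 1
        name_end = data.index(b"\x00", pos)  # field name is NUL-terminated
        name = data[pos:name_end].decode(); pos = name_end + 1
        if type_code == 0x10:                # int32: always 4 bytes
            if name == wanted:
                return struct.unpack_from("<i", data, pos)[0]
            pos += 4
        elif type_code == 0x02:              # string: 4-byte length prefix + bytes
            (strlen,) = struct.unpack_from("<i", data, pos)
            pos += 4 + strlen                # jump over it without reading the text
        elif type_code == 0x03:              # embedded doc: self-describing length
            (doclen,) = struct.unpack_from("<i", data, pos)
            pos += doclen                    # skip the whole subtree in one hop
        else:
            raise NotImplementedError(f"type 0x{type_code:02x}")
    return None

data = bson.encode({"name": "Riya", "bio": {"city": "Pune"}, "age": 25})
print(find_top_level_int32(data, "age"))     # 25; never parsed "Riya" or the subdoc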

What BSON does not do well is space efficiency. Every document carries its full set of field names as ASCII strings — "customer_email" is 14 bytes in every single document — and BSON typically encodes a small document in more bytes than the equivalent minified JSON because of the length prefixes and type tags. For a small {"a": 1}, JSON is 8 bytes and BSON is 12. For documents with long field names and many small values, the field-name overhead dominates: a collection of a billion documents, each carrying 12 bytes of field-name strings, holds roughly 12 GB of pure field-name text. That is the price of self-description.
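The overhead is easy to measure yourself. This sketch encodes the same two values under descriptive names and under terse ones (the documents are made up for illustration):

import bson

long_names = {"customer_email": "riya@example.in", "customer_phone": "9800000000"}
terse_names = {"ce": "riya@example.in", "cp": "9800000000"}

print(len(bson.encode(long_names)))   # 72
print(len(bson.encode(terse_names)))  # 48: same data, a third smaller

This is why some high-volume MongoDB schemas deliberately shorten their hottest field names, trading readability for RAM and disk.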

The BSON wire format, byte by byte

Let's open up the format. A BSON document is structurally:

[Figure: BSON document layout — length prefix · ordered fields · terminator]

A document is: an int32 total length (little-endian), then an ordered list of (type, key, value) fields (field 1 · field 2 · ... · field N), then a 0x00 terminator. Each field, expanded: a 1-byte type code, the field name as a CString (UTF-8 bytes + 0x00), then the value, whose size depends on the type.

Common type codes:

  0x01 double (8 B) · 0x02 string (4 B length + UTF-8 + 0x00) · 0x03 embedded document
  0x04 array (an embedded document with "0", "1", "2", ... keys) · 0x05 binary (4 B length + subtype + bytes)
  0x07 ObjectId (12 B) · 0x08 boolean (1 B) · 0x09 UTC datetime (int64 milliseconds)
  0x0A null (0 B) · 0x10 int32 (4 B) · 0x12 int64 (8 B) · 0x13 Decimal128 (16 B)

Length prefixes (4-byte ints) let a reader skip any subtree by adding its length to the offset.

The shape is recursive: a value of type 0x03 embedded document is itself a 4-byte length plus a list of fields plus a 0x00. Arrays (0x04) are encoded as embedded documents whose field names are the ASCII decimal indices "0", "1", "2", "3", and so on — a slightly wasteful choice (those keys are always derivable from position) that exists to keep the document/array decoder paths identical.
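You can see those ASCII index keys in the raw bytes. A quick sketch:

import bson

print(bson.encode({"a": [7, 8]}).hex())
# Expected, grouped for reading:
#   1b000000 04 6100 13000000 10 3000 07000000 10 3100 08000000 00 00
# 0x04 marks the array; inside it the keys are literal CStrings:
# 0x30 0x00 is "0\0" and 0x31 0x00 is "1\0", a few wasted bytes per element.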

The endianness is little-endian for all multi-byte integers and floats. Why little-endian: x86 and ARM64 (the platforms MongoDB actually runs on) are both little-endian, so the format avoids per-value byte swaps on 99.9% of deployments. The cost is that big-endian platforms (older IBM POWER, some embedded targets) pay a swap on every read — a tiny price for a huge majority gain.

Field order is preserved exactly as written. Unlike a relational tuple where column order is fixed by the schema, BSON documents are ordered records: {"a": 1, "b": 2} and {"b": 2, "a": 1} produce different byte sequences. MongoDB compares documents byte for byte when sorting and grouping, so the order matters for identity tests. Most drivers preserve insertion order on encode.
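A two-line check of that claim, as a sketch:

import bson

ab = bson.encode({"a": 1, "b": 2})
ba = bson.encode({"b": 2, "a": 1})
print(ab == ba)         # False: same pairs, different byte sequences
print(bson.decode(ab))  # {'a': 1, 'b': 2}: insertion order survives the round trip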

Encoding and decoding a real document in Python

Let's stop talking and write some bytes. The Python bson library (bundled with PyMongo) gives a direct encode/decode API. Install it with pip install pymongo, then:

import bson

doc = {"name": "Riya", "age": 25}
encoded = bson.encode(doc)

print(f"length: {len(encoded)} bytes")
print(f"hex   : {encoded.hex()}")
print(f"first 4 bytes (LE int32 length): {int.from_bytes(encoded[:4], 'little')}")

The output:

length: 29 bytes
hex   : 1d000000026e616d650005000000526979610010616765001900000000
first 4 bytes (LE int32 length): 29

Read it byte by byte:

  1d 00 00 00      int32 total length, little-endian = 29
  02               type code: string
  6e 61 6d 65 00   field name "name", NUL-terminated
  05 00 00 00      string length including its NUL = 5
  52 69 79 61 00   "Riya" plus NUL
  10               type code: int32
  61 67 65 00      field name "age", NUL-terminated
  19 00 00 00      little-endian int32 = 25
  00               document terminator

Total: 29 bytes. The equivalent minified JSON {"name":"Riya","age":25} is 24 bytes — so for this tiny document, BSON is bigger, not smaller. That is expected. The point of BSON is parse speed and type richness, not byte savings on small docs.

Decoding is symmetric:

decoded = bson.decode(encoded)
print(decoded)             # {'name': 'Riya', 'age': 25}
print(type(decoded["age"]))  # <class 'int'> — preserved as integer, not float

Compare this to round-tripping through json: a JSON 25 decodes to int only because the token happens to lack a decimal point, while 25.0 and 25e0 both decode to float, and you have no way to express "I want this stored as exactly int32 and not int64" — JSON simply lacks that type. BSON's 0x10 tag carries that intent end to end.
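Concretely, with nothing but the standard-library json module:

import json

print(type(json.loads("25")))    # <class 'int'>: the token has no dot, so int
print(type(json.loads("25.0")))  # <class 'float'>
print(type(json.loads("25e0")))  # <class 'float'>: exponent form is always float
# No JSON spelling of 25 can say "store me as int32 rather than int64".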

Variable-structure records: documents of different shapes

Now the punchline. BSON's per-document self-description means a single MongoDB collection can hold documents that share no fields in common without any storage penalty for the missing ones. Compare the row-store and document-store world views:

[Figure: variable-structure records — the row store wastes NULLs, the document store stores only what is there]

Row store (relational): one table with columns A–E holding rows (a1, b1, c1, NULL, NULL), (a2, NULL, c2, d2, e2), (NULL, b3, NULL, NULL, NULL). That is 7 NULLs out of 15 cells (47%); adding field F means ALTER TABLE on the whole table; the schema must know all fields up front.

Document store (BSON): doc 1 is {A: a1, B: b1, C: c1} — 3 fields stored, no slots for D or E. Doc 2 is {A: a2, C: c2, D: d2, E: e2} — 4 fields stored, no slot for B. Doc 3 is {B: b3} — 1 field stored, nothing else carried. Zero NULLs anywhere; adding field F to one doc is just a write; no schema, no migration, no downtime.

The cost: each BSON doc carries its field names ("A", "B", "C") inline, where the row store carries the schema once, in the catalog. For wide-but-sparse data, the document overhead is smaller than the NULL-cell overhead; for narrow-but-dense data, the row store wins. Rule of thumb: if more than 30% of your cells would be NULL, document storage saves space and grief.

In the relational world, doc 1 (fields A, B, C), doc 2 (A, C, D, E) and doc 3 (B alone) cannot live in one table without that table having columns A, B, C, D, E — and 7 of the 15 cells are NULL placeholders. Worse, every time a new product category arrives with a new field F, you run ALTER TABLE ADD COLUMN F, which on a billion-row table can mean hours of online migration with all its operational risk.

In the document world, each record is a self-contained BSON blob with exactly the fields it needs — no NULLs, no migration. The price you pay is the field names inside every document: doc 1's BSON carries the strings "A", "B", "C"; doc 2's carries "A", "C", "D", "E". For wide-but-sparse data this is a clear win; for narrow-but-dense data (a users table with id, email, created_at and 100 million rows) the relational schema is more compact and the document overhead loses.

The real insight is not "documents always beat rows" — they do not. It is that the right shape is the shape that matches the data. Rectangular data → rectangle. Tree-shaped, optionally-attributed, schema-evolving data → tree. BSON is what makes the tree practical on disk and on the wire.

Bharat Bazaar product catalogue in BSON

Three products from our e-commerce catalogue, encoded as BSON documents in one MongoDB collection.

import bson
from datetime import datetime
from bson import ObjectId, Decimal128

# Doc 1: a pair of running shoes
shoes = {
    "_id": ObjectId(),
    "category": "footwear",
    "name": "Stride Pro Runner",
    "brand": "Campus",
    "price_inr": Decimal128("2499.00"),
    "size": 9,
    "colour": "black-red",
    "sole_type": "EVA cushioned",
    "gender": "M",
    "in_stock": True,
    "added_at": datetime(2026, 3, 14, 10, 30),
}

# Doc 2: a mixer-grinder — totally different fields
mixer = {
    "_id": ObjectId(),
    "category": "kitchen",
    "name": "PowerWhirl 750W Mixer",
    "brand": "Bajaj",
    "price_inr": Decimal128("3299.50"),
    "voltage": 230,
    "wattage": 750,
    "warranty_months": 24,
    "jar_count": 3,
    "in_stock": True,
    "promo": {"type": "diwali", "discount_pct": 15, "ends_on": datetime(2026, 11, 5)},
}

# Doc 3: a Diwali hamper — nested array, no warranty/voltage at all
hamper = {
    "_id": ObjectId(),
    "category": "festival",
    "name": "Diwali Sweets & Diyas Hamper",
    "price_inr": Decimal128("899.00"),
    "contents": [
        {"item": "kaju katli", "weight_g": 250},
        {"item": "soan papdi", "weight_g": 250},
        {"item": "diyas", "count": 12},
    ],
    "gift_wrap": True,
}

for label, doc in [("shoes", shoes), ("mixer", mixer), ("hamper", hamper)]:
    encoded = bson.encode(doc)
    print(f"{label:8s}: {len(encoded):4d} bytes, fields={list(doc.keys())}")

Sample output:

shoes   :  222 bytes, fields=['_id', 'category', 'name', 'brand', 'price_inr',
                              'size', 'colour', 'sole_type', 'gender',
                              'in_stock', 'added_at']
mixer   :  257 bytes, fields=['_id', 'category', 'name', 'brand', 'price_inr',
                              'voltage', 'wattage', 'warranty_months',
                              'jar_count', 'in_stock', 'promo']
hamper  :  259 bytes, fields=['_id', 'category', 'name', 'price_inr',
                              'contents', 'gift_wrap']

Three documents, three different shapes, one collection. Insert all three into MongoDB:

from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017")
db = client.bharat_bazaar
db.products.insert_many([shoes, mixer, hamper])

Now query — and the interesting part is that each query selects only the docs that have the fields it asks about, with no errors for the others:

# All in-stock items above ₹1000 — works across all three docs
list(db.products.find({"price_inr": {"$gt": Decimal128("1000.00")}, "in_stock": True}))
# returns shoes + mixer (hamper has no in_stock field, so {in_stock: True} skips it)

# Things on Diwali promo — only mixer has a "promo" subdoc
list(db.products.find({"promo.type": "diwali"}))
# returns just the mixer

# Products available in size 9 — only the shoes have a "size" field
list(db.products.find({"size": 9}))
# returns just the shoes

Why this is genuinely different from a relational design: in Postgres you would either have one giant products table with voltage, sole_type, gift_wrap, etc. as nullable columns — most cells NULL — or a master products table joined to per-category detail tables. The first option pollutes the schema with optional fields; the second multiplies the query surface. The document model lets the schema emerge from the data, one product category at a time, without ever running an ALTER TABLE.

Add a new product category tomorrow — say, smartphones with imei, ram_gb, storage_gb, os — and the only thing that changes is the documents you insert. The existing shoes, mixers, and hampers are untouched. The products collection now has four shapes coexisting; tomorrow it can have ten.
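A sketch of that moment. The phone itself is invented; only the field names come from above, and db is the connection opened earlier:

from bson import ObjectId, Decimal128

phone = {
    "_id": ObjectId(),
    "category": "smartphone",
    "name": "Horizon X1 5G",              # hypothetical product
    "imei": "000000000000000",            # placeholder value
    "ram_gb": 8,
    "storage_gb": 128,
    "os": "Android",
    "price_inr": Decimal128("14999.00"),
}
db.products.insert_one(phone)  # a fourth shape joins the collection: no DDL, no migration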

What the database loses by not knowing the shape

The flexibility is real, and it has a real bill.

The query optimiser is flying half-blind. A relational planner knows that customers.email is a VARCHAR(255), has 12 million distinct values, has a B-tree index, and has an average length of 24 characters. It uses all of that to decide whether to seek the index or scan the table. A document store knows that customers is a collection, that some documents have an email field, that some of those are indexed — but the field's type, cardinality, and presence ratio are per-document properties the planner has to estimate from samples. Plans are correspondingly less precise; a WHERE email = "x" against a sparse field can pick the wrong path more easily than its relational equivalent.
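You can watch the planner work from samples rather than schema. A sketch against the catalogue above; the exact plan document varies by server version:

# Index a field that only some documents carry.
db.products.create_index("size")

plan = db.products.find({"size": 9}).explain()
print(plan["queryPlanner"]["winningPlan"])
# The plan says how the server will fetch, but note what it cannot say:
# nothing here records that "size" is an int32 present in one of four
# documents. Presence and type are per-document facts, not catalogue facts.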

Schema consistency lives in your application code, not the database. If two services write to the same collection and one calls the field customer_email while the other calls it customerEmail, the database happily accepts both. You discover the bug when a query returns half the documents it should. Tools like JSON Schema and MongoDB's $jsonSchema validator close this gap — but only if you choose to use them, and only at write time, not retroactively.
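If you want the database itself to hold the line, a minimal write-time guard looks like this. The collection name and rule are illustrative; $jsonSchema is MongoDB's documented validator keyword:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.bharat_bazaar
db.create_collection(
    "orders",
    validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["customer_email"],
            "properties": {"customer_email": {"bsonType": "string"}},
        }
    },
)
# An insert that spells the field customerEmail now fails at write time.
# Documents written before the rule existed are untouched.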

Storage carries the field names forever. A billion-document collection with 12 bytes of field-name strings per document spends roughly 12 GB on field names — strings that a relational schema would store exactly once in the catalog. WiredTiger (MongoDB's storage engine) compresses pages with Snappy or zstd, which mostly absorbs this on disk, but the in-memory working-set cost is real.

Joins are not the natural primitive. Document stores nudge you toward embedding (put the order's line items inside the order document) instead of joining (separate orders and line_items tables). Embedding is great when the embedded data is always read with the parent and rarely shared — like line items in an order. It is a trap when the same data appears in many parents (a customer embedded in every order means updating an address requires touching every order). Modelling well in a document store is its own skill, covered in chapter 139.
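The two shapes side by side, as a sketch with invented field names:

from bson import ObjectId

# Embedding: line items live and die with their order. One read, no join.
order_embedded = {
    "_id": ObjectId(),
    "items": [{"sku": "SHOE-9-BLK", "qty": 1}, {"sku": "MIX-750", "qty": 1}],
}

# Referencing: a customer appears in many orders, so store a pointer instead.
# Embedding the customer would copy the address into every order, and an
# address change would mean rewriting them all.
customer_id = ObjectId()
order_referenced = {
    "_id": ObjectId(),
    "customer_id": customer_id,  # resolved in the application, or via $lookup
    "items": [{"sku": "SHOE-9-BLK", "qty": 1}],
}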

Where BSON sits in the wider format zoo

BSON is one point in a design space full of binary encodings. Worth a brief tour:

  1. Protocol Buffers: schema-required; field names are compiled away into numeric tags, so records are far smaller than BSON, but both sides must agree on a .proto schema before a single byte moves.
  2. FlatBuffers: schema-required and zero-copy; values are read in place with no decode step at all, at the cost of a rigid schema and a more complex builder API.
  3. Apache Avro: stores the schema once alongside the data, so individual records carry no field names at all; excellent for large homogeneous batches, wrong for one-off heterogeneous documents.
  4. MessagePack and CBOR: schema-free like BSON and usually more compact, but maps are count-prefixed rather than byte-length-prefixed, so a reader cannot skip a subtree without walking it, and neither carries BSON's MongoDB-specific types such as ObjectId.

Document databases pick BSON (or BSON-likes) because the schema-free property is non-negotiable for their use case. If you knew the schema in advance, you would already be running Postgres. The whole pitch of MongoDB (the canonical document database), Couchbase, AWS DocumentDB (which speaks the MongoDB wire protocol on top of a different storage engine), and Azure Cosmos DB (which exposes MongoDB, Cassandra, Gremlin, and SQL APIs over one engine) is "the data shape is what your application says it is, not what an ALTER TABLE says it is" — and BSON is the encoding that makes that pitch deliverable.

What's next in Build 17

We have the on-disk format; we have the per-document freedom. The next chapters build out everything you need to make this practical at scale, from querying and indexing these variable-shape documents to the modelling trade-offs deferred to chapter 139.

The theme of Build 17 is the same as every Build in this series: understand the trade-offs deeply enough to know when this is the right tool. Document databases are not a general-purpose replacement for relational ones, and they are also not a niche oddity. They are the right answer for a specific shape of data — variable-structure, tree-shaped, schema-evolving — and BSON is the format that makes that answer practical.

References

  1. BSON specification (bsonspec.org) — the canonical wire-format reference; lists every type code and the byte layout for each.
  2. MongoDB documentation: BSON types — type-by-type reference with examples and conversion rules.
  3. MongoDB blog: JSON and BSON — a deep technical comparison — the case for binary over text from the people who made the choice.
  4. JSON Schema specification — the standard MongoDB's $jsonSchema validator builds on for write-time shape enforcement.
  5. FlatBuffers internals (Google) — the zero-copy schema-required alternative; useful contrast for understanding what BSON gives up by being schema-free.
  6. Apache Avro specification — schema-with-the-data binary format from the Hadoop ecosystem; another point in the design space.