Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

BSON is the length-prefixed, type-tagged binary encoding that lets each document in a MongoDB collection carry its own shape — no schema, no NULL columns, no ALTER TABLE. It is rarely smaller than JSON; the wins are parse speed, richer types like ObjectId and Decimal128, and the ability to skip subtrees without reading them. The price you pay is field names stored inside every document and a query planner that knows less about the data than a relational one does.

The rectangle problem

A row store like Postgres or MySQL is built on one assumption: a table has a fixed schema, and every row is a fixed-width (or fixed-shape) tuple of those columns. That assumption underwrites almost every optimisation in the storage engine — page layout, B-tree key encoding, statistics, query plans. It is also wrong about most real data.

Consider an Indian e-commerce catalogue — call it Bharat Bazaar. The product table needs to hold:

  • A pair of running shoes: name, brand, price, size, colour, sole_type, gender.
  • A mixer-grinder: name, brand, price, voltage, wattage, warranty_months, jar_count.
  • A 100-page notebook: name, brand, price, pages, ruling (single/double/blank).
  • A festival hamper for Diwali: name, price, contents (a list of nested items), gift_wrap, delivery_window.

A relational designer has three bad options:

  1. One wide table with every possible attribute as a nullable column. Most cells are NULL. Adding a new product category means ALTER TABLE (which on a billion-row table can take hours and a downtime window).
  2. Sub-tables per category with a join (products + product_shoes, product_electronics, product_stationery). Adding a category means a new table and updating every SELECT product query path. Cross-category queries become UNION ALLs of N tables.
  3. EAV (Entity–Attribute–Value): one giant (product_id, attribute_name, attribute_value) table. Loses all type safety, kills query-planner statistics, and turns every product fetch into a self-join.

Each option pays for the rectangle in a different currency: storage, schema-change pain, or query complexity. Why none of them feels right: the data simply is not rectangular. Forcing it into rows is an impedance mismatch — the application thinks in objects (a Product is a tree of fields, some optional), the database thinks in tuples (a row is a flat record of typed columns), and the gap is bridged by ORM code that nobody enjoys writing.

A document database starts from the other end. Each record is an arbitrary JSON-shaped tree, and the database stores it as-is. There is no schema; there are no NULL placeholders for fields a document does not have; there is no migration when a new field appears. The trade-off is that the database now knows almost nothing about the shape of your data, which has real costs we will get to in chapter 139. Today's chapter is about the encoding that makes this practical.

Why not just store JSON?

If you are going to store schema-free trees, why not just INSERT INTO products(doc) VALUES('{"name": ...}') and call it a day? Postgres has had a JSONB column type for over a decade. The answer comes down to three things text JSON cannot do well: parse speed, type richness, and partial reads.

Figure: Same document, two encodings (text JSON vs binary BSON).

JSON (text, UTF-8): {"name":"Riya","age":25} is 24 bytes, counting quotes, braces, and commas.

  • Parse cost: tokenise every byte to find quotes, colons, and commas.
  • Guess types: is 25 an int or a float?
  • Cannot skip a value without reading it.

BSON (binary): 29 bytes (similar size, different shape).

  • [1D 00 00 00] total length = 29
  • [02] string type | "name\0" | [05 00 00 00] "Riya\0"
  • [10] int32 type | "age\0" | [19 00 00 00] = 25
  • [00] document terminator
  • Length-prefixed: skip a whole subtree by adding its length to the offset.
  • Parse cost: read the 4-byte length, then walk. The type tag tells you the shape directly, with no tokenisation and no type-guessing.
  • The age element ("give me doc.age") is only 9 bytes.

Bottom line: BSON is rarely smaller than JSON, but it is 2–5× faster to parse and supports types JSON cannot express. For one-document-at-a-time random reads on a server fielding 100k QPS, that parse-speed gap is the entire ball game.

Faster parsing. A JSON parser has to tokenise: find quotes, find colons, find commas, decide if 25 is integer or float, decide if "true" is string or boolean. That is byte-by-byte scanning with branch-heavy state machines. BSON skips all of that: a 4-byte total-length prefix tells you the document size, and then each field is (1-byte type code, NUL-terminated field name, value of known shape). The type tag tells the parser either the value's fixed size or that a 4-byte length prefix follows, so it never has to scan a value to find where it ends. Why this matters at scale: a MongoDB shard fielding 50k QPS spends real CPU on parsing every document it reads from disk and every document it sends on the wire. A 2–5× parse-time win, multiplied by 50k operations per second per node, multiplied by however many nodes, is enough CPU to fund the binary-format complexity ten times over.

Richer types. JSON has six types: object, array, string, number, boolean, null. That is not enough. To most parsers a number is "some IEEE 754 double", which silently rounds 123456789012345678 to a nearby double, losing trailing digits. There is no native Date, no native Binary (you base64-encode and pretend), no distinction between 32-bit and 64-bit integers, and crucially no Decimal for money: you cannot represent ₹19.99 exactly in an IEEE 754 binary double. BSON adds all of these as first-class types: 0x07 ObjectId (12-byte unique ID), 0x09 UTC datetime (64-bit milliseconds), 0x05 Binary (length-prefixed bytes with subtype), 0x10 int32, 0x12 int64, 0x13 Decimal128 (IEEE 754-2008 decimal floating point, the right type for money). The full list is in the BSON specification.
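The rounding of a large integer is easy to reproduce. The sketch below assumes a consumer that maps every JSON number to a double (as JavaScript does); Python's own json module is used only as a contrast, since it keeps integers exact.

```python
import json

big = 123456789012345678          # fits in int64, needs 57 bits

# A consumer that treats every JSON number as an IEEE 754 double
# can keep only 53 bits of precision.
as_double = float(big)
round_tripped = int(as_double)

print(big)                    # 123456789012345678
print(round_tripped)          # 123456789012345680
print(big == round_tripped)   # False

# Python's json module parses integers exactly, which shows the loss comes
# from the consumer's number type, not from the JSON text itself.
assert json.loads("123456789012345678") == big
```

A BSON int64 (type 0x12) sidesteps the question because the type tag, not the reader, decides how the eight bytes are interpreted.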

Skippable subtrees. Because every value's byte length is either fixed by its type tag (int32 is always 4 bytes) or stored in a length prefix at the start of the value (strings, arrays, embedded documents), a parser that wants only doc.customer.address.pincode can walk the top-level fields, skip the ones it does not need by adding their lengths to the read offset, descend into customer, skip again, descend into address, and read the pincode — without ever touching the bytes of fields it skipped. JSON parsing is monolithic: you cannot tell where "orders": [...] ends without reading every character of every order until the matching ]. For partial-document reads, BSON's structural skips can be 10–100× faster.
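The skip-and-descend walk can be sketched in a few lines of plain Python. This is a minimal illustration over a hand-built document, not how any driver or the server implements it; the helper names (value_size, find_int32, and the small builders) are made up for this example, and only a subset of type codes is handled.

```python
import struct

# Types whose value size is fixed by the type code alone.
FIXED_SIZES = {0x01: 8, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0,
               0x10: 4, 0x12: 8, 0x13: 16}

def value_size(buf, type_code, pos):
    """Byte length of the value starting at pos, without decoding it."""
    if type_code in FIXED_SIZES:
        return FIXED_SIZES[type_code]
    (n,) = struct.unpack_from("<i", buf, pos)
    if type_code == 0x02:            # string: 4-byte length, then bytes incl. NUL
        return 4 + n
    if type_code in (0x03, 0x04):    # document / array: length covers itself
        return n
    if type_code == 0x05:            # binary: length + 1 subtype byte + payload
        return 4 + 1 + n
    raise ValueError(f"unsupported type 0x{type_code:02x}")

def find_int32(buf, path, start=0):
    """Return the int32 at a dotted path, skipping every other value."""
    key, _, rest = path.partition(".")
    (doc_len,) = struct.unpack_from("<i", buf, start)
    pos, end = start + 4, start + doc_len - 1    # end is the 0x00 terminator
    while pos < end:
        type_code = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)   # field names are NUL-terminated
        name = buf[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1                       # pos now points at the value
        if name == key:
            if rest:
                if type_code != 0x03:
                    raise KeyError(path)
                return find_int32(buf, rest, pos)
            if type_code != 0x10:
                raise TypeError("not an int32")
            return struct.unpack_from("<i", buf, pos)[0]
        pos += value_size(buf, type_code, pos)   # the skip: no value bytes read
    raise KeyError(path)

# Tiny hand-rolled builders, just enough to make a test document.
def cstring(s): return s.encode("utf-8") + b"\x00"
def wrap(body): return struct.pack("<i", len(body) + 5) + body + b"\x00"
def e_str(k, v): return b"\x02" + cstring(k) + struct.pack("<i", len(v) + 1) + cstring(v)
def e_int32(k, v): return b"\x10" + cstring(k) + struct.pack("<i", v)
def e_doc(k, doc_bytes): return b"\x03" + cstring(k) + doc_bytes

doc = wrap(
    e_str("note", "x" * 1000)    # 1000 bytes the reader never looks at
    + e_doc("customer", wrap(e_doc("address", wrap(e_int32("pincode", 560001)))))
)
print(find_int32(doc, "customer.address.pincode"))   # 560001
```

The reader still scans each field name to find its NUL terminator; what it avoids is touching the value bytes of the fields it passes over.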

What BSON does not do well is space efficiency. Every document carries its full set of field names as NUL-terminated strings ("customer_email" is 14 bytes, plus the terminator, in every single document), and BSON typically encodes a small document in more bytes than the equivalent minified JSON because of the length prefixes and type tags. For a small {"a": 1}, minified JSON is 7 bytes and BSON is 12. For documents with long field names and many small values, the field-name overhead dominates: a billion-document collection with field names averaging 12 bytes carries roughly 12 GB of pure field-name strings for every field each document holds. That is the price of self-description.
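The field-name arithmetic is simple enough to check directly. The function below is a hypothetical helper for back-of-the-envelope sizing; it counts uncompressed bytes only and assumes every document carries every listed field.

```python
def field_name_bytes(num_docs, names):
    """Uncompressed bytes spent on field names: each name plus its NUL terminator."""
    per_doc = sum(len(name.encode("utf-8")) + 1 for name in names)
    return num_docs * per_doc

total = field_name_bytes(1_000_000_000, ["customer_email"])   # 14 bytes + 1 NUL
print(total)                # 15000000000
print(total / 1e9, "GB")    # 15.0 GB
```

Pass the full list of field names to see the cost for a whole document shape rather than a single field.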

The BSON wire format, byte by byte

Let's open up the format. A BSON document is structurally:

Figure: BSON document layout (length prefix, ordered fields, terminator).

  • int32 LE: total length of the document.
  • Ordered list of (type, key, value) tuples: field 1, field 2, field 3, ..., field N.
  • 0x00: terminator.

Each field, expanded:

  • type: 1 byte.
  • field name (CString): UTF-8 bytes + 0x00.
  • value: size depends on type.

Common type codes:

  • 0x01 double (8 B)
  • 0x02 string (4 B length + UTF-8 + 0x00)
  • 0x03 embedded document
  • 0x04 array (embedded document with "0", "1", "2", ... keys)
  • 0x05 binary (4 B length + subtype + bytes)
  • 0x07 ObjectId (12 B)
  • 0x08 boolean (1 B)
  • 0x09 UTC datetime (int64 ms)
  • 0x0A null (0 B)
  • 0x10 int32 (4 B)
  • 0x12 int64 (8 B)
  • 0x13 Decimal128 (16 B)

Length prefixes (4-byte ints) let a reader skip any subtree by adding its length to the offset.

The shape is recursive: a value of type 0x03 embedded document is itself a 4-byte length plus a list of fields plus a 0x00. Arrays (0x04) are encoded as embedded documents whose field names are the ASCII decimal indices "0", "1", "2", "3", and so on — a slightly wasteful choice (those keys are always derivable from position) that exists to keep the document/array decoder paths identical.
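A toy encoder makes the recursion concrete. This is a sketch covering only bool, int32, double, string, nested documents, and arrays; it is not PyMongo's encoder, and the function names are invented for the example.

```python
import struct

def _cstring(s):
    return s.encode("utf-8") + b"\x00"

def encode_value(value):
    """Return (type_code, payload) for a small subset of BSON types."""
    if isinstance(value, bool):          # check bool first: bool is an int subclass
        return 0x08, (b"\x01" if value else b"\x00")
    if isinstance(value, int):           # this sketch always uses int32
        return 0x10, struct.pack("<i", value)
    if isinstance(value, float):
        return 0x01, struct.pack("<d", value)
    if isinstance(value, str):
        data = _cstring(value)
        return 0x02, struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return 0x03, encode_document(value)
    if isinstance(value, list):          # array = document keyed "0", "1", ...
        return 0x04, encode_document({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")

def encode_document(doc):
    body = b""
    for key, value in doc.items():
        type_code, payload = encode_value(value)
        body += bytes([type_code]) + _cstring(key) + payload
    # Total length counts the 4 length bytes, the body, and the terminator.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

print(encode_document({"a": 1}).hex())    # 0c0000001061000100000000
print(len(encode_document({"a": 1})))     # 12

# An array payload is byte-identical to a document with index keys;
# only the one-byte type code in front of it differs.
print(encode_value([7, 8])[1] == encode_value({"0": 7, "1": 8})[1])   # True
```

Because encode_value calls encode_document for both dicts and lists, the array and document paths share one code path, which is the design choice the format's index-key convention exists to allow.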

The endianness is little-endian for all multi-byte integers and floats. Why little-endian: x86 and ARM64 (the platforms MongoDB actually runs on) are both little-endian, so the format avoids per-value byte swaps on 99.9% of deployments. The cost is that big-endian platforms (older Compustar POWER, some embedded targets) pay a swap on every read — a tiny price for a huge majority gain.

Field order is preserved exactly as written. Unlike a relational tuple where column order is fixed by the schema, BSON documents are ordered records: {"a": 1, "b": 2} and {"b": 2, "a": 1} produce different byte sequences. MongoDB compares embedded documents field by field, in stored order, when matching, sorting, and grouping, so the order matters for identity tests. Most drivers preserve insertion order on encode.
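A short sketch shows the order sensitivity at the byte level. The encoder here is a deliberately tiny, hypothetical helper that handles only flat documents of int32 values.

```python
import struct

def encode_int32_doc(doc):
    """Encode a flat dict of int32 values as BSON, in dict insertion order."""
    body = b"".join(
        b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value)
        for key, value in doc.items()
    )
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

ab = encode_int32_doc({"a": 1, "b": 2})
ba = encode_int32_doc({"b": 2, "a": 1})

print(ab.hex())   # 13000000106100010000001062000200000000
print(ba.hex())   # 13000000106200020000001061000100000000
print({"a": 1, "b": 2} == {"b": 2, "a": 1})   # True: Python dicts ignore order
print(ab == ba)                                # False: the encoded bytes differ
```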

Encoding and decoding a real document in Python

Let's stop talking and write some bytes. The Python bson library (bundled with PyMongo) gives a direct encode/decode API. Install it with pip install pymongo, then:

import bson

doc = {"name": "Riya", "age": 25}
encoded = bson.encode(doc)

print(f"length: {len(encoded)} bytes")
print(f"hex   : {encoded.hex()}")
print(f"first 4 bytes (LE int32 length): {int.from_bytes(encoded[:4], 'little')}")

The output:

length: 29 bytes
hex   : 1d000000026e616d650005000000526979610010616765001900000000
first 4 bytes (LE int32 length): 29

Read it byte by byte:

  • 1d 00 00 00 → length = 29 (little-endian int32).
  • 02 → next field type is string.
  • 6e 61 6d 65 00 → field name "name" followed by NUL.
  • 05 00 00 00 → string value length = 5 (4 chars + the trailing NUL).
  • 52 69 79 61 00 → "Riya" followed by NUL.
  • 10 → next field type is int32.
  • 61 67 65 00 → field name "age" followed by NUL.
  • 19 00 00 00 → int32 value = 25 (0x19 = 25 decimal).
  • 00 → document terminator.

Total: 29 bytes. The equivalent JSON {"name":"Riya","age":25} is 24 bytes minified (27 with a space after each colon and comma), so for this tiny document BSON is bigger, not smaller. That is expected. The point of BSON is parse speed and type richness, not byte savings on small docs.

Decoding is symmetric:

decoded = bson.decode(encoded)
print(decoded)             # {'name': 'Riya', 'age': 25}
print(type(decoded["age"]))  # <class 'int'> — preserved as integer, not float

Compare this to round-tripping through json: a JSON 25 decodes to int only because Python's json module guesses, but 25.0 decodes to float, and 25e0 decodes to float as well, and you have no way to express "I want this stored as exactly int32 and not int64" — JSON simply lacks that type. BSON's 0x10 tag carries that intent end to end.
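Both halves of that comparison can be checked with the standard library alone. The BSON elements below are assembled by hand with struct, as an illustration of the two integer tags rather than a use of the bson package.

```python
import json
import struct

# JSON: the spelling of the literal decides the Python type.
for text in ("25", "25.0", "25e0"):
    print(text, type(json.loads(text)).__name__)
# 25 int
# 25.0 float
# 25e0 float

# BSON: the writer picks the width explicitly through the type tag.
as_int32 = b"\x10" + b"age\x00" + struct.pack("<i", 25)   # tag 0x10, 4-byte value
as_int64 = b"\x12" + b"age\x00" + struct.pack("<q", 25)   # tag 0x12, 8-byte value
print(len(as_int32), len(as_int64))   # 9 13
```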

Variable-structure records: documents of different shapes

Now the punchline. BSON's per-document self-description means a single MongoDB collection can hold documents that share no fields in common without any storage penalty for the missing ones. Compare the row-store and document-store world views:

Figure: Variable-structure records. A row store wastes NULLs; a document store stores only what is there.

Row store (relational):

  A    | B    | C    | D    | E
  a1   | b1   | c1   | NULL | NULL
  a2   | NULL | c2   | d2   | e2
  NULL | b3   | NULL | NULL | NULL

  • 7 NULLs out of 15 cells (47%).
  • Add field F → ALTER TABLE on the whole table.
  • The schema must know all fields up front.

Document store (BSON):

  • doc 1: {A: a1, B: b1, C: c1}. 3 fields stored, no slots for D or E.
  • doc 2: {A: a2, C: c2, D: d2, E: e2}. 4 fields stored, no slot for B.
  • doc 3: {B: b3}. 1 field stored, nothing else carried.
  • Zero NULLs anywhere.
  • Add field F to one doc → just write it.
  • No schema, no migration, no downtime.

The cost: each BSON doc carries its field names ("A", "B", "C") inline. A row store carries the schema once, in the catalog. For wide-but-sparse data, document overhead is smaller than NULL-cell overhead. For narrow-but-dense data, the row store wins. Rule of thumb: if more than 30% of your cells would be NULL, document storage saves space and grief.

In the relational world, doc 1 (fields A, B, C), doc 2 (A, C, D, E) and doc 3 (B alone) cannot live in one table without that table having columns A, B, C, D, E, and 7 of the 15 cells are NULL placeholders. Worse, every time a new product category arrives with a new field F, you run ALTER TABLE ADD COLUMN F, which on a billion-row table can mean hours of online migration with all its operational risk.

In the document world, each document is a self-contained BSON blob with exactly the fields it needs: no NULLs, no migration. The price you pay is the field names inside every document: doc 1's BSON carries the strings "A", "B", "C"; doc 2's carries "A", "C", "D", "E". For wide-but-sparse data this is a clear win; for narrow-but-dense data (a users table with id, email, created_at and 100 million rows) the relational schema is more compact and document overhead loses.
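The NULL count for the three example documents can be derived mechanically. The snippet below is a small sketch that treats the union of all field names as the columns of the hypothetical wide table.

```python
docs = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "C": "c2", "D": "d2", "E": "e2"},
    {"B": "b3"},
]

# The wide table needs one column per field name seen anywhere.
columns = sorted({key for doc in docs for key in doc})
cells = len(docs) * len(columns)
nulls = sum(1 for doc in docs for col in columns if col not in doc)

print(columns)                        # ['A', 'B', 'C', 'D', 'E']
print(f"{nulls} of {cells} cells")    # 7 of 15 cells
print(f"{nulls / cells:.0%}")         # 47%
```

Running the same calculation over a sample of real records is a quick way to see which side of the wide-but-sparse versus narrow-but-dense line a dataset falls on.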

The real insight is not "documents always beat rows" — they do not. It is that the right shape is the shape that matches the data. Rectangular data → rectangle. Tree-shaped, optionally-attributed, schema-evolving data → tree. BSON is what makes the tree practical on disk and on the wire.

Bharat Bazaar product catalogue in BSON

Three products from our e-commerce catalogue, encoded as BSON documents in one MongoDB collection.

import bson
from datetime import datetime
from bson import ObjectId, Decimal128

# Doc 1: a pair of running shoes
shoes = {
    "_id": ObjectId(),
    "category": "footwear",
    "name": "Stride Pro Runner",
    "brand": "Campus",
    "price_inr": Decimal128("2499.00"),
    "size": 9,
    "colour": "black-red",
    "sole_type": "EVA cushioned",
    "gender": "M",
    "in_stock": True,
    "added_at": datetime(2026, 3, 14, 10, 30),
}

# Doc 2: a mixer-grinder — totally different fields
mixer = {
    "_id": ObjectId(),
    "category": "kitchen",
    "name": "PowerWhirl 750W Mixer",
    "brand": "BharatVehicles",
    "price_inr": Decimal128("3299.50"),
    "voltage": 230,
    "wattage": 750,
    "warranty_months": 24,
    "jar_count": 3,
    "in_stock": True,
    "promo": {"type": "diwali", "discount_pct": 15, "ends_on": datetime(2026, 11, 5)},
}

# Doc 3: a Diwali hamper — nested array, no warranty/voltage at all
hamper = {
    "_id": ObjectId(),
    "category": "festival",
    "name": "Diwali Sweets & Diyas Hamper",
    "price_inr": Decimal128("899.00"),
    "contents": [
        {"item": "kaju katli", "weight_g": 250},
        {"item": "soan papdi", "weight_g": 250},
        {"item": "diyas", "count": 12},
    ],
    "gift_wrap": True,
}

for label, doc in [("shoes", shoes), ("mixer", mixer), ("hamper", hamper)]:
    encoded = bson.encode(doc)
    print(f"{label:8s}: {len(encoded):4d} bytes, fields={list(doc.keys())}")

Sample output:

shoes   :  222 bytes, fields=['_id', 'category', 'name', 'brand', 'price_inr',
                              'size', 'colour', 'sole_type', 'gender',
                              'in_stock', 'added_at']
mixer   :  266 bytes, fields=['_id', 'category', 'name', 'brand', 'price_inr',
                              'voltage', 'wattage', 'warranty_months',
                              'jar_count', 'in_stock', 'promo']
hamper  :  259 bytes, fields=['_id', 'category', 'name', 'price_inr',
                              'contents', 'gift_wrap']

Three documents, three different shapes, one collection. Insert all three into MongoDB:

from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017")
db = client.bharat_bazaar
db.products.insert_many([shoes, mixer, hamper])

Now query — and the interesting part is that each query selects only the docs that have the fields it asks about, with no errors for the others:

# All in-stock items above ₹1000; the same query runs against all three shapes
list(db.products.find({"price_inr": {"$gt": Decimal128("1000.00")}, "in_stock": True}))
# returns shoes + mixer (the hamper fails both conditions: it costs ₹899 and has no in_stock field)

# Things on Diwali promo — only mixer has a "promo" subdoc
list(db.products.find({"promo.type": "diwali"}))
# returns just the mixer

# Items available in size 9: only shoes have "size"
list(db.products.find({"size": 9}))
# returns just the shoes
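The "missing field simply does not match" behaviour can be mimicked in memory. The matcher below is a toy written for this article, handling only equality on dotted paths through nested dicts; MongoDB's real matcher also covers operators, arrays, and null semantics that this sketch ignores.

```python
def get_path(doc, dotted):
    """Follow a dotted path through nested dicts; return (found, value)."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current

def matches(doc, query):
    """Equality-only match: every queried path must exist and equal the value."""
    for path, expected in query.items():
        found, value = get_path(doc, path)
        if not found or value != expected:
            return False
    return True

products = [
    {"name": "shoes", "size": 9, "in_stock": True},
    {"name": "mixer", "in_stock": True, "promo": {"type": "diwali"}},
    {"name": "hamper", "gift_wrap": True},
]

def names(query):
    return [p["name"] for p in products if matches(p, query)]

print(names({"size": 9}))               # ['shoes']
print(names({"promo.type": "diwali"}))  # ['mixer']
print(names({"in_stock": True}))        # ['shoes', 'mixer']
```

No document raises an error for lacking a queried field; it just drops out of the result.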

Why this is genuinely different from a relational design: in Postgres you would either have one giant products table with voltage, sole_type, gift_wrap, etc. as nullable columns — most cells NULL — or a master products table joined to per-category detail tables. The first option pollutes the schema with optional fields; the second multiplies the query surface. The document model lets the schema emerge from the data, one product category at a time, without ever running an ALTER TABLE.

Add a new product category tomorrow — say, smartphones with imei, ram_gb, storage_gb, os — and the only thing that changes is the documents you insert. The existing shoes, mixers, and hampers are untouched. The products collection now has four shapes coexisting; tomorrow it can have ten.

What the database loses by not knowing the shape

The flexibility is real, and it has a real bill.

The query optimiser is flying half-blind. A relational planner knows that customers.email is a VARCHAR(255), has 12 million distinct values, has a B-tree index, and has an average length of 24 characters. It uses all of that to decide whether to seek the index or scan the table. A document store knows that customers is a collection, that some documents have an email field, that some of those are indexed — but the field's type, cardinality, and presence ratio are per-document properties the planner has to estimate from samples. Plans are correspondingly less precise; a WHERE email = "x" against a sparse field can pick the wrong path more easily than its relational equivalent.

Schema consistency lives in your application code, not the database. If two services write to the same collection and one calls the field customer_email while the other calls it customerEmail, the database happily accepts both. You discover the bug when a query returns half the rows it should. Tools like JSON Schema and MongoDB's $jsonSchema validator close this gap — but only if you choose to use them, and only at write time, not retroactively.
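As a concrete illustration, a validator for the product catalogue might look like the dictionary below. The specific rules (which fields are required, which types they take) are assumptions made for this example, and the commented PyMongo calls need a running server, so they are shown but not executed.

```python
# Hypothetical validator for the products collection used in this article.
product_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["category", "name", "price_inr"],
        "properties": {
            "category": {"bsonType": "string"},
            "name": {"bsonType": "string"},
            "price_inr": {"bsonType": "decimal"},   # "decimal" is the Decimal128 alias
            "in_stock": {"bsonType": "bool"},
        },
    }
}

# With a running server, the validator can be attached at creation time:
#   db.create_collection("products", validator=product_validator)
# or added to an existing collection with the collMod command:
#   db.command("collMod", "products", validator=product_validator)
print(sorted(product_validator["$jsonSchema"]["required"]))
# ['category', 'name', 'price_inr']
```

Fields not listed under properties are still accepted, so the collection stays open to new product categories while the shared core is enforced.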

Storage carries the field names forever. A billion-document collection with field names averaging 12 bytes spends roughly 12 GB per field on field-name strings, strings that a relational schema would store exactly once in the catalog. WiredTiger (MongoDB's storage engine) compresses pages with Snappy or zstd, which mostly absorbs this on disk, but the in-memory working-set cost is real.

Joins are not the natural primitive. Document stores nudge you toward embedding (put the order's line items inside the order document) instead of joining (separate orders and line_items tables). Embedding is great when the embedded data is always read with the parent and rarely shared — like line items in an order. It is a trap when the same data appears in many parents (a customer embedded in every order means updating an address requires touching every order). Modelling well in a document store is its own skill, covered in chapter 139.

Where BSON sits in the wider format zoo

BSON is one point in a design space full of binary encodings. Worth a brief tour:

  • MessagePack is BSON's closest cousin, a schema-free binary JSON-equivalent. It is more compact, but its maps and arrays are prefixed with element counts rather than byte lengths (so it cannot skip subtrees as cheaply), and it lacks BSON-specific types like ObjectId and Decimal128. Used where size matters more than skip-speed (Redis serialisation, pub/sub).
  • CBOR (RFC 8949) is the IETF's standardised BSON-equivalent, used in WebAuthn and IoT. Same idea, different bytes, somewhat more compact than BSON.
  • Protobuf and FlatBuffers are schema-required binary formats. You declare a .proto schema, the compiler generates type-specific encoders, and the wire format omits field names entirely — every field is a small integer tag. The result is dramatically smaller (no field-name overhead) and faster to parse, but you lose schema-free-ness: every reader and writer must agree on the schema in advance. FlatBuffers in particular goes further and lets you read fields without any decoding pass at all (zero-copy access via offset tables). Used in Sociogram's mobile apps where every CPU cycle counts.
  • Avro is the Hadoop world's answer — schema-with-the-data, so the writer ships a tiny schema header alongside each block and the reader uses it to decode without a precompiled stub. Trades the schema-emerging flexibility for compactness and explicit evolution rules.

Document databases pick BSON (or BSON-likes) because the schema-free property is non-negotiable for their use case. If you knew the schema in advance, you would already be running Postgres. The whole pitch of MongoDB (the canonical document database), Couchbase, AWS DocumentDB (which speaks the MongoDB wire protocol on top of a different storage engine), and Azure Cosmos DB (which exposes MongoDB, Cassandra, Gremlin, and SQL APIs over one engine) is "the data shape is what your application says it is, not what an ALTER TABLE says it is" — and BSON is the encoding that makes that pitch deliverable.

Common confusions

  • "BSON is just JSON, only smaller." It is not smaller — for most small documents, BSON is a few bytes bigger than minified JSON because of length prefixes and type tags. The win is parse speed (no tokenisation, no type-guessing) and richer types (ObjectId, Decimal128, Date, Binary, int32 vs int64). Compression of the on-disk file (Snappy or zstd in WiredTiger) is what shrinks the bytes — the wire format itself is rarely the smallest option.

  • "MongoDB stores documents exactly the way I encoded them." Not quite. Drivers and the server may adjust the document on the way in: _id is moved to the front of the top-level document, BSON types are chosen for you (a Python int of value 1 is encoded as int32 even though Python has only one int type), and the 16 MB document size cap is enforced. For a hard guarantee on the byte sequence you send, build the bytes yourself and wrap them in bson.raw_bson.RawBSONDocument so the driver passes them through unchanged.

  • "Schema-free means MongoDB will never reject a write." MongoDB happily accepts any shape — but it enforces invariants the storage layer needs: _id must be unique within a collection, total document size must be under 16 MB, and indexed fields must respect their key constraints (a unique index on email rejects a duplicate). Add a $jsonSchema validator and the database will start rejecting bad shapes too — but only at write time and only if you opt in.

  • "Decimal128 is just a more precise float." It is a decimal floating-point type — it represents 0.1 + 0.2 as exactly 0.3, the way humans expect, instead of 0.30000000000000004 the way IEEE 754 binary floats do. It is the right type for currency (so ₹19.99 is exactly 19.99, not a nearby double). Using double for prices, then comparing them, is one of the great recurring billing bugs at every fintech company.

  • "Documents of different shapes in one collection are a code smell." They can be, but they are also the literal point of a document database. The smell is unintentional shape variation — two services writing the same logical entity with different field names. Intentional variation, where each shape is a real product category with its own attributes, is exactly the use case BSON is designed for. Use category fields (type: "footwear") and $jsonSchema per-category validators to keep it disciplined.

  • "Field names cost so much that I should shorten them." The temptation is real — replace customer_email_verified_at with cev and save bytes. WiredTiger's page-level compression catches most of the repetition (every page has thousands of "customer_email_verified_at" strings, which Snappy crunches well). Save the abbreviation effort for fields that genuinely appear billions of times in tight inner documents (line items in an order, points in a time series). For top-level customer fields, readability wins.

Going deeper

If you just wanted to understand why MongoDB picked binary over text and what variable-structure records buy you, you have it: a self-describing, length-prefixed, type-tagged binary format that lets each document carry its own shape. The rest of this section connects BSON to the production realities you will meet at scale.

How BSON travels: from PyMongo to a MongoDB shard

Trace one insert_one call:

  1. Your Python dict enters pymongo's Collection.insert_one.
  2. PyMongo hands the dict to bson.encode, which produces the BSON byte string we walked above. This is the only encoding step on the client: type-checking each value, allocating the output buffer, writing length prefixes.
  3. PyMongo wraps the BSON in a MongoDB wire-protocol message (an OP_MSG with a section containing the document) and sends it over a TCP connection to the server.
  4. The server parses the message header, then the BSON document — using the length prefix to know the document boundary without parsing field by field. This is where BSON's format pays off: a 16 KB document parses in microseconds because the parser walks type tags, never tokenising.
  5. The server applies validation ($jsonSchema if configured), assigns a fresh ObjectId to _id if missing, and hands the BSON to the WiredTiger storage engine.
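  6. WiredTiger inserts the BSON bytes into a B-tree leaf page. In a regular, non-clustered collection that tree is keyed by an internal record identifier, and the _id index is a separate B-tree that points at it. The page is compressed as a block (Snappy by default) when it is written to disk. The BSON byte sequence the server stored is preserved exactly: when you later read the document, you get those bytes back, decompressed and decoded once.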

The whole hot path treats BSON as the lingua franca: the driver speaks it, the wire speaks it, the storage engine speaks it. There is no JSON anywhere in the production path — the JSON-like console output you see in mongosh is a display convention, not how the data lives.
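The wire-protocol wrapping in step 3 is thin enough to sketch. The code below frames one hand-encoded BSON command document as an OP_MSG with a single kind-0 section and no checksum; the ping command and the function name frame_op_msg are illustrative choices, and a real driver adds connection handling, compression, and response parsing around this.

```python
import struct

OP_MSG = 2013   # opcode for OP_MSG in the MongoDB wire protocol

def frame_op_msg(bson_body, request_id):
    """Wrap one BSON command document in an OP_MSG frame (kind-0 section, no checksum)."""
    flag_bits = struct.pack("<I", 0)
    section = b"\x00" + bson_body            # section kind 0: a single BSON document
    payload = flag_bits + section
    header = struct.pack(
        "<iiii",
        16 + len(payload),                   # messageLength, header included
        request_id,                          # requestID chosen by the client
        0,                                   # responseTo: 0 for a request
        OP_MSG,
    )
    return header + payload

# Hand-encoded BSON for {"ping": 1, "$db": "admin"}.
body = (b"\x10ping\x00" + struct.pack("<i", 1)
        + b"\x02$db\x00" + struct.pack("<i", 6) + b"admin\x00")
doc = struct.pack("<i", len(body) + 5) + body + b"\x00"

msg = frame_op_msg(doc, request_id=1)
length, req, resp, opcode = struct.unpack_from("<iiii", msg, 0)
print(length == len(msg), opcode)            # True 2013
print(msg[21:] == doc)                       # True: the BSON rides inside unchanged
```

The 21 bytes of framing (16-byte header, 4 flag bytes, 1 section-kind byte) are all that separates the driver's BSON from the server's.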

Decimal128: the format that finally got money right

For decades, every database that wanted to represent ₹19.99 exactly had to pick between two bad choices: store it as an integer in paise (1999) and convert at the application boundary, or store it as a DECIMAL(10, 2) in SQL and pay variable-precision arithmetic costs. JavaScript and JSON could not represent it at all: JSON.parse("19.99") returns the nearest IEEE 754 double, whose exact value sits slightly below 19.99 (about 19.989999999999998, even though it prints as 19.99), and any subsequent arithmetic carries that error.

BSON's 0x13 Decimal128 adopts IEEE 754-2008's decimal128 format: 128 bits, with a base-10 coefficient and base-10 exponent, capable of representing every decimal value with up to 34 significant digits exactly (within its exponent range). In decimal128 arithmetic, 19.99 + 0.01 is exactly 20.00, period, with none of the binary-float drift. PyMongo's Decimal128 class is a storage type without arithmetic operators, so in Python you convert with to_decimal() and compute with the decimal module. This is why MongoDB's financial-services and PaisaBridge-style customers reach for Decimal128 for every field that holds money. It is also why the MongoDB docs explicitly warn against using double for currency.
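The difference between binary and decimal floating point shows up in a few lines. Python's decimal module stands in for decimal128 here, with a context limited to 34 significant digits; it is an approximation of the semantics, not the BSON type itself.

```python
from decimal import Decimal, Context

# 34 significant digits matches decimal128's coefficient size.
ctx = Context(prec=34)

print(0.1 + 0.2 == 0.3)                                            # False
print(ctx.add(Decimal("0.1"), Decimal("0.2")) == Decimal("0.3"))   # True
print(ctx.add(Decimal("19.99"), Decimal("0.01")))                  # 20.00

# Constructing a Decimal from a float exposes the double's true value,
# which is not the decimal literal that was typed.
print(Decimal(19.99) == Decimal("19.99"))                          # False
```

Note that the decimal result keeps the trailing zeros of 20.00, which matters when a value's scale carries meaning, as it does for currency.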

Postgres has had numeric (an arbitrary-precision decimal) for ages, and JSONB stores its numbers as numeric internally, but the value still leaves the database as JSON text, and most client-side JSON parsers turn it into a double unless you serialise it as a string. BSON solving this end-to-end (driver decodes back to Decimal128, application code can do exact arithmetic) is one of the under-appreciated reasons MongoDB earned its place in fintech stacks.

The 16 MB document limit and why it exists

MongoDB enforces a hard limit: no single BSON document may exceed 16 MB on the wire. People are forever surprised by this. The reason is several layers deep:

  • The wire-protocol length is a 32-bit signed integer, so the absolute ceiling is 2 GB — but allowing 2 GB documents would let one bad insert evict every page from the WiredTiger cache and crater the server.
  • A single document's encode/decode is single-threaded inside the driver and the server, so a 100 MB document parsed on the hot path adds tens of milliseconds of CPU latency to that one operation. Sixteen MB is the largest size where the parse stays comfortably under the 1–2 ms budget of a typical query.
  • Mongo's replication oplog records every write. Inserts and whole-document replacements carry the full document, so a 16 MB document can mean up to 16 MB written to the oplog for one operation; multiplied by replication factor and across a busy collection, that is already a measurable I/O load.

When you genuinely need to store something bigger — a high-resolution medical scan, a multi-hour video — MongoDB ships GridFS, a convention that splits the binary into 255 KB BSON chunks and stores them across two collections (fs.files for metadata, fs.chunks for the bytes). GridFS is the polite admission that BSON is for documents, not blobs.
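The chunking idea itself is a plain slicing loop. The sketch below is not the GridFS API; it only illustrates how a payload divides at the 255 KB chunk size quoted above, with a zero-filled buffer standing in for a real file.

```python
CHUNK_SIZE = 255 * 1024   # 261,120 bytes

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Yield (n, chunk) pairs, where n is the chunk's position in the file."""
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        yield n, data[offset:offset + chunk_size]

blob = bytes(1_000_000)   # 1,000,000 zero bytes standing in for a real file
chunks = list(split_into_chunks(blob))

print(len(chunks))                    # 4
print([len(c) for _, c in chunks])    # [261120, 261120, 261120, 216640]
print(b"".join(c for _, c in chunks) == blob)   # True: chunks reassemble in order
```

Each chunk stays far below the 16 MB document limit, which is the whole point of the convention.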

Inside WiredTiger: how BSON meets the page cache

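WiredTiger is MongoDB's default storage engine since 3.2. It stores BSON documents in B-tree pages (default leaf size 32 KB). In a regular, non-clustered collection the tree is keyed by an internal record identifier, and the _id index is a separate B-tree that maps each _id to that identifier. When the driver hands the server a BSON document for write, the engine:

  1. Locates the leaf page for the new document's record key (or splits/grows the tree if needed).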
  2. Inserts the document's BSON bytes into the page's row store.
  3. Marks the page dirty in the cache.
  4. Eventually compresses the page with Snappy or zstd and writes it to disk during checkpointing.

The Python buffer → kernel page cache → disk-controller cache → platter layering from chapter 3 still applies — fsync on the journal is what crosses from kernel to controller, and WiredTiger's checkpoints are what amortise that cost. Why this matters for BSON sizing: the working-set cost of a collection is the uncompressed BSON size of every hot document, because the cache holds decompressed pages. A 100 GB compressed collection might have a 200 GB working set, which is what the cache has to hold. Field-name overhead pays here even though Snappy crushes it on disk.

When BSON loses: the columnar counterargument

For OLAP workloads — "what is the average order value across all 200 million orders this year?" — BSON is the wrong shape. Every BSON document carries every field including ones the query does not touch, so a scan reads customer_id, shipping_address, line_items, payment_info to compute one number from total_amount. The disk bandwidth wasted is enormous.

This is exactly the case where columnar formats (Apache Parquet, Apache ORC, ClickHouse's MergeTree) win by orders of magnitude. They store all values of one column contiguously, so a scan over total_amount reads only that column's bytes. MongoDB acknowledged this gap by adding time-series collections (Mongo 5.0), which lay out measurements column-style inside buckets, and by announcing column store indexes during the 6.x releases: a hybrid borrowing from the OLAP playbook.
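A rough byte count shows the shape of the gap. The numbers below come from a made-up order layout (two 8-byte numeric fields and a 120-byte address) and ignore field names, compression, and page granularity, so treat the ratio as an illustration rather than a benchmark.

```python
orders = [
    {"customer_id": i, "shipping_address": "x" * 120, "total_amount": 100.0 + i}
    for i in range(1000)
]

# Document layout: pages hold whole documents, so the scan pulls in every field.
doc_bytes = sum(8 + len(o["shipping_address"]) + 8 for o in orders)

# Columnar layout: all total_amount values sit together, 8 bytes each.
total_amount_column = [o["total_amount"] for o in orders]
col_bytes = 8 * len(total_amount_column)

print(sum(total_amount_column) / len(total_amount_column))   # 599.5
print(doc_bytes, col_bytes, doc_bytes // col_bytes)           # 136000 8000 17
```

BSON's length prefixes let a reader skip the address in memory, but the disk still delivers whole pages, so the skipped bytes are read from storage either way.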

The takeaway: BSON is optimal for whole-document random reads of variable-structure records. It is suboptimal for column-aggregation scans. Most production systems running serious analytics on MongoDB pipe the data nightly to a columnar warehouse (Snowflake, BigQuery, ClickHouse) and run OLAP there.

The historical accident: why "B" stands for "Binary," not "Better"

BSON was invented by Dwight Merriman and Eliot Horowitz at 10gen in 2009 as the wire format for the database that became MongoDB. The name is intentionally pedestrian — it is "Binary JSON," nothing more. The format borrowed structurally from MessagePack (which was contemporaneous) but added length prefixes, ObjectId, and the type-tag system. There was no academic paper, no committee — just two engineers picking a wire format for the database they wanted to ship.

You can read the original blog post announcing BSON (10gen, 2009) for the historical record. It is short, practical, and makes none of the formal claims that, say, Protocol Buffers does — which is itself a useful lesson in how production systems get built.

What's next in Build 17

We have the on-disk format; we have the per-document freedom. The next chapters build out everything you need to make this practical at scale:

  • Chapter 138 — Nested indexes and dot-path queries. How does MongoDB index customer.address.pincode when not every document even has a customer field? Multikey indexes, sparse indexes, wildcard indexes.
  • Chapter 139 — Schema flexibility and its hidden cost. When the schema lives in your application code, what does Conway's Law do to it? $jsonSchema validators, embed-vs-reference modelling, and the migration patterns nobody writes about.
  • Chapter 140 — The aggregation pipeline. MongoDB's answer to GROUP BY: composable stages ($match, $group, $lookup, $facet) that build query trees out of small declarative steps.
  • Chapter 141 — Change streams. MongoDB ships built-in CDC — a log of every insert, update, and delete you can subscribe to, without a separate Debezium-style pipeline.
  • Chapter 142 — Sharded MongoDB. Chunks, balancers, config servers — how a document database scales horizontally without losing the schema-free promise.

The theme of Build 17 is the same as every Build in this series: understand the trade-offs deeply enough to know when this is the right tool. Document databases are not a general-purpose replacement for relational ones, and they are also not a niche oddity. They are the right answer for a specific shape of data — variable-structure, tree-shaped, schema-evolving — and BSON is the format that makes that answer practical.

References

  1. BSON specification (bsonspec.org) — the canonical wire-format reference; lists every type code and the byte layout for each.
  2. MongoDB documentation: BSON types — type-by-type reference with examples and conversion rules.
  3. MongoDB blog: JSON and BSON — a deep technical comparison — the case for binary over text from the people who made the choice.
  4. JSON Schema specification — the standard MongoDB's $jsonSchema validator builds on for write-time shape enforcement.
  5. FlatBuffers internals (Querion) — the zero-copy schema-required alternative; useful contrast for understanding what BSON gives up by being schema-free.
  6. Apache Avro specification — schema-with-the-data binary format from the Hadoop ecosystem; another point in the design space.