How to read a database paper

Open the Spanner OSDI 2012 paper in one tab and a stopwatch in another. The first pass is fifteen minutes, not three hours. By the end of this chapter you will know exactly which fifteen minutes — and which sentences inside them — actually carry the design.

A database paper is not a textbook chapter. Read it in three passes — skim the abstract and figures (15 min), follow the system architecture and one example transaction (60 min), then chase the failure modes and the evaluation (90 min). The trick is knowing which sections lie, which numbers matter, and which sentence in the introduction is the entire idea.

Why papers feel impossible (and why they are not)

You read a Spanner paper end-to-end in textbook order. By page four you are lost in TrueTime API definitions. By page seven you have forgotten what problem the paper is solving. You give up and watch a YouTube summary, and now you "know what Spanner is" without ever having looked at the design.

This is the wrong loop. Papers are not meant to be read in the order they are printed. They are written for peer reviewers — a handful of people who already know the field — who skim them in fifteen minutes and decide whether they are publishable. The narrative order on the page is reverse-engineered from a checklist of evaluation criteria, not from the order an engineer would learn the system. If you read in print order, you are reading in the order designed for a critic, not a builder.

You are a builder. You want one thing from every paper: what is the smallest mental model of this system that lets me predict what it does in a situation the paper does not mention? Three passes get you there. Each pass has a fixed budget, a fixed goal, and a fixed exit condition.

The three-pass method

The three-pass method comes from Srinivasan Keshav's 2007 essay How to Read a Paper, distilled here for database engineering specifically. The passes are nested: pass 2 only happens if pass 1 says the paper is worth more time, and pass 3 only happens if you actually need to implement or critique the system.

[Figure: The three-pass reading method, with budgets and exit conditions. Three stacked bands. Pass 1 — Skim (15 min): abstract, intro, figures, conclusion; exit question "Is this paper relevant to me?" — if yes, go to pass 2; if no, stop. Pass 2 — Read for the design (60 min): architecture, one example, common-case path; exit question "Can I explain it to a colleague?" — if yes, go to pass 3; if no, stop. Pass 3 — Critique and rebuild (90 min): proofs, evaluation, failure modes, related work; exit question "Could I reimplement it, or find a flaw?"]
Three nested passes, each with a budget and an exit question. Skip pass 2 if pass 1 was negative. Skip pass 3 if pass 2 was negative. Most papers stop at pass 1.

Why three passes and not one careful read: a single read forces you to spend the same effort on the abstract as on the proofs, but the abstract carries 80% of the value at 5% of the cost. Tiered reading lets you reject the wrong papers cheaply and concentrate budget on the right ones.

Pass 1 — fifteen minutes, no notes

Open the PDF. Read these in this order, and nothing else:

  1. Title and authors. Who wrote it and where? "Corbett et al., Google" already tells you the paper is going to describe an in-production system at planet scale, not a prototype. "Stonebraker et al., MIT" tells you it is going to be opinionated and benchmark-heavy. Authorship sets your prior.
  2. Abstract. One paragraph. The single most-rewritten paragraph in the paper — every word was negotiated. The thesis sentence (usually the second or third) is the entire claim of the work.
  3. Introduction's last paragraph. This is the "contributions" list — every paper has it, almost always at the end of section 1 as a bullet list or "We make the following contributions:". This is the table of contents of the ideas, not the paper itself.
  4. All figures. Every architecture diagram, every plot. Read the captions. A good systems paper tells its story in the figures alone; if you cannot follow the story from figures and captions, the paper is poorly written or you need pass 2.
  5. Conclusion. Last section. Restates the thesis with whatever caveats survived peer review. Often more honest than the abstract about limitations.

Total time: 15 minutes. At the end, answer one question: is this paper relevant to my problem? If no, stop. You learned what the paper is about and that is enough. If yes, schedule pass 2 for tomorrow — do not roll straight into it. The fifteen-minute skim is what your brain wants to consolidate overnight, not a 60-minute deep read on top.

Pass 2 — sixty minutes, with the architecture diagram on a notepad

Pass 2 is for the common-case path of the system. You are not reading every section. You are answering: when a typical operation happens, what does the system do? For a database paper that means trace one read and one write through the architecture, end to end.

Open a notebook. Redraw the system's architecture diagram by hand on a fresh page. Do not photograph or copy the figure — redraw it, with arrows and component names. The act of redrawing forces you to notice every component you would have skimmed past. Now read these sections, in this order:

  1. Section 2 (System Model / Architecture). What are the components? What is the unit of data? What is the unit of replication?
  2. Section 3 or wherever the data model is. What does a row / key / cell look like? What is the schema?
  3. The "common case" path. Find the one example the authors walk through (Spanner walks through a single transaction in section 4). Trace it through your hand-drawn architecture. Mark every box the request touches.
  4. The first half of the evaluation. What workload? What scale? What did they measure?

You will skip — deliberately — the proofs, the related work section, the formal definitions, and the second half of the evaluation. Those are pass 3.

The exit condition for pass 2 is: could you explain this system to a colleague at lunch in five minutes, including what it is for, what its three or four components are, and what happens when a typical request hits it? If yes, you understand the design. If no, schedule pass 3 — or, more often, find a different paper that explains the prerequisite (Paxos before Spanner, LSM-tree before Bigtable).

Pass 3 — ninety minutes, with a red pen

Pass 3 is the critique pass. You read this paper as if you were going to reimplement the system or write a paper that breaks it. Two things to do:

Find every assumption. A systems paper hides its assumptions in the choice of what to evaluate against. Spanner evaluates with TrueTime uncertainty bounds of 7ms. What if your machine has 70ms uncertainty? The paper does not say, because it does not have to — but a careful reader notes that bound and asks, "what breaks if it loosens?"

Find every failure mode they did not test. The evaluation section shows what works. The failure-mode discussion (often a single subsection, sometimes spread across the paper) shows what they thought about. The interesting failures are the ones in neither: network partitions during a leader change, disk corruption that the paper says "we ignore", a clock that runs backwards. Note these, because they are the questions you will ask in an interview, a production debug, or a blog post.

By the end of pass 3 you should be able to write a one-page critique: what the paper does well, what it does not test, and what the next paper in this lineage will probably attack. That is what every PhD reading group does on Friday afternoons.

Anatomy of a database paper — section by section

Database papers across SIGMOD, VLDB, OSDI, SOSP, FAST and CIDR converge on roughly the same eight-section shape. Knowing the shape lets you find what you need without reading in order.

[Figure: The eight-section anatomy of a typical database paper. Eight sections, each with a trustworthiness label: 1. Abstract — the thesis; 2. Introduction — the pitch (read the last paragraph); 3. Background — skip if you know the area; 4. System architecture — read carefully (pass 2); 5. Mechanism deep-dive — the meat (pass 2/3); 6. Evaluation — read with suspicion; 7. Related work — the citations map; 8. Conclusion — honest caveats. A side band marks which sections belong to pass 1, 2, and 3.]
Sections cluster by trustworthiness. The thesis sentences (abstract, intro's last paragraph, conclusion) are the most-edited. The evaluation is the least trustworthy — every team picks the workload that flatters them.

A few load-bearing notes on what each section actually tells you:

  - Abstract — the thesis; the most-edited paragraph in the paper.
  - Introduction — the pitch; the last paragraph is the contributions list.
  - Background — skip it if you already know the area.
  - System architecture — read carefully; the heart of pass 2.
  - Mechanism deep-dive — the meat; pass 2 for the common case, pass 3 for the edge cases.
  - Evaluation — read with suspicion; every team picks the workload that flatters them.
  - Related work — a map of what to read next, not an argument.
  - Conclusion — the honest caveats that survived peer review.

A worked example: reading Spanner OSDI 2012

Spanner is the right paper to practise on, because you have already met it across this track — it appeared in global-consistency, in transactions, and in consensus. You know what TrueTime is. Now learn how to read the paper that introduced it.

The paper is Corbett et al., Spanner: Google's Globally-Distributed Database, OSDI 2012. Fourteen pages. Twenty-seven citations. Eight sections.

Pass 1 (15 min) — what is Spanner?

Open the PDF. Read in this order:

Title. Spanner: Google's Globally-Distributed Database. "Globally-distributed" is the load-bearing word. Not "distributed". Globally — across continents. Note this; it shapes everything.

Abstract. Three sentences in. The thesis: "Spanner is the first system to distribute data at global scale and support externally-consistent distributed transactions." That is the contribution. Externally consistent + global scale. If both halves were already solved, the paper would not exist. Spanner's contribution is doing both at once.

Introduction's last paragraph. Skim the contributions list. Three things: (1) a globally-distributed database with ACID transactions, (2) a TrueTime API exposing clock uncertainty, (3) a way to use TrueTime to enforce external consistency. Note that TrueTime is both a system component and an API exposed to applications. That dual role is what makes the paper interesting.

Figures. Skim them all. Figure 1 is the deployment diagram — zones, spanservers, clients. Figure 2 shows the spanserver software stack. Figure 3 shows directories. Figures 4–6 are evaluation. Figure 6 is the famous TrueTime epsilon graph showing 7ms typical uncertainty. Note: every figure tells you something about the system except the evaluation ones, which tell you what they want you to think.

Conclusion. Honest about what the design cost: TrueTime requires GPS receivers and atomic clocks in every datacenter, which not every reader will have. A useful caveat.

Pass 1 verdict: Spanner is a globally-distributed database that uses physical clocks (with bounded uncertainty) to give applications a consistency model usually associated with single-machine databases. That is enough to talk about Spanner at lunch. If you are building a system at global scale, schedule pass 2 tomorrow. If you are not, you are done.

Pass 2 (60 min) — how does a Spanner transaction commit?

Goal: trace a single read-write transaction through the architecture. Pull out a notebook and draw the deployment.

Read section 2 (Implementation). Universe → zones → spanservers → tablets. A zone is roughly a datacenter (one datacenter can host several zones). A spanserver serves 100–1000 tablets. Each tablet is replicated by a Paxos group whose replicas span zones. This is the load-bearing structure of the entire paper. Sketch it.

Read section 3 (TrueTime). Two API calls: TT.now() returns an interval [earliest, latest] such that the actual time is somewhere in there. TT.after(t) returns true if t is definitely in the past. The implementation uses GPS receivers and atomic clocks per datacenter. The interval width is typically 1–7 ms.
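The two calls are easy to model. A toy sketch in Python — the class name and the fixed epsilon are illustrative, not Spanner's implementation, which derives the interval from GPS and atomic-clock data:

```python
import time

class TrueTime:
    """Toy model of the TrueTime API: the wall clock plus a fixed
    uncertainty bound epsilon (illustrative; Spanner's is 1-7 ms)."""

    def __init__(self, epsilon_s=0.007):
        self.epsilon = epsilon_s

    def now(self):
        # Interval [earliest, latest] guaranteed to contain true time.
        t = time.time()
        return (t - self.epsilon, t + self.epsilon)

    def after(self, t):
        # True only when t is definitely in the past: even the earliest
        # possible current time is beyond t.
        earliest, _ = self.now()
        return earliest > t

tt = TrueTime()
earliest, latest = tt.now()
assert earliest < latest
assert tt.after(latest) is False  # latest might still be in the future
```

Note the asymmetry: `now()` never claims a point in time, only an interval, and `after(t)` is the conservative question everything else is built on.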

Read section 4 (Concurrency). The mechanism is a lock-based system layered over Paxos: each Paxos group has a leader, the leader holds the locks, and distributed transactions hold locks across several leaders, coordinated by two-phase commit (2PC).

Why pass 2 traces one example instead of generalising: a system paper has fifty edge cases, but they all decorate one common-case path. If you cannot draw the common case end-to-end, the edge cases are noise. Trace the boring transaction first; the interesting failures only make sense once you know what success looks like.

Trace one transaction. A read-write transaction in Spanner:

  1. Client picks a coordinator leader (one of the involved Paxos leaders).
  2. Each leader acquires write locks on its keys and assigns a prepare timestamp.
  3. Each prepare timestamp is sent to the coordinator.
  4. The coordinator picks a commit timestamp s ≥ all prepare timestamps and ≥ TT.now().latest at the moment of commit.
  5. The coordinator waits until TT.after(s) is true — the commit-wait. This is usually a few milliseconds.
  6. The coordinator notifies the participants, who release locks and apply the writes.

The commit-wait at step 5 is the entire point of the paper. By waiting until s is definitely in the past, Spanner guarantees that any transaction that starts after this one finishes will see a strictly later commit timestamp — which is the definition of external consistency.

Why the commit-wait is the whole idea: clock uncertainty means you cannot tell, at the instant you assign timestamp s, whether s is in the future or the past. If you release locks while s could still be in the future, a later transaction might be assigned an earlier timestamp and see your writes "before they happened". Waiting until TT.after(s) is true forces the timeline to catch up.
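The commit rule and the wait condense into a few lines. A standalone toy in Python — the epsilon value and the busy-wait loop are illustrative, not Spanner's code:

```python
import time

EPSILON = 0.007  # assumed clock uncertainty bound, in seconds (illustrative)

def tt_now():
    # Toy TrueTime: interval [earliest, latest] around the wall clock.
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def tt_after(t):
    earliest, _ = tt_now()
    return earliest > t

def commit(prepare_timestamps):
    # Coordinator picks s >= every prepare timestamp and >= TT.now().latest.
    _, latest = tt_now()
    s = max(prepare_timestamps + [latest])
    # Commit-wait: block until s is definitely in the past, so any
    # transaction that starts after we release locks gets a later timestamp.
    while not tt_after(s):
        time.sleep(0.001)
    return s  # now safe to notify participants and release locks

s = commit([time.time()])
assert tt_after(s)  # by the time commit returns, s is provably in the past
```

Run it and the wait is roughly 2 × EPSILON: s sits at the top of the current uncertainty interval, and the loop spins until even the bottom of a fresh interval has passed it. That is the latency cost the paper pays for external consistency.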

You should now be able to explain Spanner to a colleague: "It's Bigtable plus Paxos plus 2PC plus a clock-wait. The clock-wait gives you external consistency — strict serialisability — across continents, paid for in commit latency proportional to your clock uncertainty."

Pass 3 (90 min) — what would break Spanner?

Now read the parts you skipped. Section 4.1.3 (read-only transactions). Section 4.2.5 (schema changes). Section 5 (evaluation). Find the assumptions.

The TrueTime epsilon assumption. Spanner assumes ε of 1–7 ms. Their own figure 6 shows ε spiking towards 100 ms during failures. What happens to commit latency when ε is 100 ms? Every transaction commit-waits on the order of 100 ms after assigning its timestamp. Throughput collapses. Section 5 does not stress-test this; note it as an open question.

The 2PC + Paxos cost. Section 5.3 measures commit latency at ~70 ms (median) for one participant and ~80 ms for five. That is reasonable for cross-continent commits. But it is also the floor — single-region transactions still pay it. Put Spanner in one Bengaluru datacenter and you still get the same ~70 ms commit, orders of magnitude slower than a single-node Postgres committing in under a millisecond. The paper does not say this; you derive it from its numbers.

Failure modes not tested. What happens during a Paxos leader change while a transaction holds locks? The paper says "leases handle this" but does not show data. What happens during a network partition between zones during commit? Section 4 hand-waves this; the Jepsen analyses of CockroachDB (a Spanner-inspired system) found real bugs in exactly this regime.

By the end of pass 3 you have your one-page critique: Spanner is a brilliant integration of Paxos, 2PC, and physical clocks; its weakness is that the clock-wait is on the critical path, so any clock-uncertainty incident becomes a latency incident; and the open question is whether non-Google clouds can replicate the GPS+atomic clock infrastructure cheaply enough to follow this design. (As of 2026, AWS TimeSync and Azure Precision Time Protocol have made this much cheaper — a follow-up worth tracking.)

A reading log script — track what you have read

You will read hundreds of papers across a career. Track them. Here is a tiny Python script to maintain a reading log.

# paperlog.py — minimal reading log
import json, datetime, sys
from pathlib import Path

LOG = Path.home() / ".paperlog.json"

def load():
    if LOG.exists():
        return json.loads(LOG.read_text())
    return []

def save(entries):
    LOG.write_text(json.dumps(entries, indent=2))

def add(citation, pass_number, summary):
    entries = load()
    entries.append({
        "citation": citation,
        "pass": pass_number,
        "date": datetime.date.today().isoformat(),
        "summary": summary,
    })
    save(entries)
    print(f"logged: {citation} (pass {pass_number})")

def show():
    for e in load():
        print(f"[{e['date']}] pass {e['pass']}: {e['citation']}")
        print(f"   {e['summary']}\n")

if __name__ == "__main__":
    if len(sys.argv) >= 5 and sys.argv[1] == "add":
        add(sys.argv[2], int(sys.argv[3]), sys.argv[4])
    elif len(sys.argv) >= 2 and sys.argv[1] == "show":
        show()
    else:
        print("usage: paperlog.py add <citation> <pass-number> <summary> | show")
        sys.exit(1)

Use it like this after each pass:

$ python paperlog.py add "Corbett et al. Spanner OSDI 2012" 1 "Globally-distributed DB; TrueTime exposes clock uncertainty; commit-wait gives external consistency"
logged: Corbett et al. Spanner OSDI 2012 (pass 1)

$ python paperlog.py add "Corbett et al. Spanner OSDI 2012" 2 "Traced RW transaction: leaders prepare → coordinator picks s ≥ TT.now().latest → commit-wait until TT.after(s) → release locks. Clock uncertainty is 1-7ms typically."
logged: Corbett et al. Spanner OSDI 2012 (pass 2)

$ python paperlog.py show
[2026-04-23] pass 1: Corbett et al. Spanner OSDI 2012
   Globally-distributed DB; TrueTime exposes clock uncertainty; commit-wait gives external consistency

[2026-04-25] pass 2: Corbett et al. Spanner OSDI 2012
   Traced RW transaction: leaders prepare → coordinator picks s ≥ TT.now().latest → commit-wait until TT.after(s) → release locks. Clock uncertainty is 1-7ms typically.

After a year of this, you have a searchable record of every paper you have engaged with, at what depth, and the one-line summary that survived. When you join a new team in Bengaluru and someone asks "have you read Calvin?", you check the log and either say yes-with-summary or schedule a pass-1.
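That "have you read Calvin?" lookup can itself be a few lines over the same JSON log. A hypothetical companion sketch — the filename and the `search` helper are mine, not part of paperlog.py above:

```python
# search_paperlog.py — hypothetical companion to paperlog.py; reads the
# same ~/.paperlog.json file that paperlog.py writes
import json, sys
from pathlib import Path

LOG = Path.home() / ".paperlog.json"

def search(entries, term):
    # Case-insensitive match against citation or summary.
    term = term.lower()
    return [e for e in entries
            if term in e["citation"].lower() or term in e["summary"].lower()]

if __name__ == "__main__":
    entries = json.loads(LOG.read_text()) if LOG.exists() else []
    for e in search(entries, sys.argv[1]):
        print(f"[{e['date']}] pass {e['pass']}: {e['citation']} — {e['summary']}")
```

Keeping `search` a pure function over the entry list means the same helper works against any backup or merged copy of the log, not just the live file.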

Common confusions

Going deeper

The papers every database engineer should read at least once

There are ten or fifteen papers that the database community treats as canonical. Read them in roughly this order:

  1. The Anatomy of a Database System (Hellerstein, Stonebraker, Hamilton 2007) — the textbook chapter that maps how every commercial relational database is structured.
  2. Bigtable (Chang et al. OSDI 2006) — the paper that started the NoSQL wave; introduces tablets, Chubby, and the SSTable format.
  3. Dynamo (DeCandia et al. SOSP 2007) — the eventually-consistent counterpoint to Bigtable; introduces vector clocks, consistent hashing, and read repair to a wide audience.
  4. The Log-Structured Merge-Tree (O'Neil et al. 1996) — the LSM paper; you have already met its descendants in LevelDB and RocksDB.
  5. ARIES (Mohan et al. TODS 1992) — write-ahead logging done right.
  6. Paxos Made Simple (Lamport 2001) — the prerequisite for everything consensus-shaped.
  7. Raft (Ongaro and Ousterhout 2014) — Paxos rewritten to be teachable.
  8. Spanner (Corbett et al. OSDI 2012) — globally-consistent transactions.
  9. Calvin (Thomson et al. SIGMOD 2012) — deterministic ordering as an alternative to 2PC.
  10. The BW-Tree (Levandoski et al. ICDE 2013) — lock-free B-tree variant for modern hardware.

This is a six-month reading project at one paper a fortnight. Do it once and you have a foundation that compounds for the rest of your career.

The Adrian Colyer trick — read the paper and a summary

Adrian Colyer's The Morning Paper blog (2014–2021) summarised one paper per day for seven years. The summaries are concise, opinionated, and link to the original. The trick: read his summary as your pass 1, then go to the paper for pass 2. The summary acts as a sanity check — if your pass-1 conclusions diverge wildly from his, you missed something.

The same applies to Murat Demirbas's blog (muratbuffalo.blogspot.com), Aphyr's Jepsen analyses, and Database Internals by Alex Petrov (a textbook that paraphrases ~50 papers in plain English). These secondary sources are not a replacement for reading the original — they are scaffolding.

Industry papers vs. academic papers

A 2026 reader meets two flavours of database papers:

Read both kinds. Academic papers tell you what is possible; industry papers tell you what survived contact with reality.

The 5-paper interview

A standard senior-engineer database interview at the better Indian product companies (Razorpay, Zerodha, PhonePe, the Flipkart distributed-systems org) routinely asks: "Tell me about a database paper you have read recently and what you took from it." The interviewer is checking three things:

  1. Have you read a paper in the last year?
  2. Can you summarise the contribution in one sentence?
  3. Can you critique the paper — name an assumption, a missing benchmark, a failure mode?

The three-pass method gives you all three answers. Pass 1 produces the one-sentence summary. Pass 3 produces the critique. The reading log is the evidence.

If you cannot answer this question, you have not been reading. Pick one paper from the canonical list, run the three-pass method on it this weekend, and put it in the log. Repeat fortnightly.

The newer alternative — papers with code

The 2020s brought a quiet shift: many systems papers now publish runnable code along with the paper (typically on GitHub, sometimes via Zenodo for SIGMOD's reproducibility track). When code is available, change pass 2. Instead of just tracing the algorithm on paper, clone the repo and run the smallest example the README has. Trace one operation through the code with pdb or gdb open. The code is the ground truth; the paper is one description of it.

This is especially valuable for papers on newer systems like DuckDB, Umbra, and Velox. For older papers (Bigtable, Spanner) the code is closed source, so the paper is all you have — but those usually have open-source descendants (HBase for Bigtable, CockroachDB for Spanner) whose code you can read instead.

Where this leads next

References

  1. Srinivasan Keshav, How to Read a Paper (ACM SIGCOMM CCR, 2007) — the foundational essay on the three-pass method. Stanford PDF.
  2. James C. Corbett et al., Spanner: Google's Globally-Distributed Database (OSDI 2012) — the worked example used throughout this chapter. Google Research.
  3. Joseph M. Hellerstein, Michael Stonebraker, James Hamilton, Architecture of a Database System (Foundations and Trends in Databases, 2007) — read this before any other database paper. Berkeley PDF.
  4. Adrian Colyer, The Morning Paper (2014–2021) — seven years of one-paper-a-day summaries; the gold standard for paper writeups. blog.acolyer.org.
  5. Murat Demirbas, Metadata blog — paper reviews from a distributed-systems professor with strong opinions and a long memory. muratbuffalo.blogspot.com.
  6. Alex Petrov, Database Internals (O'Reilly, 2019) — book-length paraphrase of ~50 systems papers, cross-referenced. databass.dev.
  7. Spanner: TrueTime and external consistency — the padho-wiki chapter that walks the Spanner mechanism in detail.
  8. Polyglot persistence: picking the right DB per workload — the chapter immediately before this one in Build 24.