How to read a database paper
Open the Spanner OSDI 2012 paper in one tab and a stopwatch in another. The first pass is fifteen minutes, not three hours. By the end of this chapter you will know exactly which fifteen minutes — and which sentences inside them — actually carry the design.
A database paper is not a textbook chapter. Read it in three passes — skim the abstract and figures (15 min), follow the system architecture and one example transaction (60 min), then chase the failure modes and the evaluation (90 min). The trick is knowing which sections lie, which numbers matter, and which sentence in the introduction is the entire idea.
Why papers feel impossible (and why they are not)
You read a Spanner paper end-to-end in textbook order. By page four you are lost in TrueTime API definitions. By page seven you have forgotten what problem the paper is solving. You give up and watch a YouTube summary, and now you "know what Spanner is" without ever having looked at the design.
This is the wrong loop. Papers are not written for the order they print in. They are written for peer reviewers — twelve people who already know the field — who skim them in fifteen minutes and decide whether they are publishable. The narrative order on paper is reverse-engineered from a checklist of evaluation criteria, not from the order an engineer would learn the system. If you read in print order you are reading in the order designed for a critic, not a builder.
You are a builder. You want one thing from every paper: what is the smallest mental model of this system that lets me predict what it does in a situation the paper does not mention? Three passes get you there. Each pass has a fixed budget, a fixed goal, and a fixed exit condition.
The three-pass method
The three-pass method comes from Srinivasan Keshav's 2007 essay How to Read a Paper, distilled here for database engineering specifically. The passes are nested: pass 2 only happens if pass 1 says the paper is worth more time, and pass 3 only happens if you actually need to implement or critique the system.
Why three passes and not one careful read: a single read forces you to spend the same effort on the abstract as on the proofs, but the abstract carries 80% of the value at 5% of the cost. Tiered reading lets you reject the wrong papers cheaply and concentrate budget on the right ones.
Pass 1 — fifteen minutes, no notes
Open the PDF. Read these in this order, and nothing else:
- Title and authors. Who wrote it and where? "Corbett et al., Google" already tells you the paper is going to describe an in-production system at planet scale, not a prototype. "Stonebraker et al., MIT" tells you it is going to be opinionated and benchmark-heavy. Authorship sets your prior.
- Abstract. One paragraph. The single most-rewritten paragraph in the paper — every word was negotiated. The thesis sentence (usually the second or third) is the entire claim of the work.
- Introduction's last paragraph. This is the "contributions" list — every paper has it, almost always at the end of section 1 as a bullet list or "We make the following contributions:". This is the table of contents of the ideas, not the paper itself.
- All figures. Every architecture diagram, every plot. Read the captions. A good systems paper tells its story in the figures alone; if you cannot follow the story from figures and captions, the paper is poorly written or you need pass 2.
- Conclusion. Last section. Restates the thesis with whatever caveats survived peer review. Often more honest than the abstract about limitations.
Total time: 15 minutes. At the end, answer one question: is this paper relevant to my problem? If no, stop. You learned what the paper is about and that is enough. If yes, schedule pass 2 for tomorrow — do not roll straight into it. The fifteen-minute skim is what your brain wants to consolidate overnight, not a 60-minute deep read on top.
Pass 2 — sixty minutes, with the architecture diagram on a notepad
Pass 2 is for the common-case path of the system. You are not reading every section. You are answering: when a typical operation happens, what does the system do? For a database paper that means trace one read and one write through the architecture, end to end.
Open a notebook. Redraw the system's architecture diagram by hand on a fresh page. Do not photograph or copy the figure — redraw it, with arrows and component names. The act of redrawing forces you to notice every component you would have skimmed past. Now read these sections, in this order:
- Section 2 (System Model / Architecture). What are the components? What is the unit of data? What is the unit of replication?
- Section 3 or wherever the data model is. What does a row / key / cell look like? What is the schema?
- The "common case" path. Find the one example the authors walk through (Spanner walks through a single transaction in section 4). Trace it through your hand-drawn architecture. Mark every box the request touches.
- The first half of the evaluation. What workload? What scale? What did they measure?
You will skip — deliberately — the proofs, the related work section, the formal definitions, and the second half of the evaluation. Those are pass 3.
The exit condition for pass 2 is: could you explain this system to a colleague at lunch in five minutes, including what it is for, what its three or four components are, and what happens when a typical request hits it? If yes, you understand the design. If no, schedule pass 3 — or, more often, find a different paper that explains the prerequisite (Paxos before Spanner, LSM-tree before Bigtable).
Pass 3 — ninety minutes, with a red pen
Pass 3 is the critique pass. You read this paper as if you were going to reimplement the system or write a paper that breaks it. Two things to do:
Find every assumption. A systems paper hides its assumptions in the choice of what to evaluate against. Spanner evaluates with TrueTime uncertainty bounds of 7ms. What if your machine has 70ms uncertainty? The paper does not say, because it does not have to — but a careful reader notes that bound and asks, "what breaks if it loosens?"
Find every failure mode they did not test. The evaluation section shows what works. The failure-mode section (often a single subsection, sometimes spread across the paper) shows what they thought about. The interesting failures are the ones in neither. Network partitions during a leader change. Disk corruption that the paper says "we ignore." A clock that runs backwards. Note these, because these are the questions you will ask in your interview / production debugging / blog post.
By the end of pass 3 you should be able to write a one-page critique: what the paper does well, what it does not test, and what the next paper in this lineage will probably attack. That is what every PhD reading group does on Friday afternoons.
Anatomy of a database paper — section by section
Database papers across SIGMOD, VLDB, OSDI, SOSP, FAST and CIDR converge on roughly the same eight-section shape. Knowing the shape lets you find what you need without reading in order.
A few load-bearing notes on what each section actually tells you:
- Abstract is the most-rewritten paragraph. Trust the thesis sentence (the one that says "we present X, which does Y by Z"). Distrust performance numbers — they are cherry-picked.
- Introduction ends in a contributions list. That is your real table of contents. The first half of the introduction is the pitch and is structurally identical to a sales deck.
- Background is for reviewers who do not know the field. If you do, skip. If you do not, do not read it from the paper itself — go read the foundational paper it cites instead. A background section is a compressed paraphrase, almost always lossy.
- System architecture is where the paper earns its keep. Read figure first, then prose, then re-read figure. If the architecture is not clear after thirty minutes, the paper is hiding something or you are missing a prerequisite.
- Mechanism deep-dive is the four or five pages that contain the actual contribution. For Spanner, this is sections 4 and 5 (concurrency and TrueTime). For Bigtable, it is sections 5 and 6 (tablets and Chubby). Every other section is scaffolding around these.
- Evaluation is read with suspicion. The team picked the workload, the hardware, and the comparison baseline. Read what they did not test as carefully as what they did.
- Related work is the citations map. Use it to find the prerequisite paper you should have read first, and the follow-up paper that critiques this one.
- Conclusion is sometimes more honest than the abstract about scope. Read.
A worked example: reading Spanner OSDI 2012
Spanner is the right paper to practise on, because you have already met it across this track — it appeared in global-consistency, in transactions, and in consensus. You know what TrueTime is. Now learn how to read the paper that introduced it.
The paper is Corbett et al., Spanner: Google's Globally-Distributed Database, OSDI 2012. Fourteen pages. Twenty-seven citations. Eight sections.
Pass 1 (15 min) — what is Spanner?
Open the PDF. Read in this order:
Title. Spanner: Google's Globally-Distributed Database. "Globally-distributed" is the load-bearing word. Not "distributed". Globally — across continents. Note this; it shapes everything.
Abstract. Three sentences in. The thesis: "Spanner is the first system to distribute data at global scale and support externally-consistent distributed transactions." That is the contribution. Externally consistent + global scale. If both halves were already solved, the paper would not exist. Spanner's contribution is doing both at once.
Introduction's last paragraph. Skim the contributions list. Three things: (1) a globally-distributed database with ACID transactions, (2) a TrueTime API exposing clock uncertainty, (3) a way to use TrueTime to enforce external consistency. Note that TrueTime is both a system component and an API exposed to applications. That dual role is what makes the paper interesting.
Figures. Skim them all. Figure 1 is the deployment diagram — zones, spanservers, clients. Figure 2 shows the spanserver software stack. Figure 3 shows directories. Figures 4–6 are evaluation. Figure 6 is the famous TrueTime epsilon graph showing 7ms typical uncertainty. Note: every figure tells you something about the system except the evaluation ones, which tell you what they want you to think.
Conclusion. Honest about what cost: TrueTime requires GPS clocks and atomic clocks in every datacenter, which not every reader will have. Useful caveat.
Pass 1 verdict: Spanner is a globally-distributed database that uses physical clocks (with bounded uncertainty) to give applications a consistency model usually associated with single-machine databases. That is enough to talk about Spanner at lunch. If you are building a system at global scale, schedule pass 2 tomorrow. If you are not, you are done.
Pass 2 (60 min) — how does a Spanner transaction commit?
Goal: trace a single read-write transaction through the architecture. Pull out a notebook and draw the deployment.
Read section 2 (Implementation). Universe → zones → spanservers → tablets. A zone is a datacenter. A spanserver serves 100–1000 tablets. Each tablet is a Paxos group of replicas across zones. This is the load-bearing structure of the entire paper. Sketch it.
Read section 3 (TrueTime). Two API calls: TT.now() returns an interval [earliest, latest] such that the actual time is somewhere in there. TT.after(t) returns true if t is definitely in the past. The implementation uses GPS receivers and atomic clocks per datacenter. The interval width is typically 1–7 ms.
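The two-call contract is small enough to model. Here is a toy sketch of the TrueTime semantics — the class name and the fixed epsilon are assumptions for illustration, not Google's implementation, which derives the bound from GPS and atomic clocks:

```python
# Toy model of the TrueTime contract — illustrative only. now() returns an
# interval guaranteed to contain true time; after(t) is true only when t is
# definitely in the past.
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    earliest: float  # true time is at least this
    latest: float    # true time is at most this

class TrueTime:
    def __init__(self, epsilon_s: float = 0.007):  # assumed 7 ms bound
        self.epsilon = epsilon_s

    def now(self) -> TTInterval:
        # pretend the local clock is within epsilon of true time
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # t is definitely past only if even the earliest possible "now" exceeds it
        return self.now().earliest > t

tt = TrueTime()
assert tt.after(time.time() - 1.0)       # one second ago: definitely past
assert not tt.after(time.time() + 1.0)   # one second ahead: definitely not
```

Note that after(t) can return false even for a timestamp that is already past — uncertainty makes the API conservative, and that conservatism is exactly what the commit protocol leans on.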
Read section 4 (Concurrency). The mechanism is a lock-based system layered over Paxos: each Paxos group has a leader; the leader holds locks; a distributed transaction holds locks across several leaders, coordinated by two-phase commit (2PC).
Why pass 2 traces one example instead of generalising: a system paper has fifty edge cases, but they all decorate one common-case path. If you cannot draw the common case end-to-end, the edge cases are noise. Trace the boring transaction first; the interesting failures only make sense once you know what success looks like.
Trace one transaction. A read-write transaction in Spanner:
1. Client picks a coordinator leader (one of the involved Paxos leaders).
2. Each leader acquires write locks on its keys and assigns a prepare timestamp.
3. Each prepare timestamp is sent to the coordinator.
4. The coordinator picks a commit timestamp s ≥ all prepare timestamps and ≥ TT.now().latest at the moment of commit.
5. The coordinator waits until TT.after(s) is true — the commit-wait. This is usually a few milliseconds.
6. The coordinator notifies the participants, who release locks and apply the writes.
The commit-wait at step 5 is the entire point of the paper. By waiting until s is definitely in the past, Spanner guarantees that any transaction that starts after this one finishes will see a strictly later commit timestamp — which is the definition of external consistency.
Why the commit-wait is the whole idea: clock uncertainty means you cannot tell, at the instant you assign timestamp s, whether s is in the future or the past. If you release locks while s could still be in the future, a later transaction might be assigned an earlier timestamp and see your writes "before they happened". Waiting until TT.after(s) is true forces the timeline to catch up.
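The rule is mechanical enough to simulate. A minimal, self-contained sketch — the names and the fixed uncertainty bound are assumptions, not Spanner's code — which also shows why the wait comes out near twice ε: s is assigned at the top of one uncertainty interval, and TT.after(s) must clear the bottom of a later one.

```python
# Minimal commit-wait sketch (simplified; EPSILON is an assumed fixed bound,
# real TrueTime derives it from hardware).
import time

EPSILON = 0.007  # assumed clock-uncertainty bound, ~7 ms

def tt_now():
    t = time.time()
    return (t - EPSILON, t + EPSILON)  # (earliest, latest)

def tt_after(s):
    earliest, _ = tt_now()
    return earliest > s  # true only when s is provably in the past

def commit_wait():
    s = tt_now()[1]          # commit timestamp: top of the uncertainty interval
    start = time.time()
    while not tt_after(s):   # block until s is definitely in the past
        time.sleep(0.001)
    return time.time() - start

waited = commit_wait()      # comes out near 2 * EPSILON
```

Double the uncertainty bound and the wait doubles — which is why a clock-uncertainty incident is immediately a latency incident.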
You should now be able to explain Spanner to a colleague: "It's Bigtable plus Paxos plus 2PC plus a clock-wait. The clock-wait gives you serialisability across continents, paid for in commit latency proportional to your clock uncertainty."
Pass 3 (90 min) — what would break Spanner?
Now read the parts you skipped. Section 4.1.3 (read-only transactions). Section 4.2.5 (schema changes). Section 5 (evaluation). Find the assumptions.
The TrueTime epsilon assumption. Spanner assumes ε of 1–7 ms. Their own figure 6 shows ε spiking to 100 ms during failures. What happens to commit latency when ε is 100 ms? Every transaction waits 100 ms after assigning its timestamp. Throughput collapses. Spanner's section 5 does not stress-test this; you should note it as an open question.
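Back-of-envelope arithmetic makes the collapse concrete. Using this chapter's rough approximation that locks on a contended key are held for about ε during commit-wait (the function name is illustrative, not from the paper):

```python
# If locks on a hot key are held for the commit-wait, serial throughput on
# that key is bounded by 1 / wait. Rough approximation: wait ~ epsilon.
def max_txn_per_sec_on_hot_key(epsilon_ms: float) -> float:
    return 1000.0 / epsilon_ms

normal = max_txn_per_sec_on_hot_key(7)    # ~143 txn/s at the typical 7 ms
spike = max_txn_per_sec_on_hot_key(100)   # 10 txn/s during a 100 ms spike
```

A 14x drop on every contended key, from a clock incident alone — the kind of derived number the evaluation section never shows you.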
The 2PC + Paxos cost. Section 5.3 measures commit latency at ~70 ms (median) for 1 participant, ~80 ms for 5. That is reasonable for cross-continent commits. But it is also the floor — single-region transactions still pay it. If you put Spanner in one Bengaluru datacenter, you still get the same ~70 ms commit, far slower than a single-node Postgres commit at sub-millisecond latency. The paper does not say this; you derive it from their numbers.
Failure modes not tested. What happens during a Paxos leader change while a transaction holds locks? The paper says "leases handle this" but does not show data. What happens during a network partition between zones during commit? Section 4 hand-waves this; the Jepsen analyses of CockroachDB (a Spanner-inspired system) found real bugs in exactly this regime.
By the end of pass 3 you have your one-page critique: Spanner is a brilliant integration of Paxos, 2PC, and physical clocks; its weakness is that the clock-wait is on the critical path, so any clock-uncertainty incident becomes a latency incident; and the open question is whether non-Google clouds can replicate the GPS+atomic clock infrastructure cheaply enough to follow this design. (As of 2026, AWS TimeSync and Azure Precision Time Protocol have made this much cheaper — a follow-up worth tracking.)
A reading log script — track what you have read
You will read hundreds of papers across a career. Track them. Here is a tiny Python script to maintain a reading log.
```python
# paperlog.py — minimal reading log
import datetime
import json
import sys
from pathlib import Path

LOG = Path.home() / ".paperlog.json"

def load():
    if LOG.exists():
        return json.loads(LOG.read_text())
    return []

def save(entries):
    LOG.write_text(json.dumps(entries, indent=2))

def add(citation, pass_number, summary):
    entries = load()
    entries.append({
        "citation": citation,
        "pass": pass_number,
        "date": datetime.date.today().isoformat(),
        "summary": summary,
    })
    save(entries)
    print(f"logged: {citation} (pass {pass_number})")

def show():
    for e in load():
        print(f"[{e['date']}] pass {e['pass']}: {e['citation']}")
        print(f"    {e['summary']}\n")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: paperlog.py add CITATION PASS SUMMARY | show")
    if sys.argv[1] == "add":
        add(sys.argv[2], int(sys.argv[3]), sys.argv[4])
    elif sys.argv[1] == "show":
        show()
```
Use it like this after each pass:
```shell
$ python paperlog.py add "Corbett et al. Spanner OSDI 2012" 1 "Globally-distributed DB; TrueTime exposes clock uncertainty; commit-wait gives external consistency"
logged: Corbett et al. Spanner OSDI 2012 (pass 1)

$ python paperlog.py add "Corbett et al. Spanner OSDI 2012" 2 "Traced RW transaction: leaders prepare → coordinator picks s ≥ TT.now().latest → commit-wait until TT.after(s) → release locks. Clock uncertainty is 1-7ms typically."
logged: Corbett et al. Spanner OSDI 2012 (pass 2)

$ python paperlog.py show
[2026-04-23] pass 1: Corbett et al. Spanner OSDI 2012
    Globally-distributed DB; TrueTime exposes clock uncertainty; commit-wait gives external consistency
[2026-04-25] pass 2: Corbett et al. Spanner OSDI 2012
    Traced RW transaction: leaders prepare → coordinator picks s ≥ TT.now().latest → commit-wait until TT.after(s) → release locks. Clock uncertainty is 1-7ms typically.
```
After a year of this, you have a searchable record of every paper you have engaged with, at what depth, and the one-line summary that survived. When you join a new team in Bengaluru and someone asks "have you read Calvin?", you check the log and either say yes-with-summary or schedule a pass-1.
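One way to do that check quickly — a hypothetical search helper, assuming the same ~/.paperlog.json schema the logging script above writes:

```python
# Hypothetical search helper for the paperlog format: case-insensitive
# substring match over the citation and summary fields.
import json
from pathlib import Path

def search(term, log_path=Path.home() / ".paperlog.json"):
    if not log_path.exists():
        return []
    term = term.lower()
    return [e for e in json.loads(log_path.read_text())
            if term in e["citation"].lower() or term in e["summary"].lower()]

for e in search("calvin"):
    print(f"pass {e['pass']} on {e['date']}: {e['citation']}")
```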
Common confusions
- "If I cannot follow a paper, the paper is too hard for me." Usually the paper is fine and you are missing a prerequisite. Spanner assumes you know Paxos, 2PC, MVCC, and Bigtable. If any of those are shaky, find that paper first. Read Paxos Made Simple before Spanner, Bigtable before Spanner, and the original 2PC writeup before any distributed-transaction paper. Papers compose; reading them out of order is the actual hard mode.
- "The evaluation section tells me how good the system is." It tells you how good the system is on the workload the authors picked. Every evaluation is a bias toward the design's strengths. The TPC-C numbers in any paper claiming to beat Postgres are almost always on a specific TPC-C variant that the system was tuned for. Read the workload spec, then ask which workload was not shown.
- "Every paper is equally trustworthy because it was peer-reviewed." Industry-track OSDI / SIGMOD papers from large companies (Google, Amazon, Microsoft) describe systems already running at scale — high empirical credibility. Pure-academia papers describe prototypes — high theoretical credibility, low operational credibility. CIDR papers are vision papers, often speculative. Conference matters; track within conference matters more.
- "Read the references later." No — read at least one reference before the paper, the one cited as the prerequisite. Reading Spanner without having read Bigtable and Paxos is reading chapter 12 of a book without chapters 1–11. The references list is not an appendix; it is the prerequisite syllabus.
- "If I take notes I will read more slowly and that is bad." Notes are not optional. The act of paraphrasing the architecture into your own words is what loads the system into long-term memory. A no-notes read is a dream you forget by the next morning. Use the reading log script above, or a paper notebook, but write something every pass.
- "Following the citation graph is endless." Bound it: chase one prerequisite paper before each big paper, and one critique-paper after. That is two extra papers per primary paper, not infinity. Most papers cite the same three or four foundational works, so the graph collapses fast.
Going deeper
The papers every database engineer should read at least once
There are ten or fifteen papers that the database community treats as canonical. Read them in roughly this order:
- Architecture of a Database System (Hellerstein, Stonebraker, Hamilton 2007) — the survey that maps how every commercial relational database is structured.
- Bigtable (Chang et al. OSDI 2006) — the paper that started the NoSQL wave; introduces tablets, Chubby, and the SSTable format.
- Dynamo (DeCandia et al. SOSP 2007) — the eventually-consistent counterpoint to Bigtable; introduces vector clocks, consistent hashing, and read repair to a wide audience.
- The Log-Structured Merge-Tree (O'Neil et al. 1996) — the LSM paper; you have already met its descendants in LevelDB and RocksDB.
- ARIES (Mohan et al. TODS 1992) — write-ahead logging done right.
- Paxos Made Simple (Lamport 2001) — the prerequisite for everything consensus-shaped.
- Raft (Ongaro and Ousterhout 2014) — Paxos rewritten to be teachable.
- Spanner (Corbett et al. OSDI 2012) — globally-consistent transactions.
- Calvin (Thomson et al. SIGMOD 2012) — deterministic ordering as an alternative to 2PC.
- The Bw-Tree (Levandoski et al. ICDE 2013) — latch-free B-tree variant for modern hardware.
This is a six-month reading project at one paper a fortnight. Do it once and you have a foundation that compounds for the rest of your career.
The Adrian Colyer trick — read the paper and a summary
Adrian Colyer's The Morning Paper blog (2014–2021) summarised one paper per day for seven years. The summaries are concise, opinionated, and link to the original. The trick: read his summary as your pass 1, then go to the paper for pass 2. The summary acts as a sanity check — if your pass-1 conclusions diverge wildly from his, you missed something.
The same applies to Murat Demirbas's blog (muratbuffalo.blogspot.com), Aphyr's Jepsen analyses, and Database Internals by Alex Petrov (a textbook that paraphrases ~50 papers in plain English). These secondary sources are not a replacement for reading the original — they are scaffolding.
Industry papers vs. academic papers
A 2026 reader meets two flavours of database papers:
- Industry-track papers (Spanner, Bigtable, Dynamo, F1, Aurora) — describe systems already in production. The numbers are real, the failures are real. The downside: the design is constrained by what the company already had (Spanner uses Paxos because Google had Chubby; Aurora uses MySQL because Amazon had MySQL customers).
- Academic papers (Calvin, BW-Tree, Hekaton, LeanStore) — describe prototypes or research systems. The numbers are clean, the design is clean. The downside: they often skip the boring operational concerns (backup, schema migration, observability) that determine whether the system survives in production.
Read both kinds. Academic papers tell you what is possible; industry papers tell you what survived contact with reality.
The 5-paper interview
A standard senior-engineer database interview at the better Indian product companies (Razorpay, Zerodha, PhonePe, the Flipkart distributed-systems org) routinely asks: "Tell me about a database paper you have read recently and what you took from it." The interviewer is checking three things:
- Have you read a paper in the last year?
- Can you summarise the contribution in one sentence?
- Can you critique the paper — name an assumption, a missing benchmark, a failure mode?
The three-pass method gives you all three answers. Pass 1 produces the one-sentence summary. Pass 3 produces the critique. The reading log is the evidence.
If you cannot answer this question, you have not been reading. Pick one paper from the canonical list, run the three-pass method on it this weekend, and put it in the log. Repeat fortnightly.
The newer alternative — papers with code
The 2020s brought a quiet shift: many systems papers now publish runnable code along with the paper (typically on GitHub, sometimes via Zenodo for SIGMOD's reproducibility track). When code is available, change pass 2. Instead of just tracing the algorithm on paper, clone the repo and run the smallest example the README has. Trace one operation through the code with pdb or gdb open. The code is the ground truth; the paper is one description of it.
This is especially valuable for newer papers like DuckDB, Umbra, and Velox. For older papers (Bigtable, Spanner) the code is closed-source, so the paper is all you have — but for those there are usually open-source clones (HBase for Bigtable, CockroachDB for Spanner) whose code you can read instead.
Where this leads next
- Polyglot persistence: picking the right DB per workload — chapter 182, the previous chapter; once you can read papers, you can pick systems based on their papers, not their marketing.
- Benchmarking honestly: TPC-C, TPC-H, YCSB, and lies — chapter 184, next chapter; the evaluation-section skill applied to your own benchmarks.
- Spanner: TrueTime and external consistency — the system this chapter used as the worked example.
- The 30-year arc and where databases go next — chapter 186; the culmination, which presupposes you can read the papers it cites.
References
- Srinivasan Keshav, How to Read a Paper (ACM SIGCOMM CCR, 2007) — the foundational essay on the three-pass method. Stanford PDF.
- James C. Corbett et al., Spanner: Google's Globally-Distributed Database (OSDI 2012) — the worked example used throughout this chapter. Google Research.
- Joseph M. Hellerstein, Michael Stonebraker, James Hamilton, Architecture of a Database System (Foundations and Trends in Databases, 2007) — read this before any other database paper. Berkeley PDF.
- Adrian Colyer, The Morning Paper (2014–2021) — seven years of one-paper-a-day summaries; the gold standard for paper writeups. blog.acolyer.org.
- Murat Demirbas, Metadata blog — paper reviews from a distributed-systems professor with strong opinions and a long memory. muratbuffalo.blogspot.com.
- Alex Petrov, Database Internals (O'Reilly, 2019) — book-length paraphrase of ~50 systems papers, cross-referenced. databass.dev.
- Spanner: TrueTime and external consistency — the padho-wiki chapter that walks the Spanner mechanism in detail.
- Polyglot persistence: picking the right DB per workload — the chapter immediately before this one in Build 24.