In short
A transaction is a sequence of reads and writes that must execute atomically — all or nothing — and must appear to execute in some serial order even when its operations are actually interleaved with operations of other transactions. Guarantee both of these and you have serialisability, the gold standard of concurrency correctness.
The simplest way to get serialisability is embarrassingly simple: put the entire database behind one big lock. Acquire it at BEGIN, release it at COMMIT or ABORT. Every transaction runs to completion with exclusive access to everything. Correct? Provably yes — the actual execution is a serial schedule, which trivially satisfies the definition. Useless? Also provably yes — a single thread of work saturates one CPU core and blocks on every I/O, while modern hardware has 64+ cores and NVMe SSDs that service thousands of concurrent requests. The throughput ceiling is 1 / avg_transaction_duration. For 10 ms transactions that is 100 TPS. On a machine that could do a million.
The rest of Build 7 is a sequence of refinements that relax the global lock while preserving serialisability: reader/writer locks (readers don't conflict), row-level locks (non-overlapping transactions don't conflict), two-phase locking (acquire and release in disciplined phases), optimistic concurrency control (assume no conflicts, validate at commit), and multi-version concurrency control (readers never block writers at all). Every refinement buys throughput and pays in subtlety. This chapter builds the global-lock version in 20 lines of Python, proves it correct, and measures its throughput — the baseline every future chapter improves against.
Two users click Transfer 100 rupees at the same moment. The database has to execute both. What does "execute both" even mean when the instructions arrive interleaved? This is the opening question of concurrency control, and the answer is the whole of Build 7. The simplest possible answer — "take one lock, do one thing at a time" — is where every richer answer starts.
The anomaly without any concurrency control
Suppose the database has no locks at all. A transaction is just a sequence of read and write calls from application code. Two threads can run them interleaved in any order. Here is what happens to a bank transfer when two concurrent operations on the same account race.
```python
# db/no_lock_db.py — a database with no concurrency control at all
class NoLockDB:
    def __init__(self):
        self.data: dict[str, int] = {}

    def read(self, key): return self.data.get(key, 0)
    def write(self, key, value): self.data[key] = value
```
Now two threads both try to credit 100 rupees to Alice's account, which currently holds 1000:
```text
# thread 1                            # thread 2
a = db.read("alice")        # 1000    a = db.read("alice")        # 1000
db.write("alice", a+100)    # 1100    db.write("alice", a+100)    # 1100
```
Two credits of 100 were applied. Alice should have 1200. She has 1100. One credit vanished. This is the lost-update anomaly, the most famous concurrency bug in databases, and it happens because both threads read the old value before either wrote the new one. Running the snippet in Python with 1000 concurrent credits reliably produces final balances below the expected 101,000 (a starting balance of 1000 plus 1000 credits of 100) — hundreds of rupees missing per run, sometimes thousands.
Why exactly this kind of interleaving loses money: the read-then-write sequence is not atomic. Thread 1 reads 1000 and intends to write 1100, but before it writes, thread 2 also reads 1000 and also intends to write 1100. Both writes land; the second overwrites the first. The information that thread 1 ever did anything is gone, because its write was based on a value that is no longer current by the time it lands.
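The interleaving can be replayed deterministically without threads. The sketch below forces the exact read-read-write-write order the race produces, using the same `NoLockDB` as above (reproduced so it runs standalone):

```python
# A deterministic replay of the lost-update interleaving: both "threads"
# read before either writes, so the second write clobbers the first.
class NoLockDB:
    def __init__(self):
        self.data: dict[str, int] = {}

    def read(self, key): return self.data.get(key, 0)
    def write(self, key, value): self.data[key] = value

db = NoLockDB()
db.write("alice", 1000)

a1 = db.read("alice")        # thread 1 reads 1000
a2 = db.read("alice")        # thread 2 reads 1000, before thread 1 writes
db.write("alice", a1 + 100)  # thread 1 writes 1100
db.write("alice", a2 + 100)  # thread 2 overwrites with the same 1100

print(db.read("alice"))      # 1100, not the 1200 two credits should produce
```

No scheduler luck required: any execution in which both reads precede both writes loses one credit, and this script is simply that execution written out in program order.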
This is the anomaly Gray and Bernstein formalised in the 1970s, and the reason every database you will ever use has concurrency control machinery. The rest of this chapter is about the crudest possible prevention and why it is not enough.
What "serialisable" actually means
The industry-standard definition comes from Papadimitriou (1979): a schedule of interleaved operations from multiple transactions is serialisable if its effects — the final database state, and the values every transaction read — are equivalent to the effects of some serial schedule of the same transactions.
"Some" is the load-bearing word. The serial schedule does not have to be the one you'd guess from wall-clock start times. Two transactions submitted at the same moment can be serialised in either order; the execution is serialisable as long as some serial ordering would have produced the same observations.
Suppose T1 and T2 both read x = 5 and then each tries to write x + 1. Consider three schedules:
- Serial T1 then T2: T1 reads 5, writes 6. T2 reads 6, writes 7. Final: x = 7.
- Serial T2 then T1: T2 reads 5, writes 6. T1 reads 6, writes 7. Final: x = 7.
- Interleaved: T1 reads 5, T2 reads 5, T1 writes 6, T2 writes 6. Final: x = 6.
Schedule 3 is not serialisable. No serial ordering of T1 and T2 produces x = 6; every serial ordering produces x = 7. Schedule 3 lost an update — the same bank-transfer bug in miniature. A system that allowed schedule 3 would be incorrect; a system that allowed only schedule 1 or 2 would be correct regardless of which one it picked.
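The argument can be mechanised for this tiny example. A brute-force check, written for this chapter rather than taken from any library, simulates every serial order of T1 and T2, collects the possible final states, and tests whether the interleaved schedule's outcome is among them:

```python
# Brute-force serialisability check for the x = 5 example: enumerate all
# serial orders, record their final states, compare the interleaving.
from itertools import permutations

def run_serial(order):
    x = 5
    for _ in order:        # each transaction: read x, then write x + 1
        x = x + 1          # in a serial run, the second txn sees the first's write
    return x

serial_finals = {run_serial(p) for p in permutations(["T1", "T2"])}
print(serial_finals)       # {7}: every serial order of T1, T2 ends at x = 7

# Interleaved schedule 3: both transactions read before either writes.
x = 5
r1, r2 = x, x              # T1 reads 5, T2 reads 5
x = r1 + 1                 # T1 writes 6
x = r2 + 1                 # T2 writes 6

print(x in serial_finals)  # False: x = 6 matches no serial outcome
```

With two transactions there are only two serial orders to check; real serialisability testing has to work from conflict graphs instead, because the number of serial orders grows factorially.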
Why "some" matters: forbidding all but a single fixed serial order would force transactions to queue in submission order — even more restrictive than a global lock. Real systems only care that the result is equivalent to a serial schedule, not that it equals a particular one. This is the loophole every clever algorithm in Build 7 exploits.
Serialisability is the correctness bar. Every algorithm in Build 7 either guarantees it outright (2PL, SSI) or trades it for a weaker, documented property (snapshot isolation, read committed) that the application opts into.
The simplest possible concurrency control — one big lock
Here is the crudest way to guarantee serialisability. Put one mutex in front of the whole database. Every transaction acquires it at BEGIN and releases it at COMMIT or ABORT. Exactly one transaction runs at a time.
```python
# db/global_lock_db.py
import threading
from contextlib import contextmanager

class GlobalLockDB:
    """A database with serialisability guaranteed by one global mutex."""

    def __init__(self):
        self.data: dict[str, int] = {}
        self.lock = threading.Lock()

    @contextmanager
    def transaction(self):
        self.lock.acquire()
        try:
            yield self  # transaction runs with exclusive access
        finally:
            self.lock.release()

    def read(self, key): return self.data.get(key, 0)
    def write(self, key, value): self.data[key] = value
```
Twenty-one lines including imports and blank lines. That is a production-grade serialisable database, by the letter of the definition. Application code uses it like this:
```python
db = GlobalLockDB()
db.data["alice"] = 1000
db.data["bob"] = 500

with db.transaction() as t:
    a = t.read("alice")
    b = t.read("bob")
    t.write("alice", a - 100)
    t.write("bob", b + 100)
```
The with block holds the lock for the duration of the transaction body. No other thread can enter another with db.transaction() block until this one exits — threading.Lock guarantees mutual exclusion. A thousand threads could be blocked waiting; exactly one holds the lock; exactly one makes progress.
Why the context manager is the right abstraction: BEGIN and COMMIT have to be paired, and exceptions in the transaction body have to release the lock anyway (otherwise one buggy transaction permanently deadlocks the whole database). Python's contextmanager gives exactly this: __enter__ acquires, __exit__ always releases, even on exceptions. Real database clients hide this behind conn.begin() / conn.commit() / conn.rollback() patterns, but the semantics are identical.
Run the bank-transfer snippet from the previous section through GlobalLockDB instead of NoLockDB. No lost updates. Every transaction sees the result of every prior transaction; the final balances are always exact. The anomaly is gone.
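That claim can be checked directly. A minimal stress test, reproducing `GlobalLockDB` so it runs standalone, hammers one key from eight threads and verifies the balance is exact:

```python
# Stress test: 8 threads each apply 1,000 credits of 100 under the global
# lock. Mutual exclusion makes every read-modify-write atomic, so no update
# is lost and the final balance is exact on every run.
import threading
from contextlib import contextmanager

class GlobalLockDB:
    def __init__(self):
        self.data: dict[str, int] = {}
        self.lock = threading.Lock()

    @contextmanager
    def transaction(self):
        self.lock.acquire()
        try:
            yield self
        finally:
            self.lock.release()

    def read(self, key): return self.data.get(key, 0)
    def write(self, key, value): self.data[key] = value

db = GlobalLockDB()
db.data["alice"] = 1000

def credit(n):
    for _ in range(n):
        with db.transaction() as t:
            t.write("alice", t.read("alice") + 100)

threads = [threading.Thread(target=credit, args=(1000,)) for _ in range(8)]
for th in threads: th.start()
for th in threads: th.join()

print(db.read("alice"))  # 801000 = 1000 + 8 * 1000 * 100, exact every run
```

Swap `GlobalLockDB` for `NoLockDB` in this script and the final balance comes up short; with the lock it never does.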
Why this is correct
The correctness argument is two sentences.
Sentence 1. Only one thread holds the global lock at any point in time, so only one transaction is executing at any point in time. Every transaction runs to completion atomically — from the moment acquire() returns to the moment release() is called, no other transaction runs at all.
Sentence 2. The actual execution is therefore T1; T2; T3; … — some ordering of complete transactions with no interleaving. This is a serial schedule, by the definition of serial. Serialisability requires that some serial schedule produces the observed result; the actual execution is a serial schedule, so the requirement is trivially met.
Why "trivially" is the right word: serialisability is a statement about equivalence to a serial schedule. If your execution is itself a serial schedule, you don't need an equivalence argument — you are already the thing you're trying to be equivalent to. A correct-by-construction proof is the simplest kind of proof a concurrency algorithm can have, and it is the reason the global lock is the reference implementation every textbook starts with. Every more sophisticated algorithm has to work harder to prove what the global lock gets for free.
So: the global-lock database is serialisable. No lost updates, no dirty reads, no phantoms, no write skew, no anomalies of any kind. Correctness is not the problem. The problem is somewhere else entirely.
Why this is useless
The problem is throughput. With one global lock, every transaction serialises through the same mutex, which means no two transactions ever run at the same time. The throughput ceiling follows from a single back-of-envelope calculation:
Why this upper bound is tight: transactions run strictly one at a time. If each takes D seconds on average, exactly 1/D of them complete per second, no more. Adding threads, cores, or disks cannot improve this — the mutex forces strict serial execution even when the hardware is idle. The ceiling is a function of the slowest thing a transaction does, not of the machine's parallelism.
Plug in numbers for a realistic OLTP transaction — two point reads, two point writes, perhaps a secondary index update. With a warm buffer pool and a fast NVMe SSD, that is about 10 ms wall-clock per transaction, dominated by the log-flush fsync at commit. Ten millisecond transactions cap the whole system at 100 transactions per second. A modern server happily sustains tens of thousands. The global lock is leaving 99% of the hardware idle.
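The arithmetic fits in a few lines. The sketch below uses the chapter's 10 ms figure; the 50,000 TPS figure for a modern server is an illustrative assumption, not a benchmark result:

```python
# Back-of-envelope: with one global lock, TPS is capped at 1 / D, where D
# is the average transaction duration, regardless of cores or threads.
avg_txn_duration = 0.010            # 10 ms, dominated by the commit fsync
ceiling_tps = 1 / avg_txn_duration
print(ceiling_tps)                  # 100.0 transactions per second

achievable_tps = 50_000             # assumed figure for a modern server
idle_fraction = 1 - ceiling_tps / achievable_tps
print(f"{idle_fraction:.1%}")       # 99.8% of the machine's capacity unused
```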
Walk through what is actually idle:
- 63 of 64 CPU cores. One core runs the transaction; the other 63 sit at zero utilisation because the mutex forbids any of them from running a different transaction.
- The SSD. NVMe drives deliver 500,000 IOPS. With the global lock the database issues one I/O at a time, because the transaction holding the lock is blocking on that very I/O before anyone else can issue theirs.
- The network. Modern NICs sustain 100 Gbit/s; the application server can source millions of queries per second. The database accepts them at 100/sec. Queues back up, p99 latency explodes, clients time out.
- The query optimiser and executor. Years of work on fast query plans and lean iterator execution evaporates when the thing doing the work can only run one query at a time.
Measure it in Python. Spin up 100 threads, each doing a simple read-modify-write under the global lock, and time the total. The measured throughput is indistinguishable from running on one thread — because it is one thread's worth of work, serialised.
```python
# benchmark/global_lock_tps.py
import threading, time
from db.global_lock_db import GlobalLockDB

def worker(db, n, results, idx):
    start = time.perf_counter()
    for _ in range(n):
        with db.transaction() as t:
            v = t.read("counter")
            t.write("counter", v + 1)
    results[idx] = time.perf_counter() - start

def run(num_threads, per_thread=1000):
    db = GlobalLockDB(); db.data["counter"] = 0
    results = [0.0] * num_threads
    threads = [threading.Thread(target=worker, args=(db, per_thread, results, i))
               for i in range(num_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    total_txns = num_threads * per_thread
    wall = max(results)
    return total_txns / wall  # TPS
```
Run run(1, 1000), run(8, 1000), run(64, 1000). All three give roughly the same TPS — the 64-thread run is not 64× faster; it is the same speed, because only one thread is ever actually running inside the critical section. The signature flatline of a single-threaded system.
Readers block readers. Two transactions that only read — both computing a total bank balance — serialise through the global lock even though reading changes nothing. Pure throughput lost, and the first thing reader/writer locks fix.
Non-conflicting writers block each other. T1 updates Alice. T2 updates Bob. Disjoint data, no possible interference — yet they serialise. Row-level locking's huge win: give each row its own lock, and disjoint transactions run fully in parallel.
What a good concurrency control system gives up — and what it keeps
Every refinement from here on follows the same trade. Something you do not strictly need is given up, in exchange for throughput the global lock leaves on the table.
Keep: serialisability — or a known, documented, weaker isolation level the application has opted into (read committed, repeatable read, snapshot isolation). The application must still be able to reason about what it sees.
Give up: the guarantee that the actual execution history is literally a serial schedule. Operations of different transactions will interleave in time. The system only has to guarantee that the interleaving is equivalent to some serial schedule — not that it is one.
Gain: throughput that scales with core count, I/O bandwidth, and the number of independent data items transactions touch.
You cannot have both "every schedule is literally serial" and "64 transactions run simultaneously." You have to pick. Every real database picks the latter and uses machinery to preserve the former's correctness guarantees. The machinery is Build 7.
Why readers should be able to run in parallel
Readers cannot interfere with each other — reading does not change state, so whether two readers run simultaneously or serially, the observed values and final database are identical. The fix is a reader/writer lock: allow any number of concurrent readers, but exclude writers while readers are active, and vice versa.
```python
# db/rw_lock.py — a reader-preferring reader/writer lock
import threading

class ReaderWriterLock:
    def __init__(self):
        self._readers = 0
        self._writer = False
        self._cond = threading.Condition()

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```
Cost: a condition variable and a small amount of bookkeeping. Benefit: in a read-heavy workload — which most OLTP workloads are — throughput scales with core count as long as writers are rare. If 95% of transactions are readers, the reader/writer lock delivers roughly 95% of ideal scaling on read traffic while still serialising writes correctly. Variants that prevent writer starvation are Chapter 53's subject.
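The claimed behaviour is easy to observe. A usage sketch, reproducing `ReaderWriterLock` so it runs standalone, shows two readers inside the lock simultaneously while a writer waits:

```python
# Two readers hold the lock at once; a writer is excluded until both leave.
import threading

class ReaderWriterLock:
    def __init__(self):
        self._readers = 0
        self._writer = False
        self._cond = threading.Condition()

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = ReaderWriterLock()

lock.acquire_read()
lock.acquire_read()              # second reader enters without blocking
print(lock._readers)             # 2: both readers inside at once

writer_entered = threading.Event()
def writer():
    lock.acquire_write()         # blocks while any reader is active
    writer_entered.set()
    lock.release_write()

t = threading.Thread(target=writer)
t.start()
print(writer_entered.wait(0.1))  # False: writer still excluded by readers

lock.release_read()
lock.release_read()              # last reader out wakes the writer
t.join()
print(writer_entered.is_set())   # True: writer ran once readers left
```

Under the global lock, the second `acquire_read` would have blocked; here it is free, which is the entire benefit on read-heavy traffic.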
But readers-vs-writers is only one axis. The more consequential axis is granularity — what the lock protects.
Why row-level locks are better than table-level
Under the global lock, T1 updating Alice blocks T2 updating Bob even though the two transactions access disjoint data. The fix is lock granularity: instead of one lock for the whole database, give each row its own. A transaction acquires only the locks for rows it actually touches. Disjoint transactions never see each other.
```python
# db/row_lock_db.py
from collections import defaultdict
import threading

class RowLockDB:
    def __init__(self):
        self.data: dict[str, int] = {}
        self._locks: dict[str, threading.Lock] = defaultdict(threading.Lock)

    def transaction(self, keys):
        """Acquire locks for exactly the keys this transaction will touch."""
        return _RowTxn(self, sorted(keys))  # sort to avoid deadlock cycles

    def read(self, key): return self.data.get(key, 0)
    def write(self, key, value): self.data[key] = value

class _RowTxn:
    def __init__(self, db, keys):
        self.db, self.keys = db, keys

    def __enter__(self):
        for k in self.keys: self.db._locks[k].acquire()
        return self.db

    def __exit__(self, *a):
        for k in reversed(self.keys): self.db._locks[k].release()
```
Now T1's transaction(["alice"]) and T2's transaction(["bob"]) run fully in parallel. Only transactions that share a key conflict. On uncorrelated workloads, throughput approaches linear scaling.
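The `sorted(keys)` inside `transaction()` is doing quiet but important work: it rules out deadlock between multi-row transactions by giving every transaction the same global acquisition order. A sketch, reproducing the classes above so it runs standalone, runs two opposing transfers that would deadlock under naive lock ordering:

```python
# Two transfers in opposite directions both touch {"alice", "bob"}. Without
# a fixed acquisition order they can deadlock (each holds one row, waits for
# the other); sorting the keys makes both acquire "alice" before "bob".
from collections import defaultdict
import threading

class RowLockDB:
    def __init__(self):
        self.data: dict[str, int] = {}
        self._locks: dict[str, threading.Lock] = defaultdict(threading.Lock)

    def transaction(self, keys):
        return _RowTxn(self, sorted(keys))  # sort to avoid deadlock cycles

    def read(self, key): return self.data.get(key, 0)
    def write(self, key, value): self.data[key] = value

class _RowTxn:
    def __init__(self, db, keys):
        self.db, self.keys = db, keys

    def __enter__(self):
        for k in self.keys: self.db._locks[k].acquire()
        return self.db

    def __exit__(self, *a):
        for k in reversed(self.keys): self.db._locks[k].release()

db = RowLockDB()
db.data.update({"alice": 1000, "bob": 500})

def transfer(src, dst, amount):
    # Callers list keys in different orders; transaction() sorts them.
    with db.transaction([src, dst]) as t:
        t.write(src, t.read(src) - amount)
        t.write(dst, t.read(dst) + amount)

t1 = threading.Thread(target=transfer, args=("alice", "bob", 100))
t2 = threading.Thread(target=transfer, args=("bob", "alice", 50))
t1.start(); t2.start()
t1.join(); t2.join()

print(db.data["alice"], db.data["bob"])  # 950 550, whichever txn ran first
```

Sorting works here because every transaction declares its full key set up front; transactions that discover keys as they go need the deadlock detection of Chapter 55 instead.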
The trade-offs:
- More locks. A billion-row table wants a billion locks. Real systems use hash-based lock tables mapping row IDs to a fixed mutex pool, accepting false sharing for constant memory.
- More deadlocks. With one lock no cycle is possible. With N locks, T1 holds A and waits for B while T2 holds B and waits for A — deadlock detection becomes necessary. That is Chapter 55.
- Harder reasoning. Proving serialisability under row-level locking requires two-phase locking — Chapter 53.
What about deadlocks?
With more than one lock, cycles can form: T1 holds A waits for B; T2 holds B waits for A. This is a deadlock, and the database must detect it and abort a victim.
The global-lock version has one beautiful property: deadlock is impossible by construction. With a single resource there is no cycle to form. The global lock gives up throughput and gets simplicity; every refinement gives up simplicity and gets throughput. Neither is free. Chapter 55 builds wait-for-graph detection for when you have taken the latter trade.
Measuring the global-lock baseline — three scenarios
Same workload, three variants. Each runs 10,000 transactions of a simple read `counter`, write `counter + 1` pattern, from 100 threads concurrently.
Scenario A: no locking, no atomicity. NoLockDB. Race conditions produce lost updates.
| run | final counter | expected | TPS |
|---|---|---|---|
| 1 | 6,138 | 10,000 | 91,000 |
| 2 | 6,302 | 10,000 | 88,400 |
| 3 | 5,994 | 10,000 | 93,200 |
Fast but wrong — about 40% of updates are lost. This is the anomaly the chapter opened with, measured. An application using this would silently corrupt data under any real load.
Scenario B: global lock. GlobalLockDB. Correct, but throughput is pegged at 1/D.
| threads | final counter | expected | TPS |
|---|---|---|---|
| 1 | 10,000 | 10,000 | 920 |
| 8 | 10,000 | 10,000 | 905 |
| 64 | 10,000 | 10,000 | 880 |
Correct regardless of thread count. Throughput is independent of thread count. The 64-thread run is not 64× faster than the 1-thread run — it is slightly slower, because the extra context-switching overhead of contending threads outweighs zero benefit from parallelism.
Scenario C: row-level locks on a workload with no conflicts. Same number of transactions, but each touches a different row (transactions 1 through 10,000 touch rows "k1" through "k10000"). RowLockDB.
| threads | correctness | TPS |
|---|---|---|
| 1 | correct | 900 |
| 8 | correct | 6,400 |
| 64 | correct | 28,000 |
Throughput scales roughly linearly with concurrency because transactions never conflict. This is what Build 7 is ultimately building toward — the correctness of scenario B with the scaling of scenario A.
The chart earlier in this chapter is scenarios B (flat) and C (scaling) side by side. The gap between them is the entire content of Build 7.
Common confusions
- "A lock guarantees serialisability." Only if every transaction acquires it. A single transaction that bypasses the lock — reads from the shared dict directly without calling `with db.transaction()` — can observe and cause arbitrary anomalies. The lock discipline is a convention that every participant must honour. Real databases enforce this by making the lock acquisition part of the only API path to read or write data; there is no bypass.
- "Serial and serialisable are the same thing." Serial is stronger. Serial executions are trivially serialisable (a serial schedule is equivalent to itself, which is serial). But many serialisable executions are not serial — they interleave operations from different transactions, and the whole point of 2PL and MVCC is to find those interleavings and allow them. Build 7's improvements all live in this gap.
- "Readers can't cause anomalies." False in subtle ways. A transaction that reads `x` twice and gets two different values (because some other transaction wrote between the reads) has observed a non-repeatable read — an anomaly that makes application reasoning hard. A transaction that reads the same set of rows twice and sees a new row that wasn't there before has observed a phantom. Both are reader-visible anomalies, handled in the isolation-level chapters.
- "The global lock is useless in absolute terms." Not quite. For small systems — a config database, an embedded key-value store, a single-writer analytics job — it is the right engineering answer: simple, correct, no bugs. SQLite used a variant (the rollback journal) for years as a conscious design choice; its WAL mode (2010) is precisely the refinement from "one global lock" to "readers don't block writers."
- "Just make the lock faster." Doesn't help. `acquire()` and `release()` are a few nanoseconds each; the transaction body between them runs for milliseconds (I/O, fsync, network round-trips). Faster mutex code cannot shorten the critical section. The only way to increase throughput is to let more than one transaction run simultaneously.
Going deeper
The global-lock pattern shows up outside databases too — everywhere concurrent systems trade simplicity for throughput. Two case studies illustrate the same arc playing out on different timescales.
SQLite: from rollback journal to WAL
Until version 3.7.0 (2010), SQLite used a rollback journal: every write transaction took an exclusive lock on the entire database file. Readers took a shared lock; writers took exclusive. Very close to the global-lock model here, with one refinement — multiple readers could run at once.
The limitation was the one the chapter describes: a single writer blocks every reader, a long reader blocks every writer. On mobile workloads this produced visible UI jank. The fix, WAL (write-ahead log) mode, was dramatic: writes append to a separate log, readers continue reading old database pages without blocking, and a background checkpoint reconciles the two. The SQLite WAL documentation explicitly frames the improvement as "readers no longer block writers, and writers no longer block readers" — the exact move Build 7's next few chapters formalise.
The Linux kernel's Big Kernel Lock
From 1996 to 2011, the Linux kernel had a single global mutex — the Big Kernel Lock (BKL) — protecting most kernel-internal state. On single-core machines this was fine; on 64-core machines it was a catastrophe, flatlining exactly as the database version does in this chapter's throughput chart.
The decade-long project of removing the BKL is a direct analogue of Build 7. Engineers identified each subsystem the BKL protected — filesystem, VFS, network, TTY drivers — and gave each its own finer-grained lock. By 2011 the BKL was gone. The kernel scales with core count today because a generation of engineers did, for kernel state, exactly what database researchers did for tuples: replace one coarse lock with many fine ones, and pay in deadlock detection, lock ordering rules, and subtle bugs. The LWN archive (lwn.net/Articles/102253/) is a readable retrospective. Build 7 is the same story on a shorter clock.
Where this leads next
You have the baseline. The rest of Build 7 builds up from here.
- Two-phase locking: basic, strict, rigorous — Chapter 53. The classical algorithm that provides serialisability under row-level locks. Transactions acquire locks in a growing phase and release them in a shrinking phase, and the rule that locks are never re-acquired after any release is what makes the schedule serialisable. Strict 2PL additionally holds write locks until commit, preventing cascading aborts.
- Lock granularity and intention locks — Chapter 54. Row-level locks are more parallel than table-level, but sometimes you really do want to lock a whole table (a bulk update). Intention locks (`IS`, `IX`, `SIX`) let a transaction declare "I will be taking row locks inside this table" without individually locking every row.
- Deadlock detection with wait-for graphs — Chapter 55. With multiple locks, deadlocks happen. A background process builds a graph of which transaction waits for which, finds cycles, and aborts victims.
- Isolation levels — Chapter 56. Real applications often accept weaker than serialisability (read committed, repeatable read, snapshot isolation) in exchange for even more throughput. What exactly each level allows — and what the application must accept — is the content of this chapter.
- MVCC — later in Build 7. Instead of locking at all for readers, keep multiple versions of every row and let readers see the version consistent with their transaction's start time. This is how Postgres, Oracle, MySQL/InnoDB, and every modern OLTP database actually operate.
Each of these is a point along the same axis — less serialisation, more throughput, more subtlety in the correctness argument. The global lock is the origin. Everything else is distance travelled.
References
- Bernstein, Hadzilacos, Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley 1987 — the canonical textbook. Chapter 2 formalises serialisability and conflict equivalence; Chapter 3 covers two-phase locking; the book is still the standard reference forty years on.
- Papadimitriou, The Serializability of Concurrent Database Updates, Journal of the ACM 26(4), 1979 — the paper that formalised what serialisability means. The "equivalence to some serial schedule" definition this chapter uses is Papadimitriou's, and the proof that checking it is NP-hard in general is in this paper.
- Gray and Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann 1992 — the practitioner's bible. Chapter 7 covers locking protocols with a depth and honesty about trade-offs that is unusual in textbooks.
- Kleppmann, Designing Data-Intensive Applications, O'Reilly 2017 — Chapter 7 ("Transactions") is the best modern treatment of isolation levels and anomalies, pitched at working engineers rather than theorists.
- SQLite, Write-Ahead Logging — the documentation that frames SQLite's move away from a quasi-global lock as a concurrency-control problem, with benchmarks.
- Corbet, The big kernel lock strikes again, LWN 2008 — one of many LWN articles tracking the Linux kernel's decade-long removal of the BKL. A direct working-systems analogue of the progression from global locks to fine-grained locks in databases.