In short
The ANSI SQL-92 standard defined four isolation levels as a clean ladder, each forbidding one more anomaly than the one below it. READ UNCOMMITTED forbids nothing. READ COMMITTED forbids dirty reads. REPEATABLE READ additionally forbids non-repeatable reads. SERIALIZABLE additionally forbids phantoms. The intent was a tidy vocabulary for trading correctness against throughput. The reality is a mess.
Three problems, all named by Berenson, Bernstein, Gray, Melton, O'Neil, and O'Neil in 1995 in "A Critique of ANSI SQL Isolation Levels":
- The ANSI definitions are stated in terms of execution histories ("dirty read is when T1 reads a value T2 wrote but didn't commit"), not in terms of the property being prevented. The phrasing is ambiguous — different strict readings admit different histories.
- Write skew is not in the ANSI vocabulary at all. A history with write skew is not a dirty read, not a non-repeatable read, and not a phantom at the SQL level. By ANSI's literal text, an implementation that allows write skew still qualifies as SERIALIZABLE. That implementation is not serialisable.
- Snapshot isolation — what Oracle and Postgres historically shipped as their strongest level — has no ANSI name. Under the ANSI definition it has the same anomaly profile as SERIALIZABLE (forbids dirty reads, non-repeatable reads, phantoms) yet genuinely allows non-serialisable histories.
In production, pick the highest level you can afford. Postgres's SERIALIZABLE (Serialisable Snapshot Isolation, Cahill 2008) costs roughly 10-20% throughput over Snapshot Isolation but eliminates all anomalies, including write skew. REPEATABLE READ / SI is clean for read-heavy work but leaks write skew — you have to catch it with SELECT ... FOR UPDATE, unique constraints, or advisory locks. READ COMMITTED is the default almost everywhere and leaks lost updates that your application code has to guard against. READ UNCOMMITTED should not be used for anything correctness-sensitive.
The SERIALIZABLE paradox
You set your Oracle session to SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. You write two transactions that both check a shared invariant — "at least one doctor is on call" — then update different rows. You run them concurrently. You hit write skew: both transactions pass the check on the old snapshot, both update, the invariant is violated, and your emergency room has zero doctors on duty.
SERIALIZABLE, the highest level. The name says serialisable. Why are you seeing an anomaly that by definition cannot occur under a serial schedule?
Because the name does not match the guarantee. Oracle's SERIALIZABLE is, and has been since the 1980s, snapshot isolation: genuinely weaker than serialisable while proudly using the word. MySQL InnoDB's SERIALIZABLE sits at the other extreme: strict two-phase locking, which is serialisable but gets there by blocking, while InnoDB's default REPEATABLE READ lets the same write skew through. Postgres is the one major MVCC engine whose SERIALIZABLE is truly serialisable without falling back to 2PL, and it took the field until 2008 to figure out how to do that cheaply. This chapter is about why every engine labels levels differently, why the ANSI standard cannot help you distinguish them, and how to pick a level in production.
The ANSI SQL-92 definition
The standard (ISO/IEC 9075:1992) defines four levels in terms of which anomalies each forbids. It names three anomalies:
- Dirty read (P1). T1 reads a value T2 has written but not yet committed. If T2 then aborts, T1 has seen a value that "never existed."
- Non-repeatable read (P2). T1 reads row r, then later re-reads r and sees a different value (because T2 updated r and committed in between).
- Phantom (P3). T1 runs a predicate read — SELECT * FROM orders WHERE amount > 1000 — and gets n rows. Later in the same transaction it re-runs the same predicate and gets n + 1 rows, because T2 inserted a new matching row and committed.
The four levels are stated exactly as this table:
| Level | Dirty read (P1) | Non-repeatable read (P2) | Phantom (P3) |
|---|---|---|---|
| READ UNCOMMITTED | possible | possible | possible |
| READ COMMITTED | not possible | possible | possible |
| REPEATABLE READ | not possible | not possible | possible |
| SERIALIZABLE | not possible | not possible | not possible |
That is the whole standard's statement of isolation. Four rows, three columns, twelve cells. No mention of how a level is implemented, no mention of write skew, no formal model of what "an execution" even is — the spec uses phrases like "SQL-transaction T2 is not permitted to read [data items] that SQL-transaction T1 has modified" without ever defining "modified" against a precise concurrency model.
Why ANSI phrased levels as anomaly forbiddances: the intent was implementation-neutral — a database could use 2PL, MVCC, optimistic, or magic, so long as the observable histories satisfied the rules. The intent is right; the execution is wrong. By listing only three anomalies, the standard ends up both too loose (SERIALIZABLE admits write skew) and too tight (the strictest reading of P2 rules out legitimate MVCC histories that are genuinely serialisable).
The intuition behind the ladder is clean — each level drops one more anomaly for less locking — and it survives in textbooks. It does not survive contact with production.
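The phenomena above can be made concrete as patterns over a flat history of read/write/commit events. Here is a toy detector for the first two; the event representation (tx, op, key) is invented for this sketch, since the standard defines no such model:

```python
# Toy detectors for the first two ANSI phenomena over a flat history.
# An event is (tx, op, key) with op in {"r", "w", "commit", "abort"};
# this representation is invented, the standard defines no such model.

def dirty_reads(history):
    """P1: a tx reads a key whose latest write is by an uncommitted other tx."""
    uncommitted = {}   # key -> tx holding an uncommitted write to it
    found = []
    for tx, op, key in history:
        if op == "w":
            uncommitted[key] = tx
        elif op == "r":
            writer = uncommitted.get(key)
            if writer is not None and writer != tx:
                found.append((tx, key, writer))
        elif op in ("commit", "abort"):
            uncommitted = {k: t for k, t in uncommitted.items() if t != tx}
    return found

def non_repeatable_reads(history):
    """P2: a tx re-reads a key after another tx's write to it committed."""
    version = {}   # key -> count of committed writes so far
    pending = {}   # tx -> keys written but not yet committed
    seen = {}      # (tx, key) -> committed version at first read
    found = []
    for tx, op, key in history:
        if op == "w":
            pending.setdefault(tx, set()).add(key)
        elif op == "r":
            v = version.get(key, 0)
            if seen.setdefault((tx, key), v) != v:
                found.append((tx, key))
        elif op == "commit":
            for k in pending.pop(tx, set()):
                version[k] = version.get(k, 0) + 1
        else:  # abort
            pending.pop(tx, None)
    return found

# T1 reads T2's uncommitted write, then T2 aborts: a dirty read.
assert dirty_reads([("T2", "w", "x"), ("T1", "r", "x"),
                    ("T2", "abort", None)]) == [("T1", "x", "T2")]
# T1 re-reads x after T2's committed update: a non-repeatable read.
assert non_repeatable_reads([("T1", "r", "x"), ("T2", "w", "x"),
                             ("T2", "commit", None),
                             ("T1", "r", "x")]) == [("T1", "x")]
```

Note what the detectors do not catch: a history can pass both checks and still be non-serialisable, which is exactly the gap the rest of this chapter is about.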
What ANSI missed — write skew
Take the two-doctor on-call example from chapter 56. A hospital has a rule: at all times, at least one doctor must be on call. The doctors table has a boolean on_call column. Two doctors, Alice and Bob, are currently on call. Each wants to check out for the day.
```
-- T1 (Alice's request)              -- T2 (Bob's request)
BEGIN;                               BEGIN;
SELECT count(*) FROM doctors         SELECT count(*) FROM doctors
  WHERE on_call = true;                WHERE on_call = true;
-- T1 sees 2. OK to clock out.       -- T2 sees 2. OK to clock out.
UPDATE doctors SET on_call = false   UPDATE doctors SET on_call = false
  WHERE name = 'Alice';                WHERE name = 'Bob';
COMMIT;                              COMMIT;
-- T1 commits.                       -- T2 commits.
```
Both transactions read a snapshot in which the count was 2, both passed the check, both updated a different row, both committed. The final state has zero doctors on call. The hospital's invariant is violated.
Inspect this history against the ANSI definitions. Dirty read? No — neither transaction read an uncommitted value. Non-repeatable read? No — neither transaction re-read a row and saw a different value; each read each row exactly once. Phantom? No — neither re-ran a predicate and saw a new row.
Per ANSI, no anomaly occurred. A SERIALIZABLE implementation is permitted to produce this history. Yet the history is plainly not serialisable: no serial ordering would let both updates commit, because the second transaction to run would see count(*) = 1 and refuse to clock its doctor out.
This gap — non-serialisable histories the ANSI definitions miss — is write skew. Two transactions read overlapping data, make decisions based on what they read, and write disjoint pieces of state. The writes do not collide. The reads do not conflict. But the semantic invariant is silently violated because neither transaction saw the other's write. Write skew is the canonical shape of correctness bugs in payment reconciliation, inventory systems, scheduling, permissions — and until 1995 the standard did not acknowledge it existed.
What ANSI missed — snapshot isolation
The second gap is at the level of implementation classes. Throughout the late 1980s and 1990s, Oracle shipped a level called SERIALIZABLE that was actually snapshot isolation (SI). Postgres did the same from 7.0 through 9.0. The mechanism:
- At transaction start, the system records a timestamp. Every read in that transaction sees the database as-of that timestamp, regardless of what other transactions commit in the meantime.
- Writes go into a staging area (MVCC version chain, undo log, whatever).
- At commit, the system checks that no other committed transaction has written to the same rows since your start timestamp. This is first-committer-wins: if T1 and T2 both write row r and T1 commits first, T2's commit aborts with a serialisation failure.
SI has excellent properties. It prevents dirty reads, non-repeatable reads, and SQL-level phantoms (the snapshot's predicate result is stable). Readers never block writers, writers never block readers. Throughput on read-heavy workloads is excellent.
But SI is not serialisable. It allows write skew — exactly the doctor-on-call anomaly above. T1's snapshot shows two doctors on call; T2's snapshot shows two doctors on call; T1 writes Alice's row; T2 writes Bob's row; neither wrote the same row the other wrote, first-committer-wins does not trigger; both commit; invariant violated.
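The mechanism is small enough to simulate. A single-threaded toy model of SI with first-committer-wins (class and method names invented, deliberately minimal) shows both behaviours: the doctor history commits on both sides, while a genuine write-write conflict aborts:

```python
# Toy snapshot-isolation engine with first-committer-wins. Single-threaded:
# "concurrency" is interleaved method calls. All names invented for the sketch.

class SIError(Exception):
    pass

class SIStore:
    def __init__(self, data):
        self.data = dict(data)   # committed state
        self.ts = 0              # commit-timestamp counter
        self.commit_log = []     # list of (commit_ts, keys_written)

    def begin(self):
        return SITx(self, self.ts, dict(self.data))  # frozen snapshot

class SITx:
    def __init__(self, store, start_ts, snapshot):
        self.store, self.start_ts, self.snapshot = store, start_ts, snapshot
        self.writes = {}

    def read(self, key):
        # own writes first, then the start-of-transaction snapshot
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # first-committer-wins: abort if a transaction that committed after
        # our start wrote any key we also wrote
        for ts, keys in self.store.commit_log:
            if ts > self.start_ts and keys & self.writes.keys():
                raise SIError("first-committer-wins abort")
        self.store.ts += 1
        self.store.commit_log.append((self.store.ts, set(self.writes)))
        self.store.data.update(self.writes)

# The doctor history: disjoint writes, so first-committer-wins never fires.
db = SIStore({"alice": True, "bob": True})
t1, t2 = db.begin(), db.begin()
assert sum(t1.read(k) for k in ("alice", "bob")) >= 1   # T1's check passes
assert sum(t2.read(k) for k in ("alice", "bob")) >= 1   # T2's check passes
t1.write("alice", False); t2.write("bob", False)
t1.commit(); t2.commit()                                # both succeed
assert db.data == {"alice": False, "bob": False}        # invariant violated

# A genuine write-write conflict on the same row does abort.
t3, t4 = db.begin(), db.begin()
t3.write("alice", True); t3.commit()
t4.write("alice", True)
try:
    t4.commit()
    assert False, "expected first-committer-wins abort"
except SIError:
    pass
```

The point of the sketch: first-committer-wins only inspects write sets, so a decision based on what a transaction read is invisible to the commit check.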
Now ask: which ANSI level does SI correspond to? It forbids all three ANSI anomalies, so by the ANSI table it qualifies as SERIALIZABLE — yet it is not serialisable. The ANSI definitions cannot distinguish snapshot isolation from true serialisability. Both qualify as "SERIALIZABLE" under the standard; only one actually is.
Why this is a catastrophic standards failure and not a pedantic quibble: a specification whose vocabulary cannot distinguish between two levels admitting materially different histories is not a specification. Two implementers honestly reading ANSI SQL-92 can both ship "SERIALIZABLE" products that allow different anomalies. A portable application developer has no way to know which guarantees to code against. Oracle filled the gap by naming SI "SERIALIZABLE"; Postgres filled it by naming SI "REPEATABLE READ". Both are technically compliant with a broken standard. Neither is wrong. Both are confusing.
Berenson et al. 1995 — the critique
In 1995, six researchers — Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, and Elizabeth and Patrick O'Neil — published "A Critique of ANSI SQL Isolation Levels" at SIGMOD. The paper is 12 pages and devastating. Three key moves:
1. Reformulate anomalies via dependency graphs. Rather than define anomalies by execution patterns ("T1 reads T2's uncommitted write"), express them as the presence of certain edge types in the dependency graph of a history. A dependency graph has one node per committed transaction and three edge types:
- ww (write-write): T1 → T2 if T2 overwrote a value T1 had written.
- wr (write-read): T1 → T2 if T2 read a value T1 had written.
- rw (read-write / antidependency): T1 → T2 if T1 read a value that T2 subsequently overwrote.
A history is serialisable if and only if the dependency graph is acyclic (Bernstein, Hadzilacos, Goodman 1987, ch. 2). The paper reframes every anomaly as a particular cycle pattern.
2. Introduce names for the anomalies ANSI missed. The paper formalises A5A — read skew (T1 reads x, T2 writes x and y and commits, T1 reads y — T1 sees x and y from different points in time) and A5B — write skew (the doctor-on-call anomaly, an rw-rw cycle in the dependency graph). Write skew is not a rare edge case; it is the canonical bug in any application enforcing a cross-row invariant via application-level checks.
3. Propose a cleaner hierarchy. The paper first lines the ANSI levels up against Gray's 1975 isolation degrees (degree 0: no isolation; degree 1: roughly READ UNCOMMITTED; degree 2: roughly READ COMMITTED; degree 3: genuinely serialisable), then replaces the ANSI ladder with a partial order: Cursor Stability sits strictly between READ COMMITTED and REPEATABLE READ, and Snapshot Isolation sits above READ COMMITTED but is incomparable to REPEATABLE READ, because each forbids anomalies the other allows. Every level gets its own anomaly profile.
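The acyclicity test behind move 1 is a few lines of graph code. A sketch; the caller supplies the directed ww/wr/rw edges, and the edge labels themselves do not matter for the test:

```python
# Serialisability as graph acyclicity: a committed history is serialisable
# iff its dependency graph (ww/wr/rw edges between transactions) has no cycle.

def has_cycle(edges):
    """edges: iterable of (from_tx, to_tx) pairs. Depth-first cycle search."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on stack / done
    state = {n: WHITE for n in graph}

    def visit(n):
        state[n] = GREY
        for m in graph[n]:
            if state[m] == GREY or (state[m] == WHITE and visit(m)):
                return True               # back edge: cycle found
        state[n] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)

# Doctor-on-call history: rw T1 -> T2 (T1 read Bob's row, T2 overwrote it)
# and rw T2 -> T1 (T2 read Alice's row, T1 overwrote it). Cycle: not serialisable.
assert has_cycle([("T1", "T2"), ("T2", "T1")])

# A wr chain with no cycle is serialisable in the order T1, T2, T3.
assert not has_cycle([("T1", "T2"), ("T2", "T3")])
```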
The takeaway: the ANSI definitions are simultaneously too strong and too weak. Too strong because they forbid patterns a real MVCC implementation can safely allow without breaking serialisability. Too weak because they fail to forbid write skew, which a genuinely serialisable implementation must forbid. The standard is wrong on both sides of the boundary. This paper is why every modern database docs page on isolation is a three-page essay that apologises for the SQL standard before explaining what the engine actually does.
What real databases implement
Four major engines, three level names, at least seven distinct actual behaviours. The cross-engine table is the most useful page in concurrency control you can memorise:
| Engine | READ COMMITTED | REPEATABLE READ | SERIALIZABLE |
|---|---|---|---|
| PostgreSQL | RC, MVCC snapshot per-statement | Snapshot Isolation (one snapshot per tx) | SSI (Cahill 2008) — true serialisable |
| MySQL InnoDB | RC, MVCC snapshot per-statement | RR = repeatable-read + gap locks — blocks phantoms via index | Strict 2PL — all reads take S locks, no MVCC |
| Oracle | RC, MVCC snapshot per-statement | (no distinct level — maps to RC) | Snapshot Isolation (labelled "serialisable") |
| SQL Server | RC, blocking by default; RCSI variant is MVCC | RR via S-lock-until-commit + range locks | Strict 2PL |
Every engine's READ COMMITTED is honest — prevents dirty reads, allows everything else. Every engine's REPEATABLE READ is different: Postgres is SI (allows write skew); InnoDB adds gap locks on top (blocks phantoms at index scan but allows multi-row write skew); SQL Server holds shared locks until commit (blocks a lot); Oracle doesn't really have a distinct RR. Every engine's SERIALIZABLE is different too: Postgres is SSI (optimistic, true serialisable); InnoDB and SQL Server are strict 2PL (pessimistic, true serialisable); Oracle is SI mislabelled (allows write skew).
Move an application between engines and its concurrency-correctness behaviour changes at the SAME level name. Oracle's SERIALIZABLE lets write-skew bugs through that Postgres's SERIALIZABLE catches. Postgres's SERIALIZABLE produces serialisation-failure retries that Oracle's silently commits as inconsistent. MySQL's SERIALIZABLE blocks where Postgres's aborts-and-retries.
Why no engine adopted uniform naming after 1995: backward compatibility. Oracle had shipped SI-as-SERIALIZABLE for a decade; renaming it would have broken every application. Postgres, adding real SSI in 9.1 (2011), upgraded SERIALIZABLE to actually be serialisable and renamed the old SI behaviour to REPEATABLE READ. Right call, but the level names still don't mean the same thing as Oracle's. There is no industry consensus and there won't be — too many applications pinned to too many vendor-specific behaviours.
Never reason about isolation levels by name across engines. Look up, per engine, what the level actually prevents and how.
Serialisable Snapshot Isolation (SSI)
Serialisable Snapshot Isolation (Cahill, Röhm, Fekete 2008) made genuine SERIALIZABLE affordable on MVCC engines. Postgres adopted it in 9.1 (2011); it is the only major MVCC engine to ship true serialisability without the cost of classical 2PL.
The key observation (from Adya 2000 and Fekete et al. 2005): SI allows write skew precisely when the dependency graph contains two consecutive rw-antidependencies between two or three transactions. In the doctor-on-call example, T1 reads both rows then writes Alice; T2 reads both rows then writes Bob; T1's read of Bob conflicts with T2's write of Bob (rw edge T1 → T2), T2's read of Alice conflicts with T1's write of Alice (rw edge T2 → T1) — a cycle.
SSI tracks rw-antidependencies at runtime. When it detects a dangerous structure — two consecutive rw edges meeting at a pivot transaction — it aborts one of them. Serialisable histories commit without abort; non-serialisable histories are aborted before commit. Overhead is ~10-20% over plain SI. The core tracking idea in a Python sketch:
```python
# concurrency/ssi_sketch.py — the rw-edge tracker at the heart of SSI

class SerializationFailure(Exception):
    """Raised when a dangerous structure (two consecutive rw edges) is found."""

class SSITransaction:
    """Track rw-antidependencies in and out of this transaction."""

    def __init__(self, tx_id):
        self.tx_id = tx_id
        self.in_conflict = False       # some concurrent tx has rw -> me
        self.out_conflict = False      # I have rw -> some concurrent tx
        self.reads: set[str] = set()   # keys I have read
        self.writes: set[str] = set()  # keys I have written

    def _check_pivot(self, tx):
        # dangerous structure: a tx with both an incoming and an outgoing rw edge
        if tx.in_conflict and tx.out_conflict:
            raise SerializationFailure(tx.tx_id)

    def on_read(self, key: str, active_writers: "dict[str, SSITransaction]"):
        self.reads.add(key)
        w = active_writers.get(key)
        if w is not None and w.tx_id != self.tx_id:
            self.out_conflict = True   # I have rw -> w (I read the old version)
            w.in_conflict = True
            self._check_pivot(self)
            self._check_pivot(w)

    def on_write(self, key: str, active_readers: "list[SSITransaction]"):
        self.writes.add(key)
        for r in active_readers:
            if r.tx_id != self.tx_id and key in r.reads:
                r.out_conflict = True  # r has rw -> me
                self.in_conflict = True
                self._check_pivot(r)
                self._check_pivot(self)
```
Real SSI is more subtle — it tracks SIREAD locks (lightweight predicate locks surviving past commit), handles read-only optimisations, and bounds memory by summarising committed transactions. See Postgres's predicate.c for the ~4000-line production version. Before 2011, "just turn on SERIALIZABLE" meant accepting a 5-10× throughput drop from 2PL. After SSI, on Postgres, it means ~15% overhead and a retry loop.
Choosing an isolation level in practice
A decision procedure you can apply, in decreasing order of preference:
SERIALIZABLE. The only level that lets you write sequential-looking application code and trust it. Every invariant — "at least one doctor on call", "no two bookings overlap", "balance never goes negative" — you express as SQL inside a transaction, and the database either commits it correctly or aborts with a serialisation failure. You handle the retry (below); you accept ~10-20% throughput overhead in Postgres or more aggressive blocking in MySQL/SQL Server. For finance, scheduling, inventory, permissions — anything with invariants that matter — this is the right default.
REPEATABLE READ / Snapshot Isolation. Clean for read-heavy workloads. Transactions see a consistent snapshot for their entire lifetime, so multi-statement reports produce consistent numbers without blocking writers. But you must explicitly handle write skew: every cross-row invariant needs SELECT ... FOR UPDATE over the rows that matter, a unique constraint that makes the skew pattern impossible, or an advisory lock around the critical section. If you know your workload has no cross-row invariants, SI is correct and fast.
READ COMMITTED. The default almost everywhere. Each statement sees a committed snapshot. Non-repeatable reads, phantoms, and lost updates all happen unless you use SELECT ... FOR UPDATE or atomic update expressions (UPDATE … SET balance = balance + 100). Handle concurrency explicitly at each read-then-write and RC is fine. Forget once and you have a silent lost-update bug.
READ UNCOMMITTED. Do not use this for correctness-sensitive work. If you need analytics without blocking writers, use a read replica or an explicit snapshot; don't tune the level.
A sharper rule: if you find yourself writing explicit SELECT ... FOR UPDATE on every read, you have built serialisability by hand on top of READ COMMITTED at the cost of more code and probably more bugs than just turning on SERIALIZABLE.
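The lost-update trap at READ COMMITTED is mechanical enough to show in miniature. A toy model (invented names; each read sees the latest committed state, mimicking RC's per-statement snapshot) contrasting application-level read-then-write with an atomic update expression:

```python
# Lost update at READ COMMITTED, in miniature. Each read sees the latest
# committed state, as under RC's per-statement snapshot. Invented model.

balance = {"acct": 100}

def stage_deposit(amount, staged):
    """Application-level read-modify-write: read now, write back later."""
    staged["acct"] = balance["acct"] + amount   # the read happens here

def commit(staged):
    balance.update(staged)                      # the write happens here

# Two concurrent deposits of 100, fully interleaved: both read 100.
s1, s2 = {}, {}
stage_deposit(100, s1)          # reads 100, stages 200
stage_deposit(100, s2)          # reads 100, stages 200
commit(s1)
commit(s2)
assert balance["acct"] == 200   # lost update: one deposit vanished

# Atomic update expression (UPDATE ... SET balance = balance + 100):
# read and write happen in one statement, leaving no window between them.
balance["acct"] = 100
def deposit_atomic(amount):
    balance["acct"] += amount
deposit_atomic(100)
deposit_atomic(100)
assert balance["acct"] == 300   # both deposits survive
```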
The retry pattern you need for SERIALIZABLE
SERIALIZABLE means aborts. When two transactions form a dangerous cycle, one gets a serialisation failure and must retry. In Postgres this is SQLSTATE 40001 (serialization_failure) or 40P01 (deadlock_detected). Your application code must recognise these and retry with jittered backoff.
```python
# db/retry.py — the retry wrapper for SERIALIZABLE Postgres
import random
import time

import psycopg2

SERIALIZATION_FAILURES = {"40001", "40P01"}  # serialization_failure, deadlock_detected

def run_serializable(conn, body, max_attempts=8, base_delay=0.005):
    """Run `body(cursor)` inside a SERIALIZABLE transaction with retries."""
    conn.set_session(isolation_level="SERIALIZABLE")  # must be set outside a tx
    for attempt in range(max_attempts):
        try:
            with conn:  # commits on success, rolls back on exception
                with conn.cursor() as cur:
                    return body(cur)
        except psycopg2.OperationalError as e:
            # covers both SerializationFailure (40001) and DeadlockDetected (40P01)
            if e.pgcode not in SERIALIZATION_FAILURES or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
    raise RuntimeError("retry loop exhausted")  # unreachable
```
Write this once and import it everywhere you open a serialisable transaction. The exponential+jittered backoff (base 5 ms, doubling, randomised 50-150%) mirrors TCP and distributed-systems client libraries. The retry body must be idempotent under retry — no external side effects (HTTP calls, queue publishes) inside the transaction body, because they will fire on every retry. Do side effects after commit, or use outbox patterns.
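One way to satisfy the no-side-effects rule is to record intents during the body and fire them only after a successful commit. A minimal sketch of that pattern, independent of any database driver; all names here are invented:

```python
# Defer side effects until after commit: the body records intents, and only
# a successful attempt flushes them. All names here are invented.
import random
import time

class SideEffects:
    def __init__(self):
        self.pending = []

    def enqueue(self, fn, *args):
        self.pending.append((fn, args))   # record, do not execute yet

    def flush(self):
        for fn, args in self.pending:
            fn(*args)                     # runs exactly once, post-commit
        self.pending.clear()

def with_retries(body, max_attempts=8, base_delay=0.005):
    """Run body(effects) with jittered-backoff retries; discard an aborted
    attempt's recorded effects and flush them only on success."""
    for attempt in range(max_attempts):
        effects = SideEffects()           # fresh collector every attempt
        try:
            result = body(effects)        # the "transaction body"
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
            continue
        effects.flush()                   # commit succeeded: fire effects
        return result

# A body that fails twice before succeeding still publishes exactly once.
sent, attempts = [], [0]

def body(effects):
    attempts[0] += 1
    effects.enqueue(sent.append, "order-confirmed")
    if attempts[0] < 3:
        raise ValueError("simulated serialisation failure")
    return "ok"

assert with_retries(body, base_delay=0) == "ok"
assert sent == ["order-confirmed"] and attempts[0] == 3
```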
Write skew at every level in Postgres
The doctor-on-call anomaly run three times with identical setup, changing only the isolation level. Postgres 16. Two concurrent sessions.
Setup:
CREATE TABLE doctors (name text PRIMARY KEY, on_call boolean);
INSERT INTO doctors VALUES ('Alice', true), ('Bob', true);
Each run, both sessions execute simultaneously:
BEGIN;
SELECT count(*) FROM doctors WHERE on_call = true;
-- application checks: count >= 1, so OK to update
UPDATE doctors SET on_call = false WHERE name = <me>;
COMMIT;
Run 1: SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
Both succeed. Final state: 0 doctors on call. Invariant violated. RC's per-statement snapshot does not protect a cross-statement invariant.
Run 2: SET TRANSACTION ISOLATION LEVEL REPEATABLE READ. (Postgres's SI.)
Both succeed. Final state: 0 doctors on call. Invariant violated. Each transaction's own snapshot is internally consistent, but neither sees the other's concurrent write. This is write skew — the exact anomaly snapshot isolation cannot prevent.
Run 3: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. (Postgres's SSI.)
First session commits. Second session aborts with ERROR: could not serialize access due to read/write dependencies among transactions. Final state: 1 doctor on call (Alice or Bob — depending on scheduling). Invariant preserved. The second session retries, sees the updated count of 1, and the application logic correctly refuses to clock out the remaining on-call doctor.
SSI detected this via a dangerous structure in the dependency graph: T1's read of Bob's row was overwritten by T2's update (an rw edge T1 → T2), and T2's read of Alice's row was overwritten by T1's update (an rw edge T2 → T1), two consecutive rw-antidependencies forming a cycle.
The same application code, unchanged, produced a data-corruption bug at RC and RR and a clean serialisation failure plus retry at SERIALIZABLE. If your invariant matters, the cost of SERIALIZABLE is the cheapest insurance you can buy.
Common confusions
- "SERIALIZABLE everywhere means the same thing." It does not. Postgres SERIALIZABLE is SSI (MVCC, optimistic). Oracle SERIALIZABLE is snapshot isolation (allows write skew, despite the name). MySQL InnoDB and SQL Server SERIALIZABLE are strict 2PL. Same word, four different guarantees and performance profiles.
- "Higher isolation is always safer." Correct for correctness; not for operational behaviour. SERIALIZABLE has aborts-and-retries that RC does not, so p99 latency can increase under contention even as correctness improves. The decision is correctness vs tail-latency, not just correctness vs throughput.
- "Snapshot isolation is broken." It is not. SI is a useful, well-defined, not-serialisable level. Oracle built an empire on it. The caveat is not "don't use SI" but "understand write skew and protect cross-row invariants explicitly".
- "The ANSI standard is authoritative." For formal verification against ANSI text, yes. For actual database behaviour, no. Every major engine diverges deliberately. Treat the standard as a historical starting point and engine documentation as the truth.
- "Just use SELECT ... FOR UPDATE everywhere." This promotes RC to approximate serialisability by explicit row locks. It breaks as soon as your invariant spans rows FOR UPDATE cannot cover — empty predicate results (no rows to lock), secondary indexes, range predicates. At that point you have hand-rolled 2PL on top of RC. Just use SERIALIZABLE.
Going deeper
The three directions that turn the 1995 critique into 2020s best practice.
Adya's 2000 thesis — the definitive reformulation
Adya's MIT PhD thesis, Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions (2000), is the definitive treatment after Berenson. Adya defines isolation levels via phenomena (patterns in dependency graphs over version orders) rather than via anomalies (which are implementation-oriented and vague). This gives a precise, implementation-independent specification that handles MVCC cleanly — the framework Fekete et al. 2005 used to prove snapshot isolation + certain application constraints is serialisable, leading directly to Cahill's 2008 SSI. For formal verification, use Adya's definitions, not ANSI's.
SELECT ... FOR UPDATE as the hand-rolled escape hatch
Every engine exposes explicit row-level locks inside a transaction (SELECT ... FOR UPDATE on Postgres/MySQL/Oracle, WITH (UPDLOCK) on SQL Server). Reading the rows acquires a write lock on each, held until commit. This promotes RC to behave roughly like RR or SERIALIZABLE on those rows. Useful for lost updates and, in limited cases, write skew — but FOR UPDATE can only lock rows that exist (empty predicate results lock nothing, which is how the doctor-on-call invariant escapes when a doctor might be deleted concurrently), doesn't lock index ranges, and adds lock contention that can cascade into deadlocks. Not a substitute for the right isolation level.
Materialise invariants as database constraints
The best write-skew fix is to let the database enforce the invariant directly. UNIQUE prevents two rows sharing a key; CHECK enforces row-local invariants; Postgres's EXCLUSION constraints prevent two rows from overlapping on a custom predicate. The booking-overlap invariant is cleanly expressed as EXCLUDE USING gist (room WITH =, period WITH &&) — any two rows with overlapping periods for the same room are rejected. Every invariant you move from application code into the schema is a class of write-skew bugs your application can no longer produce, regardless of isolation level.
Where this leads next
This chapter closes Build 7 on concurrency control proper. You now have the conceptual toolkit — locking, MVCC, isolation levels, the specific failures each level permits, and the patterns to compensate. Build 8 moves from "how do transactions see each other while running" to "what survives when the machine dies mid-transaction":
- The write-ahead rule — the single invariant that makes durable commits possible: log records hit disk before the data pages they describe.
- ARIES (analysis, redo, undo) — the recovery algorithm every serious relational engine uses, in three passes.
- Group commit — how real engines achieve thousands of commits per second despite fsync taking milliseconds.
Isolation is about what transactions see of each other while they run. Recovery is about what survives when they stop. Both are parts of the same story: the database's promises to the application must hold against concurrent peers, flaky clients, and its own hardware.
References
- Berenson, Bernstein, Gray, Melton, O'Neil, O'Neil, A Critique of ANSI SQL Isolation Levels, SIGMOD 1995 — the paper that broke the ANSI definitions. Introduces the dependency-graph framework, names write skew and read skew formally, and shows that the ANSI levels admit non-serialisable histories. Free PDF from Microsoft Research. Twelve pages; required reading for anyone who writes SQL against a real database.
- Adya, Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions, MIT PhD thesis 2000 — the definitive re-formulation of isolation levels using phenomena on dependency graphs. Precise, implementation-independent, the specification the ANSI standard should have been.
- Cahill, Röhm, Fekete, Serializable Isolation for Snapshot Databases, SIGMOD 2008 — the SSI algorithm, ~10-20% overhead to upgrade snapshot isolation to true serialisability. Postgres implemented this in 9.1 (2011); the paper is the definitive reference, with proofs and benchmarks.
- PostgreSQL documentation, Serializable Snapshot Isolation and Transaction Isolation — the practitioner-facing writeup of how Postgres implements each level. Includes the rw-edge tracking, retry guidance, and the specific patterns (read-only optimisation, deferrable transactions) unique to Postgres's SSI.
- ISO/IEC 9075:1992, Database Language SQL, Section 4.28 "SQL-transactions" — the original ANSI SQL-92 text defining the four isolation levels. Available (behind a paywall) at ISO; the relevant section is reproduced in Berenson et al. 1995. Read it to see for yourself how little the standard actually specifies.
- Kleppmann, Designing Data-Intensive Applications, O'Reilly 2017, chapter 7 — the best modern pitched-at-engineers treatment of isolation levels and the ANSI failure. Covers the same ground as this chapter with additional production examples (two-phase commit, distributed transactions, the read-skew case-study) and is where many working engineers first learn that the level names mean different things across engines.