In short

The concurrency anomalies in every textbook — dirty read, lost update, non-repeatable read, phantom, write skew — are not abstract theory. They are reproducible bugs you can trigger on any real DBMS inside thirty seconds, using two Python threads and a local Postgres or SQLite file. This chapter walks through each named anomaly in order of severity, gives the minimal SQL and Python sequence that reproduces it, and names the isolation level that blocks it.

The six classical anomalies:

  1. Dirty read — a transaction reads another transaction's uncommitted write. If that write is later rolled back, the reader has consumed a value that never existed.
  2. Lost update — two transactions read the same row, each computes a new value from what it read, each writes back. The second write overwrites the first; the first update is silently lost.
  3. Non-repeatable read — a transaction reads a row, another transaction updates and commits, the first re-reads the same row and gets a different value. The read was not repeatable.
  4. Phantom read — a transaction runs a predicate query (WHERE country='IN'), another transaction inserts a matching row and commits, the first re-runs the query and sees an extra row. The row set is not stable.
  5. Write skew — two transactions both read overlapping data and both write, based on what they read, in a way that violates an invariant neither alone would have violated. The classic anomaly of snapshot isolation.
  6. Read-only serialisation anomaly — Fekete, Liarokapis, O'Neil, O'Neil, Shasha 2005. A read-only transaction can, under snapshot isolation, see a database state that no serial order would have produced, even when every concurrent writing transaction was correct.

Each anomaly is a different kind of interleaving. Each is blocked by serialisable isolation; weaker levels leak various subsets. By the end of this chapter you will have seen every one of them happen in a Python terminal.

The unusual thing about concurrency anomalies is how embarrassingly easy they are to reproduce. You do not need a distributed system, GPU-accelerated contention, or a custom workload generator. You need two Python threads, a local database, and the willingness to type BEGIN in two terminals. Every anomaly in this chapter fits in fifteen lines of SQL and produces its nasty behaviour the first time you run it.

The point of doing this is not pedagogical decoration. It is that reading a list of anomaly definitions leaves them as words. Running them leaves them as experiences. Once you have seen a transaction commit based on a value that later disappeared, you stop treating "isolation level" as a knob and start treating it as a property of the program.

The prior chapter on two-phase locking showed what you get when the database serialises for you — conflict-free schedules, provable equivalence to some serial order. This chapter shows what you get without that, at each weaker isolation level the standard offers, by reproducing every anomaly the relaxations admit.

Setup — a two-thread Python harness against Postgres

Everything in this chapter uses the same harness. A single function opens a new connection at a chosen isolation level; two threads run two SQL sequences; a barrier synchronises the interleaving. The scaffolding is identical across anomalies — only the SQL changes.

# anomalies/harness.py
import psycopg2
import threading
from psycopg2.extensions import (
    ISOLATION_LEVEL_READ_UNCOMMITTED, ISOLATION_LEVEL_READ_COMMITTED,
    ISOLATION_LEVEL_REPEATABLE_READ, ISOLATION_LEVEL_SERIALIZABLE)

LEVELS = {
    "RU": ISOLATION_LEVEL_READ_UNCOMMITTED,
    "RC": ISOLATION_LEVEL_READ_COMMITTED,
    "RR": ISOLATION_LEVEL_REPEATABLE_READ,
    "SS": ISOLATION_LEVEL_SERIALIZABLE,
}

def new_tx(level="RR"):
    conn = psycopg2.connect("dbname=anomaly user=you host=localhost")
    conn.set_isolation_level(LEVELS[level])
    return conn, conn.cursor()

def run_two(t1_fn, t2_fn, level="RR"):
    barrier = threading.Barrier(2)
    th1 = threading.Thread(target=t1_fn, args=(level, barrier))
    th2 = threading.Thread(target=t2_fn, args=(level, barrier))
    th1.start(); th2.start(); th1.join(); th2.join()

The barrier is the whole trick. Without it the two threads race and you get a different interleaving on every run. With a barrier, both threads reach a known point at the same instant, and from there the SQL order is deterministic — which is what you need to reliably reproduce an anomaly instead of "sometimes seeing it".
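
The determinism the barrier buys can be seen in isolation with a ten-line sketch (standalone, not part of the harness):

```python
import threading

order = []                     # shared log; list.append is atomic under the GIL
barrier = threading.Barrier(2)

def worker(tag):
    order.append(f"{tag}: before")
    barrier.wait()             # neither thread passes until both have logged "before"
    order.append(f"{tag}: after")

threads = [threading.Thread(target=worker, args=(t,)) for t in ("A", "B")]
for th in threads: th.start()
for th in threads: th.join()
print(order)   # both "before" entries always precede both "after" entries
```

Without the `wait()`, either thread could finish entirely before the other starts; with it, the before/after phases are pinned on every run.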

Switch isolation levels by changing the string passed into new_tx. Postgres supports all four SQL-standard levels, though its READ UNCOMMITTED silently behaves as READ COMMITTED — there is literally no weaker mode available. If you prefer SQLite, use sqlite3.connect(..., isolation_level=None) and issue explicit BEGIN/COMMIT with WAL mode enabled (PRAGMA journal_mode=WAL), but be aware that SQLite's locking model serialises writers globally, so write-skew is harder to demonstrate there. The demos below target Postgres.
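
For the SQLite route, a minimal variant of new_tx might look like this (a sketch, not part of the harness; SQLite has no per-connection isolation knob, so you control transactions with explicit BEGIN/COMMIT):

```python
# sqlite_harness.py — hypothetical SQLite counterpart to new_tx.
import sqlite3

def new_tx_sqlite(path="demo.db"):
    # isolation_level=None disables the driver's implicit transaction
    # management; we issue BEGIN/COMMIT ourselves, as the text describes.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
    return conn

# usage: explicit transaction control against an in-memory database
conn = new_tx_sqlite(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("COMMIT")
print(conn.execute("SELECT balance FROM account WHERE id=1").fetchone()[0])
```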

Populate a small schema to work against:

CREATE TABLE account (id INT PRIMARY KEY, balance INT);
INSERT INTO account VALUES (1, 100), (2, 100);
CREATE TABLE users (id INT PRIMARY KEY, name TEXT, country TEXT);
INSERT INTO users VALUES (5, 'Alice', 'IN'), (6, 'Bob', 'IN');
CREATE TABLE doctor (name TEXT PRIMARY KEY, on_call BOOLEAN);
INSERT INTO doctor VALUES ('Alice', TRUE), ('Bob', TRUE);

Six rows across three tables. Enough to produce every anomaly in the standard.

Anomaly 1 — Dirty Read

A dirty read is when one transaction reads a value that another transaction has written but not yet committed. If the writer then rolls back, the reader has consumed a value that never existed in any committed state of the database.

The canonical demo: T1 writes, T2 reads before T1 commits, T1 rolls back.

T1: BEGIN; UPDATE account SET balance = 0 WHERE id = 1;    -- written, not committed
T2:                                 BEGIN; SELECT balance FROM account WHERE id = 1;  -- sees 0
T1: ROLLBACK;                                              -- 0 never happened
T2:                                 -- but already saw 0 and acted on it

Python harness:

# anomalies/dirty_read.py
from harness import new_tx, run_two

def t1(level, barrier):
    conn, cur = new_tx(level)
    cur.execute("UPDATE account SET balance = 0 WHERE id = 1")
    barrier.wait()            # T2 now reads the uncommitted 0
    barrier.wait()            # T2 is done reading
    conn.rollback()
    print("T1 rolled back. Real balance unchanged.")

def t2(level, barrier):
    conn, cur = new_tx(level)
    barrier.wait()
    cur.execute("SELECT balance FROM account WHERE id = 1")
    val = cur.fetchone()[0]
    print(f"T2 saw balance = {val}")
    barrier.wait()
    conn.commit()

if __name__ == "__main__":          # guard so run_all.py can import t1/t2 safely
    run_two(t1, t2, level="RU")

Run this against a database that actually honours READ UNCOMMITTED — SQL Server or MySQL InnoDB, each with SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED — and T2 prints balance = 0, even though the committed balance never changed from 100. T2 has acted on a value from an alternate history.

Run the exact same code against Postgres, however, and T2 prints balance = 100. Why: Postgres's minimum isolation level is READ COMMITTED; the READ UNCOMMITTED constant silently gets promoted. This is a deliberate design choice going back to the original MVCC implementation — there is no way to read uncommitted data in Postgres. Dirty reads are impossible there by construction, regardless of what level you ask for.

So dirty read is an anomaly whose demonstration depends on your choice of engine. The SQL standard lists it as the worst anomaly, but the most-used open-source OLTP database cannot produce it. This is a historical oddity worth internalising: isolation levels are minimums, not exact specifications, and engines may strengthen them silently.

Upgrading to READ COMMITTED prevents dirty reads universally. In a lock-based engine, T2's SELECT blocks until T1 commits or aborts; in an MVCC engine, it simply reads the last committed version. Either way, T2 never sees uncommitted data. Every mainstream database blocks this anomaly at READ COMMITTED or stronger.

Anomaly 2 — Lost Update

The bank-transfer classic, and the anomaly that motivated the whole field. Two transactions read the same row, each computes a new value based on what it read, and each writes back. The second write overwrites the first's update without incorporating it.

T1: BEGIN; SELECT balance FROM account WHERE id=1;   -- reads 100
T2:                                BEGIN; SELECT balance FROM account WHERE id=1;   -- also reads 100
T1: UPDATE account SET balance=90 WHERE id=1;        -- withdraw 10
T1: COMMIT;
T2: UPDATE account SET balance=80 WHERE id=1;        -- withdraw 20, based on stale read
T2: COMMIT;
-- Final balance: 80. Should be 70. Ten rupees lost.

In Python:

# anomalies/lost_update.py
import psycopg2
from harness import new_tx, run_two

def withdraw(amount):
    def t(level, barrier):
        conn, cur = new_tx(level)
        cur.execute("SELECT balance FROM account WHERE id = 1")
        bal = cur.fetchone()[0]
        barrier.wait()                          # sync: both have read
        new_bal = bal - amount
        try:
            cur.execute("UPDATE account SET balance = %s WHERE id = 1", (new_bal,))
            conn.commit()
            print(f"T{amount} wrote {new_bal}, commit OK")
        except psycopg2.errors.SerializationFailure as e:
            conn.rollback()
            print(f"T{amount} aborted by engine: {e.pgcode}")
    return t

if __name__ == "__main__":          # guard so run_all.py can import withdraw safely
    run_two(withdraw(10), withdraw(20), level="RC")

At READ COMMITTED, both transactions commit. Final balance is whichever write landed second — 90 or 80. Either way, one of the two withdrawals is lost. Why the engine cannot catch this at READ COMMITTED: each write is a blind UPDATE with a literal value. The engine sees "set balance to 80" and has no way to know that value came from a read of 100 that is now stale. The lost update is hidden in application logic; the engine just saw a legal write.

At REPEATABLE READ in Postgres, the second committer gets aborted with SerializationFailure (SQLSTATE 40001). Postgres's snapshot-isolation implementation happens to detect the lost update as a write-write conflict on the same row: both transactions started from the same snapshot and both tried to modify row 1. The engine lets the first commit and aborts the second with "could not serialize access due to concurrent update". The application must retry.

At REPEATABLE READ in MySQL InnoDB, the lost update persists. MySQL's REPEATABLE READ is MVCC-based but uses a different semantics: the second UPDATE sees the current committed value (not its snapshot) and succeeds. This is a known difference with significant practical consequences — your Postgres code and your MySQL code behave differently at the same named level. Promote to SERIALIZABLE in MySQL and the anomaly is blocked.

The defensive fix that works everywhere is explicit locking or compare-and-swap at the application level: SELECT ... FOR UPDATE or UPDATE ... WHERE balance = <old value>. Both force the engine to either serialise or notice the conflict. But knowing the anomaly exists is half the battle.
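
The compare-and-swap pattern is engine-agnostic, so it can be sketched with stdlib SQLite (the withdraw_cas helper and retry count are illustrative, not from the harness):

```python
# Compare-and-swap withdraw: the UPDATE succeeds only if the balance is still
# the value we read. rowcount == 0 means a concurrent writer got there first.
import sqlite3

def withdraw_cas(conn, account_id, amount, retries=5):
    for _ in range(retries):
        (bal,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)).fetchone()
        cur = conn.execute(
            "UPDATE account SET balance = ? WHERE id = ? AND balance = ?",
            (bal - amount, account_id, bal))
        if cur.rowcount == 1:          # our read was still current
            conn.commit()
            return bal - amount
        conn.rollback()                # stale read: loop and re-read
    raise RuntimeError("too much contention")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()
print(withdraw_cas(conn, 1, 10))   # 90
print(withdraw_cas(conn, 1, 20))   # 70
```

The same shape works in Postgres with psycopg2: check `cursor.rowcount` after the conditional UPDATE and retry when it is zero.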

Anomaly 3 — Non-Repeatable Read

A transaction reads a row. Another transaction updates that row and commits. The first transaction re-reads the row and sees a different value. Same SELECT statement, same row, different answer — the read was not repeatable within the transaction.

T1: BEGIN; SELECT name FROM users WHERE id=5;         -- "Alice"
T2:                              BEGIN;
T2:                              UPDATE users SET name='Aliz' WHERE id=5;
T2:                              COMMIT;
T1: SELECT name FROM users WHERE id=5;                -- now "Aliz"
T1: COMMIT;

In Python:

# anomalies/nonrepeatable.py
from harness import new_tx, run_two

def t1(level, barrier):
    conn, cur = new_tx(level)
    cur.execute("SELECT name FROM users WHERE id = 5")
    first = cur.fetchone()[0]
    print(f"T1 first read:  {first}")
    barrier.wait()                          # let T2 update + commit
    barrier.wait()
    cur.execute("SELECT name FROM users WHERE id = 5")
    second = cur.fetchone()[0]
    print(f"T1 second read: {second}")
    conn.commit()

def t2(level, barrier):
    conn, cur = new_tx(level)
    barrier.wait()
    cur.execute("UPDATE users SET name = 'Aliz' WHERE id = 5")
    conn.commit()
    barrier.wait()

if __name__ == "__main__":          # guard so run_all.py can import t1/t2 safely
    run_two(t1, t2, level="RC")

At READ COMMITTED, T1 prints Alice then Aliz. The anomaly has happened. T1's view of row 5 changed mid-transaction because T1 takes a new snapshot for each statement. Any invariant T1 held based on the first read is now violated on the second.

Why this matters: application code often reads the same row at different points in a transaction — once to check authorisation, once to update. If the row changed in between, the authorisation was checked against a world that no longer exists by the time the update lands. That is a vulnerability class, not just an academic anomaly.

Promote to REPEATABLE READ and T1 prints Alice both times. Postgres's MVCC uses a transaction-level snapshot: T1's snapshot was taken at its first statement, and every subsequent read in T1 sees the database as of that snapshot, regardless of what T2 commits. The second SELECT returns 'Alice' from the snapshot, not the currently-committed 'Aliz'.

Why the snapshot suffices: MVCC keeps old row versions around exactly so readers can see past states. A transaction's snapshot timestamp picks a cut through version history; every read in that transaction walks the version chain until it finds a version visible at that timestamp. Concurrent writes create new versions but do not disturb the old ones — the reader sees a self-consistent past, forever. This is why MVCC is the dominant concurrency control model for reads: it sidesteps the read-vs-write conflict entirely.
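
The visibility rule can be stated as a toy function (a deliberately simplified model: real engines also track the creating and deleting transaction of each version, not just a commit timestamp):

```python
# Toy MVCC visibility: a reader with snapshot timestamp S sees the newest
# version whose commit timestamp is <= S; later versions are invisible.
def visible(versions, snapshot_ts):
    """versions: list of (commit_ts, value) pairs, any order."""
    best = None
    for commit_ts, value in versions:
        if commit_ts <= snapshot_ts and (best is None or commit_ts > best[0]):
            best = (commit_ts, value)
    return best[1] if best else None

chain = [(10, "Alice"), (25, "Aliz")]   # row 5's version history
print(visible(chain, 15))   # a transaction snapshotted at t=15 sees Alice
print(visible(chain, 30))   # a later snapshot sees Aliz
```

The non-repeatable-read fix falls straight out: both of T1's reads use the same snapshot_ts, so they walk to the same version no matter what commits in between.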

Anomaly 4 — Phantom Read

A phantom read is the predicate-query version of non-repeatable read. T1 runs a query with a WHERE clause, gets N rows. T2 inserts a row matching the predicate and commits. T1 re-runs the query, gets N+1 rows. The specific rows T1 first saw are all still there, unchanged — but a new one has "appeared" out of nowhere, like a phantom.

T1: BEGIN; SELECT COUNT(*) FROM users WHERE country='IN';     -- 2
T2:                              BEGIN;
T2:                              INSERT INTO users VALUES (7, 'Carol', 'IN');
T2:                              COMMIT;
T1: SELECT COUNT(*) FROM users WHERE country='IN';            -- 3 (phantom)

In Python:

# anomalies/phantom.py
from harness import new_tx, run_two

def t1(level, barrier):
    conn, cur = new_tx(level)
    cur.execute("SELECT COUNT(*) FROM users WHERE country = 'IN'")
    first = cur.fetchone()[0]
    print(f"T1 first count:  {first}")
    barrier.wait()
    barrier.wait()
    cur.execute("SELECT COUNT(*) FROM users WHERE country = 'IN'")
    second = cur.fetchone()[0]
    print(f"T1 second count: {second}")
    conn.commit()

def t2(level, barrier):
    conn, cur = new_tx(level)
    barrier.wait()
    cur.execute("INSERT INTO users VALUES (7, 'Carol', 'IN')")
    conn.commit()
    barrier.wait()

if __name__ == "__main__":          # guard so run_all.py can import t1/t2 safely
    run_two(t1, t2, level="RC")

At READ COMMITTED, T1 prints 2 then 3. At REPEATABLE READ in Postgres, T1 prints 2 both times — the snapshot catches the phantom even though the SQL standard technically allows REPEATABLE READ to leak phantoms. This is one of those "Postgres is stricter than the standard" cases.

The SQL standard's definition of REPEATABLE READ only guarantees that rows you read stay stable; it does not promise anything about rows you haven't read yet appearing from other transactions. A strict REPEATABLE READ implementation (historically, some IBM DB2 configurations) would leak phantoms. Postgres's MVCC snapshot, because it captures the entire database state, blocks phantoms as a side effect of the implementation.

MySQL InnoDB blocks phantoms at REPEATABLE READ by a different mechanism: next-key locks and gap locks. When T1 reads WHERE country='IN', InnoDB locks not just the rows it returned but also the gaps between them in the index, preventing T2 from inserting into those gaps until T1 commits. Two different engines, two different implementations, same observable behaviour at this level.

If your application absolutely requires phantom-free reads across standards, SERIALIZABLE is the only level that guarantees it. At SERIALIZABLE, every engine must block phantoms.

Anomaly 5 — Write Skew

Write skew is the most subtle anomaly and the one that most surprises engineers on first contact. Two transactions both read overlapping data, each independently decides its write is safe based on what it read, and both writes together violate an invariant that neither write alone would have.

The canonical example is hospital on-call scheduling. Invariant: at least one doctor must be on call at all times. Currently Alice and Bob are both on call. Two write paths request time off simultaneously.

-- Invariant: at least one doctor where on_call = TRUE.

T1 (Alice requests off):
  BEGIN;
  SELECT count(*) FROM doctor WHERE on_call = TRUE;   -- 2, so I can go off
  UPDATE doctor SET on_call = FALSE WHERE name='Alice';
  COMMIT;

T2 (Bob requests off):
  BEGIN;
  SELECT count(*) FROM doctor WHERE on_call = TRUE;   -- 2, so I can go off
  UPDATE doctor SET on_call = FALSE WHERE name='Bob';
  COMMIT;

Each transaction reads "2 doctors on call", concludes its individual write is safe (it leaves 1 doctor on call), and commits. Run them interleaved and both transactions see 2 before either writes. Both commit. Final state: zero doctors on call. Invariant violated. No single transaction broke the invariant in isolation — the anomaly lives in the interleaving.

Python harness:

# anomalies/write_skew.py
from harness import new_tx, run_two
import psycopg2

def go_off(name):
    def t(level, barrier):
        conn, cur = new_tx(level)
        cur.execute("SELECT count(*) FROM doctor WHERE on_call = TRUE")
        on_call = cur.fetchone()[0]
        barrier.wait()                          # both snapshots now stable
        if on_call >= 2:
            try:
                cur.execute("UPDATE doctor SET on_call = FALSE WHERE name = %s",
                            (name,))
                conn.commit()
                print(f"{name} went off call (saw {on_call} on call)")
            except psycopg2.errors.SerializationFailure:
                conn.rollback()
                print(f"{name} aborted by engine")
        else:
            conn.rollback()
            print(f"{name} refused: only {on_call} on call")
    return t

if __name__ == "__main__":          # guard so run_all.py can import go_off safely
    run_two(go_off("Alice"), go_off("Bob"), level="RR")

At REPEATABLE READ in Postgres, both transactions commit and the invariant is violated. Both saw 2 on call (each from its own snapshot), both wrote their own row, and neither write triggered Postgres's write-write conflict detector — because they wrote to different rows. The snapshots were mutually invisible: Alice's snapshot never saw Bob's update, and vice versa. Both commits succeed. Zero doctors on call.

Why snapshot isolation cannot catch this: write-write conflict detection only flags cases where two transactions write the same row. Write skew involves two different rows whose writes together violate an invariant that was derived from reading both rows. The engine sees each write as unrelated to the concurrent snapshot, because they target different primary keys. There is no row-level conflict to detect.

Promote to SERIALIZABLE and Postgres's Serialisable Snapshot Isolation (SSI) catches it. SSI additionally tracks read-dependency edges (rw-antidependencies): "T1 read rows that T2 wrote; T2 read rows that T1 wrote". When the detector sees a cycle in the dependency graph at commit time, it aborts one of the transactions. One of Alice and Bob gets the SerializationFailure; the other commits. The invariant holds.
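
The commit-time check can be modelled as cycle detection over a dependency graph (a simplification: real SSI aborts on a cheaper "dangerous structure" test — two consecutive rw-edges — rather than full cycle detection; this sketch does the full check for clarity):

```python
# Toy SSI abort test: transactions are nodes, rw-antidependencies are edges;
# a cycle means no serial order can explain the history, so one txn must abort.
def has_cycle(edges):
    """edges: dict mapping txn -> set of txns it has an rw-edge to."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in edges}
    def dfs(t):
        colour[t] = GREY
        for u in edges.get(t, ()):
            if colour.get(u, WHITE) == GREY:
                return True                 # back edge: cycle found
            if colour.get(u, WHITE) == WHITE and dfs(u):
                return True
        colour[t] = BLACK
        return False
    return any(colour[t] == WHITE and dfs(t) for t in list(edges))

# The doctor example: T1 read rows T2 wrote, and T2 read rows T1 wrote.
print(has_cycle({"T1": {"T2"}, "T2": {"T1"}}))   # True  -> abort one
print(has_cycle({"T1": {"T2"}, "T2": set()}))    # False -> both commit
```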

Write skew is the anomaly that separates snapshot isolation from true serialisability. Every other classical anomaly is blocked by snapshot isolation. Write skew is not. This is the entire reason SSI exists.

[Figure: write skew timeline — two transactions, two doctors, one broken invariant. Time flows left to right across two lanes, T1 Alice and T2 Bob. Both lanes BEGIN and read count = 2; each issues an UPDATE setting its own row's on_call to FALSE; both COMMIT. Annotation: invariant violated, zero doctors on call.]
Write skew under snapshot isolation. Each transaction's snapshot sees "2 doctors on call" and its own write looks safe. Neither write conflicts with the other at the row level — they target different rows. Both commit. The invariant that motivated each transaction's decision no longer holds in the final state.

Real systems have hit this. Banking systems have overdrawn accounts this way (each of two transfers sees sufficient balance in a shared pool and commits; together they overdraw). Seat-booking systems double-book (two reservations each see the seat as available and both write their own booking row). Leader-election systems have ended up with zero leaders (each candidate sees the current leader as alive and yields). The invariant is always of the form "at least one / at most N / sum bounded by X", and the fix is always to promote to SERIALIZABLE or to explicitly materialise the invariant as a lockable row.
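
Materialising the invariant as a lockable row can be sketched in SQL (the on_call_summary table is illustrative, not part of the schema above): both transactions must touch the same summary row, which turns the write skew into an ordinary row conflict the engine can serialise.

```sql
-- Hypothetical summary row holding the on-call count, plus a CHECK that
-- encodes the invariant "at least one doctor on call".
CREATE TABLE on_call_summary (singleton INT PRIMARY KEY CHECK (singleton = 1),
                              n INT CHECK (n >= 1));
INSERT INTO on_call_summary VALUES (1, 2);

-- Each "go off call" transaction locks the summary row first:
BEGIN;
SELECT n FROM on_call_summary WHERE singleton = 1 FOR UPDATE;  -- serialises here
UPDATE on_call_summary SET n = n - 1 WHERE singleton = 1;      -- CHECK rejects n = 0
UPDATE doctor SET on_call = FALSE WHERE name = 'Alice';
COMMIT;
```

The second transaction waits on the row lock, then its decrement from 1 to 0 violates the CHECK constraint and aborts — the invariant is now enforced by the engine instead of by a stale read.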

Anomaly 6 — Read-Only Serialisation Anomaly

The subtle one. Fekete, Liarokapis, O'Neil, O'Neil, and Shasha proved in 2005 that under snapshot isolation, a read-only transaction can observe a database state that no serial schedule would have produced — even when every concurrent writing transaction is individually correct.

The canonical counterexample involves three transactions, two accounts, and a deposit rule. T1 makes a deposit that pays interest if the combined balance is non-negative. T2 makes a withdrawal. T3 is a pure read-only audit. Under snapshot isolation, T3 can commit a report that shows T1 and T2 in an order inconsistent with the order their writes actually took effect, even though T1 and T2 individually look serialisable.

The demo is intricate: it requires precise timing of snapshots across three transactions to reproduce reliably, and it is mostly useful as an existence proof. The original paper gives the full construction. The practical implication fits in one sentence: even read-only transactions can observe impossible states under snapshot isolation, so "it's only a read, what could go wrong" is not a safe argument.

SSI in Postgres blocks this anomaly along with write skew — the same rw-antidependency tracking catches the three-transaction cycle. If you depend on read-only reports being consistent with some serial history of writes, promote to SERIALIZABLE. At snapshot-isolation-labelled-as-REPEATABLE-READ, your audit can lie.

The isolation-anomaly matrix

Pull every anomaly you have reproduced together into the table that every textbook prints and most engineers misremember. Per-engine caveats are in footnotes — the practical behaviour deviates from the standard on two of the four levels for most engines.

Isolation level    Dirty read   Lost update   Non-repeatable   Phantom    Write skew
READ UNCOMMITTED   YES¹         YES           YES              YES        YES
READ COMMITTED     no           YES           YES              YES        YES
REPEATABLE READ    no           varies²       no               varies³    YES
SERIALIZABLE       no           no            no               no         no

¹ Postgres's minimum is effectively READ COMMITTED; its READ UNCOMMITTED silently upgrades. SQL Server and MySQL InnoDB honour READ UNCOMMITTED literally and do leak dirty reads there.

² Postgres REPEATABLE READ aborts lost-update attempts via write-write conflict detection. MySQL InnoDB REPEATABLE READ lets them through because its UPDATE reads the current committed value, not the snapshot. Same name, different behaviour.

³ SQL-standard REPEATABLE READ allows phantoms. Postgres's MVCC snapshot blocks them as a side effect. MySQL InnoDB blocks them via next-key and gap locks. Every widely-deployed implementation happens to block phantoms at REPEATABLE READ, but the standard does not require it.

The takeaway: SERIALIZABLE is the only level whose behaviour is consistent across engines. Every weaker level carries engine-specific behaviour you have to test for. If you are writing portable SQL and you care about correctness under contention, the safe answers are READ COMMITTED for "I accept all the read anomalies" and SERIALIZABLE for "I do not". The intermediate levels are where the portability gotchas live.

Worked demo — reproducing the first five on one Postgres

A script that triggers every anomaly in sequence

Here is a self-contained runner that opens a fresh database, populates it, and triggers each anomaly in turn, printing the observed behaviour. Run it after createdb anomaly && psql anomaly < schema.sql.

# anomalies/run_all.py
from harness import new_tx, run_two
from dirty_read import t1 as dirty_t1, t2 as dirty_t2
from lost_update import withdraw
from nonrepeatable import t1 as nrr_t1, t2 as nrr_t2
from phantom import t1 as ph_t1, t2 as ph_t2
from write_skew import go_off

def reset():
    conn, cur = new_tx("RC")
    cur.execute("UPDATE account SET balance = 100 WHERE id IN (1,2)")
    cur.execute("UPDATE users   SET name = 'Alice', country='IN' WHERE id=5")
    cur.execute("UPDATE users   SET name = 'Bob',   country='IN' WHERE id=6")
    cur.execute("DELETE FROM users WHERE id > 6")
    cur.execute("UPDATE doctor  SET on_call = TRUE")
    conn.commit()

for name, runner, level in [
    ("Dirty read (RU, Postgres: no effect)", (dirty_t1, dirty_t2), "RU"),
    ("Lost update at RC",                    (withdraw(10), withdraw(20)), "RC"),
    ("Lost update at RR (Postgres aborts)",  (withdraw(10), withdraw(20)), "RR"),
    ("Non-repeatable read at RC",            (nrr_t1, nrr_t2), "RC"),
    ("Non-repeatable read at RR (blocked)",  (nrr_t1, nrr_t2), "RR"),
    ("Phantom at RC",                        (ph_t1, ph_t2), "RC"),
    ("Write skew at RR",                     (go_off("Alice"), go_off("Bob")), "RR"),
    ("Write skew at SERIALIZABLE (blocked)", (go_off("Alice"), go_off("Bob")), "SS"),
]:
    print(f"\n=== {name} ===")
    reset()
    run_two(*runner, level=level)

The output on a fresh Postgres 16 installation, with minor formatting:

=== Dirty read (RU, Postgres: no effect) ===
T2 saw balance = 100
T1 rolled back. Real balance unchanged.

=== Lost update at RC ===
T10 wrote 90, commit OK
T20 wrote 80, commit OK
-- ten rupees lost

=== Lost update at RR (Postgres aborts) ===
T10 wrote 90, commit OK
T20 aborted by engine: 40001

=== Non-repeatable read at RC ===
T1 first read:  Alice
T1 second read: Aliz

=== Non-repeatable read at RR (blocked) ===
T1 first read:  Alice
T1 second read: Alice

=== Phantom at RC ===
T1 first count:  2
T1 second count: 3

=== Write skew at RR ===
Alice went off call (saw 2 on call)
Bob went off call (saw 2 on call)
-- invariant violated, zero doctors on call

=== Write skew at SERIALIZABLE (blocked) ===
Alice went off call (saw 2 on call)
Bob aborted by engine
-- one doctor still on call, invariant holds

Each anomaly reproduces on the first run. None requires special load. Total runtime: under three seconds.

Why this matters in practice

Each of these anomalies has corresponded to a real production outage, somewhere, at some point. Three patterns recur:

"The user's balance went negative even though we check it first." Classic write skew or lost update, depending on whether the check is on the same row or across rows. Two concurrent transfers each read the current balance, each decides the transfer is valid, each applies. The second lands with the balance pre-first-transfer and goes negative. Fix: SELECT ... FOR UPDATE, or promote to SERIALIZABLE, or materialise the invariant as a constraint.

"We double-booked seat 14C." Phantom or write skew. Two reservation attempts query "is seat 14C free?", both see yes, both insert a booking. At REPEATABLE READ this can still happen if the seats table uses an insert-only log pattern. Fix: unique constraint on (flight_id, seat) plus retry logic, or SERIALIZABLE, or explicit row-locking on a seat-status table.

"The counter ended up at N+1 after N increments." Lost update. Each incrementer read the old value and wrote new = old + 1. Two concurrent incrementers both wrote old + 1, losing an increment. Fix: UPDATE counter SET v = v + 1 (atomic, no read-then-write), or SELECT ... FOR UPDATE, or promote the isolation level, or move to an atomic counter type.

The practical mantra: if your code path is "read, think, write", you are vulnerable to one of these anomalies unless the isolation level forbids it. Either make the write conditional on the read having not changed (UPDATE ... WHERE v = <read_value>), take a row lock (SELECT ... FOR UPDATE), or run at SERIALIZABLE and retry on abort.
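
The SERIALIZABLE-and-retry option deserves its own small sketch, since the retry loop is easy to get wrong (the SerializationFailure class here is a stand-in for your driver's SQLSTATE 40001 error, and the backoff constants are illustrative):

```python
import random, time

class SerializationFailure(Exception):
    """Stand-in for the driver's SQLSTATE 40001 error."""

def run_serializable(txn_body, attempts=5):
    # The body must be safe to re-run from scratch: it re-reads everything
    # it depends on, so a retry sees the post-conflict state.
    for attempt in range(attempts):
        try:
            return txn_body()
        except SerializationFailure:
            # randomised exponential backoff so retrying peers de-synchronise
            time.sleep(random.uniform(0, 0.01 * (2 ** attempt)))
    raise RuntimeError("gave up after repeated serialization failures")

# usage: a body that aborts twice, then commits
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(run_serializable(flaky))   # committed, on the third attempt
```

The key property: the whole read-think-write sequence lives inside txn_body, so every retry re-reads fresh state rather than replaying a stale decision.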

Going deeper

A handful of directions extend this hands-on view into the research and tooling around concurrency anomalies.

Jepsen — the modern anomaly hunter

Kyle Kingsbury's Jepsen test suite applies the spirit of this chapter at production scale. Jepsen connects to a distributed database, hammers it with concurrent clients, introduces realistic partitions and clock skew, and checks whether the resulting history is consistent with the isolation level the vendor claimed. The results are, almost without fail, brutal: most distributed databases fail to deliver their advertised isolation under adversarial conditions. The writeups on etcd, Cassandra, MongoDB, FaunaDB, FoundationDB, and CockroachDB are the best working-engineer-level tour of what modern anomaly discovery actually looks like. The underlying Elle checker formalises the dependency-graph analysis that Jepsen uses to verify serialisability after the fact.

Berenson et al. 1995 — "A Critique of ANSI SQL Isolation Levels"

The paper that first named write skew. Berenson, Bernstein, Gray, Melton, O'Neil, and O'Neil showed that the ANSI SQL isolation levels as defined were either under-specified or assumed a 2PL-only implementation, and that real MVCC systems exhibited phenomena the standard did not name — most importantly write skew (their "A5B"). Their proposed taxonomy of anomalies is the basis for every later treatment, including Fekete's SSI work. The paper is short, readable, and direct. If you read only one paper from this chapter's bibliography, make it this one.

Fekete et al. 2005 — "Making Snapshot Isolation Serializable"

The paper that made SERIALIZABLE cheap again. Cahill, Fekete, Liarokapis, Röhm, and others built on this work to implement SSI in Postgres starting in 9.1 (2011). The implementation adds rw-antidependency tracking to an existing snapshot-isolation engine, catches write skew and the read-only anomaly, and costs very little at low contention. Before SSI, SERIALIZABLE meant 2PL; after SSI, SERIALIZABLE can mean "snapshot isolation plus cycle detection" with far better read throughput. The engineering is elegant.

Where this leads next

You have now seen every classical anomaly, reproduced each one in Python, and observed which isolation level blocks which. The next chapter formalises what "isolation level" actually means, re-reads the SQL standard's definitions carefully, and shows which ones the standard gets wrong (they are definitional bugs, not edge cases). The isolation-anomaly matrix you built in this chapter is the empirical data; the next chapter is the theory that explains where it came from.

After that, the build turns to the mechanisms that actually implement each level — MVCC (chapter 58 onward), SSI's dependency-graph tracking, and the engine-specific tricks like InnoDB's gap locks. But the anomalies themselves are the right starting point, because without them you have no reason to care about the mechanisms.

References

  1. Berenson, Bernstein, Gray, Melton, O'Neil, O'Neil, A Critique of ANSI SQL Isolation Levels, SIGMOD 1995 — the paper that named write skew and showed the ANSI standard's isolation-level definitions were incomplete. Introduces the broadened anomaly taxonomy (A5A, A5B) every later treatment uses.
  2. Fekete, Liarokapis, O'Neil, O'Neil, Shasha, Making Snapshot Isolation Serializable, ACM TODS 30(2), 2005 — the paper that identified the read-only serialisation anomaly and proved snapshot isolation is strictly weaker than serialisability. The theoretical basis for Postgres's SSI.
  3. Cahill, Röhm, Fekete, Serializable Isolation for Snapshot Databases, SIGMOD 2008 — the engineering paper that made SSI implementable in a production MVCC engine. Postgres's SERIALIZABLE mode is a direct descendant.
  4. PostgreSQL documentation, Transaction Isolation — the engine's own description of how it honours the standard levels, including the "we strengthen READ UNCOMMITTED to READ COMMITTED" note and the SSI implementation details.
  5. MySQL Reference Manual, InnoDB Transaction Model — the lock- and gap-lock-based view of isolation levels. Reading this alongside the Postgres docs is the quickest way to see that "same level name" does not imply "same behaviour".
  6. Kleppmann, Designing Data-Intensive Applications, O'Reilly 2017, chapter 7 — the accessible modern treatment of isolation levels and anomalies for working engineers. Covers every anomaly in this chapter with production examples, plus the distributed extensions (linearisability, causal consistency) that build on the same ideas.