In short
The previous chapter (2PL with row-level locks) handed every row its own lock so that disjoint transactions never block. Right for OLTP, unworkable as the sole mechanism: real engines need locks at multiple granularities simultaneously — row for point updates, page for scans, table for DDL, database for backup.
The choice is a trade. Row locks maximise concurrency but cost memory — one entry per row times millions of rows is gigabytes of lock metadata. Table locks use constant memory but serialise everything on the table. Real engines pick per-object: row-level for OLTP, table-level for ALTER TABLE and bulk loads.
The harder question is how to safely mix them. If A holds exclusive row locks on ten rows and B wants ALTER TABLE users DROP COLUMN email (exclusive on the whole table), B must verify that nobody is using the table. Without help, that means scanning the lock table row by row — defeating the whole point of coarse locks.
Intention locks solve this in one lookup. Before A takes a row-level X, it first takes an IX (intention-exclusive) on the table announcing "I am about to write some row in here." B, wanting X on the table, checks only the table's lock list, sees the IX, and blocks. No row scan.
The protocol defines five modes — S, X, IS, IX, SIX — governed by a 5×5 compatibility matrix. Every major DBMS (Oracle, SQL Server, DB2, MySQL InnoDB) uses some form of it, descending from Gray, Lorie, Putzolu, Traiger 1976.
Transaction A is updating Alice's row in users. Transaction B wants ALTER TABLE users DROP COLUMN email, which needs exclusive access to the whole table. B must verify that no other transaction is reading or writing any row — otherwise it would alter the schema out from under a live query.
In a world with only row-level locks, B has to walk the lock table and check every row. Ten million rows means ten million lookups for one ALTER. Worse: while B scans, transaction C inserts a new row at 10M+1 and locks it, past the position B already checked. The protocol has no correct termination.
Suppose A, before taking its row lock on Alice, first took an IX (intention-exclusive) lock on the users table itself. The table's lock slot now says "some transaction is writing some row in here." When B asks for X on the table, the lock manager checks one slot, sees the IX, and blocks B. One lookup. Zero rows scanned.
That is the entire idea of intention locks.
The granularity trade-off, precisely
Databases could lock at any level — database, tablespace, table, page, row, even a column. In practice three levels matter: row, page, and table. Each has a characteristic cost profile.
| dimension | row lock | page lock | table lock |
|---|---|---|---|
| lock-table entries for 10M-row table | up to 10M | ~10k (at 1k rows/page) | 1 |
| memory at ~50 B per entry | up to 500 MB | ~500 KB | ~50 B |
| concurrency for point updates | high — two updates to different rows never conflict | medium — two updates to same page conflict | low — any two updates conflict |
| concurrency for scans | low — scanner takes N locks | medium | high — one lock |
| lookup cost for "is anything locked?" | O(N) over rows | O(P) over pages | O(1) |
| predicate locking (lock a range) | hard — index-based gap locks | natural (page covers a range of keys) | trivial but overbroad |
| deadlock probability | high (many locks, many orderings) | medium | low |
Why the memory numbers are real: a production lock-table entry is not a bare flag. It carries the lock mode, holder transaction ID, wait-queue head, per-entry latch, and hash-chain pointers. 40–80 bytes is typical for InnoDB and SQL Server. Multiply by 10^7 rows and you are at half a gigabyte of lock metadata for one table, before any data.
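The arithmetic is quick to check. A minimal sketch, taking the 50-byte entry size and 1k rows/page from the table above as given:

```python
# Rough lock-metadata cost at each granularity for a 10M-row table.
# ENTRY_BYTES and ROWS_PER_PAGE are the assumptions stated in the text.
ROWS = 10_000_000
ROWS_PER_PAGE = 1_000
ENTRY_BYTES = 50

row_locks = ROWS * ENTRY_BYTES                      # one entry per row
page_locks = (ROWS // ROWS_PER_PAGE) * ENTRY_BYTES  # one entry per page
table_locks = 1 * ENTRY_BYTES                       # a single entry

print(f"row:   {row_locks / 1e6:.0f} MB")   # 500 MB
print(f"page:  {page_locks / 1e3:.0f} KB")  # 500 KB
print(f"table: {table_locks} B")            # 50 B
```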
The sweet spot depends on the workload:
- OLTP point updates — row-level, always. Small lock-table overhead, huge concurrency win.
- Bulk DML — row-level until the count crosses a threshold, then escalate to table-level.
- Full-table scans for reporting — shared table lock; one lock, consistent view.
- DDL (`ALTER`, `TRUNCATE`, `DROP`) — exclusive table lock.
- Backup — database/tablespace shared lock, or (in practice) snapshot isolation.
The engineering problem is not choosing one level; it is supporting all of them simultaneously in one lock manager.
Why this matters — the lock metadata problem
A moderately-busy OLTP database has thousands of concurrent transactions, each holding dozens of locks — hundreds of thousands of active lock entries. An analytical scan across a large table pushes that into the millions. SQL Server's lock manager has a memory budget; when lock memory exceeds 60% of it the engine triggers lock escalation, upgrading a transaction's row locks on one object to a table lock. The budget exists because the lock table shares the same memory pool as the buffer cache — every byte of lock metadata is a byte not spent caching data, so an unbounded lock table evicts hot pages from cache.
You cannot always afford row-level locks, and you cannot always afford table-level locks either. You need both, and the ability to move between them. That is the design pressure that produced intention locks.
The naive approach that doesn't scale
Consider a lock manager that supports only row-level locks. A transaction that wants a table-level operation — ALTER, TRUNCATE, full-table scan — must simulate a table lock by taking row locks on every row.
for each row r in table t:
acquire_row_lock(r, X)
# "nobody else is using any row of t" — proceed with ALTER
Two problems, both fatal. O(N) cost: a TRUNCATE should be almost free, but with only row locks it is linear in the table size, and the DDL transaction must hold all N locks until commit (by 2PL). Race with inserters: while the DDL walks rows 1..N, another transaction inserts row N+1 and locks it past the position the DDL already checked. The DDL proceeds with its ALTER, and the concurrent insert is sitting in a table whose schema just shifted.
You can patch the second problem with a table-level "no inserts" flag — but now you have invented a second kind of lock at a different granularity. You have tacitly admitted one granularity is not enough.
Intention locks are where this drops in cleanly. The DDL takes one lock at the table level. Row-updating transactions announce — before they take row locks — that they intend to take row locks inside this table. The lock manager reconciles both in O(1).
The intention-lock protocol — multi-granularity locking
The protocol organises lockable objects into a hierarchy: database → tablespace → table → page → row. Every row is a descendant of exactly one page, one table, one database. Locks are taken on nodes of this tree.
Five lock modes:
- S (Shared). "I am reading this exact object." Table-level S = full table scan.
- X (Exclusive). "I am writing this exact object." Table-level X = DDL.
- IS (Intention-Shared). "I will read some descendant." IS on a table = "I will take S on some row or page inside."
- IX (Intention-Exclusive). "I will write some descendant." IX on a table = "I will take X on some row or page inside."
- SIX (Shared + Intention-Exclusive). "I am reading this exact object AND will write some descendants." The combined mode for operations that scan a whole table while updating a minority of rows along the way.
The protocol has one rule:
> To acquire a lock of mode `m` on an object `o`, a transaction must already hold an intention lock compatible with `m` on every ancestor of `o`.
Specifically: for S or IS on o, hold IS or IX (or stronger) on every ancestor. For X, IX, or SIX on o, hold IX or SIX (or stronger) on every ancestor. Enter the hierarchy at the root and descend, taking intention locks at each level, and finally take S or X on the target:
database: IX (I will write something inside)
table: IX (I will write something inside this table)
page: IX (I will write something inside this page)
row: X (I am writing this exact row)
Why the protocol has this shape: the goal is that any lock conflict can be detected by inspecting a single level of the hierarchy. When B wants X on a table, it looks at the table's lock list. If anyone holds S/X/IS/IX/SIX there, B has found a conflict or announcement — no need to descend. Every transaction with a fine-grained lock in the subtree has announced itself at every coarser level, so the coarse-grained inspection is complete.
That announcement property is what the "intention" in the name refers to. An IS lock is not a read lock — it does not prevent writes; a separate transaction's IX on the same table is compatible with your IS. What IS does is make your presence visible at the table level, so that an X-requester blocks on your IS instead of having to discover your row lock by scanning.
The compatibility matrix
With five modes, the question is which pairs can coexist on the same object. The 5×5 compatibility matrix is the answer — T1 (row) and T2 (column) can simultaneously hold modes with a ✓.
| | IS | IX | S | SIX | X |
|---|---|---|---|---|---|
| IS | ✓ | ✓ | ✓ | ✓ | ✗ |
| IX | ✓ | ✓ | ✗ | ✗ | ✗ |
| S | ✓ | ✗ | ✓ | ✗ | ✗ |
| SIX | ✓ | ✗ | ✗ | ✗ | ✗ |
| X | ✗ | ✗ | ✗ | ✗ | ✗ |
The non-obvious cells carry the weight.
- IS × IS, IX, S, SIX all compatible. An IS only announces "I will read some descendant" — compatible with another descendant-reader, a descendant-writer, a whole-object reader, or a combined SIX. Only X excludes IS, because X demands exclusive ownership of the whole subtree.
- IX × IX compatible. Two transactions can simultaneously intend to write different descendants. This is the whole point: two updates on disjoint rows of `users` take IX on `users` concurrently and only serialise at the actual row.
- IX × S incompatible. A full-table reader (S) cannot coexist with a descendant-writer (IX) — the write would mutate part of what the reader is reading.
- S × S compatible. Two table scans run simultaneously. Classical read-read sharing.
- SIX × almost everything incompatible. SIX combines an S and an IX on the same object, so it excludes everything either S or IX excludes. Only IS survives. SIX is strong — use it rarely.
- X × everything incompatible. Exclusive means exclusive.
The matrix is symmetric — (T1=IS, T2=IX) compatible iff (T1=IX, T2=IS) compatible — because "can coexist" is a symmetric relation. You only need to remember half of it.
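Both properties — symmetry, and SIX behaving exactly like S and IX held together — can be checked mechanically. A short sketch, with the matrix written as the nested dict the implementation section also uses:

```python
# The 5x5 compatibility matrix as COMPAT[held][requested] -> bool.
IS, IX, S, SIX, X = "IS", "IX", "S", "SIX", "X"
MODES = [IS, IX, S, SIX, X]
COMPAT = {
    IS:  {IS: True,  IX: True,  S: True,  SIX: True,  X: False},
    IX:  {IS: True,  IX: True,  S: False, SIX: False, X: False},
    S:   {IS: True,  IX: False, S: True,  SIX: False, X: False},
    SIX: {IS: True,  IX: False, S: False, SIX: False, X: False},
    X:   {IS: False, IX: False, S: False, SIX: False, X: False},
}

# Symmetry: "can coexist" does not depend on who arrived first.
assert all(COMPAT[a][b] == COMPAT[b][a] for a in MODES for b in MODES)

# SIX = S held together with IX: a mode is compatible with SIX
# exactly when it is compatible with both S and IX.
assert all(COMPAT[SIX][m] == (COMPAT[S][m] and COMPAT[IX][m]) for m in MODES)
```

The second assertion is why SIX needs no row of its own in your head: derive it from S and IX whenever you forget it.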
Implementation in Python
Build a MultiGranularityLockManager over a three-level hierarchy (database → table → row — collapse pages into rows for brevity). A lockable object knows its ancestors; acquire(tx, obj, mode) walks from root to leaf, taking the correct intention lock on each ancestor before the final mode on the leaf.
# concurrency/mglm.py
import threading
from collections import defaultdict
# Compatibility matrix: COMPAT[held][requested] -> bool
IS, IX, S, SIX, X = "IS", "IX", "S", "SIX", "X"
COMPAT = {
IS: {IS: True, IX: True, S: True, SIX: True, X: False},
IX: {IS: True, IX: True, S: False, SIX: False, X: False},
S: {IS: True, IX: False, S: True, SIX: False, X: False},
SIX: {IS: True, IX: False, S: False, SIX: False, X: False},
X: {IS: False, IX: False, S: False, SIX: False, X: False},
}
class MultiGranularityLockManager:
def __init__(self):
self._cv = threading.Condition()
self._holders: dict = defaultdict(set) # obj -> {(tx, mode)}
def _intent_for(self, mode):
return IX if mode in (X, IX, SIX) else IS # intention parent of mode
def _compatible(self, obj, tx, mode):
for (h_tx, h_mode) in self._holders[obj]:
if h_tx == tx:
continue # self-held is ok
if not COMPAT[h_mode][mode]:
return False
return True
def _acquire_one(self, tx, obj, mode):
with self._cv:
while not self._compatible(obj, tx, mode):
self._cv.wait()
self._holders[obj].add((tx, mode))
def acquire(self, tx, obj, mode):
for ancestor in obj.ancestors(): # root-to-parent order
self._acquire_one(tx, ancestor, self._intent_for(mode))
self._acquire_one(tx, obj, mode)
    def release_all(self, tx):
        # Drop every lock tx holds, at every granularity, and wake
        # blocked acquirers so they re-check compatibility.
        with self._cv:
            for obj in list(self._holders):
                self._holders[obj] = {
                    (t, m) for (t, m) in self._holders[obj] if t != tx
                }
            self._cv.notify_all()
A few dozen lines including the matrix. The `acquire` method is the protocol: walk from root to the parent of the target, take an intention lock on each, then take the actual mode on the target. `_intent_for` picks IX for any write-ish mode and IS for any read-ish mode — the only policy decision.
Why a single COMPAT[held][requested] table is sufficient: compatibility is a pairwise relation, so to test a new acquire against a set of holders it is enough to check each pair individually. That reduces the test to the 25-entry table and a loop — exactly _compatible.
A minimal object model to feed it:
class Obj:
def __init__(self, name, parent=None):
self.name, self.parent = name, parent
def ancestors(self):
# root-first: database, then table, then... excluding self
out, p = [], self.parent
while p:
out.append(p); p = p.parent
return list(reversed(out))
Root-first matters for deadlocks: if every transaction walks top-down, the acquire order is consistent and many cycles are impossible.
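A quick check of that ordering, using the same `Obj` class (the object names here are just illustrative):

```python
class Obj:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent

    def ancestors(self):
        # Walk parent links leaf-to-root, then reverse to get root-first.
        out, p = [], self.parent
        while p:
            out.append(p)
            p = p.parent
        return list(reversed(out))

db = Obj("db")
users = Obj("users", parent=db)
alice = Obj("users[42]", parent=users)

# Root-first, excluding self: intention locks go on ancestors only,
# and every transaction descends the hierarchy in the same order.
assert [a.name for a in alice.ancestors()] == ["db", "users"]
```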
A worked scenario
Three transactions on a database db with table users; row 42 is Alice.
Transaction A: UPDATE users SET age = 31 WHERE id = 42. A calls lm.acquire(A, users[42], X). The manager walks ancestors:
- IX on `db` — granted (no prior holders).
- IX on `users` — granted.
- X on `users[42]` — granted.
A holds {IX(db), IX(users), X(users[42])} — three locks, O(1) each.
Transaction B: SELECT * FROM users. lm.acquire(B, users, S).
- IS on `db`. `db` has `{IX(A)}`; IX × IS = ✓. Granted.
- S on `users`. `users` has `{IX(A)}`; IX × S = ✗ (matrix row IX, column S). B blocks.
B did not need to scan any rows. Its conflict was resolved by inspecting one object — the users lock list.
Transaction C: ALTER TABLE users ADD COLUMN x. lm.acquire(C, users, X).
- IX on `db`. `db` has `{IX(A), IS(B)}`; both compatible. Granted.
- X on `users`. `users` has `{IX(A)}`; IX × X = ✗. C blocks.
Exactly the desired outcome: the DDL discovered in one lookup at the table level that somebody is writing rows inside. No row scan.
When A commits it releases all three locks atomically. Under FIFO scheduling, B (which queued first) retries and gets S on users, then scans. C still waits (S × X = ✗). When B finishes and releases, C finally gets its X and runs the ALTER.
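The table-level negotiation in this scenario can be replayed with nothing more than the compatibility matrix and a per-object holder set — a minimal sketch that ignores blocking and queueing and just asks "would this be granted?":

```python
IS, IX, S, SIX, X = "IS", "IX", "S", "SIX", "X"
COMPAT = {
    IS:  {IS: True,  IX: True,  S: True,  SIX: True,  X: False},
    IX:  {IS: True,  IX: True,  S: False, SIX: False, X: False},
    S:   {IS: True,  IX: False, S: True,  SIX: False, X: False},
    SIX: {IS: True,  IX: False, S: False, SIX: False, X: False},
    X:   {IS: False, IX: False, S: False, SIX: False, X: False},
}

def grantable(holders, tx, mode):
    """Would `mode` be granted to `tx` against the current holder set?"""
    return all(COMPAT[m][mode] for t, m in holders if t != tx)

users = set()                         # lock list on the users table
users.add(("A", IX))                  # A's row update, announced via IX

assert not grantable(users, "B", S)   # B's table scan blocks on A's IX
assert not grantable(users, "C", X)   # C's DDL blocks in the same one lookup

users.discard(("A", IX))              # A commits
assert grantable(users, "B", S)       # B proceeds
users.add(("B", S))
assert not grantable(users, "C", X)   # C still waits behind B's S
```

Every conflict in the scenario resolves against a single object's holder set — no row is ever inspected.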
The timeline below captures the essential property: all three transactions negotiated at the table level. A's row lock is invisible to B and C and does not need to be visible — A's IX on the table already tells B and C what they need to know.
Three-transaction timeline
Strict 2PL, milliseconds, compatibility check at each step.
t=0: A: BEGIN. acquire(users[42], X).
-> IX on db (compat with {}, granted)
-> IX on users (compat with {}, granted)
-> X on users[42] (compat with {}, granted)
A holds: {IX(db), IX(users), X(users[42])}
t=5: A: UPDATE users SET age=31 WHERE id=42. (local modify; no new locks)
t=10: B: BEGIN. acquire(users, S).
-> IS on db (compat with {IX(A)} via IX×IS=✓, granted)
-> S on users: holders={IX(A)}; IX×S=✗. B BLOCKS on users.
B waits.
t=15: C: BEGIN. acquire(users, X).
-> IX on db (compat with {IX(A), IS(B)}: IX×IX=✓, IX×IS=✓, granted)
-> X on users: holders={IX(A)}; IX×X=✗. C BLOCKS on users.
C waits behind B in FIFO order.
t=50: A: COMMIT.
release X(users[42]) — no waiters there.
release IX(users) — wake waiters. B's S now compat with {} → granted.
release IX(db).
A is done. B unblocks, C still waits (S(B) × X(C) = ✗).
t=50.5: B takes S(users). B holds {IS(db), S(users)}.
t=51: B scans users, sees post-A data.
t=80: B: COMMIT.
release S(users) — wake C. X(users) now compat with {} → granted.
release IS(db).
t=80.5: C takes X(users). C holds {IX(db), X(users)}.
t=81: C performs ALTER TABLE users ADD COLUMN x.
t=85: C: COMMIT. Releases both.
Three transactions — one row update, one full scan, one DDL — interleaved correctly on a million-row table. Total lock-table operations: about a dozen. Rows scanned by the lock manager: zero. Under naive row-lock-only, C would have had to acquire a row lock on every one of the million rows and raced forever with concurrent inserts.
Cost: modest serialisation. B waited 40 ms for A's locks to drain, and C waited 65 ms for A and then B. For short transactions this is invisible; for long ones it is the signal to use MVCC reads (build 8) instead of S-on-table.
Lock escalation — promoting row locks to table locks
What if A's transaction updates not one row but ten million? A bulk UPDATE users SET active=false touches every row in the table, and pure multi-granularity locking would allocate tens of millions of row-level X locks. The lock table balloons.
If a transaction's lock count on one object crosses a threshold, the engine triggers lock escalation: atomically upgrade all of that transaction's row locks on the object to a single table-level X lock and release the row locks. SQL Server's default threshold is around 5000 row locks on a single object, or when total lock memory exceeds 40% of the configured budget (docs). DB2 has configurable thresholds. Oracle does not escalate — it stores locks in data-block headers instead (see Going deeper). InnoDB does not escalate row locks.
A typical escalation for a bulk update:
t=0: A begins bulk UPDATE.
t=0..5: A takes IX(db), IX(users), IX(pages...), X(rows 1..5000).
t=5: A's row-lock count on users crosses 5000. Attempt escalation.
Check: any other transaction hold a lock on users?
- if yes: abort escalation, keep row locks, retry later.
- if no: atomically drop A's row/page locks on users;
add A: X(users).
t=6+: A continues under the table X lock; no new row locks needed.
The gotcha: escalation can fail. If another transaction holds any conflicting lock on users at escalation time, X(users) is incompatible and the escalation aborts; SQL Server skips it and retries later. If escalation keeps failing while row-lock memory keeps growing, the engine eventually refuses new lock allocations and transactions fail with "lock resource limit exceeded".
Why escalation is a heuristic, not a guaranteed optimisation: the engine is guessing that one coarse lock is cheaper than many fine ones for the rest of the transaction. That guess is wrong if another transaction wants concurrent access — escalation drops the row-level concurrency that made the workload fast. A query that runs in 2 seconds against an idle table can run in 20 seconds (or deadlock) against a table with concurrent readers purely because escalation flipped a switch.
Escalation is a memory-pressure safety valve, not a throughput tool. To control it, disable automatic escalation (ALTER TABLE t SET (LOCK_ESCALATION = DISABLE) on SQL Server) and accept the memory cost, or take a table lock up front with LOCK TABLE t IN EXCLUSIVE MODE.
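The escalation decision itself can be sketched as a standalone function over a holder map. The names and the 5000 threshold follow the text; real engines fold this check into the lock manager rather than exposing it like this:

```python
from collections import defaultdict

def try_escalate(holders, row_locks, tx, table, threshold=5000):
    """Attempt to replace tx's row locks on `table` with one table X lock.

    holders:   table -> set of (tx, mode) held at table level
    row_locks: table -> set of (tx, row) row-level X locks
    Returns True if escalation happened, False if it was aborted.
    """
    mine = {r for t, r in row_locks[table] if t == tx}
    if len(mine) < threshold:
        return False                   # not enough lock pressure yet
    if {t for t, _ in holders[table] if t != tx}:
        return False                   # someone else on the table: abort
    # Atomically swap: drop tx's row locks, take a table-level X.
    row_locks[table] = {(t, r) for t, r in row_locks[table] if t != tx}
    holders[table] = {(t, m) for t, m in holders[table] if t != tx}
    holders[table].add((tx, "X"))
    return True

holders, row_locks = defaultdict(set), defaultdict(set)
holders["users"].add(("A", "IX"))
row_locks["users"] = {("A", r) for r in range(6000)}
assert try_escalate(holders, row_locks, "A", "users")       # alone: escalates
assert holders["users"] == {("A", "X")}

holders["orders"].update({("A", "IX"), ("B", "IS")})
row_locks["orders"] = {("A", r) for r in range(6000)}
assert not try_escalate(holders, row_locks, "A", "orders")  # B present: abort
```

The second case is the gotcha from above in miniature: escalation silently fails whenever anyone else holds anything on the table, which is exactly when lock memory keeps growing.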
Common confusions
- "IS is a type of read lock." It is not. IS does not grant the right to read anything — it is a signal that you intend to take S locks on descendants. To actually read a row, you still need an S on that row. The S is what blocks writers; the IS only blocks X at the ancestor level.
- "Intention locks block other operations." Intention locks are compatible with each other (IS × IS, IS × IX, IX × IX all ✓). What they block is the coarse-grained counterpart at the same level: IS blocks X; IX blocks S and X; SIX blocks IX, S, and X. The fine-grained row-level locks do the actual conflict work; intention locks only propagate the announcement upwards.
- "SIX is esoteric." It is less common than S or IX, but it is the correct mode for full-table operations that also update. An `UPDATE ... WHERE complex_predicate` that scans the whole table and writes a minority of rows takes SIX on the table and X on the rows it actually modifies. Every major engine exposes the concept.
- "Lock escalation improves performance." It reduces memory. It does not improve throughput — the table X lock is strictly more restrictive than the row-level X locks it replaces, so escalation always hurts concurrency. It is a safety valve, not a tuning lever.
- "Row locks always give row-level concurrency." Not if the engine escalates. Under SQL Server defaults, a transaction that touches 5000+ rows of one table will silently lose row-level concurrency for the rest of its lifetime, blocking every other transaction on the table until it commits.
- "The protocol prevents phantoms." It does not. Multi-granularity locking covers existing objects. A `SELECT ... WHERE x > 100` that reads and locks the matching rows does not lock the absence of rows with `x > 100` that don't yet exist. Preventing phantoms requires predicate locking, key-range locks, or snapshot isolation.
Going deeper
Multi-granularity locking is the shared backbone of every locking-based engine. Three directions extend the picture.
Predicate locking and phantom prevention
Multi-granularity locking protects objects that already exist. It does not lock predicates — the sets of rows matching a condition. A SELECT COUNT(*) FROM users WHERE age > 30 run twice in the same transaction locks the matching rows but not the absence of matching rows, so another transaction can insert a new age = 31 row between the two calls. The new row is a phantom. Eswaran 1976 proposed predicate locking — lock the predicate itself — but testing predicate overlap is undecidable in general, so real systems use key-range locks or gap locks: lock intervals in the index so an insert into the age index at key 31 finds the interval (30, 32) already locked and blocks. InnoDB's next-key locks and SQL Server's key-range S-ranges are this mechanism; phantom prevention is layered on top of multi-granularity locking, not replaced by it.
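A toy version of the interval idea — lock ranges of index keys so that an insert into a locked gap blocks. The interval check is the whole mechanism; this is heavily simplified from InnoDB's next-key locks, and the single `(30, 32)` gap mirrors the example in the text rather than covering the full predicate:

```python
class GapLocks:
    """Toy key-range locks on one index: lock open intervals of keys."""
    def __init__(self):
        self.ranges = []                      # list of (lo, hi, tx)

    def lock_range(self, tx, lo, hi):
        self.ranges.append((lo, hi, tx))

    def can_insert(self, tx, key):
        # An insert at `key` blocks if any other transaction has locked
        # an interval covering that key -- the "absence" itself is locked.
        return all(not (lo < key < hi) for lo, hi, t in self.ranges if t != tx)

g = GapLocks()
g.lock_range("T1", 30, 32)          # T1's scan locked the gap (30, 32)

assert not g.can_insert("T2", 31)   # phantom insert at key 31 blocks
assert g.can_insert("T2", 40)       # outside this particular gap: allowed
assert g.can_insert("T1", 31)       # the holder itself is never blocked
```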
Oracle's lock-manager-free row locking
Oracle does not maintain a central lock table for row-level locks. Every data block has an interested transaction list (ITL) in its header; a transaction locking a row finds a free ITL slot and writes (txn_id, row_id, mode) directly into it. The lock "table" is distributed across the data blocks themselves — no memory budget to exhaust, no lock escalation needed. The cost is the fixed number of ITL slots per block (INITRANS, typically 1–3); if more transactions want to lock rows in the same block than slots allow, some wait for a slot even when they would have been compatible at the row level. This is why Oracle DBAs tune INITRANS on high-concurrency tables.
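The slot-exhaustion effect is easy to model. A toy sketch of a block header with a fixed number of interested-transaction slots — the class and method names are illustrative, not Oracle's:

```python
class Block:
    """Toy data block header: a fixed list of interested-transaction slots."""
    def __init__(self, initrans=2):
        self.itl = [None] * initrans          # each slot holds one txn id

    def lock_row(self, txn_id):
        if txn_id in self.itl:
            return True                       # already interested: reuse slot
        for i, slot in enumerate(self.itl):
            if slot is None:
                self.itl[i] = txn_id          # claim a free slot
                return True
        return False    # no free slot: txn waits, even on a disjoint row

b = Block(initrans=2)
assert b.lock_row("T1")
assert b.lock_row("T2")
assert not b.lock_row("T3")   # third transaction stalls on ITL, not on rows
```

The third transaction conflicts with nobody at the row level, yet still waits — which is exactly the contention `INITRANS` tuning addresses.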
Hekaton and the post-locking in-memory path
SQL Server's Hekaton in-memory engine (Diaconu et al. 2013) abandons locks entirely for memory-optimized tables. Data structures are latch-free (atomic compare-and-swap on row versions), and concurrency control is optimistic MVCC with commit-time validation — no lock manager, no intention lock, no escalation. The architecture is the antithesis of multi-granularity locking, pursued because locking overhead dominates throughput when transactions complete in microseconds. It is the extreme end of the trend that started with intention locks and continued through MVCC: reduce blocking until none is left.
Where this leads next
You now have multi-granularity locking — the protocol that lets one engine support row-level, page-level, and table-level locks simultaneously without forcing every transaction to check every granularity.
- Deadlock detection with wait-for graphs — Chapter 55. Intention locks plus top-down acquire order close some deadlock cycles, but diagonal conflicts between row-level X locks still produce them. The next chapter builds the wait-for graph and cycle detector.
- Anomalies and isolation levels — Chapters 56–57. Real systems offer weaker isolation (read committed, repeatable read, snapshot) that allow specific anomalies in exchange for more concurrency. Intention locks and 2PL give you the blocks; the isolation chapters specify how to assemble them.
- MVCC — Build 8. Readers do not block writers at all, achieved by keeping multiple versions of every row. MVCC sidesteps the granularity problem for reads while still using 2PL and intention locks for writes. Every modern engine uses both.
The Gray-Lorie-Putzolu-Traiger 1976 paper is fifty years old and still the design you find under the hood of every major DBMS.
References
- Gray, Lorie, Putzolu, Traiger, Granularity of Locks and Degrees of Consistency in a Shared Data Base, IFIP Working Conference on Modelling in Data Base Management Systems, 1976 — the founding paper. Introduces the lock hierarchy, intention locks, the five-mode protocol, and the compatibility matrix used in essentially every relational DBMS since. The original motivation was IBM's System R, the prototype that became DB2.
- Bernstein, Hadzilacos, Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley 1987 — the textbook treatment. Chapter 3 covers multi-granularity locking with formal proofs; Chapter 5 covers predicate locking and the phantom problem. Free PDF from Microsoft Research.
- Microsoft, Lock Modes (SQL Server) and Lock Escalation — the canonical industrial documentation of the IS/IX/S/SIX/X model, with concrete behaviours for escalation thresholds, memory budgets, and how to monitor via `sys.dm_tran_locks`.
- Oracle, Types of Locks in Oracle Database — the alternative design that uses interested-transaction-list slots in each data block instead of a central lock table. A clear description of why Oracle does not need lock escalation.
- PostgreSQL, Explicit Locking — Postgres's documented lock modes, including `ACCESS SHARE`, `ROW SHARE`, `ROW EXCLUSIVE`, `SHARE UPDATE EXCLUSIVE`, `SHARE`, `SHARE ROW EXCLUSIVE`, `EXCLUSIVE`, and `ACCESS EXCLUSIVE` — a more granular variant of the classical five modes, with a full compatibility matrix.
- Kleppmann, Designing Data-Intensive Applications, O'Reilly 2017, chapter 7 — the modern, accessible treatment of locking, MVCC, and SSI. Situates multi-granularity locking in the broader landscape of concurrency control approaches used in production systems.