Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

Deadlock, livelock, starvation, priority inversion

At 02:47 KapitalKite's order-matcher stops printing fills. The process is alive — top shows two cores at 100%, the other thirty are idle, the heap is healthy, no exception in the log. Aditi pulls a jstack and sees four threads parked on lock_a and four parked on lock_b; their wait-graphs cross. That is one specific kind of liveness failure. Three weeks earlier the same matcher had a different non-progress incident: every core at 100%, no fills printed, but perf showed every cycle going into a cmpxchg that always failed — nobody parked, everybody just retried each other into the ground. That was a different kind. And six months before, the auto-trade bot's audit thread was being scheduled exactly never while two hot loops took turns; when it finally ran the audit, half a million orders had passed unaudited. A third kind. All three look the same on the dashboard — no fills, no errors — and demand entirely different fixes.

Four classical liveness failures: deadlock (threads parked in a circular wait, nobody can move), livelock (threads running but no useful state changes, like two strangers dodging in a corridor), starvation (some thread is never scheduled in even though others are), and priority inversion (a high-priority thread waits on a resource held by a low-priority thread that is itself preempted). They share a symptom — no progress — but have different traces, different detectors, and opposite fixes.

Four shapes of "alive but stuck"

The four failures all violate liveness (safety vs liveness properties) but along different axes. The diagnostic that separates them in 60 seconds is not the dashboard — it's the stack-sample distribution and the CPU plot together.

Failure            | What CPU is doing | What threads are doing                                    | Canonical fix
Deadlock           | idle (0–10%)      | parked on locks; wait-for graph has a cycle               | break the cycle — lock ordering, try_lock, or lock-free
Livelock           | hot (90–100%)     | running, retrying CAS or backing off in lockstep          | randomised backoff, ticket-based fairness, helper pattern
Starvation         | normal            | one thread (or class) rarely scheduled / rarely wins      | fair lock (FIFO, MCS), priority queues, or admission control
Priority inversion | mixed             | high-prio waits on low-prio that is preempted by mid-prio | priority inheritance, priority ceiling, or remove the lock
[Figure: four small panels, 2x2. Top-left, deadlock: T1 holds A and wants B, T2 holds B and wants A; both parked, CPU 0%. Top-right, livelock: T1 and T2 each fail a CAS and retry forever; CPU 100%, no progress. Bottom-left, starvation: T1 and T2 keep winning while T3 (audit) is enabled but never scheduled; progress overall, none for the loser. Bottom-right, priority inversion: H needs L's lock, L is preempted by M, so H is blocked until L runs again and the watchdog fires (Mars Pathfinder, 1997).]
The four classical liveness failures, side by side. Same dashboard symptom (no progress) but different stack-sample distributions and different fixes. Illustrative — not measured data.

Why: the failure mode you have determines whether the fix is to add or remove a constraint. Deadlock → break a cycle (remove a wait edge). Livelock → add a tie-breaker (introduce ordering). Starvation → add fairness (constrain the scheduler's freedom). Priority inversion → propagate priority up the wait graph (let the lock-holder borrow the waiter's priority). Confuse the diagnosis and the fix is at best useless and at worst makes the problem worse — adding a "fair" lock to a deadlock just makes the deadlock fairer.

Coffman's four conditions for deadlock

Coffman, Elphick, and Shoshani (1971) gave the canonical four necessary conditions. A deadlock requires all of them simultaneously; remove any one and deadlock becomes impossible.

  1. Mutual exclusion — at least one resource is held in a non-shareable mode.
  2. Hold-and-wait — a thread holds at least one resource while waiting to acquire another.
  3. No preemption — resources cannot be forcibly taken back from a thread.
  4. Circular wait — there is a cycle in the wait-for graph: T1 waits on T2, T2 waits on T3, ..., Tn waits on T1.

Production deadlock-prevention strategies attack one of these four. Lock ordering attacks circular wait (impose a global order, never violate it). try_lock-with-rollback attacks hold-and-wait (release everything you hold if you can't get the next one). Lock-free data structures attack mutual exclusion (no resource is exclusively held). Watchdog kill-and-restart attacks no-preemption (timeouts forcibly reclaim).
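The try_lock-with-rollback strategy can be sketched in a few lines. This is an illustrative sketch, not a production lock protocol: `acquire_all` and `worker` are made-up names, and the timeout and backoff bounds are arbitrary. The point is the hold-and-wait break: if the next lock is unavailable, release everything and retry.

```python
# Sketch: try_lock-with-rollback — attacks Coffman condition 2 (hold-and-wait).
import random
import threading
import time

def acquire_all(locks, timeout=1.0):
    """Acquire every lock in `locks` or none of them."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        taken = []
        for lk in locks:
            if lk.acquire(blocking=False):
                taken.append(lk)
            else:
                for t in reversed(taken):       # rollback: give back everything held
                    t.release()
                taken = None
                break
        if taken is not None:
            return True                          # got the full set atomically-ish
        time.sleep(random.uniform(0, 0.001))     # randomised backoff before retry
    return False

A, B = threading.Lock(), threading.Lock()
results = []
def worker(order):
    ok = acquire_all(order)
    results.append(ok)
    if ok:
        for lk in reversed(order):
            lk.release()

t1 = threading.Thread(target=worker, args=([A, B],))
t2 = threading.Thread(target=worker, args=([B, A],))   # opposite order: plain `with` would deadlock
t1.start(); t2.start(); t1.join(); t2.join()
print(results)                                          # both True: no deadlock
```

Note the randomised sleep before each retry: without it, bare try_lock-with-rollback trades deadlock for livelock, a point revisited in the common confusions below.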

Reproducing all four in code

Memory-model nuance is not the lesson here — the lesson is how the failures look from outside. Python's threading is good enough to demonstrate three of the four faithfully (and we'll point at the C/Rust path for priority inversion, which Python's GIL muddles).

# four_liveness_failures.py
# usage: python3 four_liveness_failures.py {deadlock|livelock|starvation}
import sys, threading, time, itertools, random

def run_deadlock():
    A, B = threading.Lock(), threading.Lock()
    def t1():
        with A:
            time.sleep(0.05)
            print("[T1] holds A, asks B")
            with B:
                print("[T1] got both")
    def t2():
        with B:
            time.sleep(0.05)
            print("[T2] holds B, asks A")
            with A:
                print("[T2] got both")
    a = threading.Thread(target=t1); b = threading.Thread(target=t2)
    a.start(); b.start()
    a.join(timeout=2); b.join(timeout=2)
    print("DEADLOCK" if a.is_alive() or b.is_alive() else "completed")

def run_livelock():
    # two threads each "politely" back off when the other holds; both back off forever
    held = [None]            # tracks current holder, racy by design
    done = [False]
    counts = {"T1": 0, "T2": 0}
    def polite(name):
        while not done[0]:
            held[0] = name           # claim
            time.sleep(0)            # yield so the other thread can claim too
            if held[0] != name:
                # the other thread overwrote our claim — back off and retry
                counts[name] += 1
                continue
            # claim held, but politely yield anyway; in a real livelock,
            # neither side ever stably claims, so no fill is ever committed
            counts[name] += 1
            held[0] = None
    a = threading.Thread(target=polite, args=("T1",))
    b = threading.Thread(target=polite, args=("T2",))
    a.start(); b.start()
    time.sleep(1.0); done[0] = True
    a.join(); b.join()
    print(f"LIVELOCK: T1 retries={counts['T1']}, T2 retries={counts['T2']}, fills=0")

def run_starvation():
    lock = threading.Lock()
    fills = {"hot": 0, "audit": 0}
    stop = [False]
    def hot():
        while not stop[0]:
            with lock:
                fills["hot"] += 1
            # no sleep — barge back in immediately
    def audit():
        while not stop[0]:
            with lock:
                fills["audit"] += 1
            time.sleep(0.001)        # well-behaved
    threads = [threading.Thread(target=hot) for _ in range(4)] + [threading.Thread(target=audit)]
    for t in threads: t.start()
    time.sleep(2.0); stop[0] = True
    for t in threads: t.join()
    ratio = fills["hot"] / max(fills["audit"], 1)
    print(f"STARVATION: hot fills={fills['hot']}, audit fills={fills['audit']}, ratio={ratio:.0f}x")

if __name__ == "__main__":
    {"deadlock": run_deadlock, "livelock": run_livelock, "starvation": run_starvation}[sys.argv[1]]()

Sample run on a laptop:

$ python3 four_liveness_failures.py deadlock
[T1] holds A, asks B
[T2] holds B, asks A
DEADLOCK

$ python3 four_liveness_failures.py livelock
LIVELOCK: T1 retries=2841394, T2 retries=2738812, fills=0

$ python3 four_liveness_failures.py starvation
STARVATION: hot fills=8124301, audit fills=412, ratio=19719x

Three different shapes of "the program ran and produced no useful work". Deadlock prints two lines and then nothing — the join(timeout=2) is the only thing that ends the test; without it the program hangs forever. Livelock prints CPU-burning numbers — millions of retries per second per thread, but fills=0 because no thread ever stably commits. Starvation prints something more sinister: progress is made, but distributed grossly unfairly. The audit thread did run — 412 times — while the hot loops ran 8.1 million times each. Why: starvation is the hardest of the four to detect because the program looks like it is working. Total throughput is high. Some thread is even getting in. Only when you measure per-thread throughput, latency histograms, or the time since each thread last completed a unit of work do you see the failure. KapitalKite's audit-skip incident (hypothetical) was found by an SLO on "max time since audit thread last committed", not by any aggregate metric.
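The per-thread progress check described above can be sketched as a heartbeat table: each worker stamps the time it last completed a unit of work, and the watchdog alerts on staleness rather than on any aggregate rate. A minimal sketch with made-up names (`completed_unit`, `stalled`, the thresholds) standing in for a real SLO pipeline:

```python
# Sketch: detect starvation by "time since last completed unit", per thread.
import threading
import time

last_commit = {}                  # thread name -> monotonic time of last unit
hb_lock = threading.Lock()

def completed_unit(name):
    """Each worker calls this after committing one unit of work."""
    with hb_lock:
        last_commit[name] = time.monotonic()

def stalled(threshold_s):
    """Names whose last completed unit is older than threshold_s seconds."""
    now = time.monotonic()
    with hb_lock:
        return [n for n, t in last_commit.items() if now - t > threshold_s]

completed_unit("hot-1")
completed_unit("audit")
time.sleep(0.05)
completed_unit("hot-1")           # the hot loop keeps committing; audit has gone quiet
alerts = stalled(0.04)
print(alerts)                     # ['audit']
```

Aggregate throughput in this toy is fine (three units committed); only the staleness view exposes the starved thread.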

Livelock is the most embarrassing in code review. The retry-and-yield pattern looks "polite" — each thread tries to be cooperative — but two polite threads in lockstep make no progress. The corridor metaphor is exact: two strangers each step aside, again, and again, until one breaks the symmetry. The fix is to break the symmetry: randomised exponential backoff (each thread sleeps a random interval before retrying so they desynchronise), ticket-locks (impose an order), or a single arbitrating thread that tells the contenders who goes first (the helper pattern in concurrent data structures).

The fourth failure — priority inversion — needs real OS priorities, which Python's GIL hides. The classical reproduction is in C with pthread_setschedparam and SCHED_FIFO, the same setup as VxWorks on Mars Pathfinder.

// priority_inversion.c — Linux x86_64; needs CAP_SYS_NICE
// gcc -O2 -pthread priority_inversion.c -o pi && sudo taskset -c 0 ./pi   (pin to one core)
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  // default mutex — no priority inheritance
volatile int low_done = 0, high_done = 0;
void *low(void *_)  { pthread_mutex_lock(&lock); for (volatile long i=0;i<200000000L;i++); low_done=1; pthread_mutex_unlock(&lock); return 0; }
void *mid(void *_)  { for (volatile long i=0;i<800000000L;i++); return 0; }
void *high(void *_) { usleep(50000); pthread_mutex_lock(&lock); high_done=1; pthread_mutex_unlock(&lock); return 0; }
static pthread_t spawn(void *(*fn)(void *), int prio) {  // SCHED_FIFO thread at fixed priority
    pthread_attr_t a; struct sched_param sp = { .sched_priority = prio };
    pthread_attr_init(&a); pthread_attr_setschedpolicy(&a, SCHED_FIFO);
    pthread_attr_setschedparam(&a, &sp); pthread_attr_setinheritsched(&a, PTHREAD_EXPLICIT_SCHED);
    pthread_t t; pthread_create(&t, &a, fn, NULL); return t;
}
int main(void) {
    struct sched_param mp = { .sched_priority = 40 }; pthread_setschedparam(pthread_self(), SCHED_FIFO, &mp);
    pthread_t tl = spawn(low, 10); usleep(10000);        // let low take the lock first
    pthread_t tm = spawn(mid, 20), th = spawn(high, 30);
    pthread_join(tl, NULL); pthread_join(tm, NULL); pthread_join(th, NULL);
    printf("low_done=%d high_done=%d\n", low_done, high_done);
}

Compile with -pthread, attach SCHED_FIFO priorities (low=10, mid=20, high=30), pin everything to a single core (otherwise mid runs on another core and never preempts low), and watch: high blocks on the lock that low holds; mid is now the highest-priority runnable thread and preempts low; high is waiting transitively on mid's compute loop. The fix is a PTHREAD_PRIO_INHERIT mutex: when high blocks on low's lock, low temporarily inherits high's priority so mid cannot preempt it. This is how the 1997 Mars Pathfinder lander incident was resolved in flight — JPL uploaded a remote patch enabling priority inheritance on the offending VxWorks mutex.

In production: which one are you looking at?

The KapitalKite stack-sample distinguishes them quickly. Aditi's runbook (KapitalKite, hypothetical):

1. CPU per-core: are some cores hot?
   - all idle (<10%)         → likely DEADLOCK; pull jstack/pstack, look for cycles
   - all hot (>90%)          → likely LIVELOCK or busy-wait; perf top, CAS in hot path?
   - mixed                   → continue
2. Per-thread progress: any thread stuck while others advance?
   - yes, one specific thread → STARVATION; check fairness of the lock involved
   - no, everyone advancing  → not a liveness failure; look elsewhere
3. Real-time schedule: is a high-priority thread blocked behind a low-prio one?
   - yes                     → PRIORITY INVERSION; enable PRIO_INHERIT
4. None of the above         → drop into TLA+ / loom for protocol-level liveness check
[Figure: two wait-for graphs side by side. Left, cyclic: T1→A→T2→B→T3→C→T4→D→T1, a deadlocked configuration. Right, acyclic: a healthy DAG in which every wait eventually unblocks.]
The wait-for graph in two states. A cycle is, by Coffman's fourth condition, exactly the structure of a deadlock. Production deadlock detectors (PostgreSQL, MySQL InnoDB, Java's `ThreadMXBean.findDeadlockedThreads`) walk this graph periodically and abort one thread per cycle to break it.

This is the diagnostic tree that turns 2-hour incidents into 10-minute ones. Why: the four failures cluster naturally on two axes — is the CPU spinning (yes for livelock, sometimes priority inversion; no for deadlock; partial for starvation) and is some thread actually progressing (no for deadlock and livelock; yes-but-unfairly for starvation; yes-except-the-high-prio-one for priority inversion). Once you've pinned down those two answers you've ruled out three of the four candidates and can dig into the right tooling.

The wait-for graph (Holt 1972) is the formal tool for deadlock detection: nodes are threads, edges go from a thread to the resource it is waiting on, and from a resource to the thread that holds it. A cycle in this graph is a deadlock. Java's jstack -l prints thread states and locks held; ThreadMXBean.findDeadlockedThreads() walks the wait-for graph at runtime and returns the offending cycle. Real production fix: log the wait-for graph on every long-held-lock alert; you don't need a model checker if you have the cycle.
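The cycle walk itself is tiny. A minimal sketch, with the graph collapsed so each edge points from a thread straight to the thread holding the resource it waits on (the function name and input format are made up for illustration):

```python
# Sketch: wait-for-graph cycle detection, the core of any deadlock detector.
def find_cycle(wait_for):
    """wait_for maps thread -> thread it (transitively) waits on.
    Returns the cycle as a list of threads, or None if the graph is a DAG."""
    for start in wait_for:
        seen, node = [], start
        while node in wait_for:          # follow wait edges until they run out...
            if node in seen:
                return seen[seen.index(node):]   # ...or revisit a node: a cycle
            seen.append(node)
            node = wait_for[node]
    return None

# T1 waits on T2 (via lock B), T2 waits on T1 (via lock A): deadlock.
print(find_cycle({"T1": "T2", "T2": "T1"}))      # -> ['T1', 'T2']
print(find_cycle({"T1": "T2", "T2": "T3"}))      # -> None (DAG — healthy)
```

Production detectors are this loop plus bookkeeping: build the graph from lock tables, scan periodically, abort one member of each cycle found.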

Common confusions

  • "Deadlock and livelock are the same — both are stuck" Stuck differently. Deadlock is quiescent stuck — threads are parked in the kernel, CPU is idle, nothing in flight. Livelock is frantic stuck — threads are running flat-out, CPU is at 100%, but no committed work. Diagnostic: look at CPU utilisation. Idle? deadlock. Hot? livelock. The fixes are opposite: deadlock is broken by not waiting (try-lock with backoff), livelock is broken by waiting longer (randomised exponential backoff to desynchronise).
  • "Starvation is just very long deadlock" Different shape. In a deadlock, no thread makes progress and no thread can; in starvation, the system makes progress but a particular thread is consistently passed over. Some lock implementations (pthread_mutex default, std::mutex on Linux) admit unbounded starvation under contention — the lock is correctly held, no invariants are violated, but a particular waiter may wait arbitrarily long because barging threads keep winning. Switching to a fair lock (MCS, ticket lock, FIFO) trades throughput for bounded waiting.
  • "Priority inversion is the OS scheduler's fault" It is the interaction of a priority-aware scheduler with a non-priority-aware mutex. Default pthread_mutex knows nothing about priorities; when a high-priority thread blocks on it, the lock-holder's priority does not change. Adding PTHREAD_PRIO_INHERIT makes the mutex priority-aware. The fix lives in the synchronisation primitive, not in the scheduler — the same scheduler is fine if all the locks understand priority.
  • "Thread-safe data structures cannot have liveness bugs" Wrong. A Mutex<T> makes the data thread-safe (race-free) and adds the possibility of deadlock if you take two of them in different orders. A wait-free queue is deadlock-free by construction but can still have starvation if the algorithm only guarantees lock-free (some thread makes progress) rather than wait-free (every thread makes progress in bounded steps). State the property: lock-free, wait-free, obstruction-free — they have different liveness guarantees.
  • "Use try_lock everywhere to avoid deadlock" Trades deadlock for livelock. If both threads try_lock and back off when they fail, two threads in lockstep can spin forever. The fix is try_lock plus randomised backoff plus a bound on retry attempts (after which one side wins by force, e.g. by acquiring locks in canonical order). Bare try_lock is necessary, not sufficient.
  • "Priority inheritance fixes priority inversion completely" Fixes the simplest case (two-level inversion: H waits on L, M preempts L). Higher-order inversions can remain — priority ceiling (every mutex has a ceiling priority equal to the highest-priority thread that ever takes it; a thread holding the mutex temporarily runs at the ceiling) is the stronger discipline used in real-time systems. RTOS like VxWorks and QNX expose both knobs.
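The fair-lock trade mentioned in the starvation bullet can be sketched as a ticket lock: each arriving thread takes a numbered ticket and waits until its number is served, so a barging hot loop cannot jump the queue. Illustrative only (the class name and the sleeps that fix arrival order are contrivances for the demo), not a production lock:

```python
# Sketch: ticket lock — FIFO fairness, bounded waiting, no barging.
import itertools
import threading
import time

class TicketLock:
    def __init__(self):
        self._next = itertools.count()       # ticket dispenser: 0, 1, 2, ...
        self._cond = threading.Condition()
        self._serving = 0
    def acquire(self):
        with self._cond:
            my = next(self._next)
            while self._serving != my:       # wait until my ticket is called
                self._cond.wait()
    def release(self):
        with self._cond:
            self._serving += 1
            self._cond.notify_all()          # wake waiters; only the next ticket proceeds

lock = TicketLock()
order = []
def worker(name):
    lock.acquire()
    order.append(name)
    lock.release()

lock.acquire()                               # hold the lock while both workers queue up
a = threading.Thread(target=worker, args=("first",)); a.start()
time.sleep(0.1)                              # let 'first' take its ticket before 'second'
b = threading.Thread(target=worker, args=("second",)); b.start()
time.sleep(0.1)
lock.release()
a.join(); b.join()
print(order)                                 # ['first', 'second']: served in arrival order
```

The cost is the one named in the bullet: every release wakes waiters only to admit exactly one, so throughput drops under heavy contention in exchange for bounded waiting.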

Going deeper

The dining philosophers and lock ordering

Dijkstra's dining philosophers (EWD-310, 1965) is the canonical deadlock-causing setup: five philosophers around a table with five forks between them; each philosopher needs the fork on their left and the fork on their right to eat. The naive protocol — pick up left first, then right — deadlocks if all five pick up their left fork simultaneously. The classical fix is resource ordering: number the forks 0..4; every philosopher always picks up the lower-numbered fork first. Now there is no cycle in the wait-for graph; deadlock is impossible. This generalises: in any lock-using system, if all threads acquire locks in a single global order, circular wait is impossible. PaySetu enforces this in Java with a LockOrder enum and a debug-mode assertion that throws if a thread takes lock B while already holding lock A where B.order < A.order. The discipline is unfashionable but it is also the only deadlock-prevention scheme that does not depend on runtime detection or rollback.
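The debug-mode assertion described for the hypothetical PaySetu scheme translates directly to Python. A sketch (the class and the rank numbers are made up): each lock carries a rank, a thread-local stack records what the current thread holds, and taking a lower-ranked lock while holding a higher-ranked one raises immediately instead of deadlocking later.

```python
# Sketch: lock-order assertion — turns a latent deadlock into a loud error.
import threading

_held = threading.local()                 # per-thread stack of ranks currently held

class OrderedLock:
    def __init__(self, rank):
        self.rank, self._lock = rank, threading.Lock()
    def __enter__(self):
        stack = getattr(_held, "ranks", [])
        if stack and self.rank <= stack[-1]:
            raise RuntimeError(f"lock order violation: rank {self.rank} after {stack[-1]}")
        self._lock.acquire()
        _held.ranks = stack + [self.rank]
        return self
    def __exit__(self, *exc):
        _held.ranks.pop()
        self._lock.release()

A, B = OrderedLock(1), OrderedLock(2)
with A:
    with B:                               # rank 1 then rank 2: legal
        pass
caught = None
try:
    with B:
        with A:                           # rank 2 then rank 1: the bug, caught eagerly
            pass
except RuntimeError as e:
    caught = str(e)
print(caught)
```

The violation fires on the acquiring thread at the exact line of the bad acquisition, which is far cheaper to debug than a cross-thread cycle in production.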

Livelock under symmetric exponential backoff

A subtle livelock arises when two threads detect contention and back off using the same exponential schedule. Both back off 1 ms, both retry, both fail, both back off 2 ms, both retry, both fail — the schedules stay synchronised, contention never resolves. The fix is randomised backoff: each thread sleeps random.uniform(0, 2^k) ms on the kth retry. The randomisation desynchronises the threads with high probability after a few retries. Ethernet's CSMA/CD (1976) used this exact protocol — binary exponential backoff — to resolve packet collisions, and modern TCP, BitTorrent, and lock-free retry loops all inherit the trick. Why: in any retry-on-contention system, deterministic backoff schedules become a synchronisation source — the threads accidentally agree on when to retry. Randomness breaks the agreement. Real lock-free queues (Michael-Scott with backoff) use randomised backoff at the CAS site to prevent two enqueuers from livelocking on the tail-pointer CAS.
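The schedule described above, sleep a uniform random interval in (0, 2^k) on the kth failed attempt, fits in one helper. A sketch with illustrative constants (the base, cap, and retry bound are not tuned values):

```python
# Sketch: randomised exponential backoff around a contended operation.
import random
import time

def retry_with_backoff(attempt, max_retries=8, base=0.001, cap=0.1):
    """Retry attempt() until it returns True; randomised exponential backoff between tries."""
    for k in range(max_retries):
        if attempt():
            return True
        # uniform(0, base * 2**k): two contenders on the same schedule desynchronise
        time.sleep(random.uniform(0, min(cap, base * 2 ** k)))
    return False

calls = {"n": 0}
def flaky():                       # toy contended operation: succeeds on the third try
    calls["n"] += 1
    return calls["n"] >= 3

ok = retry_with_backoff(flaky)
print(ok, calls["n"])              # True 3
```

The cap matters in practice: without it, a long contention burst pushes sleeps into seconds and the backoff itself becomes the latency problem.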

Priority ceiling vs priority inheritance

Priority inheritance: when a high-priority thread blocks on a mutex, the holder temporarily inherits the waiter's priority until it releases the mutex. Pros: minimal overhead, only kicks in on contention. Cons: chained inheritance is complex; deadlock under priority inheritance is still possible if the lock graph itself has cycles. Priority ceiling: every mutex is statically annotated with a ceiling priority equal to the highest priority of any thread that takes it; a thread holding the mutex runs at the ceiling. Pros: prevents priority inversion and deadlock between a fixed set of locks (because any thread holding a lock runs at least at the ceiling, so it cannot be preempted by another thread that needs a lock with a lower or equal ceiling). Cons: requires static analysis of which threads take which locks; over-conservative if a low-priority thread takes a high-ceiling lock for a non-critical operation. POSIX exposes both: PTHREAD_PRIO_INHERIT and PTHREAD_PRIO_PROTECT. Real-time Linux uses inheritance for rt_mutex in the kernel; QNX and VxWorks use ceiling for safety-critical applications.

Detection vs prevention vs avoidance

There are three classical strategies. Prevention attacks one of Coffman's conditions a priori — lock ordering prevents circular wait, lock-free prevents mutual exclusion, watchdog kill-and-restart prevents no-preemption. Cheap at runtime, requires discipline at design time. Avoidance uses runtime information to never enter a state from which deadlock is reachable — Banker's algorithm (Dijkstra 1965) is the textbook example, requiring threads to declare their maximum resource needs in advance. Expensive and rarely used in practice because the declaration is hard to write correctly. Detection lets deadlock happen, runs a periodic background scan of the wait-for graph for cycles, and resolves found cycles by killing one thread (rolling back its work). DBMSes (PostgreSQL, MySQL, SQL Server) use detection: every transaction is one node, lock requests are edges, the periodic deadlock-detector scans for cycles every 1 second and aborts the youngest victim with ERROR: deadlock detected. KapitalKite's order-matcher uses prevention via lock ordering; PaySetu's database tier uses detection via PostgreSQL's deadlock detector — different layers, different strategies.

Where this leads next

Liveness pathologies recur throughout this curriculum. Mutexes (Part 5) introduce specific deadlock and priority-inversion patterns and the kernel mechanics (futex_wait, PI-futex) that fix them. Lock-free data structures (Part 7) trade deadlock-freedom for ABA hazards and require memory reclamation (Part 8) — a domain where livelock manifests as retry-loop pathology under contention. Async runtimes (Part 11) introduce a uniquely subtle starvation: a coroutine that does not yield at an await point can starve every other task on the same event loop. And testing (Part 15) closes the loop with loom and TLA+, the only tools that can systematically explore liveness properties — there is no "linter" for deadlock the way there is for type errors.

References

# Reproduce this on your laptop (Linux/macOS, Python 3.11+, GCC for the C kernel)
python3 four_liveness_failures.py deadlock
python3 four_liveness_failures.py livelock
python3 four_liveness_failures.py starvation
# expect: deadlock prints "DEADLOCK" after 2-second join timeout
# expect: livelock prints millions of retries with fills=0
# expect: starvation prints a 1000x+ ratio between hot and audit threads
# For priority inversion (needs Linux + CAP_SYS_NICE):
gcc -O2 -pthread priority_inversion.c -o pi && sudo taskset -c 0 ./pi