In short

You cannot unit-test durability. A unit test runs inside your process; a crash happens between two of its instructions, somewhere your test function is never allowed to stand. The only honest way to know your store survives power loss is to actually kill it — with SIGKILL, a parent-managed kill -9, or a virtual-machine power-cut — mid-write, thousands of times, and then restart it and check that a stated invariant still holds. This chapter builds that harness. You will write a parent supervisor in Python that spawns a child writing to an append-only log, murders the child at a random instant with os.kill(pid, signal.SIGKILL), starts a fresh child that reads the log back, and verifies that every write the parent saw acknowledged is still present and every record that the parent did not yet see is either absent or a discardable partial tail. Five minutes of this loop exposes more durability bugs than a year of hand-written tests. You will also meet the four classes of failure it finds — lost writes, torn writes, reordered writes, and silent corruption — with a tiny reproducer for each, and the production-grade tools (Jepsen, ALICE, CrashMonkey, dm-log-writes, eBPF) that grown-up database teams use to catch them.

Your unit tests are green. Every record you wrote comes back. The put/get round-trip passes a thousand times in CI. You ship. Three days later, a user opens the application after a power outage and it starts with an empty database.

What went wrong is not a bug in any single line of your code. It is a bug in the gap between lines. Somewhere between the write() that returned success to Python and the fsync() that never happened — or happened to the wrong file, or happened before the directory entry was durable, or was swallowed by a lying SSD cache — a power cut slipped in and pocketed the difference. Unit tests cannot see this gap because unit tests run on one side of it. They run inside the process. A crash lives between two instructions of that process, in a place the test framework is structurally forbidden from inspecting.

This chapter is the missing test. It is the test you cannot write inside the program because the program has to die for the test to mean anything. You will build a small, fast harness that kills your store at a random moment, restarts it, and checks whether the invariants survived. Five minutes of this loop will teach you more about your own code than the previous week of linting did.

Why unit tests cannot catch durability bugs

A unit test looks like this:

def test_put_get_roundtrip():
    db = AppendOnlyKV("test.log")
    db.put("k", "v")
    assert db.get("k") == "v"

It tests one thing: that within a single live process, the put/get path produces the right answer. It is a useful test. It will catch a typo in the scanner. It will catch a regression where you forgot to flush before returning. It is also completely blind to every interesting durability bug, because every interesting durability bug happens across a process boundary.

Why across a boundary: the only reason durability is interesting is that something outside the process can end the process at an arbitrary moment. That something is power loss, a kernel panic, an OOM killer, or a kill -9 from an admin. A unit test function cannot invoke any of these on itself, because if it did its test framework would die with it and report nothing. The failure mode lives in the region the test cannot inhabit.

Compare the two pictures below. A unit test sees the full sequence of instructions; a crash happens between instructions in a way no assertion can observe.

[Figure: The crash window between write() and fsync(). A horizontal timeline for a single put() call, with four tick marks: Python buffer write; write() syscall (bytes reach the kernel page cache); fsync() issued; fsync() returns (bytes durable on disk). A shaded "crash window" covers the interval between write() returning and fsync() returning — a power cut anywhere in that band loses a write that put() has already acknowledged. Three crash points below the timeline show the three outcomes: record completely gone; partial tail on disk; record fully durable.]
A single put() is not an instant. It is an interval, stretched across userspace, the kernel, and the disk controller. The "crash window" is everything between the moment the bytes leave your Python buffer and the moment the disk reports them persistent. A unit test cannot observe this interval because the unit test is one of the instructions inside it.

The harness in this chapter does exactly what a unit test cannot: it stands outside the store process, so when the store process is killed mid-interval, the harness is still alive to inspect what the disk actually kept.

Simulating crashes — the three levels of violence

Not every way of stopping a process tests the same thing. The ladder from "gentle" to "actually pulling the plug" has three rungs, each exposing a stricter set of durability bugs.

Rung 1 — SIGTERM or normal exit. You ask the process nicely to stop. It runs its exit handlers, flushes any Python-level buffers, closes files, and returns. Any bytes still in Python's buffer reach the kernel when the interpreter flushes and closes open files on shutdown. This is not a crash. It tests only that your store flushes its own buffers on a clean shutdown. Every toy gets this right on the first try.

Rung 2 — SIGKILL (kill -9). The kernel destroys the process immediately. No exit handlers run. No close() is called. Anything sitting in Python's buffer is gone — only bytes that already made it to the kernel page cache survive. This is the first real crash test: it simulates a process abort in the middle of a write, which mimics an OOM-killer strike or a segfault. It does not simulate power loss, because the kernel is still alive and will still flush its own page cache to disk in the background.
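The difference between a buffered write dying with the process and a flushed write surviving it is easy to demonstrate in a few lines. The sketch below (the helper name and temp-file handling are illustrative, not from the store) spawns a child that writes six bytes and SIGKILLs itself — once without flush(), once with:

```python
import os, signal, subprocess, sys, tempfile, textwrap

def bytes_surviving_sigkill(flush: bool) -> int:
    """Spawn a child that writes 'hello\\n', optionally flushes, then SIGKILLs itself."""
    path = tempfile.mktemp()
    code = textwrap.dedent(f"""
        import os, signal
        f = open({path!r}, "w")                 # block-buffered: bytes sit in Python
        f.write("hello\\n")
        {'f.flush()' if flush else 'pass'}      # flush pushes them to the kernel page cache
        os.kill(os.getpid(), signal.SIGKILL)    # no exit handlers, no implicit close
    """)
    subprocess.run([sys.executable, "-c", code])
    size = os.path.getsize(path) if os.path.exists(path) else 0
    if os.path.exists(path):
        os.remove(path)
    return size

print(bytes_surviving_sigkill(flush=False))  # 0 — the Python buffer dies with the process
print(bytes_surviving_sigkill(flush=True))   # 6 — the page cache survives SIGKILL
```

Without the flush, the file exists but is empty: the bytes never left the dying process. With it, all six bytes survive, because the kernel — still alive after a SIGKILL — keeps its page cache.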

Rung 3 — power cut. The kernel dies too. The page cache is gone. Only bytes that were issued a durability barrier (fsync or a write-through FUA write) and successfully acknowledged by the disk survive. This is the test your users' hardware actually runs on them. You cannot trigger this from within the OS — you have to pull the plug, unplug the VM's virtual disk, or use a block-layer tool like dm-log-writes (covered later) to simulate it deterministically.

There is also SIGSTOP, which pauses a process without killing it. SIGSTOP is not a crash — it is the pause button. But it is useful in a harness for a different reason: you can SIGSTOP a writer, inspect the on-disk state while it is frozen, and SIGCONT it, catching it mid-write in a reproducible way.

[Figure: Three rungs of crash simulation. A vertical ladder: rung 1, SIGTERM — exit handlers run, Python buffers flushed, not a real crash test; rung 2, SIGKILL (kill -9) — process dies immediately with no cleanup, but the kernel still flushes its page cache; rung 3, power cut — kernel dies too, page cache gone, only fsync'd bytes survive. A checklist on the right shows what survives at each rung: the Python buffer survives only rung 1; the kernel page cache survives rungs 1 and 2; fsync'd bytes on an honest disk survive all three.]
Three levels of crash simulation, and which caching layer each one invalidates. For a single-machine store with fsync on every commit, a rung-2 test with SIGKILL plus a fresh process that reads the log back is close enough to a rung-3 power cut to catch the overwhelming majority of bugs — and it runs a thousand times a minute.

For the rest of this chapter, the harness uses rung 2 — SIGKILL — because it is cheap, deterministic, and scriptable. For real production hardening, you graduate to rung 3 with dm-log-writes or a VM, which is covered in Going deeper.

The harness — writes, kills, recovers, in a loop

The shape of a power-loss test harness is simple enough to state in one sentence: a parent process spawns a writer child, tells the writer which records to produce and in what order, kills the writer at a random moment, spawns a reader child on the same file, and checks the invariants. Let us build that.

Here is the full harness in ~50 lines of Python.

# crash_harness.py — power-loss tester for AppendOnlyKV
import os, sys, time, random, signal, subprocess, json

LOG = "harness.log"
ACK = "harness.ack"   # parent-visible record of which puts the writer claimed were durable

def writer():
    """Child: write N records to the log; append each k to ACK only after fsync returns."""
    from appendkv import AppendOnlyKV   # the store from chapter 2, now fsync'd per chapter 3
    db = AppendOnlyKV(LOG)
    ack = open(ACK, "a", buffering=1)   # line-buffered: acks are visible to parent promptly
    for i in range(100_000):
        db.put(f"key{i}", f"value{i}")           # this call must fsync internally
        ack.write(f"{i}\n")                      # only written AFTER fsync returns
        # no sleep; go as fast as we can so the parent's kill lands mid-write

def parent_loop(trials=200):
    losses = torn = reordered = 0
    for t in range(trials):
        for path in (LOG, ACK):
            if os.path.exists(path): os.remove(path)

        child = subprocess.Popen([sys.executable, __file__, "writer"])
        # pick a random kill delay: long enough for many writes, short enough to catch mid-write
        time.sleep(random.uniform(0.005, 0.050))
        child.send_signal(signal.SIGKILL)
        child.wait()

        acked = set()
        if os.path.exists(ACK):
            with open(ACK) as f:
                acked = {int(line) for line in f if line.strip().isdigit()}

        from appendkv import AppendOnlyKV
        db = AppendOnlyKV(LOG)
        present = {int(k[3:]): v for k, v in db.scan_all() if k.startswith("key")}

        # invariant 1: every acked write must be present with its declared value
        missing = {i for i in acked if i not in present}
        if missing: losses += len(missing)

        # invariant 2: present values must match what the writer deterministically produced
        wrong = {i for i, v in present.items() if v != f"value{i}"}
        if wrong: torn += len(wrong)

        # invariant 3: present keys should be a prefix of what the writer intended (no reorder)
        if present:
            max_seen = max(present)
            gap = {i for i in range(max_seen) if i not in present}
            if gap: reordered += len(gap)

        print(f"trial {t:03d}  acked={len(acked):>6}  present={len(present):>6}  "
              f"lost={len(missing)} torn={len(wrong)} gaps={len(gap) if present else 0}")

    print(json.dumps({"trials": trials, "total_lost": losses,
                      "total_torn": torn, "total_gaps": reordered}))

if __name__ == "__main__":
    (writer if len(sys.argv) > 1 and sys.argv[1] == "writer" else parent_loop)()

The harness assumes AppendOnlyKV.put calls fsync internally (chapter 3 made that change) and that a helper scan_all() yields (key, value) pairs from the log, silently skipping any torn tail line (a line that does not contain =, or whose checksum fails, once chapter 5 adds one).
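If you have not written scan_all yet, a minimal version for the chapter-2 text format might look like the sketch below (an assumption about that format — one key=value record per line; the real helper lives in appendkv). It treats a line with no trailing newline, or no = separator, as the torn tail and stops recovery there:

```python
def scan_all_text(path):
    """Yield (key, value) pairs from a text-format log, stopping at a torn tail.

    A line counts as torn if it lacks a trailing newline (the final write was
    cut off mid-line) or lacks an '=' separator. Recovery stops at the first
    such line and discards everything after it.
    """
    with open(path, "r") as f:
        for line in f:
            if not line.endswith("\n"):
                break                        # torn tail: partial final write
            key, sep, value = line[:-1].partition("=")
            if not sep:
                break                        # malformed line: stop recovery here
            yield key, value
```

Note what this cannot do: a torn line that still happens to contain an = and end at a newline boundary parses as a valid record with a truncated value. That blind spot is exactly what the harness exposes, and what chapter 5's checksum framing closes.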

Let us unpack the three invariants the parent checks, because they are the ones that map to the bugs you are hunting.

Invariant 1 — no lost acks. If the writer wrote i into the ACK file, it did so only after db.put(...) for record i returned. By the contract of put, that means the record was durable. So on recovery, every i that is in ACK must be in the log. Any that is not is a lost write — the worst durability bug, because the writer was told its data was safe and it wasn't.

Invariant 2 — no corrupted values. The writer produces deterministic records: key{i}value{i}. If the log contains key42 = valueXX for some mangled XX, the record has been corrupted — either by a partial write that scrambled a byte boundary, or by the scanner misinterpreting a torn tail as valid. This is a torn write (or a scanner bug).

Invariant 3 — no gaps. The writer wrote records in order, and each put returned before the next one started. So if key99 is durable but key50 is missing, something reordered the writes underneath you. This should be impossible in a single-threaded single-file append — if you see it, you have a bug in how you are flushing, or your "append" is actually doing a seek somewhere, or the filesystem is reordering metadata updates in a way your recovery is not handling.

[Figure: The harness loop. Five boxes in a cycle: 1. parent spawns the writer child via subprocess; 2. writer appends and fsyncs records, writing acks; 3. parent sleeps a random 5-50 ms, then sends SIGKILL; 4. parent scans the log, compares it to the ACK file, and verifies the invariants; an arrow loops back to step 1, labelled "repeat 200 times" — at roughly 30 ms per iteration, a full sweep takes about six seconds.]
The harness loop. Step 2 is inside the writer child; steps 1, 3, and 4 are inside the parent. Only the parent sees the full timeline, which is why only the parent can check the invariants.

One trial of the harness, step by step

Say the writer has just been killed, and the parent is about to check the invariants. Here is what the filesystem looks like, and what each check says.

harness.log:                  harness.ack:
key0=value0                   0
key1=value1                   1
key2=value2                   2
key3=value3                   3
key4=value                    (fifth ack never written)

The last line in harness.log is torn — value was cut off before 4\n made it to the page cache. harness.ack contains four lines, so the writer acknowledged putting key0 through key3. (It had called db.put("key4", "value4") before the kill, but put never returned, so the ack for 4 was never written.)

Check 1 — lost acks. acked = {0, 1, 2, 3}, and all four of those records are intact in the log, so invariant 1 passes. What about the torn fifth line? With a checksum-framed record format (chapter 5), it would fail its CRC and be discarded, leaving present = {0, 1, 2, 3}. With the text-format scanner from chapter 2, it is not skipped at all — partition("=") happily yields key "key4" and value "value" — which is exactly the point.

Check 2 — corrupted values. With the text-format scanner, present now includes a bogus {4: "value"}, and the invariant 2 check v != f"value{i}" trips: we have detected a torn write surviving as a fake record. This is the harness doing its job.

Check 3 — gaps. None. The present keys are a contiguous prefix.

Verdict. On one trial the harness would report lost=0 torn=1 gaps=0, and you would go fix the scanner to require framing or checksums. You run 200 trials, aggregate, and get a statistical picture of where your store fails.

Run the harness. On a correct store with proper fsync + checksumming, all three counters are zero across 200 trials. On the toy from chapter 2 — no fsync, no checksums — you will see lost writes and torn writes immediately. Every production database team runs a harness of this shape, continuously, on every commit.

The four classes of failure you will find

When you run the harness on a buggy store, the failures do not distribute evenly. They fall into four classes, each with a different root cause and a different fix. Learn these four; every failure you will ever debug is one of them.

Lost writes

A write that was acknowledged as durable is gone after recovery. This is the worst class — the store told the application its data was safe and it wasn't.

Root cause. The write was not actually durable at the moment the ack went out. Somewhere between put() and the ack.write(), a layer was skipped — no fsync, a flush() that pushed only to the page cache, a missing directory fsync after a new file was created.

Tiny reproducer.

# BAD: writes acknowledged before fsync → lost on crash
def put_bad(self, k, v):
    self._f.write(f"{k}={v}\n")
    self._f.flush()          # only pushes to kernel page cache
    # missing: os.fsync(self._f.fileno())
    return "acked"           # we lied

# GOOD: fsync before we return
def put_good(self, k, v):
    self._f.write(f"{k}={v}\n")
    self._f.flush()
    os.fsync(self._f.fileno())
    return "acked"

Torn writes

A record on disk is half-written — some bytes of the new value, some bytes of garbage or of the old value. The scanner either sees it as a malformed record (and hopefully rejects it) or, worse, interprets it as a real record with a wrong value.

Root cause. A write of N bytes is not atomic across the stack. Your write() might translate to several sector or block writes internally, and only some of them complete before the crash. Filesystems like ext4 give you atomicity for writes within a single 4KB page under certain mount options, but anything larger can tear across pages.

Tiny reproducer. With the text format, a record like age=16\n can be cut off anywhere — including just after ag — and look like a new line that starts with ag and continues into whatever garbage is on disk. The fix is length-prefixed, checksum-framed records (chapter 5):

# framed record: [4-byte length][payload][4-byte CRC32]
import struct, binascii
def encode(kv_bytes: bytes) -> bytes:
    return struct.pack("<I", len(kv_bytes)) + kv_bytes + struct.pack("<I", binascii.crc32(kv_bytes))

def decode_next(f):
    hdr = f.read(4)
    if len(hdr) < 4: return None           # clean EOF
    (n,) = struct.unpack("<I", hdr)
    body = f.read(n + 4)
    if len(body) < n + 4: return None      # torn tail: stop recovery here
    payload, crc = body[:n], struct.unpack("<I", body[n:])[0]
    if crc != binascii.crc32(payload): return None  # corruption: stop recovery here
    return payload
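A quick round-trip check of this framing (the functions are restated compactly here so the snippet runs on its own) shows the two failure modes it catches: a truncated tail and a flipped bit both decode to None instead of a bogus record:

```python
import io, struct, binascii

def encode(payload: bytes) -> bytes:
    # [4-byte length][payload][4-byte CRC32] — the framing described above
    return struct.pack("<I", len(payload)) + payload + struct.pack("<I", binascii.crc32(payload))

def decode_next(f):
    hdr = f.read(4)
    if len(hdr) < 4: return None                    # clean EOF
    (n,) = struct.unpack("<I", hdr)
    body = f.read(n + 4)
    if len(body) < n + 4: return None               # torn tail
    payload, (crc,) = body[:n], struct.unpack("<I", body[n:])
    return payload if crc == binascii.crc32(payload) else None  # None on corruption

rec = encode(b"key42=value42")
print(decode_next(io.BytesIO(rec)))                 # b'key42=value42' — intact record decodes
print(decode_next(io.BytesIO(rec[:-3])))            # None — torn tail is rejected
flipped = rec[:6] + bytes([rec[6] ^ 1]) + rec[7:]   # flip one payload bit
print(decode_next(io.BytesIO(flipped)))             # None — the CRC catches the flip
```

The crucial property: there is no input for which the decoder returns a value it cannot vouch for. Truncation, garbage, and bit-flips all collapse to the same safe answer — stop recovery here.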

Reordered writes

The log contains key99 but is missing key50, even though the writer produced them in order. The filesystem or the disk has committed later writes before earlier ones.

Root cause. Two very different things can produce this symptom. First, the directory entry and the file data are separate metadata operations; creating a new file then writing to it can have the data land on disk before the name does, so on recovery the data is unreachable. Fix: after creating a file, fsync the parent directory. Second, the OS writeback scheduler and the disk's internal write buffer can reorder queued writes; fsync inserts a barrier but does not guarantee ordering across unrelated fsyncs. Fix: issue fsyncs in the strict order you want them persisted, and do not use nobarrier mount options.

Tiny reproducer.

# BAD: new file written to, but parent directory not fsynced
# on recovery, the file may not appear in its directory at all
f = open("new.log", "w")
f.write("data\n")
f.flush(); os.fsync(f.fileno())   # file data is durable
f.close()
# missing: dir_fd = os.open(".", os.O_DIRECTORY); os.fsync(dir_fd)
# on a crash before the dir entry is flushed, "new.log" is nowhere

# GOOD: fsync the directory too
f = open("new.log", "w")
f.write("data\n")
f.flush(); os.fsync(f.fileno())
f.close()
dir_fd = os.open(".", os.O_DIRECTORY)
os.fsync(dir_fd)
os.close(dir_fd)

Silent corruption

A record reads back as valid — no torn framing, no missing bytes — but its value is not what was written. A bit has flipped on the platter. No syscall will catch this because, as far as the OS is concerned, the disk returned what it returned and there was no I/O error.

Root cause. Cosmic rays, DRAM errors, SSD wear, firmware bugs, controller bit-flips. The frequency is low — on consumer SSDs, on the order of one error per 10^15 bits read — but a large database reads 10^15 bits in a week.

Fix. Per-record checksums (CRC32C is the standard choice) that you verify on every read. If the checksum fails, you do not return the value to the application; you return an error. Recovery from that error is a separate question — it is why real systems replicate. A single-disk database can detect silent corruption but cannot repair it.

Tiny reproducer. Simulate a bit-flip:

# simulate a single-bit flip somewhere in the file
import os
with open("harness.log", "r+b") as f:
    f.seek(os.path.getsize("harness.log") // 2)
    byte = f.read(1)
    f.seek(-1, 1)
    f.write(bytes([byte[0] ^ 0x01]))   # flip the lowest bit
# re-run the scanner. Without checksums, you will read a wrong value and not know.
# With CRC32 framing, the scanner will reject that record with a corruption error.
[Figure: The four classes of durability failure, as a two-by-two grid. Lost writes — symptom: ack issued but data gone on recovery; root cause: fsync skipped or in the wrong order; fix: fsync before the ack returns. Torn writes — symptom: record half-written on disk; root cause: write() is not atomic across bytes and pages; fix: length prefix + CRC framing. Reordered writes — symptom: later record present, earlier absent; root cause: metadata/data reordering, no directory fsync; fix: fsync the parent directory, ordered barriers. Silent corruption — symptom: record reads as valid but the value is wrong; root cause: bit-flip in media, DRAM, or controller; fix: CRC32C on every record, verified on read.]
The four classes of durability bug your harness will find. Fix each with the marked mechanism. Chapter 5 builds length-prefixed framing with CRC32C into our store.

Jepsen-lite for a single-node store

The harness above is what you write on day one. It covers SIGKILL — rung 2 of the violence ladder. To climb further, you need two more capabilities:

Fault injection below the store. Instead of killing the process, intercept the syscalls it makes, and lie to it about which ones succeeded. For fsync in particular, you want a mode where the store thinks the data is durable but actually is not — because that is what a lying SSD does. On Linux, the cleanest way is an eBPF probe that hooks sys_fsync and, in a fraction of cases, turns the fsync into a no-op before the kernel ever runs it. Your harness now tests the store's behaviour under a realistically adversarial storage stack.

# conceptual — real eBPF code is in C with bcc or bpftrace
# bpftrace -e 'kprobe:vfs_fsync /pid == TARGET/ { @n = @n + 1; if (@n % 10 == 0) { override(0); } }'
# every 10th fsync returns success without actually doing anything

Signal-driven probes inside the store. A SIGSTOP from the parent freezes the store at a random instant. The parent can inspect the on-disk state — a snapshot — then SIGCONT the store and compare to see what moved. This is how Jepsen-style tools narrow down which exact operation was in flight when the failure happened.
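A minimal sketch of that probe pattern (the child here is a stand-in line writer, not the store; probe.log is a scratch path made up for illustration): freeze the writer with SIGSTOP, take two snapshots of the on-disk state to confirm nothing moves while it is frozen, then SIGCONT it and watch the state advance again:

```python
import os, signal, subprocess, sys, time

LOG = "probe.log"   # hypothetical scratch file for the stand-in writer
child = subprocess.Popen([sys.executable, "-c",
    "import time\n"
    f"f = open({LOG!r}, 'w', buffering=1)\n"     # line-buffered: each line reaches the kernel
    "for i in range(100_000):\n"
    "    f.write(f'{i}\\n'); time.sleep(0.0005)\n"])
time.sleep(0.2)                                  # let the writer get going

os.kill(child.pid, signal.SIGSTOP)               # pause button: writer frozen mid-loop
frozen1 = os.path.getsize(LOG)
time.sleep(0.1)
frozen2 = os.path.getsize(LOG)                   # identical: a stopped writer cannot move the file
os.kill(child.pid, signal.SIGCONT)               # resume exactly where it left off
time.sleep(0.2)
resumed = os.path.getsize(LOG)                   # larger: the writer picked up again

child.kill(); child.wait(); os.remove(LOG)
print(frozen1 == frozen2, resumed > frozen1)
```

The frozen window is where the interesting inspection happens: in a real harness you would scan the log, hash it, or copy it aside as a snapshot before resuming, giving you a reproducible look at a writer caught mid-write.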

This combination — parent-controlled kills, eBPF-based fsync interception, on-disk snapshots between signals — is sometimes called Jepsen-lite, because it brings the Jepsen style of adversarial testing (normally aimed at distributed systems) down to a single-node store. For single-machine correctness, this is enough. When you get to replicated systems in Build 5, you graduate to Jepsen proper.

Common confusions

flush() is not fsync(). flush() moves bytes from Python's buffer to the kernel page cache; only fsync() (or an equivalent barrier) moves them to the disk. A store that flushes but never fsyncs loses acknowledged writes on power loss — the first bug the harness finds.

SIGKILL is not a power cut. After a SIGKILL the kernel is still alive and will eventually write its page cache to disk. A rung-2 test therefore under-approximates rung 3: it catches in-process buffering bugs, but not a missing fsync whose data happened to be flushed in the background.

SIGSTOP is not a crash. It is the pause button — nothing is lost, and the process resumes exactly where it stopped. Its value in a harness is that it lets the parent inspect a frozen writer's on-disk state reproducibly.

An fsync'd file is not necessarily reachable. A newly created file's directory entry is separate metadata; until the parent directory is fsync'd too, a crash can leave durable data with no name pointing at it.

Going deeper

If you just wanted a harness that shakes out the obvious durability bugs, the 50-line loop above is enough. The rest of this section points at the industrial-strength tools that real storage teams use, and the historical incidents that motivated them.

Jepsen proper

Jepsen — built by Kyle Kingsbury — is the gold-standard adversarial testing framework for database correctness. It is aimed primarily at distributed systems (it partitions the network, skews clocks, pauses nodes, reshuffles packets) but many of its ideas scale down cleanly to a single-node store: the harness records the history of every operation the client issued, every response the database returned, and every fault injected, and then checks at the end whether a linearisable history could have produced that sequence. When it cannot, the tool prints the shortest counterexample. Kingsbury's Jepsen reports have found correctness violations in MongoDB, Cassandra, etcd, CockroachDB, RabbitMQ, and many others — published, reproducible, and archived at jepsen.io.

The technique transfers down: a single-node Jepsen-lite tool records every put and get, kills the process at a random moment, restarts, and checks that the returned history is compatible with some serial order consistent with the ack/no-ack signals. If an ack'd put is not visible on recovery, that is a linearisability violation by the database's own contract.

ALICE and CrashMonkey

ALICE (OSDI 2014, Pillai et al. — the same paper cited in chapter 1) is an academic tool that systematically explores every valid crash state a filesystem might produce between two fsync points, and runs the application's recovery code on each. It modelled ext3, ext4, btrfs, xfs; it found durability bugs in LevelDB, SQLite, HDFS, Git, Mercurial, and VMware's WAL. ALICE is what taught the industry that "call fsync where it matters" is a lot harder in practice than it sounds.

CrashMonkey (OSDI 2018, Mohan et al.) is the spiritual successor, more automated and scalable. It snapshots the block device after every barrier-crossing write, replays every possible subset of unsynced operations, and checks the application state on recovery. It found crash-consistency bugs in mainline ext4 and btrfs.

You are not expected to run ALICE or CrashMonkey yourself during Build 1 — they are substantial tools — but you should know their names and shape. When you graduate to a real storage engine, they are the tools you reach for.

dm-log-writes — deterministic power-cut replay

Linux ships a device-mapper target called dm-log-writes that records every block write issued to a disk, with barriers marked. You run your database on top of a dm-log-writes device; it produces a log of every write. Then you replay the log up to any chosen point (just before a barrier, in between two barriers, at the exact instant of the Nth fsync) and mount the resulting image as the state of the disk at that simulated crash moment. Point your recovery code at it. If recovery cannot handle every prefix that ended between barriers, you have a bug.

This is the closest thing to a deterministic power-cut test on commodity Linux, and it is what kernel filesystem developers use to test their own changes. It is used by Btrfs, XFS, and the LWN-published benchmarks of database fsync behaviour.

eBPF for fsync interception and lying-disk simulation

eBPF lets you attach small programs to kernel tracepoints and syscall entry/exit points without modifying the kernel. For our use, the interesting hooks are on the fsync path: count fsyncs to verify your store issues them where it claims to, inject latency to expose timing assumptions, or override the return value so fsync reports success without persisting anything — a software model of a lying disk.

bcc and bpftrace are the usual front-ends. Written correctly, an eBPF-driven harness lets you model the behaviour of dishonest consumer-grade SSDs on honest enterprise hardware — which means your tests cover the user's hardware, not yours.

Real-world incidents that motivated all of this

The reason every serious database team runs a crash harness is that every serious database has lost data to a crash bug. A partial roll-call: the 2018 PostgreSQL "fsyncgate" incident, in which a failed fsync was retried on the assumption that the data was still dirty — the kernel had already dropped the pages, so the retry reported success for data that was gone; the ALICE study, which found crash-recovery bugs in LevelDB, SQLite, HDFS, Git, and Mercurial — mature, widely deployed software, all of it; and the long series of Jepsen reports documenting correctness violations, including lost acknowledged writes, in MongoDB, Cassandra, etcd, and others.

The pattern is consistent: the bug is not in the obvious line of code. It is in the interaction between the store, the filesystem, and the disk, exposed by a specific schedule of operations that a crash interrupts. The only tool that finds it is a harness that runs that schedule and that crash, over and over.

Where this leads next

The harness you have built can find lost writes, torn writes, reordered writes, and silent corruption — but fixing the first two requires a new record format. The text-line format from chapter 2 cannot detect a torn tail reliably, and it has no place to hang a checksum. Chapter 5 builds that format: length-prefixed, CRC-framed records that make a torn tail detectable and a bit-flip loud.

References

  1. Pillai et al., All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI 2014 — the ALICE paper. The empirical demonstration that write + fsync is, in practice, much harder than it looks.
  2. Mohan et al., Finding Crash-Consistency Bugs with Bounded Black-Box Crash Testing, OSDI 2018 — the CrashMonkey paper, and the current state of the art in automated crash testing.
  3. Kyle Kingsbury, Jepsen — the canonical adversarial testing framework and its library of database analyses (MongoDB, etcd, Cassandra, CockroachDB, and many more).
  4. PostgreSQL wiki, Fsync Errors — the 2018 "fsyncgate" post-mortem. A required read on why fsync is not as simple as its man page.
  5. Linux kernel documentation, dm-log-writes — the deterministic block-layer power-cut replay tool used by kernel filesystem developers.
  6. SQLite, How SQLite Is Tested — the test strategy of one of the most rigorously crash-tested databases in existence, including anomaly testing and fault injection.