In short
A continuous single-qubit error like e^{-i\epsilon X/2} = \cos(\epsilon/2)\,I - i\sin(\epsilon/2)\,X looks like a problem for error correction — there are infinitely many such rotations, one for every angle. The discretisation theorem resolves the paradox. Any single-qubit operator expands as aI + bX + cY + dZ in the Pauli basis; when applied to an encoded state and followed by syndrome measurement, the continuous superposition collapses onto one of four discrete outcomes — "no error", "X error", "Y error", or "Z error" — each with a definite probability. Whichever outcome is measured, the recovery is a single Pauli correction. The continuous error space becomes a finite, correctable set the instant the syndrome is read. This is why a QEC code that handles \{I, X, Y, Z\} on each qubit automatically handles every continuous error: every amplitude-damping event, every unknown-axis rotation, every single-qubit Kraus channel. One theorem; the entire field of fault-tolerant quantum computing rests on it.
In 1994, when Peter Shor and Andrew Steane started thinking seriously about protecting quantum information from noise, the room was full of physicists who thought they could not succeed. The objection, raised by David DiVincenzo, William Unruh, Rolf Landauer, and others, was compact and lethal:
Quantum errors are continuous. A real qubit does not flip cleanly from |0\rangle to |1\rangle — it rotates by some tiny angle \epsilon, then another tiny angle, then another. Over many gates, these rotations accumulate into unbounded drift. To correct such a continuous error, you would have to measure the angle precisely, which collapses the very superposition you are trying to protect. Classical error correction works because classical errors are discrete (a bit is either flipped or not flipped). Quantum errors are not.
The objection sounds airtight. A bit has two states; a qubit has a continuum of states on the Bloch sphere; therefore quantum errors form a continuum; therefore no finite correction scheme can cover them.
And yet Shor's 1995 paper corrected every single-qubit error, including continuous rotations, using only a finite list of corrections: I, X, Y, Z on each physical qubit. How?
The answer is this chapter. It is one idea, and once you have it, the rest of QEC makes sense in a way it could not before. The idea is called the discretisation of errors, or simply the discretisation theorem. It sits at the heart of every quantum error correcting code, every threshold theorem, every fault-tolerance proof.
The key observation — the Paulis are a basis
Pick any 2\times 2 complex matrix M. Write it down as an array:

M = \begin{pmatrix} m_{00} & m_{01} \\ m_{10} & m_{11} \end{pmatrix}
This matrix has four complex entries; it is a four-complex-dimensional object. The Pauli matrices — I, X, Y, Z — are also 2\times 2 complex matrices, and there are exactly four of them. The question is whether the four Paulis span the space of all 2\times 2 matrices.
They do. Here is the explicit decomposition. Write

M = aI + bX + cY + dZ

with complex coefficients a, b, c, d. Expanding each Pauli:

aI + bX + cY + dZ = \begin{pmatrix} a + d & b - ic \\ b + ic & a - d \end{pmatrix}
Comparing entries with M:

m_{00} = a + d, \quad m_{01} = b - ic, \quad m_{10} = b + ic, \quad m_{11} = a - d

Solve the four linear equations:

a = \tfrac{1}{2}(m_{00} + m_{11}), \quad b = \tfrac{1}{2}(m_{01} + m_{10}), \quad c = \tfrac{i}{2}(m_{01} - m_{10}), \quad d = \tfrac{1}{2}(m_{00} - m_{11})

Why this always has a solution: the four coefficients are four independent linear combinations of the four matrix entries. A 4\times 4 system with an invertible coefficient matrix (which this is — the Pauli matrices are linearly independent, a quick determinant check confirms it) always has a unique solution. Every 2\times 2 matrix has exactly one decomposition into Paulis.
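The four-equation solve can be checked numerically. A minimal sketch in Python with numpy (the helper name pauli_decompose is mine, not the text's):

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_decompose(M):
    """Solve M = aI + bX + cY + dZ entrywise (unique for every 2x2 M)."""
    a = (M[0, 0] + M[1, 1]) / 2
    d = (M[0, 0] - M[1, 1]) / 2
    b = (M[0, 1] + M[1, 0]) / 2
    c = 1j * (M[0, 1] - M[1, 0]) / 2
    return a, b, c, d

# Any 2x2 complex matrix works; here a random one.
rng = np.random.default_rng(0)
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
a, b, c, d = pauli_decompose(M)
reconstructed = a * I + b * X + c * Y + d * Z
print(np.allclose(reconstructed, M))  # True: the decomposition is exact
```

The same check passes for unitaries, Kraus operators, or any other 2x2 matrix, which is the point of the section.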
So every 2\times 2 matrix is a unique linear combination of \{I, X, Y, Z\}. In particular:
- Every single-qubit unitary U is a linear combination of Paulis.
- Every single-qubit Kraus operator K is a linear combination of Paulis.
- Every single-qubit error you can physically produce is a linear combination of Paulis.
This is the foundation on which the rest of the chapter is built.
A continuous error as a superposition
Pick a concrete continuous error: a tiny rotation of a single qubit about the x-axis by angle \epsilon. From the Pauli X, Y, Z chapter, you know this is the unitary

R_x(\epsilon) = e^{-i\epsilon X/2} = \cos(\epsilon/2)\,I - i\sin(\epsilon/2)\,X
Why the identity e^{-i\theta X} = \cos\theta\,I - i\sin\theta\,X: expand the exponential as a power series e^{-i\theta X} = I - i\theta X - \tfrac{\theta^2}{2} X^2 + \tfrac{i\theta^3}{6} X^3 - \cdots. Use X^2 = I to collapse even powers to I and odd powers to X. Grouping gives \cos\theta\,I - i\sin\theta\,X. Here \theta = \epsilon/2.
The coefficients: a = \cos(\epsilon/2) for the I term, b = -i\sin(\epsilon/2) for the X term (a is real, b is purely imaginary — Pauli coefficients are complex in general). For tiny \epsilon, the I coefficient is close to 1 and the X coefficient is close to 0.
So this specific continuous error, at the operator level, is a linear combination of the identity and X. Both are in the Pauli basis. Only two of the four Paulis are involved because R_x is a rotation about a Pauli axis; a general rotation R_{\hat n}(\epsilon) about an arbitrary axis would involve all four.
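The power-series identity and the resulting Pauli split can be verified numerically. A small sketch (the helper exp_minus_iH is mine — plain numpy has no matrix exponential, so it is built from an eigendecomposition):

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def exp_minus_iH(H):
    """exp(-i H) for Hermitian H, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return vecs @ np.diag(np.exp(-1j * vals)) @ vecs.conj().T

eps = 0.1
Rx = exp_minus_iH(eps * X / 2)                           # e^{-i eps X / 2}
decomp = np.cos(eps / 2) * I - 1j * np.sin(eps / 2) * X  # cos I - i sin X
print(np.allclose(Rx, decomp))  # True: the identity holds exactly

# The two branch weights are cos^2 and sin^2 of the half-angle; they sum to 1.
print(np.cos(eps / 2) ** 2 + np.sin(eps / 2) ** 2)
```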
Now apply this operator to an encoded state. Suppose you have a logical qubit |\psi\rangle_L = \alpha|0\rangle_L + \beta|1\rangle_L encoded in Shor's 9-qubit code (or any other QEC code that handles X errors on qubit j). Apply R_x(\epsilon) to the j-th physical qubit:

R_x^{(j)}(\epsilon)\,|\psi\rangle_L = \cos(\epsilon/2)\,|\psi\rangle_L - i\sin(\epsilon/2)\,X_j|\psi\rangle_L
This is a quantum superposition of two worlds: the world where nothing happened (amplitude \cos(\epsilon/2)) and the world where a clean X error hit qubit j (amplitude -i\sin(\epsilon/2)). The continuous error has become a superposition of discrete Pauli errors, linearly in the amplitudes.
The syndrome measurement — continuous becomes discrete
Here is where the magic happens. You do not measure the data qubits directly — that would destroy the logical information. Instead, you measure the syndrome: a collection of commuting Pauli operators whose outcomes tell you which error occurred without revealing \alpha or \beta.
For Shor's code, the syndromes are Z_iZ_j within each block (for X errors) and X_iX_j\cdots across blocks (for Z errors). For the bit-flip code, the syndromes are Z_0Z_1 and Z_1Z_2. In every QEC code, each syndrome operator has the property that:
- On the "no-error" branch, it returns +1 (eigenvalue).
- On the "X_j error" branch, some syndrome flips to -1, uniquely identifying the affected qubit.
Start from the superposition:

\cos(\epsilon/2)\,|\psi\rangle_L - i\sin(\epsilon/2)\,X_j|\psi\rangle_L
The "no-error" branch |\psi\rangle_L is a +1 eigenstate of every syndrome. The "X_j error" branch X_j|\psi\rangle_L is a -1 eigenstate of the syndrome that detects errors on qubit j (and +1 of the others). So the two branches belong to different eigenspaces of the syndrome operator.
Now measure the syndrome. By the usual rule of quantum measurement, the state collapses onto one of the eigenspaces:
- With probability |\cos(\epsilon/2)|^2 = \cos^2(\epsilon/2), the outcome is "no error", and the post-measurement state is |\psi\rangle_L — the original, untouched logical state.
- With probability |{-i}\sin(\epsilon/2)|^2 = \sin^2(\epsilon/2), the outcome is "X_j error", and the post-measurement state is X_j|\psi\rangle_L — a clean, discrete X error on qubit j, the kind the code was designed to correct.
Why the -i factor washes out of the probability: |{-i}\sin(\epsilon/2)|^2 = (-i)(i)\sin^2(\epsilon/2) = \sin^2(\epsilon/2). The complex-number modulus kills the phase; only the magnitude squared matters for probability.
This is the discretisation. Before the measurement, the error was continuous: a superposition of identity and X_j with continuous-valued amplitudes \cos(\epsilon/2) and -i\sin(\epsilon/2). After the measurement, the error is discrete: either nothing happened, or a clean X_j happened. No intermediate state; no partial rotation; no "half an error".
And the correction is correspondingly discrete. If the outcome was "no error", do nothing. If the outcome was "X_j error", apply X_j (which is its own inverse: X_j \cdot X_j = I). Either way, the logical state ends up as |\psi\rangle_L, exactly as it was before the rotation.
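The whole pipeline — continuous rotation, syndrome projection, Pauli recovery — can be simulated directly with state vectors. A sketch using the 3-qubit bit-flip code (stabilisers Z_0Z_1 and Z_1Z_2, which catch X errors; helper and variable names are mine):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(op, qubit, n=3):
    """Tensor a single-qubit operator into an n-qubit register."""
    ops = [I2] * n
    ops[qubit] = op
    out = ops[0]
    for o in ops[1:]:
        out = np.kron(out, o)
    return out

# Bit-flip code: |0>_L = |000>, |1>_L = |111>
zero_L = np.zeros(8, dtype=complex); zero_L[0] = 1  # |000>
one_L = np.zeros(8, dtype=complex); one_L[7] = 1    # |111>
alpha, beta = 0.6, 0.8                               # arbitrary logical state
psi = alpha * zero_L + beta * one_L

# Continuous error: R_x(eps) on physical qubit 1
eps = 0.3
Rx = np.cos(eps / 2) * I2 - 1j * np.sin(eps / 2) * X
noisy = embed(Rx, 1) @ psi

# Syndrome operators and their +/-1 eigenspace projectors
S1 = embed(Z, 0) @ embed(Z, 1)
S2 = embed(Z, 1) @ embed(Z, 2)

def proj(S, s):
    return (np.eye(8, dtype=complex) + s * S) / 2

# Outcome probabilities match cos^2 and sin^2 of the half-angle
p_no_error = np.linalg.norm(proj(S1, +1) @ proj(S2, +1) @ noisy) ** 2
p_x1_error = np.linalg.norm(proj(S1, -1) @ proj(S2, -1) @ noisy) ** 2
print(np.isclose(p_no_error, np.cos(eps / 2) ** 2))  # True
print(np.isclose(p_x1_error, np.sin(eps / 2) ** 2))  # True

# Collapse onto the X_1 branch and apply the X_1 correction
branch = proj(S1, -1) @ proj(S2, -1) @ noisy
branch = branch / np.linalg.norm(branch)
recovered = embed(X, 1) @ branch
fidelity = abs(np.vdot(psi, recovered)) ** 2
print(np.isclose(fidelity, 1.0))  # True: restored up to a global phase
```

The fidelity check compares states up to global phase, since the collapsed branch carries the -i factor that, as noted above, never affects probabilities.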
Why this matters. Without syndrome measurement, QEC would be impossible: measuring the data qubits directly to track the continuous drift would collapse the logical superposition, and the no-cloning theorem forbids keeping backup copies to compare against. Syndrome measurement is what converts the continuous error space into a discrete correction problem. It is not a side-effect of QEC; it is the central mechanism. Every modern code, from the surface code to the colour code to LDPC codes, works by the same trick: measure an operator whose eigenvalues index the discrete error classes, let measurement do the "collapsing", then apply a finite recovery.
The discretisation theorem — formal statement
Discretisation theorem
Let C be a quantum error-correcting code that corrects every single-qubit Pauli error \{I_j, X_j, Y_j, Z_j\} on each physical qubit j. Then C corrects every single-qubit error describable by any linear operator on that qubit — every unitary rotation, every Kraus operator of an amplitude-damping or depolarising channel, every possible single-qubit interaction with a noisy environment.
Proof. Let E_j be any single-qubit operator (linear map) acting on qubit j. Expand it in the Pauli basis:

E_j = a\,I + b\,X_j + c\,Y_j + d\,Z_j

for some complex a, b, c, d. Apply to the encoded state:

E_j\,|\psi\rangle_L = a\,|\psi\rangle_L + b\,X_j|\psi\rangle_L + c\,Y_j|\psi\rangle_L + d\,Z_j|\psi\rangle_L
This is a superposition of four branches, one for each Pauli. Each branch is a \pm 1 eigenstate of the various syndrome operators — with a unique syndrome pattern per branch, because the code by assumption corrects each Pauli.
Measuring the syndrome collapses the state onto one of the four branches:
- With probability |a|^2, onto |\psi\rangle_L (no error).
- With probability |b|^2, onto X_j|\psi\rangle_L.
- With probability |c|^2, onto Y_j|\psi\rangle_L.
- With probability |d|^2, onto Z_j|\psi\rangle_L.
(These probabilities sum to |a|^2 + |b|^2 + |c|^2 + |d|^2, which equals 1 when E_j is unitary. For a non-unitary Kraus operator the sum is less than 1; the missing probability belongs to the channel's other Kraus operators, and conditional on E_j having acted, the four syndrome outcomes occur in proportion to |a|^2, |b|^2, |c|^2, |d|^2.)
Apply the Pauli correction matching the measured syndrome. The state returns to |\psi\rangle_L. The code has corrected an arbitrary single-qubit error E_j using only a finite list of four corrections. \blacksquare
Why the non-unitary case still works: if E is a Kraus operator — part of a full channel \mathcal E = \sum_k K_k \rho K_k^\dagger — then each K_k expands as its own Pauli combination. Each branch has its own syndrome and its own correction. The total probability of "some syndrome fires" is 1 because the full channel is trace-preserving. The code handles every Kraus operator independently, by the same argument.
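Both cases can be checked numerically: for a unitary, the squared magnitudes of the Pauli coefficients sum to exactly 1; for a non-unitary Kraus operator they fall short. A sketch, using the amplitude-damping K_0 (introduced in Example 2 below) as the non-unitary case:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_coeffs(M):
    """(a, b, c, d) in M = aI + bX + cY + dZ, via (1/2) tr(P M)."""
    return [np.trace(P @ M) / 2 for P in (I, X, Y, Z)]

# Unitary error: squared coefficient magnitudes sum to exactly 1
theta = 0.7
U = np.cos(theta) * I - 1j * np.sin(theta) * X   # e^{-i theta X}
total_u = sum(abs(c) ** 2 for c in pauli_coeffs(U))
print(np.isclose(total_u, 1.0))  # True

# Non-unitary Kraus operator (amplitude-damping K_0): sum is less than 1
gamma = 0.2
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
total_k = sum(abs(c) ** 2 for c in pauli_coeffs(K0))
print(total_k)  # 1 - gamma/2 = 0.9: the remainder belongs to K_1
```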
Worked examples
Example 1: Phase-flip code catches a tiny Z rotation
Take the 3-qubit phase-flip code from phase-flip-code. Its logical states are |0\rangle_L = |{+}{+}{+}\rangle and |1\rangle_L = |{-}{-}{-}\rangle. Its stabilisers are X_0X_1 and X_1X_2, which flag Z errors. The code corrects any single Z error on one of the three qubits.
Now imagine a continuous Z error — a tiny rotation about the z-axis by angle \theta on qubit 1:

R_z(\theta) = e^{-i\theta Z/2} = \cos(\theta/2)\,I - i\sin(\theta/2)\,Z
Does the phase-flip code correct this?
Step 1. Apply the rotation to the encoded state. Start with an arbitrary logical state \alpha|0\rangle_L + \beta|1\rangle_L. Apply R_z(\theta) on qubit 1:

R_z^{(1)}(\theta)\,\big(\alpha|0\rangle_L + \beta|1\rangle_L\big) = \cos(\theta/2)\,\big(\alpha|0\rangle_L + \beta|1\rangle_L\big) - i\sin(\theta/2)\,Z_1\big(\alpha|0\rangle_L + \beta|1\rangle_L\big)
A superposition of two branches: identity (with amplitude \cos(\theta/2)) and Z_1 error (with amplitude -i\sin(\theta/2)). Why splitting works: linearity. The rotation acts on physical qubit 1 only, and it distributes across the linear combination \alpha|0\rangle_L + \beta|1\rangle_L.
Step 2. Measure the syndrome. For the phase-flip code, a Z_1 error anticommutes with the stabiliser X_0X_1: (X_0X_1)(Z_1) = -Z_1(X_0X_1), because X and Z anticommute on qubit 1. So X_0X_1 has eigenvalue -1 on the Z_1-error branch and +1 on the no-error branch. The second stabiliser X_1X_2 also anticommutes with Z_1 (same reasoning), so it also reads -1 on the error branch.
Syndrome (s_1, s_2) = (+1, +1) or (-1, -1). Identifying the error:
- (+1, +1): no error → do nothing.
- (-1, -1): Z_1 error → apply Z_1 correction.
Step 3. Compute the probabilities.
- P(\text{no error}) = |\cos(\theta/2)|^2 = \cos^2(\theta/2).
- P(Z_1\text{ error}) = |-i\sin(\theta/2)|^2 = \sin^2(\theta/2).
For small \theta, \sin^2(\theta/2) \approx \theta^2/4 — quadratically suppressed in the rotation angle.
Step 4. Apply the correction. Either way, the state is restored. If "no error", the state is already \alpha|0\rangle_L + \beta|1\rangle_L. If "Z_1 error", apply Z_1 to get back:

Z_1\,\big(Z_1(\alpha|0\rangle_L + \beta|1\rangle_L)\big) = \alpha|0\rangle_L + \beta|1\rangle_L
Why Z_1 Z_1 = I: the Pauli matrices square to identity, Z^2 = I. So applying Z_1 twice is the identity. Correcting a Z_1 error means applying Z_1.
Result. The phase-flip code correctly handles the continuous rotation R_z(\theta) on qubit 1, collapsing it to either the no-error outcome (probability \cos^2(\theta/2)) or a clean Z_1 outcome (probability \sin^2(\theta/2)). Both outcomes restore the logical state after correction.
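Example 1 can be reproduced with a small state-vector simulation of the phase-flip code (helper and variable names are mine):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
minus = np.array([1, -1], dtype=complex) / np.sqrt(2)

def embed(op, qubit, n=3):
    """Tensor a single-qubit operator into an n-qubit register."""
    ops = [I2] * n
    ops[qubit] = op
    out = ops[0]
    for o in ops[1:]:
        out = np.kron(out, o)
    return out

# Phase-flip code: |0>_L = |+++>, |1>_L = |--->
zero_L = np.kron(np.kron(plus, plus), plus)
one_L = np.kron(np.kron(minus, minus), minus)
alpha, beta = 0.6, 0.8
psi = alpha * zero_L + beta * one_L

# Continuous error: R_z(theta) on qubit 1
theta = 0.4
Rz = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * Z
noisy = embed(Rz, 1) @ psi

# Stabilisers X0X1 and X1X2; project onto the (-1, -1) syndrome
S1 = embed(X, 0) @ embed(X, 1)
S2 = embed(X, 1) @ embed(X, 2)

def proj(S, s):
    return (np.eye(8, dtype=complex) + s * S) / 2

p_z1 = np.linalg.norm(proj(S1, -1) @ proj(S2, -1) @ noisy) ** 2
print(np.isclose(p_z1, np.sin(theta / 2) ** 2))  # True: P(Z_1) = sin^2(theta/2)

# Collapse, correct with Z_1, and check the logical state is restored
branch = proj(S1, -1) @ proj(S2, -1) @ noisy
branch = branch / np.linalg.norm(branch)
recovered = embed(Z, 1) @ branch
fidelity = abs(np.vdot(psi, recovered)) ** 2
print(np.isclose(fidelity, 1.0))  # True: restored up to a global phase
```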
Example 2: Amplitude damping on one qubit of the bit-flip code
Amplitude damping is the physical process by which a qubit loses energy to its environment — the |1\rangle state decays toward |0\rangle with probability \gamma per unit time. Its Kraus operators are

K_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}, \qquad K_1 = \sqrt{\gamma}\,|0\rangle\langle 1| = \begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}

K_0 is "no decay happened" (but the |1\rangle amplitude has shrunk by a factor \sqrt{1-\gamma}). K_1 is "decay happened" — the qubit dropped to |0\rangle and emitted a photon.
The bit-flip code catches X errors on one of three qubits, using Z-stabilisers. Will it catch amplitude damping?
Step 1. Expand K_0 in Paulis. Substitute into the decomposition formulas:

K_0 = \frac{1+\sqrt{1-\gamma}}{2}\,I + \frac{1-\sqrt{1-\gamma}}{2}\,Z
Why only I and Z: K_0 is diagonal, so b = 0 (no off-diagonal X-type terms) and c = 0 (no Y-type terms). Only I and Z contribute, matching what you would get by symmetry.
For small \gamma, \sqrt{1-\gamma} \approx 1 - \gamma/2, so K_0 \approx (1 - \gamma/4)I + (\gamma/4)Z. The identity part dominates; a small Z component shows up.
Step 2. Expand K_1 in Paulis. K_1 = \sqrt\gamma\,|0\rangle\langle 1|. The outer product |0\rangle\langle 1| = \tfrac{1}{2}(X + iY). So

K_1 = \frac{\sqrt{\gamma}}{2}\,(X + iY)
Why |0\rangle\langle 1| = (X + iY)/2: check the matrix |0\rangle\langle 1| = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. The combination (X + iY)/2 = (\begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix})/2 + i(\begin{smallmatrix} 0 & -i \\ i & 0 \end{smallmatrix})/2 = (\begin{smallmatrix} 0 & 1/2 \\ 1/2 & 0 \end{smallmatrix}) + (\begin{smallmatrix} 0 & 1/2 \\ -1/2 & 0 \end{smallmatrix}) = (\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}). Correct.
Step 3. Apply to encoded state. Suppose K_0 acts on qubit j of the bit-flip-encoded state |\psi\rangle_L:

K_0^{(j)}\,|\psi\rangle_L = \frac{1+\sqrt{1-\gamma}}{2}\,|\psi\rangle_L + \frac{1-\sqrt{1-\gamma}}{2}\,Z_j|\psi\rangle_L
Two branches: I and Z_j. But the bit-flip code does not detect Z errors — its stabilisers Z_0Z_1 and Z_1Z_2 both commute with any single Z_j. So the syndrome is (+1, +1) on both branches, and the measurement does nothing. The Z component of amplitude damping leaks through the bit-flip code undetected.
Similarly, K_1 = \tfrac{\sqrt\gamma}{2}(X + iY) splits into X and Y branches. The X_j branch is caught by the bit-flip syndrome. The Y_j = iX_jZ_j branch triggers the same syndrome as X_j, because the Z_j factor is invisible to the stabilisers; the X_j correction then leaves a residual Z_j error behind.
Step 4. Interpret. The bit-flip code corrects the X-component of amplitude damping but leaves the Z-component untouched. This is why the bit-flip code alone is not sufficient for realistic noise — amplitude damping has both components, and you need a code like Shor's or Steane's that handles both X and Z errors.
Result. Amplitude damping decomposes as a mixture of I, X, Y, and Z operations on qubit j. Shor's code (which handles all four) corrects amplitude damping exactly. The bit-flip code (which handles only X) corrects only the X and I components, leaving the Z and Y components as uncorrected residual errors. This is a concrete demonstration of why general codes (Shor, Steane, surface) are needed for realistic quantum hardware.
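The Pauli decompositions of K_0 and K_1, and the trace preservation of the full channel, can all be checked numerically (a minimal sketch):

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

gamma = 0.1
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
s = np.sqrt(1 - gamma)

# K0 is an I/Z mixture, K1 an X/Y mixture
k0_ok = np.allclose(K0, (1 + s) / 2 * I + (1 - s) / 2 * Z)
k1_ok = np.allclose(K1, np.sqrt(gamma) / 2 * (X + 1j * Y))
print(k0_ok, k1_ok)  # True True

# The full channel is trace-preserving: K0†K0 + K1†K1 = I
tp = np.allclose(K0.conj().T @ K0 + K1.conj().T @ K1, I)
print(tp)  # True
```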
Common confusions
- "Discretisation means continuous errors become small, not zero." No — discretisation means continuous errors become discrete, not small. After syndrome measurement, you are in one of a finite number of Pauli-error branches. The probability of each branch depends on the continuous parameters (e.g. \cos^2(\epsilon/2)), but the branches themselves are discrete. Correction restores the logical state exactly, not approximately.
- "Syndrome measurement is itself a quantum operation — isn't there residual quantum-ness?" Syndrome measurement is a projective measurement whose outcome is a classical bit (or set of bits). After the measurement, the syndrome value is purely classical — you can write it down on paper. The "quantum-ness" of the superposition is what the measurement collapses; what remains is a clean, discrete, classical syndrome pattern.
- "The discretisation theorem says continuous errors cannot hurt you." It says continuous single-qubit errors cannot hurt a code that corrects single-qubit Paulis. Multi-qubit correlated errors — where two or more qubits experience noise at the same time — are a separate problem, handled by codes with higher distance (surface code, concatenated Steane code, etc.). The discretisation applies per qubit; correlated noise must stay within the code's correction capability, which is set by the code's distance. See why QEC is hard for the distance concept.
- "Before the syndrome measurement, the error is real; the measurement just reveals it." No. Before the syndrome, the state is in a genuine quantum superposition of no-error and error branches. The amplitudes \cos(\epsilon/2) and -i\sin(\epsilon/2) interfere coherently. Only after the measurement does the superposition collapse to one branch. This is the same as the standard measurement collapse for any qubit, applied to the syndrome subsystem.
- "The discretisation theorem is an approximation." It is exact. No approximations, no small-\epsilon assumptions, no limits. Any single-qubit operator whatsoever decomposes exactly as aI + bX + cY + dZ with exact complex coefficients, and syndrome measurement exactly collapses the corresponding four-branch superposition. The recovery exactly restores the logical state. The continuous-to-discrete transition is a consequence of quantum measurement being a discrete-outcome operation, which is exact.
Going deeper
If you have followed the main argument — continuous errors decompose into Paulis, syndrome measurement collapses the superposition, correction is discrete — you have the heart of this chapter. This section adds the formal proof, the connection to the stabiliser formalism (the Pauli group and stabilisers chapter), and the links to the threshold theorem and fault-tolerant quantum computing.
The formal discretisation theorem — slightly more precise
Let C be an [[n, k, d]] stabiliser code (see Pauli group and stabilisers) that corrects every Pauli error of weight at most \lfloor (d-1)/2 \rfloor. Let \mathcal E be any quantum channel whose Kraus operators have Pauli-basis expansions supported on correctable Pauli patterns (i.e. on Paulis of weight at most \lfloor (d-1)/2 \rfloor).
Claim. C corrects \mathcal E exactly.
Proof sketch. Each Kraus operator K_k has Pauli expansion K_k = \sum_P a_{k,P}\,P with the sum restricted to correctable P. Apply K_k to an encoded state, measure the syndrome, and apply the matching Pauli recovery P^{-1}. By the Knill-Laflamme conditions (\langle\psi_i|P_a^\dagger P_b|\psi_j\rangle = c_{ab}\delta_{ij} for any correctable P_a, P_b and code-basis states |\psi_i\rangle), the recovery composes with each branch correctly, giving back the logical state on the support of the code. Summing over Kraus operators k and Pauli branches P, the channel \mathcal E composed with the recovery yields the identity on the code subspace. QED.
The Knill-Laflamme conditions, proved in 1997, give the exact algebraic characterisation of "what does this code correct". The discretisation theorem is the statement that the conditions apply branch-by-branch in the Pauli expansion; no continuous-error subtlety survives.
Why {I, X, Y, Z} is a basis — a slicker argument
There is a prettier way to see that the Pauli matrices span the space of 2\times 2 matrices, via the trace inner product on matrices. Define \langle A, B\rangle = \tfrac{1}{2}\,\text{tr}(A^\dagger B) for 2\times 2 matrices A, B. Check that the four Paulis are orthonormal under this inner product:

\langle I, I\rangle = \tfrac{1}{2}\text{tr}(I) = 1, \qquad \langle X, X\rangle = \tfrac{1}{2}\text{tr}(X^\dagger X) = \tfrac{1}{2}\text{tr}(I) = 1,

and similarly for Y, Z. And cross-products \langle I, X\rangle = \tfrac{1}{2}\text{tr}(X) = 0, and so on — every Pauli has zero trace except I, and products of two different Paulis have zero trace. Four orthonormal vectors in a four-dimensional space are automatically a basis, so every 2\times 2 matrix M expands as

M = \sum_{P \in \{I, X, Y, Z\}} \langle P, M\rangle\,P = \tfrac{1}{2}\sum_{P} \text{tr}(P M)\,P

(the second form uses P^\dagger = P).
Same answer as the linear-algebra computation earlier, but via inner-product geometry. For multi-qubit systems, the analogous basis is all tensor products of Paulis, and the same argument shows they are orthonormal under the generalised trace inner product.
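The orthonormality and expansion claims can be verified directly (a sketch):

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (I, X, Y, Z)

# Gram matrix under <A, B> = (1/2) tr(A† B): should be the 4x4 identity
gram = np.array([[np.trace(A.conj().T @ B) / 2 for B in paulis]
                 for A in paulis])
print(np.allclose(gram, np.eye(4)))  # True: orthonormal

# Expansion M = sum_P <P, M> P recovers any 2x2 matrix
rng = np.random.default_rng(3)
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
expansion = sum(np.trace(P.conj().T @ M) / 2 * P for P in paulis)
print(np.allclose(expansion, M))  # True
```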
Extension to multi-qubit codes — the Pauli group on n qubits
The discretisation theorem extends exactly to multi-qubit codes. The Pauli group on n qubits, \mathcal P_n, consists of tensor products of single-qubit Paulis (with phase factors \pm 1, \pm i). The group has 4 \cdot 4^n elements. Pauli operators on n qubits span the space of 2^n \times 2^n matrices via the same trace-orthonormality argument.
Any n-qubit error E decomposes as E = \sum_{P \in \mathcal P_n} a_P\,P. Syndrome measurement in an [[n, k, d]] stabiliser code projects onto one Pauli branch; the matching Pauli is applied as recovery. The discretisation theorem says: if the code corrects each Pauli in the support of E, then it corrects E exactly.
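The n-qubit extension can be sketched for n = 2: build all 16 tensor products of Paulis and decompose an arbitrary 4\times 4 operator (the phase factors \pm 1, \pm i are omitted since they do not change the span):

```python
import numpy as np
from itertools import product

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

n = 2
# The 4^n tensor products of single-qubit Paulis form a basis
# for all 2^n x 2^n matrices.
basis = [np.kron(A, B) for A, B in product((I, X, Y, Z), repeat=n)]

rng = np.random.default_rng(7)
E = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Coefficients via the generalised trace inner product tr(P† E) / 2^n
coeffs = [np.trace(P.conj().T @ E) / 2 ** n for P in basis]
reconstruction = sum(c * P for c, P in zip(coeffs, basis))
print(np.allclose(reconstruction, E))  # True: exact decomposition
```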
This is the basis for all stabiliser codes, which is the subject of Pauli group and stabilisers.
Discretisation and the threshold theorem
The threshold theorem (Aharonov-Ben-Or 1996, Kitaev 1997, Knill-Laflamme-Zurek 1998) says that below a critical physical error rate p_{\text{th}} (somewhere between 10^{-3} and 10^{-2} depending on the code and fault-tolerance protocol), concatenating a QEC code produces exponentially suppressed logical errors with polylogarithmic overhead. The theorem depends on the discretisation in a critical way: without discretisation, each level of concatenation would contribute continuously drifting errors that accumulate, and the suppression would not be exponential. With discretisation, each level contributes only discrete Pauli errors, whose effective rate drops from p to roughly p^2/p_{\text{th}} per level, which the next level can correct.
The threshold theorem's conclusion — "arbitrary-precision quantum computation is possible in principle, given p < p_{\text{th}} physical error rates" — rests directly on the discretisation theorem. Without the continuous-to-discrete conversion at each level, fault-tolerance would be impossible in the strict sense.
Non-Pauli errors and leakage
One important caveat: the discretisation theorem applies to errors that are representable as linear operators on the qubit's 2D Hilbert space. Real hardware has leakage — transitions out of the computational subspace into higher excited states (e.g., |0\rangle, |1\rangle \to |2\rangle in a three-level system). Leakage is not a 2\times 2 matrix operation and therefore not captured by the Pauli-basis expansion.
Leakage has to be handled separately, either by engineering qubits with well-separated levels (hard) or by adding leakage-reduction units that re-pump leaked population back into the qubit subspace (active protocols). This is an ongoing research area. The discretisation theorem gives you fault-tolerance against anything that stays in the qubit subspace; leakage is outside that scope.
The deep reason the theorem holds — measurement projects
Zooming out: the discretisation theorem is ultimately a consequence of a single fact about quantum mechanics — measurement outcomes are discrete. When you measure an observable with a discrete spectrum, the post-measurement state is in one of finitely many eigenspaces, and the outcome is one of finitely many eigenvalues. This is the "collapse" postulate of quantum mechanics.
Syndrome measurement is one specific application. The observables being measured (the stabiliser generators) each have a spectrum of \{+1, -1\}, which is discrete. The outcome is a string of \pm 1's — a classical syndrome. The continuous-to-discrete transition is the measurement postulate in action.
So the discretisation theorem is not really a theorem about error correction — it is a theorem about quantum measurement, specialised to the syndrome-measurement setting. Every QEC code is built on this single quantum-mechanical fact.
Where this leads next
- Pauli X, Y, Z — the three single-qubit Pauli operators that, together with the identity, span all single-qubit operators.
- Why QEC is hard — the three walls (no-cloning, continuous errors, measurement collapse); this chapter resolves the second one.
- Bit-flip code and Phase-flip code — the two 3-qubit codes whose discretisation behaviour is worked out above.
- Shor 9-qubit code — the first code that corrects every single-qubit error, built on the discretisation theorem.
- Pauli group and stabilisers — the next chapter, building the group-theoretic framework in which discretisation lives.
- Threshold theorem — the fault-tolerance result that rests on discretisation.
References
- Peter Shor, Scheme for reducing decoherence in quantum computer memory (1995), Phys. Rev. A 52, R2493 — arXiv:quant-ph/9508027. The paper in which the discretisation of errors first appears as an argument, en route to the 9-qubit code.
- Andrew Steane, Multiple particle interference and quantum error correction (1996) — arXiv:quant-ph/9601029. The 7-qubit code paper; gives the discretisation argument cleanly in §2.
- Daniel Gottesman, Stabilizer codes and quantum error correction (PhD thesis, 1997) — arXiv:quant-ph/9705052. The thesis that formalises stabiliser codes and makes the discretisation theorem into its standard modern form.
- John Preskill, Lecture Notes on Quantum Computation, Chapter 7 — theory.caltech.edu/~preskill/ph229. Pedagogical presentation of the discretisation argument as part of the QEC framework.
- Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §10.2 — Cambridge University Press. Standard textbook treatment; theorem 10.2 is the discretisation statement.
- Wikipedia, Quantum error correction — overview including the discretisation theorem and its consequences.