In short

Error mitigation is a collection of classical post-processing techniques that take noisy quantum measurements and extract better estimates of the expectation values a noiseless circuit would have produced. It is specifically a NISQ-era toolkit — a pragmatic response to the reality that today's hardware has per-gate errors of 10^{-3} to 10^{-2} and full quantum error correction needs thousands of physical qubits per logical one. The main techniques: zero-noise extrapolation (ZNE) runs the circuit at multiple deliberately-amplified noise levels (by stretching gate durations or inserting G G^\dagger pairs that algebraically cancel but pick up noise) and extrapolates the observed expectation value back to the zero-noise limit; probabilistic error cancellation (PEC) characterises the noise channel to high accuracy and samples from a quasi-probability distribution whose expected value inverts the noise; readout-error calibration measures a classical bit-flip confusion matrix and inverts it in software; symmetry verification post-selects only the measurement outcomes that preserve known conserved quantities (particle number, parity); dynamical decoupling inserts identity-equivalent gate sequences during idle time to average out low-frequency coherent errors. Error mitigation does not replace error correction — it estimates expectation values, not pure states, and its shot-overhead grows exponentially with circuit depth. But for the shallow circuits NISQ machines can run, it routinely recovers signal that would otherwise be buried. Even on future fault-tolerant computers, error mitigation will sit on top of error correction, polishing the last bit of residual logical noise.

A NISQ quantum computer running a VQE calculation on a small molecule has, in principle, done something remarkable: it has prepared a quantum state that encodes the ground-state energy of the molecular Hamiltonian. In practice, when you measure that state to extract the energy, the number you read off is wrong — not catastrophically wrong, but wrong by a few milli-Hartree, more than the roughly 1.6 milli-Hartree threshold of chemical accuracy, so the answer is chemically useless. The circuit's gates have collectively introduced noise: every two-qubit gate has a 0.3% chance of failing; the measurement misreads each qubit a couple of percent of the time; idle qubits are decohering while other qubits are being operated on. The clean state you wanted never quite materialised. What you measured is a shadow of it.

If you had a fault-tolerant quantum computer — hundreds of logical qubits, each built from thousands of physical qubits with active syndrome measurement and correction — you could run the circuit as written and get the clean answer. You do not have that machine. You will not have it until the 2030s. Meanwhile you want to run VQE now, and extract chemistry results now, and not wait.

Error mitigation is the toolkit that answers this demand. It is not error correction. It does not prevent errors or correct errors mid-circuit. It accepts that your circuit will run noisily and uses clever classical post-processing — extrapolation, quasi-probabilistic cancellation, calibration inversion, post-selection — to extract a better estimate of the expectation value a noiseless circuit would have produced. The techniques are pragmatic, approximate, and NISQ-specific. They are also, as of 2025, the single most important reason that results from NISQ hardware are anywhere near useful.

This chapter explains the five main mitigation techniques, why each one works, where each one breaks, and why — even once fault-tolerant machines arrive — error mitigation will remain a complementary polish rather than a deprecated predecessor.

Error mitigation is not error correction

Before the techniques, the crucial distinction. Read these two sentences slowly:

[Figure: Error correction vs error mitigation — different animals. Side-by-side comparison.]

Quantum error correction (fault-tolerant era):
  • encode: 1 logical qubit per O(d^2) physical qubits
  • measure syndromes each cycle
  • decode errors in real time, apply corrections during runtime
  • output: a clean quantum state
  • works for arbitrary-depth circuits
  • needs error rate below threshold, p < p_{\text{th}}
  • high qubit overhead (100s-1000s)
  • status: single-logical-qubit demos (Willow)

Quantum error mitigation (NISQ era):
  • no encoding — bare physical qubits
  • circuit runs as written, noisy
  • many shots, varied conditions
  • classical post-processing only
  • output: an estimated expectation value
  • limited to shallow circuits
  • no threshold requirement
  • low qubit overhead
  • status: in daily production use
Error correction aims for a clean quantum state you can use for arbitrary downstream quantum operations. Error mitigation aims for a scalar number — the expectation value of an observable — that you can use for classical post-processing. Different outputs, different costs, different applicability. They are complementary, not competing.

The single most important consequence: error mitigation can only help you estimate classical numbers read off the quantum computer. If your algorithm outputs an expectation value \langle \psi | O | \psi \rangle — as VQE, QAOA, quantum kernel estimation, and most NISQ-friendly algorithms do — mitigation is your friend. If your algorithm produces a quantum state that must feed into further quantum operations (as in Shor's algorithm, or a fault-tolerant subroutine), mitigation cannot help, and only error correction can make the state clean enough to use.

Technique 1: zero-noise extrapolation (ZNE)

The simplest and most widely-deployed mitigation technique. The idea is straightforward: if you cannot run the circuit at zero noise, run it at several amplified noise levels and extrapolate to zero.

The setup

Let \langle O \rangle(\lambda) be the expectation value of observable O when the circuit is run with noise scaled by a factor \lambda \geq 1. At \lambda = 1, you run the circuit natively on the hardware. At \lambda = 2, 3, \ldots, you run a modified version whose effective noise is 2 or 3 times larger.

If the noise is weak enough that you can Taylor-expand the expectation value in \lambda:

\langle O \rangle(\lambda) = \langle O \rangle(0) + c_1 \lambda + c_2 \lambda^2 + \ldots

then you can fit the coefficients from measurements at a few \lambda values and read off \langle O \rangle(0) — the noiseless expectation value — as the intercept of the fit.

Why this works: in most noise models, the error in an expectation value is analytic in the noise strength. A few data points in the regime where the expansion is accurate pin down the function; extrapolation to \lambda = 0 recovers the clean value.
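The fit-and-extrapolate step can be sketched in a few lines of numpy. The synthetic data below assumes an exponential decay \langle O \rangle(\lambda) = 0.7 e^{-0.3\lambda} (the 0.7 and 0.3 are illustrative, not from any real device); a linear fit is only an approximation to that curve, which is exactly the model-bias trade-off discussed below:

```python
import numpy as np

# Synthetic noisy expectation values: assume <O>(lambda) = 0.7 * exp(-0.3 * lambda),
# so the zero-noise truth is <O>(0) = 0.7 (illustrative numbers only).
lams = np.array([1.0, 2.0, 3.0])
obs = 0.7 * np.exp(-0.3 * lams)

# Linear fit <O>(lambda) ~ a + b*lambda; the intercept a is the ZNE estimate.
b, a = np.polyfit(lams, obs, deg=1)

raw_error = abs(obs[0] - 0.7)   # error of the unmitigated lambda = 1 value
zne_error = abs(a - 0.7)        # error of the extrapolated estimate
print(f"raw <O>(1) = {obs[0]:.4f}, ZNE intercept = {a:.4f}")
# The linear model misses the curvature of the exponential, so the intercept
# is biased low -- but it is much closer to 0.7 than the raw value.
```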

Amplifying the noise

The technical question is: how do you amplify the noise without running on different hardware? Two popular techniques:

Pulse stretching — on a platform where gates are implemented by microwave pulses of duration T, running the same gate for 2T or 3T (with rescaled amplitude) increases the exposure to decoherence proportionally. Noise scales roughly linearly with duration.

Identity-insertion — for any gate G, the single gate G and the three-gate sequence G G^\dagger G are algebraically identical: same noiseless action, different noise. The three-gate version picks up roughly three times the single-gate noise. Replacing every gate G in the circuit with G G^\dagger G therefore triples the effective noise (\lambda = 3); folding only a fraction of the gates gives intermediate \lambda.
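The algebraic identity behind gate folding is easy to check directly; a minimal numpy sketch (the rotation angle is arbitrary):

```python
import numpy as np

# A generic single-qubit gate: rotation about X by an arbitrary angle.
theta = 0.7
X = np.array([[0, 1], [1, 0]], dtype=complex)
G = np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X

# Identity insertion: G -> G Gdag G has exactly the same noiseless action...
folded = G @ G.conj().T @ G
assert np.allclose(folded, G)

# ...but executes three physical gates instead of one, so on hardware it
# accumulates roughly three times the gate noise. Folding every gate takes
# lambda from 1 to 3; folding half of them gives lambda = 2.
```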

[Figure: Zero-noise extrapolation — run noisy, fit, project back. Noise amplification factor \lambda on the x-axis (0 to 3), \langle O \rangle on the y-axis. Three observed points (1, 0.55), (2, 0.42), (3, 0.32); the linear fit through them, extended back to \lambda = 0, gives the extrapolated zero-noise estimate 0.66. A dashed horizontal line at 0.70 marks the noiseless truth — close but not exact.]
ZNE in action. Three noisy measurements at $\lambda = 1, 2, 3$ are fitted (here with a simple linear model) and extrapolated to $\lambda = 0$. The extrapolated value ($0.66$) is close to the true noiseless value ($0.70$) but imperfect — higher-order terms in $\lambda$ are a bias source. The technique improves with more fit points and better noise models, and is the default first-line mitigation on IBM and Google hardware in 2024-2025.

The cost and the bias

ZNE has two enemies.

Statistical variance: each data point at each \lambda is measured by a finite number of shots. The standard deviation of the extrapolated value can be larger than the standard deviations of the individual points (extrapolation amplifies error bars). To get accuracy \varepsilon on the extrapolated value, you need roughly O(\varepsilon^{-2}) shots per \lambda point — typical NISQ mitigation budgets sit at 10^4 to 10^6 shots.

Systematic bias: if the true \lambda-dependence of \langle O \rangle has high-order terms (quadratic, exponential), a linear fit introduces bias. Common remedies: fit a richer model (polynomial of degree 2, exponential), or restrict to small \lambda where the linear approximation is valid.

The technique is cheap to implement (no additional hardware calibration needed beyond knowing how to stretch pulses) and widely applicable (works for any observable, any circuit). It is the entry-level mitigation technique and the one most NISQ algorithm papers use.

Technique 2: probabilistic error cancellation (PEC)

ZNE is cheap but approximate. PEC is expensive but, in principle, exact — it cancels the noise channel rather than extrapolating around it.

The idea

Every noisy gate implements a channel \mathcal{E} = \mathcal{N} \circ \mathcal{U} where \mathcal{U} is the ideal gate and \mathcal{N} is the noise channel (a CPTP map like depolarising or amplitude damping). If you could apply \mathcal{N}^{-1} after every noisy gate, you would have undone the noise and recovered \mathcal{U}.

The catch: \mathcal{N}^{-1} is not in general a CPTP map. You cannot physically implement it as a quantum operation. But you can write \mathcal{N}^{-1} as a quasi-probability decomposition — a signed linear combination of implementable channels:

\mathcal{N}^{-1} = \sum_j q_j \mathcal{O}_j, \qquad \sum_j q_j = 1, \quad q_j \in \mathbb{R}.

Some q_j are negative. That is the key: if all q_j were non-negative the decomposition would be a standard channel and there would be no need for mitigation. The negativity is the quantum-correction signature.

How to sample from it

To compute \text{Tr}[O \rho'] where \rho' = \mathcal{N}^{-1}(\rho) is the noise-corrected state (which is not a physical state, but that is OK — we never prepare it):

  1. Define total negativity \gamma = \sum_j |q_j| and probabilities p_j = |q_j|/\gamma.
  2. For each shot: sample a channel index j according to p_j; apply \mathcal{O}_j to the noisy qubits; measure O.
  3. Multiply the measured value by \gamma \cdot \text{sign}(q_j) — this is the per-shot quasi-probability weight.
  4. Average over many shots.

The expected average converges to \text{Tr}[O \rho']. The sign flip handles the negative q_j; the factor \gamma rescales.
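The whole recipe can be checked numerically for a single-qubit depolarising channel. The sketch below (numpy only; p = 0.02 chosen for illustration) builds the noise superoperator, inverts it exactly, decomposes the inverse into Pauli conjugations to read off the quasi-probabilities q_j and the negativity \gamma, and confirms that the inverse map restores \langle Z \rangle:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

# Superoperator of rho -> K rho K^dag under column-stacking vectorisation.
conj_super = lambda K: np.kron(K.conj(), K)

p = 0.02
# Depolarising channel N(rho) = (1-p) rho + p/3 (X rho X + Y rho Y + Z rho Z).
S_noise = (1 - p) * conj_super(I2) + (p / 3) * sum(conj_super(P) for P in paulis[1:])
S_inv = np.linalg.inv(S_noise)        # exact inverse map (not CPTP!)

# Decompose N^{-1} = sum_j q_j P_j rho P_j and read off the negativity gamma.
basis = np.column_stack([conj_super(P).flatten() for P in paulis])
q, *_ = np.linalg.lstsq(basis, S_inv.flatten(), rcond=None)
q = q.real
gamma = np.abs(q).sum()
print("q =", np.round(q, 4), "gamma =", round(gamma, 4))
# Some q_j are negative -- the signature that N^{-1} is not a physical channel.

# Sanity check: <Z> on |0><0| is degraded by the noise and restored by N^{-1}.
rho_vec = np.array([1, 0, 0, 0], dtype=complex)            # vec(|0><0|)
z_of = lambda v: np.trace(Z @ v.reshape(2, 2, order="F")).real
assert np.isclose(z_of(S_noise @ rho_vec), 1 - 4 * p / 3)  # noisy <Z>
assert np.isclose(z_of(S_inv @ S_noise @ rho_vec), 1.0)    # corrected <Z>
```

In a real PEC run the correction is not applied as a linear map; it is sampled shot-by-shot exactly as in steps 1-4, with the \gamma \cdot \text{sign}(q_j) weights reproducing this average.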

The sampling-overhead cost

Here is the critical limitation of PEC. The variance of the estimator per shot grows as \gamma^2 per noisy gate. For a circuit with k noisy gates, the variance is multiplied by \gamma^{2k}, so to achieve a fixed accuracy the number of shots must grow as \gamma^{2k}.

For a typical depolarising channel with error rate p = 10^{-2}, \gamma \approx 1 + 2p \approx 1.02. For a 100-gate circuit: \gamma^{200} \approx (1.02)^{200} \approx 52 — a 50\times shot overhead, manageable. For a 1000-gate circuit: \gamma^{2000} \approx 1.6 \times 10^{17} — infeasible.

PEC's shot overhead grows exponentially in circuit depth. This is the hard theoretical wall that all error mitigation hits, and it is why mitigation cannot substitute for error correction on large circuits.
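The back-of-envelope numbers above are worth checking; a tiny sketch using the text's leading-order \gamma \approx 1 + 2p for depolarising noise:

```python
def pec_shot_overhead(p: float, k: int) -> float:
    """Approximate PEC shot overhead gamma^(2k) for k gates with per-gate error p."""
    gamma = 1 + 2 * p    # leading-order negativity for a depolarising channel
    return gamma ** (2 * k)

print(f"{pec_shot_overhead(0.01, 100):.0f}x")     # ~52x: manageable
print(f"{pec_shot_overhead(0.01, 1000):.1e}x")    # ~1.6e17x: infeasible
```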

[Figure: PEC — inverting noise with signed channels. A noisy gate $\mathcal{E} = \mathcal{N} \circ \mathcal{U}$ is corrected via the quasi-probability identity $\mathcal{U} = \gamma \sum_j \operatorname{sign}(q_j)\, p_j \,\mathcal{O}_j \circ \mathcal{E}$. Example per-gate decomposition: $\mathcal{N}^{-1} = 1.3\,\mathcal{I} - 0.2\,\mathcal{X} - 0.1\,\mathcal{Z}$, negativity $\gamma = 1.3 + 0.2 + 0.1 = 1.6$; the negative weights are the "quantum correction" signature. Sampling overhead for $k$ noisy gates: shots $\propto \gamma^{2k}$ — exponential in circuit depth, the hard wall.]
PEC in one picture. Each noisy gate gets a quasi-probability decomposition of its inverse noise. Per-shot the algorithm samples one of the $\mathcal{O}_j$ according to $p_j = |q_j|/\gamma$ and weights the measurement by $\gamma \cdot \text{sign}(q_j)$. The shot-overhead multiplies gate-by-gate, giving an exponential-in-depth cost that is the fundamental reason PEC only works for shallow circuits.

When PEC works

PEC is more accurate than ZNE when you have a good noise model. For shallow circuits on well-characterised hardware (Quantinuum ion traps, recent IBM devices after high-quality tomography), PEC can recover expectation values to sub-percent accuracy. It is routinely used in production NISQ pipelines for small problems.

PEC fails when: the noise model is wrong (quasi-probability inversion is sensitive to mis-characterisation), the circuit is deep (exponential overhead), or when correlations between gates are stronger than the independent-channel approximation assumes.

Technique 3: readout-error calibration

A simpler, cheaper technique, often stacked on top of ZNE or PEC.

Every real measurement has a confusion matrix M where M_{ij} is the probability of reading outcome i when the true state was |j\rangle. For a single qubit M is 2 \times 2; for n qubits M is 2^n \times 2^n. Measurement errors are typically 1%-3% per qubit.

Calibration procedure:

  1. Prepare each computational-basis state |j\rangle deterministically (by applying the appropriate X gates to |0\ldots 0\rangle).
  2. Measure and record the empirical outcome distribution — that is a column of M.
  3. Repeat for all 2^n basis states to fill M.

Inversion in post-processing: given a measured probability vector \mathbf{p}_{\text{noisy}}, the calibration-corrected estimate of the true probability vector is \mathbf{p}_{\text{true}} = M^{-1} \mathbf{p}_{\text{noisy}}.

For n beyond about 10, full M is too large; local readout-error assumption (tensor-product M = M_1 \otimes M_2 \otimes \ldots) is a common approximation. Modern production pipelines use sparse matrix techniques to scale to 50+ qubit calibrations.
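A minimal sketch of the calibrate-and-invert procedure for two qubits, using the tensor-product approximation (the confusion-matrix entries are illustrative of a few percent of readout error):

```python
import numpy as np

# Per-qubit confusion matrices: column j is the outcome distribution when
# basis state |j> was prepared (entries illustrative, columns sum to 1).
M1 = np.array([[0.98, 0.03],
               [0.02, 0.97]])
M2 = np.array([[0.99, 0.04],
               [0.01, 0.96]])

# Local readout-error assumption: the 4x4 confusion matrix is a tensor product.
M = np.kron(M1, M2)

# A "true" distribution (say, from an ideal Bell state: only 00 and 11).
p_true = np.array([0.5, 0.0, 0.0, 0.5])
p_noisy = M @ p_true                       # what the hardware histogram shows
p_corrected = np.linalg.solve(M, p_noisy)  # software inversion of the calibration

assert np.allclose(p_corrected, p_true)
# Caveat: with finite shots, M^{-1} p_noisy can have small negative entries;
# production pipelines project the result back onto the probability simplex.
```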

Readout-error calibration is almost always worth doing; it is cheap in both shot count and classical compute, and it removes a dominant source of bias on most NISQ platforms.

Technique 4: symmetry verification

A post-selection technique that exploits the structure of your problem.

Many physically-motivated problems have conserved quantities: particle number, total spin, parity. For instance, a molecular Hamiltonian conserves electron number — a VQE ansatz for \text{H}_2 starts from a 2-electron state and no gate should change the electron count. Noise-induced errors that do change the conserved quantity can be detected by measuring the conserved operator and discarding the shot if the wrong value is observed.

Procedure: run the circuit, measure the conserved operator Q (or a unitary reflecting the symmetry), and only keep shots whose Q-measurement matches the expected value. The retained shots are statistically cleaner — the noise that violated the symmetry has been filtered out.
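For a parity symmetry readable from the output bitstrings, the post-selection is a one-line filter over the counts. A toy sketch (the counts are made up; assume the ansatz should conserve even parity):

```python
# Raw counts from a circuit whose output should have even parity (illustrative).
counts = {"00": 480, "11": 430, "01": 50, "10": 40}

def post_select_even_parity(counts):
    """Keep only shots whose bitstring has an even number of 1s."""
    kept = {b: n for b, n in counts.items() if b.count("1") % 2 == 0}
    total = sum(kept.values())
    return {b: n / total for b, n in kept.items()}, total / sum(counts.values())

probs, kept_fraction = post_select_even_parity(counts)
print(probs)           # odd-parity shots 01 and 10 discarded
print(kept_fraction)   # 0.91 -- the shot-budget price of the filter
```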

Cost: fraction of shots discarded is roughly the symmetry-violation probability. If the circuit has total noise p_{\text{total}} and some fraction f of errors are symmetry-violating, you keep (1 - f \cdot p_{\text{total}}) shots.

Limitations: only filters the symmetry-violating subset of errors; errors that preserve the conserved quantity pass through undetected. Useful as a mid-strength technique, often stacked with ZNE or readout calibration for a multiplicative win.

Symmetry verification is popular in quantum chemistry because electron-number, spin, and parity symmetries are abundant and exploiting them is almost free.

Technique 5: dynamical decoupling

A technique that acts during the circuit rather than after it. Dynamical decoupling inserts identity-equivalent sequences of gates during times when a qubit would otherwise be idle, in order to average out low-frequency coherent noise.

The simplest example: the X \cdot X = I sequence. Let the qubit idle, insert an X gate, let it idle for an equal time, then insert another X. Algebraically nothing has happened. Physically, the X flips the qubit, so the coherent phase error accumulated during the second idle window carries the opposite sign to the error from the first; the final X flips the qubit back and the two phase errors cancel.
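The refocusing can be verified algebraically: X conjugation flips the sign of a Z-phase, so the second idle window undoes the first. A numpy check (the detuning angle is arbitrary):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

def idle(theta):
    """Coherent phase error accumulated while the qubit idles: exp(-i theta Z / 2)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

theta = 0.37  # arbitrary detuning-times-idle-time angle
bare = idle(theta) @ idle(theta)        # idling straight through: phase builds up
dd = idle(theta) @ X @ idle(theta) @ X  # X, idle, X, idle (rightmost acts first)

# X idle(theta) X = idle(-theta), so the two idle windows cancel exactly:
assert np.allclose(dd, np.eye(2))
assert not np.allclose(bare, np.eye(2))
```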

More sophisticated sequences (CPMG, XY4, UDD) are designed to cancel specific noise-spectrum components. They are cheap — a few extra gates per idle window — and effective against slow, coherent, structured noise (exactly the kind that dominates in superconducting qubits with nearby spectator dynamics).

Dynamical decoupling is a hardware-control technique dressed up as a software trick. It is usually applied automatically by the compiler on IBM Heron and Quantinuum platforms. When you run a NISQ circuit in 2025, dynamical decoupling is already happening; the algorithm researcher rarely has to think about it, but should know it is on.

Combining the techniques

In practice, NISQ algorithms stack mitigations. A typical pipeline:

[Figure: A production NISQ mitigation pipeline — stacked techniques. The circuit and observable $O$ enter at the top and pass through, in order: 1. dynamical decoupling (at compile time); 2. symmetry verification (post-selection); 3. readout-error calibration (invert $M$); 4. zero-noise extrapolation (extrapolate $\lambda \to 0$). The mitigated expectation value is the output.]
A typical NISQ pipeline applies the techniques in a natural order. Dynamical decoupling happens during compilation. Symmetry verification filters shots as they come in. Readout-error calibration corrects the measurement histogram. Zero-noise extrapolation takes the final mitigated expectation values at several noise levels and extrapolates back to zero. The combined effect is usually much larger than any single technique alone.

The 2023 IBM Nature paper by Kim et al. (Nature 618, 500, "Evidence for the utility of quantum computing before fault tolerance") used ZNE stacked with readout calibration to recover expectation values on a 127-qubit Ising-model simulation that was beyond exact classical simulation. Whether that specific result was truly beyond heuristic classical simulation is debated (subsequent tensor-network simulations narrowed the gap), but the paper was a watershed demonstration that stacked mitigation on IBM Eagle-class hardware can extract signal from circuits far noisier than naive estimates would allow.

Worked examples

Example 1: ZNE on synthetic data

Setup. You run a 200-gate NISQ circuit measuring \langle Z_0 \rangle on the first qubit. You do this at noise amplification factors \lambda = 1, 2, 3 (using identity-insertion: \lambda = 1 is the native circuit, \lambda = 2 doubles every gate, \lambda = 3 triples). You observe:

  • \lambda = 1: \langle Z_0 \rangle = 0.55 \pm 0.02
  • \lambda = 2: \langle Z_0 \rangle = 0.42 \pm 0.02
  • \lambda = 3: \langle Z_0 \rangle = 0.32 \pm 0.02

Step 1. Fit a linear model. Assume \langle Z_0 \rangle(\lambda) = a + b \lambda and find a, b by least squares. Centre the data: \bar \lambda = 2, \bar y = 0.43. Then b = \sum_i (\lambda_i - \bar \lambda)(y_i - \bar y) / \sum_i(\lambda_i - \bar\lambda)^2 = [(-1)(0.12) + 0 + (1)(-0.11)] / 2 = -0.115. Why this formula: it is the slope that minimises the squared residuals of the line fit — the standard least-squares recipe.

Step 2. Extract the intercept. a = \bar y - b \bar\lambda = 0.43 - (-0.115)(2) = 0.66.

Step 3. Report. Extrapolated noiseless value: \langle Z_0 \rangle(0) = a = 0.66.

Step 4. Uncertainty. Propagating the per-point \pm 0.02 uncertainty through the linear fit, the intercept variance is \sigma_a^2 = \sigma^2 \left( \tfrac{1}{n} + \tfrac{\bar\lambda^2}{\sum_i (\lambda_i - \bar\lambda)^2} \right), so \sigma_a = 0.02 \cdot \sqrt{\tfrac{1}{3} + \tfrac{4}{2}} \approx 0.031.

Result. \langle Z_0 \rangle_{\text{extrapolated}} = 0.66 \pm 0.03. Compare to the raw \lambda = 1 value 0.55 \pm 0.02 — the extrapolation has a larger error bar but a smaller bias. If the true noiseless answer is 0.70, the extrapolated estimate is closer in absolute terms, and the wider uncertainty reflects the honest statistical cost. The wider bar is the price of extrapolation, paid in shots.
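The arithmetic in Example 1 can be checked in a few lines, using the standard least-squares intercept variance for equal per-point uncertainty:

```python
import numpy as np

lams = np.array([1.0, 2.0, 3.0])
ys = np.array([0.55, 0.42, 0.32])
sigma = 0.02                     # per-point shot-noise uncertainty

slope, intercept = np.polyfit(lams, ys, deg=1)
print(round(slope, 4), round(intercept, 4))   # ~-0.115 and ~0.66

# Intercept standard error for an equal-sigma least-squares line fit:
# var(a) = sigma^2 * (1/n + mean(lam)^2 / Sxx),  Sxx = sum (lam - mean)^2.
n, lbar = len(lams), lams.mean()
Sxx = ((lams - lbar) ** 2).sum()
sigma_a = sigma * np.sqrt(1 / n + lbar**2 / Sxx)
print(round(sigma_a, 3))                      # ~0.031
```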

Example 2: PEC decomposition for a single-qubit depolarising channel

Setup. A depolarising channel with error rate p = 0.02 acts on a single qubit:

\mathcal{N}(\rho) = (1 - p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z).

Step 1. Compute the inverse channel. For depolarising noise, the inverse \mathcal{N}^{-1} has the same Pauli-twirl structure:

\mathcal{N}^{-1}(\rho) = \alpha \rho + \beta(X\rho X + Y\rho Y + Z\rho Z)

with \alpha = (3 - p)/(3 - 4p) and \beta = -p / (3 - 4p). Why these coefficients: solve \mathcal{N}^{-1} \circ \mathcal{N} = \mathcal{I} at the Pauli-decomposition level. \mathcal{N} shrinks the X, Y, Z Pauli components by 1 - 4p/3; requiring the inverse to undo that shrink while preserving the trace (\alpha + 3\beta = 1) gives these coefficients after algebra.

Step 2. Plug in p = 0.02. \alpha = 2.98 / 2.92 \approx 1.0205, \beta = -0.02/2.92 \approx -0.0068.

Step 3. Compute the negativity. \gamma = |\alpha| + 3|\beta| = 1.0205 + 0.0205 = 1.0411. Why a factor of 3 on |\beta|: there are three Pauli correction channels (X, Y, Z), each with the same quasi-probability \beta.

Step 4. Per-shot sampling recipe. With probability p_I = 1.0205/1.0411 \approx 0.980, apply identity (no correction) and weight the measurement by +\gamma. With probability p_X = 0.0068/1.0411 \approx 0.0066, apply X and weight the measurement by -\gamma = -1.0411. Same for Y and Z. Estimator of \langle O \rangle: average the weighted measurements over many shots.

Step 5. Overhead for a 50-gate circuit. Every gate gets its own PEC correction (assume each gate has its own noise channel of similar strength). Variance overhead: \gamma^{2k} = (1.0411)^{100} \approx 56. To match the statistical uncertainty of a noiseless run, multiply the shot count by roughly 56.

Result. For this modest circuit at low gate error, PEC's overhead is \sim 56\times — well within reach of modern NISQ budgets. At p = 10^{-3} the same 50-gate circuit has \gamma^{100} \approx 1.2, almost no overhead. At a 500-gate circuit with p = 10^{-2}: \gamma^{1000} \approx 5 \times 10^{8}, infeasible. The exponential-depth wall.

Limits of error mitigation

The sharp theoretical statements, which NISQ practitioners now take as common knowledge:

  1. Mitigation estimates expectation values, not states. The output is a classical number, not a clean quantum state. You cannot feed a mitigated expectation value into a further quantum circuit.
  2. Shot overhead is exponential in the noise-weighted circuit depth. PEC's overhead is \gamma^{2k} per gate; ZNE's variance grows polynomially in the extrapolation distance. For deep circuits, no amount of classical post-processing recovers the signal.
  3. Mitigation needs shallow circuits. The useful circuits are those where p \cdot k < 1 — total expected error less than one. Beyond that threshold, mitigation fails gracefully (big error bars) or catastrophically (biased estimates).
  4. Mitigation is useless against non-Markovian or coherent correlated noise unless the correlations are modelled explicitly. Idealised channel-per-gate noise models are the setting where mitigation theory works cleanly.
  5. Mitigation will not replace error correction. Shor's on RSA-2048 needs \sim 10^{10} gates at logical error 10^{-10}. No mitigation technique reaches that regime. Only logical encoding and active syndrome-based correction does.

Error mitigation in the fault-tolerant era

A common misconception: once fault-tolerant computers arrive, error mitigation will be obsolete. This is wrong. Fault-tolerant quantum computers will have logical error rates — still not zero, just exponentially suppressed. For high-precision expectation-value estimation (quantum chemistry beyond chemical accuracy, high-loop quantum field theory) even logical error rates of 10^{-8} might be too large. Error mitigation will sit on top of error correction: the logical circuit runs through the error-corrected hardware, and mitigation techniques extract a final polish on the logical-error-rate noise.

Researchers have already studied stacking mitigation on top of encoded qubits, with results indicating that techniques like ZNE and PEC carry over to logical qubits, the relevant noise rate being the logical one. The interaction is mostly free — mitigation on top of correction is just another layer in the stack.

Common confusions

"Error mitigation and error correction are competing approaches"

They solve different problems. Correction cleans the quantum state; mitigation cleans the classical expectation value. On a fault-tolerant device they will both be used. Mitigation alone cannot scale to Shor's-depth circuits; correction alone is overkill for shallow VQE expectation values on small molecules.

"ZNE recovers the true noiseless value exactly"

No. ZNE estimates the noiseless value with a confidence interval whose width reflects finite-shot statistics and model bias. It is always an estimate, never a proof.

"Error mitigation works for any circuit"

No. Depth matters. The shot overhead for PEC is exponential in depth. For ZNE, the extrapolation becomes biased when the noise is too strong. For circuits beyond roughly p \cdot k \sim 1 total noise, no mitigation technique reliably recovers signal — this is the NISQ noise horizon in its mitigation form.

"Calibrating readout errors is enough"

Readout calibration removes the measurement part of the error budget. Gate errors, idle decoherence, crosstalk — all remain. Readout calibration is necessary but not sufficient on almost any NISQ platform.

"Mitigation is a fancy name for averaging"

No. Averaging reduces statistical variance (at rate 1/\sqrt{N}) but cannot remove systematic bias from the noise-channel structure. Mitigation techniques are specifically about removing the systematic bias — the consistent pull of the noise away from the true value. Averaging helps once the bias is removed; it does not remove the bias itself.

"PEC is strictly better than ZNE because it's exact"

PEC is exact in the limit of a perfectly characterised noise model and infinite shots. On real hardware the noise model is imperfect, and the shot overhead is larger than ZNE's. ZNE is cheaper, faster, and often more robust to noise-model errors; PEC is more accurate when the model is good and the depth is shallow. They are complementary, not ordered.

The India angle

IBM-India's quantum group, headquartered in Bengaluru, has published on error mitigation for the IBM Quantum Network's Indian-partner devices (IIT Bombay, IIT Madras, IISc Bangalore run experiments on IBM Eagle and Heron devices via cloud). TCS Research has contributed to mitigation pipelines for QAOA on logistics-oriented workloads. The National Quantum Mission's algorithm verticals include a specific work-package on error mitigation development for Indian-domestic NISQ hardware planned for 2027 onwards. When the IIT Madras superconducting platform comes online, mitigation will not be an afterthought — it will be the primary path to extracting any useful signal.

Going deeper

The rest of this chapter is the formal technical content: the Temme-Bravyi-Gambetta original derivation of ZNE, the Endo-Benjamin-Li derivation of PEC, the sampling-overhead theorem in its sharp form, and recent results on the scaling of mitigation under realistic noise.

The Temme-Bravyi-Gambetta 2017 framework

The original ZNE paper — Error mitigation for short-depth quantum circuits, arXiv:1612.02058 — developed two complementary techniques. The first is Richardson extrapolation: fit an n-th order polynomial in \lambda to n+1 measurements and read off the intercept. The variance of the intercept grows combinatorially in n — the technique is useful for small n (2 or 3 typically) but does not benefit from more measurements than that.

The second is exponential extrapolation — assume \langle O \rangle(\lambda) = \langle O \rangle_{\infty} + A e^{-\beta \lambda}, where \langle O \rangle_{\infty} is the fully-decohered asymptote, and fit \langle O \rangle_{\infty}, A, \beta from three or more measurements; the zero-noise estimate is then \langle O \rangle(0) = \langle O \rangle_{\infty} + A. This fits better when the physical noise is well-approximated by an exponential decay of the observable, which happens for many simple noise models.
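For equally spaced \lambda the three-point exponential fit has a closed form worth knowing: with differences d_1 = y_1 - y_2 and d_2 = y_2 - y_3, the decay ratio is e^{-\beta} = d_2/d_1 and the intercept is y(0) = y_1 + d_1^2/d_2 (a one-line consequence of the model; the 0.1, 0.6, 0.3 below are illustrative):

```python
import numpy as np

def zne_exponential_3pt(y1, y2, y3):
    """Zero-noise intercept of y(lam) = C + A*exp(-beta*lam), given samples at
    lam = 1, 2, 3 (equally spaced). Closed form: y(0) = y1 + d1^2 / d2."""
    d1, d2 = y1 - y2, y2 - y3
    return y1 + d1 * d1 / d2

# Exact exponential data: C = 0.1, A = 0.6, beta = 0.3, so y(0) = 0.7.
lam = np.array([1.0, 2.0, 3.0])
y = 0.1 + 0.6 * np.exp(-0.3 * lam)
print(zne_exponential_3pt(*y))   # recovers 0.7 up to rounding
```

On real data the differences d_1, d_2 carry shot noise, so in practice one fits the three parameters by least squares over more than three points rather than using the exact three-point formula.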

The paper also establishes the bias-variance trade-off that defines ZNE: the extrapolation reduces bias (it removes the leading \lambda-dependence) at the cost of increased variance (the extrapolated intercept has a larger error bar than any individual measurement). The question "is ZNE worth it" reduces to "is the bias reduction worth the variance increase?" The answer on NISQ hardware today is almost always yes.

The Endo-Benjamin-Li 2018 derivation of PEC

The PEC paper — Practical quantum error mitigation for near-future applications, arXiv:1712.09271 — developed the quasi-probability formalism in its application-ready form. The core technical contribution: a practical recipe for building the quasi-probability decomposition of \mathcal{N}^{-1} from experimentally characterised noise (via gate-set tomography), so that the sampled operations \mathcal{O}_j are gates the hardware can actually run, together with an analysis of the resulting sampling cost.

The technique has been elaborated significantly since 2018 — learned noise models, circuit-tailored decompositions, correlated-noise extensions — but the 2017-2018 framework remains the conceptual core.

The sampling-overhead theorem

The sharp result (Wang-Endo-Benjamin 2021, Takagi-Wang 2022): for any unbiased mitigation protocol that corrects a gate with noise of Pauli-diagonal component p, the per-gate sampling overhead is at least \gamma \geq (1 + p)/(1 - p). For k independent noisy gates, the total overhead is \gamma^{2k}. This is a lower bound — no unbiased protocol can do better.

The theorem implies that mitigation cannot asymptotically extend circuit depth beyond the noise horizon: to mitigate a circuit of depth k \cdot p \gg 1, you need \exp(O(kp)) shots, which is infeasible. This is the rigorous version of "mitigation is a NISQ-era tool, not a fault-tolerant replacement."

Kim et al. 2023 and the utility claim

The IBM Nature paper — Kim et al. 2023, Evidence for the utility of quantum computing before fault tolerance — demonstrated ZNE on a 127-qubit Ising-model simulation on an IBM Eagle processor. The claim: at the largest circuit sizes tested (roughly 2800 two-qubit gates), the mitigated IBM result agreed with smaller-circuit exact simulations and with extrapolated classical tensor-network simulations, and extended past the point where exact classical simulation was feasible.

The subsequent debate — did classical methods catch up via improved tensor-network contractions? (Yes, partially) Did IBM's result constitute a clean quantum advantage? (Probably not in the strict sense) — does not detract from the paper's core contribution: it showed that stacked mitigation on modern hardware produces useful expectation values at circuit sizes that naive per-gate-error calculations suggest should be hopeless.

The lesson is more general: mitigation's effective noise horizon on real hardware is larger than its worst-case theoretical horizon, because modern noise has structure (biased errors, correlated errors with short range, non-uniform gate fidelities) that mitigation can exploit.

Connections to dequantization

The honest pair for a NISQ claim is mitigation plus dequantization: mitigation extracts clean signal from the noisy quantum circuit, and dequantization asks whether a classical algorithm with comparable resources can match the (now clean) quantum result. Both must pass for a clean quantum-advantage claim on NISQ hardware: the circuit must be large enough that mitigation can recover signal, the algorithm must be one that dequantization does not eat, and the end-to-end comparison must favour quantum. As of 2025, few practical problems pass both filters — which is why honest quantum-advantage claims at the NISQ scale remain narrow.

Where this leads next

Error mitigation is one of two bridges that NISQ-era quantum computing relies on; the other is variational algorithm design (variational algorithms generally, VQE in practice). Both are NISQ-era responses to the limitation that fault tolerance is not yet available. The long arc of the curriculum eventually crosses into full fault-tolerant quantum computing through the threshold theorem and logical qubits in practice — at which point mitigation becomes a polish on top of correction rather than the primary technique.

The companion chapter dequantization provides the other half of the NISQ-era reader's honesty-check: before claiming that a mitigated NISQ result beats classical, verify that the classical algorithm in the sampling-access model does not match.

References

  1. Kristan Temme, Sergey Bravyi, Jay Gambetta, Error mitigation for short-depth quantum circuits (2017) — arXiv:1612.02058.
  2. Suguru Endo, Simon C. Benjamin, Ying Li, Practical quantum error mitigation for near-future applications (2018) — arXiv:1712.09271.
  3. Youngseok Kim et al., Evidence for the utility of quantum computing before fault tolerance (Nature 618, 500, 2023) — arXiv:2304.11119.
  4. John Preskill, Lecture Notes on Quantum Computation, Chapter 7 — theory.caltech.edu/~preskill/ph229.
  5. Wikipedia, Quantum error mitigation.
  6. Qiskit Textbook, Error mitigation tutorial.