Uncomputation — padho-wiki

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

Uncomputation — also called Bennett's trick — is the standard way to free a quantum scratch qubit without measuring it. The pattern has three stages. First, a subroutine F computes f(x) into an ancilla that starts at |0\rangle, giving |x\rangle|f(x)\rangle. Second, you use the ancilla — typically a controlled operation or a phase-kickback step — doing whatever you wanted f(x) to do. Third, you apply the inverse F^\dagger, which runs the first circuit backwards and sends the ancilla from |f(x)\rangle back to |0\rangle. Net effect: the ancilla is clean, the data register has picked up whatever effect step 2 produced, and no garbage is left behind to destroy interference. Skipping the third stage is the commonest quantum-algorithm bug: an ancilla left entangled with the data register behaves like a silent measurement, collapses the superposition by partial trace, and kills the quantum speedup. Uncomputation costs you a factor of two in circuit depth; that factor is the price of keeping the algorithm quantum.

You have a quantum subroutine that computes f(x). Somewhere inside, a scratch qubit catches the output — it starts at |0\rangle, passes through a few Toffolis or CNOTs, and ends up holding |f(x)\rangle. You use the scratch qubit for exactly one purpose: to fire a conditional phase, a controlled rotation, something that needed the value of f. That job is done. Now what?

You cannot just throw the scratch qubit away. A quantum circuit is a unitary operation — every gate has an inverse and no information is destroyed along the way — so "throwing a qubit away" is not a primitive the hardware even offers. You cannot reset it to |0\rangle with a classical assignment either, because that is non-unitary too. And if you measure it, the act of measurement collapses more than just the scratch qubit: it collapses every part of your main register that the scratch qubit got entangled with during the computation, and the carefully built superposition your algorithm depends on is gone.

The trick that saves you is almost too simple to believe: run the computation backwards. Because F is a unitary, it has an inverse F^\dagger, and because F took the ancilla from |0\rangle up to |f(x)\rangle, applying F^\dagger takes it straight back down to |0\rangle. The scratch qubit is clean, no measurement was performed, and whatever side effect you produced along the way — the conditional phase, the rotation, the kick — stays intact on the data register. That move is uncomputation, and almost every non-trivial quantum algorithm uses it somewhere inside.

This chapter walks through why dirty ancillas are silently dangerous, the three-stage compute-use-uncompute pattern that fixes them, two worked examples (a one-qubit CNOT uncomputation with a phase kickback, and a Toffoli-AND uncomputation that realises a three-qubit controlled phase), and the space-time trade-off that Bennett formalised in 1989.

Why a dirty ancilla is a silent measurement

Start with the clearest version of what goes wrong.

Suppose you have a data register in a superposition \sum_x \alpha_x |x\rangle, and a subroutine F that computes some function f by sending

|x\rangle |0\rangle \;\xrightarrow{F}\; |x\rangle |f(x)\rangle.

Apply F to the joint state of the register and a fresh ancilla and — by linearity — you get

|\Psi\rangle = \sum_x \alpha_x |x\rangle |f(x)\rangle.

If f is not a constant function, different branches of the superposition carry different values of f(x) on the ancilla. That means the register and the ancilla are entangled. They are no longer independent pieces of the state; the ancilla carries information about which branch the register is in.

Why entanglement here is the same thing as a measurement-in-disguise: when two systems are entangled and you ignore one of them — without measuring it, without uncomputing it, just letting it drift off — the system you kept behaves as if the other system had been measured. The mathematical statement is the partial trace (covered in ch.42). Operationally, the effect is indistinguishable from an actual measurement followed by "forgetting" the outcome. The superposition on the kept system turns into a classical probabilistic mixture.

Trace out the ancilla to get the effective state of the data register alone:

\rho_{\text{reg}} \;=\; \operatorname{Tr}_{\text{anc}}|\Psi\rangle\langle\Psi| \;=\; \sum_x |\alpha_x|^2 \, |x\rangle\langle x|.

The off-diagonal elements — the ones that would normally carry phase information — are gone. The state is diagonal in the computational basis, which is another way of saying: the register looks like a classical random variable that takes value x with probability |\alpha_x|^2. Every phase the register was carrying has been deleted by the partial trace.

Left: if the ancilla is left entangled with the register, tracing it out (which is what "ignoring" it amounts to) collapses the register to a mixed state with no phase information. Right: uncomputing the ancilla first disentangles the two systems, so tracing gives back the original pure superposition with all phases intact.

This is the garbage problem. A dirty ancilla is not a neutral bystander — from the data register's point of view, it is indistinguishable from a silent measurement that discarded its outcome. Every quantum algorithm that needs interference to work — Deutsch, Grover, Shor, quantum phase estimation — fails silently the moment you forget to clean an ancilla. You still get an answer, but it is the answer a classical probabilistic algorithm would give you, not the quantum one. No speedup.

The rest of this chapter is how to avoid the garbage problem without measuring.

The compute-use-uncompute pattern

Bennett's uncomputation trick has three stages. Write out a subroutine F that, on the two-register state |x\rangle|0\rangle_a, computes some function into the ancilla:

F \,|x\rangle |0\rangle_a \;=\; |x\rangle |f(x)\rangle_a.

Pick any single-or-multi-qubit operation U_g you want to apply to the ancilla — this is the "use" step, and U_g is whatever you were going to do with the value of f.

The full three-stage circuit is

F^\dagger \cdot (I \otimes U_g) \cdot F.

Read right-to-left as operators act in circuits. Apply F first: ancilla goes from |0\rangle to |f(x)\rangle. Apply U_g on the ancilla: whatever U_g does, it does on |f(x)\rangle, possibly producing an entangled or phase-modified state. Apply F^\dagger: the ancilla is run backwards, and — provided U_g did not change the computational-basis index on the ancilla — the ancilla returns to |0\rangle.

The case most people use uncomputation for is exactly when U_g is a phase gate or a controlled-phase operation — something that multiplies the ancilla's computational-basis amplitudes by phases but does not move amplitudes between |0\rangle and |1\rangle. In that case the ancilla still has |f(x)\rangle sitting on it after the use step (possibly with a phase), so F^\dagger can uncompute it exactly.

Bennett's compute-use-uncompute pattern. $F$ loads $f(x)$ onto the ancilla; a phase-only gate $U_g$ acts on the ancilla (possibly kicking a phase back onto the data); $F^\dagger$ sends the ancilla back to $|0\rangle$. The net action on the data register is whatever $U_g$ kicked back — with no ancilla garbage.

The whole three-stage circuit is itself a unitary, and its action on |x\rangle|0\rangle can be read in one line:

F^\dagger (I \otimes U_g) F \,|x\rangle|0\rangle_a \;=\; F^\dagger (I \otimes U_g) |x\rangle|f(x)\rangle_a.

If U_g multiplies |f(x)\rangle by a phase e^{i\theta(f(x))} (without moving amplitude into other kets), this becomes

F^\dagger \left(e^{i\theta(f(x))} |x\rangle|f(x)\rangle_a\right) \;=\; e^{i\theta(f(x))} |x\rangle|0\rangle_a.

Why the phase "rides out" past the F^\dagger: global phase factors commute with every linear operation. F^\dagger acts on the state, and whatever scalar sits in front of the state comes out unchanged. So the phase e^{i\theta(f(x))} was produced by U_g on the ancilla, gets uncomputed-past by F^\dagger, and ends up as a phase on the pure product state |x\rangle|0\rangle.

The result has the ancilla back at |0\rangle and the data register's branch |x\rangle multiplied by a phase that depends on f(x). This is the single most useful thing uncomputation buys you: a phase that depends on a computed value, with no garbage on any ancilla. Deutsch-Jozsa, Grover's oracle, and the "use" step of phase estimation are all this pattern.

Why reversing F actually cleans the ancilla

Make the claim concrete. F is unitary, so F^\dagger F = I. Applied to the specific state |x\rangle|0\rangle_a:

Plug in what F does to that state:

So F^\dagger takes |x\rangle|f(x)\rangle back to |x\rangle|0\rangle, exactly. If the "use" step U_g only added a phase (did not change which ket the ancilla was in), then F^\dagger still gets a valid input to invert.

The pattern also works for classically-controlled data: x need not be a single basis state; it can be any superposition \sum_x \alpha_x |x\rangle. By linearity, F takes each branch |x\rangle|0\rangle to |x\rangle|f(x)\rangle and F^\dagger takes each branch back — the whole compute-use-uncompute block acts on a superposed data register without ever inspecting it.

Worked example 1: phase kickback with CNOT-use-CNOT

A minimal uncomputation, using the smallest tools.

Example 1: CNOT-Z-CNOT realises a Z on the control

You will use a CNOT (control = data, target = ancilla) as your "compute" step F. What does it compute? It writes the value of the data qubit into the ancilla — this is the quantum analogue of a copy-in-the-computational-basis (it is not a full clone; it is a controlled copy that leaves the two qubits entangled). After the CNOT, the ancilla carries |a\rangle in the computational basis.

Step 1 — apply F = \text{CNOT}.

\text{CNOT}\,(\alpha|00\rangle + \beta|10\rangle) \;=\; \alpha|00\rangle + \beta|11\rangle.

Why: CNOT leaves |00\rangle alone (control is 0, target untouched) and flips the target on |10\rangle to give |11\rangle (control is 1, target flips). The amplitudes \alpha, \beta are carried along by linearity.

Step 2 — use the ancilla with U_g = Z. Apply a Pauli-Z to the ancilla. Z acts as Z|0\rangle = |0\rangle and Z|1\rangle = -|1\rangle:

(I \otimes Z)(\alpha|00\rangle + \beta|11\rangle) \;=\; \alpha|00\rangle - \beta|11\rangle.

Why only one term flipped sign: Z leaves the ancilla's |0\rangle branch alone and flips the sign of the |1\rangle branch. On the full state, only the |11\rangle term has the ancilla in |1\rangle — so only that branch picks up the minus sign.

Step 3 — uncompute by applying F^\dagger = \text{CNOT}. CNOT is its own inverse, so F^\dagger = F.

\text{CNOT}(\alpha|00\rangle - \beta|11\rangle) \;=\; \alpha|00\rangle - \beta|10\rangle.

Why the -\beta stays: CNOT sends |00\rangle \to |00\rangle and |11\rangle \to |10\rangle; it does not touch the coefficient in front of each term, it just permutes the kets.

Factor the ancilla out:

Result. The circuit CNOT-Z-CNOT, acting on |a\rangle \otimes |0\rangle_a, is equivalent to Z \otimes I. You used the ancilla as a middleman to carry the "Z" effect onto the data qubit, and at the end of the compute-use-uncompute block the ancilla was clean and the data qubit had a Z applied.

This is a tiny version of phase kickback: a controlled U_g on an ancilla that holds the data's computational-basis value kicks the phase back onto the data qubit. In this case U_g = Z and the kickback is exactly a Z on the data. Same idea powers every oracle in Deutsch and Grover.

The compute-use-uncompute pattern with the smallest possible $F$: a single CNOT writes the data qubit's value to the ancilla; a $Z$ on the ancilla kicks a phase back; a second CNOT uncomputes the ancilla. Net effect: $Z$ on the data, ancilla clean.

Worked example 2: Toffoli-AND-use-Toffoli realises a CCZ

Scale the pattern up one qubit. A 3-qubit controlled-controlled-Z (CCZ) applies a Z to a target qubit only when two control qubits are both |1\rangle. You will realise CCZ using Bennett's uncomputation, with an ancilla holding the AND of the two controls.

Example 2: CCZ built from 2 Toffolis and a Z via uncomputation

Setup. Three data qubits |a\rangle, |b\rangle, |c\rangle and one fresh ancilla |0\rangle_a. You want to apply Z to the c qubit iff a = b = 1 — in other words, a CCZ gate across the three data qubits.

Strategy. Use a Toffoli (with ancilla as target) to land |a \wedge b\rangle onto the ancilla. Use the ancilla to fire a controlled-Z onto the target |c\rangle. Apply a second Toffoli (same controls, same target — Toffoli is its own inverse) to uncompute the ancilla.

Step 1 — compute with Toffoli. Apply a Toffoli with controls on qubits a, b and target on the ancilla.

\text{Toffoli}\,|a\rangle|b\rangle|c\rangle|0\rangle_a \;=\; |a\rangle|b\rangle|c\rangle|a \wedge b\rangle_a.

Step 2 — use the ancilla with a controlled-Z on qubit c. Apply a CZ gate with control on the ancilla and target on qubit c. CZ multiplies the state by -1 iff both control and target are |1\rangle; otherwise it does nothing.

Breaking into the four classical cases for (a, b):

If a = b = 0, ancilla = 0: CZ does nothing regardless of c. State unchanged.
If exactly one of a, b is 1, ancilla = 0: again CZ does nothing.
If a = b = 1, ancilla = 1: CZ applies Z on c — the state picks up a -1 iff c = 1.

So the only case where anything happens is a = b = 1, and in that case qubit c gets a Z. That is exactly the CCZ action — except that the ancilla currently holds |a \wedge b\rangle, which is 1 on that branch, not 0. Left alone, that ancilla is garbage.

Step 3 — uncompute with a second Toffoli. Apply Toffoli again (controls a, b; target ancilla). Because Toffoli is self-inverse,

\text{Toffoli}\cdot\text{Toffoli} = I,

and the ancilla returns to |0\rangle on every branch. Meanwhile, the phase flip on qubit c (from the use step) rides through the second Toffoli — Toffoli does not touch qubit c, and a global phase on a branch is not disturbed by any unitary on other qubits.

Step 4 — put it together. After all three stages, the joint state has picked up a minus sign only on the branch where a = b = c = 1 and the ancilla is clean on every branch.

|a\rangle|b\rangle|c\rangle|0\rangle_a \;\mapsto\; (-1)^{a\wedge b\wedge c} |a\rangle|b\rangle|c\rangle|0\rangle_a.

The net action on the three data qubits is \text{CCZ}: a Z fires on c iff both a and b are |1\rangle. The ancilla has been used as scratch and returned to |0\rangle — usable again in the next subroutine.

Result. You built a CCZ using 2 Toffolis plus 1 ancilla-controlled CZ, via compute-use-uncompute. The cost of uncomputation is the second Toffoli — a doubling of the AND-computation depth — and the reward is an ancilla that is not entangled with the data at the end of the subroutine.

Why this is the canonical pattern in fault-tolerant QC. On a surface-coded fault-tolerant machine, Toffolis are expensive (they need \sim 7 T gates apiece, and each T gate consumes a magic state that is expensive to distil). When a decomposition replaces one CCZ with two Toffolis plus one CZ (CZ is cheap — Clifford), you have spent 14 T gates to save the cost of some alternative decomposition that would have required more. The bookkeeping here is what fault-tolerant resource estimates actually measure — every bit of it is uncomputation on some ancilla wire.

Bennett's pattern gives you a 3-qubit CCZ gate from 2 Toffolis, one ancilla-controlled CZ, and a clean ancilla at the end. This is the standard factorisation used in Qiskit, Cirq, and every fault-tolerant compiler that needs to synthesise controlled-controlled operations.

Bennett's space-time trade-off

Uncomputation is not free. Every gate in the forward computation has to be reapplied in reverse — the F^\dagger block is a mirror image of F, and takes roughly the same depth. So the circuit depth goes up by a factor of about 2, and the gate count roughly doubles as well.

The alternative is to not uncompute: let the ancilla wires carry garbage, do not reverse any computation. That gives you half the depth, but the ancilla count balloons. Every intermediate result accumulates onto its own wire and stays there. For a computation with time complexity T, you would need \Theta(T) ancilla qubits just to hold all the scratch values, versus the \Theta(S) ancillas (where S is the classical space complexity of f) that a clean uncomputation scheme uses.

Charles Bennett, in his 1989 paper [1], worked out the whole trade-off curve. His result: for a classical reversible computation of time T and space S, there is a family of schedulers parameterised by a continuous knob k such that the quantum implementation uses

\text{time} \approx T^{1+\varepsilon}, \qquad \text{space} \approx S \cdot \varepsilon^{-1}

for small \varepsilon > 0 — you can trade time for space or vice versa along a Pareto curve, and there is no free lunch at either endpoint.

Bennett's 1989 space-time trade-off. Store every intermediate on its own wire and you pay with $\Theta(T)$ ancillas but the reversal is fast. Recompute intermediates instead of storing them and the ancilla count drops to $\Theta(S)$, but the depth grows polynomially. Real quantum algorithms pick a point on this curve based on what the hardware can support.

The slogan: reversible computation costs about 2\times time but bounded space. You pay a small constant in depth for the right to keep the ancilla count small, and that is almost always the right trade on a device where physical qubits are the scarce resource.

Phase-estimation for Shor's algorithm on 2048-bit RSA runs headfirst into this trade-off. Older proposals stored every modular-exponentiation intermediate on its own ancilla wire; newer proposals like Gidney–Ekerå [4] use Bennett-style scheduling to shrink the ancilla count, at the cost of more Toffolis per computation. The balance the community has settled on gives a fault-tolerant factoring circuit with \sim 20 million physical qubits, most of which are ancillas — already optimised against the Bennett curve.

Common confusions

"Uncomputing is the same as measuring the ancilla." No. Measurement is a non-unitary, irreversible event that collapses the ancilla into a classical outcome and — if the ancilla was entangled — also collapses the branches of the data register. Uncomputation is a pure unitary operation that leaves the quantum state fully reversible. They have opposite effects: measurement destroys superposition; uncomputation preserves it.
"You always need to uncompute." If you care about keeping the data register in a quantum superposition — almost every QC algorithm — yes, you must clean every ancilla. The only exception is algorithms that plan to measure the ancilla as part of their output (phase estimation, syndrome extraction), and even then the ancilla's interaction with the data has to be structured so that the measurement does not accidentally collapse the data. Measurement-based and uncomputation-based ancilla management are both valid strategies, and real algorithms mix them.
"Uncomputation works on noisy ancillas." Only in special cases. If the ancilla is in a mixed state rather than clean |0\rangle, the F^\dagger step returns it to whatever it started in (not to |0\rangle), and the data register may pick up noise. Clean uncomputation assumes clean ancillas. "Dirty-ancilla" algorithms that tolerate arbitrary starting states are an active research topic, but they are the exception rather than the rule.
"Compute-use-uncompute is just F^\dagger U_g F — any U_g works." Only if U_g acts on the ancilla in a way that leaves the ancilla's computational-basis index alone (i.e., U_g is a phase-like gate). If U_g moves amplitude from |f(x)\rangle to some other ket, the ancilla no longer holds |f(x)\rangle going into F^\dagger, and F^\dagger does not return the ancilla to |0\rangle. For kickback-style use steps (CZ, controlled phases), this condition is automatically satisfied.
"Uncomputation is just a trick; real hardware would never bother." On the contrary — uncomputation is baked into every compilation pipeline. Qiskit, Cirq, and t|ket⟩ all have explicit "uncompute" blocks in their IR. When a compiler optimises a circuit, one of the first passes looks for places where a subroutine and its inverse cancel, which is exactly the compute-and-never-use case. The framework is so standard that programmers rarely write .inverse() by hand — they structure code as "open a block, do the thing, close the block," and the compiler emits the uncomputation automatically.
"Phase kickback is a different mechanism from uncomputation." Actually, phase kickback is uncomputation in disguise. The standard Deutsch-Jozsa oracle loads f(x) onto an ancilla initialised to |-\rangle, performs an implicit phase, then the oracle's own structure uncomputes the ancilla — the ancilla comes out in |-\rangle again, disentangled, with the phase kicked onto the data register. Phase kickback is exactly the compute-use-uncompute pattern where the "use" step is automatic because of how |-\rangle sits in the Pauli-X basis.

Going deeper

You have the full compute-use-uncompute picture, the garbage-problem argument via partial trace, and the Bennett space-time trade-off. The remaining sections discuss the formal reversible-computing model that Bennett introduced, the pebble-game analysis that gives his trade-off its name, the role of uncomputation in error correction and magic-state distillation, and a gesture at uncomputation as the hidden common structure behind phase kickback and quantum walks.

Bennett's pebble game

Bennett's 1989 analysis is usually phrased in terms of a pebble game. Imagine a classical computation as a sequence of steps, each producing a new value. You are given pebbles — one per available ancilla qubit — and the rules are: placing a pebble on a step means "compute this step's value into an ancilla"; removing a pebble means "uncompute this step's value"; both moves cost one gate's worth of time. You need to end the game with a pebble on the final step of the computation (the answer) and no other pebbles left (no garbage).

With k pebbles, the minimum number of moves to pebble a computation of length n is roughly n^{\log_2(2k/(k-1))} — the growth rate depends on how many pebbles you have. Two pebbles is impossible for most computations; three pebbles work but are slow; O(\log n) pebbles give nearly-linear time. The pebble-game lower bound is tight and has been proved to match the quantum-circuit reversibility bound.

For practical quantum computers, the pebble game is more than a theoretical curiosity: it is the framework that quantum-compiler writers use to schedule ancilla allocation. Microsoft's Azure Quantum Resource Estimator, Compustar Qiskit's uncompute module, and Querion Cirq's InvertsValueAndReturnsNew all implement variants of Bennett pebbling under the hood.

Uncomputation in quantum error correction

Error correction uses ancillas for two purposes: carrying intermediate results during syndrome extraction, and holding freshly-distilled magic states. Both need uncomputation, but of different kinds.

Syndrome-extraction ancillas are measured at the end of each cycle, and the measurement outcome is the useful classical information. The measurement itself disentangles the ancilla from the data, playing the role of uncomputation in a noisier way (measurement backaction is a form of implicit cleanup). For the syndrome-extraction subroutine to preserve the logical information, the measurement is structured so that the syndrome is independent of the encoded data — the ancilla's outcome is a function of the error, not of the logical state.

Magic-state distillation is the other big consumer. Distillation protocols take several noisy magic states, apply a sequence of stabiliser operations, measure some ancillas, and output fewer but higher-fidelity magic states. The stabiliser operations include explicit uncomputation steps to disentangle intermediate ancillas before the final output is read out; skipping them would leave a mixed state on the output, defeating the distillation. Bravyi-Kitaev distillation [5] is the canonical example, and uncomputation appears at almost every stage of the protocol.

Ancilla harvesting and mid-circuit measurement

A practical alternative to Bennett-style uncomputation is ancilla harvesting: measure the ancilla at the point it goes out of scope, and use a classical conditional gate on the data register to correct any phase that the measurement introduced. This trades one Bennett cycle for one mid-circuit measurement plus one classical-controlled gate.

On NISQ hardware, where mid-circuit measurement is supported (Compustar's dynamic circuits, Quantinuum's ion traps), ancilla harvesting can substantially reduce circuit depth. The data qubit stays in superposition; the ancilla is measured, its outcome conditionally flips a phase on the data (undoing any kick the measurement introduced), and the ancilla is freed for reuse. This is a form of measurement-based uncomputation, equivalent up to classical feedforward.

The caveat is hardware cost: mid-circuit measurement on superconducting qubits is slower (~1 μs) than most gates (~100 ns), so harvesting is only a win when the measurement's latency is lower than the F^\dagger gate count.

Phase kickback as implicit uncomputation

Phase kickback — the mechanism that powers Deutsch-Jozsa, Grover, and quantum phase estimation — is structurally the compute-use-uncompute pattern.

Here is the standard Deutsch oracle setup. Data register in a superposition, ancilla prepared in |-\rangle = \tfrac{1}{\sqrt 2}(|0\rangle - |1\rangle). The oracle computes |x\rangle|y\rangle \mapsto |x\rangle|y \oplus f(x)\rangle. Apply the oracle: the ancilla flips iff f(x) = 1.

The trick: |-\rangle has the property that X|-\rangle = -|-\rangle. So flipping the ancilla multiplies by -1, and not flipping it does nothing. Combined:

\text{oracle} \,|x\rangle|-\rangle \;=\; (-1)^{f(x)} |x\rangle|-\rangle.

The ancilla comes out in |-\rangle again — the oracle itself uncomputed the ancilla, because the state |-\rangle is an eigenstate of the bit-flip operation. The data register has picked up a phase that depends on f(x), with no ancilla garbage.

So phase kickback is not a separate mechanism. It is uncomputation riding on an especially clever choice of ancilla basis — the X-basis state |-\rangle is set up so that the "use" and "uncompute" steps happen automatically as part of the same gate. The price is that the ancilla has to start in |-\rangle, not |0\rangle, which costs one Hadamard. The benefit is a 50% reduction in oracle cost.

Uncomputation in quantum walks and amplitude amplification

Grover's algorithm, and its generalisation Brassard-Høyer-Mosca-Tapp amplitude amplification, structures itself as a repeated application of an oracle reflection followed by a diffusion reflection. Each reflection is a compute-use-uncompute pattern: you compute a marker onto an ancilla, use the marker to flip a phase, and uncompute. \sqrt{N} iterations of this pattern amplify the amplitude on the marked state; each iteration is a Bennett cycle on the marker ancilla.

Quantum walks are even more uncomputation-heavy: each step of a continuous-time quantum walk on a graph can be decomposed as a local check for "am I at this vertex," a coin flip, and a neighbour move — all driven by ancilla qubits that must be uncomputed between steps to keep the walk's interference pattern alive. Childs' 2009 survey of quantum-walk algorithms [6] is the standard reference, and uncomputation is woven into every algorithm in it.

Uncomputation is, at this point, one of the handful of core techniques that makes quantum algorithm design possible. Every time you compute something into scratch space, you should be planning the uncomputation at the same time — the two moves are a single pattern, and writing the compute without the uncompute is almost always a bug.

Where this leads next

Ancilla qubits — the prerequisite, which introduces ancillas and states the cleanup rule. This chapter fills in the cleanup technique.
Bennett's reversible computation — the 1989 paper's theorem on reversible-computing complexity classes, with a careful derivation of the space-time trade-off.
Quantum phase estimation — the poster-child algorithm for compute-use-uncompute, where the ancilla register is the output and the controlled-U^{2^k} steps compute-and-uncompute an implicit phase.
Partial trace — the mathematical tool behind the garbage-problem argument; tracing out an unclean ancilla is exactly what turns a pure quantum state into a mixed classical distribution.
CNOT gate — the simplest compute step for uncomputation; the CNOT-Z-CNOT example in this chapter is the minimal kickback circuit, and every larger phase oracle is a generalisation of it.
Toffoli gate — the compute step for AND-based uncomputation; the two-Toffoli CCZ construction in Example 2 is the canonical fault-tolerant pattern.

References

Charles H. Bennett, Time/Space Trade-offs for Reversible Computation (1989) — the foundational paper on Bennett's pebble game and the uncomputation trade-off. SIAM Journal on Computing 18(4).
Charles H. Bennett, Logical Reversibility of Computation (1973) — the original demonstration that every classical computation can be embedded in a reversible one, with ancilla cleanup. Compustar Journal of Research and Development.
Wikipedia, Uncomputation — concise reference for the basic pattern and its applications in quantum algorithms.
Craig Gidney and Martin Ekerå, How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits (2021) — modern resource estimate showing the Bennett trade-off at scale. arXiv:1905.09749.
Sergey Bravyi and Alexei Kitaev, Universal quantum computation with ideal Clifford gates and noisy ancillas (2005) — the magic-state-distillation protocol, in which uncomputation steps are baked into every round. arXiv:quant-ph/0403025.
John Preskill, Lecture Notes on Quantum Computation, Chapter 6 — covers reversible computation, uncomputation, and the interference-destroying effect of dirty ancillas. theory.caltech.edu/~preskill/ph229. </content> </invoke>