Toffoli (CCNOT) — padho-wiki

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

The Toffoli gate — also called CCX, for controlled-controlled-X, or CCNOT — is the three-qubit gate that flips the target qubit if and only if both controls are in |1\rangle. Feed it the three-bit input |a\rangle|b\rangle|c\rangle and it outputs |a\rangle|b\rangle|c \oplus ab\rangle — a Boolean AND of the two controls XORed into the target. Its 8 \times 8 matrix is the identity on the first six basis states and an X on the last two, so only |110\rangle and |111\rangle ever get disturbed. Its magic trick is classical universality: any Boolean function ever written — AND, OR, NAND, a 32-bit adder, the entire instruction set of a CPU — can be expressed as a circuit of Toffoli gates (with extra ancilla qubits). On real hardware, Toffoli costs about six CNOTs plus eight single-qubit gates to synthesise. Combined with Hadamard, it is a universal quantum gate set — every quantum algorithm in this wiki can be written using just these two.

Classical computing has a tidy little fact at its heart: the NAND gate alone is universal. Give me a supply of NAND gates and some wires, and I can build you any Boolean function. AND, OR, NOT, XOR, a 32-bit integer adder, the control unit of an ARM processor — all of it, from the one gate. This is not just a theorem; it is how chip designers actually think. A modern CMOS cell library is a thick book of different circuits, but at the bottom of the book sits NAND, which is the brick every other circuit is secretly built out of.

Quantum computing has an analogous fact, and the gate that plays the role of NAND is called Toffoli. Replace "Boolean function" with "Boolean function built reversibly" (because quantum gates must be unitary, which is to say reversible) and the story lifts almost without change: Toffoli alone is universal for all classical reversible computation. And if you let yourself add one non-classical gate — the Hadamard — Toffoli plus Hadamard is universal for quantum computation. That is a short list: a three-qubit gate and a one-qubit gate, and between them you can write down every quantum algorithm ever devised.

This chapter is the story of Toffoli. What it does to the three qubits it acts on, what its 8 \times 8 matrix looks like, why it implements AND when you seed the target with |0\rangle, why six CNOTs are the standard cost to build it on real hardware, and finally why this one gate is the bridge between classical reversible computing and the full quantum circuit model.

What Toffoli does

The rule, in one sentence: Toffoli flips the target qubit if and only if both control qubits are |1\rangle.

The three qubits have named roles. Two are controls — call them qubit 1 and qubit 2. One is the target — qubit 3. The gate looks at the two controls, AND-s them together, and if that AND comes out 1 (which happens only when both controls are |1\rangle), it flips the target bit. Otherwise it does nothing. Three inputs in, three outputs out.

On the eight computational-basis states of three qubits — |000\rangle, |001\rangle, \ldots, |111\rangle — the action is a straightforward table. Write the input as |a\,b\,c\rangle where a and b are the controls and c is the target.

| Input |abc\rangle | Output | |---|---| | |000\rangle | |000\rangle | | |001\rangle | |001\rangle | | |010\rangle | |010\rangle | | |011\rangle | |011\rangle | | |100\rangle | |100\rangle | | |101\rangle | |101\rangle | | |110\rangle | |111\rangle | | |111\rangle | |110\rangle |

Why only the last two rows change: the target flips only when both controls are 1, which is the condition a = 1 and b = 1. The only two basis states that satisfy that are |110\rangle and |111\rangle. Toffoli swaps them — that is the whole gate. Every other basis state is left alone.

You can compress the table into a single formula. For every input |abc\rangle,

\text{Toffoli}\,|abc\rangle \;=\; |a\rangle|b\rangle|c \oplus (a \wedge b)\rangle

where \oplus is XOR (addition mod 2) and \wedge is AND. The first two output slots are just the two inputs unchanged — Toffoli never touches the controls. The third slot is c with the AND of the controls XORed in. When a \wedge b = 0 (the first six rows of the table), this XOR is a no-op and c is unchanged. When a \wedge b = 1 (the last two rows), this XOR flips c.

The circuit symbol

Toffoli's circuit notation is a natural extension of CNOT's. Two filled dots on the two control wires, connected by a vertical line to an \oplus symbol on the target wire. Same template as CNOT, but with an extra control.

The Toffoli symbol. Two filled dots on the two control wires; a circled plus on the target wire. The two control qubits pass through unchanged; the target becomes $c \oplus (a \wedge b)$.

The matrix

Toffoli acts on three qubits, so its matrix is 8 \times 8 — one row and column per three-qubit basis state. Using the basis order |000\rangle, |001\rangle, |010\rangle, |011\rangle, |100\rangle, |101\rangle, |110\rangle, |111\rangle, the first six basis states are unchanged and the last two swap. So the matrix is the identity in the top-left 6 \times 6 block and an X (the 2 \times 2 bit-flip) in the bottom-right 2 \times 2 block:

\text{Toffoli} \;=\; \begin{pmatrix} I_6 & 0 \\ 0 & X \end{pmatrix} \;=\; \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}

Reading the matrix. Six ones on the diagonal (rows 1–6) — those basis states are eigenvectors with eigenvalue 1, unchanged. The bottom-right 2 \times 2 swaps rows 7 and 8 — those basis states are |110\rangle and |111\rangle, and the X block flips them into each other. Every other entry is zero.

Why the block structure is clean: the three-qubit basis splits into a "do nothing" half (six states where at least one control is 0) and a "flip the target" half (two states where both controls are 1). The matrix is block-diagonal with those two sub-blocks.

Toffoli's $8 \times 8$ matrix is block-diagonal: an identity on the six basis states where at least one control is 0, and an $X$ on the two basis states where both controls are 1.

Toffoli is its own inverse

Look at the matrix. The top-left identity is clearly self-inverse. The bottom-right X is also self-inverse, because X^2 = I. So \text{Toffoli}^2 = I — apply the gate twice and you are back where you started.

This matches the picture: two applications of "flip target if both controls are 1" leave the target unchanged (two flips cancel), and the controls were never touched in the first place. Toffoli is Hermitian, it is unitary, and it is its own inverse. The same is true of CNOT, of SWAP, and of every Pauli — the self-inverse property is common in the standard gate zoo but not universal. Phase gates like S and T, for example, are not self-inverse.

Why Toffoli is classically universal

Classically, a Boolean function takes n input bits and produces some output bits. The most famous functions — AND, OR, NAND, XOR — are the building blocks for every other classical circuit. The claim is that Toffoli alone, with extra ancilla bits as needed, suffices to compute any classical function reversibly.

The easiest way to see this is to show that Toffoli can simulate each of AND, OR, NOT, and XOR — and then appeal to the well-known classical fact that those four suffice.

Toffoli as AND

Seed the target with |0\rangle and the formula c \oplus (a \wedge b) becomes 0 \oplus (a \wedge b) = a \wedge b. So

\text{Toffoli}\,|a\rangle|b\rangle|0\rangle \;=\; |a\rangle|b\rangle|a \wedge b\rangle.

The third qubit comes out holding exactly the AND of the two controls. The first two qubits are unchanged — they still carry the inputs, which is the feature that makes this reversible: you can read the inputs off the output, so no information has been destroyed.

Why this is the trick: classical AND is irreversible — from the output alone, you cannot recover the two inputs (both (1,0) and (0,1) give output 0). To make it reversible, you keep the inputs around as auxiliary outputs. Toffoli does this automatically by leaving the two control qubits untouched.

Toffoli as NOT, XOR, OR

NOT: feed |1\rangle|1\rangle|c\rangle and the target becomes c \oplus 1 = \bar c, the NOT of c. (Of course you also have a cheaper way — apply an X gate directly — but the point is that Toffoli can do NOT if you need to.)
XOR: feed |1\rangle|b\rangle|c\rangle and the target becomes c \oplus b. Or feed |a\rangle|1\rangle|c\rangle and it becomes c \oplus a. XOR with one side fixed to 1 comes out of a Toffoli with one control pinned high.
OR: a \vee b = \neg(\neg a \wedge \neg b) by De Morgan's law. So OR needs three Toffolis: flip a, flip b, Toffoli into a fresh ancilla (computing \bar a \wedge \bar b), then flip that ancilla (computing \neg(\bar a \wedge \bar b) = a \vee b). Four gates for OR, six counting the input-flips and their uncomputation — but it works, and it is reversible. More compact constructions also exist.
NAND: Toffoli into a target that starts at |1\rangle. Then c = 1 \oplus (a \wedge b) = \overline{a \wedge b}, which is NAND.

Because you can simulate AND, OR, NOT, and XOR (the universal set for classical combinational logic) with Toffolis and ancilla bits, any classical Boolean function is expressible as a Toffoli circuit. That is what "universal for classical reversible computation" means.

The ancilla bits are the cost of making the computation reversible. A classical circuit that computes an n-bit function might use, say, k gates and m internal wires. The reversible version has the same inputs, the same outputs, and O(k) Toffolis, but it also carries along a stack of intermediate results — every AND's output, for example, has to live on its own wire because Toffoli copies inputs through. After the main function is computed, a uncomputation pass is typically run in reverse to zero out those ancilla wires, leaving only the inputs and the final output.

The six-CNOT decomposition

Real hardware does not implement Toffoli directly — the native two-qubit gates on modern chips are CNOT or CZ or iSWAP, and nothing bigger. To run a Toffoli on a superconducting processor you have to decompose it: rewrite it as a sequence of one- and two-qubit gates that the hardware does support.

The standard decomposition uses six CNOTs and eight single-qubit gates — two Hadamards, one T^\dagger, four Ts, and some phase gates sprinkled in. The construction is due to Barenco et al. (1995) and it has been the textbook recipe ever since.

The six-CNOT decomposition of Toffoli. Two Hadamards frame the target; six CNOTs and seven $T$ or $T^\dagger$ gates do the phase-kickback magic that makes the target flip only when both controls are $|1\rangle$.

The trick under the hood is a tower of phase kickbacks (the mechanism you met in the previous chapter on controlled-U gates). Each T is a single-qubit \pi/4 phase rotation; the CNOTs shuttle these phases around so that the total phase accumulated on each three-qubit basis state works out to exactly what Toffoli prescribes. The two flanking Hadamards on the target convert the final Z-like phase into an X-like bit-flip.

You do not need to reconstruct the proof from scratch — the check is a careful bookkeeping of phases across eight basis states. It was worked out by Barenco and collaborators in a 1995 paper, and every quantum compiler carries the recipe as a hard-coded transformation. The important things to remember are: six CNOTs is the standard CNOT count, seven T or T^\dagger gates is the standard T-count, and neither count can be reduced much without cleverer machinery (see the going-deeper section for the Matteo-Selinger optimisation that squeezes the T-count down).

Why the T-count matters

On a noisy near-term quantum computer, CNOTs are the most expensive gates — they dominate the error budget because two-qubit fidelities are lower than single-qubit fidelities. But on a fault-tolerant quantum computer — one that uses error correction to drive logical error rates as low as you want — the picture inverts. In surface-code error correction, Clifford gates (including CNOT and H) are cheap: they correspond to simple operations on the code's logical qubits. The expensive gates are the non-Clifford ones — the T and T^\dagger — because each one requires a magic-state distillation protocol, which is resource-intensive.

So on fault-tolerant hardware, the cost of a Toffoli is dominated by its T-count, not its CNOT count. A naive decomposition has seven Ts. More optimised decompositions — notably one by Matteo and Selinger (2013) — achieve a T-count of exactly 7, which has been proved optimal for exact Toffoli over the Clifford+T set. Shaving one T off a Toffoli used inside a big algorithm (like Shor's) can save substantial physical-qubit resources.

Example 1: Toffoli computes AND into an ancilla

Take two input qubits |a\rangle, |b\rangle (each a definite 0 or 1) and a target qubit seeded in |0\rangle. Apply Toffoli and check that the third qubit really does come out holding a \wedge b.

Step 1 — apply Toffoli, case by case. Use the defining formula \text{Toffoli}\,|abc\rangle = |ab\,(c \oplus ab)\rangle with c = 0:

|000\rangle \to |00\,(0 \oplus 0\cdot 0)\rangle = |000\rangle (target stays 0; no flip).
|010\rangle \to |01\,(0 \oplus 0\cdot 1)\rangle = |010\rangle (target stays 0; only one control is 1).
|100\rangle \to |10\,(0 \oplus 1\cdot 0)\rangle = |100\rangle (target stays 0; only one control is 1).
|110\rangle \to |11\,(0 \oplus 1\cdot 1)\rangle = |111\rangle (target flips to 1; both controls are 1).

Step 2 — read off the pattern. In all four cases, the output on the third qubit is exactly the AND of the two input bits: 0 \wedge 0 = 0, 0 \wedge 1 = 0, 1 \wedge 0 = 0, 1 \wedge 1 = 1. The third qubit has computed the AND truth table.

Why the target starts at |0\rangle: the formula c \oplus ab equals ab only when c = 0. That is the standard trick for using a reversible-logic "oracle" gate to compute a function cleanly — seed the output qubit at |0\rangle and the output ends up in exactly the function's value, with the inputs preserved on the other wires.

Step 3 — what if you want NAND instead? Seed the target at |1\rangle. Then c \oplus ab = 1 \oplus ab = \overline{ab}. One Toffoli with a different initial target gives you NAND without any extra gates.

Result. Toffoli with |c\rangle = |0\rangle is the reversible AND gate. Toffoli with |c\rangle = |1\rangle is the reversible NAND gate. Because NAND alone is classically universal, Toffoli alone is reversibly universal — you can build any Boolean function out of Toffolis and a supply of fresh |0\rangle and |1\rangle ancilla bits.

Toffoli with the target initialised to $|0\rangle$ computes $a \wedge b$ into the target qubit. The two inputs pass through unchanged, so the AND is done reversibly.

Example 2: Toffoli as six CNOTs on a concrete input

Walk a single basis state through the six-CNOT decomposition of Toffoli and confirm the output matches what Toffoli should produce.

Setup. Use the Barenco decomposition \text{Toffoli} = (I \otimes I \otimes H) \cdot (\text{six CNOTs + single-qubit gates}) \cdot (I \otimes I \otimes H) — the precise gate sequence is shown in the decomposition figure above. Pick the input |110\rangle — both controls high, target low — which should come out as |111\rangle under Toffoli.

Step 1 — verify the target change. Toffoli's rule on |110\rangle: both controls a = b = 1, so the target c = 0 flips to c \oplus ab = 0 \oplus 1 = 1. The output is |111\rangle.

Step 2 — trust the decomposition for the other seven basis states. The full bookkeeping through six CNOTs and seven T/T^\dagger gates for all eight basis states is a 56-row table. Rather than reproduce it line by line, it is enough to check that the decomposition is correct by verifying a few boundary cases — |000\rangle (all zeros, should be unchanged), |110\rangle (should flip target), |111\rangle (should unflip target, making it |110\rangle). If the decomposition passes those three tests and is linear (which it is, being a product of unitaries), then the eight-dimensional linear map it implements is fixed. With |000\rangle \to |000\rangle, |110\rangle \to |111\rangle, and |111\rangle \to |110\rangle confirmed, plus the fact that the decomposition is a unitary (product of unitaries), by a dimensional argument the map is Toffoli.

Step 3 — count the gate cost. The decomposition uses 6 CNOTs plus 9 single-qubit gates (2 Hadamards, 4 Ts on the target and middle, 1 T^\dagger on the target, and 2 more phases on the middle qubit). On a superconducting chip with nearest-neighbour CNOTs and a fidelity of 99% per CNOT, the six CNOTs alone multiply to a fidelity of 0.99^6 \approx 0.94 — roughly a 6% failure rate just from the two-qubit gates. That is already a noticeable chunk of the error budget for a single Toffoli.

Result. Six CNOTs and a handful of single-qubit gates implement a Toffoli. On noisy hardware, every Toffoli you can avoid saves you real fidelity. On fault-tolerant hardware, the T-count dominates the cost instead.

Toffoli on $|110\rangle$: both controls are 1, so the target flips from 0 to 1. The output is $|111\rangle$.

Common confusions

"Toffoli is a two-qubit gate." It is a three-qubit gate: two controls and one target. The "CC" prefix in CCX means "controlled-controlled" — two controls, not one. It does not fit on two wires the way CNOT does.
"Which of the three qubits is the target?" The target is the qubit that can change; the controls are the qubits that cannot. In standard circuit notation, the controls are the two filled dots and the target is the \oplus. In our |abc\rangle convention, a and b are controls and c is the target. But in another article or textbook the target might be the first qubit — you have to look at the circuit diagram to be sure.
"Toffoli is symmetric in its controls but not in its target." Correct. You can swap the two controls and get the same gate, because the AND of two bits is commutative: a \wedge b = b \wedge a. But you cannot swap a control with the target — that would change the gate entirely. Three-qubit Toffoli has exactly two interchangeable roles (the two controls) and one distinguished role (the target).
"Toffoli with NAND output replaces all classical logic — so what's the quantum advantage?" Toffoli by itself gives you classical reversible computing: a way to do every Boolean computation reversibly, without destroying information. That is a non-trivial feat (Bennett 1973, Toffoli 1980), but it is not yet quantum. Add Hadamard — which creates superpositions — and now you have a universal quantum gate set. Toffoli is the classical half of the universality pair; Hadamard is the non-classical half.
"The cost of Toffoli is negligible because it's one gate in the circuit diagram." On a real device Toffoli is not one gate. It is six CNOTs plus eight to nine single-qubit gates, and on NISQ hardware each CNOT has a non-trivial error probability. A circuit drawn with 10 Toffolis has at least 60 CNOTs, not 10 entangling operations. The gate count in the compiled circuit is roughly 10-15× the count in the logical circuit.
"AND built from Toffoli needs no ancilla." AND itself — where the output ends up in the target — needs the target to start at |0\rangle. That is an ancilla. In a larger circuit with many ANDs, every AND's output needs its own fresh |0\rangle ancilla to preserve the reversibility of the whole circuit, and at the end a reverse pass is usually run to uncompute all the intermediate ANDs back to |0\rangle so the ancilla qubits can be reused. Ancilla management is where most of the complexity of reversible-logic synthesis actually lives.

Going deeper

You now know what Toffoli does, why it is classically universal, and how much it costs to implement. The remaining sections cover the closely related Fredkin gate, the T-count optimisation story, the Toffoli's role in Grover's search and quantum adders, and a note on Toffoli decompositions for fault-tolerant error correction.

Fredkin (controlled-SWAP) — the sibling universal gate

The Fredkin gate, also called CSWAP (controlled-SWAP), is another three-qubit gate with the same universality status as Toffoli. Its rule is: swap the second and third qubits if and only if the first (control) qubit is |1\rangle. Like Toffoli, it is its own inverse, it is unitary, and it is universal for classical reversible computing. Fredkin gates can build Toffolis (and vice versa), at a constant cost. The two gates are interconvertible — each is six of the other, roughly — and the choice between them is usually made by which is easier to implement on a given hardware platform. Fredkin also appears in the SWAP test, a compact subroutine for comparing two quantum states (covered in the Fredkin chapter).

T-count optimisation and magic-state distillation

The six-CNOT decomposition of Toffoli carries seven T or T^\dagger gates. On fault-tolerant hardware every T costs a magic state — a specific ancilla state (roughly \tfrac{1}{\sqrt 2}(|0\rangle + e^{i\pi/4}|1\rangle)) that must be prepared via an expensive distillation protocol. The overhead of magic-state distillation dominates the cost of a fault-tolerant Toffoli.

Matteo and Selinger (2013) proved that the optimal T-count for an exact Toffoli over the Clifford+T set is 7 — no decomposition uses fewer. That is a tight bound, and it is why every modern quantum compiler's Toffoli decomposition lands at exactly 7 T-gates. Approximate decompositions using more qubits or slightly inexact phases can do a little better in special cases, but 7 is the canonical number to remember.

Toffoli in Grover's search

Grover's algorithm — the quantum speedup for unstructured search, turning an N-query classical problem into a \sqrt N-query quantum one — contains Toffolis in two places: in the oracle (which marks the correct answer by flipping a target ancilla's sign), and in the diffusion operator (which does an inversion about the mean, a construction that needs multi-controlled gates). For a search over n-bit strings, the diffusion uses an n-controlled Toffoli — which itself decomposes into O(n) ordinary Toffolis plus extra ancilla qubits. The Toffoli count of Grover is one of the main drivers of its hardware cost.

Toffoli in quantum adders

Any classical addition circuit can be lifted to a quantum adder by replacing each AND gate with a Toffoli and each XOR with a CNOT. The ripple-carry adder for n-bit integers uses n Toffolis and 3n CNOTs, with ancilla qubits to hold the intermediate carries. Quantum adders show up inside Shor's algorithm (for modular exponentiation) and inside quantum-chemistry algorithms that need arithmetic on amplitudes. The Toffoli count dominates the cost, and a substantial body of research — including work at IIT Bombay's quantum-algorithms group — has gone into squeezing Toffoli counts out of adder circuits by using carry-save tricks, look-ahead carries, and other classical circuit-design ideas lifted to the reversible setting.

Toffoli in surface-code error correction

Surface codes — the current favourite for scalable quantum error correction — are built out of Clifford gates (CNOT, H, S) and magic-state injection for the non-Clifford T and Toffoli. A fault-tolerant Toffoli requires either (a) injecting and consuming seven distilled magic states plus Cliffords for the six CNOTs, or (b) using a direct Toffoli magic state prepared via its own distillation protocol. Either way, the resource overhead is substantial, and the choice between the two routes is an active area of research in fault-tolerant compilation. When you read papers estimating "20 million physical qubits to factor a 2048-bit RSA number with Shor's algorithm," a significant chunk of that estimate is Toffoli-distillation overhead.

Where this leads next

Controlled-U gates — the two-qubit parent of Toffoli, and where the phase-kickback machinery that powers the decomposition comes from.
Fredkin (CSWAP) — the sibling universal reversible gate, used in the SWAP test.
Reversible computation — the classical theory that Toffoli is universal for, with the Bennett/Toffoli/Landauer backstory.
Grover diffusion operator — the multi-controlled Toffoli at the heart of Grover's search.
Quantum adder circuit — Toffolis as the AND gates of a quantum arithmetic circuit.
Phase kickback — the mechanism that makes the six-CNOT decomposition work.

References

Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §4.3 (Toffoli gate and its decomposition) — Cambridge University Press.
Barenco et al., Elementary gates for quantum computation (1995) — the source of the six-CNOT Toffoli decomposition and the proof that CNOT + single-qubit gates is universal. arXiv:quant-ph/9503016.
Tommaso Toffoli, Reversible computing (1980) — the original paper introducing the Toffoli gate and proving its universality for reversible Boolean logic. MIT Technical Report MIT/LCS/TM-151.
Matteo Amy, Dmitri Maslov, Michele Mosca, and Martin Rötteler, A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits (2013) — establishing optimal T-count for Toffoli over Clifford+T. arXiv:1206.0758.
Wikipedia, Toffoli gate — clean reference including the truth table, matrix form, and discussion of Fredkin's sibling.
Qiskit Textbook, Multiple Qubits and Entangled States — runnable Toffoli examples and the standard decomposition in code.