Building the QFT Circuit

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

The quantum Fourier transform on n qubits is a 2^n \times 2^n unitary — but it has a compact circuit. The recipe is: on each qubit from the top wire down, apply one Hadamard, followed by a cascade of controlled-phase rotations R_k = \text{diag}(1, e^{2\pi i / 2^k}) from every later qubit; when the last Hadamard is placed, finish with a layer of SWAPs that reverses the qubit order. Total cost: n Hadamards, \tfrac{n(n-1)}{2} controlled-phase gates, and \lfloor n/2 \rfloor SWAPs — O(n^2) gates in all. The circuit works because \text{QFT}\,|j\rangle factors as a tensor product of n single-qubit states, each controlled by a specific subset of the input bits j_1, j_2, \ldots, j_n.

Write down the quantum Fourier transform as a matrix and its size is a problem. On 10 qubits it is 1024 \times 1024. On 30 qubits it is 2^{30} \times 2^{30} — about a billion rows and a billion columns, 10^{18} entries in total, more than the number of atoms in a raindrop. If your only plan for running a QFT is to multiply a state vector by that matrix, nothing in a physical quantum computer or a physical classical computer will ever finish.

And yet the circuit for the same transform is small. For 10 qubits it fits on a page. For 30 qubits it is still only about 900 gates. For 1000 qubits it would be half a million gates — a number big enough to be impractical on today's hardware but a polynomial in the number of qubits, not an exponential. Somewhere between "the QFT is a huge matrix" and "the QFT is a short circuit" something beautiful has to happen.

The beautiful thing has a name. It is called the product-form factorisation of the QFT, and it says: when you apply \text{QFT} to a computational basis state |j\rangle, the output is not a generic 2^n-dimensional superposition — it is a tensor product of n single-qubit states, each one depending on a specific tail of j's binary digits. If the output factors across qubits, then the transformation can be performed across qubits, one at a time, with gates that act on only one or two wires at a time. This chapter walks through that factorisation, turns it into a circuit, and counts exactly how many gates you pay.

The starting point — QFT written on a single basis state

Before the circuit, the formula. The QFT on n qubits maps a computational basis state |j\rangle to the superposition

\text{QFT}\,|j\rangle \;=\; \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} e^{2\pi i\,jk/N}\,|k\rangle, \qquad N = 2^n.

Here j is an integer between 0 and N-1, written in binary as j = j_1 j_2 \cdots j_n where j_1 is the most significant bit and j_n is the least significant. The output amplitude on the basis state |k\rangle is e^{2\pi i\,jk/N}/\sqrt{N} — a complex phase of modulus 1/\sqrt{N}, with the phase angle encoding the product jk.

That sum has N = 2^n terms. Expanding it naively gives nothing useful. The trick is to expand k in binary as well — write k = k_1 k_2 \cdots k_n where k_l \in \{0, 1\} — and watch the sum collapse along every qubit independently.

The product-form derivation

Substitute the binary expansion k = \sum_{l=1}^{n} k_l\,2^{n-l} into the exponent:

\frac{jk}{N} \;=\; \frac{j}{2^n}\sum_{l=1}^{n} k_l\,2^{n-l} \;=\; \sum_{l=1}^{n} \frac{j\,k_l}{2^l}.

Why this step: the QFT exponent is jk/N = jk/2^n. Splitting k into its binary digits splits that product into n pieces, one for each bit of k. Each piece is j/2^l times k_l — and each piece gets its own e^{2\pi i \cdot} factor because e^{a+b} = e^a e^b.

So the exponential factors as a product over l:

e^{2\pi i\,jk/N} \;=\; \prod_{l=1}^{n} e^{2\pi i\,j\,k_l/2^l}.

Now the QFT sum becomes

\text{QFT}\,|j\rangle \;=\; \frac{1}{\sqrt{N}}\sum_{k_1, k_2, \ldots, k_n \in \{0,1\}} \left(\prod_{l=1}^{n} e^{2\pi i\,j\,k_l/2^l}\right)|k_1 k_2 \cdots k_n\rangle.

Why the sum-over-k has become a sum over each bit k_l independently: because k ranges over all N = 2^n integers, and every integer is uniquely determined by its n binary digits, summing over k is the same as summing over all 2^n choices of (k_1, k_2, \ldots, k_n). Each k_l runs independently over \{0, 1\}.

Each term in the product depends only on a single bit k_l, and the basis state |k_1 k_2 \cdots k_n\rangle is a tensor product |k_1\rangle \otimes |k_2\rangle \otimes \cdots \otimes |k_n\rangle. So the whole expression separates over l:

\text{QFT}\,|j\rangle \;=\; \frac{1}{\sqrt{N}}\bigotimes_{l=1}^{n}\left(\sum_{k_l \in \{0,1\}} e^{2\pi i\,j\,k_l/2^l}|k_l\rangle\right).

Inside each parenthesis, the sum has two terms — k_l = 0 contributes |0\rangle (with no phase, since e^0 = 1), and k_l = 1 contributes e^{2\pi i\,j/2^l}|1\rangle. So

\text{QFT}\,|j\rangle \;=\; \frac{1}{\sqrt{N}}\bigotimes_{l=1}^{n}\bigl(|0\rangle + e^{2\pi i\,j/2^l}|1\rangle\bigr).

The 1/\sqrt{N} out front is 1/\sqrt{2^n} = \prod_l 1/\sqrt{2}, so it can be absorbed into each factor:

\boxed{\text{QFT}\,|j\rangle \;=\; \bigotimes_{l=1}^{n}\frac{|0\rangle + e^{2\pi i\,j/2^l}|1\rangle}{\sqrt{2}}}

This is the product form. The output of the QFT on a basis state is a tensor product of n single-qubit states, each of which is an equal-weight superposition of |0\rangle and |1\rangle differing only by the relative phase e^{2\pi i\,j/2^l} on the |1\rangle component.

Reading the phase in "binary fraction" notation

The phase e^{2\pi i\,j/2^l} on the l-th output qubit deserves a careful look. Because j has n binary digits, the quantity j/2^l can be split cleanly into an integer part and a fractional part — and e^{2\pi i \cdot (\text{integer})} = 1, so only the fractional part survives the exponential.

Write out j/2^l in binary. The binary expansion of j is j_1 j_2 \cdots j_n. Dividing by 2^l shifts the binary point l places to the left, giving

\frac{j}{2^l} \;=\; j_1 j_2 \cdots j_{n-l} \;.\; j_{n-l+1} \cdots j_n.

The integer part drops out under e^{2\pi i \cdot}. What remains is the fractional part, which physicists and quantum-computing textbooks abbreviate with the notation

0.j_{n-l+1} j_{n-l+2} \cdots j_n \;=\; \frac{j_{n-l+1}}{2} + \frac{j_{n-l+2}}{4} + \cdots + \frac{j_n}{2^l}.

Why this notation helps: the phase on output qubit l depends only on the last l bits of the input j. The first n-l bits of j contribute to the integer part of j/2^l and so contribute nothing to the phase. You are going to build a circuit that mirrors this dependency exactly — output qubit 1 ends up with a phase that depends on all n input bits, output qubit 2 on the last n-1 input bits, and so on down to output qubit n which depends only on input bit j_n.

Putting it together, the product form becomes the familiar textbook line:

\text{QFT}\,|j\rangle \;=\; \frac{\bigl(|0\rangle + e^{2\pi i\,0.j_n}|1\rangle\bigr)}{\sqrt{2}} \otimes \frac{\bigl(|0\rangle + e^{2\pi i\,0.j_{n-1}j_n}|1\rangle\bigr)}{\sqrt{2}} \otimes \cdots \otimes \frac{\bigl(|0\rangle + e^{2\pi i\,0.j_1 j_2 \cdots j_n}|1\rangle\bigr)}{\sqrt{2}}.

Notice the ordering: the first output tensor factor picks up a phase e^{2\pi i\,0.j_n}, depending on only the last input bit. The last output tensor factor picks up a phase e^{2\pi i\,0.j_1 j_2 \cdots j_n}, depending on all the input bits. That ordering is why the circuit will need a layer of SWAPs at the end — the output naturally comes out in reverse qubit order.

The product form for $n = 4$. Each output qubit is a one-qubit superposition; the phase on the $l$-th factor uses a binary fraction whose digits are a tail of $j$'s bits. Output qubit 1 (leftmost tensor factor) uses only $j_n$; output qubit $n$ uses all of $j_1 j_2 \cdots j_n$.

From the product form to a circuit

Now the circuit. The product form tells you what the output should look like; the circuit-design task is: starting from the input |j\rangle = |j_1\rangle|j_2\rangle\cdots|j_n\rangle on the computational basis, which gates produce each of those tensor factors in turn?

Focus on the first output qubit — the leftmost tensor factor, with phase e^{2\pi i\,0.j_n}. Its phase depends only on the single bit j_n. But wait — you want to build this factor on wire 1, whose input state is |j_1\rangle, not |j_n\rangle. The answer is to build the factor on wire 1 using information from the other wires, via controlled gates, and then fix the wire labelling at the end with SWAPs. Hold that thought.

Alternative cleaner organisation: build the l-th tensor factor on wire l using input |j_l\rangle as the starting computational state on that wire, and the other bits j_{l+1}, \ldots, j_n as controls for extra phases. That is exactly what the standard QFT circuit does. The output ends up in reverse order — the first qubit holds what should be the last factor, and vice versa — and a SWAP layer at the end fixes that. Let's do the construction step by step.

The first wire: Hadamard plus controlled phases

Start with qubit 1 in state |j_1\rangle. Apply a Hadamard:

H|j_1\rangle \;=\; \frac{|0\rangle + e^{2\pi i\,0.j_1}|1\rangle}{\sqrt{2}}.

Why Hadamard produces exactly this phase: H|0\rangle = (|0\rangle + |1\rangle)/\sqrt{2}, which is (|0\rangle + e^{0}|1\rangle)/\sqrt{2}, and 0.j_1 = 0 when j_1 = 0, so the phase is 1. H|1\rangle = (|0\rangle - |1\rangle)/\sqrt{2} = (|0\rangle + e^{i\pi}|1\rangle)/\sqrt{2}, and when j_1 = 1 the fraction 0.j_1 = 1/2, so e^{2\pi i\,0.j_1} = e^{i\pi} = -1. In both cases the formula holds.

So after one Hadamard, wire 1 already carries e^{2\pi i\,0.j_1}|1\rangle in its upper amplitude. But the target tensor factor on wire 1 (after the final SWAP) should carry e^{2\pi i\,0.j_1 j_2 \cdots j_n}|1\rangle — a phase with n binary digits, not 1. The remaining digits j_2, j_3, \ldots, j_n need to be folded in as extra phase contributions.

Each remaining bit j_m (for m = 2, 3, \ldots, n) contributes j_m / 2^m to the fraction 0.j_1 j_2 \cdots j_n. The gate that adds a phase e^{2\pi i / 2^k} to |1\rangle conditioned on another qubit being |1\rangle is the controlled-R_k gate, where

R_k \;=\; \begin{pmatrix} 1 & 0 \\ 0 & e^{2\pi i / 2^k} \end{pmatrix}.

Applying controlled-R_2 from qubit 2 (as control) to qubit 1 (as target) adds e^{2\pi i \cdot j_2/4} to the |1\rangle amplitude on wire 1 (and only when j_2 = 1, which is when the phase is non-trivial). Applying controlled-R_3 from qubit 3 to qubit 1 adds e^{2\pi i \cdot j_3/8}. Continuing down to controlled-R_n from qubit n to qubit 1, you have added all the phase contributions j_2/4, j_3/8, \ldots, j_n/2^n to the |1\rangle component on wire 1.

Why the controlled-R_k gate does this: the matrix R_k multiplies the |1\rangle state by e^{2\pi i/2^k}. When wrapped as a controlled gate, this multiplication happens only when the control qubit is in state |1\rangle. Because the control is in the computational basis state |j_m\rangle, the phase is applied only when j_m = 1 — which is exactly when that bit contributes to the binary fraction.

The total phase accumulated on the |1\rangle amplitude of wire 1 is now

e^{2\pi i(j_1/2 + j_2/4 + j_3/8 + \cdots + j_n/2^n)} \;=\; e^{2\pi i\,0.j_1 j_2 j_3 \cdots j_n}.

And wire 1 is in the state (|0\rangle + e^{2\pi i\,0.j_1 j_2 \cdots j_n}|1\rangle)/\sqrt{2} — exactly the last tensor factor of the product form. One down.

The remaining wires: the same move, shorter each time

Move on to wire 2. Its input is still |j_2\rangle (the earlier controlled-R_k gates did not touch any wire's computational-basis component — they only dropped phases conditioned on the control being |1\rangle, which does not disturb the control's computational-basis state). Apply a Hadamard to wire 2:

H|j_2\rangle \;=\; \frac{|0\rangle + e^{2\pi i\,0.j_2}|1\rangle}{\sqrt{2}}.

Now add the controlled-R_k phases from the bits below wire 2 — that is, from j_3, j_4, \ldots, j_n. Specifically, controlled-R_2 from wire 3 onto wire 2, controlled-R_3 from wire 4 onto wire 2, and so on. After n - 2 controlled-phase gates, wire 2 carries the state (|0\rangle + e^{2\pi i\,0.j_2 j_3 \cdots j_n}|1\rangle)/\sqrt{2} — the second-to-last tensor factor of the product form.

Continue this pattern. Wire 3 gets a Hadamard followed by n - 3 controlled-phases (from wires 4, 5, \ldots, n). Wire n gets just a single Hadamard and no controlled-phases — because there are no qubits below it to control phases from.

After wire n has been processed, every tensor factor of the product form has been built on some wire — but the wire on which each factor lives is the reverse of what the product form says. Wire 1 ended up holding (|0\rangle + e^{2\pi i\,0.j_1 j_2 \cdots j_n}|1\rangle)/\sqrt{2}, which is the last factor in the tensor product; wire n ended up holding (|0\rangle + e^{2\pi i\,0.j_n}|1\rangle)/\sqrt{2}, which is the first.

The final layer: SWAPs to reverse the order

Fix the wire labelling with a layer of SWAP gates that reverses the qubit order. Swap wire 1 with wire n, wire 2 with wire n-1, wire 3 with wire n-2, and so on. If n is even, you perform n/2 SWAPs; if n is odd, you perform (n-1)/2 SWAPs and the middle wire stays put. Either way the number is \lfloor n/2 \rfloor.

After the SWAP layer, the output matches the product form line by line: wire l carries exactly the l-th tensor factor (|0\rangle + e^{2\pi i\,0.j_l j_{l+1} \cdots j_n}|1\rangle)/\sqrt{2}. The QFT has been performed.

Example 1: the 2-qubit QFT

Write the circuit explicitly for n = 2. The input is |j\rangle = |j_1 j_2\rangle on two wires.

Example 1: 2-qubit QFT, end to end

Step 1 — Hadamard on wire 1. The state |j_1\rangle becomes (|0\rangle + e^{2\pi i\,0.j_1}|1\rangle)/\sqrt{2}. Wire 2 is still |j_2\rangle. The joint state is

\frac{|0\rangle + e^{2\pi i\,0.j_1}|1\rangle}{\sqrt{2}} \otimes |j_2\rangle.

Why Hadamard on wire 1 works here: it converts the computational-basis bit j_1 into a one-qubit superposition with a phase e^{i\pi j_1} on the |1\rangle component. That phase equals e^{2\pi i\,0.j_1} because 0.j_1 = j_1/2.

Step 2 — controlled-R_2 from wire 2 onto wire 1. The controlled-R_2 gate adds the phase e^{2\pi i/4} = i to the |1\rangle amplitude on wire 1, conditioned on wire 2 being |1\rangle. Wire 2 is currently in the computational basis state |j_2\rangle, so the phase is applied exactly when j_2 = 1 — i.e., the phase becomes e^{2\pi i\,j_2/4}. Combining with the existing phase on wire 1:

\text{wire 1} \;=\; \frac{|0\rangle + e^{2\pi i(j_1/2 + j_2/4)}|1\rangle}{\sqrt{2}} \;=\; \frac{|0\rangle + e^{2\pi i\,0.j_1 j_2}|1\rangle}{\sqrt{2}}.

Wire 2 is untouched, still |j_2\rangle.

Why adding the two phases in the exponent works: e^{2\pi i\,j_1/2} \cdot e^{2\pi i\,j_2/4} = e^{2\pi i(j_1/2 + j_2/4)}. The sum in the exponent is exactly the binary fraction 0.j_1 j_2 = j_1/2 + j_2/4.

Step 3 — Hadamard on wire 2. Now process wire 2 the same way. Its current state is |j_2\rangle. Apply H:

\text{wire 2} \;=\; \frac{|0\rangle + e^{2\pi i\,0.j_2}|1\rangle}{\sqrt{2}}.

No controlled-phases are needed on wire 2 for n = 2 — there are no wires below it to serve as controls.

Step 4 — SWAP wire 1 and wire 2. The current joint state is

\frac{|0\rangle + e^{2\pi i\,0.j_1 j_2}|1\rangle}{\sqrt{2}} \otimes \frac{|0\rangle + e^{2\pi i\,0.j_2}|1\rangle}{\sqrt{2}}.

The product form says the l=1 factor should be (|0\rangle + e^{2\pi i\,0.j_2}|1\rangle)/\sqrt{2} and the l=2 factor should be (|0\rangle + e^{2\pi i\,0.j_1 j_2}|1\rangle)/\sqrt{2}. The pre-SWAP state has them reversed. A single SWAP between wires 1 and 2 fixes the ordering:

\text{QFT}\,|j_1 j_2\rangle \;=\; \frac{|0\rangle + e^{2\pi i\,0.j_2}|1\rangle}{\sqrt{2}} \otimes \frac{|0\rangle + e^{2\pi i\,0.j_1 j_2}|1\rangle}{\sqrt{2}}.

Result. The 2-qubit QFT circuit: H, C-R_2, H, SWAP. That is 3 single-qubit or two-qubit gates — not counting the SWAP as a primitive — plus the SWAP itself. The gate count matches the general formula: n = 2 gives n(n-1)/2 = 1 controlled-phase, n = 2 Hadamards, and \lfloor n/2\rfloor = 1 SWAP. Total: 4 gates.

Verify against the matrix. The 4 \times 4 QFT matrix for N = 4 is F_{jk} = e^{2\pi i\,jk/4}/2. Writing out a few entries: F_{00} = 1/2, F_{01} = 1/2, F_{11} = e^{i\pi/2}/2 = i/2, F_{22} = e^{i\pi}/2 = -1/2, F_{33} = e^{i 3\pi/2}/2 = -i/2. Compute the circuit's action on |j\rangle = |01\rangle (so j = 1, i.e. j_1 = 0, j_2 = 1):

Product form: (|0\rangle + e^{2\pi i\,0.1}|1\rangle)/\sqrt{2} \otimes (|0\rangle + e^{2\pi i\,0.01}|1\rangle)/\sqrt{2} = (|0\rangle + e^{i\pi}|1\rangle)/\sqrt{2} \otimes (|0\rangle + e^{i\pi/2}|1\rangle)/\sqrt{2} = (|0\rangle - |1\rangle)/\sqrt{2} \otimes (|0\rangle + i|1\rangle)/\sqrt{2}.
Expand: \tfrac{1}{2}(|00\rangle + i|01\rangle - |10\rangle - i|11\rangle).
Matrix: apply F to (0, 1, 0, 0)^T, giving the column (1/2, i/2, -1/2, -i/2)^T, which is the same state. Matches.

The complete 2-qubit QFT. Four gates: Hadamard on wire 1, controlled-$R_2$ from wire 2 to wire 1, Hadamard on wire 2, and a SWAP to fix the output ordering.

What this shows. On two qubits the full QFT takes exactly 4 gates, and the product-form derivation tracks each one. The matrix the circuit computes is the 4 \times 4 Fourier matrix, with the only wrinkle being the output order reversal — which the SWAP layer repairs.

Example 2: the 3-qubit QFT

Example 2: 3-qubit QFT, every gate labelled

Now build the circuit for n = 3. The input is |j_1 j_2 j_3\rangle; the output should be (|0\rangle + e^{2\pi i\,0.j_3}|1\rangle)/\sqrt{2} \otimes (|0\rangle + e^{2\pi i\,0.j_2 j_3}|1\rangle)/\sqrt{2} \otimes (|0\rangle + e^{2\pi i\,0.j_1 j_2 j_3}|1\rangle)/\sqrt{2}.

Gate 1 — Hadamard on wire 1. Wire 1 becomes (|0\rangle + e^{2\pi i\,0.j_1}|1\rangle)/\sqrt{2}.

Gate 2 — controlled-R_2 from wire 2 to wire 1. Adds phase e^{2\pi i\,j_2/4} to |1\rangle on wire 1. After this gate, wire 1 carries (|0\rangle + e^{2\pi i\,0.j_1 j_2}|1\rangle)/\sqrt{2}.

Gate 3 — controlled-R_3 from wire 3 to wire 1. Adds phase e^{2\pi i\,j_3/8} to |1\rangle on wire 1. After this gate, wire 1 carries (|0\rangle + e^{2\pi i\,0.j_1 j_2 j_3}|1\rangle)/\sqrt{2} — its final form before the SWAP.

Gate 4 — Hadamard on wire 2. Wire 2 becomes (|0\rangle + e^{2\pi i\,0.j_2}|1\rangle)/\sqrt{2}.

Gate 5 — controlled-R_2 from wire 3 to wire 2. Adds phase e^{2\pi i\,j_3/4} to |1\rangle on wire 2. After this gate, wire 2 carries (|0\rangle + e^{2\pi i\,0.j_2 j_3}|1\rangle)/\sqrt{2}.

Gate 6 — Hadamard on wire 3. Wire 3 becomes (|0\rangle + e^{2\pi i\,0.j_3}|1\rangle)/\sqrt{2}.

Gate 7 — SWAP wire 1 with wire 3. Wire 1's content (which is the l = 3 tensor factor) moves to wire 3, and wire 3's content (the l = 1 factor) moves to wire 1. Wire 2 is left alone — it is the middle, and it already holds the correct l = 2 factor.

Why only one SWAP is needed here: for n = 3 the number of SWAPs is \lfloor 3/2 \rfloor = 1. The middle wire is its own image under the order-reversal, so it requires no action. In general, each SWAP of wires l and n+1-l handles the two ends of the ordering at once.

Gate count. Three Hadamards, three controlled-phases, one SWAP — 7 gates total. This matches the formula: n(n-1)/2 = 3 controlled-phases, n = 3 Hadamards, \lfloor n/2 \rfloor = 1 SWAP.

The 3-qubit QFT circuit. Seven gates in total: three Hadamards, three controlled-phase rotations, and one SWAP. Each controlled-phase contributes a binary digit of the output phase to the wire currently being built.

What this shows. The pattern is visibly regular — each wire gets n - l + 1 operations (1 Hadamard plus n - l controlled-phases), and the wires below it contribute successively smaller phase chunks R_2, R_3, \ldots. Longer circuits for n = 4, 5, \ldots follow the same template. The regularity is why it is easy to code up the QFT as a loop in any quantum SDK.

The total gate count

The gate budget for the n-qubit QFT is easy to total. On wire l, the circuit performs one Hadamard and n - l controlled-phase gates. Summing over l = 1, 2, \ldots, n:

(\text{Hadamards}) + (\text{controlled-phases}) \;=\; n + \sum_{l=1}^{n}(n - l) \;=\; n + \frac{n(n-1)}{2}.

Add \lfloor n/2 \rfloor SWAPs at the end:

\text{total} \;=\; n + \frac{n(n-1)}{2} + \left\lfloor \frac{n}{2} \right\rfloor.

For n = 10: 10 + 45 + 5 = 60. For n = 30: 30 + 435 + 15 = 480. For n = 100: 100 + 4950 + 50 = 5100. The growth is O(n^2) — dominated by the controlled-phase triangle.

Both algorithms share a staircase structure that halves the problem at each step — the FFT butterfly on the left, and the triangular cascade of QFT controlled-phase gates on the right.

The FFT butterfly analogy

If you have seen the classical fast Fourier transform, the QFT circuit may look familiar in a peculiar way. The FFT is a clever algorithm that computes the classical DFT of N numbers in O(N \log N) arithmetic operations (instead of the naive O(N^2)). Its central move is the butterfly: pair up indices k and k + N/2, compute their sum and difference with a twiddle factor, and recurse on each half.

The QFT circuit has the same staircase shape. On wire l, a Hadamard plays the role of the butterfly's "sum-and-difference" combination — it takes a computational-basis input and produces an equal superposition. The controlled-phase gates play the role of the twiddle factors — they mix in phases that depend on the remaining bits of the input. The SWAP layer at the end of the QFT plays the role of the "bit-reversal permutation" at the end of most FFT implementations — both rearrange the output into natural index order.

The deep connection is that the QFT literally is a Fourier transform over the same mathematical object the FFT transforms over — the cyclic group \mathbb{Z}_N. The difference is that the classical FFT writes down N classical amplitudes, manipulates them one number at a time, and outputs N classical amplitudes. The QFT keeps the amplitudes implicit in the wavefunction and manipulates them in parallel using n qubits. The gate count is O(n^2) = O((\log N)^2) for the quantum version versus O(N \log N) for the classical FFT — a qualitative separation that only makes sense because of how differently the two representations store data. That comparison is the subject of the next chapter.

The approximate QFT — dropping tiny phases

For large n there is an important optimisation. Look at the controlled-R_k phases in the circuit: as k grows, the phase e^{2\pi i/2^k} becomes exponentially close to 1. By the time k reaches \log n + c (for any fixed constant c), the phase is so close to the identity that ignoring the gate entirely introduces negligible error.

The approximate QFT, introduced by Don Coppersmith in 1994, drops every controlled-R_k with k greater than some threshold m. The remaining circuit has roughly n \log n controlled-phase gates instead of n(n-1)/2. For n = 1000 qubits, that is a reduction from half a million gates to a few thousand — a factor of 100 or so in gate count — at a cost of 1/\text{poly}(n) fidelity loss.

For applications such as Shor's factoring algorithm, the approximate QFT is good enough. Shor's algorithm only cares about the large peaks of the output distribution, and Coppersmith's analysis shows that those peaks are preserved under the approximation. The exact QFT is used in this chapter because it is the cleaner object; the approximate QFT is what you run on real hardware for large circuits. You will meet the approximate QFT in more detail in its own chapter.

Common confusions

"The final SWAP layer is optional." No. Without the SWAPs, the output is still a valid quantum state — but the tensor factors sit on the "wrong" wires. Any algorithm using the QFT as a subroutine (like Shor's or phase estimation) relies on the specific correspondence between bit positions and phase fractions, so the SWAP layer is part of the transform, not a cosmetic flourish. The only time you skip SWAPs is when the next operation is also a QFT-shaped thing that cancels the bit reversal — but that is a compiler-level optimisation, not a modification of the QFT definition.
"The QFT is just n Hadamards." No. Applying only Hadamards gives you H^{\otimes n}, which is the QFT of the group \mathbb{Z}_2^n (with bitwise-XOR group operation). The QFT as typically meant in Shor's algorithm is the QFT of \mathbb{Z}_{2^n} (with modular-addition group operation), which is a genuinely different transform — its matrix entries are e^{2\pi i\,jk/2^n}, not (-1)^{j \cdot k}. The two agree only for n = 1 (where \mathbb{Z}_2 = \mathbb{Z}_2^1). The controlled-phase gates in the QFT circuit are exactly what makes it the "mod 2^n" transform.
"Coppersmith's approximation is a hack." It is not. It is a rigorous theorem: dropping phases with k > \log n + O(1) changes the QFT's output on any input by at most 1/\text{poly}(n) in \ell^2 norm. For phase-estimation applications the error propagates gently, and the success probability of the overall algorithm is essentially unchanged. The approximate QFT is what Shor's algorithm actually uses in practice.
"The QFT and the inverse QFT have completely different circuits." No. The inverse QFT is built by running the QFT circuit backwards and conjugating each gate. Hadamard is its own inverse. The inverse of controlled-R_k is controlled-R_k^{-1} = \text{controlled-diag}(1, e^{-2\pi i/2^k}) — same shape, opposite sign on the phase. SWAP is its own inverse. So the inverse QFT circuit has the same shape as the QFT, traversed in reverse order, with the controlled-phase angles negated. For phase-estimation circuits, what you often run is an inverse QFT rather than a forward QFT.
"Controlled-R_k is a 'different' gate from controlled-phase." The controlled-R_k gates used in this chapter are a special case of the controlled-phase family from the controlled-U chapter, with \phi = 2\pi/2^k. Whenever you see controlled-R_k written in a QFT context, read it as "controlled-phase with the specific angle 2\pi/2^k." Some textbooks use the notation R_k for the matrix \text{diag}(1, e^{2\pi i/2^k}); others use R_k for a rotation of angle \pi/2^{k-1} about the z-axis (which differs by a global phase). The QFT circuit works out the same either way.

Going deeper

You now have the full QFT circuit — its derivation, its gate list, its cost. The advanced material below covers the recursive view that makes the circuit shape obvious, the formal statement and proof of the approximate QFT's error bound, how the QFT generalises to non-power-of-2 dimensions, and the relationship between QFT and phase estimation at the circuit level.

The recursive QFT decomposition

The QFT circuit has a recursive structure that you can read off the product-form derivation. Write \text{QFT}_n for the QFT on n qubits. Then

\text{QFT}_n \;=\; (\text{QFT}_{n-1} \otimes I) \cdot (\text{controlled-phases linking qubit 1 to qubits 2..n}) \cdot (I \otimes H) \cdot (\text{SWAP cascade})

or, said more plainly, \text{QFT}_n on the top n qubits splits into a block that runs \text{QFT}_{n-1} on the bottom n - 1 qubits plus a small collection of additional gates that involve qubit 1. This is the quantum analogue of the FFT's divide-and-conquer structure.

Unrolling the recursion gives the iterative gate sequence from this chapter. Setting up the recursion formally (and proving it equals the matrix F from first principles) is a standard exercise in any quantum-algorithms text — Nielsen & Chuang §5.1.3 works through it carefully.

The formal error bound for the approximate QFT

The approximate QFT of threshold m is the circuit with every controlled-R_k for k > m removed. Let |\psi_{\text{exact}}\rangle = \text{QFT}|\phi\rangle and |\psi_{\text{approx}}\rangle = \text{AQFT}_m|\phi\rangle for any input |\phi\rangle. Coppersmith's 1994 bound says

\|\psi_{\text{exact}} - \psi_{\text{approx}}\| \;\leq\; \frac{n(n-1)}{2} \cdot \frac{2\pi}{2^m}.

Setting m = \lceil \log_2 n + \log_2(1/\epsilon) + 2 \rceil guarantees the error is at most \epsilon. For any polynomially-small \epsilon, this gives m = O(\log n) and a total gate count of O(n \log n). That is a genuine asymptotic improvement over the exact QFT's O(n^2).

The bound is obtained by noting that each dropped gate differs from the identity by a phase of magnitude at most 2\pi/2^m, and there are at most n(n-1)/2 such dropped gates. A union bound over gates gives the total displacement.

QFT on arbitrary N — not only powers of 2

The chapter has assumed N = 2^n. For general N the QFT is still defined by the same formula F_{jk} = e^{2\pi i\,jk/N}/\sqrt{N}, but there is no n-qubit quantum register that exactly encodes \mathbb{Z}_N unless N is a power of 2.

The standard workaround is to embed \mathbb{Z}_N inside \mathbb{Z}_{2^n} where 2^n \geq N, run the n-qubit QFT circuit, and deal with the mismatch at the analysis level. Kitaev's 1995 construction (used in Shor's algorithm) gives an efficient circuit that approximates \text{QFT}_N to arbitrary accuracy using O(n \log n) gates on an n-qubit register. The details — involving a clever reduction to phase estimation of the shift operator on \mathbb{Z}_N — are beyond this chapter but appear in Kitaev-Shen-Vyalyi or in Preskill's notes.

Inverse QFT — one flip of every angle

The inverse QFT \text{QFT}^\dagger is the transform that takes \text{QFT}|\phi\rangle back to |\phi\rangle. Because the QFT is unitary, \text{QFT}^\dagger is just the conjugate transpose of the QFT matrix, with entries e^{-2\pi i\,jk/N}/\sqrt{N}.

At the circuit level: reverse the order of the gates, replace each controlled-R_k with controlled-R_k^{-1} (i.e. controlled-\text{diag}(1, e^{-2\pi i/2^k})), and leave the Hadamards and SWAPs alone (they are self-inverse). The circuit has the same gate count — n Hadamards, n(n-1)/2 controlled-phases, \lfloor n/2 \rfloor SWAPs. In phase-estimation circuits (and in the final stages of Shor's factoring algorithm), the gate you actually apply is \text{QFT}^\dagger, not \text{QFT} — because the "Fourier domain" is where the answer sits and you want to decode it into the computational basis for measurement.

Compilation context — running the QFT on real hardware

On real superconducting hardware today, the QFT circuit for n qubits is one of the standard benchmarks. TIFR Mumbai's experimental group and the IIT Bombay quantum computing lab both run small QFTs (up to n = 5 or so) as part of verifying two-qubit gate fidelities. The bottleneck is almost always the controlled-phase gates: each one has to be compiled into a sequence of CNOTs plus single-qubit rotations, and the CNOT error rate (roughly 10^{-3} on good hardware in 2025) means that an n = 10 QFT with \sim 45 controlled-phases accumulates enough error to wash out the output. Coppersmith's approximation, combined with high-fidelity two-qubit gates, is what makes a useful QFT tractable on near-term machines.

Where this leads next

QFT defined — the matrix-level definition and the interpretation as a change of basis, which this chapter builds into a circuit.
Why QFT is efficient — the comparison with classical FFT, and the subtlety about what the O(n^2) gate count does and does not buy you.
Approximate QFT — Coppersmith's O(n \log n) circuit, the error bound, and when the approximation is safe for downstream algorithms.
Quantum phase estimation — the algorithm that uses an inverse QFT to read out an eigenvalue, and the main application of the QFT outside of Shor's factoring.
Controlled-U gates — the parent family of the controlled-R_k gates used throughout the QFT circuit.

References

Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §5.1 (quantum Fourier transform and its circuit) — Cambridge University Press.
John Preskill, Lecture Notes on Quantum Computation, Ch. 6 (QFT circuit, inverse QFT, phase estimation) — theory.caltech.edu/~preskill/ph229.
Don Coppersmith, An approximate Fourier transform useful in quantum factoring (1994) — arXiv:quant-ph/0201067.
Wikipedia, Quantum Fourier transform — circuit diagram, matrix form, and comparisons with the classical FFT.
Qiskit Textbook, Quantum Fourier Transform — runnable 3-qubit and 4-qubit QFT circuits with SWAP layers.
Wikipedia, Cooley-Tukey FFT algorithm — the classical FFT butterfly, for comparison with the QFT circuit shape.