Order Finding — The Quantum Idea

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

Order finding is the problem: given a positive integer N and an a coprime to N, find the smallest r \geq 1 with a^r \equiv 1 \pmod N. Classically this is as hard as factoring. Quantumly, it is a direct application of phase estimation.

Define the modular-multiplication operator U|y\rangle = |ay \bmod N\rangle on the n-qubit register that indexes integers 0, 1, \ldots, N-1. U is unitary because it permutes basis states. Its eigenstates are

|u_s\rangle \;=\; \frac{1}{\sqrt{r}}\sum_{k=0}^{r-1} e^{-2\pi i\, s k / r}\,|a^k \bmod N\rangle,\qquad s \in \{0, 1, \ldots, r-1\},

with eigenvalues e^{2\pi i\, s/r}. Quantum phase estimation on U with target |1\rangle — which is a uniform superposition \tfrac{1}{\sqrt r}\sum_s |u_s\rangle of all r eigenstates — returns \widetilde\varphi \approx s/r for a random s. Feed the result into the classical continued-fractions algorithm, and out pops r. This is the quantum subroutine of Shor's factoring algorithm.

You have seen quantum phase estimation in ch. 70-72. Feed it a unitary U and an eigenstate |u\rangle with U|u\rangle = e^{2\pi i \varphi}|u\rangle, and it spits out a binary approximation of \varphi. It is a universal machine for reading eigenvalue phases.

Order finding uses that machine on a very specific, very structured unitary: multiplication by a modulo N. The phase that comes out of phase estimation is s/r — a fraction whose denominator is exactly the number you are trying to find. One classical post-processing step (continued fractions) recovers r from s/r. That is the whole quantum idea.

This chapter builds the idea from the inside. You will see why modular multiplication is unitary, what its eigenstates actually look like, why the state |1\rangle is the magic target register that lets the algorithm work even though you do not know r (and therefore could not prepare any single eigenstate), how phase estimation samples a random s/r, and how continued fractions turns that sample into r. The next chapter (ch. 77) wires this subroutine into the full Shor pipeline — but first you need to understand what is happening inside the quantum black box.

What "order" means, and why you want it

Take N = 15 and a = 7. Compute the powers:

7^1 = 7, \quad 7^2 = 49 = 4, \quad 7^3 = 28 = 13, \quad 7^4 = 91 = 1 \pmod{15}.

After four steps the sequence returns to 1. The order of 7 modulo 15 is r = 4. From here on the powers cycle: 7^5 = 7, 7^6 = 4, and so on.

The order r is the length of the orbit \{a^0, a^1, \ldots, a^{r-1}\} = \{1, a, a^2, \ldots, a^{r-1}\} of a under multiplication mod N. By Lagrange's theorem, r divides the size of (\mathbb{Z}/N)^\times, so r is bounded above, but finding it for an n-bit N with classical methods takes exponential time — no algorithm is known that does better than factoring N itself.

From the previous chapter (ch. 75), you know why this matters: if you can find the order r of a random a modulo N, you can factor N in polynomial time. The quantum subroutine for order finding is therefore the quantum subroutine for factoring.

The orbit of $a = 7$ modulo $N = 15$. Multiplication by $7$ rotates the residues $1 \to 7 \to 4 \to 13 \to 1$ around a cycle of length $4$. The order of $a$ is the length of this cycle — the smallest power for which $a^r \equiv 1$. Finding $r$ quickly for large $N$ is what the quantum algorithm does.

The unitary — multiplication by a

Define an operator U_a on the Hilbert space spanned by the basis states |0\rangle, |1\rangle, \ldots, |N-1\rangle by

U_a\,|y\rangle \;=\; |ay \bmod N\rangle.

This is a linear operator (extend by linearity to superpositions). To use phase estimation you need to know two things: that U_a is unitary, and what its eigenvalues and eigenstates look like.

U_a is unitary. Consider its action on the subset \{1, 2, \ldots, N-1\} that is coprime to N. Multiplication by a is a bijection of this set onto itself — its inverse is multiplication by a^{-1} \bmod N, which exists because \gcd(a, N) = 1. A linear operator that maps a basis to a permutation of itself is unitary: it takes orthonormal basis vectors to orthonormal basis vectors, and it preserves inner products. So U_a is unitary on the coprime-to-N subspace.

Why only on the coprime subspace: if y shares a factor with N, the orbit y, ay, a^2 y, \ldots also shares that factor, so U_a permutes the coprime elements among themselves. The non-coprime elements form other orbits. For Shor's algorithm, you only ever need U_a to act on the coprime-to-N subspace; the standard construction of the quantum circuit for U_a leaves the non-coprime basis states alone.

The orbit of 1. Starting from |1\rangle, apply U_a repeatedly: |1\rangle \to |a\rangle \to |a^2\rangle \to \cdots \to |a^{r-1}\rangle \to |1\rangle. The orbit has exactly r elements — the order of a. These r basis states |1\rangle, |a\rangle, |a^2\rangle, \ldots, |a^{r-1}\rangle form an r-dimensional invariant subspace of U_a: they are closed under the action of U_a (and under U_a^{-1}, which is multiplication by a^{-1}).

On this r-dimensional subspace, U_a acts like a cyclic-shift operator: each basis vector shifts to the next, and the last wraps around to the first. That is the structure you are going to diagonalise.

The eigenstates — Fourier combinations over the orbit

A cyclic-shift operator on r states is diagonalised by the discrete Fourier basis on \mathbb{Z}/r. Explicitly: for each s \in \{0, 1, \ldots, r-1\}, define

|u_s\rangle \;=\; \frac{1}{\sqrt{r}}\sum_{k=0}^{r-1} e^{-2\pi i\, s k / r}\,|a^k \bmod N\rangle.

These r states are eigenstates of U_a, and the eigenvalue of |u_s\rangle is e^{2\pi i\, s/r}.

Deriving the eigenvalue equation. Apply U_a to |u_s\rangle and see what comes out:

U_a\,|u_s\rangle \;=\; \frac{1}{\sqrt{r}}\sum_{k=0}^{r-1} e^{-2\pi i\, s k / r}\,U_a\,|a^k \bmod N\rangle.

Why pull U_a inside the sum: U_a is linear, so it distributes over the superposition. The coefficients e^{-2\pi i s k/r} are just numbers, untouched by the operator.

Now U_a |a^k \bmod N\rangle = |a^{k+1} \bmod N\rangle — multiplying by a shifts the exponent. Substitute:

U_a\,|u_s\rangle \;=\; \frac{1}{\sqrt{r}}\sum_{k=0}^{r-1} e^{-2\pi i\, s k / r}\,|a^{k+1} \bmod N\rangle.

Re-index the sum. Let j = k + 1; as k runs from 0 to r-1, j runs from 1 to r. The term j = r has |a^r \bmod N\rangle = |1\rangle = |a^0\rangle (because r is the order), so it corresponds to j = 0 wrapped around. Rewriting:

U_a\,|u_s\rangle \;=\; \frac{1}{\sqrt{r}}\sum_{j=0}^{r-1} e^{-2\pi i\, s (j-1) / r}\,|a^j \bmod N\rangle.

Why re-indexing works cleanly: the orbit is a cycle of length r, so shifting the index by 1 and wrapping at r is the same as shifting the sum variable. The exponential at j = 0 is e^{-2\pi i s(-1)/r} = e^{2\pi i s/r}, exactly what j = r would have been had the index not wrapped.

Pull out the common factor e^{2\pi i s / r}:

U_a\,|u_s\rangle \;=\; e^{2\pi i\, s/r}\,\cdot\,\frac{1}{\sqrt{r}}\sum_{j=0}^{r-1} e^{-2\pi i\, s j / r}\,|a^j \bmod N\rangle \;=\; e^{2\pi i\, s/r}\,|u_s\rangle.

That is the eigenvalue equation. |u_s\rangle is an eigenstate of U_a with eigenvalue e^{2\pi i\, s/r}. One derivation, r eigenstates — one for each s = 0, 1, \ldots, r-1.

The four eigenstates of $U_a$ for $a = 7$, $N = 15$, $r = 4$. Each $|u_s\rangle$ is an equal-weight superposition of the four orbit elements $|1\rangle, |7\rangle, |4\rangle, |13\rangle$, but with phases $e^{-2\pi i s k/4}$ that rotate a full turn $s$ times as $k$ cycles through $0, 1, 2, 3$. The eigenvalue $e^{2\pi i s/r}$ comes out of the winding number.

The catch, and the trick — |1\rangle is a uniform superposition of eigenstates

To run phase estimation on U_a, you need an eigenstate |u_s\rangle loaded into the target register. But look at |u_s\rangle:

|u_s\rangle \;=\; \frac{1}{\sqrt{r}}\sum_{k=0}^{r-1} e^{-2\pi i\, s k / r}\,|a^k \bmod N\rangle.

Preparing this requires knowing r and s. But r is the number you are trying to find! You cannot prepare any single eigenstate without already having the answer.

Here is where quantum mechanics saves you. Consider the computational basis state |1\rangle. Sum the eigenstates:

\frac{1}{\sqrt{r}}\sum_{s=0}^{r-1}|u_s\rangle \;=\; \frac{1}{\sqrt{r}}\sum_{s=0}^{r-1}\frac{1}{\sqrt{r}}\sum_{k=0}^{r-1} e^{-2\pi i\, s k / r}\,|a^k\rangle \;=\; \frac{1}{r}\sum_{k=0}^{r-1}\Bigl(\sum_{s=0}^{r-1} e^{-2\pi i\, s k / r}\Bigr)\,|a^k\rangle.

Why swap the sums: both sums are finite and over disjoint variables (s and k), so Fubini applies. Pulling the sum over s inside lets you collect the geometric series on s for each fixed k.

The inner sum over s is a geometric series. For k = 0, every term is e^{0} = 1, so the sum is r. For k \neq 0 (and k < r), the sum is \sum_{s=0}^{r-1} \omega^s where \omega = e^{-2\pi i k/r} is a non-trivial r-th root of unity; by the geometric-series identity, this equals (1 - \omega^r)/(1 - \omega) = 0. So the sum collapses:

\frac{1}{\sqrt{r}}\sum_{s=0}^{r-1}|u_s\rangle \;=\; \frac{1}{r}\cdot r\,|a^0\rangle \;=\; |1\rangle.

Equivalently,

|1\rangle \;=\; \frac{1}{\sqrt{r}}\sum_{s=0}^{r-1}|u_s\rangle.

The computational-basis state |1\rangle is a uniform superposition of all r eigenstates of U_a. You do not need to know r to prepare it — you just load the number 1 into the register. The quantum magic is that when you run phase estimation on U_a with target |1\rangle, linearity carries every eigenstate component through the circuit in parallel, and the measurement samples one eigenvalue at random.

Why this is a breakthrough: the superposition trick completely evades the "I don't know r so I can't prepare an eigenstate" problem. The state |1\rangle is the simplest basis state to prepare, and it secretly contains all r eigenstates with equal amplitude. Phase estimation does not mind — it processes the superposition linearly and picks out one eigenvalue per run.

The state $|1\rangle$ is secretly a uniform superposition of all $r$ eigenstates of $U_a$. Running phase estimation with $|1\rangle$ as the target samples one eigenvalue $e^{2\pi i s/r}$ per run, with $s$ drawn uniformly from $\{0, 1, \ldots, r-1\}$. You never had to know $r$ to start.

The algorithm — phase estimation plus continued fractions

Putting the pieces together, the full order-finding algorithm looks like this.

Input. A composite N and an a \in \{2, 3, \ldots, N-1\} with \gcd(a, N) = 1.

Output. The order r of a mod N.

Algorithm.

Choose ancilla width t. Use t = 2n + 1 ancilla qubits where n = \lceil \log_2 N \rceil. This ensures enough precision that continued fractions will recover r with high probability.
Prepare registers. t ancillas in |0\rangle^{\otimes t}, and an n-qubit target register in |1\rangle.
Run phase estimation on U_a. Apply Hadamards to the ancillas, then the ladder of controlled-U_a^{2^k} gates, then inverse QFT, then measure the ancillas. Because |1\rangle is a uniform sum of eigenstates |u_s\rangle, the measurement yields a t-bit integer y with

\widetilde\varphi \;=\; \frac{y}{2^t} \;\approx\; \frac{s}{r}

for some s \in \{0, 1, \ldots, r-1\} drawn uniformly at random. The approximation is accurate to within 1/2^{t+1} (by the QPE accuracy bound). 4. Apply continued fractions. Treat \widetilde\varphi = y/2^t as a rational number and compute its continued-fraction expansion. The convergents are successive rational approximations in lowest terms. One of them will be s/r with r < N; stop when you find a convergent whose denominator r' satisfies |y/2^t - s'/r'| < 1/(2 N^2). 5. Verify. Compute a^{r'} \bmod N. If it is 1, output r'; otherwise, return to step 2 and try again.

Each run produces a candidate r'. The probability that continued fractions succeeds on a single run is \Omega(1/\log\log r) \geq \Omega(1/\log\log N); repeat O(\log\log N) times to find r with probability arbitrarily close to 1.

Most of the total work happens inside phase estimation — specifically, inside the controlled-U_a^{2^k} gates. The classical post-processing (continued fractions) is essentially free: it runs in O(n^3) time on the measurement outcome.

Worked examples

Example 1: the eigenstate for $N = 15$, $a = 7$, $s = 0$

Continue with N = 15, a = 7, r = 4. Write down the s = 0 eigenstate:

|u_0\rangle \;=\; \frac{1}{\sqrt 4}\sum_{k=0}^{3} e^{-2\pi i \cdot 0 \cdot k/4}\,|7^k \bmod 15\rangle \;=\; \frac{1}{2}\bigl(|1\rangle + |7\rangle + |4\rangle + |13\rangle\bigr).

Every phase factor is e^0 = 1, so |u_0\rangle is the uniform superposition over the orbit with no phases. Verify it is an eigenstate with eigenvalue e^{0} = 1.

Step 1. Apply U_a to each basis component individually.

Why line-by-line: the operator acts linearly, so you can process each basis state separately and add the results. Each application multiplies the index by 7 and reduces mod 15.

Step 2. Sum the images with the |u_0\rangle coefficients (all 1/2):

U_7|u_0\rangle \;=\; \frac{1}{2}\bigl(U_7|1\rangle + U_7|7\rangle + U_7|4\rangle + U_7|13\rangle\bigr) \;=\; \frac{1}{2}\bigl(|7\rangle + |4\rangle + |13\rangle + |1\rangle\bigr).

Why the sum is identical to |u_0\rangle: U_7 permutes the four orbit elements, so the sum of the images is just a reordering of the original sum. Equal weights mean the reordering does not change the state.

Step 3. Compare to |u_0\rangle:

U_7|u_0\rangle \;=\; \frac{1}{2}\bigl(|1\rangle + |7\rangle + |4\rangle + |13\rangle\bigr) \;=\; |u_0\rangle.

Result. U_7|u_0\rangle = 1\cdot|u_0\rangle. The eigenvalue is 1 = e^{2\pi i \cdot 0/4}, matching s = 0.

Applying $U_a$ to $|u_0\rangle$ permutes the four orbit elements — $1 \to 7 \to 4 \to 13 \to 1$ — but because all coefficients are equal, the sum is unchanged. The eigenvalue is $1$, exactly $e^{2\pi i \cdot 0/4}$ as predicted.

What this shows. The s = 0 eigenstate is the simplest — flat amplitudes, eigenvalue exactly 1. It is invariant under U_a because permuting equal weights leaves the sum alone. The other eigenstates (s = 1, 2, 3) have non-trivial phases and pick up non-trivial eigenvalues i, -1, -i for exactly the same reason, rearranging a few signs.

Example 2: extracting $r = 4$ from a measurement, via continued fractions

Suppose you have run phase estimation on U_7 (with N = 15, r = 4) using t = 8 ancilla qubits, and the measurement outcome is y = 192. Compute \widetilde\varphi and recover r.

Step 1. Compute the phase estimate.

\widetilde\varphi \;=\; \frac{y}{2^t} \;=\; \frac{192}{256} \;=\; \frac{3}{4}.

Why y/2^t is the right fraction: the measurement gives an integer y between 0 and 2^t - 1; dividing by 2^t converts it to a binary fraction in [0, 1), which is exactly the format of the phase \varphi = s/r that phase estimation is trying to read out.

In this idealised example, the answer lands exactly on 3/4. In a real run, t is chosen large enough that y/2^t is a t-bit approximation of s/r; here the approximation happens to be exact because 3/4 is a clean binary fraction.

Step 2. Apply the continued-fractions algorithm to 3/4.

Continued fractions expands a rational number as \widetilde\varphi = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots}}}, where a_0 is the integer part and each a_i is the integer part of the next reciprocal. For 3/4:

a_0 = \lfloor 3/4\rfloor = 0. Remainder: 3/4.
Reciprocal of 3/4 is 4/3. a_1 = \lfloor 4/3\rfloor = 1. Remainder: 4/3 - 1 = 1/3.
Reciprocal of 1/3 is 3. a_2 = 3. Remainder: 0.

So 3/4 = [0; 1, 3] = 0 + \cfrac{1}{1 + \cfrac{1}{3}}. Verify: 1 + 1/3 = 4/3, reciprocal 3/4.

Step 3. Read off the convergents.

The convergents are successive truncations: p_0/q_0 = 0/1, p_1/q_1 = 0 + 1/1 = 1/1, p_2/q_2 = 3/4. Why convergents matter: each convergent is the best rational approximation to \widetilde\varphi among fractions with denominator \le q_i. You are looking for the convergent whose denominator equals the true r. The continued-fractions theorem guarantees this works provided |\widetilde\varphi - s/r| < 1/(2r^2), which holds whenever t = 2n + 1 with n = \lceil\log_2 N\rceil.

Step 4. Try each convergent. The last convergent is p_2/q_2 = 3/4, which is already in lowest terms. The candidate r is the denominator r' = 4.

Step 5. Verify. Compute 7^4 \bmod 15 = 2401 \bmod 15 = 1.

Result. r = 4, the true order of 7 mod 15. The phase estimation measurement returned s = 3, one of the three non-zero s values that give a coprime numerator; continued fractions converted 3/4 to the denominator 4 in two steps.

Applying continued fractions to the measured phase $\widetilde\varphi = 3/4$. Three steps of integer-part extraction and reciprocation produce the expansion $[0; 1, 3]$; the final convergent is $3/4$ itself, whose denominator $4$ is the order $r$. Verification: $7^4 \bmod 15 = 1$.

What this shows. Continued fractions is the classical post-processing that turns a phase estimate y/2^t \approx s/r into the integer r. When s and r are coprime (three out of four possible s values for r = 4), the last convergent has denominator exactly r. When they are not coprime — for example s = 2, giving 2/4 = 1/2 with denominator 2 — the algorithm returns a divisor of r, and you need another run. The run-and-verify loop costs O(\log\log N) repetitions on average.

Common confusions

"|1\rangle is itself an eigenstate of U_a." No. Applying U_a to |1\rangle gives |a\rangle, a different computational-basis state. |1\rangle is not fixed by U_a — it is the first element of an orbit of length r. The eigenstates are the Fourier-over-the-orbit superpositions |u_s\rangle. The magic of |1\rangle is that it equals \tfrac{1}{\sqrt r}\sum_s |u_s\rangle, a uniform superposition of eigenstates.
"Phase estimation returns r directly." No. Phase estimation returns \widetilde\varphi \approx s/r for a random s — a fraction, not an integer. Recovering r from s/r is the job of continued fractions, a classical post-processing step. If you measure y = 0 (which corresponds to s = 0), you learn nothing; try again.
"The whole algorithm is quantum." No. The quantum part is phase estimation on U_a — which is itself dominated by the classical-looking subroutine of modular exponentiation implemented reversibly on a quantum register. Continued fractions is purely classical. The reduction from factoring to order finding is classical. The verification a^{r'} \bmod N \stackrel{?}{=} 1 is classical. The quantum computer is doing one specific thing: extracting s/r from the eigenvalue structure of U_a.
"You need to know r to pick the right ancilla count t." You only need an upper bound, and r < N gives you one. Choose t = 2n + 1 where n = \lceil \log_2 N\rceil, and the precision is sufficient for any r up to N. This is why the ancilla register is "roughly twice the size of the data register" in every Shor-circuit diagram.
"Continued fractions always works." It succeeds when \gcd(s, r) = 1, which happens for \varphi(r)/r fraction of s values, where \varphi is Euler's totient function. For typical r this is \Omega(1/\log\log r), and the expected number of repetitions to find a coprime s is logarithmic. If you get an s with \gcd(s, r) = d > 1, continued fractions returns r/d, a proper divisor of r — sometimes useful, often enough to compute the true r by taking the LCM of two or three runs.

Going deeper

If you came to understand how order finding works quantumly — the unitary U_a, its eigenstates, why |1\rangle is a uniform sum of them, how phase estimation samples s/r, and how continued fractions recovers r — you have it. The rest is the structural anatomy: a formal proof of the eigenstate expansion, the continued-fractions algorithm itself, failure analysis, modular-exponentiation cost, and Kitaev's alternative formulation.

Formal derivation of the |1\rangle eigenstate expansion

Starting from the eigenstate definitions, the sum-over-s identity is a two-line Fourier-over-\mathbb{Z}/r argument. Writing \omega = e^{-2\pi i/r} (a primitive r-th root of unity):

\frac{1}{\sqrt r}\sum_{s=0}^{r-1}|u_s\rangle \;=\; \frac{1}{r}\sum_{k=0}^{r-1}\Bigl(\sum_{s=0}^{r-1}\omega^{sk}\Bigr)|a^k\rangle \;=\; \frac{1}{r}\sum_{k=0}^{r-1}\bigl(r\cdot\delta_{k,0}\bigr)|a^k\rangle \;=\; |a^0\rangle \;=\; |1\rangle,

where the inner sum equals r when k \equiv 0 \pmod r and 0 otherwise — the orthogonality of characters of \mathbb{Z}/r. More generally, any computational-basis state |a^j\rangle on the orbit decomposes as

|a^j\rangle \;=\; \frac{1}{\sqrt r}\sum_{s=0}^{r-1} e^{2\pi i\, s j /r}\,|u_s\rangle,

so running QPE on any orbit state returns s/r for random s — the choice of |1\rangle is just the easiest to prepare. (Proof: apply the inverse Fourier on the orbit to both sides of the |u_s\rangle definition.)

The continued-fractions algorithm, formally

Given a rational x \in (0, 1), the continued-fraction expansion is the unique sequence [a_0; a_1, a_2, \ldots] with a_0 \ge 0, a_i \ge 1 for i \ge 1, defined by x_0 = x, a_i = \lfloor x_i\rfloor, x_{i+1} = 1/(x_i - a_i) (stopping if x_i = a_i).

The i-th convergent p_i/q_i is the rational obtained by truncating the expansion at a_i. Convergents satisfy the Legendre theorem: if |x - p/q| < 1/(2q^2) for some fraction p/q in lowest terms, then p/q is a convergent of x.

Application to order finding. By the QPE accuracy bound, the measured \widetilde\varphi = y/2^t satisfies |\widetilde\varphi - s/r| \le 1/2^{t+1}. Choose t = 2n + 1; then 1/2^{t+1} \le 1/(2^{2n+2}) \le 1/(2 N^2) \le 1/(2 r^2) (since r \le N). By Legendre, s/r (in lowest terms) is a convergent of \widetilde\varphi. Compute all convergents — there are O(n) of them by a theorem of Khinchin — and find the one whose denominator r' satisfies a^{r'} \equiv 1 \pmod N. That is r (if \gcd(s, r) = 1) or a divisor of r (otherwise).

Failure modes and success probability

A single run fails to return r in two cases:

s = 0. The measured phase is 0/r = 0, carrying no information. Happens with probability 1/r.
\gcd(s, r) > 1. The continued-fractions convergent s/r reduces to s'/r' with r' < r a proper divisor. Happens with probability 1 - \varphi(r)/r, where \varphi is Euler's totient.

Combining, the probability of success per run is \varphi(r)/r, which for typical r is \Omega(1/\log\log r) (by Mertens' theorem; the worst case is when r is a primorial). Repeat O(\log\log N) times and take the LCM of the results; with probability > 1/2, the LCM is r. Total expected runs: O(\log\log N).

A careful analysis (Nielsen and Chuang §5.3.1) shows that in practice two independent runs suffice to pin down r with probability > 3/4: on average the \gcd-returned divisors of r span enough information that two LCMs hit r.

Modular exponentiation cost — the expensive part

The bottleneck of the quantum order-finding circuit is implementing the ladder of controlled-U_a^{2^k} gates for k = 0, 1, \ldots, t-1 = 2n. Each gate U_a^{2^k} is multiplication by a^{2^k} \bmod N, which is a specific permutation of the coprime-to-N residues, built from binary adders and multipliers.

A straightforward implementation of U_a^{2^k} uses O(n^2) Toffoli gates. With t = 2n + 1 ancillas, the total Toffoli count is O(n^3) — cubic in the bit-size of N. For 2048-bit RSA, that is \sim 2048^3 \approx 10^{10} Toffoli gates per run. Each Toffoli requires \sim 7 T gates in the standard decomposition, pushing the T-count to \sim 10^{11}. This is why Shor's algorithm is expensive even for "small" cryptographically relevant N — the constant factors hidden in O(n^3) are enormous.

Modern optimisations (Coppersmith, windowed arithmetic, Montgomery form, Gidney's recent work on modular multipliers) reduce the logical-Toffoli count by factors of 10 to 100, but the asymptotic remains O(n^3).

Kitaev's alternative — one-ancilla iterative order finding

Kitaev's 1995 construction (arXiv:quant-ph/9511026) replaces the t-ancilla QFT-based phase estimation with a single-ancilla iterative version. For order finding, this means extracting the bits of s/r one at a time, feeding each measurement outcome back as a classical control for the next round's controlled-U_a^{2^k}.

Advantages: one ancilla instead of 2n + 1, at the cost of O(n) circuit depth rounds with classical feedforward. Useful on NISQ devices where qubit count is scarce but coherence time per round is adequate.

Disadvantages: the classical post-processing to reconstruct s/r is more intricate, and round-to-round phase errors accumulate differently. For full-scale fault-tolerant Shor, the QFT-based version is standard.

Indian context — Aadhaar, UPI, and the post-quantum horizon

Order finding is the real quantum threat to RSA-based cryptography, which underpins much of India's digital infrastructure. Aadhaar authentication uses RSA signatures on biometric queries; UPI transactions sign payment messages with RSA and ECDSA; GSTIN filings, DigiLocker, eSign, and MCA21 all rely on RSA-based public-key infrastructure today. If a fault-tolerant quantum computer ran the order-finding subroutine on the modulus of any one of these signing keys, the cryptosystem would fall.

The timeline is years to decades away — current hardware is \sim 10^4 qubits short of breaking 2048-bit RSA, as shown by Gidney-Ekerå (2021). But the "harvest now, decrypt later" threat is immediate: an adversary can record encrypted traffic today and decrypt it in 15 years when the quantum computer exists. India's National Quantum Mission (2023, ₹6000 crore) has a working group on post-quantum cryptography migration precisely to get ahead of this; the CERT-In and MeitY advisories from 2024 onward recommend lattice-based signatures (Kyber, Dilithium) for new deployments. The next chapter (ch. 77) assembles order finding into the full Shor circuit and puts concrete numbers on the migration deadline.

Where this leads next

The Shor circuit end to end — the full factoring pipeline: pick a, run order finding, recover factors, with complete resource estimates for 2048-bit RSA.
Factoring reduces to order finding — the classical reduction that explains why order finding factors N.
Quantum phase estimation — the subroutine that order finding invokes.
Simon's algorithm — the (\mathbb{Z}/2)^n analogue: hidden period over the Boolean hypercube, the template Shor transposed to \mathbb{Z}/N.
Continued fractions — the classical number-theory algorithm used for post-processing.

References

Peter Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer (1994) — arXiv:quant-ph/9508027. The founding paper — order finding is §5.
Wikipedia, Shor's algorithm — includes a clean derivation of the eigenstates of modular multiplication.
Nielsen and Chuang, Quantum Computation and Quantum Information §5.3 — Cambridge University Press. Order finding with the continued-fractions analysis.
John Preskill, Lecture Notes on Quantum Computation Chapter 7 — theory.caltech.edu/~preskill/ph229. Order finding as phase estimation, with the |1\rangle = \sum_s |u_s\rangle/\sqrt r derivation.
Alexei Kitaev, Quantum measurements and the Abelian stabilizer problem (1995) — arXiv:quant-ph/9511026. The one-ancilla iterative formulation of phase estimation.
Artur Ekert and Richard Jozsa, Quantum computation and Shor's factoring algorithm (1996) — arXiv:quant-ph/9508027. A readable review of order finding and its place in the factoring algorithm.