In short
Simon's algorithm (Daniel Simon, 1994) solves a "hidden period" problem: you are given an oracle for a function f: \{0,1\}^n \to \{0,1\}^n that is promised to be 2-to-1 with f(x) = f(y) if and only if y = x or y = x \oplus s, for some hidden non-zero string s \in \{0,1\}^n. Your job is to find s. Any classical algorithm — deterministic or randomised — needs \Omega(2^{n/2}) queries to find s with high probability: finding s requires finding a collision x \ne y with f(x) = f(y), and by the birthday paradox, random queries only collide after about 2^{n/2} tries. Simon's quantum algorithm finds s in O(n) queries. This is an exponential separation against randomised classical — the first such separation in the history of quantum algorithms. It landed in 1994 and was read within months by Peter Shor, who adapted the template from the group (\mathbb{Z}/2)^n to \mathbb{Z}/N\mathbb{Z} and built the quantum factoring algorithm. If Bernstein-Vazirani is "one query to learn a string", Simon is "a handful of queries to learn a structure", and Shor is "a handful of queries to factor an integer and break RSA". Simon's algorithm is the hinge.
Until Simon's paper appeared in 1994, the situation looked like this. Quantum computers could beat classical computers at toy problems — Deutsch (1985) for one-bit promise problems, Deutsch-Jozsa (1992) for the n-bit constant-vs-balanced promise, Bernstein-Vazirani (1993) for parity functions. In every case, the separation either was not exponential (Bernstein-Vazirani: 1 vs n) or vanished when the classical algorithm was allowed to use randomness and tolerate a small error. Randomised classical computation was almost as powerful as quantum computation for every problem anyone had found. The sceptics had a fair point: maybe quantum mechanics, once you allowed classical probability, stopped being interesting for computation.
Daniel Simon's 1994 paper ended that debate.
Simon described a problem — the problem you will learn below — for which any classical algorithm, however clever, however randomised, needs about 2^{n/2} queries to solve with high probability. His quantum algorithm solves it in O(n) queries. The ratio is exponential: 2^{n/2} divided by n grows without bound. For n = 100, that is 2^{50} \approx 10^{15} classical queries versus 100 quantum queries. A separation of this size, against randomised classical, had never been demonstrated before, for any problem.
Peter Shor read Simon's paper in the spring of 1994. Within months he had transposed Simon's algorithm from the group \mathbb{F}_2^n (bit-strings under XOR) to the group \mathbb{Z}/N\mathbb{Z} (integers mod N), turning "find the period of an XOR-invariant function" into "find the period of a modular-exponentiation function." Finding the period of a^x \bmod N lets you factor N in polynomial time — a classical reduction that had been known since Miller (1976). Shor's paper appeared in October 1994. Within six months of Simon's work, quantum computing had gone from "interesting but maybe not practically important" to "able to break the cryptosystem that protects most of the internet, in principle, as soon as the hardware exists."
Simon's algorithm is the hinge. Learn it carefully, because every major quantum algorithm that followed — Shor's, the hidden-subgroup framework, period-finding in general — is a refinement of what you are about to see.
The problem — a hidden period on the Boolean hypercube
You are given access to a function f: \{0,1\}^n \to \{0,1\}^n via the standard oracle

U_f\,|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle,

where x, y \in \{0,1\}^n and \oplus is bitwise XOR. (The target register has n qubits here, not just one, because f's output is n-bit.) You are promised that there exists some non-zero s \in \{0,1\}^n such that, for every x, y \in \{0,1\}^n,

f(x) = f(y) \iff y = x \ \text{or}\ y = x \oplus s.
Equivalently: f is invariant under XOR with s — f(x \oplus s) = f(x) for all x — and is otherwise 1-to-1, meaning every output value is hit by exactly two inputs x and x \oplus s.
Your job: find s.
Why this is called a "hidden period": on the Boolean hypercube \{0,1\}^n (with the XOR group structure), the set \{0, s\} is a subgroup of size 2. The function f is constant on each coset \{x, x \oplus s\} of this subgroup. Just as the ordinary sine function is periodic with period 2\pi (meaning \sin(x + 2\pi) = \sin(x) for all x), f is periodic with period s in the hypercube's XOR arithmetic. Simon's problem is for this reason sometimes called the "hidden periodicity problem" or the "abelian hidden-subgroup problem over \mathbb{F}_2^n".
A tiny concrete example at n = 2. Let s = 11. Then f must satisfy f(x) = f(x \oplus 11), so f(00) = f(11) and f(01) = f(10). One legitimate f sends 00, 11 \mapsto 01 and 01, 10 \mapsto 10 — a 2-to-1 function whose fibres \{00, 11\} and \{01, 10\} are cosets of \{00, 11\} in \mathbb{F}_2^2. Any such f is a valid Simon oracle for s = 11.
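The n = 2 example can be written down and checked directly. A minimal sketch in plain Python, encoding the bit-strings as 2-bit integers (the output labels 01 and 10 are the ones chosen above):

```python
# The n = 2 oracle from the text: s = 11, f sends {00, 11} -> 01 and
# {01, 10} -> 10, written as a lookup table on 2-bit integers.
s = 0b11
f = {0b00: 0b01, 0b11: 0b01, 0b01: 0b10, 0b10: 0b10}

# Simon's promise: f(x) == f(y) exactly when y == x or y == x XOR s.
for x in range(4):
    for y in range(4):
        assert (f[x] == f[y]) == (y == x or y == x ^ s)
print("promise holds for s = 11")
```

Any 2-to-1 table whose fibres are the cosets \{00, 11\} and \{01, 10\} passes the same check.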
The classical lower bound — the birthday wall at 2^{n/2}
Before the quantum algorithm, understand what a classical algorithm must do.
Claim. Any classical algorithm (deterministic or randomised) that finds s with probability at least 2/3 needs at least \Omega(2^{n/2}) queries to f.
The reason: you need a collision. The only way to learn anything about s from f-values is to find two inputs x \ne y with f(x) = f(y); then you know s = x \oplus y. Every query gives you one new f-value; to find a collision, you need two distinct queries that returned the same output.
The birthday calculation. Query f on q distinct random inputs. You get q output values. The number of pairs of queries is \binom{q}{2} \approx q^2/2. For each pair of distinct inputs, the probability that they collide — i.e. that one is the other XORed with s — is 1/(2^n - 1) \approx 2^{-n}. The expected number of collisions is therefore about q^2 \cdot 2^{-n-1}. For this to be \Omega(1) — i.e. for a collision to be likely to exist — you need q = \Omega(2^{n/2}).
Why the argument is tight: the upper bound is also O(2^{n/2}) — query f on 2^{n/2} random inputs, check for collisions, and with constant probability you will find one. The matching lower bound (any algorithm needs \Omega(2^{n/2}) queries) uses a careful adversary argument: the adversary reveals f-values consistently with some valid s, and as long as no collision has been seen, many different s remain consistent — so the algorithm cannot identify s. Only when a collision appears is the adversary forced to commit to a specific s. The birthday bound says collisions do not appear until \Omega(2^{n/2}) queries.
For n = 50, classical algorithms need about 2^{25} \approx 3 \times 10^7 queries — feasible, but slow. For n = 100, they need about 2^{50} \approx 10^{15} — a quadrillion queries, essentially infeasible. For n = 200, they need 2^{100} \approx 10^{30} — at a billion queries per second, thousands of times the age of the universe. The classical query count grows exponentially in n.
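For contrast, the classical birthday attack fits in a few lines. A sketch in plain Python; the oracle f(x) = \min(x, x \oplus s) is a hypothetical stand-in (it is 2-to-1 with XOR-period s, so it satisfies Simon's promise):

```python
import random

def classical_find_s(f, n):
    # Classical strategy: query random inputs until two distinct inputs
    # collide; then s = x XOR y. Needs ~2^(n/2) queries (birthday bound).
    seen = {}  # maps f-value -> the input that produced it
    queries = 0
    while True:
        x = random.randrange(2 ** n)
        queries += 1
        fx = f(x)
        if fx in seen and seen[fx] != x:
            return seen[fx] ^ x, queries
        seen[fx] = x

# Hypothetical stand-in oracle satisfying Simon's promise for this s.
n, s = 16, 0b1011001110001101
found, q = classical_find_s(lambda x: min(x, x ^ s), n)
print(found == s, q)  # a few hundred queries is typical at n = 16
```

Doubling n to 32 pushes the typical query count from a few hundred to tens of thousands — the 2^{n/2} wall in miniature.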
The quantum algorithm — O(n) queries
Simon's algorithm uses 2n qubits: an n-qubit input register and an n-qubit target register.
One Simon subroutine (repeated O(n) times):
- Initialise both registers to |0\rangle^{\otimes n}. State: |0\rangle_X|0\rangle_Y.
- Hadamard the input register: apply H^{\otimes n} to the first n qubits. State: \tfrac{1}{\sqrt{2^n}}\sum_x |x\rangle|0\rangle.
- Query the oracle: apply U_f. State: \tfrac{1}{\sqrt{2^n}}\sum_x |x\rangle|f(x)\rangle.
- Measure the target register. (This can also be done at the end of the circuit — see the deferred-measurement principle — but measuring it now makes the analysis cleaner.) Outcome: some specific value y_0 = f(x_0). The input register collapses to the preimage set of y_0, which is the two-element set \{x_0, x_0 \oplus s\}. State: \tfrac{1}{\sqrt{2}}(|x_0\rangle + |x_0 \oplus s\rangle)|y_0\rangle.
- Hadamard the input register again: apply H^{\otimes n}. State: a superposition over y to be analysed below.
- Measure the input register. Outcome: some z \in \{0,1\}^n.
What each subroutine produces. One linear equation z \cdot s = 0 \pmod 2 on the hidden string s. (The derivation is in the walk-through below.)
Post-processing (classical): Run the subroutine n + O(1) times. You get equations z_1 \cdot s = 0, z_2 \cdot s = 0, \ldots. With high probability the z_i span the (n-1)-dimensional subspace \{z : z \cdot s = 0\} of \mathbb{F}_2^n, which determines s up to a single non-zero choice. Solve the linear system using Gaussian elimination over \mathbb{F}_2 to find s.
Verify: Query f(0) and f(s); if they agree, s is correct.
The total query complexity is O(n) — each subroutine uses one oracle query, and you run O(n) of them. The classical post-processing is polynomial: Gaussian elimination on an n \times n matrix over \mathbb{F}_2 runs in O(n^3) time. No queries required for the post-processing — you are just crunching the z_i's you already collected.
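The whole loop — sample equations, row-reduce, solve — can be sketched in plain Python. One labelled assumption: the quantum subroutine is replaced by its known measurement distribution (uniform samples from the hyperplane z \cdot s = 0), since that distribution is all the circuit delivers to the classical side:

```python
import random

def dot(a, b):
    # F_2 inner product of two n-bit integers
    return bin(a & b).count("1") % 2

def simon_sample(s, n):
    # Stand-in for one quantum subroutine: the circuit's final
    # measurement is uniform over the hyperplane {z : z . s = 0}.
    while True:
        z = random.randrange(2 ** n)
        if dot(z, s) == 0:
            return z

def add_to_basis(basis, z):
    # Row-reduce z against the basis (row-echelon form over F_2, one
    # vector per leading-bit pivot); record it if it adds a dimension.
    for pivot in sorted(basis, reverse=True):
        if (z >> pivot) & 1:
            z ^= basis[pivot]
    if z:
        basis[z.bit_length() - 1] = z

def recover_s(n, sample):
    basis = {}
    while len(basis) < n - 1:      # keep sampling until rank n - 1
        add_to_basis(basis, sample())
    # the solution space is now 1-dimensional; brute-force its
    # non-zero element (fine at toy sizes)
    for cand in range(1, 2 ** n):
        if all(dot(cand, vec) == 0 for vec in basis.values()):
            return cand

n, s = 8, 0b10110101
found = recover_s(n, lambda: simon_sample(s, n))
print(f"{found:08b}")  # prints 10110101
```

Swapping the sampler for a real device or a statevector simulator changes nothing downstream: the post-processing only ever sees the z samples.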
Why it works — the walk-through
Follow the state through the circuit.
After step 2 — Hadamards on input. The state is \tfrac{1}{\sqrt{2^n}}\sum_x |x\rangle|0\rangle.
After step 3 — oracle. The state is \tfrac{1}{\sqrt{2^n}}\sum_x |x\rangle|f(x)\rangle. This is an entangled state between X and Y: each x branch is correlated with its own f(x) in the Y register.
After step 4 — measure Y. Measurement in the computational basis returns some specific y_0 \in \{0,1\}^n. Which y_0? Every y_0 in the image of f is equally likely, with probability 2/2^n = 1/2^{n-1} (because each output value has exactly two preimages and the input register was uniform). After the measurement, the X register is collapsed to those branches consistent with the observed y_0:

\tfrac{1}{\sqrt{2}}\left(|x_0\rangle + |x_0 \oplus s\rangle\right),

where x_0 and x_0 \oplus s are the two preimages of y_0. The specific x_0 you get depends on the measurement outcome and is not under your control — but the fact that the X register is now an equal superposition over \{x_0, x_0 \oplus s\} is guaranteed.
Why measuring Y does this: before the measurement, the state was \tfrac{1}{\sqrt{2^n}}\sum_x |x\rangle|f(x)\rangle. Group the sum by the value of f: pair up each collision \{x_0, x_0 \oplus s\} and write |x_0\rangle|y_0\rangle + |x_0 \oplus s\rangle|y_0\rangle = (|x_0\rangle + |x_0 \oplus s\rangle)|y_0\rangle. The full state is a sum of 2^{n-1} such terms, one per output value. Measuring Y projects onto one term — one output value y_0 — and the X register inherits the two-element superposition.
After step 5 — Hadamards on X again. Apply H^{\otimes n} to \tfrac{1}{\sqrt{2}}(|x_0\rangle + |x_0 \oplus s\rangle). Use the Hadamard-transform identity

H^{\otimes n}|x\rangle = \tfrac{1}{\sqrt{2^n}}\sum_{z \in \{0,1\}^n} (-1)^{x \cdot z}\,|z\rangle.

Apply term-by-term:

\tfrac{1}{\sqrt{2^{n+1}}}\sum_{z} \left[(-1)^{x_0 \cdot z} + (-1)^{(x_0 \oplus s) \cdot z}\right]|z\rangle.

Use the identity (x_0 \oplus s) \cdot z = x_0 \cdot z + s \cdot z \pmod 2 (both inner products are mod 2). Factor out (-1)^{x_0 \cdot z}:

\tfrac{1}{\sqrt{2^{n+1}}}\sum_{z} (-1)^{x_0 \cdot z}\left[1 + (-1)^{s \cdot z}\right]|z\rangle.

The bracket [1 + (-1)^{s \cdot z}] is 2 if s \cdot z = 0 (both terms +1) and 0 if s \cdot z = 1 (+1 and -1 cancel). So the surviving z are exactly those with s \cdot z = 0 \pmod 2:

\tfrac{1}{\sqrt{2^{n-1}}}\sum_{z \,:\, s \cdot z = 0} (-1)^{x_0 \cdot z}\,|z\rangle.
The state is a uniform (up to signs) superposition over the 2^{n-1} strings z orthogonal to s in \mathbb{F}_2^n.
Step 6 — measure X. Each such z is measured with probability 1/2^{n-1} — uniform over the hyperplane s \cdot z = 0.
The outcome is a random vector satisfying z \cdot s = 0 \pmod 2. That is what one subroutine produces.
Why O(n) subroutines suffice
After k subroutines you have k vectors z_1, \ldots, z_k, each independently sampled uniformly from the hyperplane \{z : z \cdot s = 0\}, which has 2^{n-1} elements. You want these vectors to span the hyperplane — i.e. to have rank n - 1 over \mathbb{F}_2 — so that s is the unique non-zero vector in the (orthogonal) complement.
Coupon collector for \mathbb{F}_2 subspaces. The probability that k uniformly random vectors from a d-dimensional \mathbb{F}_2-subspace fail to span it is at most 2^{d-k} (sum, over i = 0, \ldots, d-1, the probability 2^{i-k} that all k samples land in some fixed i-dimensional subspace containing the first i directions found). Setting d = n - 1 and k = n + 1 puts the failure probability below 1/4, and each further sample halves the bound. So k = n + O(1) subroutines suffice to span with high probability.
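The spanning probability can also be computed exactly rather than bounded: k uniform samples from a d-dimensional \mathbb{F}_2-space span it with probability \prod_{i=0}^{d-1}(1 - 2^{i-k}). A quick numerical check in plain Python:

```python
# Exact probability that k uniform samples from a d-dimensional
# F_2-subspace span it: prod_{i=0}^{d-1} (1 - 2^(i - k)).
def p_span(d, k):
    p = 1.0
    for i in range(d):
        p *= 1.0 - 2.0 ** (i - k)
    return p

d = 99  # the hyperplane {z : z . s = 0} for n = 100
for extra in (0, 2, 10):
    print(f"k = d + {extra:2d}: failure probability {1 - p_span(d, d + extra):.3g}")
```

Already at k = d + 10 the failure probability is below one in a thousand, matching the 2^{-t} decay after d + t samples.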
Once you have n - 1 linearly independent z_i, solve the system z_i \cdot s = 0 \pmod 2 by Gaussian elimination over \mathbb{F}_2 in O(n^3) time. The system has a one-dimensional solution space; s is its unique non-zero element. Verify by computing f(0) and f(s) — they must agree.
Total query complexity: O(n). Total classical post-processing: O(n^3). No queries are needed during the post-processing.
Worked examples
Example 1: $n = 3$, $s = 101$ — one subroutine traced in full
Take n = 3 and s = 101. Consider an oracle with f(000) = f(101) = A, f(001) = f(100) = B, f(010) = f(111) = C, f(011) = f(110) = D, for some distinct output labels A, B, C, D. (The specific A, B, C, D values are not important for the analysis.)
Step 1–2 — Hadamard the input register. State: \tfrac{1}{\sqrt{8}}\sum_x |x\rangle|000\rangle, where the sum runs over all eight x \in \{0,1\}^3.
Step 3 — oracle query. State: \tfrac{1}{\sqrt{8}}\sum_x |x\rangle|f(x)\rangle. Grouped by output value, this is

\tfrac{1}{\sqrt{8}}\big[(|000\rangle + |101\rangle)|A\rangle + (|001\rangle + |100\rangle)|B\rangle + (|010\rangle + |111\rangle)|C\rangle + (|011\rangle + |110\rangle)|D\rangle\big].
Step 4 — measure Y. Suppose the outcome is y_0 = B (each of the four outcomes is equally likely, with probability 1/4; this subroutine got B). The input register collapses to

\tfrac{1}{\sqrt{2}}\big(|001\rangle + |100\rangle\big).
Notice: 001 \oplus 100 = 101 = s. The two branches differ by XOR with s, as guaranteed by the promise.
Step 5 — Hadamard the input register. Apply the identity H^{\otimes 3}|x\rangle = \tfrac{1}{\sqrt{8}}\sum_z (-1)^{x \cdot z}|z\rangle to each term:

\tfrac{1}{\sqrt{2}} \cdot \tfrac{1}{\sqrt{8}} \sum_z \left[(-1)^{001 \cdot z} + (-1)^{100 \cdot z}\right]|z\rangle = \tfrac{1}{4}\sum_z \left[(-1)^{001 \cdot z} + (-1)^{100 \cdot z}\right]|z\rangle.
Compute the bracket for each of the eight z:
| z | 001 \cdot z | 100 \cdot z | bracket | amplitude |
|---|---|---|---|---|
| 000 | 0 | 0 | +1 + 1 = 2 | 2/\sqrt{16} = 1/2 |
| 001 | 1 | 0 | -1 + 1 = 0 | 0 |
| 010 | 0 | 0 | +1 + 1 = 2 | 1/2 |
| 011 | 1 | 0 | -1 + 1 = 0 | 0 |
| 100 | 0 | 1 | +1 - 1 = 0 | 0 |
| 101 | 1 | 1 | -1 - 1 = -2 | -1/2 |
| 110 | 0 | 1 | +1 - 1 = 0 | 0 |
| 111 | 1 | 1 | -1 - 1 = -2 | -1/2 |
Non-zero amplitudes on z \in \{000, 010, 101, 111\}. Check each of these against s = 101:
- 000 \cdot 101 = 0 \pmod 2 ✓
- 010 \cdot 101 = 0 + 0 + 0 = 0 \pmod 2 ✓
- 101 \cdot 101 = 1 + 0 + 1 = 0 \pmod 2 ✓
- 111 \cdot 101 = 1 + 0 + 1 = 0 \pmod 2 ✓
Every surviving z satisfies z \cdot s = 0, as promised. The four zero-amplitude outcomes \{001, 011, 100, 110\} all have z \cdot s = 1.
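The table can be reproduced in a few lines of plain Python — the amplitude of each z is [(-1)^{001 \cdot z} + (-1)^{100 \cdot z}]/\sqrt{16}:

```python
from math import sqrt

def dot(a, b):  # F_2 inner product of bit-strings written as integers
    return bin(a & b).count("1") % 2

# Amplitudes of H^{tensor 3} applied to (|001> + |100>)/sqrt(2): each z
# gets [(-1)^(001.z) + (-1)^(100.z)] / sqrt(16).
x0, x1 = 0b001, 0b100
amps = {z: ((-1) ** dot(x0, z) + (-1) ** dot(x1, z)) / sqrt(16) for z in range(8)}
for z, a in amps.items():
    print(f"{z:03b}: {a:+.2f}")
```

The printout matches the table: amplitude \pm 1/2 on \{000, 010, 101, 111\} and 0 elsewhere.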
Why the all-zeros outcome z = 000 is allowed: 000 \cdot s = 0 trivially for any s, so z = 0^n is always in the orthogonal hyperplane. This outcome is informationally useless ("every s is orthogonal to zero"), but it is allowed. In the post-processing you simply discard all-zero vectors and collect non-trivial ones.
Step 6 — measure X. Each of the four surviving z's is measured with probability |1/2|^2 = 1/4. Suppose the outcome is z_1 = 010. You have learnt: 010 \cdot s = 0, i.e. s_2 = 0. (Indeed s = 101 has s_2 = 0.)
What one subroutine produces. One equation z \cdot s = 0 \pmod 2. You cannot determine s from one equation — there are 2^{n-1} - 1 = 3 non-zero vectors satisfying any given equation. You need more subroutines.
Example 2: $n = 3$, $s = 101$ — collecting three equations and solving
Continuing from Example 1, run the subroutine two more times. Suppose the outcomes are:
- Subroutine 1: z_1 = 010 (from the trace above).
- Subroutine 2: z_2 = 111.
- Subroutine 3: z_3 = 010 (same as z_1 — can happen, each subroutine is independent).
You now have three equations on s = (s_1, s_2, s_3):

z_1 \cdot s = s_2 = 0, \qquad z_2 \cdot s = s_1 + s_2 + s_3 = 0, \qquad z_3 \cdot s = s_2 = 0 \pmod 2.
Step — rank of the system. The vectors z_1 = 010 and z_3 = 010 are identical, contributing the same equation. The independent equations are s_2 = 0 and s_1 + s_2 + s_3 = 0, so the system has rank 2. There are 2^{3-2} = 2 solutions: s = 000 and s = 101. The promise rules out s = 000, so s = 101.
With only two independent equations, s is already uniquely determined (once the zero vector is excluded). Two subroutines out of three were useful; one was redundant. This is why Simon's post-processing needs roughly n - 1 independent vectors, which the coupon-collector analysis shows typically requires n + O(1) total subroutines.
Step — verify. Compute f(000) and f(101) using the oracle. If they match, s = 101 is confirmed. In our example: f(000) = A = f(101), so yes, s = 101. Correct.
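The post-processing of Example 2 is small enough to brute-force over all seven non-zero candidates, a sketch in plain Python:

```python
# The three measured vectors from Example 2, as 3-bit integers.
zs = [0b010, 0b111, 0b010]

def dot(a, b):  # F_2 inner product
    return bin(a & b).count("1") % 2

# Brute-force the 7 non-zero candidates for s.
candidates = [s for s in range(1, 8) if all(dot(z, s) == 0 for z in zs)]
print([f"{s:03b}" for s in candidates])  # -> ['101']
```

At real sizes you would use the Gaussian elimination described above; brute force is only for checking the toy case.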
What this shows. Each subroutine gives you one linear equation. After O(n) subroutines, you collect enough equations to pin s down uniquely. The quantum work is O(n) oracle queries; the classical post-processing is O(n^3) Gaussian elimination. Compare to classical: \Omega(2^{n/2}) oracle queries and then some arithmetic. At n = 100: Simon uses \sim 100 queries; classical uses \sim 10^{15}.
Common confusions
- "Simon's algorithm is the n-bit generalisation of Deutsch-Jozsa." No. Deutsch-Jozsa's promise is that f is constant or balanced; Simon's promise is that f has a hidden XOR period. The two problems are genuinely different — the set of functions satisfying Simon's promise is disjoint from the set satisfying Deutsch-Jozsa's (except in trivial corner cases). The two algorithms share a structural ancestor (the Hadamard–oracle–Hadamard template) but solve different problems with different proofs of different speedups.
- "Simon's algorithm is NP-hard classically." No. The problem has exponential query complexity (\Theta(2^{n/2}) queries), but each query is simple and the post-processing after the queries is cheap, so the full classical cost is \Theta(2^{n/2} \cdot \text{poly}(n)). Exponential is not the same as NP-hard: Simon's problem is a promise problem in the black-box (oracle) model, where hardness is counted in queries rather than in the time-complexity framework where NP-hardness lives, and it is not known to be NP-hard. (For scale: NP-complete problems are conjectured to need exponential time in the worst case, while Simon's problem needs "only" \Theta(2^{n/2}) queries.)
- "Simon's algorithm directly inspired Shor's algorithm." Yes. This is historically correct. Simon's 1994 paper contained the crucial insight: a quantum Fourier-type transform over an abelian group converts a "hidden subgroup" into measurable outcomes. Shor, reading Simon's paper, recognised that period-finding modulo N — a sub-problem equivalent to integer factoring — has exactly the same structure with the group replaced by \mathbb{Z}/N\mathbb{Z}. Shor replaced Simon's H^{\otimes n} (the Fourier transform over \mathbb{F}_2^n) with the quantum Fourier transform over \mathbb{Z}/N\mathbb{Z}, and Simon's trick-with-XOR-periods became Shor's factoring algorithm. Shor himself credits Simon explicitly.
- "Simon's output is guaranteed." Not on any single subroutine. Each subroutine produces a random vector orthogonal to s; you need to collect enough to span the orthogonal hyperplane and solve for s. The failure probability falls exponentially with the number of subroutines: n + 10 subroutines succeed with probability above 1 - 2^{-10}.
- "Simon's algorithm scales cleanly to real hardware." On current NISQ devices, the circuit depth and the entangling operations inside U_f produce significant errors. Demonstrations of Simon's algorithm on IBM Quantum hardware have been run for small n (around n = 3 or 4), but scaling to the n = 50 to 100 regime where the classical advantage would be visible requires either much lower error rates or fault-tolerant error correction. The asymptotic speedup is real; extracting it in practice waits on engineering.
- "Simon's algorithm is about 'trying all preimages in parallel'." No. The oracle call U_f does put the X register in a superposition over all x — but no information about any individual f(x) is read out that way. The key step is the measurement of Y, which probabilistically collapses X to a specific coset \{x_0, x_0 \oplus s\}, followed by a Hadamard transform that encodes s into the measurement distribution of a new output z. The "magic" is interference, not parallel enumeration — the standard quantum-computing hype check applies here as well as everywhere else.
Going deeper
You have the algorithm, the proof, two worked examples, and the classical lower bound. The rest of this section frames Simon's result in the broader quantum-algorithms landscape: the Fourier-sampling interpretation, the direct pathway from Simon to Shor, the rigorous query-complexity lower bound, the hidden-subgroup abstraction, and what oracle separations between BQP and BPP do and do not establish.
The Fourier-over-\mathbb{F}_2^n interpretation
Like Bernstein-Vazirani, Simon's algorithm is a Fourier-sampling algorithm over the group \mathbb{F}_2^n. The Hadamard transform H^{\otimes n} is exactly the quantum Fourier transform for this group. Here is the structural rephrasing.
The state \tfrac{1}{\sqrt{2}}(|x_0\rangle + |x_0 \oplus s\rangle) is a superposition supported on the coset x_0 + H of the subgroup H = \{0, s\}. The Fourier transform of a coset indicator function is a character sum supported on H^\perp, the annihilator of H: the set of z with z \cdot h = 0 for every h \in H. Since H = \{0, s\}, the annihilator is \{z : z \cdot s = 0\} — exactly the set Simon's final measurement samples from. Fourier analysis turns "coset of H" into "uniform distribution on H^\perp", and sampling from H^\perp for random x_0 allows you to reconstruct H^\perp, which determines H which determines s.
This is the general "hidden-subgroup problem" recipe:
- Use the oracle to prepare a coset state \tfrac{1}{\sqrt{|H|}}\sum_{h \in H}|x_0 + h\rangle for a random coset representative x_0.
- Fourier-transform to get a state supported on the annihilator H^\perp.
- Measure to sample from H^\perp.
- Repeat and solve for H.
For G = \mathbb{F}_2^n and |H| = 2, this recipe is exactly Simon's algorithm. For G = \mathbb{Z}/N\mathbb{Z} and H generated by the order r of some element, this recipe is exactly Shor's algorithm. The two are the same algorithm at the abstract level — only the underlying group changes.
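The recipe can be sanity-checked numerically for G = \mathbb{F}_2^n. A sketch in plain Python (the sizes n = 4, s = 0110 are arbitrary toy choices) confirming that every coset state of H = \{0, s\} Fourier-transforms onto the annihilator H^\perp:

```python
from math import sqrt

def dot(a, b):
    # F_2 inner product of two n-bit integers
    return bin(a & b).count("1") % 2

def fourier_support(n, s, x0):
    # Support of H^{tensor n} applied to the coset state
    # (|x0> + |x0 ^ s>)/sqrt(2), computed amplitude by amplitude.
    support = set()
    for z in range(2 ** n):
        amp = ((-1) ** dot(x0, z) + (-1) ** dot(x0 ^ s, z)) / sqrt(2 ** (n + 1))
        if abs(amp) > 1e-12:
            support.add(z)
    return support

n, s = 4, 0b0110
annihilator = {z for z in range(2 ** n) if dot(z, s) == 0}
# every coset of H = {0, s} lands exactly on the annihilator H-perp
assert all(fourier_support(n, s, x0) == annihilator for x0 in range(2 ** n))
print("all cosets map onto H-perp")
```

The coset representative x_0 only affects the signs of the amplitudes, never the support — which is exactly why the algorithm can ignore which coset the Y-measurement happened to pick.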
The pathway from Simon to Shor
The transition from Simon (1994) to Shor (1994) is one of the most important episodes in the history of computing. Here is what Shor did, in rough outline.
Simon's group: \mathbb{F}_2^n. Simon's Fourier transform: H^{\otimes n}. Simon's hidden structure: H = \{0, s\}, a subgroup of size 2.
Shor's group: \mathbb{Z}/Q\mathbb{Z} for a carefully chosen Q \approx N^2. Shor's Fourier transform: the quantum Fourier transform \mathrm{QFT}_Q (not the Hadamard; a more complex circuit built from controlled phase gates). Shor's hidden structure: the period r of x \mapsto a^x \bmod N, where r is the order of a modulo N (the smallest positive integer with a^r \equiv 1 \pmod N). Since r need not divide Q, the hidden subgroup r\mathbb{Z} \le \mathbb{Z} survives only approximately in \mathbb{Z}/Q\mathbb{Z} — which is why Shor's analysis needs the continued-fraction step below.
The classical reduction: Miller (1976) showed that factoring N reduces in polynomial time to computing the order r of a random element a \bmod N. Therefore, a polynomial-time quantum algorithm for order-finding is a polynomial-time quantum algorithm for factoring.
The post-processing: where Simon uses Gaussian elimination over \mathbb{F}_2, Shor uses the continued-fraction algorithm to extract r from a noisy sample in \mathbb{Z}/Q\mathbb{Z}.
Every other step is structurally the same. Shor's paper credits Simon's algorithm as the direct inspiration. The lesson for the quantum-algorithms research programme is profound: find more groups, and find more hidden-subgroup problems over those groups, and you might find more quantum speedups.
Rigorous query-complexity lower bound — the adversary argument
The claim that classical algorithms need \Omega(2^{n/2}) queries to solve Simon's problem can be proved rigorously using an adversary argument. The sketch: an adversary maintains a set S of possible values of s consistent with the queries asked so far, and answers queries using a "most non-committal" f. As long as the algorithm has not seen a collision, a large fraction of s \in \{0,1\}^n \setminus \{0\} remains consistent. The first collision commits the adversary to a specific s = x \oplus y; before the first collision, the algorithm has no information. Standard birthday-paradox calculations then give the \Omega(2^{n/2}) bound.
More precisely (Simon, 1994; Brassard-Hoyer, 1997): any classical algorithm making q queries and succeeding with probability \ge 2/3 needs q = \Omega(2^{n/2}). The constant hidden in the \Omega can be made explicit in concrete versions of the argument.
Quantum upper bound in detail. Simon's algorithm uses exactly one oracle query per subroutine, and n + O(1) subroutines to succeed with high probability. So the quantum query complexity is at most n + O(1) = O(n). Exact constants can be tuned: if you collect n + 20 vectors, the probability of failing to determine s drops below 2^{-20} \approx 10^{-6}.
The separation \Omega(2^{n/2}) vs O(n) is exponential by any definition. It was also the first exponential gap proved against randomised classical computation, giving an oracle relative to which \mathrm{BQP} strictly contains \mathrm{BPP}.
The hidden-subgroup problem (HSP) — the unifying framework
The hidden-subgroup problem for a group G is: given a function f: G \to S that is constant on and distinct between cosets of some hidden subgroup H \le G, find H. This definition covers an enormous range of algorithmic problems.
- G = \mathbb{F}_2^n, |H| = 2: Simon's problem.
- G = \mathbb{F}_2^n, H = G: the constant case of Deutsch-Jozsa (the balanced case fits the HSP framework only loosely — a balanced f need not hide any subgroup).
- G = \mathbb{F}_2^n, H is the kernel of a linear functional: Bernstein-Vazirani.
- G = \mathbb{Z} (truncated to \mathbb{Z}/Q\mathbb{Z} in the circuit), H = r\mathbb{Z}: Shor's order-finding.
- G = \mathbb{Z}^n, H a sublattice: Regev's lattice algorithm (quantum reductions for LWE).
For abelian groups G, the HSP has a known polynomial-time quantum algorithm using the quantum Fourier transform — Simon's algorithm is the \mathbb{F}_2^n instance, Shor's is the order-finding instance. For non-abelian groups, the HSP is mostly open: a polynomial-time quantum algorithm would break graph isomorphism (HSP over the symmetric group) and various cryptographic lattice problems (linked to the dihedral group). Making progress on the non-abelian HSP remains one of the most active research directions in quantum algorithms as of the mid-2020s, with contributions worldwide, including from Indian-origin researchers such as Umesh Vazirani and from groups at TIFR and IISc.
Oracle separations — what they do and do not prove
Simon's algorithm establishes that relative to an oracle — i.e. when we count queries to a black box — quantum computers are exponentially more powerful than classical randomised computers for some problems. This is a relativised separation: there exists an oracle O such that \mathrm{BQP}^O \supsetneq \mathrm{BPP}^O.
What this does not prove: an unrelativised separation between BQP and BPP, i.e. a problem solvable by a polynomial-time quantum algorithm but not by any polynomial-time classical randomised algorithm. In the unrelativised world, one can always hope to exploit the internal structure of a specific function (rather than treating it as a black box) and achieve classically what would be impossible in the oracle model. Proving \mathrm{BQP} \ne \mathrm{BPP} unconditionally is still an open problem in complexity theory; it would imply \mathrm{P} \ne \mathrm{PSPACE}, which is a famous open question.
What Simon's result does mean is that any proof that \mathrm{BQP} = \mathrm{BPP} cannot relativise — it cannot work for all oracles. This is a significant barrier: almost every known complexity-theoretic technique (diagonalisation, simulation arguments) relativises. So Simon's oracle separation tells us that if we ever do prove BPP = BQP, we will need fundamentally new techniques. The same barrier, in a different direction, informs conjectures that \mathrm{BQP} \ne \mathrm{BPP} — most complexity theorists believe Shor's factoring problem demonstrates an actual separation, but this is conjectural.
Shor's algorithm's practical impact — factoring integers, breaking RSA — makes the separation feel concrete. Simon's result provides the theoretical foundation on which that confidence rests.
Indian context — theoretical quantum computing at IIT and TIFR
Quantum-algorithms research in India is led by groups at TIFR Mumbai (the Tata Institute of Fundamental Research), IISc Bangalore, and the IITs at Madras, Bombay, Kanpur, and Delhi. Several Indian-origin researchers have made notable contributions to the hidden-subgroup framework that Simon's algorithm initiated. Arul Lakshminarayan at IIT Madras works on quantum chaos and quantum information. Apoorva Patel at IISc has written influential papers on quantum algorithms, including an early connection between Grover's algorithm and DNA replication. Shantanav Chakraborty and others at IIIT Hyderabad work on quantum walks and Fourier sampling generalisations. The broader Indian-origin diaspora in quantum algorithms is large: Umesh Vazirani (Berkeley) on foundations; Ashwin Nayak (Waterloo) on query complexity; Sanjeev Arora (Princeton) on classical-quantum comparisons. A student in India starting a PhD in quantum computing today has many Indian and Indian-origin advisors to choose from — a situation that did not exist in 1994 when Simon's algorithm appeared.
Where this leads next
- Shor's algorithm — the direct successor. Replace \mathbb{F}_2^n with \mathbb{Z}/N\mathbb{Z} and the Hadamard with the QFT, and you get integer factoring.
- Hidden subgroup problem — the unifying framework. Simon, Shor, Bernstein-Vazirani, Deutsch-Jozsa are all HSP instances.
- Quantum Fourier transform — the generalisation of the Hadamard to arbitrary abelian groups.
- Bernstein-Vazirani algorithm — the linear-speedup predecessor: learn a string from one query.
- Deutsch-Jozsa algorithm — the constant-vs-balanced predecessor, whose exponential gap holds only against deterministic classical algorithms.
- Grover's algorithm — a different kind of speedup (quadratic, not exponential) with much broader applicability.
References
- Daniel Simon, On the power of quantum computation (1994) — the original paper. Published version SIAM J. Comput. 26, 1474 (1997). Widely circulated from 1994 in the FOCS '94 proceedings.
- Wikipedia, Simon's problem — standard encyclopaedic treatment with worked small-n example.
- John Preskill, Lecture Notes on Quantum Computation, Ch. 6 (quantum algorithms) — theory.caltech.edu/~preskill/ph229.
- Nielsen and Chuang, Quantum Computation and Quantum Information, §6.5 — Cambridge University Press.
- Cleve, Ekert, Macchiavello, Mosca, Quantum algorithms revisited (1998) — unified treatment of Simon, Shor, Deutsch-Jozsa, Bernstein-Vazirani as Fourier-sampling. arXiv:quant-ph/9708016.
- Qiskit Textbook, Simon's algorithm — runnable Python implementation with hardware demonstration.