In short

Grover's algorithm is a rotation. All the action happens in a two-dimensional plane inside the much larger N-dimensional Hilbert space. The plane is spanned by two orthogonal unit vectors: the marked basis state |w\rangle, and the uniform superposition |s'\rangle = \frac{1}{\sqrt{N-1}}\sum_{x \ne w}|x\rangle over the unmarked inputs.

The starting state |s\rangle = H^{\otimes n}|0\rangle^{\otimes n} lives in this plane. It can be written as |s\rangle = \sin\theta\,|w\rangle + \cos\theta\,|s'\rangle with \sin\theta = 1/\sqrt N. So |s\rangle sits at a small angle \theta above the |s'\rangle axis.

Each Grover iteration G = D \cdot O is the composition of two reflections — O reflects about the |s'\rangle axis, D reflects about |s\rangle — and two reflections compose to a rotation by twice the angle between their axes. So G rotates the state by 2\theta toward |w\rangle.

After k iterations the state sits at angle (2k+1)\theta above |s'\rangle. You want that angle to hit \pi/2 (so the state is parallel to |w\rangle and measurement yields w with probability \sin^2(\pi/2) = 1). Solving (2k+1)\theta = \pi/2 gives k \approx \frac{\pi}{4\theta} - \frac{1}{2}, which for large N is k \approx \frac{\pi}{4}\sqrt N. That is the \sqrt N count — it drops out of trigonometry, not out of deep physics.

You have seen Grover's circuit — Hadamards, oracle, diffusion, repeat, measure — and you have seen the oracle and diffusion operators individually. You know the algorithm works. What you have not yet seen is why it works.

The amplitude tables from the circuit chapter showed one thing: the probability of measuring |w\rangle grows from 1/N to near 1 over about \sqrt N iterations. That is a numerical fact. What the tables do not explain is why it is exactly \sqrt N — not \log N, not N^{2/3}, not N. The explanation is geometric, and it is clean enough that you can draw it on the back of a matchbook.

Here is the claim. The Grover algorithm never leaves a 2-dimensional plane inside the N-dimensional state space. Inside that plane, every Grover iteration is a rotation by a fixed angle 2\theta. The initial state starts near one axis, and you want to reach the other axis — a rotation of 90° = \pi/2 in total. If the per-iteration rotation is 2\theta \approx 2/\sqrt N, you need about (\pi/2) / (2/\sqrt N) = (\pi/4)\sqrt N iterations.

That is the entire algorithm in one sentence. The rest of this chapter explains why the plane exists, why the axes are the natural ones, why the iteration is a rotation, and why the rotation angle is exactly what it is.

The two axes of the plane

Pick a marked input w \in \{0,1\}^n. You know exactly one input is marked, so |w\rangle is a fully specified computational basis vector — one of the N = 2^n possibilities. All the others are unmarked; collectively they form a set of N - 1 basis states.

Define

|s'\rangle = \frac{1}{\sqrt{N-1}}\sum_{x \ne w}|x\rangle.

Why divide by \sqrt{N-1} and not N - 1: to make |s'\rangle a unit vector. Its inner product with itself is \langle s' | s'\rangle = \frac{1}{N-1}\sum_{x \ne w}\langle x | x\rangle = \frac{N-1}{N-1} = 1. The prefactor 1/\sqrt{N-1} is exactly what you need so that when the amplitudes 1/\sqrt{N-1} are squared and summed over the N - 1 unmarked terms, the total probability comes out to 1. Normalisation, nothing more.

This is the uniform superposition over unmarked inputs — the state you would get if someone handed you |s\rangle with the marked term surgically removed and the remainder renormalised.

Two things are true about the pair (|w\rangle, |s'\rangle):

  1. They are orthogonal. The inner product \langle w | s'\rangle = \frac{1}{\sqrt{N-1}}\sum_{x \ne w}\langle w | x\rangle = 0 because \langle w | x\rangle = 0 for every x \ne w. So |w\rangle and |s'\rangle span a 2-dimensional subspace.

  2. The initial state |s\rangle lives in that subspace. You can write

|s\rangle = \frac{1}{\sqrt N}\sum_x|x\rangle = \frac{1}{\sqrt N}|w\rangle + \frac{1}{\sqrt N}\sum_{x \ne w}|x\rangle = \frac{1}{\sqrt N}|w\rangle + \frac{\sqrt{N-1}}{\sqrt N}|s'\rangle.

The coefficient on |s'\rangle is \sqrt{N-1}/\sqrt N because when you pull out the normalisation, \sum_{x \ne w}\frac{1}{\sqrt N}|x\rangle = \frac{1}{\sqrt N}\cdot \sqrt{N-1}\cdot |s'\rangle.

Why this matters: the entire algorithm will take place inside this 2D plane. Every application of the oracle and the diffusion operator sends states in the plane to other states in the plane. You will prove this in the next section. The practical consequence is that you can visualise Grover's algorithm as a 2D picture — a moving arrow in a plane — instead of as an abstract journey through N-dimensional complex space. That is an enormous reduction in complexity, and it is the single move that turns Grover from a black box into an algorithm you can reason about on paper.

Defining the angle θ

Introduce a convenient angle \theta by

\sin\theta = \frac{1}{\sqrt N}, \qquad \cos\theta = \frac{\sqrt{N-1}}{\sqrt N}.

Check: \sin^2\theta + \cos^2\theta = \frac{1}{N} + \frac{N-1}{N} = 1. So \theta is a well-defined angle between 0 and \pi/2, and the initial state decomposes as

|s\rangle = \sin\theta\,|w\rangle + \cos\theta\,|s'\rangle.

Geometrically, in the plane with |s'\rangle as the horizontal axis and |w\rangle as the vertical axis, |s\rangle is a unit vector at angle \theta above the horizontal.

For large N, \sin\theta \approx \theta (the small-angle approximation), so \theta \approx 1/\sqrt N. For N = 10^6, \theta \approx 10^{-3} radians — about 0.057°. The initial state is almost flat against the |s'\rangle axis, and measurement would give you a marked input with probability \sin^2\theta = 1/N — one in a million. To find w you must rotate this near-horizontal arrow up to the vertical axis. The rest of the chapter is a geometry problem about how fast you can rotate.

The 2D plane spanned by |w⟩ and |s'⟩A 2D coordinate plane with the horizontal axis labelled ket s prime (unmarked superposition) and the vertical axis labelled ket w (marked state). The unit circle is drawn as a dashed arc. The initial state ket s is shown as an arrow from the origin at a small angle theta above the horizontal, very close to the horizontal axis. Labels indicate ket s has coordinates (cos theta, sin theta) with sin theta = 1 over root N. A caption below reads "the entire algorithm lives in this plane — no component ever leaves it".Grover's 2D plane — |w⟩ vertical, |s'⟩ horizontal|s'⟩uniform over unmarked|w⟩marked basis state|s⟩= sin θ |w⟩ + cos θ |s'⟩θsin θ = 1/√N, cos θ = √(N−1)/√N|s⟩ sits at a tiny angle above the horizontal — probability of |w⟩ is sin²θ = 1/N
The two-dimensional plane in which Grover's algorithm lives. The horizontal axis is $|s'\rangle$ (uniform superposition over the $N - 1$ unmarked states); the vertical axis is $|w\rangle$ (the marked basis state). These two unit vectors are orthogonal, so they span a 2D subspace of the full $N$-dimensional Hilbert space. The initial Grover state $|s\rangle = H^{\otimes n}|0\rangle^{\otimes n}$ lives in this plane at a small angle $\theta$ above $|s'\rangle$, where $\sin\theta = 1/\sqrt N$. Every subsequent state the algorithm visits also lives in this plane — the algorithm is a trajectory on a flat sheet of paper, not a journey through $N$-dimensional space.

Two reflections make a rotation

The Grover iteration is G = D \cdot O, where O is the phase oracle and D is the diffusion operator. Each of these, restricted to the 2D plane, turns out to be a reflection — an operation that flips a state about some axis. And a basic fact of 2D geometry is that two reflections compose to a rotation, where the rotation angle is twice the angle between the two reflection axes.

The oracle is a reflection about the |s'⟩ axis

The phase oracle acts on basis states as

O|x\rangle = \begin{cases}-|x\rangle & \text{if } x = w,\\ +|x\rangle & \text{if } x \ne w.\end{cases}

In the 2D plane, O|w\rangle = -|w\rangle and O|s'\rangle = +|s'\rangle (since |s'\rangle is a combination of unmarked basis vectors, all unchanged by O). So in the basis (|s'\rangle, |w\rangle) the matrix of O is

O = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.

This is the matrix of reflection about the horizontal (|s'\rangle) axis: it leaves the horizontal component unchanged and flips the sign of the vertical (|w\rangle) component. Geometrically, O takes any state in the plane and reflects it about the |s'\rangle axis.

Why this is a reflection and not a rotation: a reflection flips across a line and preserves distances; it has determinant -1. Check: \det O = (1)(-1) - 0 = -1. A rotation has determinant +1. The sign of the determinant is the deep algebraic signature of orientation-reversing (reflection) versus orientation-preserving (rotation). O reverses orientation, so it is a reflection.

The diffusion operator is a reflection about |s⟩

The diffusion operator is D = 2|s\rangle\langle s| - I. Its action on any vector |v\rangle is

D|v\rangle = 2\langle s | v\rangle\,|s\rangle - |v\rangle.

Decompose |v\rangle into its component along |s\rangle and its component orthogonal to |s\rangle: |v\rangle = \alpha|s\rangle + \beta|s^\perp\rangle. Then \langle s | v\rangle = \alpha, so

D|v\rangle = 2\alpha|s\rangle - \alpha|s\rangle - \beta|s^\perp\rangle = \alpha|s\rangle - \beta|s^\perp\rangle.

The component along |s\rangle is preserved; the component orthogonal to |s\rangle flips sign. That is the definition of a reflection about the |s\rangle axis.

So D is reflection about the |s\rangle axis, which makes angle \theta with the |s'\rangle axis.

Composition gives a rotation by 2θ

You now have two reflections:

The composition G = D \cdot O is two reflections in a row, with the axes separated by angle \theta. Basic 2D geometry says: the composition of two reflections, about axes separated by angle \theta, is a rotation by 2\theta (in the direction going from the first axis to the second).

Why two reflections make a rotation: if you reflect about axis 1, then reflect about axis 2, a vector at angle \alpha from axis 1 first goes to angle -\alpha from axis 1, which is angle \theta - (-\alpha + \theta) = \theta + \alpha... easier to see directly. Pick any vector, perform both reflections, and compare start and end angles — the net rotation is 2\theta, full stop. A quick verification by composing matrices: reflection-about-horizontal is \text{diag}(1, -1); reflection-about-a-line-at-angle-\theta has matrix \bigl(\begin{smallmatrix}\cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta\end{smallmatrix}\bigr); their product is \bigl(\begin{smallmatrix}\cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta\end{smallmatrix}\bigr), which is exactly the rotation by 2\theta. The detailed matrix computation is in the going-deeper section.

So one Grover iteration rotates the state, in the 2D plane, by angle 2\theta — and the rotation goes from |s'\rangle toward |w\rangle (upward, the direction you want).

Two reflections compose to a rotation by 2θThree panels showing the action of the oracle, the diffusion operator, and their composition on the initial state. Panel 1: ket s starts at angle theta above horizontal. Panel 2: oracle O reflects across the horizontal axis, sending ket s to angle minus theta (below horizontal). Panel 3: diffusion D reflects across the ket s axis (at angle theta), sending the reflected state up to angle 3 theta. Net effect: rotation by 2 theta counterclockwise from the starting position.G = D · O — reflection, then reflection, equals rotation by 2θ1. start: |s⟩|s'⟩|w⟩θ|s⟩2. after O (reflect ↻ |s'⟩)|s'⟩|w⟩O|s⟩−θ3. after D (reflect about |s⟩)|s'⟩|w⟩|s⟩ axisG|s⟩net: |s⟩ rotates by 2θ upward, from angle θ to angle 3θ above |s'⟩repeat k times → state at angle (2k+1)θ
One Grover iteration $G = D \cdot O$ visualised as two reflections. The oracle $O$ reflects the state across the horizontal $|s'\rangle$ axis, sending $|s\rangle$ from angle $\theta$ to angle $-\theta$. The diffusion $D$ then reflects across the $|s\rangle$ axis (the dashed line at angle $\theta$), sending the reflected state up to angle $3\theta$. Net: the state has rotated by $2\theta$ counter-clockwise. After $k$ iterations it sits at angle $(2k+1)\theta$.

The rotation formula

You are now equipped to track the state through every iteration. Start at

|s\rangle = \sin\theta\,|w\rangle + \cos\theta\,|s'\rangle \quad (\text{angle } \theta).

After one iteration:

G|s\rangle = \sin(3\theta)\,|w\rangle + \cos(3\theta)\,|s'\rangle \quad (\text{angle } 3\theta).

After two iterations:

G^2|s\rangle = \sin(5\theta)\,|w\rangle + \cos(5\theta)\,|s'\rangle \quad (\text{angle } 5\theta).

After k iterations the pattern is

G^k|s\rangle = \sin((2k+1)\theta)\,|w\rangle + \cos((2k+1)\theta)\,|s'\rangle.

The probability of measuring |w\rangle at this point is the squared modulus of the amplitude on |w\rangle:

P_k = \sin^2\bigl((2k+1)\theta\bigr).

When to stop

You want P_k as close to 1 as possible. \sin^2 is maximised when its argument is \pi/2 (giving \sin^2(\pi/2) = 1). So the ideal k satisfies

(2k+1)\theta = \frac{\pi}{2} \quad \Longleftrightarrow \quad k = \frac{\pi}{4\theta} - \frac{1}{2}.

k must be an integer, so take k_\text{opt} = \mathrm{round}\!\left(\dfrac{\pi}{4\theta} - \dfrac{1}{2}\right). For large N, \theta \approx 1/\sqrt N, and the formula collapses to

k_\text{opt} \approx \frac{\pi}{4}\sqrt N - \frac{1}{2} \approx \frac{\pi}{4}\sqrt N.

For N = 10^6, \pi/4 \cdot 10^3 \approx 785 iterations. For N = 10^{12}, about 785{,}000. The \sqrt N scaling is visible in the formula, not in the code.

Two worked examples

The abstract picture becomes concrete only when you push real numbers through it. Two cases — one where the geometry lands exactly on |w\rangle, one where it lands close but not exactly.

Example 1: $N = 4$ — the exact single-iteration case

Setup. Two qubits. N = 4. Exactly one marked state (say |w\rangle = |11\rangle; the algebra is the same for any choice).

Step 1. Compute \theta. The defining relation is \sin\theta = 1/\sqrt N = 1/\sqrt 4 = 1/2. So \theta = \pi/6 = 30°. Why this specific value matters: \pi/6 divides \pi/2 an integer number of times — specifically, \pi/2 = 3 \cdot \pi/6. So a single iteration rotates by 2\theta = \pi/3 = 60°, and starting from angle \theta = 30° the new angle is 30° + 60° = 90° = \pi/2 exactly.

Step 2. Starting state. In the plane with (|s'\rangle, |w\rangle) axes,

|s\rangle = \sin\frac{\pi}{6}|w\rangle + \cos\frac{\pi}{6}|s'\rangle = \frac{1}{2}|w\rangle + \frac{\sqrt 3}{2}|s'\rangle.

Angle above |s'\rangle axis: 30°. Probability of measuring |w\rangle right now: \sin^2(30°) = 1/4. Why 1/4: because N = 4 and every basis state has amplitude 1/\sqrt 4 = 1/2, so probability (1/2)^2 = 1/4. The geometric picture and the amplitude picture agree.

Step 3. One iteration rotates by 2\theta = 60°. New angle: 30° + 60° = 90°. New state:

G|s\rangle = \sin(90°)|w\rangle + \cos(90°)|s'\rangle = 1 \cdot |w\rangle + 0 \cdot |s'\rangle = |w\rangle.

The state is now exactly |w\rangle, with no residual component along |s'\rangle.

Step 4. Measure. P(\text{outcome} = w) = \sin^2(90°) = 1. The outcome is |w\rangle with probability 1 — deterministic.

Step 5. Check against formula. k_\text{opt} = \frac{\pi}{4\theta} - \frac{1}{2} = \frac{\pi}{4 \cdot \pi/6} - \frac{1}{2} = \frac{6}{4} - \frac{1}{2} = \frac{3}{2} - \frac{1}{2} = 1. One iteration, exactly.

N=4 Grover trajectory: one iteration lands on |w⟩ exactlyA 2D plane with |s'⟩ horizontal and |w⟩ vertical. An arc marks the unit circle. Two arrows from the origin: the initial state |s⟩ at 30 degrees above horizontal (labelled θ = 30°), and G|s⟩ at 90 degrees (pointing straight up along |w⟩). A curved arrow shows the 60 degree rotation from |s⟩ to G|s⟩. The angle between |s⟩ and |w⟩ is labelled 2θ = 60°.N = 4: one iteration → P(w) = 1 exactly|s'⟩|w⟩|s⟩angle 30°G|s⟩ = |w⟩angle 90°θ2θ = 60°θ = π/6, 3θ = π/2 — one rotation by 2θ exactly reaches the vertical axis
The $N = 4$ geometry. The initial state $|s\rangle$ sits at $30°$ above $|s'\rangle$; the rotation angle is $2\theta = 60°$; one iteration lands exactly on the $|w\rangle$ axis at $90°$. Probability $1$ — deterministic success on the first try.

Result. For N = 4, one oracle query suffices; the outcome is w with probability exactly 1. The classical worst-case takes 3 queries. Grover wins cleanly.

Example 2: $N = 16$ — close, but not exactly

Setup. Four qubits. N = 16. Marked state |w\rangle somewhere (the geometry is independent of which basis vector is marked).

Step 1. Compute \theta. \sin\theta = 1/\sqrt{16} = 1/4, so \theta = \arcsin(0.25) \approx 0.2527 radians, or about 14.48°.

Step 2. Optimal iteration count.

k_\text{opt} = \mathrm{round}\!\left(\frac{\pi}{4\theta} - \frac{1}{2}\right) = \mathrm{round}(3.108 - 0.5) = \mathrm{round}(2.608) = 3.

Why round up, not down: because k = 3 gives angle 7\theta \approx 1.77 rad \approx 101.4° (slightly past 90°), while k = 2 gives angle 5\theta \approx 72.4° (well short of 90°). The probability \sin^2(7\theta) \approx \sin^2(101.4°) \approx 0.961 beats \sin^2(5\theta) \approx \sin^2(72.4°) \approx 0.908. When in doubt, round the rotation count to whichever k makes (2k+1)\theta closest to \pi/2.

Step 3. Track the state. After k iterations, the angle is (2k+1)\theta and P_k = \sin^2((2k+1)\theta):

k (2k+1)\theta (rad) (2k+1)\theta (deg) P_k = \sin^2((2k+1)\theta)
0 0.253 14.48° 0.0625
1 0.758 43.43° 0.472
2 1.264 72.39° 0.908
3 1.769 101.34° 0.961
4 2.274 130.30° 0.580

At k = 3 the state is at angle 101.3°, slightly past the vertical. The probability of measuring |w\rangle is 0.961 — not exactly 1, but extremely close. At k = 4 the state has rotated past |w\rangle and is heading back toward |s'\rangle; the success probability has collapsed to 0.580.

Step 4. Run once and verify. Measure after k = 3. Get some outcome x. Evaluate f(x) with one classical oracle call. If f(x) = 1, you are done. If f(x) = 0, re-run. The expected number of re-runs is about 1/0.961 \approx 1.04 — essentially always one shot.

N=16 Grover trajectory through multiple iterationsA 2D plane with |s'⟩ horizontal and |w⟩ vertical. An arc marks the unit circle. Five arrows from the origin show the state at iterations k = 0, 1, 2, 3, 4 at angles 14.5°, 43.4°, 72.4°, 101.3°, 130.3° respectively. The arrow at k = 3 is highlighted in red as the optimal stopping point (closest to 90° but slightly over). The arrow at k = 4 is lighter, showing overshoot.N = 16: angles (2k+1)·14.48° for k = 0,1,2,3,4|s'⟩|w⟩k=0 (14.5°)k=1 (43.4°)k=2 (72.4°)k=3 (101.3°)k=4 (130.3°) — overshootthree iterations reach 101.3° — just past vertical — with P(w) ≈ 0.961a fourth iteration overshoots: the state rotates past |w⟩ and P drops to 0.58
Grover for $N = 16$. The state rotates by $2\theta \approx 28.96°$ per iteration. After $k = 0, 1, 2, 3, 4$ the angle above $|s'\rangle$ is $14.5°, 43.4°, 72.4°, 101.3°, 130.3°$. The closest approach to $|w\rangle$ (the vertical) is at $k = 3$ with probability $0.961$. At $k = 4$ the state has rotated past $|w\rangle$ — this is Grover's famous overshoot, and it is the geometric reason you have to stop at the right time.

Result. For N = 16, three iterations give \approx 96\% success probability — close to 1, but not exactly. The gap between 96\% and 100\% is the cost of N not being on the "magic" list where the rotation lands exactly on |w\rangle.

The big picture — why the rotation angle is 2/√N

Step back. What you have shown is that the rotation angle per Grover iteration is

2\theta = 2\arcsin\!\left(\frac{1}{\sqrt N}\right).

For large N, \arcsin(1/\sqrt N) \approx 1/\sqrt N, so 2\theta \approx 2/\sqrt N. Each iteration rotates the state by about 2/\sqrt N radians.

You need to rotate from near 0 to near \pi/2, a total of about \pi/2 radians. Number of iterations needed:

\frac{\pi/2}{2/\sqrt N} = \frac{\pi}{4}\sqrt N.

That is the \sqrt N count, and there is no physics in it — it is the ratio of two angles, \pi/2 and 2/\sqrt N. The only place N enters is through the initial angle \theta, and that in turn comes from |s\rangle's amplitude on |w\rangle being 1/\sqrt N, which comes from Hadamarding |0\rangle^{\otimes n}.

This is the honest answer to "why does Grover take \sqrt N queries": because the initial amplitude is 1/\sqrt N, the initial angle in the 2D plane is 1/\sqrt N, each reflection-pair doubles the amplitude for small rotations (specifically, \sin((2k+1)\theta) grows linearly in (2k+1) while k is small), and it takes O(\sqrt N) doublings to reach O(1). The quadratic speedup is built into the geometry of reflections in a 2D plane, where angles add (not amplitudes, not probabilities).

Common confusions

Going deeper

If you have the picture — a 2D plane, a rotation by 2\theta per iteration, \sqrt N iterations total — you have the thing. The rest of this section derives the reflection formulae rigorously, handles the case of multiple marked items, computes the exact overshoot formula, connects the geometric picture to amplitude amplification, and comments on the exact-one-iteration cases beyond N = 4.

Rigorous matrix derivation

Work in the orthonormal basis (|s'\rangle, |w\rangle) — horizontal axis first, vertical axis second. Every state you care about is a real linear combination (no complex amplitudes appear in Grover's trajectory, because the starting state is real and the operators O and D have real matrices in this basis).

Oracle O. Acts as +1 on |s'\rangle and -1 on |w\rangle:

O = \begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}.

This is standard reflection-about-the-x-axis; determinant -1, trace 0, eigenvalues +1 and -1.

Diffusion D. In the general plane, D = 2|s\rangle\langle s| - I is reflection about the |s\rangle axis. |s\rangle has coordinates (\cos\theta, \sin\theta) in the basis (|s'\rangle, |w\rangle). The standard 2D formula for reflection about the line through origin making angle \theta with the x-axis is

D = \begin{pmatrix}\cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta\end{pmatrix}.

Why this is the reflection matrix: a vector at angle \alpha reflects to a vector at angle 2\theta - \alpha. Write the input as (\cos\alpha, \sin\alpha) and the output as (\cos(2\theta - \alpha), \sin(2\theta - \alpha)). Expand using angle-subtraction formulae: \cos(2\theta - \alpha) = \cos 2\theta \cos\alpha + \sin 2\theta \sin\alpha, \sin(2\theta - \alpha) = \sin 2\theta \cos\alpha - \cos 2\theta \sin\alpha. Read off the matrix columns.

Product G = D \cdot O. Multiply:

G = \begin{pmatrix}\cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta\end{pmatrix}\begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix} = \begin{pmatrix}\cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta\end{pmatrix}.

That is exactly the 2D rotation matrix R_{2\theta} (counter-clockwise, by angle 2\theta). So G^k = R_{2k\theta}, and applied to |s\rangle at angle \theta:

G^k|s\rangle = R_{2k\theta}\begin{pmatrix}\cos\theta \\ \sin\theta\end{pmatrix} = \begin{pmatrix}\cos((2k+1)\theta) \\ \sin((2k+1)\theta)\end{pmatrix}.

Re-expressing in the (|s'\rangle, |w\rangle) basis: G^k|s\rangle = \cos((2k+1)\theta)|s'\rangle + \sin((2k+1)\theta)|w\rangle. QED.

Outside the 2D plane spanned by (|s'\rangle, |w\rangle), the Grover iteration G acts as the identity — any component of the state in the orthogonal complement is untouched. So the entire evolution stays in the 2D plane, confirming the reduction.

Multiple marked items — the general formula

Suppose M \ge 1 basis states are marked, forming a set W \subseteq \{0,1\}^n with |W| = M. Redefine

|w_\text{avg}\rangle = \frac{1}{\sqrt M}\sum_{x \in W}|x\rangle, \qquad |s'\rangle = \frac{1}{\sqrt{N-M}}\sum_{x \notin W}|x\rangle.

These are again orthogonal unit vectors. The initial state decomposes as

|s\rangle = \sqrt{\frac{M}{N}}|w_\text{avg}\rangle + \sqrt{\frac{N-M}{N}}|s'\rangle.

Define \theta_M by \sin\theta_M = \sqrt{M/N}. Then |s\rangle = \sin\theta_M|w_\text{avg}\rangle + \cos\theta_M|s'\rangle, the oracle flips the sign on |w_\text{avg}\rangle (because it flips sign on every marked basis vector, and |w_\text{avg}\rangle is a superposition of them), and the same two-reflection argument gives G = R_{2\theta_M}.

The optimal iteration count is k_\text{opt} \approx \frac{\pi}{4\theta_M} \approx \frac{\pi}{4}\sqrt{N/M}. With more marked items the initial angle is larger, the per-iteration rotation is larger, and fewer iterations are needed. At M = N/2 (half the inputs marked), \theta_M = \pi/4 and k_\text{opt} = 0 — a single measurement of the initial state already returns a marked input with probability 1/2, and one iteration would overshoot. Grover tells you to do nothing, which is correct.

The number of marked items M must be known to choose k_\text{opt}. When it is unknown, the Boyer-Brassard-Høyer-Tapp adaptive scheme (see How Many Iterations) recovers O(\sqrt{N/M}) queries in expectation, even without prior knowledge of M.

Overshoot — the exact formula

After k iterations (not necessarily optimal), the success probability is P_k = \sin^2((2k+1)\theta). Beyond the optimal k_\text{opt}, the probability oscillates: rising to its first peak near k_\text{opt}, falling to near-zero at the next zero of \sin, rising again, and so on. The full period (in iterations) is \pi/\theta \approx \pi\sqrt N.

For N = 16 (the Example 2 case), the full period is about \pi/0.2527 \approx 12.43 iterations. The table of probabilities over a full cycle:

k (2k+1)\theta (deg) P_k
0 14.5° 0.06
3 101.3° 0.96
6 188.2° 0.02
9 275.2° 0.99
12 362.1° 0.00

The algorithm is periodic. The second optimal stopping point is k = 9, after which the state has rotated all the way around and is back near |w\rangle. Running too many iterations does not fail silently; it actively destroys the answer you already had.

The geometric picture generalises: amplitude amplification

The step from Grover to general amplitude amplification is a single substitution. Let A be any unitary such that A|0^n\rangle = \sqrt p|\text{good}\rangle + \sqrt{1-p}|\text{bad}\rangle, where |\text{good}\rangle and |\text{bad}\rangle are orthogonal unit vectors. Pick a phase oracle O_\text{good} that flips the sign on the good subspace, and replace the Grover diffusion by

D_A = 2A|0^n\rangle\langle 0^n|A^\dagger - I,

which is reflection about A|0^n\rangle. Then G_A = D_A \cdot O_\text{good} is again a rotation in the 2D plane spanned by |\text{good}\rangle and |\text{bad}\rangle, with angle 2\theta_p where \sin\theta_p = \sqrt p.

Optimal iteration count: k_\text{opt} \approx \frac{\pi}{4\sqrt p}. This is the quadratic amplification: it takes O(1/\sqrt p) iterations to boost the success probability from p to near 1, versus the classical O(1/p) repetitions. Grover is the special case A = H^{\otimes n} and p = 1/N.

The beauty of this generalisation is that the whole geometric picture transfers. The same 2D plane, the same two reflections, the same rotation by 2\theta_p. Once you understand Grover geometrically, you understand amplitude amplification — and amplitude amplification shows up inside almost every non-trivial quantum algorithm (quantum counting, mean estimation, quantum Monte Carlo, Shor's-adjacent algorithms for algebraic-number-theory tasks).

Exact one-iteration cases

N = 4 is the cleanest exact-success case, but it is not unique. Exact one-iteration success requires 3\theta = \pi/2, i.e., \theta = \pi/6, i.e., \sin\theta = 1/2, i.e., N = 4. For standard Grover with one marked item, that is the only N.

For multiple marked items, the same analysis gives exact-one-iteration success when \sin\theta_M = 1/2, i.e., M/N = 1/4. So: N = 16 with M = 4, or N = 32 with M = 8, or in general N = 4M for any M. Each of those configurations, run with exactly one Grover iteration, returns a marked state with probability exactly 1.

More exotic "exact Grover" constructions — where you design the oracle and diffusion to hit |w\rangle exactly for arbitrary N — exist (Brassard-Høyer-Mosca-Tapp, Long, and others). They work by slightly perturbing the rotation angle to make (2k+1)\theta land exactly at \pi/2 for integer k. The cost is a slightly more complex iteration operator; the benefit is deterministic success for any N. For most practical purposes, the 9699\% success probability of standard Grover is fine and not worth the complication.

Where this leads next

References

  1. Lov K. Grover, A fast quantum mechanical algorithm for database search (1996) — arXiv:quant-ph/9605043. The original paper.
  2. Wikipedia, Grover's algorithm — geometric proof of correctness.
  3. Nielsen and Chuang, Quantum Computation and Quantum Information, §6.1.2 — Cambridge University Press.
  4. John Preskill, Lecture Notes on Quantum Computation, Chapter 6 — theory.caltech.edu/~preskill/ph229. The clearest geometric exposition.
  5. Boyer, Brassard, Høyer, Tapp, Tight bounds on quantum searching (1998) — arXiv:quant-ph/9605034. The rigorous analysis including the multiple-marked case.
  6. IBM Qiskit Textbook, Grover's Algorithm — implementation with the geometric picture explicit.