In short
QPE with t ancilla qubits produces a t-bit measurement outcome that, with probability at least 4/\pi^2 \approx 0.405, is the nearest t-bit integer to 2^t\varphi. The amplitude distribution has a sinc-like profile centred on the true phase, with tails whose probability falls off as 1/\text{distance}^2. To guarantee precision \varepsilon (so that |\widetilde\varphi - \varphi| < \varepsilon) with failure probability at most \delta, use

t = \lceil \log_2(1/\varepsilon) \rceil + \left\lceil \log_2\!\left(2 + \frac{1}{2\delta}\right) \right\rceil

ancillas. The extra qubits beyond \log_2(1/\varepsilon) are "padding" that amplifies the correct outcome. Alternative boosting strategies: run QPE multiple times and take the median (majority vote), or wrap QPE in amplitude amplification for a quadratic speedup. The trade-off: higher precision and confidence cost more U-applications and more qubits.
You have a unitary U, an eigenstate |u\rangle with eigenvalue e^{2\pi i\varphi}, and you want to learn \varphi. The previous chapter built the phase estimation circuit — t ancillas, a controlled-U^{2^k} ladder, an inverse QFT, and a measurement — and showed that when \varphi is exactly k/2^t, the measurement returns k deterministically.
That was the clean case. But eigenvalues of real unitaries rarely land on convenient binary fractions. The phase of T on |1\rangle is 1/8 — fine, exactly 3 bits. The phase of R_z(\pi/5) on |0\rangle is -1/20 — not a binary fraction at all: its binary expansion never terminates. What does QPE do then?
The answer has two parts. First, the measurement returns the nearest t-bit integer with high probability — and "high" is quantifiable. Second, the outputs that are not the nearest integer form a tail whose probability decays as 1/\text{distance}^2, so even when QPE fails to land on the nearest integer, it usually lands close. The trade-off is controllable: add more ancillas, run more trials, or wrap in amplitude amplification — each boost has a specific cost in gate count and qubit count.
This chapter works out the distribution in detail, derives the canonical 4/\pi^2 success-probability bound, and then shows the three standard ways to drive the failure probability as low as you want. By the end you will know exactly how many ancillas and how many U-applications are required for any target accuracy — a resource-accounting question that sits at the centre of every real QPE deployment.
The setup and the question
Run QPE with t ancillas on an eigenstate with phase \varphi \in [0, 1). Let a be the nearest integer to 2^t \varphi (i.e. the "true answer" rounded to t bits) and let \delta\varphi = \varphi - a/2^t be the rounding error, with |\delta\varphi| \le 1/2^{t+1}.
The measurement outcome is some integer y \in \{0, 1, \ldots, 2^t - 1\} with probability P(y) that depends on the amplitudes of the post-inverse-QFT ancilla state. The question is:
- What is P(a) — the probability of the correct rounding?
- How does P(y) fall off for y \ne a?
- How small can you make "P(\text{bad outcome})" by changing t?
The amplitude distribution — a sinc-like profile
Start from the ancilla state just before the inverse QFT. From the previous chapter, it is

\frac{1}{\sqrt{2^t}} \sum_{x=0}^{2^t - 1} e^{2\pi i \varphi x}\, |x\rangle.
Apply \text{QFT}^\dagger. By the inverse QFT's definition, its matrix entry is (\text{QFT}^\dagger)_{yx} = \tfrac{1}{\sqrt{2^t}}\, e^{-2\pi i\, xy/2^t}. So the amplitude on |y\rangle after the inverse QFT is

\alpha_y = \frac{1}{2^t} \sum_{x=0}^{2^t - 1} e^{2\pi i x (\varphi - y/2^t)}.
Write \beta = \varphi - y/2^t. The sum becomes a geometric series with ratio r = e^{2\pi i \beta}:

\alpha_y = \frac{1}{2^t} \cdot \frac{e^{2\pi i\, 2^t \beta} - 1}{e^{2\pi i \beta} - 1} \qquad (\beta \neq 0).
Why the geometric-series formula: \sum_{x=0}^{N-1} r^x = (r^N - 1)/(r - 1) whenever r \ne 1. This is the same identity that shows up in DFT orthogonality proofs; here it compresses a sum of 2^t phase-exponentials into a closed form involving just the endpoints.
Take the modulus squared to get the probability. Use the identity |e^{i\theta} - 1| = 2|\sin(\theta/2)|:

P(y) = |\alpha_y|^2 = \frac{\sin^2(\pi\, 2^t \beta)}{2^{2t} \sin^2(\pi \beta)}.
Here \beta = \varphi - y/2^t is the "distance" between the true phase and the fractional outcome.
Why this looks like a sinc function: for small arguments, \sin\theta \approx \theta, so the numerator behaves like (\pi\, 2^t \beta)^2 / (2^t)^2 = (\pi \beta)^2 and the denominator like (\pi \beta)^2 — their ratio is roughly 1 when \beta \ll 1/2^t, i.e. near the correct output. Far from the correct output, the numerator oscillates rapidly while the denominator grows, so P(y) decays — mimicking a \text{sinc}^2 function.
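This is easy to check numerically. Below is a minimal NumPy sketch (the helper name qpe_distribution and the example values t = 6, \varphi = 1/20 are ours) that evaluates the closed-form P(y) and confirms both the normalisation and the concentration near the correct outcome:

```python
import numpy as np

def qpe_distribution(phi: float, t: int) -> np.ndarray:
    """P(y) = sin^2(pi 2^t beta) / (2^(2t) sin^2(pi beta)), with beta = phi - y/2^t."""
    y = np.arange(2**t)
    beta = phi - y / 2**t
    num = np.sin(np.pi * 2**t * beta)
    den = 2**t * np.sin(np.pi * beta)
    with np.errstate(divide="ignore", invalid="ignore"):
        P = (num / den) ** 2
    P[np.isclose(den, 0.0)] = 1.0    # beta = 0 is the 0/0 case: P(y) = 1 exactly
    return P

t, phi = 6, 1/20                      # a phase whose binary expansion never terminates
P = qpe_distribution(phi, t)
a = round(2**t * phi)                 # nearest 6-bit integer: round(3.2) = 3
print(P.sum())                        # 1.0 — the distribution is normalised
print(P[a])                           # ~0.88, comfortably above 4/pi^2
print(P[a-1] + P[a] + P[a+1])         # the bulk of the mass sits on {a-1, a, a+1}
```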
The 4/π² lower bound on the correct outcome
Focus on the "correct" outcome y = a (the nearest integer to 2^t\varphi). For this outcome, \beta = \varphi - a/2^t = \delta\varphi, the rounding error, with |\delta\varphi| \le 1/2^{t+1}. Let \Delta = 2^t \delta\varphi \in [-1/2, 1/2].
Then

P(a) = \frac{\sin^2(\pi\Delta)}{2^{2t} \sin^2(\pi\Delta/2^t)}.
For t \ge 1, the denominator's argument \pi\Delta/2^t is small — at most \pi/2^{t+1} in magnitude. For small arguments, |\sin\theta| \le |\theta| (with equality only at \theta = 0). This gives the bound |\sin(\pi\Delta/2^t)| \le \pi|\Delta|/2^t, so

P(a) \ge \frac{\sin^2(\pi\Delta)}{2^{2t} (\pi\Delta/2^t)^2} = \frac{\sin^2(\pi\Delta)}{(\pi\Delta)^2}.
Now use a standard trigonometric inequality: for |\Delta| \le 1/2, |\sin(\pi\Delta)| \ge 2|\Delta|. (This is because the chord from (0, 0) to (\Delta, \sin\pi\Delta) lies below the arc, and the slope of the chord is \sin\pi\Delta / \Delta, which is minimised at \Delta = \pm 1/2, giving |\sin(\pi/2)|/(1/2) = 2.) Squaring:

\sin^2(\pi\Delta) \ge 4\Delta^2,
so

P(a) \ge \frac{4\Delta^2}{(\pi\Delta)^2} = \frac{4}{\pi^2}.
Why the bound is saturated at \Delta = \pm 1/2: that is the case where the phase \varphi sits exactly halfway between two adjacent t-bit fractions. The QPE output is split between the two nearest integers a and a \pm 1, each getting probability exactly 4/\pi^2 in the limit of large t, with the remaining 1 - 8/\pi^2 \approx 0.19 of probability spread over more distant outcomes. When \varphi is better-aligned with a t-bit grid point, P(a) is larger — at \Delta = 0, P(a) = 1 exactly.
So no matter what the true phase is, QPE returns the nearest t-bit rounding with probability at least 4/\pi^2 \approx 40.5\%. This is the canonical accuracy guarantee of QPE.
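The tightness of the bound is also easy to confirm numerically, reusing qpe_distribution from the sketch above (the value t = 8 is an arbitrary choice of ours):

```python
import numpy as np

t = 8
midpoints = (np.arange(2**t) + 0.5) / 2**t    # phases exactly between grid points
worst = min(qpe_distribution(phi, t)[round(2**t * phi) % 2**t] for phi in midpoints)
print(worst, 4 / np.pi**2)                    # both ~0.405: the bound is tight
```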
The decay of the tails
For outcomes y far from a, the probability decays as 1/|y - a|^2. To see this, set y = a + m for integer m and compute \beta = \varphi - y/2^t = \delta\varphi - m/2^t. For |m| \ge 1, the denominator \sin^2(\pi\beta) is at least \sin^2(\pi(|m|/2^t - 1/2^{t+1})) \approx (\pi(|m| - 1/2)/2^t)^2, and the numerator \sin^2(\pi 2^t \beta) = \sin^2(\pi(2^t\delta\varphi - m)) \le 1. So

P(a + m) \lesssim \frac{1}{2^{2t} \big(\pi(|m| - 1/2)/2^t\big)^2} = \frac{1}{\pi^2 (|m| - 1/2)^2}.
This tail decay is not exponential — it is inverse-square. The bulk of the probability lives near a, but a heavy-ish tail stretches out. This matters for the next question: how much do you have to pad t to guarantee tiny failure probability?
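A quick numerical look at the tail, again reusing qpe_distribution (the values t = 10, a = 100 are ours): at a worst-case phase the product P(a+m)\,(m - 1/2)^2 should hover near 1/\pi^2 \approx 0.10, confirming the inverse-square decay.

```python
t, a = 10, 100
P = qpe_distribution((a + 0.5) / 2**t, t)   # worst-case phase just beyond y = a
for m in (2, 4, 8, 16, 32):
    print(m, P[a + m] * (m - 0.5) ** 2)     # roughly constant, ~1/pi^2 ~ 0.10
```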
Resource accounting — how many ancillas for precision ε and confidence 1 − δ
You want an estimate \widetilde\varphi with |\widetilde\varphi - \varphi| < \varepsilon with probability at least 1 - \delta.
The idea: use t ancillas, where t is larger than the minimum needed just to represent \varepsilon in binary. The extra ancillas are "padding" that tighten the distribution.
Let t = n + p where n = \lceil\log_2(1/\varepsilon)\rceil is the minimum needed for precision \varepsilon, and p is the number of padding qubits. The QPE circuit runs with t ancillas; you keep only the top n bits of the outcome as your estimate. The question becomes: how large must p be so that the top-n-bit rounding is within \varepsilon of \varphi with probability \ge 1 - \delta?
The standard analysis (Nielsen & Chuang §5.2.1) goes as follows. The top-n-bit rounding is wrong by more than \varepsilon only if the full t-bit measurement outcome falls more than 2^p/2 units away from the correct t-bit rounding. Using the tail bound,

P(\text{fail}) \le 2 \sum_{m=2^{p-1}}^{\infty} \frac{1}{\pi^2 (m - 1/2)^2} \le \frac{2}{\pi^2 (2^{p-1} - 3/2)}.
Why the sum-to-integral bound: the sum \sum_{m=M}^{\infty} 1/(m-1/2)^2 is bounded by the integral \int_{M-1}^{\infty} dx / (x-1/2)^2 = 1/(M-3/2). Simplifying and applying the factor-of-2 for the two-sided tail gives the displayed bound.
Setting this failure probability \le \delta and solving for p:

\frac{2}{\pi^2 (2^{p-1} - 3/2)} \le \delta \quad\Longrightarrow\quad p \ge \log_2\!\left(\frac{4}{\pi^2\delta} + 3\right).
Simplifying with a small constant shift gives the commonly-quoted form

t = n + p = \lceil \log_2(1/\varepsilon) \rceil + \left\lceil \log_2\!\left(2 + \frac{1}{2\delta}\right) \right\rceil.
Why the second term looks "cheap": you only pay \log_2(1/\delta) extra qubits for each doubling of confidence. Getting from \delta = 1/2 to \delta = 10^{-6} costs about \log_2(10^6/2) \approx 20 extra qubits — completely manageable. The dominant cost scales with the desired precision \varepsilon, not with the confidence.
A concrete example: suppose you want 6-bit precision (\varepsilon = 2^{-6} \approx 0.016) with 99% success (\delta = 0.01). Then

t = 6 + \left\lceil \log_2\!\left(2 + \frac{1}{2 \times 0.01}\right) \right\rceil = 6 + \lceil \log_2 52 \rceil = 6 + 6 = 12.
So 12 ancillas give you 6 bits of phase precision with 99% confidence. The "padding" of 6 extra qubits is the price of pushing the failure probability from 60% down to 1%.
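The whole calculation packs into a few lines. A sketch of a resource calculator built on the padding formula above (the function name qpe_resources is ours):

```python
from math import ceil, log2

def qpe_resources(eps: float, delta: float) -> dict:
    """Ancillas and U-applications for |estimate - phi| < eps w.p. >= 1 - delta,
    via t = ceil(log2(1/eps)) + ceil(log2(2 + 1/(2*delta)))."""
    n = ceil(log2(1 / eps))                  # bits needed to express the precision
    p = ceil(log2(2 + 1 / (2 * delta)))      # padding qubits for confidence
    t = n + p
    return {"n": n, "p": p, "t": t, "U_applications": 2**t - 1}

print(qpe_resources(eps=2**-6, delta=0.01))
# {'n': 6, 'p': 6, 't': 12, 'U_applications': 4095}
```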
Boosting — three strategies compared
If adding more ancillas is not enough, or if you want even better guarantees, there are three canonical boosting techniques.
Strategy 1 — more ancillas (the logarithmic padding you just saw)
Idea. Scale t as n + O(\log(1/\delta)).
Cost. O(2^t) total U-applications, t ancilla qubits. Each extra qubit in t doubles U-applications.
When to use. The default. Cheap and transparent. Confidence scales as 2^{-p} per padding qubit; every extra qubit of padding halves the failure probability.
Strategy 2 — majority vote over multiple runs
Idea. Run QPE r times independently, each with the same t ancillas. Each run returns an estimate \widetilde\varphi_i. Compute the median (or a majority vote on each bit). For large r, the median concentrates around the true \varphi with high probability.
Cost. Total r \cdot 2^t U-applications. t ancilla qubits (reusable across runs). r parallel runs (or r sequential with reinitialisation).
How confidence scales. By a Hoeffding-like bound, r runs push the failure probability down to e^{-\Omega(r)} — exponentially fast in the number of runs. A few dozen runs push the failure probability below one percent; on the order of a hundred reach nine-nines confidence.
When to use. When ancilla qubits are expensive but circuit depth is forgiving (NISQ hardware). Majority voting is easier to implement than a longer coherent QPE — each run is shorter, reducing cumulative decoherence error.
Strategy 3 — amplitude amplification
Idea. Run QPE inside a Grover-like amplitude amplification. The amplitude on the "correct outcome" is at least \sqrt{4/\pi^2} = 2/\pi \approx 0.64; amplitude amplification drives it arbitrarily close to 1 in O(1/\text{amplitude}) rounds — a constant number here.
Cost. O(1) rounds of amplitude amplification, each involving a full QPE forward-and-backward plus a reflection. Total U-applications is O(2^t), like standard QPE, but with a quadratic reduction in the number of repetitions compared to classical (multi-run) amplification.
When to use. In theoretical analyses and in algorithms where QPE is one subroutine inside a larger quantum routine. Amplitude amplification is a heavy hammer — it is worth using only when the cost of a single QPE run is already part of the budget.
The three strategies compose: for instance, run amplitude-amplified QPE with t ancillas and then take the median of r runs. Most practical proposals use some hybrid.
Worked examples
Example 1: the amplitude at y = a for a worst-case phase
Take a phase \varphi that lies exactly midway between two t-bit fractions — say \varphi = (a + 1/2)/2^t for some integer a. Then \Delta = 2^t\delta\varphi = 1/2, and from the amplitude formula

P(a) = \frac{\sin^2(\pi/2)}{2^{2t} \sin^2(\pi/2^{t+1})} = \frac{1}{2^{2t} \sin^2(\pi/2^{t+1})}.
Step 1. For large t, use \sin\theta \approx \theta with \theta = \pi/2^{t+1}:

\sin(\pi/2^{t+1}) \approx \pi/2^{t+1}.
Step 2. Substitute:

P(a) \approx \frac{1}{2^{2t} (\pi/2^{t+1})^2} = \frac{2^{2(t+1)}}{2^{2t}\, \pi^2} = \frac{4}{\pi^2}.
Why the approximation is tight here: for t \ge 3, the small-angle approximation \sin\theta \approx \theta is accurate to better than 2\% at \theta = \pi/2^{t+1}. So the bound P(a) \ge 4/\pi^2 is essentially saturated at this phase — it is the worst-case phase for QPE.
Step 3. By symmetry, P(a + 1) equals P(a) at this worst-case phase (the distribution is symmetric about the midpoint between a and a+1). So P(a) + P(a+1) = 8/\pi^2 \approx 0.81: more than 80\% of probability mass lies on the two nearest integers, a and a+1, with the remaining \approx 19\% spread over farther outcomes.
Result. At the worst-case phase \varphi = (a+1/2)/2^t, QPE returns a with probability \approx 4/\pi^2, returns a+1 with the same probability, and either way produces an estimate within 1/2^t of the true phase with probability \approx 81\%.
What this shows. The 4/\pi^2 bound is tight: at the worst-case phase, you cannot do better with standard QPE alone. The remaining \approx 60\% is not "lost" — most of it (another \approx 40\%) lands on a+1, the other nearest outcome, which is equally within 1/2^t of the true phase. If your application only needs precision 1/2^t (not exact identification of the nearest integer), the effective success probability is \approx 81\% even at the worst case. This milder criterion is what matters for most applications.
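You can confirm the worst-case numbers directly, reusing qpe_distribution from earlier (t = 8 and a = 100 are arbitrary choices of ours):

```python
t, a = 8, 100
P = qpe_distribution((a + 0.5) / 2**t, t)   # phase exactly midway between a and a+1
print(P[a], P[a + 1])                       # each ~ 4/pi^2 ~ 0.405
print(P[a] + P[a + 1])                      # ~ 8/pi^2 ~ 0.81
```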
Example 2: how many ancillas for 6-bit precision with 99% success
Target precision: \varepsilon = 2^{-6} \approx 0.016 (6 bits after the binary point). Target confidence: 1 - \delta = 99\%, so \delta = 0.01.
Step 1. Compute the minimum precision ancillas:

n = \lceil \log_2(1/\varepsilon) \rceil = \lceil \log_2 64 \rceil = 6.
Why: \varepsilon = 2^{-6} requires representing the phase to the nearest 1/64 — which is 6 bits of precision, hence n = 6 ancillas just to name that precision.
Step 2. Compute the padding:

p = \log_2\!\left(2 + \frac{1}{2\delta}\right) = \log_2(2 + 50) = \log_2 52 \approx 5.70.
Round up: p = 6. Why padding: extra qubits make the QPE distribution tighter, pushing the probability that the outcome falls outside the precision window below \delta. Each extra padding qubit roughly halves the failure probability.
Step 3. Total ancillas:

t = n + p = 6 + 6 = 12.
Step 4. Sanity-check with the simplified formula

t = \log_2(1/\varepsilon) + \log_2\!\left(2 + \frac{1}{2\delta}\right) = 6 + 5.70 = 11.70.
Round up: t = 12. Consistent.
Step 5. Compute gate cost: 2^t - 1 = 2^{12} - 1 = 4095 applications of U. (If U itself is built from g elementary gates, total circuit gates \approx 4095 g.)
Result. 12 ancillas, 4095 controlled-U applications total, plus an O(t^2) = 144-gate inverse QFT. For a U that is, say, 1000 elementary gates, the full circuit is about 4.1 \times 10^6 gates. That is far beyond what a NISQ machine can execute coherently without error correction.
What this shows. Scaling QPE to useful precision is expensive. 6 bits of phase is only enough to distinguish about 64 distinct phase buckets — borderline for detailed chemistry or for reading off the order r in Shor's algorithm for larger integers. 20 bits pushes into 10^6 resolution; cryptographic-scale order finding needs far more still (see below). The difference between the two regimes here — 4095 U-applications versus about 10^6 — is why QPE is the heavy subroutine that makes fault-tolerant hardware a prerequisite for real cryptographic attacks.
Common confusions
- "QPE always succeeds." It does not. QPE returns the nearest t-bit rounding with probability at least 4/\pi^2 \approx 40\%, not 100\%. The full success probability depends on how close \varphi is to the t-bit grid — deterministic when \varphi lands on a grid point, worst-case \approx 40\% when it lands halfway between grid points.
- "Adding more ancillas is cheap." It is cheap in qubits but expensive in gates. Each extra ancilla doubles the number of U-applications, because the 2^k in "controlled-U^{2^k}" doubles. Precision scaling is qubit-linear but gate-exponential. This is the main reason high-precision QPE remains out of reach of current NISQ hardware — you run out of coherence long before you finish the last few U^{2^k} gates.
- "Boosting is free." All three boosting strategies cost something. More ancillas cost more U-applications. Majority voting costs more runs. Amplitude amplification costs more rounds. No strategy escapes the basic resource accounting: to boost from 40% to 99%, you pay either a logarithmic factor of extra qubits, a linear factor of extra runs, or a constant factor of extra amplification rounds.
- "If φ is exactly representable in t bits, QPE fails sometimes." No — it is deterministic in that case. The formula P(a) = \sin^2(\pi\Delta)/[2^{2t}\sin^2(\pi\Delta/2^t)] at \Delta = 0 gives \sin^2(0)/\sin^2(0) = 1 in the limit (use L'Hôpital's rule, or just recognise P(a) = \tfrac{1}{2^{2t}}\cdot (2^t)^2 = 1). All other outcomes have amplitude exactly 0. Exact representability \Leftrightarrow deterministic QPE — see the numerical check after this list.
- "The output is always a clean integer." Yes, the raw measurement is an integer y \in \{0, 1, \ldots, 2^t - 1\}. The interpretation \widetilde\varphi = y/2^t is a binary fraction. If you want an arbitrary-precision floating-point estimate of \varphi, you read y, divide by 2^t, and cast to double precision — no quantum nuance there.
Going deeper
If you are here to know how accurate QPE is and how to size a circuit for a target precision — you have it. The key numbers are 4/\pi^2 for raw success, \log_2(1/\delta) padding qubits for confidence scaling, and 2^t - 1 total U-applications. What follows: the full sinc distribution analysis, Hoeffding bounds for majority voting, amplitude amplification as a quadratic boost, iterative QPE tradeoffs, Heisenberg-limited phase estimation, and the applications that push each strategy to its limit.
The full probability distribution via the sinc profile
The closed-form expression

P(y) = \frac{\sin^2(\pi\, 2^t \beta)}{2^{2t} \sin^2(\pi\beta)}, \qquad \beta = \varphi - y/2^t,
can be recognised as the square of a Dirichlet kernel — the analogue of the \text{sinc}^2 function in discrete signal processing. For small \beta, this is well-approximated by (\sin(\pi 2^t\beta)/(\pi 2^t\beta))^2 = \text{sinc}^2(2^t\beta).
Two useful corollaries:
(i) Main-lobe width. The main lobe of \text{sinc}^2(2^t\beta) has half-width 1/2^t in \beta-units, so in y-units the main lobe spans \pm 1 around the correct integer a. Most probability lies on \{a-1, a, a+1\}.
(ii) Aliasing. Because the denominator \sin^2(\pi\beta) is periodic with period 1, the distribution wraps around: outcomes at y = a + 2^t - 1 are "adjacent" to a under the cyclic identification. For typical \varphi well inside the grid this wrapping is negligible, but for \varphi near 0 or 1 it creates the "zero crossing" ambiguity of QPE — something to be aware of in Shor's algorithm implementations.
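The wraparound is easy to see numerically with qpe_distribution (the example values are ours):

```python
t, phi = 6, 0.004                # phase just above 0; the grid spacing is 1/64 ~ 0.016
P = qpe_distribution(phi, t)
print(P[0], P[1], P[2])          # main lobe at y = 0
print(P[63], P[62])              # cyclically adjacent outcomes also carry mass:
                                 # P[63] > P[2], even though |63 - 0| looks large
```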
Majority voting — Hoeffding-style analysis
Suppose QPE succeeds (returns within \varepsilon of \varphi) with probability p \ge 4/\pi^2. Run r independent QPE instances and take the median of the outputs. By a Hoeffding bound, the probability that more than half of the outputs are "failure" outcomes (outside the \varepsilon window) decays as \exp(-2r(p - 1/2)^2). For p = 4/\pi^2 \approx 0.4, p - 1/2 = -0.1, which fails the Hoeffding assumption since p < 1/2.
The fix: make each run already boost p above 1/2 using a bit of padding, then majority-vote on top. For instance, with 1 padding qubit per run, each run succeeds with p \ge 0.8. Then 2r(p - 1/2)^2 = 0.18 r, so r = 30 runs reduce the failure probability to e^{-5.4} \approx 0.005. Combined: n = 6 ancilla qubits for precision 2^{-6} plus one padding qubit = 7 qubits per run, times 30 runs.
This is usually better than pure padding for NISQ: it keeps individual runs short (less decoherence) and trades qubit count for wall-clock time.
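A Monte Carlo sketch of this combination, sampling shots from the exact distribution via qpe_distribution from earlier (the test phase 0.3117 and the run counts are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)

phi, n, runs, trials = 0.3117, 6, 30, 2000
t = n + 1                                  # one padding qubit per run
eps = 2**-n
P = qpe_distribution(phi, t)
P = P / P.sum()                            # guard against floating-point drift

failures = 0
for _ in range(trials):
    shots = rng.choice(2**t, size=runs, p=P) / 2**t   # 30 independent QPE estimates
    if abs(np.median(shots) - phi) > eps:             # does the median miss?
        failures += 1
print(failures / trials)    # ~0 here, below the conservative e^(-0.18 r) ~ 0.005 bound
```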
Amplitude amplification as a quadratic boost
Grover's amplitude amplification takes a unitary A and a "marking" operation O and produces a state where the O-marked amplitude is amplified. Applied to QPE:
- Let A = QPE circuit (forward only, no measurement).
- Let O = oracle that marks "correct" outcomes (those within \varepsilon of 2^t\varphi).
- The amplitude on marked states after A is \sqrt{p} \ge 2/\pi.
- Amplitude amplification takes this to \sim 1 in O(1/\sqrt{p}) rounds, each consisting of the marking oracle O, then A^\dagger, a reflection about |0\rangle, and A.
Total cost: O(1/\sqrt p) \cdot \text{QPE-cost} = O(2^t) U-applications — a quadratic improvement over the O(r \cdot 2^t) = O(\log(1/\delta) \cdot 2^t) cost of majority voting. For very stringent confidence (say \delta = 2^{-40}), amplification beats voting.
This is the mechanism behind quantum counting: combine amplitude amplification with QPE on the Grover operator itself to directly output \theta (where \sin^2\theta = number of marked items / total), with full amplification giving quadratic speedup over classical sampling.
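The round count can be read off the standard amplitude amplification rotation picture. A small numerical sketch (values illustrative only):

```python
import numpy as np

p = 4 / np.pi**2                            # base QPE success probability
theta = np.arcsin(np.sqrt(p))               # amplitude angle: sin(theta) = sqrt(p)
for k in range(4):                          # success probability after k rounds
    print(k, np.sin((2 * k + 1) * theta) ** 2)
# 0.405 (bare QPE), 0.771, 0.092, 0.986 — it oscillates, so k must be chosen
# to land (2k+1)*theta near pi/2 (mod pi); one round already nearly doubles p.
```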
Heisenberg-limited phase estimation
The best-possible scaling for estimating \varphi using N total U-applications is Heisenberg-limited: error scales as 1/N, compared to the standard quantum limit of 1/\sqrt N that classical or non-adaptive sampling achieves. QPE saturates the Heisenberg limit (error \sim 1/2^t with 2^t applications), which is why it is the asymptotically-optimal algorithm for eigenvalue estimation.
Giovannetti, Lloyd, and Maccone (2006) [arXiv:quant-ph/0412078] formalised the Heisenberg limit for general quantum metrology; Higgins et al. (2007) implemented it experimentally using Bayesian adaptive protocols. The adaptive variant sometimes called Heisenberg-limited phase estimation achieves 1/N scaling with fewer qubits than vanilla QPE but more classical post-processing.
Iterative QPE — the NISQ-friendly variant
Iterative QPE extracts one bit of \varphi at a time using a single ancilla, with classical feedforward of previously-extracted bits as phase corrections on the remaining circuits. Total qubit count: 1 ancilla + target register. Total U-applications: still 2^t - 1, same as vanilla QPE. Cost structure: depth is still O(2^t) per full estimation, but the circuits are shorter per iteration, which reduces coherent error accumulation at the cost of classical overhead.
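A classical simulation of the bit-by-bit loop makes the feedback structure concrete. This is a sketch of the ideal, noiseless iteration — not any library's API; the function name and test phase are ours:

```python
import numpy as np

rng = np.random.default_rng(7)

def iterative_qpe(phi: float, t: int) -> float:
    """Single-ancilla iterative QPE: extracts t bits of phi, least significant first.
    Bits measured in earlier rounds feed back as a phase correction on later ones."""
    bits = []                                  # collected least-significant-first
    for k in range(t - 1, -1, -1):
        # correction 0.0 b_{k+1} b_{k+2} ... built from the bits found so far
        omega = sum(b / 2**(i + 2) for i, b in enumerate(reversed(bits)))
        theta = 2**k * phi - omega             # effective relative phase on the ancilla
        p1 = np.sin(np.pi * theta) ** 2        # P(ancilla reads 1) after the final H
        bits.append(int(rng.random() < p1))    # one measurement shot
    bits.reverse()                             # now most significant first
    return sum(b / 2**(i + 1) for i, b in enumerate(bits))

print(iterative_qpe(0.8125, 4))   # 0.8125 = 0.1101 in binary — recovered exactly
```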
Dobšíček et al. (2007) and Svore et al. (2013) are standard references for the NISQ version. On current IBM Quantum hardware, iterative QPE is typically the method of choice for demonstrations beyond trivial precisions.
Applications of high-precision QPE
- Quantum chemistry. Ground-state energy of benchmark molecules (H₂, LiH, BeH₂) at chemical accuracy (\sim 1.6 \times 10^{-3} Hartree) needs phase precision \sim 10^{-4}, i.e. t = 14–16 qubits of QPE. Full fault-tolerant implementations for industrial chemistry (FeMoco, the nitrogenase catalyst) estimate t = 30–40, requiring 10^9–10^{12} gate operations. See Reiher et al. (2017) for the canonical resource estimate.
- Shor's factoring. For a 2048-bit RSA modulus N, the order r of a random base a mod N can be up to \sim 2^{2048}. QPE on the modular-multiplication unitary needs precision \sim 2^{-4096} to pin down the phase reliably — so t = 4096 ancilla qubits. Naively the controlled-U^{2^k} ladder would cost 2^{4096} applications of U, which would be hopeless; the saving grace is that each power U^{2^k} is itself a modular multiplication (by the classically precomputed a^{2^k} \bmod N), implementable by a polynomial-size circuit of O(n^3) gates. The total gate count is therefore polynomial in n. Careful constant optimisation, combined with Coppersmith's approximate QFT (next chapter), brings this down to \sim 10^{12} gates for 2048-bit factoring — infeasible on NISQ, feasible with a fault-tolerant machine of \sim 10^7 physical qubits.
- Very-long-period order finding. Phase estimation with t = 80 can distinguish 10^{24} distinct phases — enough for cryptographic-scale period-finding (elliptic-curve discrete log on a 256-bit curve). For a generic U with no shortcut for implementing U^{2^k}, the ladder really does cost \sim 2^{80} U-applications; group-structured problems like discrete log escape this via fast exponentiation, but the resulting circuits are still enormous — again why fault-tolerant hardware is a prerequisite.
Indian context — resource analyses at IIT and IIIT
Resource-accounting analyses for Shor's algorithm and chemistry-scale QPE are an active research area in Indian quantum groups. Work at IIT Madras and IIIT Hyderabad on fault-tolerant compilation of QPE circuits, and at TIFR on approximate QFT implementations, contributes to the global effort to push the \varepsilon-\delta trade-off on real hardware. Industrial partners like C-DAC (Centre for Development of Advanced Computing) are building benchmarks for Indian-made quantum processors that include QPE as a standard reference algorithm.
Where this leads next
- Quantum phase estimation — the circuit itself, which this chapter's accuracy analysis quantifies.
- Applications of phase estimation — Shor, chemistry, HHL, amplitude amplification as QPE use cases.
- Shor's algorithm — the canonical QPE application with careful precision analysis.
- Amplitude amplification — Grover-style boosting, the third accuracy-improvement strategy.
References
- Alexei Kitaev, Quantum measurements and the Abelian stabilizer problem (1995) — arXiv:quant-ph/9511026. Original phase estimation, bit-by-bit construction.
- Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §5.2.1 — the canonical accuracy analysis — Cambridge University Press.
- Wikipedia, Quantum phase estimation algorithm.
- Vittorio Giovannetti, Seth Lloyd, Lorenzo Maccone, Quantum Metrology (2006) — arXiv:quant-ph/0412078. Heisenberg-limited estimation.
- Miroslav Dobšíček et al., Arbitrary accuracy iterative quantum phase estimation algorithm using a single ancillary qubit (2007) — arXiv:quant-ph/0610214. NISQ-friendly iterative QPE.
- Qiskit Textbook, Quantum Phase Estimation — Accuracy Analysis — hands-on implementation and accuracy discussion.