In short
Factoring a 2048-bit RSA modulus on a fault-tolerant quantum computer takes, by the current best public estimate (Gidney-Ekerå 2021, arXiv:1905.09749), roughly 20 million physical qubits running for 8 hours. That number is built from four layers: \sim 8000 logical qubits for Shor's circuit, \sim 10^{10} T gates after windowed-arithmetic optimisation, a surface-code overhead of a few thousand physical qubits per logical qubit at code distance d \approx 25-27, and magic-state distillation factories that consume \sim 20 physical qubit-rounds per T gate. Today's leading hardware — IBM Condor at 1121 qubits, Google Willow at 105 qubits demonstrating below-threshold error correction, IonQ Tempo at 64 qubits — ships \sim 10^3 physical qubits without full error correction. The gap is four orders of magnitude in qubit count, on top of the further error-rate improvements that fault tolerance demands. Mosca's rule says: if your secret needs to stay secret for x years, migration takes y years, and Q-day arrives in z years, you need x + y < z. With z \approx 15-25 years plausible and y \approx 5-10 for migration, any secret with x > 5 is already at risk today. India's NQM-aligned crypto-agility mandate (CERT-In 2024) targets 2030 for substantial migration — ahead of the quantum threat and in time for harvest-now-decrypt-later secrets.
Here is the most useful single number in all of quantum cryptanalysis. Twenty million physical qubits, running for eight hours, factors a 2048-bit RSA modulus. That is Gidney and Ekerå's 2021 estimate, and it is the one a Chief Information Security Officer at the State Bank of India, a policy analyst at MeitY, or a Member of Parliament asking about the Digital India crypto roadmap can actually hold on to. It replaces the academic shrug of "Shor's algorithm runs in polynomial time" with a concrete quantity you can compare against the best quantum hardware anyone has ever built — which, as of this chapter, is about a thousand physical qubits without error correction.
The gap between those two numbers — 2 \times 10^7 and 10^3 — is four orders of magnitude. That gap is where the entire engineering story of quantum cryptanalysis lives. Closing it is a research programme that spans superconducting fabrication, cryogenics, control electronics, surface-code error correction, and magic-state distillation. Every one of those has a scaling curve, and the product of those curves sets the date that RSA-2048 breaks operationally. The current consensus among specialists is that date lands somewhere between 2035 and 2050; a minority argue it is never.
This chapter builds the 20-million-qubit number from scratch — every layer, every factor, every conservative assumption. You will end up with a clear picture of what separates today's hardware from cryptanalytically-useful Shor, how much of that separation is a counting exercise versus a physics breakthrough, and what policy actions follow from the estimate even while the hardware does not yet exist. The India-specific migration calendar sits at the end, grounded in the same numbers.
The resource pyramid
The path from "polynomial-time algorithm" to "millions of physical qubits" is a tower of overheads, each multiplying the previous. In outline:
- Layer 1 — \sim 8000 logical qubits for Shor's circuit at n = 2048.
- Layer 2 — \sim 10^{10} T gates after windowed-arithmetic optimisation.
- Layer 3 — thousands of physical qubits per logical qubit from surface-code encoding at d \approx 25.
- Layer 4 — \sim 10^6 additional physical qubits for magic-state distillation factories.
- Layer 5 — a runtime of \sim 8 hours, set by magic-state throughput.
The rest of this chapter walks down the pyramid, one layer at a time.
Layer 1: the logical-qubit count
Shor's algorithm for an n-bit modulus N needs three logical registers (see ch. 77):
- Ancilla register — t = 2n + 1 qubits, carrying the phase estimate.
- Data register — n qubits, holding the modular-multiplication target initialised to |1\rangle.
- Scratch register — additional qubits for the reversible implementation of modular exponentiation.
For n = 2048, the ancilla contributes 4097 qubits and the data contributes 2048. The scratch is where most of the design effort has gone. A naive reversible multiplication circuit needs \Omega(n^2) scratch qubits to undo intermediate computations; Beauregard's 2003 construction (arXiv:quant-ph/0205095) brings it down to 2n + 3; later refinements (Gidney 2019-2021) shave further with windowed arithmetic — the idea of performing w bits of multiplication in one shot using a classically-precomputed lookup table, at the cost of extra ancillas to index the table.
Gidney-Ekerå's final count, using windowed arithmetic with w \approx 5, is 7853 peak logical qubits (Table 2 of their paper) — round to \sim 8000.
Why roughly 4n rather than the naive 5n: ancilla at 2n, data at n, and scratch at 2n + 3 plus windowing tables would sum to \sim 5n if every allocation were held at once, but the scratch is released and reused between multiplications, so the peak simultaneous count prunes to just under 4n (7853 for n = 2048) — with a slowly growing correction that windowing turns into an engineering knob.
Example 1: logical-qubit count for RSA-2048
Setup. RSA-2048 means N is a 2048-bit integer, so n = 2048.
Step 1. Ancilla (phase-estimation register). Size t = 2n + 1 = 4097. Why 2n+1: the phase estimation must resolve fractions with denominator up to N, so 2^{-t} \lesssim 1/N^2, giving t \ge 2\log_2 N. The "+1" is a small over-provisioning to push the success probability above 1/2.
Step 2. Data register. Size n = 2048. Holds |y\rangle, the state U_a acts on.
Step 3. Scratch for modular multiplication. Beauregard's reversible circuit uses 2n + 3 = 4099 scratch qubits during a single controlled multiplication, released between gates. Not a separate permanent allocation — it is reused across the t controlled exponentiations. Why 2n+3 and not just n: reversible arithmetic cannot overwrite inputs. Every intermediate product must live on a fresh wire or be uncomputed with an inverse subroutine. Beauregard's specific design uses QFT-based (Draper) adders and arrives at 2n + 3 total as the tight bound.
Step 4. Windowing lookup tables. With window size w = 5, each window position j uses a precomputed table of 2^w = 32 classical values a^{k \cdot 2^{jw}} \bmod N for k = 0, \ldots, 2^w - 1. The table's entries — 32 values of n bits each, \sim 65000 bits — live in classical control memory and are fed in as controlled-write operations, so they do not blow up the register count. The quantum cost is an O(n) addressing-and-target overhead, a few hundred extra qubits in total.
Step 5. Sum. Ancilla + data + max(scratch, table overhead) = 4097 + 2048 + 4099 = 10244 logical qubits if every allocation were simultaneous; with scratch reuse, about 7800 are live at the algorithm's busiest moment. Gidney-Ekerå report 7853 as the peak. Round to \sim 8000 logical qubits.
Result. RSA-2048 requires \sim 8000 logical qubits — a useful number to carry around. For comparison, the corresponding RSA-1024 circuit needs \sim 4000; RSA-4096 needs \sim 16000.
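The bookkeeping above can be sketched in a few lines. This is a toy tally of the Beauregard-style layout from Steps 1-3, not the Gidney-Ekerå accounting — the published peak (7853) is lower than the naive sum because the scratch register is reused:

```python
def shor_registers(n):
    """Naive register tally for Shor on an n-bit modulus (no reuse)."""
    ancilla = 2 * n + 1      # phase-estimation register, t = 2n + 1
    data = n                 # modular-multiplication target
    scratch = 2 * n + 3      # Beauregard's reversible-arithmetic workspace
    return ancilla, data, scratch, ancilla + data + scratch

for n in (1024, 2048, 4096):
    a, d, s, total = shor_registers(n)
    print(f"n={n}: ancilla={a}, data={d}, scratch={s}, naive sum={total}")
# For n = 2048 the naive sum is 10244; register reuse brings the
# true peak down to ~8000 logical qubits.
```

The naive sum scales as 5n + 4; the reuse that prunes it to \sim 4n is a scheduling fact, not visible in this simple tally.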
What this shows. Eight thousand logical qubits is a lot — but it is not an unimaginable number. It is the kind of count that a near-future fault-tolerant machine might reach. The real wall is not the logical-qubit count; it is the physical-qubit cost per logical qubit, which the next layer multiplies in.
Layer 2: the T-gate count
Every logical gate in Shor's algorithm eventually has to be expressed in the Clifford + T gate set — the standard fault-tolerant universal set. Among these, the T gate (a \pi/4 Z-rotation) is overwhelmingly the expensive one, because it cannot be implemented "transversally" on the surface code and requires an off-line distillation process (Layer 4 below).
The T-gate budget therefore dominates the fault-tolerant cost. Counting Ts is how you measure the real difficulty of a quantum algorithm.
Where the T gates come from
Modular exponentiation is the source of almost all T gates in Shor:
- Controlled modular multiplication — O(n^2) Toffoli gates per multiplication (Toffolis implement reversible integer arithmetic).
- Controlled modular exponentiation — a ladder of t = 2n+1 controlled multiplications, giving O(n^3) Toffolis total.
- Toffoli decomposition — each Toffoli gate decomposes into 7 T gates plus Clifford gates in the standard circuit (Amy-Maslov-Mosca 2013).
So: \text{T gates} \sim 7 \cdot O(n^3). For n = 2048 without optimisation: \sim 6 \times 10^{10} T gates.
Windowed arithmetic (Gidney 2019-2021) pushes this down substantially. With window size w, you can perform w bits of multiplication in one lookup plus a controlled addition, saving roughly a factor of w in T-gate count at the cost of some ancillas and a classical precomputation. Gidney-Ekerå optimise w end-to-end and report \sim 2.7 \times 10^9 Toffoli gates for a full factoring run (their Table 3) — roughly 2 \times 10^{10} T gates under the standard 7-T decomposition — achieved through layered windowing in both the multiplication and exponentiation ladders.
Round to \sim 10^{10} T gates as the rule-of-thumb figure, because later fault-tolerance overheads add back factors the windowing saved.
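The scaling argument above can be sanity-checked with a toy T-count model. The constants here are illustrative (7 T per Toffoli, \sim n^3 Toffolis, a flat factor-of-w windowing saving), not Gidney-Ekerå's exact accounting:

```python
T_PER_TOFFOLI = 7                # standard 7-T Toffoli decomposition

def t_count(n, w=1):
    """Toy model: ~n^3 Toffolis for the exponentiation ladder, /w with windowing."""
    toffolis = n ** 3 // w
    return T_PER_TOFFOLI * toffolis

naive = t_count(2048)            # ~6e10, matching the unoptimised figure above
windowed = t_count(2048, w=5)    # ~1.2e10, same order as the ~1e10 rule of thumb
print(f"naive {naive:.1e}, windowed {windowed:.1e}")
```

Both numbers land within a small factor of the chapter's quoted figures, which is all a model this crude can promise.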
Why the T count is what matters for fault tolerance: Clifford gates (Hadamard, CNOT, S, X, Z) have zero-cost transversal implementations on the surface code — you apply the gate independently to every physical qubit in the logical patch. T gates do not have transversal implementations (Eastin-Knill theorem); they need magic states, which must be distilled off-line. Distilling one magic state to the error rate Shor demands costs \sim 20 physical qubit-rounds, which is what pushes the physical-qubit pyramid into the millions.
Historical progression
| Year | Source | Logical qubits | T gates (n = 2048) | Physical qubits |
|---|---|---|---|---|
| 1994 | Shor | polynomial | polynomial | not estimated |
| 2002 | Beauregard | \sim 2n | \sim n^3 \approx 10^{10} | not estimated |
| 2005 | Van Meter, Itoh | \sim 5n | \sim 10^{10} | \sim 10^{10} |
| 2012 | Fowler, Martinis, Cleland | — | — | \sim 10^{9} |
| 2017 | Gheorghiu-Mosca | \sim 5n | \sim 10^{10} | 2 \times 10^{9} |
| 2019 | Gidney (windowed arithmetic) | \sim 3n | \sim 10^{9} | 5 \times 10^7 |
| 2021 | Gidney-Ekerå | \sim 8000 | \sim 3 \times 10^9 | \sim 2 \times 10^7 |
Each refinement has shaved roughly an order of magnitude from the physical-qubit count every seven years or so. If that trend continues (no guarantee), RSA-2048 could sit at \sim 10^6 physical qubits by 2028 and \sim 10^5 by 2035. The trend is partly algorithmic (windowed arithmetic, better distillation) and partly physics (surface codes at reduced distances, alternative codes such as Hastings-Haah Floquet codes). But the floor — information-theoretically, how few qubits could you possibly need? — is not yet established. There is no known lower bound excluding further improvements.
Layer 3: surface-code encoding
A physical qubit on today's best hardware has a gate error rate around 10^{-3} per two-qubit gate. Shor on RSA-2048 needs \sim 10^{10} operations executed without a single uncorrected error. The expected number of errors in a raw, uncorrected run would be \sim 10^{10} \times 10^{-3} = 10^7 — guaranteed failure many times over.
The fix is quantum error correction. Encode each logical qubit in many physical qubits; detect errors via syndrome measurements; correct them before they accumulate. The leading code for superconducting and ion-trap hardware is the surface code, which encodes one logical qubit in a d \times d grid of physical qubits, where d is the code distance.
The code-distance formula
Below the code's threshold p_\text{th} \approx 1\%, the logical error rate falls exponentially in d: p_L \sim (p/p_\text{th})^{(d+1)/2}. For physical error rate p = 10^{-3} and threshold p_\text{th} = 10^{-2}, the ratio is 0.1. We need logical error rate p_L < 1/G where G = 10^{10} gates, so (0.1)^{(d+1)/2} < 10^{-10}, i.e. d \ge 19.
Gidney-Ekerå pick d = 27 to leave safety margin for syndrome-measurement errors and to handle the "worst case" gates that consume an extra code cycle of distance. The physical-qubit overhead per logical qubit is then \sim 2d^2 \approx 1500 physical qubits per patch.
With pipelining and lattice-surgery overhead they report an effective \sim 4000 physical qubits per logical qubit.
Why 2d^2 and not d^2: a surface-code patch needs both data qubits (the physical qubits encoding the logical state) and measure qubits (ancillas that read out syndromes). A standard planar layout interleaves them in a d \times d grid of data qubits plus a slightly smaller grid of (d-1) \times (d-1) measure qubits — \sim 2d^2 total. Routing patches between logical qubits adds further overhead.
Example 2: surface-code overhead at d = 25
Setup. Physical error rate p = 10^{-3}, logical target p_L < 10^{-10}, code distance d = 25.
Step 1. Data qubits in the patch. A distance-d surface code has d^2 = 625 data qubits arranged in a square. These carry the encoded logical information redundantly; a single physical error flips one data qubit but does not flip the logical qubit.
Step 2. Measure qubits for syndrome extraction. Between each pair of data qubits sits an ancilla used to measure the X- or Z-parity of the four surrounding data qubits. Layout: (d-1)(d-1) = 576 additional ancilla qubits. Why these extra qubits do the real work: data qubits alone cannot detect errors; you need to measure stabilisers (operators like X_1 X_2 X_3 X_4) that commute with the logical operators. Syndrome measurement requires ancillas to entangle with the data and be measured.
Step 3. Total per logical patch. Data + measure = 625 + 576 = 1201 physical qubits per logical qubit. Why the rounded number in the literature is 2d^2: d^2 + (d-1)^2 = 2d^2 - 2d + 1 \approx 2d^2 for large d. The exact count per patch is 1201 at d = 25; the 2d^2 approximation gives 1250.
Step 4. Logical error rate check. With p/p_\text{th} = 0.1 and d = 25: p_L \sim (p/p_\text{th})^{(d+1)/2} = (0.1)^{13} = 10^{-13}, or \sim 3 \times 10^{-13} allowing an order-unity prefactor.
The algorithm has \sim 10^{10} logical-gate operations, so expected errors per run: 10^{10} \cdot 3 \times 10^{-13} = 3 \times 10^{-3}. A single run of Shor fails due to an uncorrected error with probability \sim 0.3\% — acceptable; the algorithm is self-verifying classically (just check p_1 \cdot p_2 = N and retry otherwise), so a 0.3\% failure just means running once more on average.
Step 5. Add routing overhead. The logical qubits must exchange information via lattice surgery — merging and splitting patches. Routing regions and buffer patches add roughly \sim 2\times overhead, bringing the effective per-logical cost to \sim 2400 physical qubits.
Step 6. Multiply by logical-qubit count. For Shor on RSA-2048 with \sim 8000 logical qubits: 8000 \times 2400 = 1.9 \times 10^7 physical qubits, just for the surface-code encoding layer. Magic-state distillation (Layer 4) adds further.
Result. At code distance d = 25 with p = 10^{-3}, each logical qubit costs \sim 2400 physical qubits including routing. Shor on RSA-2048 costs \sim 2 \times 10^7 physical qubits at this layer alone.
What this shows. The surface code is the largest multiplicative factor in the pyramid. A 2400\times overhead is the price of running 10^{10} operations on 10^{-3}-error-rate hardware. Drive the physical error rate down to 10^{-4} and the required distance drops to d \approx 13, with \sim 340 physical per logical — a \sim 7\times saving. Every factor of 10 improvement in physical error rate buys roughly a factor of 7 in physical-qubit count.
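Example 2's arithmetic can be packed into a short script. Order-one prefactors are dropped, and the loop finds the minimal odd distance before any safety margin (so it lands a little below the d = 25 the text adopts):

```python
def required_distance(p, p_th, p_target):
    """Smallest odd d with (p/p_th)^((d+1)/2) at or below p_target."""
    d = 3
    while (p / p_th) ** ((d + 1) / 2) > p_target:
        d += 2                       # surface-code distances are odd
    return d

def patch_qubits(d):
    """Data plus measure qubits in one surface-code patch (~2d^2)."""
    return d * d + (d - 1) * (d - 1)

d_min = required_distance(1e-3, 1e-2, 1e-10)   # ~19-21 before margin
print(d_min, patch_qubits(25))                 # the text adds margin to d = 25 -> 1201
```

Multiplying `patch_qubits(25)` by a \sim 2\times routing factor and \sim 8000 logical qubits reproduces the \sim 2 \times 10^7 total of Step 6.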
Layer 4: magic-state distillation
Surface codes implement Clifford gates transversally — cheaply, with no distillation overhead. But Shor's algorithm needs T gates (and their adjoint T^\dagger) to achieve universality, and the surface code cannot do T transversally. The workaround: consume a magic state |T\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt 2 per T gate, where the magic state is produced in a dedicated subroutine.
Producing a magic state at sufficient quality is itself a fault-tolerant quantum computation: magic-state distillation. Noisy copies go in, cleaner copies come out. A single round of the standard Bravyi-Kitaev 15-to-1 protocol takes 15 noisy |T\rangle states and produces one cleaner |T\rangle with error suppressed from \epsilon to \approx 35 \epsilon^3. Multiple rounds can be cascaded.
For Shor on RSA-2048:
- Target T-gate count: \sim 10^{10}.
- Target logical T error rate: \sim 10^{-11} per T (so that the aggregate over 10^{10} Ts stays below 1).
- Required distillation rounds: typically 2–3, depending on input noise and protocol.
- Physical qubits per running distillation factory: \sim 10^5 at code distance d \approx 15 for the factory's internal logical qubits.
- T states produced per factory: roughly one every few tens of microseconds, limited by the distillation cycle time.
To feed the main algorithm's T-gate rate of \sim 10^5/s (over an 8-hour run, 10^{10} Ts / 3 \times 10^4 s), you need \sim 10 distillation factories running in parallel. Total physical-qubit cost for distillation: \sim 10^6 physical qubits.
Adding this to the Layer 3 surface-code encoding: 1.9 \times 10^7 + 10^6 \approx 2 \times 10^7 physical qubits.
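The 15-to-1 cascade described above can be iterated numerically. The injected input error of 10^{-3} is an assumption for illustration; real injection errors depend on the hardware:

```python
def distillation_rounds(eps_in, eps_target):
    """Rounds of 15-to-1 distillation (error eps -> ~35*eps^3) to reach eps_target."""
    eps, rounds = eps_in, 0
    while eps > eps_target:
        eps = 35 * eps ** 3
        rounds += 1
    return rounds, eps

rounds, final_eps = distillation_rounds(1e-3, 1e-11)
print(rounds, final_eps)   # 2 rounds suffice: 1e-3 -> 3.5e-8 -> ~1.5e-21
```

Two rounds overshoot the 10^{-11} target by ten orders of magnitude — which is why practical designs tune round counts, protocols, and factory distances jointly rather than simply cascading.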
Why distillation is its own layer: the main algorithm and the factories live on different parts of the same chip and communicate through lattice surgery. Optimising their scheduling — how many factories, what distillation protocol, how to route the distilled states to where they are needed — is a full optimisation problem in itself. Gidney-Ekerå's work on the 20-million-qubit estimate is largely an accounting of this scheduling, not of the abstract algorithm.
Layer 5: runtime and today's hardware
The wall-clock runtime of Shor on RSA-2048 follows from the T-gate count and the T-gate execution rate. Each T gate consumes one magic state, produced at \sim 10^5/s across all factories running in parallel. Total runtime, naively serial: 10^{10} \text{ T} / (10^5 \text{ T/s}) = 10^5 \text{ s} \approx 28 hours.
With pipelining, buffering, and allowing parallelism within the algorithm, Gidney-Ekerå drive this down to \sim 8 hours.
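The runtime arithmetic, spelled out. The factory count and per-factory rate are the round numbers from Layer 4, not the paper's exact schedule:

```python
T_GATES = 1e10
FACTORIES = 10
RATE_PER_FACTORY = 1e4      # magic states/s per factory (assumed round number)

serial_runtime_s = T_GATES / (FACTORIES * RATE_PER_FACTORY)
print(f"{serial_runtime_s:.0e} s = {serial_runtime_s / 3600:.0f} h")  # 1e5 s ~ 28 h
```

Pipelining cannot beat the magic-state supply rate, so the \sim 3.5\times gap between 28 hours and the reported 8 hours is closed by running the factories harder, not by parallelising the algorithm alone.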
Now the gap to today's hardware:
| Platform | Year | Physical qubits | 2-qubit gate error | Error correction? |
|---|---|---|---|---|
| Google Sycamore | 2019 | 54 | 5 \times 10^{-3} | Demonstration only |
| IBM Osprey | 2022 | 433 | 2 \times 10^{-3} | None in production |
| IonQ Aria | 2022 | 32 | 5 \times 10^{-4} | None |
| IBM Condor | 2024 | 1121 | \sim 2 \times 10^{-3} | None |
| Google Willow | 2024 | 105 | 3 \times 10^{-3} | Below-threshold demonstration at d = 3, 5, 7 |
| IonQ Tempo | 2025 | 64 | \sim 3 \times 10^{-4} | None in production |
| Quantinuum H2 | 2024 | 56 | \sim 2 \times 10^{-3} | Logical qubit demonstrations |
| Shor-capable (GE 2021) | ~2040 | \sim 2 \times 10^7 | < 10^{-3} | Full surface code, d = 25 |
Current hardware sits at \sim 10^3 physical qubits. The target is \sim 2 \times 10^7. Gap: 4 orders of magnitude.
When does RSA-2048 break?
The gap has been shrinking. In 2005 the physical-qubit estimate was \sim 10^{10}; in 2012, \sim 10^9; in 2019, \sim 5 \times 10^7; in 2021, \sim 2 \times 10^7. If these improvements continue at the same rate — and the physical-qubit counts on the hardware side continue doubling every 2-3 years — the lines cross somewhere in the mid-2030s.
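A toy extrapolation of those two trend lines — hardware counts doubling every \sim 2.5 years from \sim 10^3, requirements falling \sim 10\times per seven years from 2 \times 10^7 — puts the crossing in the early 2040s. Both rates are assumptions, not forecasts, and modest changes to either move the answer by a decade:

```python
import math

HW0, REQ0 = 1e3, 2e7        # physical qubits: today's hardware vs today's requirement
HW_DOUBLING_YRS = 2.5       # assumed hardware scaling rate
REQ_DECADE_YRS = 7.0        # assumed rate of algorithmic/QEC improvement

# Solve HW0 * 2^(t/2.5) = REQ0 * 10^(-t/7) for t (years from now).
t = math.log10(REQ0 / HW0) / (math.log10(2) / HW_DOUBLING_YRS + 1 / REQ_DECADE_YRS)
print(f"crossing in ~{t:.0f} years")   # ~16 years under these assumptions
```

The sensitivity to the assumed rates is the honest content of this exercise: it brackets the crossing between the mid-2030s and the 2040s, nothing tighter.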
Michele Mosca's rule, published in 2015 and updated in 2018, is the standard framing (arXiv:1512.03780):
If your data needs to stay secret for x years, and migration to quantum-safe cryptography takes y years, and Q-day arrives in z years, you need x + y < z.
Mosca's own estimates:
- z = 15 years (2040) with \sim 25\% probability
- z = 25 years (2050) with \sim 50\% probability
- z = 50 years (2075) with \sim 75\% probability
For realistic y = 5-10 years of migration time across large infrastructure, any secret with x > 5 years is already at risk under the z = 15 scenario. That is the policy logic behind harvest-now-decrypt-later being a present concern even though the quantum computer does not yet exist.
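Mosca's inequality is simple enough to encode directly; the sample values below are the chapter's illustrative ones, not Mosca's:

```python
def mosca_safe(x_secret_yrs, y_migration_yrs, z_qday_yrs):
    """True if the secret expires before a quantum adversary can read it (x + y < z)."""
    return x_secret_yrs + y_migration_yrs < z_qday_yrs

print(mosca_safe(10, 10, 15))   # False: long-lived secret, slow migration -> at risk
print(mosca_safe(2, 5, 15))     # True: short-lived secret -> fine
```

The asymmetry is the policy point: y is the only term a defender controls, so shrinking migration time is the lever.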
Hype check. "RSA is dead" is overstated; "RSA is safe for now" is also overstated. The accurate statement is: RSA-2048 is secure against every adversary possessing only classical computers, and will remain so for as long as GNFS is the best classical factoring algorithm — a state of affairs that has held for 30 years without challenge. RSA-2048 is not secure against a fault-tolerant quantum adversary, and such an adversary will likely exist sometime between 2035 and 2050. The gap between "classical safety" and "quantum threat" is large enough that governments, banks, and long-lived infrastructure providers must migrate, and small enough that the migration should be underway now.
What is definitely not true: today's quantum computers can break RSA. The largest integer honestly factored by Shor's algorithm on real quantum hardware is N = 21 (Martín-López et al. 2012, with substantial compile-time assistance); RSA-2048 factoring is \sim 10^4 \times further in qubit count and \sim 10^7 \times further in gate count.
What is also not true: that symmetric crypto collapses. AES-256 remains quantum-secure at the 2^{128} post-Grover level, which is still astronomically hard. Post-quantum cryptography protects specifically the public-key algorithms — RSA, ECC, Diffie-Hellman — which Shor's algorithm breaks. Everything built on symmetric primitives (disk encryption, VPNs after key-exchange, PGP message bodies) keeps most of its security.
India's crypto-agility response
India's digital infrastructure is among the world's most cryptographically dense:
- Aadhaar — 1.4 billion enrolled identities, each with biometric templates signed by RSA-2048.
- UPI — 11+ billion transactions per month (2024), each carrying ECDSA or RSA signatures.
- DigiLocker — documents signed by the Controller of Certifying Authorities' PKI root, RSA-4096.
- e-Way bill system — GSTN-issued RSA certificates for every interstate freight movement.
- Banking — the RBI's 2021 Payments Vision 2025 document mandates cryptographic migration audits, and the 2024 Crypto-Agility Advisory extends this specifically to post-quantum.
The National Quantum Mission (launched April 2023, ₹6003 crore over 8 years) has four thematic hubs, one of which — at C-DOT and in partnership with IISc Bangalore — is responsible for post-quantum migration. CERT-In published the Crypto-Agility Framework in January 2024, recommending:
- All new deployments use hybrid schemes (classical + post-quantum) by end-2025.
- Critical national infrastructure (Aadhaar, UPI, DigiLocker) complete migration audits by end-2027.
- Substantial migration to post-quantum primitives by 2030 — ahead of the earliest plausible z in Mosca's rule.
| System | Current cryptography | Post-quantum migration target | Deadline |
|---|---|---|---|
| TLS (web / HTTPS) | ECDH + ECDSA | Kyber + Dilithium (hybrid) | 2027 |
| Aadhaar eSign | RSA-2048 + ECC-256 | Dilithium hybrid | 2028 |
| UPI signatures | ECDSA | Dilithium hybrid | 2029 |
| DigiLocker | RSA-4096 + ECC-384 | Dilithium + Falcon hybrid | 2029 |
| CCA PKI root | RSA-4096 | SPHINCS+ (offline) + Dilithium | 2030 |
| Long-lived secrets (MoD, intelligence) | Classified | Already migrated (per MoD 2024 statement) | 2024 |
Industrial enablement: TCS, Infosys, Wipro, and HCL run PQC practices offering migration tooling to enterprise clients. Open Quantum Safe (liboqs) integration is certified by NIC for government-contracted software. IIT Madras, IIT Bombay, IIT Kanpur, and IISc Bangalore offer graduate programmes in post-quantum cryptography, producing \sim 200 specialised cryptographers per year (projected to 500/year by 2028 under NQM).
The migration calendar is ahead of the threat calendar. For systems that follow the CERT-In roadmap, RSA-2048 breaking in 2040 will be a historical footnote rather than a crisis. For systems that do not, it will be a crisis. That is the honest policy situation.
Common confusions
- "20 million qubits is forever unattainable." Almost certainly false. The gap has shrunk 3 orders of magnitude in 15 years. IBM's roadmap targets 100,000 physical qubits by 2033 (Blue Jay, with Kookaburra as a waypoint) without error correction, and multi-chip interconnects targeting \sim 10^6 qubits. Sustained engineering rather than new physics is what is required.
- "We just need more qubits." Wrong. More qubits without lower error rates buys nothing — the logical error suppression (p/p_\text{th})^{(d+1)/2} vanishes as p approaches threshold, so no code distance helps. A 10^5-qubit machine at p = 10^{-2} (at threshold) has zero useful logical qubits; the same machine at p = 10^{-4} has several hundred. Both axes — count and quality — must improve.
- "Logical qubit equals physical qubit." No. At d = 25 and p = 10^{-3}, a logical qubit costs \sim 2400 physical qubits. Media headlines conflating the two ("1000-qubit quantum computer") refer to physical qubits; cryptographic thresholds refer to logical qubits. The ratio is roughly 10^3-10^4 at realistic error rates.
- "Breaking RSA is imminent." No. Today's machines are NISQ-era devices. Shor requires fault-tolerant machines with \sim 10^7 physical qubits, which no hardware roadmap projects before 2035. "Imminent" is off by at least a decade.
- "Quantum will break AES-256 too." No. Grover's algorithm gives a quadratic speedup on unstructured search, effectively halving AES-256's key strength to 2^{128} post-Grover — still astronomically hard. AES-256 is considered quantum-safe at the 128-bit security level. The specifically threatened primitives are public-key algorithms (RSA, ECC, DH) that rely on algebraic structure Shor exploits.
- "The 8-hour runtime is fast." Relatively. For comparison: a single RSA-2048 signature verification on a laptop takes \sim 1 millisecond. The quantum attack takes \sim 8 hours, or 3 \times 10^7 times longer than one honest verification. But a classical GNFS attack would take \sim 10^{14} years on a laptop. "GNFS on a laptop" \to "8-hour Shor" is a \sim 10^{17} speedup, which is the reason Shor matters at all.
Going deeper
The estimate above is the headline number. The "going deeper" material below covers Gidney-Ekerå's windowed arithmetic in more detail, alternative error-correction codes that might reduce the overhead, recent post-2021 refinements, and the hardware-specific roadmaps of IBM, Google, IonQ, Quantinuum, and PsiQuantum.
Gidney-Ekerå's windowed arithmetic
The key algorithmic innovation enabling the 2021 estimate is Gidney's windowed arithmetic (2019, arXiv:1905.07682). The idea: instead of performing modular multiplication bit-by-bit, process w bits at a time using a classically-precomputed lookup table.
Classical multiplication of a by b takes n steps (one per bit of b). Quantum multiplication inside Shor's controlled-U_a^{2^k} takes n reversible additions per multiplication. Windowed multiplication replaces this with \lceil n/w\rceil operations, each of which looks up a value a \cdot k \bmod N for k \in \{0, 1, \ldots, 2^w - 1\} in a table and adds it.
The lookup itself costs \sim 2^w Toffolis inside the quantum circuit, and each windowed step also performs an n-bit controlled addition (\sim n Toffolis), so one multiplication costs \sim (n/w)(n + 2^w) Toffolis — minimised near w \approx \log_2 n, giving \sim n^2 / \log n per multiplication rather than n^2. Over the full exponentiation ladder this saves a factor of \log_2 n \approx 11 at n = 2048, contributing the order-of-magnitude improvement from the 2012 Fowler et al estimate to the 2021 Gidney-Ekerå estimate.
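The window-size trade-off can be scanned directly. The cost model below is the simple one just described — each windowed step pays for an n-bit addition plus a 2^w-entry lookup — which is a caricature of the real circuit but shows where the optimum sits:

```python
def toffolis_per_mul(n, w):
    """Windowed multiplication: ceil(n/w) steps, each ~n + 2^w Toffolis."""
    steps = -(-n // w)               # ceiling division
    return steps * (n + 2 ** w)

n = 2048
best_w = min(range(1, 20), key=lambda w: toffolis_per_mul(n, w))
saving = toffolis_per_mul(n, 1) / toffolis_per_mul(n, best_w)
print(best_w, f"{saving:.1f}x")      # optimum near log2(n) = 11; ~7x per multiplication
```

The minimum is broad — anywhere from w = 8 to w = 11 costs nearly the same — which is why implementations are free to tune w against ancilla budgets.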
Alternative codes
The surface code is the leading choice but not the only one. Two families are under active research:
- Colour codes. Comparable overhead to the surface code, with the full Clifford group transversal in 2D and transversal T in 3D variants — potentially saving some distillation overhead. Drawback: syndrome extraction is more complex and practical thresholds are lower.
- Bivariate bicycle codes (IBM 2024). A class of quantum low-density parity-check (qLDPC) codes with roughly an order of magnitude fewer physical qubits per logical qubit than the surface code at comparable distance (the [[144, 12, 12]] "gross" code encodes 12 logical qubits in 288 physical qubits), at the price of non-local connectivity. IBM's fault-tolerant roadmap includes these for deployment in the 2030s.
If bivariate bicycle codes mature, the \sim 2400 \times surface-code overhead could drop by 5-10\times, pushing RSA-2048 toward \sim 10^6 physical qubits. This is speculative but consistent with the rate of historical improvements.
Recent post-2021 refinements
Since Gidney-Ekerå's 2021 paper, further refinements have been published:
- Litinski 2023 (arXiv:2306.08585) — a "small ancilla" variant that reduces the distillation overhead by \sim 2\times.
- Chevignard-Fouque-Schrottenloher 2024 — an improved classical post-processing that reduces the required number of runs, saving \sim 30\% runtime.
- Hayes-Crabbé-Kliuchnikov-Roetteler 2024 — better compilation targeting specific hardware topologies.
None of these breaks the 20-million-qubit ceiling decisively; all of them could be combined for a 2-3\times reduction to \sim 10^7 physical qubits. That is still a factor of 10^4 above today's hardware.
Hardware roadmaps
Major hardware providers have published roadmaps (current as of early 2026):
- IBM — Heron (2024, 133 qubits, tunable couplers), Flamingo (2025, multi-chip), Kookaburra (2027, \sim 4000 qubits, modular), Blue Jay (2033, \sim 100000 qubits, multi-rack). Explicit target: fault-tolerant utility-scale quantum computing by 2033.
- Google — Willow (2024, 105 qubits with below-threshold error correction at d = 3, 5, 7), sustained milestone path to "scaled" machines by 2030.
- IonQ — Tempo (2025, 64 qubits), continued scaling of trapped-ion systems, targeting algorithmic qubit counts of \sim 1000 by 2028.
- Quantinuum — H-series trapped ions, 56-qubit H2 machine with logical qubit demonstrations at d = 3, 5, targeting fully fault-tolerant operation by 2030.
- PsiQuantum — photonic systems, distinct physics from superconducting/ion-trap, targeting fault-tolerant operation at \sim 10^6 qubit scale by 2029-2030 with their Omega manufacturing process.
None of these roadmaps promises Shor-capable hardware before 2035. Several target "utility-scale" fault tolerance (not necessarily Shor-capable) in the late 2020s or early 2030s. The cross-over from utility-scale to cryptanalysis-scale is the open question.
Why lower error rates matter more than more qubits
The logical error rate scales as (p/p_\text{th})^{(d+1)/2}. For p = 10^{-3} (today's leading edge), d = 25 gives p_L \sim 10^{-13}, costing \sim 2400 physical per logical including routing. If p drops to 10^{-4} (a ten-fold improvement), the same p_L is achieved at d \approx 13, costing \sim 340 physical per logical before routing — roughly a 7\times reduction.
In short: one order of magnitude in error rate = almost one order of magnitude in qubit savings. Hardware improvements on the error-rate axis are more valuable than equivalent improvements on the qubit-count axis.
Google Willow's 2024 below-threshold demonstration is important specifically because it validates that suppression works in practice — moving from "surface codes work in principle" to "surface codes work on real hardware at realistic physical error rates." It is the single most significant published result between 2021 and 2026 on the Shor-readiness path.
Where this leads next
- Shor's circuit end to end — the algorithm these resources are for.
- Post-quantum cryptography — the migration target: Kyber, Dilithium, Falcon, SPHINCS+.
- Discrete logarithm quantumly — Shor's companion algorithm that breaks Diffie-Hellman and ECC on similar resources.
- Modular exponentiation circuit — the dominant cost centre in the Shor resource estimate.
- Factoring and RSA — why RSA security reduces to integer factoring in the first place.
References
- Craig Gidney and Martin Ekerå, How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits (2021) — arXiv:1905.09749. The current best concrete estimate.
- Austin Fowler, John Martinis, Andrew Cleland et al., Surface codes: Towards practical large-scale quantum computation (2012) — arXiv:1208.0928. The foundational resource paper on surface-code factoring estimates.
- Stephane Beauregard, Circuit for Shor's algorithm using 2n+3 qubits (2003) — arXiv:quant-ph/0205095. The space-optimal modular-exponentiation circuit.
- Peter Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer (1994/1995) — arXiv:quant-ph/9508027. The founding paper.
- Michele Mosca, Cybersecurity in an era with quantum computers: will we be ready? (2015) — arXiv:1512.03780. The x + y < z policy framing.
- John Preskill, Lecture Notes on Quantum Computation, Chapter 7 — theory.caltech.edu/~preskill/ph229. Full derivation of Shor plus fault-tolerance treatment.