In short

Shor's algorithm for factoring a composite N in polynomial time on a quantum computer:

  1. Pick a random a \in \{2, 3, \ldots, N-1\}. Compute \gcd(a, N). If it is > 1, you already have a factor — return it.
  2. Quantum step (order finding, ch. 76): Use phase estimation on U_a|y\rangle = |ay \bmod N\rangle with target |1\rangle. Measure to get \widetilde\varphi \approx s/r. Apply continued fractions to recover the order r of a.
  3. Classical reduction (ch. 75): If r is odd or a^{r/2} \equiv -1 \pmod N, retry with a new a. Otherwise compute \gcd(a^{r/2} - 1, N) and \gcd(a^{r/2} + 1, N). One of these is a non-trivial factor of N.
  4. Recurse on the factors to reach prime factorisation.

The quantum circuit uses t = 2n + 1 ancilla qubits, n = \lceil\log_2 N\rceil data qubits, O(n^3) gates dominated by modular exponentiation, and one inverse QFT. For 2048-bit RSA, fault-tolerant resource estimates (Gidney-Ekerå 2021) require roughly 2 \times 10^7 physical qubits and \sim 8 hours of runtime under surface-code error correction. Today's machines ship \sim 10^3 physical qubits. Useful Shor is years away — but the threat is real enough that India's National Quantum Mission funds a post-quantum-cryptography migration working group, and "harvest now, decrypt later" is the actionable concern for any long-lived secret transmitted today.

Here is the algorithm that makes RSA, elliptic-curve DSA, and most of today's internet cryptography temporary: Shor's algorithm. You have already built the pieces across chapters 70 through 76. In this chapter the pieces are wired together end-to-end, tested on the toy case N = 15, and scaled up to the real question — what would it take to run this on a 2048-bit RSA modulus, and how far is the current hardware from that goal?

Two things you will get by the end. First, a complete walk-through of Shor's algorithm as one pipeline, with every wire labelled and every step cross-referenced back to the chapter that derived it. This is the chapter to remember Shor's algorithm by — the one that answers "what does the full thing look like." Second, honest resource estimates: the number of qubits, the depth of the circuit, the runtime, and the overhead of error correction, all in the form a policy maker at CERT-In or MeitY can actually use. The hype is enormous. The engineering gap is also enormous. Both are real, and this chapter holds both in view.

The algorithm in one picture

Here is the entire Shor pipeline as one flow. Classical steps in ink, quantum steps in accent. The classical wrapper is most of the algorithm — the quantum computer is invoked only for the order-finding subroutine in the middle.

The full Shor's algorithm pipelineA vertical flow diagram with five boxes. Top: input N. Next: pick random a, compute gcd of a and N; if gcd exceeds one, output factor. Next a decision box: is gcd equal to one. If yes, proceed. The middle box, highlighted in accent, contains the quantum order finding subroutine with sub-steps Hadamard ancillas, controlled U_a powers, inverse QFT, measure, continued fractions to recover r. Next box: check r is even and a to the r over two is not congruent to minus one mod N; if not, retry. Final box: compute gcd of a to the r over two plus one with N and gcd of a to the r over two minus one with N and output factors.Shor's algorithm — end-to-end pipelineInput: composite NPick random a ∈ {2, …, N−1}if gcd(a, N) ≠ 1, return that gcd as a factorQuantum order finding (ch. 76)1. H⊗t on t=2n+1 ancillas; target register holds |1⟩2. Controlled U_a^(2^k) ladder — modular exponentiation3. Inverse QFT on ancillas4. Measure ancillas → φ̃ = y/2^t ≈ s/r5. Classical: continued fractions → rCheck r even and a^(r/2) ≢ ±1 (mod N)if not, retry with new aReturn gcd(a^(r/2) − 1, N) and gcd(a^(r/2) + 1, N)
Shor's factoring algorithm is a short classical loop wrapped around a single expensive quantum subroutine. The quantum computer computes one specific thing — the order $r$ of a random $a$ modulo $N$ — and everything else (picking $a$, computing gcds, checking conditions, recursing) is ordinary arithmetic. This is the shape of every quantum advantage known: a big classical algorithm with a small, well-placed quantum kernel.

The five steps — one by one

Step 1 — Pick a random a and check \gcd(a, N)

Choose a uniformly at random from \{2, 3, \ldots, N-1\}. Compute \gcd(a, N) classically (Euclid's algorithm runs in O(\log^2 N) time). Two outcomes:

Step 2 — Quantum order finding

Now you invoke the subroutine from ch. 76. The unitary is U_a|y\rangle = |ay \bmod N\rangle, acting on an n-qubit register where n = \lceil \log_2 N\rceil. Its eigenstates |u_s\rangle have eigenvalues e^{2\pi i s/r}, and the state |1\rangle is a uniform superposition of all r of them.

The quantum circuit has three registers and five sub-steps:

  1. t = 2n + 1 ancilla qubits in |0\rangle^{\otimes t}. The ancillas will hold the phase estimate.
  2. n data qubits prepared in |1\rangle. This is the target for the controlled-U_a^{2^k} gates.
  3. Apply H^{\otimes t} to the ancillas — standard phase-estimation opener.
  4. Apply the ladder of controlled-U_a^{2^k} gates from ancilla k to the data register, for k = 0, 1, \ldots, t-1. This is the modular-exponentiation step — see the discussion of cost below.
  5. Apply the inverse QFT to the ancillas. Measure. The outcome y \in \{0, 1, \ldots, 2^t - 1\} yields the phase estimate \widetilde\varphi = y/2^t.
  6. Classical post-processing: apply continued fractions to \widetilde\varphi. The convergent with denominator r' \le N satisfying a^{r'} \equiv 1 \pmod N is the order r (or a divisor of it).
The quantum order-finding circuit with labelled registersA quantum circuit diagram with two register groups. Top: t ancilla qubits all starting in ket zero, each with a Hadamard gate, then connected to controlled gates labelled U_a to the power two to the k for k from zero to t minus one. After the controlled gates they all enter a large box labelled inverse QFT and exit to measurement meter symbols outputting the bit string y. Bottom: n data qubits initialised to ket one, used as the target for the controlled-U_a powers, and left untouched at the end.Quantum order-finding circuit (inside Shor)|0⟩|0⟩|0⟩|0⟩|1⟩t = 2n+1 ancillasn data qubitsHHHHU_aU_a²U_a⁴U_a^(2^(t−1))controlled modular exponentiation (the expensive part)QFT⁻¹y→ φ̃ = y/2^t
The quantum order-finding circuit. The ancilla register holds the phase estimate; the data register holds the catalytic target state $|1\rangle$ throughout. Each controlled-$U_a^{2^k}$ is a reversible implementation of multiplication by $a^{2^k} \bmod N$, the **modular-exponentiation** subcircuit — the largest and most expensive block in the whole algorithm.

Step 3 — Check the conditions on r

From ch. 75, two conditions must hold for order r to yield a factor:

Both conditions fail with bounded probability for any random a coprime to N. The standard theorem (Miller 1976, Ekert-Jozsa 1996): if N has at least two distinct odd prime factors, a random a succeeds with probability \ge 1/2. For RSA moduli N = pq, the success probability is \ge 1 - 2^{-(k-1)} where k is the number of distinct odd prime factors (so \ge 1/2 for k = 2). In practice, you rarely need more than three or four tries.

Step 4 — Extract the factor

With r even and a^{r/2} \not\equiv -1, compute:

x \;=\; a^{r/2} \bmod N,

then

p_1 \;=\; \gcd(x - 1, N), \qquad p_2 \;=\; \gcd(x + 1, N).

At least one of p_1, p_2 is a non-trivial factor of N. The argument (repeated from ch. 75): a^r - 1 \equiv 0 \pmod N, so (x - 1)(x + 1) \equiv 0 \pmod N, so N divides (x-1)(x+1). If N = pq, then p must divide one of x \pm 1 (but not both, because neither x - 1 nor x + 1 is a multiple of N when x \not\equiv \pm 1). So \gcd(x - 1, N) or \gcd(x + 1, N) picks out the p. Euclid's algorithm computes each gcd in O(\log^2 N) time.

Step 5 — Recurse

If p_1 and p_2 multiply to N and both are prime, you are done. Otherwise, apply the whole algorithm recursively to each factor until you reach primes. For N = pq (RSA), no recursion is needed — you get p and q directly.

Worked example: factor N = 15

Walk through the complete algorithm for N = 15. Every step is arithmetic you can verify by hand.

Example 1: $N = 15$ from start to finish

Step 1. Pick a = 7. Compute \gcd(7, 15) using Euclid:

\gcd(15, 7): \;15 = 2\cdot 7 + 1, \;7 = 7\cdot 1 + 0. \;\gcd = 1.

Why Euclid: it replaces the pair (m, n) with (n, m \bmod n) until one is zero. Each step shrinks the larger number by at least half, so it runs in O(\log N) steps. Proceed.

Step 2. Run quantum order finding on U_7|y\rangle = |7y \bmod 15\rangle. From ch. 76, the orbit of 1 is 1 \to 7 \to 4 \to 13 \to 1 (four elements), so r = 4. The phase estimation returns \widetilde\varphi \approx s/4 for random s \in \{0, 1, 2, 3\}. Suppose the measurement, with t = 8 ancillas, returns y = 192:

\widetilde\varphi = 192/256 = 3/4.

Why y = 192 is plausible: if s = 3, the phase is exactly 3/4, and a large t gives y/2^t close to 3/4; 192/256 = 3/4 is the nearest 8-bit fraction. Continued fractions on 3/4: the final convergent is 3/4 itself, denominator r' = 4. Verify: 7^4 \bmod 15 = 2401 \bmod 15 = 1. So r = 4.

Step 3. Check conditions. r = 4 is even — good. Compute 7^{r/2} = 7^2 = 49 \bmod 15 = 4. Is 4 \equiv -1 \pmod{15}? No (-1 \equiv 14). Conditions pass. Why both conditions matter: if r were odd, r/2 would not be an integer; if 7^{r/2} \equiv -1, then 7^{r/2} + 1 = 0 \bmod N and \gcd(0, N) = N — useless.

Step 4. Extract factors. With x = 4:

p_1 = \gcd(x - 1, N) = \gcd(3, 15) = 3,
p_2 = \gcd(x + 1, N) = \gcd(5, 15) = 5.

Both gcds are non-trivial factors.

Step 5. Verify the factorisation: 3 \times 5 = 15. Both are prime. Done.

Result. 15 = 3 \times 5. Total work: one random draw, one quantum run, a handful of classical gcds.

N = 15 worked example traceA flow diagram tracing the five steps for N equals fifteen. Step one: a equals seven, gcd equals one. Step two: quantum order finding returns r equals four. Step three: conditions met, x equals four. Step four: gcd of three and fifteen equals three, gcd of five and fifteen equals five. Step five: fifteen equals three times five.Factoring N = 15 with a = 7Step 1a = 7 (random)gcd(7, 15) = 1 ✓Step 2 (Q)QPE → φ̃ = 3/4CF → r = 4Step 3r even, 7² = 4 ≢ −1Step 4: gcd extractionx = 7^(r/2) = 7² = 4gcd(x − 1, N) = gcd(3, 15) = 3gcd(x + 1, N) = gcd(5, 15) = 5p₁ = 3, p₂ = 5Step 5: verify15 = 3 × 5both primes ✓factorisationcomplete
Complete trace of Shor's algorithm for $N = 15$ with $a = 7$. Order finding returns $r = 4$ via a QPE measurement of $\widetilde\varphi = 3/4$ and continued fractions. The gcds $\gcd(3, 15) = 3$ and $\gcd(5, 15) = 5$ split $N$ into its prime factors. The quantum computer did one thing — the order — and everything else was classical arithmetic.

What this shows. For a 4-bit N, the algorithm succeeds in one pass with minimal quantum work. The quantum circuit here uses t = 2n + 1 = 9 ancilla qubits and n = 4 data qubits — 13 qubits total. The modular-exponentiation step has O(n^3) \approx 64 reversible-arithmetic gates. This is demonstrable on essentially any quantum simulator, and has been run on real hardware (IBM, 2001 IBM NMR experiment by Vandersypen et al.) as the first experimental demonstration of Shor's algorithm.

The resource estimate for real RSA

For the toy N = 15, everything fits on a handful of qubits. For cryptographically relevant N — 2048-bit RSA — the numbers look very different. Here is what it actually takes.

Example 2: resource estimate for 2048-bit RSA

Setup. N is a 2048-bit RSA modulus, so n = 2048.

Logical qubits. The quantum circuit uses t = 2n + 1 = 4097 ancilla qubits and n = 2048 data qubits, plus a scratch register for modular exponentiation. Standard modular-exponentiation circuits (Beauregard 2003) need about 2n + 3 = 4099 ancilla-scratch qubits during multiplication. Total logical qubits: roughly 3n \approx 6000. Why the scratch matters: reversible arithmetic cannot overwrite inputs, so every intermediate result must live on its own wire or be un-computed. The 2n + 3 scratch count is the result of careful design — naïve implementations blow up to \Omega(n^2) qubits.

Logical gates. Modular exponentiation dominates. Each controlled-U_a^{2^k} costs O(n^2) Toffoli gates for multiplication, and there are t = O(n) of them. Total Toffoli count: O(n^3) \approx 10^{10} for n = 2048. With state-of-the-art windowed-arithmetic optimisations (Gidney 2019-2021): about 10^9 Toffolis.

Physical qubits (with error correction). Here is the brutal part. Current hardware has physical error rates around 10^{-3} per gate. To run 10^9 gates without a single error, you need logical error rates of 10^{-10} or better. The surface code achieves this by encoding one logical qubit in a d \times d grid of physical qubits, where d (the code distance) is typically 25 to 27 at 2048-bit scale. That is d^2 \approx 700 physical qubits per logical qubit.

Multiplying: 6000 \text{ logical} \times 700 \text{ physical/logical} + \text{ancillas for magic-state distillation} \approx 2 \times 10^7 physical qubits. This is the Gidney-Ekerå 2021 estimate (arXiv:1905.09749). Why the surface code overhead is so large: surface codes need local connectivity and repeated syndrome measurement to detect errors. Each logical T gate, in particular, requires a "magic state" distilled from many noisy physical states — a process that itself consumes thousands of physical qubits per distilled state. The d^2 scaling is an enormous cost but is the best known for realistic physical-error rates.

Runtime. A surface-code cycle is \sim 1\,\mu\text{s} on superconducting hardware. Magic-state distillation and logical gate operations run at \sim 1 kHz logical-gate rate. A 10^9-gate algorithm takes \sim 10^6 seconds \approx 12 days — optimised to \sim 8 hours with parallelism and pipelining in Gidney-Ekerå's scheme.

Compare to today. The largest superconducting processors in 2024-25 (IBM Condor, Google Willow) have \sim 10^3 physical qubits with error rates \sim 10^{-3} per 2-qubit gate. They are roughly 4 orders of magnitude short of what Gidney-Ekerå needs. Scaling physical-qubit counts by 10^4 while maintaining error rates is an open engineering problem that spans hardware, control electronics, cryogenics, and fabrication yield.

Timeline estimates (from Mosca, Gidney, National Academies reports): broken RSA is unlikely before \sim 20352040 under aggressive scaling assumptions; more likely \sim 20452050 under conservative ones; never, if a fundamental obstacle appears.

Result. \sim 2 \times 10^7 physical qubits, \sim 8 hours of runtime, to factor a 2048-bit RSA modulus. Today's hardware is 10^4 qubits short. The gap is real, the research is ongoing, and the migration to post-quantum cryptography is the correct policy response regardless of the timeline.

Resource estimate for breaking 2048-bit RSAA log-scale bar chart showing physical qubit counts. Today's machines at ten cubed physical qubits. Next a bar showing Gidney-Ekerå 2021 estimate for 2048-bit RSA at two times ten to the seventh. A horizontal arrow labelled four orders of magnitude gap spans between them. On the right a second chart showing runtime in seconds, going from roughly one millisecond per circuit today up to twenty-nine thousand seconds or eight hours for the full Shor run.Physical-qubit gap to fault-tolerant Shor10³10⁴10⁵10⁶10⁷physical qubitstoday (~10³)IBM Condor / Google Willow2048-RSA fault-tolerant(Gidney-Ekerå 2021: 2×10⁷)~4 orders of magnitude gap
The physical-qubit count needed to run Shor on a 2048-bit RSA modulus (Gidney-Ekerå 2021) is about $2 \times 10^7$, against today's ceiling of $\sim 10^3$ — a four-orders-of-magnitude gap that spans fabrication, cryogenics, control electronics, and error correction. Closing it is an active research programme, and the current consensus timeline puts operational Shor somewhere in the mid-2030s at the earliest.

What this shows. Shor's algorithm is polynomial in the bit-size of NO(n^3) gates — but the constants are enormous and the error-correction overhead multiplies them by another factor of \sim 10^4. "Polynomial" in theoretical computer science hides a lot of engineering reality. The gap between "provably polynomial" and "operationally fast" is where most of the quantum-computing engineering effort of the next decade will go.

Hype check. Shor's algorithm is the most dramatic theoretical result in quantum computing, and it is also one of the most misunderstood. It does not mean RSA is already broken. It does not mean "quantum computers can break all encryption." It means three specific things, each of which matters in a specific way.

First, Shor gives a polynomial-time quantum algorithm for factoring and discrete-log. No classical polynomial-time algorithm is known. This is a mathematical fact, provable in full, with no open problems about the algorithm itself.

Second, the engineering requirements for running Shor on a cryptographically relevant N are enormous — about 2 \times 10^7 physical qubits and 8 hours, versus today's \sim 10^3 qubits and milliseconds. Useful Shor is years to decades away, not months.

Third — and this is the actionable part — the "harvest now, decrypt later" threat is real today. An adversary can record encrypted traffic in 2025 and decrypt it in 2040 once the quantum computer is built. For long-lived secrets (state-level communications, intellectual property with decade-plus value, biometric templates in systems like Aadhaar), the risk horizon is now, not 2040. That is why post-quantum cryptography standardisation (NIST's Kyber, Dilithium, Falcon, SPHINCS+; India's CERT-In advisories from 2024 onward) is a policy priority regardless of when operational Shor arrives.

Symmetric encryption (AES-256) is only modestly affected — Grover's algorithm gives a O(\sqrt N) speedup on search, which is a 2× key-length reduction. AES-256 remains secure against quantum attacks. Hash-based signatures, lattice-based schemes, and code-based schemes are all post-quantum candidates already standardised. The transition is doable; it is underway; and it is the right response to Shor regardless of the threat timeline.

Common confusions

Going deeper

The walk-through above is the complete, end-to-end algorithm. The "going deeper" material below fills in the resource-estimate derivations, Coppersmith's optimisations to the QFT, Kitaev's alternative phase-estimation variant, error-correction overheads, and the India-specific post-quantum-crypto migration picture.

Full circuit-resource accounting

A complete resource count for factoring an n-bit N (following Beauregard 2003 and Gidney-Ekerå 2021):

Component Qubits Gates
Ancilla register 2n + 1
Data register n
Modular-exp scratch 2n + 3
Total logical qubits \sim 5n + 4
Modular multiplication O(n^2) Toffoli
Modular exponentiation (total ladder) O(n^3) Toffoli
Inverse QFT O(n^2) 1-qubit + 2-qubit
Classical post-processing O(n^3) bit ops

The Toffoli-count dominant term is modular multiplication inside the exponentiation ladder. For n = 2048: \sim 2 \times 10^9 Toffolis after windowed-arithmetic optimisation, requiring \sim 10^{10} T gates after Toffoli decomposition, which requires \sim 10^{10} magic states to execute.

Coppersmith's QFT optimisation

The inverse QFT used at the end of phase estimation has O(n^2) controlled-phase gates. Coppersmith (1994, reprinted in IBM RC 19642) observed that most of these rotations are exponentially small — rotations by angles 2\pi / 2^k for large k — and can be dropped with negligible effect on the output distribution. The resulting "approximate QFT" has O(n \log n) gates and produces a phase estimate that is within 2^{-m} of the true value with probability close to 1, for any truncation depth m.

The approximate QFT has been part of every serious Shor resource estimate since ∼1995. It reduces the QFT cost from being a small but non-negligible slice of the circuit to being essentially free relative to the modular-exponentiation core.

Kitaev's one-ancilla phase estimation for Shor

Kitaev's 1995 construction (arXiv:quant-ph/9511026) replaces the t-ancilla QFT-based phase estimation with a single-ancilla iterative version that extracts one bit of \widetilde\varphi at a time, feeding each measurement back as a classical control.

For Shor, the Kitaev variant needs 1 ancilla (versus 2n + 1) at the cost of O(n) rounds of classical feedforward. Circuit depth is similar; total qubit count drops by a factor of \sim 2; classical post-processing becomes more intricate but remains polynomial. This is attractive for NISQ-era demonstrations of Shor-like subroutines on small n, where qubit count is the scarce resource.

For full-scale fault-tolerant Shor at n = 2048, Kitaev's savings are swamped by the modular-exponentiation scratch qubits. The QFT-based and iterative variants both end up in the \sim 10^7 physical-qubit range.

Error-correction overhead — surface code accounting

The surface code is the leading quantum error correction scheme for superconducting and ion-trap hardware. It encodes a logical qubit in a d \times d grid of physical qubits, where d is the code distance — the minimum number of physical errors needed to cause a logical error.

The suppression is exponential in d: for physical error rate p < p_{\text{th}} (threshold \approx 10^{-2}), the logical error rate is \sim (p / p_{\text{th}})^{d/2}. To run G = 10^{10} gates with logical-error probability < 1, you need

p_L \;<\; 1/G \implies d \;\gtrsim\; 2\log_{p/p_{\text{th}}}(1/G).

For p = 10^{-3} and p_{\text{th}} = 10^{-2}, d \approx 27, so each logical qubit uses d^2 \approx 729 physical qubits. Add magic-state distillation factories (each distilling one |T\rangle state uses \sim 10^5 physical qubits and produces a state at \sim 1 kHz); for Shor at n = 2048 with 10^{10} T gates required, you need \sim 10 parallel distillation factories to feed the logical gates at full rate.

Total physical qubit budget: logical qubits × d^2 + distillation factories = 5 \cdot 2048 \cdot 729 + 10 \cdot 10^5 \approx 7 \cdot 10^6 + 10^6 \approx 8 \cdot 10^6 on the optimistic side, rising to \sim 2 \cdot 10^7 in Gidney-Ekerå's detailed accounting with pipelining and buffer space.

Indian post-quantum-cryptography migration

India's digital infrastructure rests on RSA and elliptic-curve cryptography at nearly every layer:

India's National Quantum Mission (NQM) (2023, ₹6000 crore over 8 years) has a working group on post-quantum cryptography migration. CERT-In issued advisories in 2024 recommending lattice-based schemes (CRYSTALS-Kyber for key exchange, CRYSTALS-Dilithium for signatures) for new deployments, aligned with NIST's 2024 PQC standards. The Ministry of Electronics and IT (MeitY) has initiated a "Crypto-Agility" programme to ensure that Aadhaar and UPI cryptographic protocols can be swapped without changing application-layer code.

Industrial buy-in is growing: TCS, Infosys, Wipro, and HCL all have PQC practices, often in partnership with NIST-standardised libraries (Open Quantum Safe). IIT Madras, IISc Bangalore, and TIFR Mumbai host postgraduate programmes on PQC that train the next generation of Indian cryptographers. The migration timeline — CERT-In aims for substantial completion by 2030 — is ahead of the expected Shor deployment timeline (2035+), and that is the correct ordering.

The specific Indian threat model includes "harvest now, decrypt later": foreign adversaries recording Aadhaar enrolment traffic or inter-bank messages today and decrypting in 2040 when Shor is operational. That risk is not zero — it is the reason for ahead-of-threat migration. NQM's Quantum Safe India initiative is a direct response.

Why the discrete-log algorithm fits the same template

Shor's 1994 paper actually proved two results: factoring is in BQP, and discrete logarithm is in BQP. The discrete-log algorithm uses the same order-finding machinery on the unitary U_{g,h}|x\rangle|y\rangle = |x\rangle|g^x h^{-y} \bmod p\rangle (where g is a generator of a cyclic group mod p and h is the target element); the eigenvalues encode the discrete log.

This is why breaking elliptic-curve DSA has the same order of difficulty as breaking RSA — both reduce to period finding on an appropriate group, which reduces to phase estimation. Post-quantum-crypto migration has to replace both RSA and ECC simultaneously.

Where this leads next

References

  1. Peter Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer (1994/1995) — arXiv:quant-ph/9508027. The founding paper.
  2. Craig Gidney and Martin Ekerå, How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits (2021) — arXiv:1905.09749. The modern resource-estimate reference.
  3. Wikipedia, Shor's algorithm.
  4. Nielsen and Chuang, Quantum Computation and Quantum Information §5.3 — Cambridge University Press. Shor's algorithm as phase estimation.
  5. John Preskill, Lecture Notes on Quantum Computation Chapter 7 — theory.caltech.edu/~preskill/ph229. Full derivation of Shor with continued-fractions analysis.
  6. Michele Mosca, Cybersecurity in an era with quantum computers: will we be ready? (2015) — arXiv:1512.03780. The policy-timeline reference used by CERT-In and NIST PQC working groups.