In short

The HHL algorithm (Harrow, Hassidim, Lloyd, 2009) solves the linear system A\mathbf{x} = \mathbf{b} for an N \times N Hermitian matrix A by preparing a quantum state |x\rangle \propto A^{-1}|b\rangle in time \mathcal{O}(\mathrm{poly}(\log N, s, \kappa, 1/\varepsilon)) — exponentially faster in N than the classical \mathcal{O}(Ns\sqrt{\kappa}\,\log(1/\varepsilon)) cost of conjugate-gradient methods, if all four preconditions hold. The pipeline: quantum phase estimation on e^{iAt} with |b\rangle as input, which expands |b\rangle = \sum_k \beta_k |u_k\rangle in the eigenbasis of A and writes each eigenvalue \lambda_k into an ancilla register; a controlled rotation of a single flag ancilla that writes an amplitude proportional to 1/\lambda_k, encoding the matrix inverse in amplitude; uncomputation plus amplitude amplification to clean up the phase-estimation register and boost the success probability. The output |x\rangle is a quantum state, not a classical vector — you can measure its overlaps, moments, or feed it into a further quantum computation, but reading all N entries costs \Omega(N) measurements and wipes out the speedup. The four preconditions: (1) A is s-sparse with s \ll N; (2) the condition number \kappa is small (runtime is linear in \kappa); (3) |b\rangle can be prepared in \mathcal{O}(\log N) time — usually via QRAM or as the output of another quantum subroutine; (4) the application asks for a summary of \mathbf{x}, not the full vector. Ewin Tang's 2018 dequantisation showed that several HHL-based quantum-machine-learning "exponential speedups" have classical counterparts which, given the same sampling-style input access, match them up to polynomial factors. The regime where HHL keeps its speedup is narrow, but when it applies — certain PDE solvers, certain quantum-to-quantum linear-algebra subroutines inside larger quantum pipelines — the advantage is genuine.

Solving A\mathbf{x} = \mathbf{b} is the single most common operation in numerical computing. Every structural-engineering simulation at an Indian Railways bogie-design lab, every finite-element heat-transfer calculation at ISRO, every least-squares regression in an IIT Bombay machine-learning assignment, every PageRank-like computation in Flipkart's recommender system — all of them reduce, somewhere in their core, to solving a system of linear equations. Classical algorithms handle this well: Gaussian elimination is \mathcal{O}(N^3); conjugate gradient on a sparse matrix is \mathcal{O}(Ns\sqrt{\kappa}\,\log(1/\varepsilon)), where s is the row sparsity and \kappa the condition number. Libraries like LAPACK and SuiteSparse have been hammering on this problem for decades.

In 2009, Aram Harrow, Avinatan Hassidim, and Seth Lloyd published a paper whose title made a promise most quantum papers do not: "Quantum algorithm for solving linear systems of equations." Not factoring, not searching, not simulating — linear systems. The bread and butter of numerical methods, done in \mathrm{poly}(\log N) time instead of \mathrm{poly}(N). An exponential speedup for an everyday workload.

The HHL paper was one of the catalysts of the quantum-machine-learning boom of 2013–2019. Almost every QML speedup claim from that era — quantum support vector machines, quantum principal component analysis, quantum recommender systems, quantum Gaussian processes — has HHL inside it as the linear-algebra engine. If HHL is a fast linear-system solver, and ML is mostly fast linear algebra, then QML should be mostly fast. That was the pitch.

The pitch had fine print. Four preconditions. Each of them, by itself, is reasonable. Together, they mean that the machine-learning problems for which HHL's exponential speedup survives — as against a fair classical algorithm with the same input assumptions — are much rarer than the original wave of QML enthusiasm suggested. Understanding HHL is therefore two things: understanding a genuinely beautiful quantum algorithm, and understanding the rigour with which one has to compare quantum and classical speedups.

The picture — five blocks in a pipeline

[Figure: The HHL pipeline, five blocks left to right, from $|b\rangle$ to $|x\rangle$: 1. state preparation $\mathbf{b} \to |b\rangle$ (QRAM or subroutine, cost $\mathcal{O}(\log N)$); 2. phase estimation on $e^{iAt}$, writing $\lambda_k$ to an ancilla register (cost $\mathcal{O}(s\log N/\varepsilon)$); 3. eigenvalue inversion $R_y(2\arcsin(C/\lambda_k))$, the 1-over-lambda step; 4. uncomputation, reverse phase estimation to disentangle the registers; 5. amplitude amplification on the flag (cost $\mathcal{O}(\kappa)$). Output on the data register: $|x\rangle \propto A^{-1}|b\rangle$. Total cost $\mathcal{O}(s^2 \kappa^2 \log(N)/\varepsilon)$ in the original paper, improved to $\mathcal{O}(s\kappa \log(N) \cdot \mathrm{poly}\log(1/\varepsilon))$ by Childs-Kothari-Somma 2017.]
The HHL pipeline. The algorithm is a composition of three quantum subroutines you have already met — state preparation, phase estimation, and amplitude amplification — glued together by an eigenvalue-inversion rotation in the middle. Each block does one thing, and the whole is more than the sum of its parts: $|b\rangle$ in at the left, $|x\rangle \propto A^{-1}|b\rangle$ out at the right, in polylogarithmic time.

The picture to carry: HHL writes \mathbf{b}'s expansion in the eigenbasis of A using phase estimation, inverts each eigenvalue in the amplitude, uncomputes the phase estimation to disentangle, and amplifies the success branch. That is the whole algorithm. Once you see the five stages, everything else is detail.

Notation and the idealised setup

Fix the problem. A is an N \times N Hermitian matrix with eigendecomposition A = \sum_k \lambda_k |u_k\rangle\langle u_k|. Assume (for now) that all \lambda_k are between 1/\kappa and 1 — this is exactly the small-\kappa regime; we will come back to why. The input vector \mathbf{b} \in \mathbb{C}^N has quantum-state encoding

|b\rangle = \frac{1}{\|\mathbf{b}\|} \sum_{i=0}^{N-1} b_i |i\rangle,

so a classical N-dimensional vector is stored in \log_2 N qubits as amplitudes. Expand in the eigenbasis:

|b\rangle = \sum_k \beta_k |u_k\rangle, \qquad \beta_k = \langle u_k | b \rangle.

The classical answer is \mathbf{x} = A^{-1} \mathbf{b} = \sum_k (\beta_k / \lambda_k) |u_k\rangle (taking \|\mathbf{b}\| = 1, so that \mathbf{b} and its state encoding coincide) — divide each eigencomponent by its eigenvalue. The goal of HHL is to prepare the quantum state

|x\rangle = \frac{1}{\|\mathbf{x}\|} \sum_k \frac{\beta_k}{\lambda_k} |u_k\rangle

which is the quantum-state encoding of the classical solution.

Notice that this is already where the first caveat bites. The output is a normalised quantum state; the normalisation \|\mathbf{x}\| is lost. To read the answer you must measure expectation values of operators on |x\rangle, or use |x\rangle as input to a further quantum subroutine.
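
A short numerical check makes the setup concrete. The sketch below uses a random Hermitian stand-in for A (not from any application), expands |b\rangle in its eigenbasis, and confirms that the normalised \sum_k (\beta_k/\lambda_k)|u_k\rangle points along the classical A^{-1}\mathbf{b}:

```python
import numpy as np

# Numerical check of the idealised setup, with a random Hermitian
# stand-in for A: expand |b> in the eigenbasis and verify that the
# normalised sum_k (beta_k/lam_k)|u_k> matches the direction of A^{-1} b.
rng = np.random.default_rng(0)
N = 8
H = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = H @ H.conj().T + np.eye(N)          # Hermitian positive definite
b = rng.normal(size=N) + 1j * rng.normal(size=N)

lam, U = np.linalg.eigh(A)              # A = U diag(lam) U^dagger
beta = U.conj().T @ (b / np.linalg.norm(b))   # beta_k = <u_k|b>

x_state = U @ (beta / lam)              # sum_k (beta_k/lam_k)|u_k>
x_state /= np.linalg.norm(x_state)      # the normalised |x> HHL prepares

x_classical = np.linalg.solve(A, b)
print(np.allclose(x_state, x_classical / np.linalg.norm(x_classical)))  # True
```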

Block 1 — state preparation

The first stage is loading \mathbf{b} into |b\rangle. For a generic N-dimensional classical vector this takes \Omega(N) operations just to read the input, so naive state preparation already forfeits the speedup.

There are two escape routes. (i) If \mathbf{b} has structure — it is a smooth function sampled on a grid, or the indicator of a known set, or produced by another quantum subroutine — preparation can be done in \mathcal{O}(\log N) time. (ii) If you have a QRAM (quantum random-access memory) that stores \mathbf{b} in a data structure allowing \mathcal{O}(\log N) quantum access, state preparation is \mathcal{O}(\log N). QRAM is a hardware assumption — no scalable QRAM has been built, only proposed — and its plausibility is an active research question.

For the algorithm analysis, assume state preparation of |b\rangle is \mathcal{O}(\log N). Any paper claiming an HHL speedup on a specific problem has to justify this assumption for that problem; if it cannot, the speedup does not apply.
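
Escape route (i) is easy to see concretely. A minimal sketch, assuming the structure is a smooth closed-form function sampled on a grid (the Gaussian below is an illustrative stand-in):

```python
import numpy as np

# Escape route (i), sketched: a smooth closed-form function sampled on a
# grid gives an N-entry b whose amplitude encoding occupies only log2(N)
# qubits. The Gaussian is an illustrative stand-in for "structure".
n = 10                                   # qubits
N = 2**n                                 # vector dimension
grid = np.linspace(0.0, 1.0, N)
b = np.exp(-((grid - 0.5) ** 2) / 0.02)  # smooth b_i = f(i/N)

amplitudes = b / np.linalg.norm(b)       # |b> = sum_i (b_i/||b||) |i>
print(f"{N} amplitudes on {n} qubits; norm = {np.linalg.norm(amplitudes):.3f}")
```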

Block 2 — phase estimation in the eigenbasis of A

Now the main event. Quantum phase estimation takes a unitary U and an eigenvector |u\rangle of U and writes the corresponding eigenvalue's phase into an ancilla register.

Take U = e^{iAt} for a chosen time t > 0. Then U|u_k\rangle = e^{i\lambda_k t} |u_k\rangle, so the phase \varphi_k = \lambda_k t / (2\pi) encodes the eigenvalue \lambda_k (with t chosen so \varphi_k \in [0, 1)). Running PE with input |b\rangle = \sum_k \beta_k |u_k\rangle — which is a superposition of eigenvectors — produces, by linearity,

\sum_k \beta_k |u_k\rangle_D |\tilde\lambda_k\rangle_A,

where the subscripts D and A label the data register (holding the eigenvector) and the ancilla register (holding the eigenvalue estimate \tilde\lambda_k \approx \lambda_k). Why PE is the right tool: it is the quantum circuit that maps "this state is an eigenvector of U" to "here is the corresponding eigenvalue written in binary." Running it on a superposition of eigenvectors writes all eigenvalues into their matching superposition components in parallel.

Implementing U = e^{iAt} is itself a non-trivial subroutine — it is Hamiltonian simulation of the matrix A viewed as a Hamiltonian. For s-sparse A, Hamiltonian simulation of e^{iAt} to precision \varepsilon takes \mathcal{O}(s t \log(N) / \varepsilon) gates (using the original HHL simulation technique; later work — Berry-Childs, Low-Chuang — improved this to nearly linear in t and logarithmic in 1/\varepsilon).

The total cost of phase estimation is \mathcal{O}(s t \log(N) / \varepsilon) where t is chosen large enough that the phase is resolved to the required precision — t = \mathcal{O}(\kappa/\varepsilon).
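
A classical emulation of ideal phase estimation, assuming exact eigenvectors, a random spectrum in [1/\kappa, 1], and an m-bit eigenvalue register (all illustrative choices):

```python
import numpy as np

# Ideal phase estimation on U = e^{iAt}, emulated classically: the
# register ends up holding an m-bit approximation of phi_k = lam_k t/(2 pi).
rng = np.random.default_rng(1)
kappa, m = 8.0, 10                                  # condition number, register bits
lam = np.sort(rng.uniform(1/kappa, 1.0, size=8))    # eigenvalues of A

t = np.pi                                  # keeps phi_k = lam_k t/(2 pi) in [0, 1)
phi = lam * t / (2 * np.pi)                # the phases PE resolves
lam_est = np.round(phi * 2**m) / 2**m * (2 * np.pi) / t   # m-bit readout

print("true lambda:", np.round(lam, 4))
print("PE readout :", np.round(lam_est, 4))
print(f"max error  : {np.abs(lam - lam_est).max():.2e}")  # ~ 2 pi/(t 2^m)
```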

Block 3 — eigenvalue inversion

This is the heart of HHL and the step that has no classical analogue. The ancilla register now holds the eigenvalue estimate \tilde\lambda_k as a binary number. You add a single extra flag ancilla, initially in |0\rangle, and apply a controlled rotation on it:

|0\rangle_F |\tilde\lambda_k\rangle_A \mapsto \left( \sqrt{1 - \frac{C^2}{\tilde\lambda_k^2}} |0\rangle + \frac{C}{\tilde\lambda_k} |1\rangle \right)_F |\tilde\lambda_k\rangle_A,

where C is a normalisation constant smaller than the smallest eigenvalue (C \le 1/\kappa). The controlled rotation is R_y(2\arcsin(C/\tilde\lambda_k)) conditioned on the ancilla register — a standard quantum-circuit construction using reversible arithmetic on the classical value \tilde\lambda_k.

Why this particular rotation: it is the minimum-machinery way to put a factor of 1/\lambda_k into an amplitude. The amplitude of |1\rangle_F after the rotation is exactly C/\lambda_k, so when you post-select on the flag being |1\rangle, the branch you project onto has each eigencomponent weighted by 1/\lambda_k — which is what matrix inversion does.

After the rotation the full state is

\sum_k \beta_k \left( \sqrt{1 - \frac{C^2}{\lambda_k^2}} |0\rangle_F + \frac{C}{\lambda_k} |1\rangle_F \right) |\tilde\lambda_k\rangle_A |u_k\rangle_D.

The "success" branch — the one where the flag reads |1\rangle — has data-register component \sum_k (\beta_k C / \lambda_k) |u_k\rangle, which after normalisation is exactly |x\rangle. The probability of the success branch is

P_{\text{success}} = C^2 \sum_k \frac{|\beta_k|^2}{\lambda_k^2} \ge \frac{C^2}{\lambda_{\max}^2},

which can be as small as \Theta(1/\kappa^2) in the worst case. Why it is small: C is \mathcal{O}(1/\kappa), and squaring makes that \mathcal{O}(1/\kappa^2). Inverting large eigenvalues produces small amplitudes, and you pay for the smallness with a low success probability.
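
The same step in code, under the idealisation of exact eigenvalue estimates (beta, lam, and C = 1/\kappa below are stand-ins):

```python
import numpy as np

# The inversion step, emulated classically with exact eigenvalue estimates.
rng = np.random.default_rng(2)
kappa = 8.0
lam = rng.uniform(1/kappa, 1.0, size=8)            # eigenvalues of A
beta = rng.normal(size=8) + 1j * rng.normal(size=8)
beta /= np.linalg.norm(beta)                       # |b> in the eigenbasis

C = 1 / kappa
theta = 2 * np.arcsin(C / lam)                     # R_y angles, one per eigenvalue
amp_flag1 = (C / lam) * beta                       # flag-|1> branch amplitudes

p_success = np.sum(np.abs(amp_flag1)**2)           # C^2 sum |beta_k|^2/lam_k^2
print(f"success probability {p_success:.4f}  vs  1/kappa^2 = {1/kappa**2:.4f}")

# Post-selecting flag = 1 leaves beta_k/lam_k up to normalisation:
x_branch = amp_flag1 / np.linalg.norm(amp_flag1)
x_exact = (beta / lam) / np.linalg.norm(beta / lam)
print(np.allclose(x_branch, x_exact))              # True
```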

Block 4 — uncomputation

After the rotation, the ancilla register |\tilde\lambda_k\rangle_A is entangled with the data register, so the data register is a mixed state when you trace it out. Discarding the ancilla without cleaning it up destroys the quantum coherence you need for the output. Uncomputation fixes this: apply the inverse phase-estimation circuit, which maps |u_k\rangle_D |\tilde\lambda_k\rangle_A \mapsto |u_k\rangle_D |0\rangle_A. The ancilla register returns to |0\rangle, and the data register is disentangled from it.

The state after uncomputation is

\left( \sum_k \beta_k \sqrt{1 - C^2/\lambda_k^2} |u_k\rangle \right) |0\rangle_F + \left( \sum_k \frac{\beta_k C}{\lambda_k} |u_k\rangle \right) |1\rangle_F.

Now measure only the flag ancilla. If the outcome is |1\rangle, the data register collapses to the post-normalisation state |x\rangle — exactly what you want. If the outcome is |0\rangle, the run fails and you must retry. The probability of success per run is \mathcal{O}(1/\kappa^2).

Block 5 — amplitude amplification

A success probability of 1/\kappa^2 is too low to use directly (you would run the circuit \kappa^2 times on average). Amplitude amplification — the generalisation of Grover's search — boosts a success probability p to nearly 1 in \mathcal{O}(1/\sqrt{p}) Grover iterations. Applying it to the flag-on-|1\rangle branch raises the probability of success from \mathcal{O}(1/\kappa^2) to \mathcal{O}(1) after \mathcal{O}(\kappa) amplitude-amplification iterations, each of which runs the full HHL-minus-amplification pipeline once.

So the total HHL cost, using the improved Childs-Kothari-Somma construction, is roughly

T_{\text{HHL}} = \mathcal{O}\big(s \kappa \log(N) \cdot \mathrm{poly}\log(1/\varepsilon)\big).

Compare classical conjugate gradient on the same sparse matrix: T_{\text{CG}} = \mathcal{O}(Ns\sqrt{\kappa}\,\log(1/\varepsilon)). In N the separation is exponential — HHL scales as \log N where CG scales as N, a gap exponential in the qubit count n = \log_2 N — while in \kappa HHL's linear dependence is actually worse than CG's \sqrt{\kappa}. The quantum algorithm wins big when N is large and \kappa is small.
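
The whole idealised pipeline fits in a short linear-algebra emulation (exact phase estimation, random stand-ins for A and \mathbf{b}, no circuit):

```python
import numpy as np

# End-to-end emulation of the idealised pipeline. Checks the output
# direction, the success probability, and the amplification round count.
rng = np.random.default_rng(3)
N, kappa = 8, 10.0
lam = rng.uniform(1/kappa, 1.0, size=N)
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))       # random eigenbasis
A = Q @ np.diag(lam) @ Q.T
b = rng.normal(size=N); b /= np.linalg.norm(b)

beta = Q.T @ b                                     # blocks 1-2: eigen-expansion
C = 1 / kappa
flag1 = Q @ ((C / lam) * beta)                     # block 3: success branch = C A^{-1} b
p = np.linalg.norm(flag1)**2                       # block 4: P(flag reads 1)

x = flag1 / np.linalg.norm(flag1)                  # post-selected output |x>
x_true = np.linalg.solve(A, b)
print(np.allclose(x, x_true / np.linalg.norm(x_true)))        # True
print(f"p = {p:.4f}; ~{int(np.ceil(np.pi/(4*np.sqrt(p))))} amplification rounds")
```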

Two worked examples

Example 1 — A 2×2 toy HHL

Take the smallest possible case, where you can track every amplitude by hand:

A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \qquad \mathbf{b} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

The classical answer is \mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1/2 \end{pmatrix}, which after normalisation has components (2, 1)/\sqrt{5}. We will check that the quantum pipeline reproduces this.

Step 1. State preparation. Encode \mathbf{b} as |b\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle). In the eigenbasis of A, which is just the computational basis with \lambda_0 = 1 and \lambda_1 = 2, write |b\rangle = \beta_0 |0\rangle + \beta_1 |1\rangle with \beta_0 = \beta_1 = 1/\sqrt{2}.

Step 2. Phase estimation. After PE on e^{iAt} (with t chosen so the phases resolve), the state is

\frac{1}{\sqrt{2}} |0\rangle_D |\tilde 1\rangle_A + \frac{1}{\sqrt{2}} |1\rangle_D |\tilde 2\rangle_A,

where |\tilde 1\rangle_A and |\tilde 2\rangle_A are the ancilla registers encoding the eigenvalues 1 and 2. Why: each eigencomponent of |b\rangle gets its eigenvalue written in the ancilla — that is exactly what PE does on a superposition.

Step 3. Eigenvalue inversion. Set C = 1 — the largest allowed value, since C must not exceed the smallest eigenvalue \lambda_{\min} = 1 (in this toy case \kappa = 2). Apply the controlled rotation with angle 2\arcsin(1/1) = \pi on the |\tilde 1\rangle_A branch and 2\arcsin(1/2) = \pi/3 on the |\tilde 2\rangle_A branch. The flag ancilla becomes:

  • on the \lambda = 1 branch: amplitude C/\lambda = 1 on |1\rangle_F, amplitude 0 on |0\rangle_F.
  • on the \lambda = 2 branch: amplitude C/\lambda = 1/2 on |1\rangle_F, amplitude \sqrt{3}/2 on |0\rangle_F.

So the state is

\frac{1}{\sqrt{2}} |0\rangle_D |\tilde 1\rangle_A |1\rangle_F + \frac{1}{\sqrt{2}} |1\rangle_D |\tilde 2\rangle_A \left( \frac{1}{2} |1\rangle_F + \frac{\sqrt 3}{2} |0\rangle_F \right).

Why: inverting \lambda = 1 multiplies by 1 (no weakening); inverting \lambda = 2 multiplies by 1/2. The ratio of the output eigencomponents is 1 : 1/2 = 2 : 1, which matches the classical \mathbf{x} = (1, 1/2).

Step 4. Uncompute. Run PE in reverse, which clears the ancilla register. The state becomes

\left( \frac{\sqrt 3}{2} \cdot \frac{1}{\sqrt 2} |1\rangle_D \right) |0\rangle_F + \left( \frac{1}{\sqrt 2} |0\rangle_D + \frac{1}{2} \cdot \frac{1}{\sqrt 2} |1\rangle_D \right) |1\rangle_F.

Step 5. Measure flag. Conditioned on |1\rangle_F, the data register holds \frac{1}{\sqrt 2} |0\rangle + \frac{1}{2\sqrt 2} |1\rangle, which after normalisation is \frac{1}{\sqrt{5}}(2|0\rangle + |1\rangle) — exactly the normalised (2, 1). The probability of the flag reading |1\rangle is \frac{1}{2} + \frac{1}{8} = \frac{5}{8}; amplitude amplification can push this toward 1 with \mathcal{O}(\kappa) = \mathcal{O}(1) extra iterations.

Result. The HHL output is exactly |x\rangle \propto A^{-1}|b\rangle, preparing the quantum-state encoding of the classical solution \mathbf{x}.
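
Every number in the example checks out by brute force:

```python
import numpy as np

# Brute-force check of the 2x2 worked example.
lam = np.array([1.0, 2.0])                   # eigenvalues of A = diag(1, 2)
beta = np.array([1.0, 1.0]) / np.sqrt(2)     # |b> in the (computational) eigenbasis
C = 1.0                                      # allowed: C <= lambda_min = 1

amp1 = (C / lam) * beta                      # flag-|1> amplitudes: (1/sqrt2, 1/(2 sqrt2))
p = np.sum(amp1**2)
print(p, np.isclose(p, 5/8))                 # 0.625 True

x = amp1 / np.linalg.norm(amp1)
print(x, np.array([2.0, 1.0]) / np.sqrt(5))  # both (0.894..., 0.447...)
```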

[Figure: Amplitude flow in the 2×2 HHL example. Data-register amplitudes at four stages: 1. input $|b\rangle$ (equal $\lambda{=}1$ and $\lambda{=}2$ bars); 2. after PE (unchanged on the data register); 3. post-inversion (the $\lambda{=}2$ bar halved); 4. output $|x\rangle$ (components $x_0, x_1$ in ratio 2:1).]
Amplitude flow for the toy 2×2 system. Stages 1 and 2 leave both eigencomponents equal. Stage 3 (eigenvalue inversion) halves the $\lambda = 2$ component because the amplitude picks up a factor $C/\lambda = 1/2$. Stage 4 renormalises, producing the 2:1 ratio that matches the classical solution $\mathbf{x} = (1, 1/2)$.

What this shows. The three big blocks do exactly what the picture promises: PE labels each eigencomponent with its eigenvalue, the controlled rotation weights amplitudes by 1/\lambda, and uncompute plus measurement extracts the inverted state. For a 2×2 matrix the full algebra fits on a page, and the result matches the classical answer exactly.

Example 2 — When the speedup survives

Here is the kind of problem where HHL actually gives a genuine end-to-end speedup: a quantum-mechanical simulation that itself needs a linear-system solve.

Setup. You are running a quantum algorithm to simulate the dynamics of a molecule — imagine an ISRO propellant-chemistry problem. At some point in the pipeline you need to solve A\mathbf{x} = \mathbf{b} where A is the Hamiltonian of a perturbation, \mathbf{b} is an already-prepared quantum state (the output of the previous step), and \mathbf{x} is the input to the next quantum step. All registers stay quantum; no one ever reads out \mathbf{x} as a classical vector.

The four preconditions, checked.

  • Sparsity. Physical Hamiltonians are local, so A is sparse: s = \mathcal{O}(\log N). ✓
  • Condition number. Well-conditioned Hamiltonians have \kappa = \mathcal{O}(\mathrm{poly}\log N). ✓
  • State preparation. |b\rangle is the output of a previous quantum subroutine — preparation cost is absorbed into that step, and is \mathcal{O}(\mathrm{poly}\log N) if the upstream step is itself polylog-efficient. ✓
  • Output use. |x\rangle feeds the next quantum subroutine; no classical readout of \mathbf{x} is needed. ✓

Step 1. Apply HHL. Total cost \mathcal{O}(s\kappa \log(N) \cdot \mathrm{poly}\log(1/\varepsilon)) = \mathcal{O}(\mathrm{poly}\log N \cdot \mathrm{poly}\log(1/\varepsilon)) — polylogarithmic in N.

Step 2. Classical comparison. Conjugate gradient on the same problem: \mathcal{O}(N s \sqrt{\kappa} \log(1/\varepsilon)) = \mathcal{O}(N \mathrm{poly}\log N). Linear in N.

Step 3. Speedup. The classical-to-quantum runtime ratio is \mathcal{O}(N / \mathrm{poly}\log N) = \widetilde{\mathcal{O}}(N) — exponential in the qubit count n = \log_2 N.

Result. When all four preconditions hold, HHL delivers an exponential end-to-end speedup inside a larger quantum pipeline. The key is that |x\rangle never has to become classical; it stays quantum and feeds the next quantum step.
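
The arithmetic behind Step 3, as a throwaway cost model (unit costs; the polylog factors are absorbed, so this illustrates scaling only, not a benchmark):

```python
import numpy as np

# Back-of-envelope model: with the polylog factors from s, kappa, and
# epsilon absorbed, the ratio collapses to roughly N / log2(N).
for N in (1e3, 1e6, 1e9):
    t_hhl = np.log2(N)        # ~ s kappa log N, polylogs dropped
    t_cg = N                  # ~ N s sqrt(kappa), same factors dropped
    print(f"N = {N:.0e}: speedup ~ {t_cg / t_hhl:.1e}x")
```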

[Figure: Runtime versus system size $N$, log scale, on a well-conditioned sparse Hermitian system. Classical conjugate gradient rises linearly in $N$; HHL rises only polylogarithmically. Markers at $N = 10^3, 10^6, 10^9$; at $N = 10^6$ the gap is $\sim 10^5\times$.]
Runtime comparison between HHL (polylog in $N$) and classical conjugate gradient (linear in $N$) on a sparse, well-conditioned Hermitian system. The gap is exponential in the qubit count; at $N = 10^6$ the ratio is around $10^5$; at $N = 10^9$ it is around $10^8$. The caveat: this is end-to-end speedup only when the four preconditions hold *and* the output $|x\rangle$ stays quantum.

What this shows. HHL is not a universal linear-system solver; it is an exponentially faster linear-system subroutine inside a quantum pipeline that satisfies its preconditions. The right mental model is not "HHL replaces LAPACK" but "HHL is a quantum analogue of a sparse iterative solver that can only be called from inside other quantum programs."

Hype check

Hype check. You will see claims like "HHL gives an exponential speedup for machine learning" and "quantum computers will solve linear algebra 10^{20} times faster." Both have a true core, under the four preconditions — but the preconditions are tight, and every specific machine-learning speedup that HHL enables has now been scrutinised for dequantisation. Ewin Tang's 2018 result showed that if you give a classical algorithm the same input structure that HHL requires (sampling access to rows of A, norm queries on \mathbf{b}), the classical algorithm achieves the same scaling up to polynomial factors. The exponential gap vanishes under fair comparison for several headline applications: quantum recommender systems, some quantum principal component analysis, some quantum support vector machines. What survives the scrutiny is a narrower set of applications — particularly those where HHL is used as a quantum-to-quantum subroutine (the linear system feeds another quantum computation, not a classical readout) and those with specific matrix structure (banded PDE discretisations, certain eigenvalue problems) where the classical sampling-access assumption is not available. The algorithm is beautiful; the speedup is real; the applications are rarer than the headlines suggest.

The four preconditions, stated cleanly

HHL's exponential speedup survives iff all four of these conditions hold on the problem:

1. Sparsity. A has at most s non-zero entries per row with s = \mathcal{O}(\mathrm{poly}\log N). The Hamiltonian-simulation subroutine inside HHL costs time linear in s. Dense matrices kill the speedup — Hamiltonian simulation of a dense A is \mathcal{O}(N).

2. Condition number. \kappa = \lambda_{\max}(A)/\lambda_{\min}(A) is small, ideally \mathcal{O}(\mathrm{poly}\log N). HHL's runtime is linear in \kappa (improved from quadratic by Childs-Kothari-Somma 2017 [2]). Ill-conditioned matrices — those from discretised PDEs with strong singularities, for instance — have \kappa that grows polynomially in N, which eats into the speedup.

3. State preparation. |b\rangle must be preparable in \mathcal{O}(\mathrm{poly}\log N) time. Either \mathbf{b} has known structure (it is a function sample, or it is generated by a quantum subroutine), or a QRAM is available. A generic classical N-vector requires \Omega(N) just to read.

4. Output use. The application extracts a summary of \mathbf{x} — an overlap \langle \mathbf{x} | \mathbf{m} \rangle, an expectation value \langle \mathbf{x} | M | \mathbf{x} \rangle, a statistical moment — not the full vector. Reading all N components of \mathbf{x} takes \Omega(N) measurements, which restores the classical scaling.

If any precondition fails, HHL's quantum advantage diminishes or disappears for that problem. The preconditions are not corners to be navigated around; they are the boundary of applicability.
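
Precondition 4 is the easiest to quantify. A sketch, with a random stand-in for |x\rangle and an arbitrary diagonal observable: the shot count needed to estimate \langle x|M|x\rangle to precision \delta is \mathcal{O}(1/\delta^2), independent of N.

```python
import numpy as np

# Estimating <x|M|x> for a diagonal observable M from computational-basis
# shots costs O(1/delta^2) measurements regardless of N; reading all N
# entries of x would need Omega(N). x and M here are illustrative.
rng = np.random.default_rng(6)
N, shots = 2**12, 4000
x = rng.normal(size=N); x /= np.linalg.norm(x)
m_diag = np.cos(0.01 * np.arange(N))          # observable M = diag(m_i)

exact = np.sum(m_diag * x**2)                 # <x|M|x>
samples = rng.choice(N, size=shots, p=x**2)   # Born-rule measurement outcomes
estimate = np.mean(m_diag[samples])
print(f"exact {exact:.4f}  estimate {estimate:.4f}  ({shots} shots, N = {N})")
```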

Going deeper

If you are here for the top-line summary — HHL solves sparse well-conditioned linear systems in polylog time, subject to input and output caveats, and several flagship applications have been dequantised — you have it. The rest digs into the improved complexity, QRAM, dequantisation, and the current state of quantum linear algebra beyond HHL.

Childs-Kothari-Somma 2017 — the improved complexity

The original HHL paper had runtime \mathcal{O}(\log(N) \cdot s^2 \cdot \kappa^2 / \varepsilon). Childs, Kothari, and Somma (2017 [2]) replaced the polynomial-in-1/\varepsilon scaling with polylogarithmic: \mathcal{O}(\log(N) \cdot s \kappa \cdot \mathrm{poly}\log(1/\varepsilon)). The improvement comes from two ingredients:

(i) a better Hamiltonian-simulation subroutine based on Linear Combinations of Unitaries (LCU) and quantum signal processing, which replaces the Trotter-based simulation with a more efficient construction;

(ii) a direct block-encoding of the inverse A^{-1} via polynomial approximation of 1/x on the spectrum of A, using the same signal-processing machinery.
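
Ingredient (ii) can be previewed classically. The sketch below uses numpy's generic Chebyshev interpolation as a stand-in for the tailored CKS construction; the point is the rapid decay of approximation error with polynomial degree:

```python
import numpy as np

# Polynomial approximation of 1/x on the spectral window [1/kappa, 1].
# The error decays geometrically with degree; in the CKS analysis the
# required degree grows like kappa * log(1/eps).
kappa = 10.0
xs = np.linspace(1/kappa, 1.0, 2001)
for deg in (4, 8, 16, 32):
    p = np.polynomial.chebyshev.Chebyshev.interpolate(
        lambda x: 1/x, deg, domain=[1/kappa, 1.0])
    print(f"degree {deg:2d}: max |p(x) - 1/x| = {np.abs(p(xs) - 1/xs).max():.2e}")
```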

The CKS algorithm is the standard reference point for quantum linear-system solvers. Further work (Ambainis's variable-time amplitude amplification; Chakraborty et al.; Lin-Tong 2020) has refined the \kappa dependence for specific problem classes.

QRAM — the missing hardware

Quantum random-access memory (QRAM) is the hardware abstraction HHL assumes for state preparation. A QRAM with N classical cells lets an n = \log_2 N-qubit address register |i\rangle retrieve the addressed entry b_i in \mathcal{O}(\log N) time, including on superpositions of addresses.

No scalable QRAM has been built. Proposed implementations — bucket-brigade QRAM, error-corrected QRAM — face serious engineering challenges, chiefly routing a superposed query through \Theta(N) classical cells without decoherence. Without QRAM, classical-data HHL applications are capped by the state-preparation cost, which for a generic \mathbf{b} is \Omega(N) and destroys the speedup.

This is the biggest gap between theoretical HHL and practical HHL, and it is actively debated in the community.

Dequantisation — Tang 2018 and after

Ewin Tang, as an undergraduate, showed that the quantum recommender-system algorithm of Kerenidis and Prakash (2016) had a classical analogue matching its complexity up to polynomial factors. Tang's key observation: if the classical algorithm is given \ell^2-sampling access to the matrix A — the ability to sample row indices with probability proportional to row norm, and column indices within a row with probability proportional to entry magnitude — then the matrix can be subjected to classical randomised linear algebra with polylogarithmic costs similar to HHL's.
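
What \ell^2-sampling access means, in code, on a toy matrix:

```python
import numpy as np

# l^2-sampling access: sample a row index with probability proportional to
# its squared norm, then a column index within that row proportional to
# the squared entry. This is the access model Tang's algorithms assume.
rng = np.random.default_rng(4)
A = rng.normal(size=(6, 5))                       # toy stand-in

row_p = np.sum(A**2, axis=1) / np.sum(A**2)       # ||A_i||^2 / ||A||_F^2
i = rng.choice(A.shape[0], p=row_p)
col_p = A[i]**2 / np.sum(A[i]**2)                 # |A_ij|^2 / ||A_i||^2
j = rng.choice(A.shape[1], p=col_p)
print(f"sampled entry A[{i},{j}] = {A[i, j]:+.3f}")
```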

Subsequent dequantisation results extended the technique to principal component analysis, support vector machines, and several other HHL-based QML algorithms. The common thread: whenever a quantum algorithm's speedup comes from a "state preparation-like" assumption (QRAM access), there is often a classical analogue under sampling access that nearly matches it.

What survives dequantisation: problems where HHL is used as a quantum-to-quantum subroutine (output |x\rangle fed to another quantum step), problems with matrix structure that sampling access cannot exploit (banded PDE matrices, for instance), and problems where the natural input already lives on quantum hardware. The frontier of quantum linear-algebra speedups has moved to understanding exactly which sparse-matrix structures support genuine exponential advantages over classical sampling-based methods.

Applications in quantum machine learning

The original HHL-based QML wave (2013–2017) assumed that any ML task reducible to linear algebra could inherit HHL's speedup. Post-Tang, the careful version of these claims is: the speedup survives only when the four preconditions hold and no classical \ell^2-sampling analogue of the input access is available — in practice, quantum-native inputs and quantum-to-quantum pipelines rather than classical datasets loaded through QRAM.

Quantum linear algebra beyond HHL

The modern approach to quantum linear algebra is quantum signal processing (QSP) and its descendant quantum singular value transformation (QSVT, Gilyén-Su-Low-Wiebe 2019). QSVT provides a unified framework for applying polynomial functions to the singular values of a block-encoded matrix, subsuming HHL (f(x) = 1/x), Hamiltonian simulation (f(x) = e^{ixt}), amplitude amplification (f(x) = 2x\sqrt{1-x^2} variant), and many other algorithms as special cases.

From the QSVT perspective, HHL is "apply the polynomial \sim 1/x to the eigenvalues of A." The entire machinery of quantum phase estimation is replaced by a more direct block-encoding approach that has better constants and a cleaner proof of optimality.
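
The viewpoint can be emulated with an SVD. A classical stand-in, not an implementation of QSVT itself: apply a scalar function to the singular values, and with f(x) = 1/x recover the pseudoinverse, HHL as a special case:

```python
import numpy as np

# QSVT framing, emulated classically: apply a scalar function to the
# singular values of a matrix. With f(x) = 1/x this reproduces the
# pseudoinverse. A stand-in for a block-encoding, not a quantum circuit.
rng = np.random.default_rng(5)
A = rng.normal(size=(5, 5))

U, s, Vt = np.linalg.svd(A)                 # A = U diag(s) V^T
def apply_to_singular_values(f):
    return Vt.T @ np.diag(f(s)) @ U.T       # V f(Sigma) U^T

print(np.allclose(apply_to_singular_values(lambda x: 1/x), np.linalg.pinv(A)))  # True
```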

India's contribution — IISc and IIT-B quantum linear algebra

Research groups at the Indian Institute of Science (Bangalore) and IIT Bombay have contributed to the quantum-linear-algebra literature, including work on quantum algorithms for graph Laplacian systems (relevant to network analysis and semi-supervised learning) and on the quantum simulation of fluid dynamics PDEs (relevant to aerospace and chemical-engineering applications in Indian industry). The Raman Research Institute (Bangalore) and the National Quantum Mission have identified quantum linear algebra as a priority research area, with specific applications in Indian-industry problems like power-grid optimisation and financial derivative pricing.

References

  1. Aram W. Harrow, Avinatan Hassidim, Seth Lloyd, Quantum algorithm for linear systems of equations (2009) — arXiv:0811.3171. The original HHL paper.
  2. Andrew M. Childs, Robin Kothari, Rolando D. Somma, Quantum algorithm for systems of linear equations with exponentially improved dependence on precision (2017) — arXiv:1511.02306. Improved HHL complexity.
  3. Ewin Tang, A quantum-inspired classical algorithm for recommendation systems (2018) — arXiv:1807.04271. The dequantisation result.
  4. Wikipedia, HHL algorithm — overview including preconditions and applications.
  5. John Preskill, Lecture Notes on Quantum Computation, Chapter 7 — theory.caltech.edu/~preskill/ph229. NISQ context and algorithm survey.
  6. András Gilyén, Yuan Su, Guang Hao Low, Nathan Wiebe, Quantum singular value transformation and beyond (2019) — arXiv:1806.01973. The modern framework that subsumes HHL.