In short

The Variational Quantum Eigensolver (VQE) finds the lowest eigenvalue of a Hermitian operator H by combining a quantum computer and a classical optimiser. You choose an ansatz U(\theta) — a parameterised quantum circuit — and use the quantum computer to prepare |\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes n}. You measure the expectation value E(\theta) = \langle\psi(\theta)|H|\psi(\theta)\rangle by decomposing H = \sum_k c_k P_k into a sum of Pauli strings, sampling each P_k with thousands of shots, and summing the results. You hand E(\theta) to a classical optimiser — Adam, COBYLA, SPSA — which returns the next \theta. You loop until E stops decreasing. The Rayleigh-Ritz variational principle guarantees E(\theta) \ge E_0 for any \theta, where E_0 is the true ground-state energy, so minimising E over \theta approaches E_0 from above. VQE was proposed by Peruzzo et al. in 2014 and is the flagship NISQ-era algorithm for quantum chemistry because the ansatz can be shallow (survives noise), noise tolerance is built into the stochastic optimiser, and the classical optimiser shoulders most of the computational load. Demonstrated on \text{H}_2, \text{LiH}, \text{BeH}_2, and small Fermi-Hubbard models to chemical accuracy. Where it does not yet win: as of 2025, no VQE run has demonstrated clear quantum advantage over state-of-the-art classical chemistry (CCSD(T), DMRG, selected CI) on a problem of practical industrial interest. The algorithm is real, the research is legitimate, and the utility is 2028+.

A chemist wants to know how much energy it takes to pull apart a hydrogen molecule. The answer is 4.478 electron-volts. You do not get to look it up in a table, because the table was built by someone — or something — that computed it. That computation is the electronic structure problem: given the nuclei of a molecule at fixed positions, find the ground state of its electron cloud and read off its energy. Everything in computational chemistry — every binding energy, every reaction rate, every drug-target affinity — eventually reduces to solving this one problem on harder and harder molecules.

Classical methods solve it to astonishing accuracy for small, well-behaved molecules. CCSD(T) gets \text{H}_2, \text{H}_2\text{O}, \text{CH}_4, and most organic chemistry to a few milli-Hartree. DFT handles materials and drug candidates with hundreds of atoms. Density-matrix renormalisation group (DMRG) pushes into strongly correlated systems. These are not weak tools.

But they fail, structurally and predictably, on the problems that matter most: transition-metal catalysts, nitrogen fixation in nitrogenase's FeMoco cluster, photosynthesis's oxygen-evolving complex, cytochrome oxidase. The failure is not about compute — throwing a bigger supercomputer at CCSD(T) does not fix it. The failure is that the approximations built into classical methods assume a kind of electronic structure that these molecules do not have. They have strong correlations, open shells, near-degeneracies — the regime where a single-reference perturbation is the wrong starting point.

This is the opening Feynman pointed at in 1982: use a quantum computer to simulate a quantum system. A quantum computer can natively represent the many-body electronic wavefunction without collapsing it into a single Slater determinant. But a full quantum simulation (phase estimation on a fault-tolerant machine) is a decade away. The question in 2014 was: could you extract the ground-state energy — just the number, not the full wavefunction — on a noisy, shallow, near-term machine?

Peruzzo et al. (2014) answered yes, and gave the answer a name. VQE — the Variational Quantum Eigensolver. It is the founding algorithm of the NISQ era, and a decade on, it is still the template for nearly every near-term quantum-chemistry experiment.

This chapter is about the core idea. Where VQE sits in the bigger variational-algorithms picture, you have already seen; the mechanics of picking a specific ansatz (VQE ansätze) and the practical engineering of running one (VQE in practice) get their own chapters. Here the task is to understand why the algorithm works at all — the variational principle, the measurement decomposition, and the honest calibration of what VQE can and cannot do.

The picture: driving energy downhill

Before any formula, the picture. A quantum state lives in a vast Hilbert space — 2^n complex dimensions for n qubits. The Hamiltonian H assigns an energy to every state, via E(\psi) = \langle\psi|H|\psi\rangle. Most states have high energy. A few have low energy. Exactly one (or a few, if there is degeneracy) has the lowest energy, called E_0 — the ground state.

[Figure: the VQE landscape — energy as a surface over the two-parameter plane (\theta_1, \theta_2). The lowest valley floor is the ground-state energy E_0; a zigzag path shows the classical optimiser descending from the starting point \theta_0 to the converged \theta^*. By the variational principle, the curve E(\theta) = \langle\psi(\theta)|H|\psi(\theta)\rangle lives above E_0 for every \theta — no state has E(\theta) < E_0.]
VQE in one picture. Each parameter value $\theta$ gives a trial state $|\psi(\theta)\rangle$ and an energy $E(\theta)$. The variational principle says this energy is always at least $E_0$. The classical optimiser walks downhill along the $E(\theta)$ curve, starting from some initial guess, until it cannot go lower. The final energy $E(\theta^*)$ is an upper bound on $E_0$, and — if the ansatz is good — is close to $E_0$.

VQE lets you ride this landscape. You parameterise the state by \theta — a vector of knobs the circuit exposes. You turn the knobs, evaluate the energy, and the optimiser tells you which way to turn next. You are not going to hit E_0 exactly, because (a) your knobs may not cover every quantum state, and (b) each energy measurement is noisy. But you can get close. The closer, the more useful the number.

The variational principle — the mathematical spine

The theorem that makes VQE work is the Rayleigh-Ritz variational principle, which might have appeared in your first quantum-mechanics course.

The variational principle

Let H be a Hermitian operator with eigenvalues E_0 \le E_1 \le E_2 \le \cdots and normalised eigenstates |E_0\rangle, |E_1\rangle, \ldots. For any normalised state |\psi\rangle,

\langle \psi | H | \psi \rangle \ge E_0,

with equality if and only if |\psi\rangle is a ground state (an eigenstate of H with eigenvalue E_0).

Reading the definition. \langle \psi | H | \psi \rangle is the expectation value of H in the state |\psi\rangle — decoded, this means: if you prepared |\psi\rangle many times and measured the observable H each time, the average value you would get is this number. E_0 is the smallest eigenvalue, the lowest energy the system can have. The theorem says that no matter which state |\psi\rangle you happen to prepare, its expected energy cannot be lower than E_0; and it equals E_0 only when |\psi\rangle is actually a ground state.

The proof. Expand |\psi\rangle in the eigenbasis of H:

|\psi\rangle = \sum_k c_k |E_k\rangle, \qquad \text{with } \sum_k |c_k|^2 = 1 \text{ (normalisation)}.

Why we can do this: H is Hermitian, so its eigenvectors form an orthonormal basis. Every state can be written as a complex linear combination of them.

Then

\langle\psi|H|\psi\rangle = \sum_{j,k} c_j^* c_k \langle E_j | H | E_k \rangle = \sum_{j,k} c_j^* c_k E_k \delta_{jk} = \sum_k |c_k|^2 E_k,

Why the delta collapses the sum: \langle E_j | E_k \rangle = \delta_{jk} because eigenstates of a Hermitian operator are orthonormal, so only the diagonal j=k terms survive.

and since every E_k \ge E_0,

\sum_k |c_k|^2 E_k \ge E_0 \sum_k |c_k|^2 = E_0,

Why the inequality: in the sum, every |c_k|^2 \ge 0, and every E_k \ge E_0, so replacing each E_k by the smaller number E_0 can only decrease the total.

with equality only when all the weight |c_k|^2 sits on states with eigenvalue E_0 — that is, when |\psi\rangle is a ground state.

This is the whole reason VQE works: minimising the expectation value over a parameterised family cannot give an answer below the true ground state, so whatever minimum you find is an upper bound on E_0. The better your parameterisation covers ground states, the tighter the bound.
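The proof can be checked numerically. A minimal NumPy sketch: draw a random Hermitian matrix as a stand-in Hamiltonian, then confirm that the Rayleigh quotient of every random trial state sits at or above the smallest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random 8x8 Hermitian matrix as a stand-in for a 3-qubit Hamiltonian.
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
H = (A + A.conj().T) / 2
E0 = np.linalg.eigvalsh(H)[0]   # exact smallest eigenvalue

# Every random normalised trial state has Rayleigh quotient >= E0.
for _ in range(1000):
    psi = rng.normal(size=8) + 1j * rng.normal(size=8)
    psi /= np.linalg.norm(psi)
    E = np.real(psi.conj() @ H @ psi)
    assert E >= E0 - 1e-12      # the variational bound, never violated
print("all 1000 trial energies sit at or above E0 =", round(E0, 4))
```

No trial state, however it is chosen, dips below E_0 — which is exactly the guarantee VQE leans on.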

[Figure: the variational bound — an energy axis with levels E_0 \le E_1 \le E_2 \le \cdots. Any trial state's energy \langle\psi(\theta)|H|\psi(\theta)\rangle sits above E_0, never below; the gap between the optimised trial energy and E_0 is the ansatz error.]
The variational bound in energy units. Any trial state sits above $E_0$. Minimising over the ansatz family drives the trial energy as close to $E_0$ as the family permits. The gap between the optimised trial energy and $E_0$ is the **ansatz error** — the part of the ground state your parameterised circuit cannot reach. A more expressive ansatz shrinks this gap; a less expressive one leaves it wide.

There is a less-advertised second part of the variational principle: the gap between \langle\psi|H|\psi\rangle and E_0 is quadratic in the error \||\psi\rangle - |E_0\rangle\|. So if your trial state is a 10% perturbation of the ground state (large by quantum-chemistry standards), the energy is only 1% off. This is why VQE can get chemical accuracy with imperfect circuits — energy is a kind way to measure state quality.
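A quick numeric check of the quadratic claim, again with a random Hermitian matrix as a stand-in Hamiltonian: rotate the exact ground state by a small angle toward the first excited state, and compare the state error (roughly the angle \epsilon) with the energy error (roughly \epsilon^2 times the gap).

```python
import numpy as np

# Quadratic sensitivity of energy to state error (illustrative random H).
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
H = (A + A.conj().T) / 2
evals, evecs = np.linalg.eigh(H)
ground, excited = evecs[:, 0], evecs[:, 1]
gap = evals[1] - evals[0]

for eps in (0.3, 0.1, 0.03):
    psi = np.cos(eps) * ground + np.sin(eps) * excited   # perturbed trial state
    state_err = np.linalg.norm(psi - ground)              # ~ eps
    energy_err = np.real(psi.conj() @ H @ psi) - evals[0] # ~ gap * eps^2
    print(f"state err {state_err:.4f} -> energy err {energy_err:.6f}")
# Halving the state error quarters the energy error: the ratio
# energy_err / state_err**2 stays close to the gap.
```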

The algorithm, step by step

With the variational principle in hand, the algorithm is almost unsurprising.

[Figure: the VQE hybrid loop — a quantum computer prepares U(\theta)|0\rangle^{\otimes n} and measures E(\theta) = \sum_k c_k \langle P_k \rangle at \sim 10^4 shots per term; a classical optimiser (Adam, COBYLA, SPSA, gradient descent) takes E(\theta_k) and returns \theta_{k+1}, with gradients from the parameter-shift rule or finite differences. The loop repeats until E(\theta) converges, typically tens to hundreds of iterations.]
VQE as a hybrid loop. The quantum computer's job: prepare $|\psi(\theta)\rangle$ and measure $\langle H \rangle$, reporting one number. The classical computer's job: take that number (and, for gradient-based optimisation, many more numbers from shifted circuits), and pick a better $\theta$. The loop runs until the energy stops going down.

The five steps of the algorithm:

  1. Encode the problem. Choose the Hermitian operator H whose lowest eigenvalue you want. For quantum chemistry, H is the molecular electronic Hamiltonian after the Born-Oppenheimer approximation and a fermion-to-qubit mapping (Jordan-Wigner or Bravyi-Kitaev). Decompose H into a weighted sum of Pauli strings: H = \sum_k c_k P_k, where each P_k \in \{I, X, Y, Z\}^{\otimes n} and each c_k is a real number computed classically from orbital integrals.
  2. Choose an ansatz. Pick a parameterised circuit U(\theta). UCCSD (physically motivated), hardware-efficient (NISQ-friendly), or ADAPT-VQE (grow the circuit adaptively) are the usual options. Parameter count ranges from a handful to a few thousand.
  3. Prepare the trial state. Run U(\theta_0) on the quantum computer starting from |0\rangle^{\otimes n}. This produces |\psi(\theta_0)\rangle.
  4. Measure the energy. For each Pauli string P_k, rotate into its eigenbasis and measure the qubits, repeat N_{\text{shots}} times (\sim 10^4 is typical), average to estimate \langle P_k \rangle. Combine: E(\theta_0) = \sum_k c_k \langle P_k \rangle.
  5. Optimise. Hand E(\theta_0) to a classical optimiser. If gradients are needed, evaluate \partial_j E via the parameter-shift rule or finite differences — two extra circuit evaluations per parameter. Update \theta \to \theta_1. Repeat from step 3.

Convergence happens when the energy stops decreasing — usually 50-200 optimiser iterations for small molecules, with each iteration running tens of thousands of circuit evaluations on the QPU.
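The five steps can be sketched end-to-end on a toy problem. A minimal NumPy example, with a made-up one-qubit Hamiltonian H = c_0 I + c_1 Z + c_2 X, an R_y(\theta) ansatz, exact statevector expectations standing in for the shot-based estimate of step 4, and plain gradient descent with parameter-shift gradients:

```python
import numpy as np

# Toy end-to-end VQE loop. The Hamiltonian coefficients are made up;
# a statevector simulation replaces the shot-based measurement of step 4.
c0, c1, c2 = -0.5, 0.8, 0.3
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = c0 * I2 + c1 * Z + c2 * X

def energy(theta):
    # Step 3: prepare the trial state Ry(theta)|0> ...
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    # ... step 4: measure the energy (computed exactly here).
    return psi @ H @ psi

def grad(theta):
    # Step 5: parameter-shift rule -- two shifted evaluations per parameter.
    return (energy(theta + np.pi / 2) - energy(theta - np.pi / 2)) / 2

theta = 0.1           # initial guess
for _ in range(200):  # classical optimiser: plain gradient descent
    theta -= 0.4 * grad(theta)

E_min = c0 - np.hypot(c1, c2)  # exact ground energy of this 2x2 Hamiltonian
print(energy(theta), E_min)    # the loop converges to E_min from above
```

Every intermediate energy along the descent obeys E(\theta) \ge E_{\min}, so the loop approaches the answer from above, exactly as the variational principle promises.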

Why this fits NISQ

Three features make VQE the first quantum algorithm that actually ran on real near-term hardware.

Shallow circuits. The ansatz can have circuit depth O(10) to O(100) — well within the noise horizon of a 10^{-3}-error device. A 100-gate circuit on a 10^{-3}-error machine expects \sim 0.1 errors per shot, which is survivable. In contrast, Shor's algorithm needs 10^{10} gates on the same machine, which corrupts the state in the first microsecond.

Built-in noise tolerance. The classical optimiser is designed to handle noisy evaluations — Adam, SPSA, and stochastic gradient descent were invented for machine learning, where every loss evaluation is over a noisy mini-batch. VQE's energy estimates come with shot noise (O(1/\sqrt{N_{\text{shots}}})) and NISQ noise (some additional bias). As long as the bias does not move the minimum by more than chemical accuracy, the optimiser finds it.

Most work is classical. A VQE run for \text{H}_2 uses maybe 3 quantum parameters and 15 Pauli terms. The QPU is busy for a few seconds per iteration; the CPU and GPU do everything else. This matches the NISQ reality, where quantum hardware is expensive, slow, and scarce — you want to minimise how much of the algorithm lives on it.

The cost that isn't built in. VQE's shot budget is large. Each Pauli term needs O(10^4) shots to hit 1% precision; with hundreds of Pauli terms and hundreds of iterations and parameter-shift gradient, a medium-sized VQE can consume 10^{10} shots. Even at microsecond shot rates, that is hours of QPU time. Reducing the shot count is one of the central research problems in VQE — via measurement grouping, classical shadows, and smarter optimisers.
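The 10^{10} figure is easy to reproduce with back-of-envelope numbers. The parameter choices below are illustrative, not taken from any specific run:

```python
# Back-of-envelope shot budget for a medium VQE run. All numbers below are
# illustrative choices, not measurements from a specific experiment.
terms = 200            # Pauli strings (after modest grouping)
shots_per_term = 10_000
params = 50            # ansatz parameters
iterations = 300       # optimiser steps
# Each step: one energy evaluation plus two shifted evaluations per parameter.
evals_per_step = 1 + 2 * params
total_shots = terms * shots_per_term * evals_per_step * iterations
print(f"{total_shots:.1e} shots")  # ~6e10: hours at microsecond shot rates
```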

The measurement strategy

The quantum computer cannot measure H directly (it is a sum of tensor products of Pauli matrices, not a single observable in the computational basis). What it can measure is a Pauli string in its own eigenbasis. So VQE decomposes the problem.

Write H = \sum_k c_k P_k where each P_k is a tensor product of single-qubit Paulis, like P_k = Z_0 X_1 Y_2 I_3. For a chemistry Hamiltonian on n qubits, the number of distinct P_k is typically O(n^4).

To estimate \langle P_k \rangle = \langle\psi(\theta)| P_k |\psi(\theta)\rangle:

  1. Prepare |\psi(\theta)\rangle on the quantum computer.
  2. For each qubit, rotate into the eigenbasis of the corresponding single-qubit Pauli: Z needs no rotation (computational basis), X needs a Hadamard H, Y needs S^\dagger H.
  3. Measure all qubits in the computational basis. Each shot returns a bitstring b_1 b_2 \ldots b_n.
  4. Compute the eigenvalue of P_k for that bitstring: (-1)^{(\text{parity of bits corresponding to non-identity Paulis})}.
  5. Average over N_{\text{shots}}: \langle P_k \rangle \approx \frac{1}{N_{\text{shots}}} \sum_i \text{eigenvalue}_i.

Sum with weights: E(\theta) = \sum_k c_k \langle P_k \rangle.
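The estimation recipe can be sketched in NumPy, sampling bitstrings from an exact statevector instead of hardware. The Bell state and the helper estimate_pauli_z are illustrative; only Z-type strings are handled here, since X and Y strings would first need the basis rotations of step 2.

```python
import numpy as np

rng = np.random.default_rng(42)

# Trial state: the Bell state (|00> + |11>)/sqrt(2) on 2 qubits (illustrative).
psi = np.zeros(4)
psi[0] = psi[3] = 1 / np.sqrt(2)
probs = np.abs(psi) ** 2   # computational-basis outcome probabilities

def estimate_pauli_z(support, n_shots):
    """Estimate a Z-type Pauli string: sample bitstrings, average
    (-1)^(parity of the bits in `support`). X/Y strings would first need
    the basis rotations (H, or S^dagger then H) on the relevant qubits."""
    outcomes = rng.choice(4, size=n_shots, p=probs)       # ints 0..3
    bits = (outcomes[:, None] >> np.array([1, 0])) & 1    # qubit 0 = high bit
    parity = bits[:, support].sum(axis=1) % 2
    return np.mean((-1.0) ** parity)

zz = estimate_pauli_z([0, 1], 10_000)  # <Z0 Z1> on the Bell state: exactly +1
zi = estimate_pauli_z([0], 10_000)     # <Z0>: 0 up to shot noise ~ 1/sqrt(N)
print(zz, zi)
```

The Z_0 Z_1 estimate comes out exactly +1 (both outcomes 00 and 11 have even parity), while the single-qubit Z_0 estimate fluctuates around 0 at the 1/\sqrt{N_{\text{shots}}} scale — shot noise in miniature.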

[Figure: the \text{H}_2 Hamiltonian decomposed into 15 Pauli terms on 4 qubits — the identity term c_0\,IIII; one-body terms c_1\,ZIII + c_2\,IZII + c_3\,IIZI + c_4\,IIIZ; two-body terms c_5\,ZZII + c_6\,ZIZI + \cdots + c_{10}\,IIZZ; and hopping/exchange terms c_{11}\,YYXX + c_{12}\,XXYY + c_{13}\,XYXY + c_{14}\,YXXY. Each term maps to a separate set of circuit measurements; \langle P_k \rangle is estimated from \sim 10^4 shots per term, and E(\theta) = \sum_k c_k \langle P_k \rangle.]
The $\text{H}_2$ molecule in the STO-3G basis maps to 4 qubits and a Hamiltonian with 15 Pauli-string terms (after symmetry reductions — fewer on some encodings). Each term is measured separately; the energy is a weighted sum. For real molecules, the count grows as $O(n^4)$, which is why measurement-grouping techniques matter at scale.

The cost: each term needs its own rotation layer and its own batch of shots. For \text{H}_2 that is 15 terms — cheap. For a medium molecule it might be 10^3 to 10^4 terms. This is why measurement-grouping is a big research topic. Mutually commuting Pauli strings can be measured simultaneously (they share an eigenbasis); qubit-wise commuting strings are even cheaper. Modern VQE implementations group Pauli terms into as few as O(n^3) or even O(n^2) measurement settings, saving an order of magnitude in shots.

Worked examples

Example 1: VQE for the hydrogen molecule

Setup. You want the ground-state energy of \text{H}_2 at equilibrium bond distance R = 0.74 Å, using the minimal STO-3G basis (two 1s orbitals, one per hydrogen, giving two spatial orbitals and four spin-orbitals). The Bravyi-Kitaev mapping produces a 4-qubit Hamiltonian with 15 Pauli-string terms. The true ground-state energy is E_0 = -1.1373 Hartree (computed by exact diagonalisation; VQE's job is to reach it without exact diagonalisation).

Step 1. Choose an ansatz. Use UCCSD with singles and doubles. After number-conservation and spin symmetries, the effective parameter count for \text{H}_2 collapses to 3 parameters. The circuit is 8 two-qubit gates deep — within NISQ noise horizons for any modern device. Why only 3 parameters: UCCSD in general has O(N^2 M^2) parameters for N electrons in M orbitals, but symmetries (particle number, total spin S^2, S_z) force most of them to zero. For \text{H}_2 with 2 electrons in 2 spatial orbitals, only 3 independent excitations survive.

Step 2. Initialise. Start the optimiser at \theta_0 = (0.01, 0.01, 0.01) — a small random perturbation of the Hartree-Fock reference state |1100\rangle (electrons in the lowest two spin-orbitals).

Step 3. Evaluate. Prepare |\psi(\theta_0)\rangle on the quantum computer. For each of the 15 Pauli strings, rotate qubits into the correct basis, measure 10^4 times, average to get \langle P_k \rangle, multiply by c_k, and sum. Wall-clock on a modern superconducting QPU: about 30 seconds of pure QPU time. Result: E(\theta_0) \approx -1.117 Hartree. Above the true ground state, as required.

Step 4. Optimise. Compute \partial_j E for each of the 3 parameters via the parameter-shift rule — two extra circuit evaluations per parameter, so 6 extra runs. The gradient is approximately (-0.04, -0.03, 0.01) Hartree per radian. A gradient-descent step with learning rate 0.1 gives \theta_1 = \theta_0 - 0.1\,\nabla E \approx (0.014, 0.013, 0.009).

Step 5. Iterate. Repeat. After 40-60 iterations, the parameters have converged. Final values: \theta^* \approx (0.113, 0.002, 0.000). Final energy: E(\theta^*) \approx -1.1371 Hartree — within chemical accuracy (1.6 \times 10^{-3} Hartree) of the true -1.1373.

Result. VQE reproduces the \text{H}_2 ground-state energy to chemical accuracy on a 4-qubit NISQ machine. This is the smallest VQE demonstration in the field, and has now been done on every major hardware platform — IBM, Google, Rigetti, IonQ, Quantinuum, QuEra. It is a legitimate quantum-chemistry calculation; it is also a calculation a classical computer does faster and more accurately in microseconds. The demonstration value is proving the stack works end-to-end, not beating the classical baseline.

Example 2: VQE for lithium hydride (LiH)

Setup. LiH in STO-3G has 4 electrons in 6 spatial orbitals (12 spin-orbitals). After freezing the two 1s core electrons on lithium and using parity-based qubit reductions, the active Hamiltonian runs on 8 qubits with roughly 200 Pauli-string terms. The true ground-state energy at the equilibrium bond length (1.60 Å) is E_0 \approx -7.8823 Hartree.

Step 1. Ansatz. UCCSD now has more structure than \text{H}_2: after symmetries, about 20 independent parameters. The circuit is 30-50 gates deep per layer, up to hundreds of two-qubit gates total. This is tight on NISQ hardware — a 10^{-3}-error device has a noise horizon around 500 gates, so UCCSD for LiH is right at the edge. The hardware-efficient alternative (shallower, more parameters, less physically motivated) is a common trade.

Step 2. Initialise and run. Start at \theta_0 near the Hartree-Fock reference. Measurement: 200 Pauli terms × 10^4 shots = 2 \times 10^6 shots per energy evaluation. On a QPU at 10-100 kHz shot rate, that is 20-200 seconds per energy evaluation. Gradient evaluation (parameter-shift) multiplies by 2 \times 20 = 40 extra evaluations per optimiser step.

Step 3. Converge. After 100-300 optimiser iterations, a well-tuned VQE run on LiH lands within chemical accuracy of the true energy. This was first demonstrated by Kandala et al. (IBM, 2017), on superconducting hardware with error mitigation, and has been repeated and refined many times since.

Result. LiH is the first molecule where VQE starts feeling non-trivial. 8 qubits, hundreds of Pauli terms, dozens of parameters, hours of QPU wall-clock — compared to \text{H}_2's seconds. LiH is still solvable by classical FCI or CCSD(T) to higher accuracy than VQE, but LiH is where the algorithmic machinery of VQE (measurement grouping, noise mitigation, careful ansatz choice) starts mattering. Larger molecules (BeH_2, H_2O, CH_4) have all been demonstrated on hardware with similar qualitative findings: VQE works; classical methods beat it on accuracy; the gap narrows as the ansatz and mitigation improve.

Common confusions

"VQE finds the ground state"

VQE finds an approximation to the ground state, parameterised by your ansatz. If the ansatz family cannot express the true ground state, the best you can do is the state in the family that has lowest energy — which is above E_0. In practice, VQE's answer is an upper bound on E_0, not equal to it. The gap is the ansatz error plus the optimisation error.

"VQE is the same as quantum phase estimation (QPE)"

Not remotely. QPE gives you the ground-state energy to exponential precision in one shot (asymptotically), but requires deep coherent circuits that need fault tolerance. VQE gives you a variational upper bound using shallow circuits, classical optimisation, and statistical sampling. QPE is the fault-tolerant-era algorithm for ground-state energies; VQE is the NISQ-era approximation.

"VQE is only for quantum chemistry"

VQE works for any Hermitian H. Quantum chemistry is the dominant application because molecular Hamiltonians have the right structure and industrial stakes. But VQE has been applied to lattice models (Hubbard, Heisenberg spin chains, Ising), combinatorial optimisation (where it reduces to QAOA), and even non-physics eigenvalue problems. The variational principle does not know what H represents.

"VQE's convergence to the ground state is guaranteed"

No. The classical optimiser can get stuck in a local minimum — a \theta where the gradient is zero but the energy is above the ansatz's global minimum. Multiple restarts, different initialisations, and careful optimiser tuning mitigate this, but do not eliminate it. For deep ansatzes, barren plateaus (gradient exponentially small in qubit count) make the problem worse; see the barren plateaus article.

"More parameters = better ansatz"

Up to a point. Adding parameters increases the state-space coverage, which in principle lowers the achievable energy. But it also expands the optimisation landscape (more local minima), amplifies the shot cost (more parameter-shift evaluations), and pushes the circuit toward barren-plateau territory. Real VQE papers spend a lot of ink on how many parameters is the right number, and the answer is problem-dependent.

Where VQE might win — and where it doesn't yet

Hype check. You will read claims that VQE is the path to practical quantum advantage in chemistry. The honest 2025 state is: no VQE run has demonstrated a clear, rigorously benchmarked quantum advantage over the best classical methods on a problem of real industrial interest. Small molecules (H_2, LiH, BeH_2, H_2O) match classical methods in accuracy — a success for the hardware, not a utility milestone. Medium molecules (20+ qubits active space) hit NISQ noise before they outrun classical DMRG or selected CI. Large, strongly-correlated systems (FeMoco, the nitrogenase cluster; oxygen-evolving complex in photosynthesis) are where the payoff would be enormous, but those require fault-tolerant quantum computers — not NISQ. VQE is real, important research. It has not yet changed any chemistry paper's conclusion. A 2028-2032 window is where careful observers expect that to start changing, as hardware scales and error mitigation improves.

The promising regimes where VQE might genuinely contribute, even in the NISQ era: small, strongly correlated active spaces where single-reference classical methods are the wrong starting point; end-to-end validation of new hardware stacks, where VQE is the standard workload; and serving as the proving ground for error mitigation, measurement grouping, and adaptive-ansatz techniques that any future quantum-chemistry algorithm will inherit.

The regimes where VQE does not currently win: small molecules, which classical methods solve faster and more accurately; medium active spaces, where NISQ noise bites before VQE can outrun DMRG or selected CI; and the large, strongly correlated systems with the industrial payoff, which need fault-tolerant machines.

The field's honest self-assessment is that VQE is a vital NISQ testbed — the thing you run to prove your hardware and stack work end-to-end — more than a production quantum-chemistry tool. That is likely to stay true until hardware gets one to two orders of magnitude better or algorithmic advances (like adaptive ansatzes and sophisticated error mitigation) close the gap.

The India angle

Indian quantum-chemistry research has become active in VQE through both academia and industry. IIT Madras has a quantum algorithms group publishing on VQE for small molecules using IBM Quantum Network hardware. IISc Bangalore works on VQE with symmetry-adapted ansatzes and error mitigation. TIFR Mumbai has theoretical work on the convergence guarantees of VQE-like algorithms. IIT Bombay's chemistry department and IIT Delhi both have VQE coursework and student projects on real hardware.

On the industry side, QpiAI (Bangalore, NQM-aligned) has a quantum chemistry software product targeting drug discovery and materials. TCS Research has published VQE variants with noise-aware ansatz design. Indian pharma — Sun Pharma, Dr. Reddy's, Cipla — have been exploring quantum chemistry through academic partnerships, with VQE as the near-term algorithm of choice for small molecular fragments.

The National Quantum Mission's "quantum algorithms" thrust explicitly funds VQE-track work, recognising that a domestic hardware stack will take years to scale and that Indian algorithm research should run on foreign cloud hardware in the meantime. By the late 2020s, expect an Indian VQE stack running on domestic superconducting QPUs built under NQM, integrated with C-DAC's HPC infrastructure.

Going deeper

The rest of this chapter surveys the formal variational-principle proof in detail, measurement grouping via Pauli commutativity, the impact of noise on VQE, the barren-plateau obstruction, and comparisons to phase estimation and imaginary-time evolution. This is the reference layer for someone planning to implement VQE or read the technical VQE literature; the overview above suffices to understand what the algorithm does.

The variational principle as calculus of variations

The Rayleigh-Ritz inequality \langle\psi|H|\psi\rangle \ge E_0 is the finite-dimensional version of a theorem in classical calculus of variations. In quantum mechanics, it underlies the Hartree-Fock method (trial state = single Slater determinant), configuration interaction (trial state = linear combination of Slater determinants), and every parameterised-wavefunction method in classical quantum chemistry. VQE brings this tradition into the quantum-hardware era, using quantum states themselves as the trial family — potentially capturing correlations no classical parameterisation can.

A subtle point: the variational principle does not require the ansatz to be close to the ground state to give a usable bound. Even a mediocre ansatz gives an upper bound, and the quadratic dependence of energy on state error means that a 10% state error gives only 1% energy error. This is the key mathematical reason VQE is less fragile than it looks.

Measurement grouping: qubit-wise commutativity

Two Pauli strings P_k and P_j are qubit-wise commuting (QWC) if, for every qubit, the single-qubit Paulis at that position either match or include an identity. For example, Z_0 X_1 I_2 and I_0 X_1 Y_2 are QWC because qubit 0 has Z vs I (OK), qubit 1 has X vs X (OK), and qubit 2 has I vs Y (OK). Two QWC Pauli strings can be measured simultaneously in the same circuit run: a single rotation layer works for both.

Grouping the O(n^4) Pauli terms of a chemistry Hamiltonian into QWC groups reduces the shot cost by a factor of the average group size. Greedy graph-colouring algorithms group terms into O(n^2) to O(n^3) QWC classes for typical Hamiltonians. Beyond QWC, general-commuting groups allow more terms per measurement (at the cost of more circuit overhead) and reduce the effective shot count further.
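The QWC test itself is a one-liner. A minimal sketch, with a greedy grouping pass as a crude stand-in for graph colouring (both helper names are hypothetical, not a library API):

```python
def qubit_wise_commute(p: str, q: str) -> bool:
    """Two equal-length Pauli strings (e.g. 'ZXI', 'IXY') are QWC if at every
    qubit the letters match or at least one is the identity."""
    return all(a == b or a == 'I' or b == 'I' for a, b in zip(p, q))

def greedy_qwc_groups(paulis):
    """Greedy pass: add each string to the first group whose members it is
    QWC with, else open a new group (a crude stand-in for graph colouring)."""
    groups = []
    for p in paulis:
        for g in groups:
            if all(qubit_wise_commute(p, q) for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

assert qubit_wise_commute('ZXI', 'IXY')   # the example from the text
groups = greedy_qwc_groups(['ZIII', 'IZII', 'ZZII', 'XXYY', 'YYXX'])
print(groups)  # Z-type terms share one group; XXYY and YYXX each need their own
```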

More recent techniques (classical shadows, derandomised shadows) go further, using randomised measurements to estimate many observables simultaneously with provable sample complexity. For VQE on medium molecules, classical shadows have reduced shot counts by one to two orders of magnitude compared to naive term-by-term estimation.

Noise impact on VQE

NISQ noise affects VQE in three ways:

  1. Shot noise (unavoidable, O(1/\sqrt{N_{\text{shots}}})): reduced by more shots; cost is linear in budget.
  2. Coherent noise (over- or under-rotation, calibration drift): biases the expectation value; mitigated by randomised compiling, Pauli twirling, and frequent recalibration.
  3. Incoherent noise (decoherence, depolarising, amplitude damping): pulls the density matrix toward the maximally mixed state, which biases expectation values of traceless Paulis toward zero. Counter-intuitively, this often makes the VQE energy too high, not too low — the optimiser still gets a valid upper bound, but a looser one.
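The third effect — incoherent noise biasing the energy upward, not downward — can be seen in a short density-matrix calculation. The one-qubit Hamiltonian below is made up for illustration; global depolarising noise \rho \to (1-p)\rho + p\,I/d shrinks every traceless Pauli expectation by (1-p), pulling the energy toward \operatorname{tr}(H)/d.

```python
import numpy as np

# Depolarising noise on the exact ground state of a made-up 1-qubit H:
# the energy rises toward tr(H)/2 = c0 as p grows, but never drops below E0.
c0, c1 = -0.5, 0.8
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
H = c0 * I2 + c1 * Z
E0 = c0 - c1                    # exact ground-state energy
rho = np.diag([0.0, 1.0])       # exact ground state |1><1|

energies = []
for p in (0.0, 0.1, 0.3):
    rho_noisy = (1 - p) * rho + p * I2 / 2   # global depolarising channel
    energies.append(np.real(np.trace(rho_noisy @ H)))
print(energies)  # monotonically rising with p, always >= E0
```

The noisy energy is still a valid variational upper bound — just a looser one — which is why VQE degrades gracefully rather than failing outright under incoherent noise.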

Error-mitigation techniques (zero-noise extrapolation, probabilistic error cancellation, dynamical decoupling) reduce the noise bias at a cost of extra shots — typically 10\times to 100\times more shots for a 10\times reduction in bias. The combination of mitigation and good ansatz design is what keeps VQE viable on real hardware.

Barren plateaus in VQE

For deep or random ansatzes, the cost gradient \nabla_\theta E has variance exponentially small in qubit count. This is the barren-plateau phenomenon: the optimiser has essentially no signal to follow and gets stuck. UCCSD is somewhat robust because its structure is not random. Hardware-efficient ansatzes at moderate depth are on the edge; deep hardware-efficient ansatzes are the worst case.

Mitigations specific to VQE: initialise near Hartree-Fock (small angles), use layer-wise training, use adaptive ansatzes (ADAPT-VQE grows the ansatz one gradient-informed operator at a time), use symmetry-preserving ansatzes that restrict to physically relevant subspaces. All are active research.

Alternative near-term algorithms for ground states

VQE is not the only NISQ-era ground-state algorithm. Quantum imaginary-time evolution (QITE) simulates the imaginary-time evolution e^{-\beta H}, which projects onto the ground state as \beta \to \infty; QITE avoids the barren-plateau problem but is hard to implement on shallow circuits. Subspace-expansion VQE runs a cheaper base VQE and then refines by diagonalising H in a small subspace of ansatz derivatives. Quantum Krylov methods build a Krylov subspace on the quantum computer and diagonalise classically. Each has trade-offs; none has displaced VQE as the default first algorithm to try on new hardware.

Phase estimation — the fault-tolerant successor

On a fault-tolerant quantum computer, quantum phase estimation (QPE) gives ground-state energies to any precision \epsilon in O(1/\epsilon) gate operations, without the variational machinery. QPE requires deep, coherent circuits with controlled time evolution e^{-iHt} — currently impossible on NISQ hardware. When fault-tolerant quantum computers arrive (2030s target), QPE displaces VQE for precision ground-state energies. Until then, VQE is the hand we have.

Where this leads next

The specific ansatz families — UCCSD, hardware-efficient, adaptive — get their own chapter, with the trade-offs between expressiveness and trainability. VQE in practice covers the engineering: measurement grouping, noise mitigation, initial-parameter strategies, and what actually works on a 2026 QPU. QAOA is the VQE-cousin for combinatorial optimisation — same hybrid loop, different ansatz and Hamiltonian.

Beyond the variational family, the next arc covers quantum error mitigation in depth, and the transition to fault-tolerant algorithms where deep circuits (and therefore phase estimation) become possible.

References

  1. Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, Jeremy L. O'Brien, A variational eigenvalue solver on a photonic quantum processor (Nature Communications, 2014) — arXiv:1304.3061.
  2. Jarrod McClean, Jonathan Romero, Ryan Babbush, Alán Aspuru-Guzik, The theory of variational hybrid quantum-classical algorithms (New Journal of Physics, 2016) — arXiv:1509.04279.
  3. Abhinav Kandala et al., Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets (Nature, 2017) — arXiv:1704.05018.
  4. M. Cerezo et al., Variational Quantum Algorithms (Nature Reviews Physics, 2021) — arXiv:2012.09265.
  5. John Preskill, Lecture Notes on Quantum Computation, Chapter 7 — theory.caltech.edu/~preskill/ph229.
  6. Wikipedia, Variational quantum eigensolver.