In short
A variational quantum algorithm (VQA) is a hybrid quantum-classical loop: build a shallow parameterised quantum circuit U(\theta), prepare the state |\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes n}, measure the expectation value E(\theta) = \langle \psi(\theta) | H | \psi(\theta) \rangle of some observable H, and hand E(\theta) to a classical optimiser that chooses the next \theta. Iterate until E(\theta) converges to a minimum. The quantum computer runs the preparation and measurement steps; the classical computer runs the optimisation. This architecture is the dominant response to the NISQ constraint — because the quantum circuit can be shallow (noise horizon survived) and the classical optimiser can handle noisy, stochastic evaluations of E(\theta). The variational principle guarantees that E(\theta) \ge E_0 for any \theta, where E_0 is the ground-state energy; so minimising E drives you towards the ground state. Three headline instantiations: VQE (chemistry; H is a molecular Hamiltonian; ground-state energy is the target), QAOA (combinatorial optimisation; H encodes a cost function), and quantum machine learning (a parameterised circuit is used as a trainable model). Ansatz choice matters enormously: UCCSD (chemistry-inspired, deep, physically motivated), hardware-efficient (shallow, expressive, NISQ-friendly but prone to barren plateaus), problem-inspired (QAOA's alternating structure). The major open challenges are barren plateaus (the cost landscape becomes exponentially flat for deep or random ansatzes, leaving the optimiser without a usable gradient), noise bias (NISQ noise can push the optimiser to the wrong minimum), shot noise (each evaluation of E(\theta) requires 10^3–10^5 measurements), and unclear quantum advantage (as of 2025, no VQA has demonstrated a clear-cut speedup over the best classical methods on a practically useful problem).
A quantum computer that tries to run Shor's algorithm must be fault-tolerant. The circuit is ten billion gates deep, and at a NISQ error rate of 10^{-3} per gate, errors accumulate beyond recovery within the first few thousand gates. This is what makes the NISQ era feel, from the outside, like a waiting room: the real quantum algorithms need real quantum hardware, and real hardware is a decade away.
Variational quantum algorithms are how the field refuses to sit in the waiting room. The idea is almost audacious: do not try to run a long quantum algorithm. Run a very short one, many times, and use a classical computer to do the optimisation that the quantum computer cannot. You hand the quantum computer a narrow, well-defined job — prepare a parameterised state, measure an expectation value, return the number — and you build the full algorithm out of millions of these narrow jobs, glued together by a classical optimiser.
This design is the dominant NISQ-era algorithmic paradigm. Nearly every headline near-term-quantum-computing result since 2014 — the Peruzzo VQE paper, the Google chemistry experiments, IBM's factory-floor-chemistry demos, Quantinuum's small-instance QAOA, the quantum-machine-learning hype wave — is some instantiation of the variational framework. Learning the general framework lets you read any of these specific results as a variation on a common theme.
The hybrid quantum-classical loop
Every variational algorithm has the same shape: a loop between a quantum machine that evaluates an expensive probabilistic function and a classical machine that optimises over it.
The five steps of the loop, spelled out:
- Choose an ansatz U(\theta). The ansatz is a parameterised circuit — a circuit whose gates depend on a vector of classical parameters \theta = (\theta_1, \theta_2, \ldots, \theta_p). Typical p ranges from tens to a few thousand.
- Prepare the trial state. Run the circuit on the quantum computer starting from |0\rangle^{\otimes n}, yielding the trial state |\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes n}.
- Measure the cost. Decompose the observable H = \sum_k c_k P_k into a sum of Pauli strings P_k, measure each Pauli in its eigenbasis, and combine the results to estimate E(\theta) = \sum_k c_k \langle \psi(\theta) | P_k | \psi(\theta) \rangle. Each term needs many shots — typically 10^3 to 10^5 — to average out the sampling variance.
- Optimise classically. Hand E(\theta_k) (and possibly gradient estimates) to a classical optimiser, which computes \theta_{k+1}. Common choices: gradient descent, Adam, SPSA (Simultaneous Perturbation Stochastic Approximation), natural-gradient descent.
- Repeat. Iterate steps 2-4 until the cost converges (or the shot budget is exhausted).
The final \theta^* parameterises a state |\psi(\theta^*)\rangle that is an approximate ground state of H (or an approximate optimum of whatever H encodes).
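The loop is small enough to sketch end to end on a classically simulated toy problem: two qubits, H = Z \otimes Z (ground-state energy -1), a two-parameter Ry-plus-CNOT ansatz, and parameter-shift gradient descent. Every choice here (Hamiltonian, ansatz, learning rate) is an illustrative stand-in, not drawn from any particular experiment:

```python
import numpy as np

# Toy Hamiltonian: H = Z (x) Z, with ground-state energy -1.
Z = np.diag([1.0, -1.0])
H = np.kron(Z, Z)

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def energy(theta):
    """Step 3: E(theta) = <psi(theta)|H|psi(theta)> for the ansatz CNOT (Ry x Ry) |00>."""
    psi = CNOT @ np.kron(ry(theta[0]), ry(theta[1])) @ np.array([1.0, 0, 0, 0])
    return float(psi @ H @ psi)

def gradient(theta):
    """Parameter-shift rule: two evaluations per parameter, exact for Ry gates."""
    g = np.zeros_like(theta)
    for j in range(len(theta)):
        shift = np.zeros_like(theta)
        shift[j] = np.pi / 2
        g[j] = 0.5 * (energy(theta + shift) - energy(theta - shift))
    return g

theta = np.array([0.1, 0.1])        # step 2: initialise near zero
for _ in range(100):                # step 5: repeat
    theta -= 0.4 * gradient(theta)  # step 4: classical update
print(energy(theta))                # converges towards the ground-state energy -1
```

For this ansatz E(\theta) works out to \cos\theta_2, so the parameter-shift gradient is exactly -\sin\theta_2, and the loop converges cleanly to the ground state.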
Why this works on NISQ
The variational framework is not an elegant quirk; it is engineered specifically for the constraints of near-term quantum hardware. Three features make it NISQ-friendly:
Shallow circuits. The ansatz U(\theta) has circuit depth O(10)-O(100), well within the noise horizon of a 10^{-3}-error machine. This is in deliberate contrast to algorithms like Shor's, which require 10^{10} gates.
Noise-robust training. The classical optimiser handles noisy estimates of E(\theta) naturally — any classical stochastic optimiser (SGD, SPSA) is designed for function evaluations with noise. As long as the noise does not bias the minimum badly, the optimiser can converge through the noise.
Problem-matched structure. By choosing an ansatz that reflects the structure of the problem (see ansatz section below), you can concentrate the search in a physically reasonable subspace, making the optimisation tractable even with modest parameter counts.
The variational paradigm trades the generality of universal quantum computation for the engineering feasibility of running on noisy hardware. It is not the asymptotically optimal quantum algorithm for any given problem; it is the best actually-running-today quantum algorithm for many problems.
The variational principle — why minimising the expectation works
The mathematical backbone of VQAs is a theorem you might have seen in a first quantum-mechanics course.
The variational principle
For any Hermitian operator H with ground-state energy E_0 and any normalised state |\psi\rangle,
\langle \psi | H | \psi \rangle \ge E_0,
with equality if and only if |\psi\rangle is a ground state of H.
Reading the definition. \langle \psi | H | \psi \rangle is the expectation value of H in the state |\psi\rangle — the average value you would measure if you prepared the state |\psi\rangle many times and measured H each time. E_0 is the smallest eigenvalue of H. The theorem says: no matter which state you try, the expectation value is at least E_0, and it equals E_0 only when the state is a ground state.
The proof. Expand |\psi\rangle in the eigenbasis of H: |\psi\rangle = \sum_k c_k |E_k\rangle where H|E_k\rangle = E_k |E_k\rangle and E_0 \le E_1 \le E_2 \le \ldots. Then
\langle \psi | H | \psi \rangle = \sum_k |c_k|^2 E_k \ge \sum_k |c_k|^2 E_0 = E_0,
using normalisation \sum_k |c_k|^2 = 1 in the last step. Why the inequality holds: every E_k \ge E_0 by the ordering, and |c_k|^2 \ge 0, so replacing each E_k by the smaller E_0 can only decrease the sum.
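The inequality is easy to check numerically — draw a random Hermitian matrix, diagonalise it, and confirm that random trial states never beat the ground-state energy (a sanity check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Random Hermitian H and its exact ground-state energy E0.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2
E0 = np.linalg.eigvalsh(H)[0]

# Random normalised trial states: every expectation value sits at or above E0.
for _ in range(1000):
    psi = rng.normal(size=d) + 1j * rng.normal(size=d)
    psi /= np.linalg.norm(psi)
    E = np.vdot(psi, H @ psi).real
    assert E >= E0 - 1e-9
print("variational bound holds for 1000 random trial states")
```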
The big caveat: provided the ansatz can reach the ground state. If your parameterised family \{|\psi(\theta)\rangle : \theta \in \mathbb{R}^p\} does not include any ground state, the minimum of E(\theta) over \theta is still greater than E_0. The gap between that minimum and the true E_0 is called the ansatz error, and it is one of the two sources of error in every VQA (the other being optimisation error — not finding the true minimum of E(\theta)).
Ansatz design — the critical creative choice
The ansatz U(\theta) is the single most important design decision in a VQA. Too expressive, and you drown in parameters and run into barren plateaus. Too restrictive, and the ground state is unreachable. The field has developed three main ansatz families, each suited to different problems.
Unitary Coupled Cluster (UCCSD)
The Unitary Coupled Cluster with Singles and Doubles ansatz is the standard chemistry ansatz. It approximates the ground state by acting on the Hartree-Fock reference state (a classical-chemistry starting guess) with an exponential of "cluster operators":
|\psi_{\text{UCCSD}}(\theta)\rangle = e^{T(\theta) - T^\dagger(\theta)} |\text{HF}\rangle,
where T(\theta) = T_1(\theta) + T_2(\theta) contains the single-electron (T_1) and double-electron (T_2) excitations, each with its own parameter. UCCSD is physically motivated: it is directly adapted from classical quantum-chemistry methods, and its parameters correspond to identifiable electronic configurations. But it is deep: implementing e^{T - T^\dagger} on a quantum computer requires Trotter decomposition into many small gates, and a chemistry-scale system quickly exceeds NISQ noise horizons.
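The structural point — e^{T - T^\dagger} is exactly unitary and rotates the reference towards excited configurations — can be seen with a single double excitation on 4 qubits. This is a drastic toy reduction of UCCSD (one cluster amplitude, with illustrative basis labels), not a chemistry calculation:

```python
import numpy as np

def expm_taylor(A, terms=40):
    """Matrix exponential by Taylor series (fine for small-norm A)."""
    result = np.eye(len(A))
    term = np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

theta = 0.3
hf = int("1100", 2)   # reference configuration: two electrons in the two lowest orbitals
ex = int("0011", 2)   # the doubly excited configuration

# Anti-Hermitian generator theta (T - T^dagger) for one double excitation.
A = np.zeros((16, 16))
A[ex, hf] = theta
A[hf, ex] = -theta

U = expm_taylor(A)    # e^{T - T^dagger}: unitary for any theta
psi = U[:, hf]        # acting on the reference state
# psi = cos(theta)|1100> + sin(theta)|0011>: a rotation towards the excited configuration
print(psi[hf], psi[ex])
```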
Hardware-efficient ansatz
The hardware-efficient ansatz is the opposite philosophy. Take the quantum hardware's native gates (single-qubit rotations and the native two-qubit entangling gate), alternate them in a layered pattern, and trust that a sufficiently expressive parameterised circuit contains a good approximation of the target state.
Typical structure:
- Layer 0: R_y(\theta_1) \otimes R_y(\theta_2) \otimes \cdots \otimes R_y(\theta_n) (single-qubit rotations, one per qubit).
- Layer 1: a ladder of CNOTs or the native two-qubit gates.
- Layer 2: another round of single-qubit rotations with fresh parameters.
- Repeat for L layers.
With L layers and n qubits, this gives \sim Ln parameters. Small enough to optimise, shallow enough to fit the noise horizon. The downside: the ansatz has no physical intuition built in, so the ground state might be reachable only at large L, and large L brings barren plateaus.
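A statevector sketch of the layered pattern, with Ry rotations and a CNOT ladder (the gate choices are illustrative stand-ins for a device's native set):

```python
import numpy as np

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def apply_1q(psi, gate, qubit, n):
    """Apply a single-qubit gate to one qubit of an n-qubit statevector."""
    psi = psi.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [qubit])), 0, qubit)
    return psi.reshape(-1)

def apply_cnot(psi, control, target, n):
    """Flip the target bit on the control = 1 half of the statevector."""
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    axis = target if target < control else target - 1   # axis index after slicing
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=axis)
    return psi.reshape(-1)

def hardware_efficient(thetas, n):
    """thetas has shape (L, n): one Ry angle per qubit per layer, so ~Ln parameters."""
    psi = np.zeros(2 ** n)
    psi[0] = 1.0                           # |0...0>
    for layer in thetas:
        for q, t in enumerate(layer):      # rotation layer
            psi = apply_1q(psi, ry(t), q, n)
        for q in range(n - 1):             # entangling ladder
            psi = apply_cnot(psi, q, q + 1, n)
    return psi

rng = np.random.default_rng(1)
n, L = 4, 3
psi = hardware_efficient(rng.uniform(0, 2 * np.pi, size=(L, n)), n)
print(psi.shape, np.linalg.norm(psi))   # 16 amplitudes, norm 1 (the circuit is unitary)
```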
QAOA ansatz
The Quantum Approximate Optimization Algorithm uses a problem-inspired alternating structure. For a combinatorial optimisation problem encoded in a Hamiltonian H_P (whose ground state represents the optimum), and a mixer Hamiltonian H_M (typically H_M = -\sum_j X_j, which drives transitions between solutions; sign conventions vary between papers), the ansatz is:
|\psi(\gamma, \beta)\rangle = e^{-i\beta_p H_M} e^{-i\gamma_p H_P} \cdots e^{-i\beta_1 H_M} e^{-i\gamma_1 H_P} |+\rangle^{\otimes n}.
Two parameters per layer (\gamma_k, \beta_k); p layers total. The appeal: as p \to \infty, QAOA converges to the adiabatic quantum algorithm, which provably finds ground states in the slow-evolution limit. At finite p, you get an approximation.
Classical optimisers — matching the landscape
Given E(\theta) as a noisy function to be minimised, the classical optimisation problem is harder than standard ML training because (a) function evaluations are expensive (each evaluation needs many quantum shots); (b) function evaluations are stochastic (shot noise plus NISQ noise); (c) gradient information is expensive or unreliable; (d) the landscape often has many local minima or flat regions.
Standard choices:
- Vanilla gradient descent. Requires estimating \nabla_\theta E(\theta), typically via the parameter-shift rule: \partial_{\theta_j} E = \tfrac{1}{2}\big(E(\theta + \tfrac{\pi}{2} e_j) - E(\theta - \tfrac{\pi}{2} e_j)\big), where e_j is the unit vector for parameter j. For gates generated by Pauli operators this is exact, not a finite-difference approximation. Two circuit evaluations per parameter per step. With 100 parameters, that is 200 circuit evaluations per optimisation step.
- Adam. Same gradient estimation, plus momentum and per-parameter learning-rate adaptation. Usually faster convergence than vanilla GD in noisy settings.
- SPSA (Simultaneous Perturbation Stochastic Approximation). Cleverly estimates the gradient in random directions using only two circuit evaluations per step regardless of parameter count. Much cheaper, at the cost of more noise in the gradient estimate.
- Natural gradient descent. Accounts for the curvature of the parameter manifold via the quantum Fisher information. Converges in fewer steps but each step is much more expensive.
- Gradient-free (Nelder-Mead, COBYLA, BOBYQA). Useful when gradients are unreliable or the parameter count is small.
The practical choice depends on the shot budget, the parameter count, and the noise level.
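SPSA's two-evaluations-per-step property is easy to demonstrate. The cost below is a classical stand-in for E(\theta) (a sum of cosines with its minimum at \theta_i = \pi); the gain sequences use Spall's standard exponents but are otherwise untuned guesses:

```python
import numpy as np

rng = np.random.default_rng(42)
evals = 0

def cost(theta):
    """Classical stand-in for E(theta); each call plays the role of one circuit run."""
    global evals
    evals += 1
    return float(np.sum(np.cos(theta)))

p = 20                                  # parameter count
theta = np.full(p, 0.5)
init = cost(theta)

n_iters = 500
for k in range(n_iters):
    a_k = 0.2 / (k + 1) ** 0.602        # decaying step size (Spall's exponents)
    c_k = 0.1 / (k + 1) ** 0.101        # decaying perturbation size
    delta = rng.choice([-1.0, 1.0], size=p)   # random Rademacher direction
    # One simultaneous perturbation: two evaluations, however large p is.
    g_hat = (cost(theta + c_k * delta) - cost(theta - c_k * delta)) / (2 * c_k) * delta
    theta = theta - a_k * g_hat

final = cost(theta)
# 2 evaluations per iteration, independent of p (plus the 2 bookkeeping calls here)
print(evals, init, final)
```

Contrast with the parameter-shift gradient, which would need 2p = 40 evaluations per step on this problem.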
Worked examples
Example 1: VQE for the hydrogen molecule
Setup. Minimise the molecular Hamiltonian of \text{H}_2 in the minimal STO-3G basis at equilibrium bond distance 0.74 Å. After the Bravyi-Kitaev (or Jordan-Wigner) mapping, the Hamiltonian is a sum of 15 Pauli strings on 4 qubits. The true ground-state energy is -1.137 Hartree (chemistry's unit of energy — approximately 27.2 eV).
Step 1. Ansatz. Use UCCSD with singles and doubles; for 4 qubits the ansatz has 3 independent parameters \theta = (\theta_1, \theta_2, \theta_3). The circuit is 8 two-qubit gates deep. Why 3 parameters: after symmetries (spin and particle number conservation), only 3 of the generic UCCSD amplitudes are independent.
Step 2. Initialise. Set \theta = (0.01, 0.01, 0.01) — a small perturbation of \theta = 0, which corresponds to the Hartree-Fock reference state.
Step 3. Evaluate. Prepare |\psi(\theta)\rangle on the quantum computer, measure each of the 15 Pauli strings with 10^4 shots each, combine: E(\theta_0) \approx -1.11 Hartree. Above the true ground state.
Step 4. Optimise. Use gradient descent. Compute \partial_{\theta_k} E via parameter-shift for k = 1, 2, 3; update \theta \to \theta - 0.1 \nabla E. Iterate.
Step 5. Converge. After ~50 iterations, E(\theta^*) \approx -1.137 Hartree — within chemical accuracy (1.6 mHartree) of the true value.
Result. VQE reproduces the \text{H}_2 ground-state energy to chemical accuracy using a 4-qubit NISQ machine. This is the smallest VQE demonstration, done on every major NISQ platform since 2014 (Peruzzo et al., IBM, Google, Rigetti, Quantinuum). It is a legitimate — if extremely small — quantum-chemistry calculation.
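The Step 3 bookkeeping — estimating E as a weighted sum of Pauli expectation values — can be sketched in a statevector simulation. The coefficients and strings below are illustrative placeholders, not the real STO-3G \text{H}_2 values:

```python
import numpy as np

PAULI = {"I": np.eye(2),
         "X": np.array([[0., 1.], [1., 0.]]),
         "Y": np.array([[0., -1j], [1j, 0.]]),
         "Z": np.diag([1., -1.])}

def pauli_string(s):
    """Kronecker product of single-qubit Paulis, e.g. 'ZZII'."""
    m = np.array([[1.0]])
    for ch in s:
        m = np.kron(m, PAULI[ch])
    return m

# Hypothetical decomposition H = sum_k c_k P_k (placeholder coefficients).
terms = [(-0.8, "IIII"), (0.17, "ZIII"), (0.17, "IZII"),
         (-0.24, "IIZI"), (0.12, "ZZII"), (0.04, "XXYY")]

rng = np.random.default_rng(7)
psi = rng.normal(size=16) + 1j * rng.normal(size=16)
psi /= np.linalg.norm(psi)                       # a stand-in trial state

# Term-by-term estimate, the way a device measures it (one Pauli at a time)...
E_terms = sum(c * np.vdot(psi, pauli_string(s) @ psi).real for c, s in terms)
# ...agrees with the expectation of the assembled Hamiltonian, by linearity.
H = sum(c * pauli_string(s) for c, s in terms)
E_full = np.vdot(psi, H @ psi).real
print(E_terms, E_full)
```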
Example 2: QAOA for MaxCut on a 4-node graph
Setup. Given a graph with 4 nodes and 4 edges forming a cycle (0-1, 1-2, 2-3, 3-0), find the cut (partition of nodes into two sets) that maximises the number of cut edges. For the 4-cycle, the optimal cut is \{0, 2\} vs \{1, 3\}, cutting all 4 edges.
Step 1. Hamiltonian. Encode the problem as H_P = \frac{1}{2} \sum_{(i,j) \in E} (1 - Z_i Z_j). The ground state of -H_P maximises the cut. For the 4-cycle, -H_P = \frac{1}{2}(Z_0 Z_1 + Z_1 Z_2 + Z_2 Z_3 + Z_3 Z_0 - 4). Ground-state energy of -H_P is -4, corresponding to cut value 4. Why the factor of \frac{1}{2}(1 - Z_i Z_j): it evaluates to 1 when Z_i and Z_j have opposite signs (the edge is cut), 0 when same (not cut).
Step 2. Ansatz. Use p = 1 QAOA: |\psi(\gamma, \beta)\rangle = e^{-i\beta H_M} e^{-i\gamma H_P} |+\rangle^{\otimes 4} with mixer H_M = \sum_j X_j. Two parameters (\gamma, \beta).
Step 3. Initialise. Start at (\gamma, \beta) = (\pi/8, \pi/4), a generic starting point.
Step 4. Evaluate. Prepare the state on 4 qubits, measure \langle -H_P \rangle by measuring Z_i Z_j correlators. With 10^4 shots: \langle -H_P \rangle \approx -2.0. Expected cut value \sim 2 — no better than a uniformly random assignment, because \sin 4\beta vanishes at \beta = \pi/4.
Step 5. Optimise. Grid-search or gradient-descend over (\gamma, \beta). For p = 1 QAOA on MaxCut on the 4-cycle, the optimum is at (\gamma^*, \beta^*) = (\pi/4, \pi/8), giving expected cut value 3.
Step 6. Assess. Classical algorithms find the exact optimum (4) trivially for this tiny instance. QAOA at p = 1 gives only an approximation ratio of 3/4 in expectation — though individual measurement shots still return the optimal cut-4 bitstrings with non-zero probability. At p = 2, the ratio improves; as p \to \infty, QAOA converges to the exact optimum (adiabatic limit).
Result. QAOA on a toy MaxCut instance produces an approximate solution at low depth. Scaling up (larger graphs, more layers) is where the question of quantum advantage lies — and as of 2025 the empirical evidence is that well-tuned classical heuristics beat QAOA on every real-world MaxCut instance that has been tried.
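The example can be checked in an exact statevector simulation (no shots, so the values are noise-free): at the p = 1 optimum (\gamma^*, \beta^*) = (\pi/4, \pi/8) the expected cut on the 4-cycle comes out to exactly 3 of a possible 4.

```python
import numpy as np
from functools import reduce

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # the 4-cycle
dim = 2 ** n

# cut(z): number of edges cut by bitstring z (the eigenvalues of H_P).
cut = np.array([sum(((z >> i) & 1) != ((z >> j) & 1) for i, j in edges)
                for z in range(dim)], dtype=float)

def expected_cut(gamma, beta):
    """<psi(gamma,beta)| H_P |psi(gamma,beta)> for p = 1 QAOA on the 4-cycle."""
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)   # |+>^{x4}
    psi = np.exp(-1j * gamma * cut) * psi                 # e^{-i gamma H_P} (diagonal)
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])   # e^{-i beta X}
    psi = reduce(np.kron, [rx] * n) @ psi                 # e^{-i beta H_M}, H_M = sum_j X_j
    return float(np.abs(psi) ** 2 @ cut)

print(expected_cut(np.pi / 8, np.pi / 4))   # starting point: expected cut 2 (random-guess level)
print(expected_cut(np.pi / 4, np.pi / 8))   # optimum: expected cut 3 of a possible 4
```

Grid-searching the same function over (\gamma, \beta) confirms that 3 is the best p = 1 can do on this instance.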
The open problems
Variational algorithms have three hard theoretical issues, any one of which could prevent them from achieving quantum advantage.
Barren plateaus
The problem. McClean et al. (2018) proved that for random parameterised circuits of sufficient depth on n qubits, the variance of the cost gradient is exponentially small in n:
\mathrm{Var}_\theta[\partial_{\theta_k} E(\theta)] \in O(2^{-n}).
This means the cost landscape becomes exponentially flat — the gradient is indistinguishable from zero — for large systems. The optimiser has no signal to follow and gets stuck.
Why it happens. Deep random circuits approximate unitary 2-designs — ensembles of unitaries that match the Haar-random distribution up to second-moment statistics. For Haar-random states, the expectation value of any fixed observable concentrates around its average over the whole Hilbert space, with fluctuations suppressed exponentially in n; the gradient, being a difference of such expectation values, inherits the exponentially small variance.
When it bites. Hardware-efficient ansatzes with many layers on many qubits are especially prone. UCCSD and QAOA, which have structure that prevents them from being fully random, are more robust — but even they can exhibit plateau-like behaviour in certain regimes.
Mitigations. Initialise parameters near zero, so the circuit starts close to the identity and well outside the 2-design regime where gradients vanish. Use layer-wise training (train one layer at a time, not the whole thing). Use structured ansatzes (symmetries, local cost functions) that avoid the 2-design regime.
Noise bias
The problem. NISQ noise does not just add variance to E(\theta) — it can bias the expectation value. Depolarising noise, for instance, pushes every expectation value towards zero; amplitude damping biases towards |0\rangle. The optimiser, minimising the biased cost, converges to a \theta^* that is not the true minimum of the ideal cost.
The question. Does error mitigation correct this? Partially, yes — ZNE and PEC can reduce the bias. But the mitigation overhead in extra circuits and shots grows rapidly with depth, so for deeper ansatzes the correction becomes progressively more expensive and less complete.
Unclear quantum advantage
Hype check. Variational quantum algorithms are the near-term-quantum-utility story that dominates press coverage, industry investment, and academic papers. As of 2025, no VQA has demonstrated a clear-cut, rigorously benchmarked quantum advantage over the best classical algorithm on a problem of practical interest. Small \text{H}_2-type demonstrations match classical methods, which is a success for the hardware but not a utility milestone. Larger instances where classical methods struggle are exactly where NISQ noise also struggles — the regimes do not usefully overlap. This is the honest middle: VQAs are interesting research, the dominant NISQ algorithmic approach, and still unproven on any useful task. The 2030s may or may not change this.
Common confusions
"VQAs are quantum machine learning"
VQAs include quantum machine learning (where a parameterised quantum circuit is a trainable model), but they also include VQE (where the objective is a physics ground state, not a ML loss function) and QAOA (where the objective is a combinatorial optimum). QML is one subfamily.
"Variational means approximate"
At finite ansatz expressiveness and finite optimisation effort, yes — VQAs return approximate answers. But with enough expressiveness and enough optimisation, the approximation can in principle be arbitrarily close to exact. "Variational" refers to the method (minimise over a family), not the precision.
"Barren plateaus only affect random ansatzes"
Deep random circuits are the worst case, but structured ansatzes (hardware-efficient at moderate depth, QAOA with many layers, certain UCCSD variants on larger systems) can also exhibit plateau phenomena. Barren plateaus are a concern for nearly every sufficiently deep ansatz.
"VQE always matches the true ground-state energy"
VQE matches the ground state only if (a) the ansatz contains the ground state in its parameterised family, and (b) the optimiser actually finds it. For real molecules, even UCCSD may not contain the exact ground state (UCCSD is an approximation); and NISQ optimisation often settles at a local minimum. Real-world VQE always has both ansatz error and optimisation error.
"Running a VQA on a quantum computer is faster than classical simulation"
The quantum computer's role is to evaluate one particular expectation value. For small systems (under, say, 40 qubits), classical simulation of the same ansatz is fast and exact — no quantum hardware needed. VQAs only potentially beat classical methods at problem sizes where classical simulation of the trial state is infeasible, which means large systems — and at large sizes, NISQ noise takes over. The window where VQAs could beat classical methods is narrow, and whether it is non-empty remains empirically open.
The India angle
Indian industry and academia are active in the variational-algorithm space. QpiAI (Bangalore) is a quantum-computing startup aligned with the National Quantum Mission (NQM), with VQE and QAOA software products. TCS Research has published on QAOA variants, noise-aware ansatz design, and error-mitigated VQE. IISc Bangalore has a quantum-algorithms group working on adaptive ansatzes and barren-plateau mitigation. IIT Madras and IIT Bombay both have variational-algorithm PhD programmes accessing IBM Quantum Network hardware.
The NQM's algorithm thrust explicitly funds NISQ-era software, recognising that domestic hardware is not yet at the scale where advanced algorithms could be tested — so the Indian algorithm community runs mostly on foreign cloud hardware while the NQM hardware hubs scale up. By the late 2020s, expect an Indian VQE/QAOA stack integrated into the NQM hardware programmes at IIT Madras and TIFR.
Going deeper
The rest of this chapter concerns the variational principle formally, the McClean-et-al. barren-plateau theorem, natural-gradient optimisation, adaptive ansatzes (ADAPT-VQE), the theoretical limits of QAOA at finite depth, and the honest benchmarking literature for VQA-vs-classical comparisons. This is the research-level view, useful for a student considering a PhD in near-term quantum algorithms.
The variational principle as calculus of variations
The principle \langle \psi | H | \psi \rangle \ge E_0 is the finite-dimensional version of the Rayleigh-Ritz principle from calculus of variations. In the infinite-dimensional setting (e.g. Schrödinger ground states), the same inequality holds, and classical quantum-chemistry methods (Hartree-Fock, DFT, configuration interaction) are all specialised variational methods with specific ansatz families. VQAs bring this classical tradition into the quantum-hardware era, using quantum states themselves as the trial family.
The barren-plateau theorem (McClean et al. 2018)
Consider a parameterised circuit U(\theta) = \prod_k V_k e^{-i\theta_k H_k} where each H_k is a single Pauli. If the parameterised sub-blocks form a 2-design at depth L, then for any traceless observable O:
\mathrm{Var}_\theta[\partial_{\theta_k} \langle \psi(\theta) | O | \psi(\theta) \rangle] \in O\left( \frac{\mathrm{Tr}(O^2)}{2^{2n}} \right),
which is exponentially small in n. Why 2^{2n}: second-moment averages over a 2-design involve two copies of the state, so the relevant dimension is that of the doubled Hilbert space, 2^n \times 2^n = 2^{2n}. Hardware-efficient ansatzes of depth \ge O(n) are essentially guaranteed to hit this regime.
Mitigations (post-2018 research):
- Identity-initialised parameters (\theta = 0 throughout): keeps the circuit close to identity where gradients are non-zero by construction; but gets stuck if identity is a local minimum.
- Layer-wise training: optimise layer 1 first, freeze, then layer 2, etc. Each sub-optimisation is in a smaller parameter space less prone to plateaus.
- Local cost functions: Cerezo et al. 2020 showed local observables (\langle Z_1 Z_2 \rangle rather than full-system \langle H \rangle) retain gradients longer.
- Symmetry-preserving ansatzes: restrict to the subspace of symmetry-respecting states. Works for problems with conserved quantities.
- Adaptive ansatzes (ADAPT-VQE, iterative VQE): grow the ansatz one operator at a time, chosen to maximise gradient; stops before the plateau regime.
Natural-gradient descent and the quantum Fisher information
The Euclidean gradient \nabla_\theta E is the direction of steepest descent in \theta-space. But the natural geometry of parameterised quantum states is not Euclidean — it is curved by the quantum Fisher information matrix F_{ij}(\theta). The natural gradient, F^{-1} \nabla_\theta E, gives steepest descent in the intrinsic geometry and converges in fewer iterations. Cost: estimating F requires extra circuit evaluations, making each step more expensive.
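For a pure state, F_{ij} = 4 \mathrm{Re}(\langle \partial_i \psi | \partial_j \psi \rangle - \langle \partial_i \psi | \psi \rangle \langle \psi | \partial_j \psi \rangle). A one-parameter sanity check with |\psi(\theta)\rangle = R_y(\theta)|0\rangle, for which F = 1 exactly, using a finite-difference derivative of the statevector:

```python
import numpy as np

def psi(theta):
    """|psi(theta)> = Ry(theta)|0> = (cos(theta/2), sin(theta/2))."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

def qfi(theta, eps=1e-5):
    """Quantum Fisher information of a pure one-parameter family."""
    p = psi(theta)
    dp = (psi(theta + eps) - psi(theta - eps)) / (2 * eps)   # |d psi/d theta>
    return float(4 * (np.vdot(dp, dp).real - abs(np.vdot(p, dp)) ** 2))

print(qfi(0.7))   # approx 1.0, independent of theta for this family
```

The natural-gradient update is then \theta \to \theta - \eta F^{-1} \nabla_\theta E; for this one-parameter family F = 1, so the natural and vanilla gradients happen to coincide.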
QAOA theoretical guarantees
At p = 1 on 3-regular graphs, QAOA achieves a provable approximation ratio of 0.6924 for MaxCut (Farhi et al. 2014). Classical Goemans-Williamson achieves 0.878. So QAOA at p = 1 is worse than the best classical algorithm.
At p \to \infty, QAOA converges to the adiabatic algorithm, which finds exact solutions. But the required p scales with problem complexity, and at each p you need O(n) extra depth — making large-p QAOA incompatible with NISQ.
The sweet spot (small p but large enough to beat classical) has not been demonstrated on any real-world problem of practical interest. This is the central open question of QAOA.
Honest benchmarks — VQA vs classical
A growing literature carefully benchmarks VQAs against tuned classical methods. Typical findings (2024-2025):
- Small-molecule VQE: matches classical methods, within the accuracy each provides. Not a speedup.
- Medium-molecule VQE (20+ qubits): NISQ noise dominates; classical CCSD or DMRG gives better results.
- Small-instance QAOA on MaxCut: tuned simulated annealing typically matches or beats.
- Quantum ML on toy datasets: sometimes competitive, no clear-cut advantage, heavily dependent on problem structure.
- Quantum kernels for hand-picked datasets: some evidence of advantage, disputed empirically.
The research community has broadly moved from "VQAs are the near-term quantum revolution" (2016-2019) to "VQAs are interesting, with specific open questions, and honest benchmarking should continue" (2022-2025). This is healthy.
Where this leads next
The specific variational algorithms get their own chapters from here: VQE for chemistry ground-state calculations; QAOA for combinatorial optimisation; quantum machine learning for the ML-inspired variational family; and barren plateaus for the theoretical obstacle that sits at the heart of the whole programme.
Beyond variational algorithms, the next arc of the curriculum covers quantum error mitigation in detail, and then the transition to fault-tolerant quantum algorithms (where circuits can be deep and the variational compromise becomes unnecessary).
References
- Alberto Peruzzo et al., A variational eigenvalue solver on a photonic quantum processor (Nature Communications, 2014) — arXiv:1304.3061.
- Edward Farhi, Jeffrey Goldstone, Sam Gutmann, A Quantum Approximate Optimization Algorithm (2014) — arXiv:1411.4028.
- Jarrod McClean, Sergio Boixo, Vadim Smelyanskiy, Ryan Babbush, Hartmut Neven, Barren plateaus in quantum neural network training landscapes (Nature Communications, 2018) — arXiv:1803.11173.
- M. Cerezo et al., Variational Quantum Algorithms (Nature Reviews Physics, 2021) — arXiv:2012.09265.
- John Preskill, Lecture Notes on Quantum Computation, Chapter 7: Quantum Error Correction — theory.caltech.edu/~preskill/ph229.
- Wikipedia, Variational quantum eigensolver.