In short

Running VQE in practice means walking a ten-step pipeline: map the molecular Hamiltonian to qubit operators (Jordan-Wigner or Bravyi-Kitaev), decompose it into a sum of Pauli strings, pick an ansatz, compile the ansatz to the hardware's native gate set, group commuting Paulis so you can measure many at once, allocate shots across the groups, run the ansatz on the quantum computer, measure, classically combine the results to get E(\theta), hand E(\theta) to a classical optimiser (COBYLA, L-BFGS-B, SPSA, Adam are the big four), and iterate. A typical \text{H}_2 run uses \sim 10^4 shots per expectation-value estimate, \sim 100 optimiser iterations, and produces \sim 10^6 circuit executions — minutes of hardware time. A typical \text{LiH} run uses 10^7–10^8 total shots. Error mitigation is critical: zero-noise extrapolation (ZNE), probabilistic error cancellation (PEC), and measurement readout calibration each recover lost accuracy. Gradients come from the parameter-shift rule: \partial_\theta \langle H \rangle = \frac{1}{2}[\langle H \rangle(\theta + \pi/2) - \langle H \rangle(\theta - \pi/2)], exact with two extra circuit runs per parameter. Benchmarks: VQE matches classical CCSD for \text{H}_2, \text{LiH}, \text{BeH}_2, \text{H}_4 (Kandala et al. 2017; IBM, Google, Quantinuum 2017–2024). It does not convincingly beat classical methods on any molecule of industrial interest as of 2026. Molecules like FeMoCo — the holy grail — need fault-tolerant quantum computing, not NISQ VQE. This chapter is the practical engineering: the pipeline, the shot budget, the optimiser choice, the noise mitigation, and the honest benchmark ladder.

You have the idea (VQE the idea). You have the ansatz choices (VQE ansätze). Now you want to actually run VQE on a real quantum computer — on one of IBM's 127-qubit Eagle-class processors, or Quantinuum's H2 trapped-ion machine, or IonQ's Forte. What does a VQE run actually look like when you press "go"?

This chapter is the nuts-and-bolts: the pipeline, the shot budget, the classical optimiser choice, the error mitigation stack, the real benchmarks. We will follow a \text{H}_2 run from start to finish on the IBM Quantum platform, then zoom out to the question everybody asks: has VQE produced a useful result yet? The answer — honest answer — is not yet.

The pipeline, step by step

Running VQE is orchestration. The quantum computer is one subsystem out of many; the bulk of the work is classical: Hamiltonian construction, Pauli-string decomposition, measurement grouping, compiler passes, optimiser logic, error mitigation. Here is the full workflow.

[Figure: The VQE practical pipeline, an eight-stage flowchart: 1. Build $H_{\text{mol}}$ (PySCF integrals, STO-3G basis); 2. Map to qubits (Jordan-Wigner / Bravyi-Kitaev); 3. Pauli decomposition $H = \sum_i c_i P_i$; 4. Group commuting Paulis (QWC / general commuting groups); 5. Compile ansatz (transpile to native gates); 6. Run on quantum hardware ($10^3$–$10^5$ shots per group); 7. Error mitigation (ZNE, PEC, readout calibration); 8. Classical optimiser (COBYLA / SPSA / Adam), looping $\theta_{k+1}$ back to stage 6 until converged $E(\theta^*)$, the ground-state energy estimate.]
The VQE pipeline. Steps 1–4 are classical pre-processing; step 5 is compilation; step 6 runs on the quantum processor; step 7 mitigates noise; step 8 is the classical optimiser, which loops back to step 6 with updated parameters until convergence. The outer loop runs $\mathcal{O}(100)$–$\mathcal{O}(1000)$ iterations; the inner quantum circuit runs $10^3$–$10^5$ shots per iteration per measurement group.

Walk through each step once and the whole thing is transparent.

1. Build the Hamiltonian

Classical quantum chemistry software — PySCF, Psi4, OpenFermion-PySCF — computes the one- and two-electron integrals over your chosen basis (STO-3G for small molecules, cc-pVDZ for more accuracy, larger basis sets for production chemistry). These integrals define the fermionic Hamiltonian

H_{\text{mol}} = \sum_{pq} h_{pq} a_p^\dagger a_q + \frac{1}{2} \sum_{pqrs} h_{pqrs} a_p^\dagger a_q^\dagger a_r a_s.

For \text{H}_2 in STO-3G this is 4 spin-orbitals; for \text{LiH} in STO-3G it is 12 (reducible to 4 after active-space reduction); for \text{BeH}_2, 14.

2. Map to qubits

The fermionic operators a_p, a_p^\dagger do not act on qubits directly. Two standard maps:

  • Jordan-Wigner (JW): spin-orbital p maps to qubit p, with a_p \mapsto \frac{1}{2}(X_p + iY_p) \otimes Z_{p-1} \otimes \cdots \otimes Z_0. Transparent and simple, but operators carry Z-strings of weight up to n.
  • Bravyi-Kitaev (BK): qubits store partial parity sums, cutting the typical Pauli weight from \mathcal{O}(n) to \mathcal{O}(\log n) at the cost of a less readable encoding.

Qiskit-Nature handles this map for you; most VQE papers use JW for clarity.

3. Pauli decomposition

After the map, H becomes a sum of Pauli strings:

H = \sum_i c_i P_i, \qquad P_i \in \{I, X, Y, Z\}^{\otimes n}.

For \text{H}_2 in STO-3G after JW: 15 Pauli terms on 4 qubits. For \text{LiH}: around 100 terms. For \text{BeH}_2: around 200. For a medium molecule in a decent basis: tens of thousands.
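The decomposition itself is mechanical: because Pauli strings are trace-orthogonal, the coefficients are c_i = \mathrm{Tr}(P_i H)/2^n. A self-contained NumPy sketch (a toy 2-qubit Hamiltonian, not the real \text{H}_2 coefficients):

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices.
PAULIS = {
    'I': np.eye(2, dtype=complex),
    'X': np.array([[0, 1], [1, 0]], dtype=complex),
    'Y': np.array([[0, -1j], [1j, 0]]),
    'Z': np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Kronecker product of single-qubit Paulis, e.g. 'XZ' -> X (x) Z."""
    m = np.array([[1.0 + 0j]])
    for ch in label:
        m = np.kron(m, PAULIS[ch])
    return m

def pauli_decompose(H):
    """Coefficients c_P = Tr(P H) / 2^n for every n-qubit Pauli string P."""
    n = int(np.log2(H.shape[0]))
    coeffs = {}
    for label in itertools.product('IXYZ', repeat=n):
        label = ''.join(label)
        c = complex(np.trace(pauli_string(label) @ H)) / 2**n
        if abs(c) > 1e-12:
            coeffs[label] = c
    return coeffs

# Build a toy 2-qubit Hamiltonian and recover its decomposition.
H = 0.5 * pauli_string('ZI') + 0.25 * pauli_string('XX')
print(pauli_decompose(H))   # {'XX': (0.25+0j), 'ZI': (0.5+0j)}
```

The brute-force loop over 4^n strings is fine for a sketch; production code exploits the sparsity of the fermionic Hamiltonian instead.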

4. Group commuting Paulis

Here is a key practical optimisation. A single quantum circuit run, followed by measurement, can give you \langle P \rangle for one Pauli operator P. If all your Paulis commute, you can measure them all at once — because commuting operators share an eigenbasis. If some don't commute, you need at least one circuit per commuting group.

Two types of commuting:

  • Qubit-wise commuting (QWC): the strings commute position by position — on every qubit the two Paulis are equal or one is the identity. A QWC group can be measured with single-qubit basis rotations only.
  • General commuting: the strings commute as full operators even though some positions anticommute. Groups are larger, but measuring one requires an entangling basis-change circuit.

Good grouping reduces the number of circuit runs from hundreds of Paulis to \sim 5–30 groups — roughly a 10\times reduction in circuits per energy evaluation.
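A greedy first-fit QWC grouper fits in a few lines. The sketch below uses an illustrative subset of Pauli strings of the kind that appear in the Jordan-Wigner \text{H}_2 Hamiltonian (not the real terms or coefficients):

```python
def qwc(p, q):
    """Qubit-wise commuting: at every position the Paulis agree or one is I."""
    return all(a == b or a == 'I' or b == 'I' for a, b in zip(p, q))

def group_qwc(paulis):
    """Greedy first-fit grouping into qubit-wise commuting sets."""
    groups = []
    for p in paulis:
        for g in groups:
            if all(qwc(p, q) for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

# Illustrative 4-qubit strings: all-Z terms share one group, the four
# mixed XY "exchange" terms each need their own basis.
terms = ['IIII', 'ZIII', 'IZII', 'ZZII', 'XXYY', 'YYXX', 'XYYX', 'YXXY']
print(group_qwc(terms))   # 5 groups: the Z-type terms merge, the rest do not
```

Greedy first-fit is not optimal (minimum clique cover is NP-hard in general), but it is what most toolchains do by default.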

5. Compile the ansatz

Your ansatz was written in abstract gates (UCCSD fermionic exponentials, HEA R_y + CNOT, etc). The hardware has specific native gates: IBM processors use \sqrt{X}, R_z(\theta), and CNOT (or ECR). Quantinuum uses native ZZ(\theta). IonQ uses R_{xx}(\theta). The Qiskit transpile() pass rewrites the abstract circuit into native gates, optimises for circuit depth and gate count, and routes two-qubit gates to respect the hardware's connectivity.

Transpile settings have a real effect on VQE quality. optimization_level=3 in Qiskit typically halves the CNOT count versus level 0.

6. Run on the quantum processor

For each commuting group of Paulis, for each iteration of the classical optimiser:

  1. Prepare the ansatz U(\theta) starting from |0\rangle^{\otimes n}.
  2. Rotate to the group's shared measurement basis.
  3. Measure in the computational basis.
  4. Record the bit string.

Repeat N_{\text{shots}} times. Typical N_{\text{shots}}: 10^3–10^5 per group. Each shot takes \sim 1 millisecond on a superconducting system including reset time; trapped-ion machines are slower per shot, typically tens of milliseconds once cooling and state preparation are included.
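Once the bit strings are recorded, each Pauli expectation in the group is just a parity average over the counts. A minimal sketch for a Z-type string:

```python
def expval_pauli_z(counts, qubits):
    """Estimate <Z...Z> on the given qubit indices from bitstring counts."""
    total = sum(counts.values())
    acc = 0
    for bits, n in counts.items():
        parity = sum(int(bits[q]) for q in qubits) % 2   # parity of marked bits
        acc += (-1) ** parity * n
    return acc / total

# 10^4 shots of a 2-qubit circuit with strongly correlated outcomes.
counts = {'00': 4800, '11': 4800, '01': 200, '10': 200}
print(expval_pauli_z(counts, qubits=[0, 1]))   # 0.92
```

Non-Z strings reduce to this case after the basis rotation in step 2, which is why a shared eigenbasis lets one set of shots serve a whole commuting group.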

7. Apply error mitigation

The raw expectation values are biased by noise. A mitigation stack (readout calibration, zero-noise extrapolation, probabilistic error cancellation, symmetry verification) recovers much of the loss; each technique is treated in detail later in the chapter.

8. Classical optimiser update

Combine the Pauli-group expectation values into E(\theta_k). Feed to the optimiser; get \theta_{k+1}. Loop back to step 6. Convergence check: is |E(\theta_{k+1}) - E(\theta_k)| < \epsilon for several iterations? If yes, stop.

Shot budget — where does the time actually go?

| Molecule | Cost per iteration | Typical VQE run ($\sim 200$ iterations) |
| --- | --- | --- |
| $\text{H}_2$: 15 Paulis → 5 groups, $10^4$ shots/group | $5 \times 10^4$ shots | $10^7$ shots total |
| $\text{LiH}$: 100 Paulis → 25 groups, $10^4$ shots/group | $2.5 \times 10^5$ shots | $5 \times 10^7$ shots total |
| $\text{BeH}_2$: 666 Paulis → 60 groups, $10^5$ shots/group | $6 \times 10^6$ shots | $\sim 10^9$ shots total |

At a 1-millisecond per-shot execution time on a typical superconducting machine, 10^9 shots is 10^6 seconds, or 11 days of pure quantum time. This is why \text{BeH}_2 VQE runs are rare experiments, not routine calculations.
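The table rows are straightforward arithmetic; a small helper, assuming the 1-millisecond per-shot figure used above:

```python
def vqe_shot_budget(groups, shots_per_group, iterations, seconds_per_shot=1e-3):
    """Total shots and pure quantum execution time for a VQE run."""
    shots = groups * shots_per_group * iterations
    return shots, shots * seconds_per_shot

# H2 row of the table: 5 groups, 10^4 shots/group, 200 optimiser iterations.
shots, secs = vqe_shot_budget(5, 10_000, 200)
print(shots, secs / 3600)   # 10^7 shots, a few hours of pure quantum time
```

Scaling the same helper to the \text{BeH}_2 row (60 groups, $10^5$ shots) reproduces the $\sim 10^9$ shots and $\sim 11$ days quoted above.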

Classical optimisers — matching the landscape to the algorithm

The classical optimiser is the second half of VQE. It must handle function evaluations that are noisy, expensive, and sometimes poorly-conditioned. No single optimiser wins everywhere; the right choice depends on your shot budget, parameter count, and noise level.

COBYLA — the default

Constrained Optimization By Linear Approximation. Derivative-free. Builds a local linear model of the cost using function values, solves a trust-region subproblem, iterates.

Why it is the default: needs only E(\theta), not gradients. Robust to moderate noise. Qiskit's default VQE optimiser.

Limitations: scales poorly past \sim 50 parameters; can get stuck in local minima.

L-BFGS-B — gradient-based, quasi-Newton

Limited-memory BFGS with box constraints. Uses gradients to build an approximation of the Hessian; takes Newton-like steps.

Why you would use it: fastest convergence when gradients are available and clean. Good for small ansätze on simulators (no shot noise).

Limitations: gradient estimates via parameter-shift are exact in the noise-free limit but noisy in practice. L-BFGS-B can diverge when gradients are noisy.

SPSA — the NISQ workhorse

Simultaneous Perturbation Stochastic Approximation. Picks a random direction in parameter space, evaluates the cost at \theta + c \mathbf{d} and \theta - c \mathbf{d}, takes a step along (E_+ - E_-) \mathbf{d}.

Why it is popular: gradient cost is 2 circuit evaluations per step, regardless of parameter count. (Contrast with parameter-shift, which needs 2p evaluations.) Naturally noise-tolerant.

Limitations: noisier gradient estimate than parameter-shift; may need more iterations.
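A minimal SPSA fits in a dozen lines. The sketch below uses Spall's standard gain schedules a/k^{0.602} and c/k^{0.101}; the specific constants and the noisy quadratic standing in for a shot-noisy VQE cost are illustrative choices, not tuned values:

```python
import numpy as np

def spsa_minimize(f, theta0, iters=200, a=0.2, c=0.1, seed=7):
    """Minimal SPSA: two cost evaluations per step, whatever the parameter count."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, iters + 1):
        ak = a / k ** 0.602              # step-size schedule
        ck = c / k ** 0.101              # perturbation-size schedule
        d = rng.choice([-1.0, 1.0], size=theta.shape)   # random +/-1 direction
        g = (f(theta + ck * d) - f(theta - ck * d)) / (2 * ck) * d
        theta = theta - ak * g
    return theta

# Noisy quadratic with minimum at [1, 1, 1, 1], standing in for E(theta).
noise = np.random.default_rng(0)
cost = lambda t: float(np.sum((t - 1.0) ** 2) + 0.01 * noise.normal())
theta = spsa_minimize(cost, np.zeros(4))
print(theta)   # each component close to 1
```

Note the key property: the cost of one gradient step is two evaluations of `cost`, independent of the length of `theta`.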

Adam — momentum-based, from ML

Adaptive Moment Estimation. Uses gradient estimates plus per-parameter running averages of the gradient and of its square; adapts the step size per parameter.

Why you would use it: strong performance in the ML literature; handles ill-conditioned landscapes well. Gaining traction for quantum ML and ADAPT-VQE.

Limitations: many hyperparameters to tune; may overshoot for small-parameter VQE.

The parameter-shift rule — exact gradients on hardware

For any gate of the form e^{-i \theta P/2} with P a Pauli-like operator, the derivative of a measured expectation value satisfies:

\frac{\partial}{\partial \theta} \langle H \rangle (\theta) = \frac{1}{2} \left[ \langle H \rangle \left( \theta + \frac{\pi}{2} \right) - \langle H \rangle \left( \theta - \frac{\pi}{2} \right) \right].

Why this works: e^{-i\theta P/2} has eigenvalues e^{\pm i\theta/2}. A direct differentiation of \langle \psi(\theta) | H | \psi(\theta) \rangle picks up \sin\theta and \cos\theta terms which can be rewritten as the shifted evaluations. The factor 1/2 and the \pi/2 shift fall out of the algebra. See Schuld et al. 2019 for the derivation.

This gives exact gradients on quantum hardware, subject only to shot noise. No finite-difference h to pick; no systematic bias. Cost: 2p extra circuit runs per gradient estimate (where p is the parameter count).
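The rule is easy to verify numerically. For the one-qubit state R_y(\theta)|0\rangle with H = Z, the cost is \cos\theta and the exact derivative is -\sin\theta; a NumPy check:

```python
import numpy as np

def expval_z(theta):
    """<Z> for |psi(theta)> = R_y(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return psi[0] ** 2 - psi[1] ** 2          # equals cos(theta)

theta = 0.7
shift_grad = 0.5 * (expval_z(theta + np.pi / 2) - expval_z(theta - np.pi / 2))
print(shift_grad, -np.sin(theta))   # identical up to floating point
```

Contrast with finite differences, where any step size h trades truncation error against noise amplification; here there is no h to choose at all.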

[Figure: Parameter-shift rule, "exact gradient with two extra measurements". A sinusoid of $\langle H \rangle$ versus $\theta$ with evaluation points $(\theta - \pi/2, E_-)$, $(\theta, E)$, $(\theta + \pi/2, E_+)$; the gradient at $\theta$ is $\frac{1}{2}(E_+ - E_-)$. Two extra circuit evaluations give the exact gradient, not an approximation.]
The parameter-shift rule. Evaluate the cost at two shifted parameter values, $\theta + \pi/2$ and $\theta - \pi/2$. Half their difference is the exact derivative at $\theta$. No finite-difference truncation error: the formula is algebraically exact for any gate of the form $e^{-i\theta P/2}$ where $P$ has eigenvalues $\pm 1$ (all Pauli rotations, most common parameterised gates).

Noise mitigation — not optional on NISQ

Raw VQE on NISQ hardware returns energies that are systematically biased upward by noise. The mitigation stack recovers most of the loss, at the cost of extra shots.

Readout calibration

Measurement is the noisiest operation on most quantum hardware (error rates of 1–5\%). The fix: before the VQE run, characterise the measurement-error matrix by preparing each computational-basis state |x\rangle many times and measuring. This builds a matrix M_{yx} = P(\text{measured } y \mid \text{prepared } x). Inverting M (or applying iterative Bayesian unfolding to be more robust) lets you correct the raw measurement counts.

Cost: \sim 2^n calibration runs initially, negligible during the main VQE loop.
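A single-qubit sketch with hypothetical error rates (2\% chance of reading a prepared 0 as 1, 5\% the other way):

```python
import numpy as np

# Calibration matrix: column x holds the measured distribution when |x> was
# prepared, i.e. M[y, x] = P(measured y | prepared x).  Hypothetical rates.
M = np.array([[0.98, 0.05],
              [0.02, 0.95]])

# Raw counts from the main experiment, as a probability vector over {0, 1}.
raw = np.array([0.60, 0.40])

# Invert the calibration: solve M @ p_true = p_raw for the noise-free p_true.
mitigated = np.linalg.solve(M, raw)
print(mitigated)   # approx [0.591, 0.409]
```

For n qubits the same idea applies with a 2^n \times 2^n matrix (or a tensor product of per-qubit matrices when readout errors are uncorrelated, which is what keeps the method tractable at scale).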

Zero-noise extrapolation (ZNE)

The idea: if you knew the cost at two or three different noise levels, you could extrapolate to zero noise. How to vary the noise: take the original circuit and fold some of its gates. Replace each CNOT G with G G^\dagger G (three CNOTs instead of one; algebraically identical; physically noisier by a factor of 3). Run at noise-scale 1\times, 3\times, 5\times; fit a linear or exponential curve through the three energies; evaluate at zero.

ZNE typically recovers 50–90\% of the noise bias at a cost of 3\times to 5\times more shots.

[Figure: Zero-noise extrapolation, $\text{H}_2$ energy versus noise scale factor $\lambda$. Data points $E(1) = -1.09$, $E(3) = -1.06$, $E(5) = -1.03\,\text{Ha}$; a linear fit extrapolated to $\lambda = 0$ gives $-1.135\,\text{Ha}$, against the exact $-1.137\,\text{Ha}$ drawn as a dashed line.]
Zero-noise extrapolation. Run VQE at three folded noise scales ($1\times, 3\times, 5\times$). Each raw energy is above the exact $-1.137\,\text{Ha}$ because noise biases the cost upward. Fit a linear (or Richardson) extrapolation and evaluate at $\lambda = 0$. The extrapolated value ($-1.135\,\text{Ha}$) is within chemical accuracy ($1.6\,\text{mHa}$) of exact — a strong recovery from the raw $-1.09\,\text{Ha}$.

Probabilistic error cancellation (PEC)

PEC takes noise mitigation a step further. Characterise each gate's noise, then express the inverse of the noise channel as a quasi-probability decomposition: a linear combination, with some coefficients negative, of operations the hardware can actually implement. Sample from the quasi-probability distribution: on each shot, insert the sampled correction operation, weight the shot by the sign of its coefficient, average. The result is an unbiased estimator of the noise-free expectation value — at the cost of much higher variance (and thus more shots).

PEC requires a detailed characterisation of the hardware noise and scales poorly with circuit depth. Current production uses are mostly ZNE + readout calibration; PEC is research-grade as of 2026.

Symmetry verification

For chemistry, the true ground state has a specific particle number and spin. Measure the total \hat N = \sum_p a_p^\dagger a_p operator alongside H; discard any shot whose measured particle number is wrong. This post-selection removes many noise events at no extra circuit cost. Used in most production VQE workflows.
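Under Jordan-Wigner, the particle number of a shot is just the Hamming weight of its bit string, so the post-selection is a dictionary filter. A sketch with made-up counts:

```python
def postselect_particle_number(counts, n_electrons):
    """Keep only shots whose bitstring has the right particle number.

    Under Jordan-Wigner, occupation of spin-orbital p is bit p, so the
    particle number of a shot is the Hamming weight of its bit string."""
    return {b: n for b, n in counts.items() if b.count('1') == n_electrons}

# H2 has 2 electrons; '0111' and '0001' shots can only come from noise.
counts = {'0011': 9000, '0101': 600, '0111': 250, '0001': 150}
print(postselect_particle_number(counts, 2))   # {'0011': 9000, '0101': 600}
```

The surviving counts are renormalised before computing expectation values; the discarded 4\% of shots in this toy example is the "noise events removed at no extra circuit cost".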

Benchmarks — the honest ladder

As of early 2026, the VQE benchmark ladder looks like this:

| Molecule | Qubits | Experiment | Accuracy | Classical comparator |
| --- | --- | --- | --- | --- |
| \text{H}_2 | 2–4 | Kandala et al. 2017; every platform since | \sim 1\,\text{mHa} | FCI exact; CCSD matches |
| \text{HeH}^+ | 2–4 | Peruzzo et al. 2014 | \sim 1\,\text{mHa} | FCI exact |
| \text{LiH} | 4–12 | Kandala et al. 2017; IBM 2019 | \sim 1\,\text{mHa} at equilibrium | CCSD matches; DMRG better |
| \text{BeH}_2 | 6–14 | Kandala et al. 2017 | \sim 10\,\text{mHa} | CCSD(T) better |
| \text{H}_4 | 8 | Google 2020; others | \sim 5\,\text{mHa} | Exact diagonalisation still tractable |
| \text{H}_2\text{O} | 12–14 | Multiple, 2023+ | \sim 10–50\,\text{mHa} | CCSD(T) significantly better |
| \text{N}_2 | 16–20 | Research-only | \sim 50\,\text{mHa} at stretched bond | Strong correlation; classical multi-reference needed |
| FeMoCo / industrial catalysts | 40+ | Not feasible on NISQ | — | Requires fault-tolerant QC |

The headline: VQE reaches chemical accuracy on molecules where classical methods (CCSD, CCSD(T), DMRG) are already exact or near-exact. It does not convincingly beat classical methods on any molecule of industrial interest.

The important molecule — FeMoCo, the iron-molybdenum cofactor of nitrogenase, the reason the "\text{N}_2 \to \text{NH}_3 fertiliser problem" is interesting — needs fault-tolerant quantum computing (millions of physical qubits under surface-code error correction), not NISQ VQE. Reiher et al. 2017 estimated the resource requirement; no current machine is even in the same order of magnitude.

This is not a failure of VQE. It is the current state of the engineering: NISQ hardware can do small chemistry as a demonstration, not as a scientific instrument. The expectation is that the 2030s will bring early fault-tolerant machines; the algorithm-design work happening now is what will be deployed on them.

Worked examples

Example 1: A complete VQE run on $\text{H}_2$ via Qiskit

Setup. Find the ground-state energy of \text{H}_2 at equilibrium bond length 0.74\,\text{Å} using VQE with UCCSD ansatz and COBYLA optimiser on IBM's ibm_brisbane (Eagle, 127-qubit superconducting processor).

Step 1. Build the Hamiltonian. Use Qiskit-Nature:

from qiskit_nature.second_q.drivers import PySCFDriver

driver = PySCFDriver(atom='H 0 0 0; H 0 0 0.74', basis='sto3g')

problem = driver.run()

from qiskit_nature.second_q.mappers import JordanWignerMapper

mapper = JordanWignerMapper()

qubit_op = mapper.map(problem.second_q_ops()[0])

This yields a 4-qubit Hamiltonian with 15 Pauli terms. The numerical coefficients depend on the integrals but a typical decomposition has one dominant identity-coefficient (the nuclear repulsion and one-body terms) and 14 smaller two-body terms.

Step 2. Group the Paulis. Qiskit's group_commuting function reduces 15 terms to 5 QWC groups. 5 circuits per energy evaluation.

Step 3. Build the UCCSD ansatz.

from qiskit_nature.second_q.circuit.library import UCCSD

ansatz = UCCSD(num_spatial_orbitals=2, num_particles=(1, 1), qubit_mapper=mapper)

3 parameters; compiled depth $\sim 8$ CNOTs.

Step 4. Transpile. transpile(ansatz, backend=backend, optimization_level=3). Depth reduces to \sim 7 CNOTs after circuit optimisation.

Step 5. Set up the optimiser.

from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import COBYLA
from qiskit.primitives import Estimator

optimizer = COBYLA(maxiter=200)

vqe = VQE(estimator=Estimator(), ansatz=ansatz, optimizer=optimizer)

result = vqe.compute_minimum_eigenvalue(qubit_op)

Step 6. Run. \sim 150 COBYLA iterations × 5 groups × 10^4 shots = 7.5 \times 10^6 total shots. At \sim 1 ms per shot including reset, that is \sim 2 hours of pure circuit execution; queueing on a shared machine typically adds more wall-clock time than the execution itself.

Step 7. Apply error mitigation. Enable measurement readout calibration and ZNE via the Qiskit Runtime primitives.

Step 8. Read off the energy. E(\theta^*) \approx -1.135\,\text{Ha} with mitigation; raw (no mitigation) would be \sim -1.09\,\text{Ha}.

Result. VQE on \text{H}_2 at equilibrium comes within chemical accuracy (1.6\,\text{mHa}) of the exact energy -1.137\,\text{Ha}, matching classical FCI and CCSD. The run is a faithful demonstration of the whole pipeline: Hamiltonian decomposition, grouping, ansatz design, compilation, execution with mitigation, classical optimisation.

What this shows. VQE for \text{H}_2 is a solved problem — every major platform has done it, and the infrastructure (Qiskit-Nature, pennylane-qchem) automates most of the pipeline. It is a useful teaching instance but not a scientifically new result. The interesting question for VQE in 2026 is not "can you do \text{H}_2?" but "can you do \text{N}_2 at stretched bond length on real hardware?" — and the answer is "not yet convincingly."

Example 2: Zero-noise extrapolation for a noisy $\text{H}_2$ run

Setup. You ran the VQE pipeline above but on a machine with a higher-than-usual error rate (10^{-2} per CNOT instead of 10^{-3}). Raw convergence returns E_{\text{raw}} = -1.09\,\text{Ha}, which is 47\,\text{mHa} above the true value, well outside chemical accuracy.

Step 1. Build the noise-folded circuits. Take the converged ansatz U(\theta^*). Create three circuits:

  • U(\theta^*) (scale \lambda = 1)
  • U(\theta^*) with each CNOT replaced by \text{CNOT} \cdot \text{CNOT}^\dagger \cdot \text{CNOT} (scale \lambda = 3)
  • U(\theta^*) with each CNOT replaced by a 5-fold fold (scale \lambda = 5)

Step 2. Measure at each scale. Run each folded circuit, compute E(\lambda) for each.

  • E(1) = -1.09\,\text{Ha}
  • E(3) = -1.06\,\text{Ha}
  • E(5) = -1.03\,\text{Ha}

Why the energy drifts higher with more noise: depolarising noise biases expectation values toward zero; for a Hamiltonian whose ground-state energy is negative, "toward zero" means upward (less negative).

Step 3. Fit linear extrapolation. Linear fit through the three points: E(\lambda) = -1.105 + 0.015 \lambda. Extrapolate: E(0) = -1.105\,\text{Ha}.
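The linear fit can be checked directly with numpy.polyfit on the three measured points:

```python
import numpy as np

# The three folded-circuit energies from the run above.
scales = np.array([1.0, 3.0, 5.0])
energies = np.array([-1.09, -1.06, -1.03])

# Linear extrapolation to lambda = 0: fit E(lambda) = intercept + slope * lambda.
slope, intercept = np.polyfit(scales, energies, 1)
print(intercept)   # approx -1.105, the linearly extrapolated energy
```

The same three points fed to an exponential model give the -1.130\,\text{Ha} figure of the next step; the exponential fit needs a nonlinear solver (e.g. scipy.optimize.curve_fit) rather than polyfit.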

Step 4. Compare fit models. The linear fit may be biased if the noise model is actually exponential. Try an exponential fit E(\lambda) = a + b \cdot e^{c\lambda}: yields E(0) = -1.130\,\text{Ha}.

Step 5. Report. Use the exponential ZNE (usually more accurate than linear for depolarising noise): E_{\text{mitigated}} = -1.130\,\text{Ha}.

Result. ZNE recovered \sim 40\,\text{mHa} of the 47\,\text{mHa} noise bias, bringing the result to within 7\,\text{mHa} of exact — still above chemical accuracy but much closer than raw. Combined with readout calibration (saves another \sim 5\,\text{mHa}) and symmetry verification, a high-quality \text{H}_2 VQE can reach chemical accuracy even on noisy hardware.

What this shows. Error mitigation is not optional for NISQ VQE. The raw hardware cannot deliver chemical accuracy on anything beyond trivial molecules; mitigation is what makes VQE a credible demonstration. The cost of ZNE (3\times to 5\times more shots) is negligible compared to what is saved in accuracy.

Common confusions

"VQE beats classical quantum chemistry"

No — not as of 2026. VQE matches classical CCSD and CCSD(T) on \text{H}_2, \text{LiH}, \text{BeH}_2, \text{H}_4. It is outperformed by classical methods on \text{H}_2\text{O}, \text{N}_2, and every molecule of industrial interest (drug targets, catalysts, materials). The NISQ era is the benchmark-and-validate phase for VQE; the production phase is the fault-tolerant era.

"Shots are free"

Shots are the single dominant cost of VQE. A medium-scale VQE run uses 10^7–10^9 total shots. At a typical IBM pricing of roughly 1.60\,\text{USD} per second of hardware time (2024 pricing), a billion shots at 1 ms each is 10^6 seconds of hardware time — a seven-figure bill at list price, comparable to a large supercomputer allocation, and in practice only feasible through research partnerships. VQE is not cheap; it is only cheap relative to its potential future applications.

"The optimiser finds the global ground state"

It finds a local minimum of the cost landscape. For nontrivial Hamiltonians, the landscape has many minima. Restarts from multiple random initial parameters are standard practice; the best of many restarts is reported. VQE convergence does not guarantee ground-state convergence.

"Noise always hurts"

Usually, yes — but not always monotonically. Small amounts of decoherence can help escape local minima (a form of noise-induced regularisation). On balance noise bias is the dominant effect and mitigation helps.

"Running VQE on simulator and on hardware give similar results"

They can differ dramatically. The simulator is noiseless; hardware has gate errors, readout errors, decoherence, crosstalk. The gap between simulator and hardware is where most of VQE's engineering effort goes: compiler optimisations, ansatz choice, error mitigation. A successful VQE paper is one where the hardware result matches the simulator result; a failed one is where they diverge by tens of \text{mHa}.

The Indian angle

Indian industry and academia are actively running VQE workflows. QpiAI (Bangalore) hosts a full VQE pipeline as part of its drug-discovery platform, running on IBM Quantum Network hardware with in-house error mitigation. TCS Research has published on noise-aware VQE compilation, including ansatz-level error-mitigation strategies, and participates in IBM Quantum Network collaborations. IIT Bombay and IISc Bangalore both have active VQE experimental programmes accessing IBM Quantum hardware. IIT Madras's Centre for Quantum Information, Communication and Computing runs VQE on problems including vibrational-mode-coupling chemistry (a niche but tractable NISQ problem). The National Quantum Mission's applications thrust explicitly funds chemistry-on-NISQ pilot programmes at these institutions. Expect, by 2028, an Indian-owned VQE software stack (parallel to Qiskit) integrated with the NQM hardware hubs at IIT Madras and TIFR.

Going deeper

The rest of this chapter is for readers heading into NISQ-algorithm research: a careful review of error mitigation (asymptotics, bias-variance trade-offs), adaptive shot allocation (allocate more shots to larger-coefficient Pauli terms for better variance per second of hardware time), symmetry-verification post-selection, CAS-SCF orbital optimisation (reducing the effective Hilbert space before VQE runs), VQE for excited states (subspace expansion, VQD), and hardware-aware compilation (ansatz choice driven by the backend's noise map). This is the engineering edge where practical VQE research lives in 2026.

Error mitigation — a closer look

The error-mitigation stack has grown rich: readout calibration, ZNE, PEC, Clifford data regression, virtual distillation, symmetry verification. The order in which you apply them matters. Typical production order:

  1. Symmetry verification (cheapest; discards obviously-wrong shots).
  2. Readout calibration (one-time overhead; large first-order correction).
  3. ZNE (variable cost, 3–5\times shot overhead; moderate-to-large bias correction).
  4. PEC (expensive, can increase variance by orders of magnitude; highest-quality correction).

Cai, Babbush, Benjamin, Endo, Huggins, Li, McClean and O'Brien (2023) wrote the definitive error-mitigation review.

Adaptive shot allocation

If your Hamiltonian is H = \sum_i c_i P_i, the variance of the estimator of \langle H \rangle is \text{Var}[\langle H \rangle] = \sum_i c_i^2 \text{Var}[\langle P_i \rangle]. Since \text{Var}[\langle P_i \rangle] depends on the number of shots for that term, the optimal shot allocation is to allocate shots proportional to |c_i|, not uniformly. For \text{LiH} this gives a \sim 30\% reduction in total shot budget for the same accuracy. Rubin, Babbush and McClean (2018) formalised the analysis.
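A quick comparison of uniform versus |c_i|-proportional allocation, using the worst-case single-Pauli bound \text{Var}[\langle P_i \rangle] \le 1/n_i for n_i shots and hypothetical coefficients with one dominant term, as in small molecules:

```python
import numpy as np

def variance_per_budget(coeffs, total_shots, adaptive=True):
    """Estimator variance of <H> = sum_i c_i <P_i> under a fixed shot budget,
    taking Var[<P_i>] = 1/n_i (worst case for a +/-1-valued Pauli outcome)."""
    c = np.abs(np.asarray(coeffs, dtype=float))
    if adaptive:
        shots = total_shots * c / c.sum()        # n_i proportional to |c_i|
    else:
        shots = np.full(c.size, total_shots / c.size)
    return float(np.sum(c ** 2 / shots))

# Hypothetical coefficients: one dominant term, several small ones.
c = [0.8, 0.2, 0.1, 0.1, 0.05]
uniform = variance_per_budget(c, 1e5, adaptive=False)
adaptive = variance_per_budget(c, 1e5, adaptive=True)
print(uniform, adaptive)   # adaptive allocation gives the smaller variance
```

With proportional allocation the variance collapses to (\sum_i |c_i|)^2 / N_{\text{total}}, which is never worse than the uniform m \sum_i c_i^2 / N_{\text{total}} (by Cauchy-Schwarz) and is much better when the coefficient spectrum is skewed.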

Symmetry verification for post-selection

For an N-electron system, every physical trial state satisfies \langle \hat N \rangle = N. Noise can break this, adding or removing occupations. Measure \hat N alongside H on each shot; discard any shot where the measured \hat N \ne N. Bonet-Monroig, Sagastizabal, Singh and O'Brien (2018) showed this can eliminate \sim 50\% of noise events at no extra circuit cost. Commonly used in production.

VQE for excited states

VQE as stated finds the ground state. For excited states, use quantum subspace expansion (compute the matrix elements of H in a subspace spanned by the ground state and a few extra states, diagonalise classically) or variational quantum deflation (VQD) (run VQE again with an extra cost term penalising overlap with the already-found states).

CAS-SCF orbital optimisation

For chemistry, you can reduce the effective Hilbert space before VQE runs by identifying the "active" orbitals (those where correlation matters most) and freezing the rest. This is complete active space self-consistent field (CAS-SCF) — a classical pre-processing step. A 20-qubit VQE becomes a 6-qubit VQE after good active-space selection; the reduced calculation is faster and more accurate. Every production VQE uses some form of active-space reduction.

Hardware-aware compilation

The best ansatz depends on the hardware's specific noise map. Qubits with high T_1 and low CNOT error host data qubits; high-error qubits host ancillae or are avoided. Compilers like mthree, noise-aware-transpile, and commercial backends (Quantinuum's TKET) do this routing.


References

  1. Abhinav Kandala et al., Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets (Nature, 2017) — arXiv:1704.05018.
  2. Alberto Peruzzo et al., A variational eigenvalue solver on a photonic quantum processor (Nature Communications, 2014) — arXiv:1304.3061.
  3. M. Cerezo et al., Variational Quantum Algorithms (Nature Reviews Physics, 2021) — arXiv:2012.09265.
  4. Abhinav Kandala et al., Error mitigation extends the computational reach of a noisy quantum processor (Nature, 2019) — arXiv:1805.04492.
  5. Qiskit Textbook, Variational Quantum Eigensolver — official tutorial.
  6. Wikipedia, Variational Quantum Eigensolver.