In short
A VQE ansatz is the parameterised quantum circuit U(\theta) that defines your trial states |\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes n}. It is the "model" the quantum computer fits, exactly like a neural-network architecture is the model the classical computer fits in deep learning — the optimiser tunes parameters, the ansatz defines what is tunable. Four families dominate: Unitary Coupled Cluster (UCCSD), adapted from classical quantum chemistry, physically motivated, deep, accurate but NISQ-hostile; hardware-efficient (HEA), a layered stack of native single-qubit rotations and native two-qubit entanglers, shallow and NISQ-friendly, expressive but prone to barren plateaus where gradients vanish exponentially in qubit count; adaptive ansätze (ADAPT-VQE), which grow the circuit one operator at a time from a pool, stopping when gradients saturate — compact by construction; and Hamiltonian Variational / problem-inspired ansätze, which bake in the structure of the Hamiltonian itself. The design trilemma is the whole story: expressiveness (can the ansatz reach the ground state?), trainability (does the cost landscape have a useful gradient?), and noise resilience (can the circuit survive NISQ gate errors?). Pick any two — the third is the price. Symmetry-preserving ansätze (those that conserve particle number, spin, parity) outperform generic ones on chemistry. Deeper is not automatically better: the barren-plateau theorem (McClean et al. 2018) says the cost variance on random deep circuits is \mathcal{O}(2^{-n}), exponentially flat. Good ansatz design is the central craft of VQE.
You have met the variational loop: a short parameterised circuit U(\theta) on the quantum side, a classical optimiser on the other side, a cost E(\theta) = \langle \psi(\theta) | H | \psi(\theta) \rangle passing between them. The loop is the skeleton. The ansatz is the muscle. Every substantive question about a VQE run — will it find the ground state? how many parameters does it need? will the optimiser get stuck? does the hardware have the depth for it? — reduces to a question about the ansatz.
Think of the ansatz the way you would think of a machine-learning model's architecture. A linear regression cannot learn a curve, no matter how much data you throw at it. A 20-layer transformer on a two-number dataset will memorise and plateau. The model has to match the problem. In VQE, the ansatz is the model, and matching it to a chemistry Hamiltonian — or a materials-science Hamiltonian, or a combinatorial-optimisation Hamiltonian — is the whole creative act.
This chapter is the tour of the four families of VQE ansätze that the field has converged on, the trade-offs between them, and the deep trouble at the bottom of the well: barren plateaus, the phenomenon that puts a cap on how deep any random-looking ansatz can go.
The ansatz is the quantum ML model
Before we compare families, it helps to be blunt about what an ansatz is. An ansatz U(\theta) is a function from a vector of classical real numbers \theta \in \mathbb{R}^p to a unitary matrix acting on n qubits. The parameter count p is typically tens to low thousands. When you fix \theta, the circuit becomes a specific quantum program. When you vary \theta, the prepared state |\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes n} traces out a parameterised manifold inside the 2^n-dimensional Hilbert space of n qubits.
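The "function from parameters to a state" picture can be made concrete in a few lines. The following is a minimal numpy sketch (our own toy, not a library API): a 2-qubit ansatz with one R_y per qubit followed by a CNOT, viewed as a plain function from \theta \in \mathbb{R}^2 to a state vector.

```python
import numpy as np

def ry(theta):
    # single-qubit Ry rotation matrix (real entries)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

def ansatz_state(theta):
    """|psi(theta)> = CNOT (Ry(t0) x Ry(t1)) |00> -- a 2-parameter
    manifold inside the 4-dimensional Hilbert space of 2 qubits."""
    layer = np.kron(ry(theta[0]), ry(theta[1]))
    zero = np.zeros(4)
    zero[0] = 1.0
    return CNOT @ layer @ zero

psi = ansatz_state([0.3, 0.7])
print(np.linalg.norm(psi))  # unitaries preserve norm: 1.0
```

Fixing \theta gives one program; sweeping \theta traces the manifold the optimiser searches.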
Three things are then true and you should carry them with you.
First: the ansatz defines a search space. The VQE optimiser can only find ground states that live inside the manifold the ansatz spans. If the true ground state is not reachable by any setting of \theta, the best VQE can return is whatever point on the manifold gets closest. The gap is called the ansatz error and it is an irreducible bias of your choice.
Second: the ansatz defines a cost landscape. The classical optimiser navigates E(\theta) as a function of \theta. Some ansätze give landscapes with meaningful gradients almost everywhere; some give landscapes that are flat as a chapati for exponentially many directions. The shape of the landscape is what barren plateaus are about.
Third: the ansatz defines a gate count. Every parameter is a gate. Every extra gate is another chance for NISQ noise to corrupt the state. A 1000-parameter ansatz on a machine with a 10^{-3} gate error has roughly a 63\% chance of at least one error per shot — so the state you prepared is, most of the time, not the state you asked for.
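The 63% figure is a one-line calculation worth checking yourself:

```python
# Probability of at least one gate error in a shot: 1 - (1 - p)^G,
# with per-gate error rate p and G gates (here, one gate per parameter).
p, G = 1e-3, 1000
p_any_error = 1 - (1 - p) ** G
print(round(p_any_error, 3))  # ~0.632, i.e. roughly 63%
```

For small p this is approximately 1 - e^{-pG}, which is why the answer hovers near 1 - 1/e.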
The trilemma of expressiveness, trainability, and noise resilience is why this is not a solved problem. Every ansatz family stakes out a position in the trilemma, and the right one depends on your hardware, your Hamiltonian, and how much optimisation budget you have.
Unitary Coupled Cluster (UCCSD) — chemistry's favourite ansatz
The Unitary Coupled Cluster ansatz with Singles and Doubles (UCCSD) is the ansatz every chemist reaches for first, because it is a direct quantum-circuit translation of the best classical quantum-chemistry method short of full configuration interaction.
The idea in words
Classical coupled cluster says: start from a mean-field reference state (Hartree-Fock), then dress it with electron excitations. An electron hopping from an occupied orbital i to a virtual orbital a is a single excitation T_1. Two electrons swapping simultaneously is a double excitation T_2. The full cluster operator is T = T_1 + T_2 + \ldots, and the coupled-cluster state is e^T |\psi_{\text{HF}}\rangle. Classically this works extremely well — CCSD(T) is the "gold standard" for small molecules — but e^T is not unitary, so a quantum computer cannot implement it directly.
Fix: use e^{T - T^\dagger} instead. This is unitary (the generator T - T^\dagger is anti-Hermitian), and it reduces to e^T in the limit of small amplitudes. This is the Unitary Coupled Cluster ansatz:

|\psi(\theta)\rangle = e^{T(\theta) - T^\dagger(\theta)} |\psi_{\text{HF}}\rangle
where T(\theta) = \sum_{ia} \theta_{ia} \, a_a^\dagger a_i + \sum_{ijab} \theta_{ijab} \, a_a^\dagger a_b^\dagger a_j a_i packs the single and double excitation amplitudes into a parameter vector.
Why it is physically motivated
The reason UCCSD is the natural chemistry ansatz is that electronic ground states are mostly Hartree-Fock plus small correlation corrections. The HF reference captures the mean field; the T_1 and T_2 operators capture the deviations. For a molecule with N spin-orbitals and N_e electrons, the number of independent T_1 parameters is \mathcal{O}(N_e (N - N_e)) and the number of independent T_2 is \mathcal{O}(N_e^2 (N - N_e)^2) — so the parameter count scales as \mathcal{O}(N^4) in the worst case.
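The scaling of these counts is easy to tabulate. A hedged sketch (our own helper, counting raw amplitudes before any spin-symmetry reduction — real implementations prune further):

```python
from math import comb

def uccsd_param_count(n_spin_orbitals, n_electrons):
    """Raw singles and doubles amplitude counts for UCCSD,
    before spin-symmetry reduction."""
    occ = n_electrons
    virt = n_spin_orbitals - n_electrons
    singles = occ * virt                       # O(N_e (N - N_e))
    doubles = comb(occ, 2) * comb(virt, 2)     # occupied pairs x virtual pairs
    return singles, doubles

# LiH in STO-3G: 12 spin-orbitals, 4 electrons
s, d = uccsd_param_count(12, 4)
print(s, d)  # 32 singles, 168 doubles
```

The doubles term dominates, and its fourth-power growth in N is what makes UCCSD circuits deep.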
Why it is NISQ-hostile
Implementing e^{T - T^\dagger} on a quantum computer requires Trotterisation, because T - T^\dagger is a sum of non-commuting fermionic operators. After Jordan-Wigner (or Bravyi-Kitaev) mapping, each excitation operator becomes a string of Pauli operators, and each Trotter step of e^{i\theta P} for a length-k Pauli string P takes roughly 2k CNOT gates. For a modest molecule like \text{LiH} (12 spin-orbitals, 4 electrons), the UCCSD circuit has hundreds of Trotter terms and tens of thousands of CNOTs — well beyond what any NISQ machine can execute coherently.
Symmetry preservation — UCCSD's killer feature
One reason UCCSD is so accurate is that it preserves the symmetries of the electronic Hamiltonian by construction: particle number (total electron count), spin (S^2, S_z), and parity. The trial state e^{T - T^\dagger}|\psi_{\text{HF}}\rangle lives in the same symmetry sector as the Hartree-Fock state, and since the ground state also lives in that sector, the search never wastes amplitude on unphysical states.
Hardware-efficient ansätze do not preserve these symmetries, and the loss shows up as worse accuracy at matched depth.
Hardware-efficient ansatz (HEA) — shallow, native, plateau-prone
The hardware-efficient ansatz takes the opposite philosophy: forget the physics, embrace the hardware. Build a circuit that uses exactly the gates the quantum computer does well natively (single-qubit rotations, one specific two-qubit entangler), layer them cheaply, and hope that the parameterised family is expressive enough to contain a good ground-state approximation.
The structure
A typical HEA layer does two things:
- Single-qubit rotations on every qubit, each with its own parameter. The common choice is R_y(\theta) because its matrix is real: starting from |0\rangle, circuits of R_y and CNOT gates stay within the real-amplitude subspace, and with enough layers any real-amplitude state can be reached.
- Entangling block: a fixed pattern of CNOTs (or CZs, or iSWAPs, depending on hardware native gates). Common patterns: linear (CNOT between neighbours), ladder (CNOTs in a staircase), all-to-all (every pair, when connectivity allows).
Stack L such layers. With n qubits and L layers and one R_y per qubit per layer, the parameter count is nL — tens to low hundreds in practice. The depth is L times a constant (whatever a layer costs), so you can tune depth directly.
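The layered structure can be sketched directly as a small statevector simulation. This is our own minimal numpy implementation (not a library API), following the nL parameter count above: one R_y per qubit per layer, then a linear CNOT ladder.

```python
import numpy as np

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(psi, gate, q, n):
    # apply a single-qubit gate to qubit q of an n-qubit statevector
    psi = psi.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def apply_cnot(psi, ctrl, targ, n):
    # flip the target axis on the control = 1 slice
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[ctrl] = 1
    psi[tuple(idx)] = np.flip(psi[tuple(idx)],
                              axis=targ - 1 if targ > ctrl else targ)
    return psi.reshape(-1)

def hea_state(theta, n, L):
    """theta has shape (L, n): one Ry angle per qubit per layer -> n*L params."""
    psi = np.zeros(2 ** n)
    psi[0] = 1.0
    for l in range(L):
        for q in range(n):
            psi = apply_1q(psi, ry(theta[l, q]), q, n)
        for q in range(n - 1):            # linear CNOT ladder, no parameters
            psi = apply_cnot(psi, q, q + 1, n)
    return psi

theta = np.zeros((3, 4))               # n = 4 qubits, L = 3 layers
psi = hea_state(theta, 4, 3)
print(theta.size, abs(psi[0]))         # 12 parameters; zero angles give |0000>
```

Tuning L tunes both the parameter count and the depth, which is exactly the knob the HEA philosophy exposes.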
The appeal
HEA is NISQ-native. Every gate it uses is one the quantum computer implements natively. No Trotterisation, no Pauli decompositions, no fermion-to-qubit mappings on the ansatz side. The circuit is as shallow as you dare make it. Gradients can be estimated with the parameter-shift rule cleanly because each parameter appears in a single rotation gate.
The catch — expressiveness without guardrails
HEA is highly expressive at large L: with enough layers, the parameterised family becomes dense in the Hilbert space and can reach any state. But expressiveness without physical guardrails means the optimiser searches across states that are fundamentally unphysical for the problem — states with wrong particle number, wrong spin, wrong parity. For chemistry, this is a huge fraction of Hilbert space that the optimiser has to rule out before it narrows in on the physical ground state.
The deep problem — barren plateaus
And then there is the really bad news. In 2018, McClean, Boixo, Smelyanskiy, Babbush and Neven proved that random parameterised circuits with enough layers to be approximately 2-designs have cost-gradient variance that decays exponentially in qubit count:

\mathrm{Var}_\theta \left[ \partial_{\theta_k} E(\theta) \right] \in \mathcal{O}(2^{-n})
Why this matters: the variance is how much the gradient fluctuates around its mean (which is zero when averaged over circuits deep enough to approximate a 2-design). If the variance is exponentially small in n, then a typical sample of \theta gives a gradient that is exponentially close to zero — distinguishable from zero only by taking exponentially many shots.
This is a barren plateau: the cost landscape becomes flat everywhere, and the optimiser loses signal. Deep HEA on large n essentially cannot be trained.
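The effect is visible even in a toy numerical experiment. The following is our own illustration (not from the McClean paper, and far from a true 2-design): sample random R_y-plus-CNOT circuits, estimate one partial derivative of a global cost E = \langle Z \otimes \cdots \otimes Z \rangle with the parameter-shift rule, and watch its variance shrink as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(psi, gate, q, n):
    psi = psi.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def apply_cnot(psi, c, t, n):
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[c] = 1                        # control = 1 slice
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t - 1 if t > c else t)
    return psi.reshape(-1)

def energy(theta, n, L):
    """E = <psi(theta)| Z x ... x Z |psi(theta)> for Ry layers + CNOT ladders."""
    psi = np.zeros(2 ** n)
    psi[0] = 1.0
    for layer in range(L):
        for q in range(n):
            psi = apply_1q(psi, ry(theta[layer, q]), q, n)
        for q in range(n - 1):
            psi = apply_cnot(psi, q, q + 1, n)
    signs = (-1.0) ** np.array([bin(i).count("1") for i in range(2 ** n)])
    return float(np.sum(signs * np.abs(psi) ** 2))

variances = {}
for n in (2, 4, 6):
    L, grads = 2 * n, []
    for _ in range(200):
        theta = rng.uniform(0, 2 * np.pi, size=(L, n))
        plus, minus = theta.copy(), theta.copy()
        plus[0, 0] += np.pi / 2
        minus[0, 0] -= np.pi / 2
        # parameter-shift rule for Ry: dE/dt = (E(t + pi/2) - E(t - pi/2)) / 2
        grads.append(0.5 * (energy(plus, n, L) - energy(minus, n, L)))
    variances[n] = float(np.var(grads))
    print(n, variances[n])
```

Even at these tiny sizes the variance falls steeply with n; on real deep circuits the decay is the exponential one in the theorem.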
Mitigations
This is an active research topic. The main escape routes:
- Identity initialisation. Start \theta = 0 so the circuit is the identity; gradients near identity are structurally non-zero and the optimiser can crawl away.
- Layer-wise training. Freeze all layers except one, train that one, then unfreeze the next. Each sub-optimisation is in a smaller plateau-free regime.
- Local cost functions. Measuring \langle Z_1 Z_2 \rangle instead of the full Hamiltonian \langle H \rangle often retains gradient signal for shallower depth (Cerezo et al. 2020).
- Symmetry-preserving HEA. Add projectors or use a restricted gate set that preserves conserved quantities; this excludes the "random unitary" regime where plateaus live.
- Move to a structured ansatz. UCCSD, ADAPT, Hamiltonian-variational.
Adaptive ansätze (ADAPT-VQE and friends)
The adaptive family is the clever kid in the room. Instead of picking an ansatz and committing, grow one operator at a time from a pool — and only keep growing as long as the gradient says the extra parameter helps.
The ADAPT-VQE procedure
Grimsley, Economou, Barnes and Mayhall introduced ADAPT-VQE in 2019. The algorithm is this loop:
- Start from the Hartree-Fock reference, no parameters.
- Pick an operator pool \{A_i\} — a fixed list of candidate excitation operators (typically UCCSD-type fermionic excitations, or spin-adapted equivalents).
- For every operator A_i in the pool, measure the gradient of the energy with respect to a hypothetical new parameter attached to that operator: g_i = \partial_\theta \langle \psi | e^{-\theta A_i} H e^{\theta A_i} | \psi \rangle \Big|_{\theta=0} = \langle \psi | [H, A_i] | \psi \rangle.
- Find the operator with the largest |g_i|. This is the operator that gives the steepest descent if we added it next.
- Add e^{\theta_{k+1} A_{\text{max}}} to the ansatz with a fresh parameter.
- Re-optimise all parameters from scratch (or incrementally).
- Stop when the maximum gradient \max_i |g_i| falls below a threshold \epsilon. The ansatz is "done".
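The loop above can be sketched end to end on a toy problem. This is a hedged illustration on an invented 2-qubit Hamiltonian (not a molecular one), with a 4-operator Pauli pool, exact statevectors, and crude coordinate gradient descent standing in for a real optimiser:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
kron = np.kron

H = kron(Z, Z) + 0.9 * kron(X, I2) + 0.9 * kron(I2, X)  # toy Hamiltonian
# anti-Hermitian pool: i * (Hermitian Pauli string)
pool = [1j * kron(Y, I2), 1j * kron(I2, Y), 1j * kron(X, Y), 1j * kron(Y, X)]

def state(ops, thetas):
    psi = np.zeros(4, dtype=complex)
    psi[0] = 1.0
    for A, t in zip(ops, thetas):
        # A = i*P with P^2 = I, so A^2 = -I and exp(t*A) = cos(t) I + sin(t) A
        psi = (np.cos(t) * np.eye(4) + np.sin(t) * A) @ psi
    return psi

def energy(ops, thetas):
    psi = state(ops, thetas)
    return float(np.real(psi.conj() @ H @ psi))

ops, thetas = [], []
for step in range(6):                       # grow up to 6 operators
    psi = state(ops, thetas)
    # commutator gradients g_i = <psi|[H, A_i]|psi> for every pool member
    grads = [abs(np.real(psi.conj() @ (H @ A - A @ H) @ psi)) for A in pool]
    if max(grads) < 1e-3:                   # gradient threshold: ansatz is done
        break
    ops.append(pool[int(np.argmax(grads))])
    thetas.append(0.0)
    for _ in range(300):                    # crude full re-optimisation
        for k in range(len(thetas)):
            eps = 1e-4
            tp, tm = thetas.copy(), thetas.copy()
            tp[k] += eps
            tm[k] -= eps
            g = (energy(ops, tp) - energy(ops, tm)) / (2 * eps)
            thetas[k] -= 0.1 * g

print(len(ops), round(energy(ops, thetas), 3))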
The beauty: the ansatz grows to fit the problem. For a simple ground state (say, \text{H}_2 near equilibrium), ADAPT-VQE terminates after a handful of operators. For a strongly-correlated state where HF is a bad start, it grows more.
Why adaptive ansätze sidestep barren plateaus
Each step of ADAPT-VQE commits to an operator specifically because its gradient is large. The next step's landscape, restricted to the chosen operators, is by construction not in the random-unitary regime — every direction was picked for gradient signal. You never reach the 2-design regime that McClean's theorem needs.
Cost
The price of ADAPT-VQE is measurement overhead: at every growth step, you must estimate gradients against the entire operator pool, which can have \mathcal{O}(N^2) to \mathcal{O}(N^4) members. For small molecules this is fine; for larger ones, the pre-selection becomes a bottleneck.
Variants
Since 2019, the ADAPT family has multiplied. qubit-ADAPT uses a pool of qubit-Pauli operators (cheaper to evaluate). ADAPT-QAOA brings the idea to optimisation. Iterative Qubit Coupled Cluster (iQCC) is a closely related method developed in parallel. TETRIS-ADAPT-VQE adds multiple operators per growth step.
The k-UpCCGSD family — UCC, but not so deep
Sitting between full UCCSD and hardware-efficient is the k-Unitary Pair Coupled-Cluster with Generalized Singles and Doubles (k-UpCCGSD) family. It keeps the UCC flavour — fermionic excitation operators, symmetry preservation, physical meaning — but restricts to pair doubles (a pair of electrons moving as a pair, not independent doubles) and allows k layers of the same structure. k=3 or k=4 tends to match full UCCSD for small molecules at a fraction of the depth.
For a class-11 reader: you can think of k-UpCCGSD as "UCC with a budget." It is what you use when full UCCSD would compile to too many gates for your NISQ machine but you still want physics-motivated structure.
Hamiltonian Variational Ansatz (HVA) — bake the problem into the circuit
For lattice problems — spin models, the Hubbard model, condensed-matter Hamiltonians — there is a third option: the Hamiltonian Variational Ansatz, introduced by Wecker et al. and formalised by others.
The idea: if your Hamiltonian has the form H = H_1 + H_2 + \ldots + H_k (a sum of local terms), then parameterise the ansatz by alternating exponentials of each term:

|\psi(\theta)\rangle = \prod_{l=1}^{p} \left( e^{-i \theta_{l,k} H_k} \cdots e^{-i \theta_{l,2} H_2} \, e^{-i \theta_{l,1} H_1} \right) |\psi_0\rangle
Each e^{-i \theta H_j} is a short local unitary because H_j is local. The structure matches the Hamiltonian exactly. At p = 1 it is shallow; at p \to \infty it converges to the adiabatic algorithm applied to H starting from a simple reference.
QAOA is a special case of HVA, with H_P and a mixer H_M as the only two terms. So is the 2-local circuit used for the transverse-field Ising model in VQE benchmarks.
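A one-layer HVA for a small transverse-field Ising chain fits in a few lines. This is our own toy sketch (Hamiltonian H = -\sum Z_i Z_{i+1} - g \sum X_i is an assumed instance): the ZZ term is diagonal, so its exponential is a phase mask, and the X term is a product of single-qubit R_x rotations — exactly the QAOA structure.

```python
import numpy as np

n, g = 4, 1.0

def zz_diag(n):
    # diagonal of sum_i Z_i Z_{i+1} in the computational basis
    d = np.zeros(2 ** n)
    for b in range(2 ** n):
        bits = [(b >> (n - 1 - q)) & 1 for q in range(n)]
        d[b] = sum((1 - 2 * bits[q]) * (1 - 2 * bits[q + 1])
                   for q in range(n - 1))
    return d

def rx(t):
    # Rx(t) = exp(-i t X / 2)
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def hva_state(gammas, betas):
    # start in |+...+>, the ground state of -sum X_i (the g -> infinity limit)
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for gam, bet in zip(gammas, betas):
        psi = np.exp(1j * gam * zz_diag(n)) * psi   # exp(-i*gam*(-sum ZZ))
        U = rx(2 * bet)                             # exp(-i*bet*X) on each qubit
        for q in range(n):
            psi = psi.reshape([2] * n)
            psi = np.moveaxis(np.tensordot(U, psi, axes=([1], [q])), 0, q)
            psi = psi.reshape(-1)
    return psi

psi = hva_state([0.2], [0.3])             # p = 1: one (gamma, beta) pair
print(round(float(np.linalg.norm(psi)), 6))  # 1.0
```

Stacking more (\gamma, \beta) pairs gives p > 1 layers, which is the sense in which HVA interpolates towards the adiabatic algorithm.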
The design trade-offs — a consolidated view
Now that you have the four families, here is a consolidated comparison:
| Ansatz family | Expressiveness | Trainability | Noise resilience | Typical depth | Best for |
|---|---|---|---|---|---|
| UCCSD | High (all 1- and 2-electron excitations) | Good (physical landscape) | Poor (deep) | $\mathcal{O}(N^4)$ gates | Small molecules, simulators |
| Hardware-efficient | Very high at large $L$ | Collapses past threshold (plateaus) | Excellent at shallow $L$ | $\mathcal{O}(nL)$ | Quick benchmarks, small $n$ |
| ADAPT-VQE | Exactly as high as needed | Excellent (plateau-free by construction) | Good (depth stops growing) | Problem-dependent, compact | Larger molecules, real hardware |
| HVA / QAOA | Moderate, converges at $p \to \infty$ | Moderate (depends on $H$) | Good at small $p$ | $\mathcal{O}(pk)$ where $k$ = Hamiltonian terms | Lattice models, combinatorial |
| k-UpCCGSD | High (tuneable by $k$) | Good | Moderate | $\mathcal{O}(kN^2)$ | Mid-sized molecules |
None of these dominates. The right choice is the one that matches your Hamiltonian's structure, your hardware's depth budget, and your patience for optimisation.
Worked examples
Example 1: Full UCCSD for $\text{H}_2$, step by step
Setup. The hydrogen molecule \text{H}_2 at equilibrium bond length 0.74\,\text{Å}, in the minimal STO-3G basis. Four spin-orbitals, two electrons. After Jordan-Wigner mapping, this sits on 4 qubits. The Hartree-Fock reference is |\psi_{\text{HF}}\rangle = |0011\rangle (electrons in the two lowest spin-orbitals). Why |0011\rangle and not |1100\rangle: Jordan-Wigner convention here puts the occupied spin-orbitals on the right, unoccupied on the left. Some conventions flip this; the physics is identical.
Step 1. Enumerate excitations. From |0011\rangle, the one-electron excitations ("singles") move one electron from occupied spin-orbital \{0, 1\} to virtual spin-orbital \{2, 3\}. By spin symmetry (alpha stays alpha, beta stays beta), only two singles are symmetry-allowed: 0 \to 2 and 1 \to 3. Why: spin-orbital 0 is alpha, 2 is alpha; 1 is beta, 3 is beta. A single excitation cannot flip spin without a spin-symmetry-breaking operator. Two-electron excitations ("doubles") promote both electrons: the only symmetry-allowed one is (0,1) \to (2,3).
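The enumeration in Step 1 is mechanical enough to script. A small sketch, assuming the labelling convention of the text (even indices alpha, odd indices beta):

```python
# Spin-conserving excitations from |0011> (occupied spin-orbitals 0, 1).
occupied, virtual = [0, 1], [2, 3]

singles = [(i, a) for i in occupied for a in virtual
           if i % 2 == a % 2]                      # alpha->alpha, beta->beta only
doubles = [((i, j), (a, b))
           for i in occupied for j in occupied if i < j
           for a in virtual for b in virtual if a < b
           if (i % 2 + j % 2) == (a % 2 + b % 2)]  # total S_z conserved

print(singles)   # [(0, 2), (1, 3)]
print(doubles)   # [((0, 1), (2, 3))]
```

The spin filter is what cuts four candidate singles down to two, and four candidate doubles down to one.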
Step 2. Write the cluster operator.

T(\theta) = \theta_1 \left( a_2^\dagger a_0 + a_3^\dagger a_1 \right) + \theta_2 \, a_2^\dagger a_3^\dagger a_1 a_0
Why combine the two singles into one parameter: closed-shell spin symmetry enforces that the two singles have equal amplitudes — otherwise the state would not be a spin singlet. So one \theta_1 for both.
Step 3. Form the unitary ansatz.

U(\theta) = e^{T(\theta) - T^\dagger(\theta)}
On a quantum computer this must be Trotterised. For two parameters and small \theta, first-order Trotter gives:

U(\theta) \approx e^{\theta_1 \left( a_2^\dagger a_0 + a_3^\dagger a_1 - \text{h.c.} \right)} \, e^{\theta_2 \left( a_2^\dagger a_3^\dagger a_1 a_0 - \text{h.c.} \right)}
Step 4. Compile to Pauli gates. Under Jordan-Wigner, a_j = \frac{1}{2}(X_j + iY_j) Z_{j-1} Z_{j-2} \ldots Z_0. Substituting and simplifying (many pages of algebra that a Qiskit transpiler does automatically), each fermionic exponential becomes a product of Pauli-string exponentials e^{-i\phi P}. For \text{H}_2 this yields about 8 CNOT gates total.
Step 5. Run VQE. Prepare HF, apply U(\theta), measure the 15 Pauli terms of the Hamiltonian, combine to E(\theta), feed to classical optimiser (say COBYLA), iterate.
Result. Converges to E^* = -1.137 \,\text{Ha}, matching the exact diagonalisation to within 10^{-6}\,\text{Ha}. The two parameters (in some conventions the T_1 is split into two, giving three; in ours it is one) suffice to reach chemical accuracy.
What this shows. UCCSD for \text{H}_2 is essentially the smallest non-trivial VQE calculation. Every NISQ platform has done it. The physics-motivated ansatz finds the ground state precisely because it was designed to.
Example 2: Hardware-efficient ansatz for 4 qubits, 3 layers — counting parameters
Setup. Same 4-qubit Hilbert space as \text{H}_2, but this time we use a hardware-efficient ansatz with 3 layers of R_y-plus-linear-CNOTs:

U(\theta) = \prod_{l=1}^{3} \left[ \Big( \bigotimes_{q=0}^{3} R_y(\theta_{l,q}) \Big) \, U_{\text{ent}} \right] \bigotimes_{q=0}^{3} R_y(\theta_{0,q})
Step 1. Count the rotations. There is an initial layer of R_y on each of the 4 qubits (4 parameters), then 3 layers of 4 R_y gates each (12 parameters). That is 16 parameters total. (If the problem specification said "12" it was counting only the 3 post-entangler layers and omitting the initial layer — a common convention. We adopt the literal count here.) Why: each R_y gate has one angle. Four qubits × four layer-passes (1 initial + 3 layers) = 16 rotation angles.
Step 2. Count the entanglers. Each U_{\text{ent}} is a linear ladder of 3 CNOTs: \text{CNOT}_{0,1}, \text{CNOT}_{1,2}, \text{CNOT}_{2,3}. With 3 layers, that is 9 CNOTs total. No parameters on the entanglers.
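Steps 1 and 2 generalise to any n and L; a quick sanity-check helper (our own, matching the counting conventions above):

```python
def hea_counts(n, L):
    """Parameter and CNOT counts for: initial Ry layer, then
    L layers of (linear CNOT ladder + Ry layer)."""
    params = n * (L + 1)   # one Ry angle per qubit per rotation layer
    cnots = (n - 1) * L    # linear ladder: n - 1 CNOTs per entangling block
    return params, cnots

print(hea_counts(4, 3))  # (16, 9)
```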
Step 3. Write the circuit.
|0⟩─[Ry]─●──────────[Ry]─●──────────[Ry]─●──────────[Ry]─
|0⟩─[Ry]─⊕──●───────[Ry]─⊕──●───────[Ry]─⊕──●───────[Ry]─
|0⟩─[Ry]────⊕──●────[Ry]────⊕──●────[Ry]────⊕──●────[Ry]─
|0⟩─[Ry]───────⊕────[Ry]───────⊕────[Ry]───────⊕────[Ry]─
Step 4. Assess expressiveness. With 16 real parameters, the ansatz traces out a manifold inside the real-amplitude subspace of the 4-qubit Hilbert space, which has 2^4 = 16 real dimensions (15 on the unit sphere, after normalisation). In principle, it can represent any real-amplitude 4-qubit state. In practice, finding the right \theta for a chemistry problem is hard because the landscape has no physical structure.
Step 5. Assess noise cost. 9 CNOTs. At an error rate of 10^{-3} per two-qubit gate, the probability of at least one error per shot is \approx 1 - (1-10^{-3})^9 \approx 0.9\%. Manageable on modern hardware.
Result. The HEA for \text{H}_2 at L = 3 reaches chemical accuracy for the equilibrium geometry but may miss it by \sim 1 mHa at stretched bond lengths (strong-correlation regime), where the ground state is farther from any HF-like real-amplitude state and the lack of symmetry preservation starts to hurt.
What this shows. HEA is structurally straightforward — no chemistry, no Trotter decompositions, just layered hardware gates. Its parameters are "bare" directions in Hilbert space; they do not correspond to physical excitations. That is the blessing and the curse.
Common confusions
"Is the ansatz the same as the trial wavefunction?"
Yes — the ansatz U(\theta) defines the parameterised family of trial wavefunctions |\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes n}. The circuit and the state family are two sides of the same coin: the circuit is a program; the state family is what running that program for different \theta produces.
"Deeper ansatz is always better"
No. Depth increases expressiveness but also noise exposure and plateau risk. Past a certain depth — set by your hardware's coherence, the barren-plateau threshold, and the ground-state's entanglement structure — adding more layers makes performance worse. The sweet spot is an empirical question for each problem.
"UCCSD is optimal for chemistry"
UCCSD is physically motivated and gives the best chemistry ansatz in the limit of deep, noise-free simulation. On NISQ hardware, it is often not the best choice because the circuit is too deep to run coherently. Adaptive and hardware-efficient ansätze often beat UCCSD on real hardware, even though UCCSD would beat them on a simulator. The ansatz that wins depends on the interaction of the Hamiltonian, the optimiser, and the hardware.
"Barren plateaus only affect hardware-efficient ansätze"
The McClean theorem is worst for deep random circuits, which HEA most directly matches. But structured ansätze can also hit plateau-like phenomena: large UCCSD on many qubits, QAOA at very high p, any ansatz whose cost function has exponentially many near-equivalent local minima. Plateaus are more nuanced than "only HEA."
"More parameters is always better"
No. More parameters expand expressiveness but also shot cost (you need to estimate more gradients), classical optimiser time, and noise per shot. The right question is: what is the smallest ansatz that reaches chemical accuracy? ADAPT-VQE is explicitly designed around that question.
The Indian angle
Ansatz design for NISQ chemistry is a specific research focus at IIT Madras's quantum algorithms group, which has published on qubit-adaptive ansätze for small molecules. IISc Bangalore hosts the Quantum Information and Computing group led by researchers who publish regularly on symmetry-preserving ansätze. QpiAI (Bangalore) develops VQE workflows for their drug-discovery platform, where ansatz selection is a product-level decision made per target. Indian pharma (Dr. Reddy's, Biocon, Sun Pharma) exploring NISQ chemistry for lead molecule screening will interact directly with these ansatz choices: should we use UCCSD on a 100-qubit machine? Should we use ADAPT-VQE and pay the measurement overhead? These are not academic questions; they are commercial infrastructure decisions. The NQM's 2030 roadmap explicitly funds ansatz research under its "algorithms and applications" thrust, with participating groups at TIFR, IIT Bombay, and the NQM-aligned startups.
Going deeper
The rest of this chapter takes up the formal side: the precise statement and proof sketch of the barren-plateau theorem, noise-induced barren plateaus, expressibility and entangling power as ansatz metrics, ADAPT-VQE's convergence guarantees, and the quantum natural gradient's role in ansatz optimisation. This is the material for readers heading into NISQ-algorithm research; a class-11 reader who wants the big picture can stop here.
Formal UCC — from classical coupled cluster to the quantum ansatz
Classical coupled cluster (CC) writes the exact wavefunction as |\psi_{\text{exact}}\rangle = e^{T} |\psi_{\text{HF}}\rangle where T = T_1 + T_2 + T_3 + \ldots is the full cluster operator. CCSD truncates to T = T_1 + T_2. Because e^T is not unitary (it is exponential of a non-anti-Hermitian operator), classical CC is a non-variational method with a non-Hermitian effective Hamiltonian \bar H = e^{-T} H e^{T}. CCSD(T) adds triples perturbatively.
The unitary version, e^{T - T^\dagger}, is forced by the quantum-circuit unitarity requirement. It is variational (because the state is unit-norm) but less accurate per parameter than classical CC because e^{T-T^\dagger} couples excitations and de-excitations together. In the small-amplitude limit the two agree. For most chemistry ground states near equilibrium the amplitudes are indeed small and UCCSD matches CCSD; for strongly-correlated (stretched-bond, multi-reference) regimes it can underperform.
The McClean et al. barren-plateau theorem
Statement. Let U(\theta) = \prod_{k=1}^{L} V_k e^{-i\theta_k H_k} where each V_k is drawn from a unitary 2-design on a sub-block of qubits at depth L large enough that the full circuit approximates a 2-design on n qubits. Let O be a traceless observable. Then

\mathbb{E}_\theta \left[ \partial_{\theta_k} \langle O \rangle \right] = 0, \qquad \mathrm{Var}_\theta \left[ \partial_{\theta_k} \langle O \rangle \right] \in \mathcal{O}\!\left( \frac{\mathrm{Tr}(O^2)}{2^{2n}} \right)
Why this exponent: a 2-design's second-moment statistics match the Haar measure. Haar-integrating |\partial U|^2 over the unitary group picks up a factor of \dim(\mathcal{H})^2 = 2^{2n} in the denominator.
Consequence. For any measurement of the gradient using S shots, the sampling error scales as 1/\sqrt{S}. The gradient mean is zero and the variance is \mathcal{O}(2^{-n}), so distinguishing a real gradient from shot noise requires S = \mathcal{O}(2^n) shots — exponentially many. Training is infeasible past n \sim 20.
Noise-induced barren plateaus
Wang, Fontana, Cerezo, Sharma, Sone, Cincio and Coles (2021) showed that NISQ noise itself induces plateaus: depolarising noise at rate p per gate, acting on a depth-L circuit, shrinks gradient variance by a factor (1-p)^{2L}. This is independent of the ansatz structure — even UCCSD or ADAPT, on a sufficiently noisy machine, will hit a plateau from noise alone. The implication: plateaus are not just about ansatz randomness; they are also about hardware quality.
Expressibility and entangling power as ansatz metrics
Sim, Johnson and Aspuru-Guzik (2019) proposed two quantitative metrics for ansatz families:
- Expressibility: how close the distribution of trial states (under random \theta) is to the Haar-random distribution on the Hilbert sphere. An ansatz that is too expressible covers states that are irrelevant to the problem; one that is not expressible enough misses the ground state.
- Entangling power: the average entanglement (Meyer-Wallach measure, or linear entropy) of the trial states. Useful ansätze need moderate entangling power — enough to reach entangled ground states, not so much that they scramble.
The quantum natural gradient
Stokes, Izaac, Killoran and Carleo (2019) defined the quantum natural gradient F^{-1} \nabla E, where F_{ij} = \text{Re}\left[ \langle \partial_i \psi | \partial_j \psi \rangle - \langle \partial_i \psi | \psi \rangle \langle \psi | \partial_j \psi \rangle \right] is the Fubini-Study metric tensor (one quarter of the quantum Fisher information matrix). This accounts for the fact that the natural geometry of parameterised quantum states is not Euclidean but Fubini-Study. Natural-gradient descent converges in fewer iterations but each iteration is \mathcal{O}(p^2) circuit evaluations instead of \mathcal{O}(p) — a trade-off worth taking for small p and steep cost landscapes.
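The metric F can be checked numerically on a tiny case. Our own toy (not from the Stokes et al. paper): for the 1-qubit ansatz |\psi\rangle = R_z(\theta_2) R_y(\theta_1) |0\rangle, compute F by finite differences and compare with the known answer F = \mathrm{diag}(1/4, \sin^2\theta_1 / 4).

```python
import numpy as np

def psi(t):
    # |psi> = Rz(t[1]) Ry(t[0]) |0>
    ket = np.array([np.cos(t[0] / 2), np.sin(t[0] / 2)], dtype=complex)
    return np.array([np.exp(-1j * t[1] / 2), np.exp(1j * t[1] / 2)]) * ket

def fs_metric(t, eps=1e-6):
    """F_ij = Re[<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>]
    via central finite differences."""
    p0 = psi(t)
    d = []
    for k in range(2):
        tp, tm = t.copy(), t.copy()
        tp[k] += eps
        tm[k] -= eps
        d.append((psi(tp) - psi(tm)) / (2 * eps))
    F = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            F[i, j] = np.real(np.vdot(d[i], d[j])
                              - np.vdot(d[i], p0) * np.vdot(p0, d[j]))
    return F

t = np.array([0.7, 0.3])
F = fs_metric(t)
print(np.round(F, 6))  # expect diag(0.25, sin(0.7)^2 / 4), zero off-diagonal
```

The \sin^2\theta_1 factor is the point: the metric warps with position on the manifold, which Euclidean gradient descent ignores and natural-gradient descent corrects for.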
Where this leads next
- VQE in practice — how to actually run any of these ansätze on IBM, Quantinuum, or IonQ hardware: measurement budgets, error mitigation, classical optimiser choice.
- Barren plateaus — the phenomenon in depth, including noise-induced plateaus and the research programme for mitigating them.
- ADAPT-VQE — the full adaptive algorithm with its operator-pool strategies and convergence theorems.
- QAOA the algorithm — the canonical problem-inspired ansatz for combinatorial optimisation.
- Simulating chemistry — the broader chemistry context these ansätze sit inside.
References
- Alberto Peruzzo et al., A variational eigenvalue solver on a photonic quantum processor (Nature Communications, 2014) — arXiv:1304.3061.
- Abhinav Kandala et al., Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets (Nature, 2017) — arXiv:1704.05018.
- Jarrod McClean, Sergio Boixo, Vadim Smelyanskiy, Ryan Babbush, Hartmut Neven, Barren plateaus in quantum neural network training landscapes (Nature Communications, 2018) — arXiv:1803.11173.
- Harper R. Grimsley, Sophia E. Economou, Edwin Barnes, Nicholas J. Mayhall, An adaptive variational algorithm for exact molecular simulations on a quantum computer (Nature Communications, 2019) — arXiv:1812.11173.
- M. Cerezo et al., Variational Quantum Algorithms (Nature Reviews Physics, 2021) — arXiv:2012.09265.
- John Preskill, Lecture Notes on Quantum Computation, Chapter 7 — theory.caltech.edu/~preskill/ph229.