In short
The Quantum Approximate Optimization Algorithm (QAOA) — proposed by Farhi, Goldstone and Gutmann in 2014 — attacks combinatorial optimisation problems (MaxCut, graph colouring, satisfiability) on near-term quantum hardware. Step one: encode the problem as a cost Hamiltonian H_C, diagonal in the computational basis, whose ground state is the optimum bitstring. Step two: pick a mixer Hamiltonian H_M = \sum_j X_j that mixes bitstrings by flipping bits. Step three: prepare the uniform superposition |+\rangle^{\otimes n}, then apply p alternating layers to build the QAOA state:
|\psi_p(\gamma, \beta)\rangle = e^{-i\beta_p H_M} e^{-i\gamma_p H_C} \cdots e^{-i\beta_1 H_M} e^{-i\gamma_1 H_C} \, |+\rangle^{\otimes n}.
Step four: measure \langle H_C \rangle on a quantum computer, hand the number to a classical optimiser, and update the 2p parameters (\gamma_1, \ldots, \gamma_p, \beta_1, \ldots, \beta_p). Iterate. At p \to \infty QAOA recovers the adiabatic algorithm — the parameters smoothly interpolate from pure mixer (\beta dominant) to pure cost (\gamma dominant), and the adiabatic theorem guarantees ground-state convergence. At finite p, QAOA is an approximation — and on 3-regular MaxCut, Farhi et al. proved p = 1 gives a worst-case approximation ratio of 0.6924. The honest status: QAOA is the most-studied quantum optimisation heuristic, it sometimes matches classical heuristics on benchmarks, and as of 2025 no QAOA run has demonstrated a rigorous quantum advantage on any practically interesting problem. Whether one ever will is the central open question of NISQ-era optimisation.
Two of the three quantum algorithms that catapulted this field into public awareness are about specific, structured problems. Shor's algorithm factors integers in polynomial time. Grover's algorithm searches an unsorted database with a quadratic speedup. Both are beautiful, both exploit a very particular kind of quantum structure (a period for Shor, an amplitude to amplify for Grover), and neither works on the kind of problem an engineer actually brings to a quantum computer.
The problems engineers bring are messier. How do you partition a graph to minimise the number of cut edges? How do you schedule a week of flights so Mumbai-Bangalore bookings are maximised? How do you route a delivery van through 20 pin-codes in the shortest distance? These are combinatorial optimisation problems, and they do not have the clean algebraic structure that Shor and Grover exploit. They are NP-hard in general. And they are what most of industry actually wants.
In 2014, Edward Farhi, Jeffrey Goldstone and Sam Gutmann proposed the Quantum Approximate Optimization Algorithm as the quantum attack on these problems. The name is deliberately modest. "Approximate" because QAOA does not claim to solve NP-hard problems exactly — it claims to get close. "Algorithm" but singular: the same template applies to a huge family of combinatorial problems, and the algorithm is defined by a single structural idea: alternate a cost Hamiltonian with a mixer Hamiltonian, and optimise the relative times.
This chapter builds the algorithm from the ground up. First the encoding — how a classical cost function becomes a quantum operator. Then the circuit — the alternating layers. Then the optimisation loop — how you actually run it. Then the mathematics of why it should work — the adiabatic connection, which gives the p \to \infty limit its teeth. Finally the honest ledger: what QAOA has achieved, what it has not, and why the jury is still out on whether it ever will.
The setup — a combinatorial problem in Hamiltonian dress
Combinatorial optimisation problems take a common shape. You have n binary variables x_1, x_2, \ldots, x_n \in \{0, 1\} — a bitstring. You have a cost function C(x_1, \ldots, x_n) that assigns a real number to every bitstring. The goal: find the bitstring that maximises (or minimises) C.
Examples:
- MaxCut. Given a graph with vertices V and edges E, assign each vertex a bit (0 or 1, representing two sides of a partition). Cost C(x) = \sum_{(i,j) \in E} \mathbb{1}[x_i \neq x_j] — the number of edges whose endpoints land on opposite sides. Maximise.
- 3-SAT. Given a Boolean formula over n variables with m clauses, cost C(x) = number of clauses satisfied. Maximise.
- Travelling Salesman. Given distances d_{ij} between cities, cost C = total tour length. Minimise.
- Max Independent Set. Given a graph, find the largest subset of vertices with no edges between them. Cost = size of the subset.
Every one of these can be written in QUBO form (Quadratic Unconstrained Binary Optimisation): the cost is a polynomial of degree \le 2 in the x_i, possibly with linear terms and constants. For cases where the cost has higher-degree terms (e.g. 3-body interactions), the problem is sometimes called PUBO (Polynomial Unconstrained Binary Optimisation), but the QAOA recipe carries over — just with more-body Hamiltonian terms.
From bits to Pauli-Z
The quantum encoding works by a simple substitution. Replace each classical bit x_i \in \{0, 1\} with a qubit, and represent bit values via the Pauli-Z operator:
x_i \;\longrightarrow\; \frac{1 - Z_i}{2}.
Why this map: Z_i has eigenvalue +1 on |0\rangle and -1 on |1\rangle. So (1 - Z_i)/2 has eigenvalue 0 on |0\rangle (i.e. x_i = 0) and 1 on |1\rangle (x_i = 1). Exactly the bit, promoted to an operator.
Any polynomial cost function becomes an operator by substituting this expression for each x_i. For instance, a MaxCut edge term \mathbb{1}[x_i \neq x_j] = x_i + x_j - 2 x_i x_j becomes, after substitution and simplification, (1 - Z_i Z_j)/2.
The resulting operator, called the cost Hamiltonian H_C, is diagonal in the computational basis:
H_C \, |x\rangle = C(x)\, |x\rangle \quad \text{for every bitstring } x \in \{0,1\}^n.
Every bitstring |x\rangle is an eigenstate, and its eigenvalue is exactly the classical cost C(x). The ground state (smallest eigenvalue) is the optimal bitstring.
Cost Hamiltonian
For a classical cost function C : \{0, 1\}^n \to \mathbb{R}, the cost Hamiltonian is the operator
H_C = \sum_{x \in \{0,1\}^n} C(x)\, |x\rangle\langle x| \, ;
equivalently, the polynomial obtained by substituting x_i \to (1 - Z_i)/2. It is diagonal in the computational basis, with classical cost values as eigenvalues. Its ground state (if C is being minimised) or top state (if maximised) encodes the optimum.
So far there is nothing quantum — H_C is just a classical cost function in operator clothing. Ground-state search on a diagonal Hamiltonian is easy in principle: just enumerate all 2^n bitstrings. That is still exponential. What we want is a quantum algorithm that gets close to the ground state without enumerating everything.
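The encoding is easy to sanity-check numerically. Below is a minimal numpy sketch (the 3-vertex triangle graph is an illustrative choice, not an example from this chapter) that builds H_C by the Pauli-Z substitution and confirms its diagonal entries are exactly the classical cut values:

```python
import numpy as np
from itertools import product

# Sanity check of the encoding: build H_C for MaxCut on a triangle
# (illustrative 3-vertex graph) and compare its diagonal with classical cuts.
n = 3
edges = [(0, 1), (1, 2), (0, 2)]
Z = np.diag([1.0, -1.0])

def z_on(q):
    # Pauli-Z on qubit q, identity elsewhere (qubit 0 = most significant bit)
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, Z if k == q else np.eye(2))
    return out

# Each edge contributes (1 - Z_i Z_j)/2, i.e. 1 exactly when the bits differ
H_C = sum(0.5 * (np.eye(2**n) - z_on(i) @ z_on(j)) for i, j in edges)

cuts = [sum(x[i] != x[j] for i, j in edges) for x in product([0, 1], repeat=n)]
print(np.allclose(np.diag(H_C), cuts))  # True: eigenvalues are the classical costs
```

The same check works for any QUBO cost; only the `edges` list and the classical cost function change.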
The mixer — why a second Hamiltonian
If you only had H_C, there would be no quantum algorithm to speak of. Start in any state, apply e^{-i t H_C}, and every computational-basis component picks up a phase but the probabilities stay put. Measuring gives you the starting distribution back. No search happens.
The quantum magic needs a second Hamiltonian that does not commute with H_C, so that alternating between them produces non-trivial dynamics — exactly the way the adiabatic algorithm works. QAOA's choice is the transverse-field mixer Hamiltonian:
H_M = \sum_{j=1}^{n} X_j.
X_j flips the j-th qubit, so H_M is a sum of single-qubit bit-flips. The uniform superposition
|+\rangle^{\otimes n} = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle
is an eigenstate of H_M: each X_j has eigenvalue +1 on |+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}, so |+\rangle^{\otimes n} is the top eigenstate of H_M, with eigenvalue +n. Many references instead define the mixer as -\sum_j X_j, which makes |+\rangle^{\otimes n} the ground state; the two conventions differ only by the sign of \beta and carry identical physics. What matters is that |+\rangle^{\otimes n} is an eigenstate of H_M, that it is trivially easy to prepare (apply a Hadamard to each qubit starting from |0\rangle^{\otimes n}), and that the dynamics generated by H_M spread amplitude across all bitstrings uniformly.
The key property: H_M and H_C do not commute. H_C is diagonal in the Z basis; H_M is diagonal in the X basis. They fight each other. Alternating between e^{-i\gamma H_C} (which phases computational-basis states by their cost) and e^{-i\beta H_M} (which rotates toward/away from the uniform state) produces genuine quantum interference — amplitudes add and cancel, and with the right \gamma, \beta they concentrate on low-cost bitstrings.
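A two-qubit numpy check of the non-commutation claim, using a single MaxCut edge as the cost:

```python
import numpy as np

# H_C is diagonal in the Z basis, H_M in the X basis; they do not commute.
# Two qubits and a single MaxCut edge suffice to see it.
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I = np.eye(2)

H_C = 0.5 * (np.eye(4) - np.kron(Z, Z))   # one edge term
H_M = np.kron(X, I) + np.kron(I, X)       # transverse-field mixer

comm = H_C @ H_M - H_M @ H_C
print(np.linalg.norm(comm) > 1e-9)  # True: alternating them gives real dynamics
```

Because the commutator is non-zero, alternating the two exponentials genuinely moves probability between bitstrings rather than only re-phasing them.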
The QAOA circuit — p alternating layers
The algorithm, in one equation:
|\psi_p(\gamma, \beta)\rangle = U_M(\beta_p)\, U_C(\gamma_p) \cdots U_M(\beta_1)\, U_C(\gamma_1)\, |+\rangle^{\otimes n},
where U_C(\gamma) = e^{-i\gamma H_C} and U_M(\beta) = e^{-i\beta H_M}. The number p — the number of alternating layers — is the main hyperparameter. Each layer contributes two real parameters: a cost phase \gamma_k and a mixer phase \beta_k. Total parameter count: 2p.
Two practical notes:
Implementing e^{-i\gamma H_C}. Because H_C is a sum of commuting Pauli-Z terms (all diagonal in the Z basis), its exponential factorises exactly: e^{-i\gamma H_C} = \prod_k e^{-i \gamma c_k P_k} where each P_k is a product of Z operators. Each factor is a single phase rotation — for MaxCut, each edge term (1 - Z_i Z_j)/2 becomes an R_{ZZ}(\gamma) gate between qubits i and j. So U_C(\gamma) is one two-qubit gate per edge of the problem graph.
Implementing e^{-i\beta H_M}. Because H_M = \sum_j X_j is a sum of commuting single-qubit terms, U_M(\beta) = \prod_j e^{-i\beta X_j} = \prod_j R_X(2\beta)_j — one single-qubit rotation per qubit. Trivially shallow.
So each QAOA layer has depth proportional to the problem's edge count (for graph problems) or clause count (for SAT). The circuit is shallow when the problem graph is sparse — which is the usual case.
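Both practical notes can be verified directly. The sketch below (plain numpy; the angles are arbitrary test values) checks the CNOT–R_Z–CNOT construction of the edge phase and the R_X form of the mixer exponential:

```python
import numpy as np

theta, beta = 0.7, 0.3   # arbitrary test angles

# --- Edge term: e^{-i theta ZZ/2} equals CNOT . (I (x) R_Z(theta)) . CNOT ---
zz_diag = np.array([1, -1, -1, 1])                      # Z(x)Z eigenvalues
rzz_direct = np.diag(np.exp(-1j * theta * zz_diag / 2))
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
RZ = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])
rzz_circuit = CNOT @ np.kron(np.eye(2), RZ) @ CNOT
print(np.allclose(rzz_direct, rzz_circuit))             # True

# --- Mixer term: e^{-i beta X} equals cos(beta) I - i sin(beta) X = R_X(2 beta) ---
X = np.array([[0.0, 1.0], [1.0, 0.0]])
w, V = np.linalg.eigh(X)
exp_direct = (V * np.exp(-1j * beta * w)) @ V.conj().T  # exact matrix exponential
rx = np.cos(beta) * np.eye(2) - 1j * np.sin(beta) * X
print(np.allclose(exp_direct, rx))                      # True
```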
The optimisation loop
The circuit takes (\gamma, \beta) \in \mathbb{R}^{2p} and outputs a quantum state whose expectation value F_p(\gamma, \beta) = \langle \psi(\gamma, \beta) | H_C | \psi(\gamma, \beta) \rangle is what you want to minimise (or, equivalently, maximise its negative). The classical computer's job is to choose (\gamma, \beta) that make F_p small.
The loop:
- Guess an initial (\gamma^{(0)}, \beta^{(0)}).
- Prepare the state on a quantum computer; measure all qubits in the computational basis (H_C is diagonal, so a single measurement setting suffices); average the sampled costs to estimate F_p.
- Feed F_p (and possibly gradients via parameter-shift) to a classical optimiser.
- Receive (\gamma^{(k+1)}, \beta^{(k+1)}).
- Repeat until convergence.
This is the generic variational quantum algorithm pattern — QAOA is the instance where the ansatz is the specific alternating structure above. The classical optimiser choices (Adam, SPSA, COBYLA, gradient descent with parameter-shift) are the same as for VQE. What is different is the low parameter count — typically 2p with p \in \{1, 2, 3, \ldots, 10\} — which keeps the optimisation space manageable.
After optimisation, the final state |\psi(\gamma^*, \beta^*)\rangle is measured repeatedly. Each measurement yields a bitstring x; you compute C(x) classically and keep the best one you have seen. The quantum computer's role is a biased sampler: it samples bitstrings from a distribution that is concentrated on low-cost strings.
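A toy version of this loop, runnable without hardware: the quantum estimate of F_p is replaced by a closed-form stand-in, the p = 1 expected cut for the 4-cycle of Example 1, F(\gamma, \beta) = 2 + 2\sin(4\beta)\sin\gamma\cos\gamma (that formula is an assumption of this sketch; on a device the number would come from measured samples), and the two parameters are grid-searched:

```python
import numpy as np

# Toy outer loop: classical grid search over (gamma, beta) at p = 1.
# The objective is a closed-form stand-in for the measured <H_C> on the
# 4-cycle (an assumption of this sketch): F = 2 + 2 sin(4b) sin(g) cos(g).
def F(gamma, beta):
    return 2 + 2 * np.sin(4 * beta) * np.sin(gamma) * np.cos(gamma)

gammas = np.linspace(0, np.pi, 721)        # gamma grid
betas = np.linspace(0, np.pi / 4, 181)     # beta window containing the optimum
G, B = np.meshgrid(gammas, betas, indexing="ij")
vals = F(G, B)
i, j = np.unravel_index(np.argmax(vals), vals.shape)
print(round(float(vals[i, j]), 3),
      round(gammas[i] / np.pi, 3), round(betas[j] / np.pi, 3))
# 3.0 0.25 0.125  ->  F* = 3.0 at gamma* = pi/4, beta* = pi/8
```

Gradient-based optimisers (parameter-shift, SPSA) replace the grid search at larger p, where a 2p-dimensional grid becomes too expensive.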
The adiabatic connection — why p \to \infty recovers the adiabatic algorithm
The beautiful fact about QAOA is that it is not just a heuristic. At infinite depth, it provably recovers the adiabatic quantum algorithm, which in turn provably finds ground states (given a spectral gap).
The adiabatic algorithm prepares the ground state of a complicated Hamiltonian H_C by a slow interpolation:
H(s) = (1 - s)\, H_M + s\, H_C, \qquad s: 0 \to 1.
Start in the ground state of the mixer (the uniform superposition — easy; this uses the -\sum_j X_j sign convention for the mixer). Slowly increase s from 0 to 1. The adiabatic theorem says: if you do this slowly enough relative to the minimum spectral gap along the way, the system stays in the instantaneous ground state — ending in the ground state of H_C.
QAOA is a trotterisation of this. Trotter-Suzuki says (see Trotter-Suzuki) that for Hamiltonians A, B and a small step \Delta t:
e^{-i\,\Delta t\,(A + B)} = e^{-i\,\Delta t\, A}\; e^{-i\,\Delta t\, B} + O(\Delta t^2).
Replace the smooth adiabatic schedule with a stepwise one: at time step k (out of p steps), the schedule is at s_k = k/p. The evolution over that step becomes
e^{-i\,\Delta t\, H(s_k)} \approx e^{-i\,\Delta t\, s_k H_C}\; e^{-i\,\Delta t\,(1 - s_k) H_M}.
Identifying \gamma_k = \Delta t \cdot s_k and \beta_k = \Delta t \cdot (1 - s_k), the full evolution becomes exactly the QAOA circuit with p layers. At large p and carefully chosen (\gamma_k, \beta_k) schedules approximating the smooth adiabatic path, QAOA reproduces adiabatic evolution — and inherits its ground-state convergence guarantee.
This is the theoretical dignity of QAOA — it is not an arbitrary ansatz pulled from a hat. It is a principled discretisation of an algorithm that is known to work, re-parameterised so that a classical optimiser can learn the schedule that a human would otherwise have to hand-tune.
But finite p is where we actually live
The catch is that the adiabatic theorem requires the total evolution time T to scale as 1 / \Delta_{\min}^2, where \Delta_{\min} is the minimum spectral gap along the path. For hard combinatorial problems, the gap can be exponentially small — meaning adiabatic evolution (and its trotterisation) also takes exponential time. The p \to \infty limit recovers the adiabatic algorithm, but the adiabatic algorithm itself is slow on hard problems. The interesting QAOA regime is small p — where you hope that a learned schedule beats a naive smooth one.
At p = 1 (two parameters only!), QAOA already gives non-trivial approximation ratios. For MaxCut on 3-regular graphs, p = 1 achieves at least 0.6924 (Farhi-Goldstone-Gutmann 2014, a tight analysis). This is worse than the classical Goemans-Williamson 0.878, but better than a random assignment 0.5. At p = 2, 3 the ratio climbs; empirically it matches Goemans-Williamson somewhere around p \approx 11 for 3-regular graphs (Wurtz-Farhi-Gosset 2021). We unpack all of this in the next chapter, QAOA for MaxCut.
Worked examples
Example 1: QAOA circuit for a 4-node graph at p = 1
Setup. Consider a MaxCut instance on a 4-node cycle graph: vertices \{0, 1, 2, 3\}, edges \{(0,1), (1,2), (2,3), (3,0)\}. The goal: partition vertices into two sets maximising cut edges. The optimum is obvious by inspection: \{0, 2\} vs \{1, 3\}, cutting all 4 edges.
Step 1. Cost Hamiltonian. Each edge (i, j) contributes \tfrac{1}{2}(1 - Z_i Z_j) to the MaxCut cost. Summing over the 4 edges:
H_C = \tfrac{1}{2} \sum_{(i,j) \in E} (1 - Z_i Z_j) = 2 - \tfrac{1}{2}(Z_0 Z_1 + Z_1 Z_2 + Z_2 Z_3 + Z_3 Z_0).
Why the constant 2: the sum of four \tfrac{1}{2}'s. It just shifts the spectrum by a constant and does not affect the optimisation.
Step 2. Mixer Hamiltonian. H_M = X_0 + X_1 + X_2 + X_3.
Step 3. QAOA circuit at p = 1. Two parameters (\gamma, \beta). The circuit:
(a) Start |0\rangle^{\otimes 4}, apply H to each qubit. State is |+\rangle^{\otimes 4}.
(b) Apply U_C(\gamma) = e^{-i \gamma H_C}. The constant part contributes a global phase (irrelevant). The Z_i Z_j part decomposes into 4 commuting two-qubit phase rotations:
e^{-i\gamma H_C} = e^{-2i\gamma} \prod_{(i,j) \in E} e^{\,i\gamma Z_i Z_j / 2}.
Each factor is an R_{ZZ} gate, implementable as CNOT-R_Z-CNOT between the two qubits.
(c) Apply U_M(\beta) = e^{-i\beta H_M} = \prod_j e^{-i\beta X_j} = \prod_j R_X(2\beta)_j. One single-qubit rotation per qubit.
Step 4. Measure. Measure all four qubits in the Z basis, get a bitstring x \in \{0,1\}^4. Repeat S = 10^4 times. For each outcome x, compute cut value C(x) classically.
Step 5. Expected cut. Compute F_1(\gamma, \beta) = \frac{1}{S} \sum_s C(x^{(s)}) from the samples. This estimates \langle \psi(\gamma, \beta) | H_C | \psi(\gamma, \beta) \rangle.
Step 6. Optimise. Grid-search (\gamma, \beta) over [0, 2\pi] \times [0, \pi], or gradient-ascend. For the 4-cycle, the p = 1 optimum is at approximately \gamma^* = \pi/4, \beta^* = \pi/8, giving expected cut value \approx 3.0 (vs the true optimum 4). So p = 1 gets 75% of the way.
Result. QAOA at p = 1 on a 4-cycle produces a quantum sampler that concentrates probability on 3-out-of-4-edges-cut bitstrings. To get closer to 4, go to p = 2 or higher.
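Example 1 can be reproduced end-to-end with a statevector simulation. A minimal numpy sketch (no quantum SDK; \gamma = \pi/4 and \beta = \pi/8 are the quoted p = 1 optimum):

```python
import numpy as np
from itertools import product

# Statevector run of Example 1: p = 1 QAOA on the 4-cycle, plain numpy.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Classical cut value C(x) for all 16 bitstrings, lexicographic order
# (first bit = qubit 0 = most significant index bit).
costs = np.array([sum(x[i] != x[j] for i, j in edges)
                  for x in product([0, 1], repeat=n)], dtype=float)

def qaoa_expectation(gamma, beta):
    psi = np.full(2**n, 2.0 ** (-n / 2), dtype=complex)  # |+> on every qubit
    psi = np.exp(-1j * gamma * costs) * psi              # U_C: phase by cost
    RX = np.cos(beta) * np.eye(2) \
         - 1j * np.sin(beta) * np.array([[0, 1], [1, 0]])
    U_M = np.array([[1.0 + 0j]])
    for _ in range(n):                                   # U_M = R_X(2 beta) per qubit
        U_M = np.kron(U_M, RX)
    psi = U_M @ psi
    return float(np.sum(np.abs(psi) ** 2 * costs))       # <H_C> = expected cut

expected = qaoa_expectation(np.pi / 4, np.pi / 8)
print(round(expected, 6))  # 3.0 -> 75% of the optimum cut of 4
```

Sampling bitstrings from np.abs(psi)**2 instead of computing the expectation reproduces the histogram view of Example 2.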
Example 2: Interpreting the QAOA output as a sampling distribution
Setup. You have just run the p = 1 QAOA circuit from Example 1 at its optimal parameters (\gamma^*, \beta^*). You have 10^4 bitstring samples. What do they tell you?
Step 1. Build the histogram. Count how often each of the 2^4 = 16 bitstrings appeared. For the 4-cycle at the p = 1 optimum, the distribution is concentrated on the two true optima (|0101\rangle and |1010\rangle, cut = 4) and a few near-optima (cut = 3 bitstrings).
Step 2. Compute the empirical expected cut. \langle C \rangle \approx 3.0 — so on average a QAOA-sampled bitstring cuts 3 out of 4 edges.
Step 3. Compute the best-found cut. Because QAOA is a sampler, you do not take the expected cut as your answer — you take the best cut you observed. Out of 10^4 samples, you will likely see |0101\rangle (cut = 4) hundreds of times. Your actual answer is cut-value 4.
Step 4. Recognise the two roles of QAOA. The quantum computer is (a) an expectation-value estimator for the classical optimiser (so it can navigate the (\gamma, \beta) landscape) and (b) a biased sampler of good bitstrings (the real output of the algorithm).
Result. The final deliverable of QAOA is the best bitstring seen, not the expected cut. This distinction matters: even when \langle H_C \rangle is mediocre, a quantum sampler may still find excellent bitstrings with non-negligible probability. The approximation-ratio analyses bound the expected cut; in practice you look at the best seen.
Common confusions
"QAOA solves NP-hard problems in polynomial time"
No. QAOA is an approximation algorithm. It produces a bitstring whose cost is a fraction (the approximation ratio) of the true optimum. For p = O(1) it runs in polynomial time — but the approximation ratio is bounded, and beating the best classical approximation algorithm on a real problem has not been demonstrated. QAOA does not imply P = NP.
"p = 1 is enough — higher p is just marginal improvement"
Only for easy instances. On sparse 3-regular graphs, p = 1 is already non-trivial. But for dense or adversarial problem instances, low-p QAOA fails and you need p growing with problem size. The worst-case approximation ratio at small p is not 1.
"The (\gamma, \beta) parameters are fixed by theory"
No. You optimise them classically, as part of the variational loop. There are analytical results for specific classes (3-regular graphs at p = 1 have nearly closed-form optimal parameters; Fourier-based parameter-concentration heuristics exist), but in general QAOA is a variational hybrid, not a fixed-parameter algorithm.
"QAOA runs the whole algorithm on a quantum computer"
No. The classical computer runs the outer optimisation loop; the quantum computer only evaluates \langle H_C \rangle and (eventually) samples bitstrings. Like all variational algorithms, QAOA is a hybrid.
"QAOA converges to the true optimum for any problem as p \to \infty"
With caveats. At p \to \infty, QAOA can reproduce the adiabatic algorithm, which converges — but only if the adiabatic schedule is slow relative to the minimum spectral gap. For problems with exponentially small gaps (the hard NP-hard instances), this means QAOA also needs exponentially large p. The p \to \infty statement is a theoretical connection, not a practical escape from hardness.
The hype check
Hype check. QAOA is the poster child for "maybe-quantum-advantage-someday" optimisation. It has been benchmarked extensively on MaxCut, Max-k-SAT, portfolio optimisation, graph colouring, and traffic-flow toy problems. As of 2025, no QAOA run on any problem of practical importance has clearly beaten the best classical algorithm — and on MaxCut in particular, classical Goemans-Williamson remains the approximation-ratio champion except at depths too large for current hardware. The field is not dead: active research continues on parameter optimisation (Fourier, warm-starts, reinforcement learning), hybrid classical-quantum embeddings, noise-aware ansatzes, and non-standard mixers. But the "quantum advantage on NP-hard optimisation" claim has not been demonstrated. Treat any press release claiming otherwise with serious scepticism, and check whether the classical baseline was tuned as hard as the quantum one.
The India angle
Indian industry has been quietly active in QAOA research. TCS Research (the largest corporate research lab in Indian IT) has published on QAOA parameter-concentration, noise-resilient ansatzes, and applications to financial portfolio optimisation — using IBM cloud hardware and local classical simulators. QpiAI (Bangalore, National Quantum Mission-aligned) ships a QAOA-based optimisation toolkit aimed at logistics and scheduling clients. IIT Madras and IISc Bangalore both have algorithms groups working on QAOA variants for graph problems and scheduling.
The NQM's algorithm thrust explicitly supports heuristic quantum optimisation software — with an eye on the eventual domestic hardware. If and when NQM-funded superconducting or neutral-atom hardware at IIT-Madras or TIFR scales to hundreds of qubits, a natural target application is QAOA on scheduling problems of the kind the Indian Railways, civil aviation, or power-grid operators grapple with. The research pipeline is in place; the hardware is still catching up.
Going deeper
The rest of this chapter covers QAOA as a trotterisation of adiabatic evolution at the operator level, the structure of the p = 1 parameter landscape for MaxCut, barren-plateau-like phenomena in QAOA at high p, the parameter-concentration conjecture and its Fourier-transform heuristic, the original 2000 Farhi-Goldstone-Gutmann-Sipser adiabatic algorithm that QAOA descends from, and QAOA's relationship to quantum chemistry's phase-separator approach. This is the PhD-entry-level view.
QAOA as trotterisation — the formal statement
Let H(s) = (1 - s) H_M + s H_C be the adiabatic interpolating Hamiltonian, s \in [0, 1]. Discretise s into p equal steps s_k = k/p, each of duration \tau = T / p where T is the total adiabatic time. The evolution operator for the k-th step is e^{-i \tau H(s_k)}. Apply the first-order Trotter-Suzuki decomposition:
e^{-i \tau H(s_k)} = e^{-i \tau s_k H_C}\; e^{-i \tau (1 - s_k) H_M} + O(\tau^2).
Why the error is O(\tau^2): Baker-Campbell-Hausdorff gives e^A e^B = e^{A + B + \frac{1}{2}[A, B] + \ldots}; the [A, B] term is second-order in \tau. See Trotter-Suzuki. With \gamma_k = \tau s_k and \beta_k = \tau (1 - s_k), the product over all steps is exactly the QAOA circuit, and the total error is O(p \tau^2) = O(T^2 / p). Taking p \to \infty with T fixed sends the trotter error to zero — recovering ideal adiabatic evolution. QAOA at finite p is a bounded-error approximation to it.
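The quadratic per-step scaling is easy to probe numerically. A sketch with random Hermitian stand-ins for s_k H_C and (1 - s_k) H_M (an illustrative choice, not the MaxCut operators): halving \tau should roughly quarter the one-step Trotter error.

```python
import numpy as np

def expmh(H, t):
    # e^{-i t H} for Hermitian H, via eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2    # stand-in for s_k H_C
B = rng.standard_normal((4, 4)); B = (B + B.T) / 2    # stand-in for (1 - s_k) H_M
A /= np.linalg.norm(A, 2); B /= np.linalg.norm(B, 2)  # normalise for a clean test

def step_error(tau):
    return np.linalg.norm(expmh(A + B, tau) - expmh(A, tau) @ expmh(B, tau), 2)

ratio = step_error(0.02) / step_error(0.01)
print(round(ratio, 2))  # close to 4: the error is quadratic in tau
```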
The p = 1 MaxCut landscape
For MaxCut on bounded-degree graphs at p = 1, the expectation value F_1(\gamma, \beta) = \langle H_C \rangle can be computed analytically per edge. For a d-regular triangle-free graph, the per-edge contribution at p = 1 is:
F_1^{\text{edge}}(\gamma, \beta) = \frac{1}{2} + \frac{1}{2} \sin(4\beta)\, \sin\gamma\, \cos^{d-1}\gamma
— a clean closed-form. On 3-regular triangle-free graphs this gives the tight approximation ratio 0.6924, achieved at specific optimal (\gamma^*, \beta^*). The full analysis (Farhi et al. 2014) is an exercise in symbolic computation with Pauli strings.
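Taking the per-edge closed form for the 3-regular triangle-free case as F_1 = \tfrac{1}{2} + \tfrac{1}{2}\sin(4\beta)\sin\gamma\cos^{2}\gamma (assumed here), a coarse numerical maximisation recovers the quoted ratio:

```python
import numpy as np

# Maximise the p = 1 per-edge expected-cut formula for 3-regular triangle-free
# graphs (formula taken as given) over a fine (gamma, beta) grid.
gammas = np.linspace(0, np.pi, 2001)
betas = np.linspace(0, np.pi / 2, 1001)
G, B = np.meshgrid(gammas, betas, indexing="ij")
F = 0.5 + 0.5 * np.sin(4 * B) * np.sin(G) * np.cos(G) ** 2
print(round(float(F.max()), 4))  # ~0.6924, the Farhi et al. ratio
```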
Barren-plateau phenomena in QAOA
QAOA is more structured than generic hardware-efficient ansatzes, so the McClean barren-plateau result does not directly apply. However, recent work (Wang-Fontana-Cerezo 2021) shows that QAOA at large p on typical random instances exhibits gradient variance decaying polynomially with problem size — not exponentially, but still a training challenge. For deep QAOA the optimisation landscape has many local minima; warm-starting from lower-p solutions and Fourier-based interpolation schemes help.
Parameter concentration and the Fourier heuristic
Remarkable empirical observation: for many graph families (3-regular, Erdős-Rényi random), the optimal QAOA parameters at a given p are nearly the same across different instances. This "parameter concentration" allows transfer — learn parameters once on a small instance, reuse on a larger one. Zhou et al. (2020) formalised this via a Fourier-basis parameterisation: express (\gamma_k, \beta_k) as truncated Fourier series and optimise the Fourier coefficients instead. Works empirically, no rigorous theory yet.
The original Farhi-Goldstone-Gutmann-Sipser adiabatic algorithm (2000)
QAOA was not pulled from a hat in 2014. It was the trotterised cousin of the Farhi-Goldstone-Gutmann-Sipser 2000 adiabatic algorithm, where H(s) = (1 - s) H_M + s H_C is evolved smoothly over T \propto 1/\Delta_{\min}^2 time. That earlier paper proposed adiabatic QC as an alternative to the gate model; it is computationally equivalent (Aharonov et al. 2007) but often more intuitive for optimisation. QAOA is the gate-model way of running approximate-adiabatic evolution on finite-coherence hardware.
QAOA for quantum chemistry
QAOA is usually pitched for combinatorial optimisation, but the same alternating structure can target molecular Hamiltonians. Here H_C is the electronic Hamiltonian (after qubit mapping) and H_M is a chemistry-natural mixer. This is sometimes called "chemistry QAOA" or a Hamiltonian-variational ansatz. Its convergence behaviour connects to adiabatic state preparation and to quantum-chemistry methods like Hartree-Fock–QAOA hybrids.
Where this leads next
The next chapter, QAOA for MaxCut, applies this machinery to the canonical benchmark: the MaxCut problem, where QAOA's approximation-ratio analysis is tightest and the classical baseline (Goemans-Williamson) is sharpest. After that: variational quantum eigensolver for chemistry (QAOA's chemistry cousin), adiabatic quantum computing for the full adiabatic framework, and barren plateaus for the training obstacle that haunts all variational algorithms.
References
- Edward Farhi, Jeffrey Goldstone, Sam Gutmann, A Quantum Approximate Optimization Algorithm (2014) — arXiv:1411.4028. The paper that defined QAOA.
- Edward Farhi, Aram Harrow, Quantum Supremacy through the Quantum Approximate Optimization Algorithm (2016) — arXiv:1602.07674. On hardness of classically simulating QAOA output distributions.
- Michel Goemans, David Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming (JACM, 1995) — dl.acm.org/doi/10.1145/227683.227684. The classical 0.878 bound.
- John Preskill, Lecture Notes on Quantum Computation, Chapter 7: NISQ algorithms — theory.caltech.edu/~preskill/ph229.
- Qiskit Textbook, Solving combinatorial optimization problems using QAOA.
- Wikipedia, Quantum optimization algorithms.