In short

Given a Hamiltonian H = \sum_{j=1}^J h_j that is a sum of k-local pieces, you want to implement the time-evolution unitary e^{-iHt} on a quantum computer. The obstacle: e^{A+B} \neq e^A e^B when A and B do not commute, so you cannot just multiply the exponentials of the individual terms. Trotter's 1959 formula provides the fix: for large n,

e^{-iHt} \approx \left(\prod_{j=1}^J e^{-i h_j t/n}\right)^n.

Each factor e^{-i h_j t/n} is a small, bounded-size unitary — easy to implement as a short quantum circuit because h_j is k-local. The price is an error that scales as O(t^2/n) per Trotter step of the sum — so to achieve total precision \epsilon, you need n = O(t^2 / \epsilon) Trotter steps and a total gate count of O(J \cdot t^2 / \epsilon).

Seth Lloyd's 1996 Science paper [2] turned this observation into the first efficient quantum-simulation algorithm: polynomial in the number of qubits, polynomial in the evolution time, polynomial in 1/\epsilon. Everything in Part 15 — higher-order Trotter, linear combinations of unitaries, qubitisation — is a refinement of, or a replacement for, this single idea.

You have a Hamiltonian in hand. Last chapter gave you the zoo — Ising, Heisenberg, Hubbard, molecular — and convinced you that every one of them is a sum of k-local terms, H = \sum_j h_j. Your quantum computer wants to run e^{-iHt}, the unitary that evolves a state for time t under H. You know what each e^{-i h_j t} looks like individually — a small bounded-size unitary on the few qubits h_j touches, which you can compile into elementary gates. What you need is a way to glue those pieces into the full e^{-iHt}.

Here is the wrinkle. If A and B were ordinary numbers, you could write e^{A+B} = e^A \cdot e^B without a second thought. But Hamiltonian terms are matrices, and matrices do not commute in general. The Pauli operators X and Z famously satisfy XZ = -ZX — their product depends on the order. For non-commuting operators, e^{A+B} is not the same as e^A e^B. If you just multiplied the exponentials of the individual terms, you would be computing the wrong unitary.

Trotter's insight — the foundation of quantum Hamiltonian simulation — is that if you chop the time into very small slices, the error from pretending the pieces commute shrinks faster than the number of slices grows. In the limit of infinitely many slices, the approximation becomes exact. For any finite number of slices, the error is controllable. That is what you are going to derive and count in this chapter.

The core obstacle — non-commutativity

Start with the sharpest possible picture. Two 2 \times 2 matrices:

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = X, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = Z.

Compute both orderings.

AB = XZ = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad BA = ZX = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.

Why they are different: X flips the basis then Z adds a sign to |1\rangle; but Z first adds a sign then X flips — the sign ends up on the other basis element. The commutator [X, Z] = XZ - ZX = -2iY is not zero. Two Pauli operators on the same qubit generically do not commute.

Expand e^{X+Z} as a power series to second order:

e^{X+Z} = I + (X+Z) + \frac{(X+Z)^2}{2} + \ldots = I + (X+Z) + \frac{X^2 + XZ + ZX + Z^2}{2} + \ldots

Now expand e^X e^Z, also to second order:

e^X e^Z = \left(I + X + \frac{X^2}{2}\right)\left(I + Z + \frac{Z^2}{2}\right) + \ldots
= I + X + Z + XZ + \frac{X^2 + Z^2}{2} + \ldots

Compare the second-order pieces. In e^{X+Z} you have \frac{1}{2}(XZ + ZX) — the symmetric combination. In e^X e^Z you have XZ alone. The difference is

e^X e^Z - e^{X+Z} = \frac{XZ - ZX}{2} + \ldots = \frac{[X,Z]}{2} + \ldots

Why this matters: whenever [A, B] \neq 0, the product e^A e^B differs from e^{A+B} at second order in the size of A and B. The leading error is proportional to the commutator. If you make A and B smaller — say, replace A by A/n and B by B/n — the commutator gets smaller by 1/n^2. That is the lever Trotter will pull.

e^A e^B versus e^(A+B) differ by commutatorThree panels. Left: a Bloch-sphere dot at the north pole, with arrows showing rotation by A then by B ending at one location. Middle: same starting dot, rotation by B then by A ending at a different location. Right: rotation by A+B directly, a third location between. Arrows drawn with curved paths on a sphere.e^B e^A |ψ⟩|0⟩endpoint 1ABe^A e^B |ψ⟩|0⟩endpoint 2BAe^(A+B) |ψ⟩|0⟩true endpoint
Non-commutativity made visible. For a single qubit, evolving by $A$ then $B$, by $B$ then $A$, and by $A+B$ directly land at three different points on the Bloch sphere. The gap between endpoint 1 / 2 and the true endpoint is controlled by the commutator $[A, B]$. Trotter's trick is to cut the time into many small pieces so each gap becomes tiny.

So if A and B do not commute, e^{A+B} is not e^A e^B. What is it?

Trotter's formula — the idea

Trotter's 1959 theorem [1] says this. For any bounded operators A and B (bounded means their operator norms \|A\| and \|B\| are finite — always true for k-local terms with bounded coefficients), the limit

e^{A+B} = \lim_{n \to \infty} \left(e^{A/n} \, e^{B/n}\right)^n

holds exactly. The product of exponentials, if you slice each one into n pieces and interleave them, converges to the exponential of the sum.

Heuristic picture. Cut the time interval [0, t] into n small slices, each of length \Delta t = t/n. Over one slice, evolve first by A alone for time \Delta t, then by B alone for time \Delta t. Do this n times in a row. In the limit n \to \infty, \Delta t \to 0, the non-commutativity error in each slice shrinks like (\Delta t)^2 = (t/n)^2, and since there are n slices, the total error shrinks like t^2 / n \to 0. You end up with the correct unitary e^{A+B}.

That is the idea in one breath. Now make it quantitative.

The first-order Trotter formula

Substitute A \to -iAt, B \to -iBt into Trotter's identity (the -i factors and the time t are absorbed into the operators). The formula becomes

e^{-i(A+B)t} = \lim_{n \to \infty} \left(e^{-iAt/n} \, e^{-iBt/n}\right)^n.

For finite n, define the first-order Trotter approximation

U_1(t, n) \;=\; \left(e^{-iAt/n} \, e^{-iBt/n}\right)^n.

The question is: how close is U_1(t, n) to the true unitary e^{-i(A+B)t}?

The error bound, derived

You can read off the leading-order error from the Baker-Campbell-Hausdorff expansion. For small \Delta t = t/n:

e^{-iA\Delta t} e^{-iB\Delta t} = \exp\!\left(-i(A+B)\Delta t - \frac{\Delta t^2}{2}[A, B] + O(\Delta t^3)\right).

Why: the BCH formula says e^X e^Y = e^{X + Y + \tfrac{1}{2}[X, Y] + \ldots}. Plug in X = -iA\Delta t, Y = -iB\Delta t. The commutator [X, Y] = -\Delta t^2 [A, B]. The leading correction to e^{-i(A+B)\Delta t} is -\tfrac{1}{2}\Delta t^2 [A, B].

Step from one slice to n slices. Each slice introduces an error of size \frac{1}{2}\Delta t^2 \|[A, B]\| in the operator norm. The errors accumulate — roughly, the total error after n slices is n times the per-slice error:

\|U_1(t, n) - e^{-i(A+B)t}\| \;\leq\; \frac{n \cdot \Delta t^2 \cdot \|[A, B]\|}{2} \;=\; \frac{t^2 \, \|[A, B]\|}{2 n}.

This is the first-order Trotter error bound. It scales as O(t^2 / n) — quadratic in the total time t, inverse in the number of slices n.

Generalising to a sum of J terms

For a local Hamiltonian H = \sum_{j=1}^J h_j, apply the same idea piece by piece. One Trotter step through all J terms is

S_1(\Delta t) \;=\; e^{-i h_1 \Delta t} \, e^{-i h_2 \Delta t} \cdots e^{-i h_J \Delta t}.

The full simulation is n Trotter steps:

U_1^H(t, n) \;=\; \big(S_1(t/n)\big)^n \;=\; \left(\prod_{j=1}^{J} e^{-i h_j t/n}\right)^n.

The error per step picks up contributions from every pair of non-commuting terms. A standard bound [2][4]:

\|U_1^H(t, n) - e^{-iHt}\| \;\leq\; \frac{t^2}{2n} \sum_{j < k} \|[h_j, h_k]\|.

If each \|h_j\| \leq \Lambda (a uniform bound on each term's operator norm) and there are J terms, the sum of commutator norms is at most J^2 \Lambda^2, giving

\|U_1^H(t, n) - e^{-iHt}\| \;=\; O\!\left(\frac{J^2 \Lambda^2 t^2}{n}\right).

For k-local Hamiltonians with \|h_j\| = O(1) and J polynomial in the number of qubits N, this is the bound. For many physical Hamiltonians the commutator count is much smaller — only pairs of terms that share a qubit can fail to commute, which for nearest-neighbour 2-local Hamiltonians gives O(N) non-zero commutators, not O(N^2).

First-order Trotter formula

Let H = \sum_{j=1}^{J} h_j be a sum of bounded Hermitian operators on N qubits. The first-order Trotter approximation of e^{-iHt} with n Trotter steps is

U_1^H(t, n) = \left(\prod_{j=1}^{J} e^{-i h_j t/n}\right)^n.

Error bound. \|U_1^H(t, n) - e^{-iHt}\| \leq \dfrac{t^2}{2n} \sum_{j < k} \|[h_j, h_k]\|.

Gate count. Each factor e^{-i h_j t/n} implements in O(1) gates (since h_j is k-local with bounded k). Total gates: O(J \cdot n). To reach error \epsilon, set n = O(J^2 \Lambda^2 t^2 / \epsilon), giving total gates O(J^3 \Lambda^2 t^2 / \epsilon).

For practical k-local Hamiltonians (J = O(N), \Lambda = O(1)): total gates O(N^3 t^2 / \epsilon). Higher-order formulas (next chapter) improve the dependence on t and \epsilon.

Reading the bound. Three knobs control the cost:

A picture of one Trotter step

One Trotter step as a sequence of small unitariesA horizontal bar labelled "time t" at the top. Below, a line divided into n slices. Each slice is decomposed into J smaller boxes labelled e^(-i h_1 Δt), e^(-i h_2 Δt), ..., e^(-i h_J Δt). An arrow below illustrates the repetition n times.Simulating e^(−iHt) with H = Σⱼ hⱼ via first-order Trottertotal time tdivide into n slices of duration Δt = t/nh₁Δth₂Δth₃Δt...h_JΔtstep 1...step 2· · ·......step nOne Trotter step S₁(Δt):J small unitaries in sequenceTotal circuit:J · n elementary unitariesError after n steps:O(t² / n)For error ≤ ε, need:n = O(t² / ε)Total gates:O(J · t² / ε)Each small box e^(−i hⱼ Δt) is k-local ⇒ constant-gate-count subroutine. That is why this is efficient.
A first-order Trotter simulation of $e^{-iHt}$ unrolled as a quantum circuit. The total evolution is cut into $n$ slices; each slice runs the $J$ terms of the Hamiltonian in sequence, each for a short time $\Delta t = t/n$. Total gate count is $J \cdot n$, all bounded-size pieces, provided the Hamiltonian is $k$-local.

Why locality is the load-bearing assumption

Take a moment to see what makes this work. Each factor e^{-i h_j t/n} is the exponential of a k-local operator. An operator that touches k qubits is a 2^k \times 2^k matrix — size 4 for k=2, size 16 for k=4, a fixed number. Its exponential is another 2^k \times 2^k unitary, and by the Solovay-Kitaev theorem any such unitary can be built out of O(4^k) gates from a universal gate set, to error \epsilon, in \text{polylog}(1/\epsilon) steps.

For bounded k, this is O(1) gates per factor. The full Trotter circuit has J \cdot n factors, each O(1) gates, so the total gate count is O(J n). Setting n = O(J^2 t^2 / \epsilon) to hit error \epsilon gives total gates O(J^3 t^2 / \epsilon) — polynomial in every parameter. This is Lloyd's efficient quantum simulation theorem [2].

Contrast with a generic (non-local) Hamiltonian. An arbitrary 2^N \times 2^N Hermitian has 4^N independent real parameters. There is no hope of decomposing e^{-iHt} into anything polynomial in N; the unitary simply has too much information in it. Locality is what makes the problem tractable, and Trotter is the algorithm that exploits it.

Why this is the Feynman-Lloyd theorem: Feynman (chapter 129) argued that nature, being quantum-mechanical, can only be efficiently simulated by another quantum system. Lloyd's 1996 paper gave the first proof — an explicit quantum algorithm (first-order Trotter) that simulates any k-local Hamiltonian in polynomial time, polynomial space, and with bounded error. Before Lloyd, "efficient quantum simulation" was a hope. After Lloyd, it was a theorem.

Examples — work through the counts

Example 1: simulate $e^{-i(X+Z)t}$ on one qubit

Setup. The simplest non-commuting Hamiltonian: two Paulis on one qubit. H = X + Z. You want to apply e^{-iHt} to a qubit in state |0\rangle. Take t = 1 (in some unit of inverse energy) and aim for error \epsilon = 10^{-2} using first-order Trotter.

Step 1. Split and exponentiate each term. X is 1-local, Z is 1-local, both single-qubit Paulis. The exponentials are standard single-qubit rotations:

e^{-iX\Delta t} = \cos(\Delta t)\, I - i \sin(\Delta t)\, X, \qquad e^{-iZ\Delta t} = \begin{pmatrix} e^{-i\Delta t} & 0 \\ 0 & e^{i\Delta t} \end{pmatrix}.

Why: for any Pauli P with P^2 = I, e^{-iP\theta} = \cos\theta \, I - i\sin\theta \, P. Paulis are involutions; their exponentials are rotations by angle \theta on the Bloch sphere about the corresponding axis.

Step 2. Choose n. The commutator is [X, Z] = -2iY, so \|[X, Z]\| = 2. The error bound is \|U_1 - e^{-i(X+Z)}\| \leq t^2 \|[X,Z]\| / (2n) = 1 / n. For error \epsilon = 10^{-2}, set n = 100 Trotter steps.

Step 3. Build the circuit. One Trotter step is R_X(2/n) \cdot R_Z(2/n) where R_P(\theta) = e^{-iP\theta/2} is the standard rotation convention. Repeat n times. Total gate count: 2n = 200 single-qubit rotations. Modern compilers collapse consecutive same-axis rotations, but for the plain algorithm: 200 gates.

Step 4. Verify numerically. The exact e^{-i(X+Z)} is a rotation on the Bloch sphere about the axis (1, 0, 1)/\sqrt{2} by angle 2\sqrt{2}. The Trotter approximation U_1(1, 100) is close — you can compute \|U_1 - e^{-i(X+Z)}\| explicitly and check it is roughly 10^{-2}, matching the bound.

Result. First-order Trotter with n = 100 steps reaches 1% precision on e^{-i(X+Z)} at t = 1, using 200 single-qubit rotations.

Example 1: Trotter circuit for e^(−i(X+Z)t)A quantum circuit showing n repetitions of the block R_X(2/n) R_Z(2/n) applied to |0⟩. Below, the axis of the net rotation visualised on a Bloch sphere.First-order Trotter for e^(−i(X+Z)t) on one qubit|0⟩R_ZR_XR_ZR_X· · ·R_ZR_Xstep 1step 2step n=100Rotation angle per step: 2Δt = 2/n = 0.02 radTotal gates: 2n = 200 single-qubit rotationsError bound: ‖U₁ − e^(−i(X+Z))‖ ≤ 1/n = 0.01 ✓The exact unitary is a Bloch rotation about axis (1, 0, 1)/√2 by angle 2√2.
Example 1 circuit. 100 Trotter steps, each applying $R_Z$ then $R_X$ with small angle, approximates $e^{-i(X+Z)}$ to 1% precision.

What this shows. Even for one qubit, Trotter is doing real work: [X, Z] \neq 0, so you cannot just apply e^{-iX} once and e^{-iZ} once. You have to interleave many small rotations to approximate the true evolution.

Example 2: gate count for the 3×3 Ising model

Setup. From the last chapter's example: the transverse-field Ising model on a 3×3 periodic lattice.

H = -J \sum_{\langle i,j\rangle} Z_i Z_j - h \sum_{i=1}^{9} X_i, \qquad J = h = 1.

J and h here are energy scales (units of inverse time), not the number of terms. The number of terms: 18 bond ZZ terms + 9 field X terms = 27 terms total, all k-local with k = 2. You want to simulate e^{-iHt} for t = 10 (in units of inverse energy) with total error \epsilon = 10^{-2}.

Step 1. Choose the Trotter order and step count. First-order Trotter error bound: \|U_1 - e^{-iHt}\| \leq t^2 C / n where C is a problem-dependent constant that depends on commutators. For a 2D Ising model with all coefficients O(1), numerical experiments and analytical bounds give C on the order of tens (rough estimate).

Using C \approx 30 (absorbing the sum of commutator norms):

\epsilon = \frac{C t^2}{n} \;\Longrightarrow\; n = \frac{C t^2}{\epsilon} = \frac{30 \cdot 100}{0.01} = 3 \times 10^5.

Step 2. Count gates per Trotter step. 27 terms per step, each a single small rotation (an R_{ZZ} or R_X). An R_{ZZ} compiles to 2 CNOTs + 1 single-qubit rotation = 3 elementary gates. An R_X is 1 gate. So gates per Trotter step: 18 \cdot 3 + 9 \cdot 1 = 63 elementary gates.

Step 3. Total gate count. 63 \times 3 \times 10^5 \approx 2 \times 10^7 gates.

Why this matters: 20 million gates is well beyond NISQ reach (current hardware's gate-error ceiling is around 10^{-3}, and 10^7 gates all need to work to get one measurement). To actually run this simulation, you need either a fault-tolerant machine or a shorter time / larger error tolerance. Using higher-order Trotter (next chapter) or LCU would reduce this to perhaps 10^4 gates.

Step 4. Back-of-envelope with higher-order. Fourth-order Trotter (chapter 132) reduces n to O(t^{5/4}/\epsilon^{1/4}). For t = 10, \epsilon = 10^{-2}: n \sim 10^{5/4} / (10^{-2})^{1/4} \sim 18 \cdot 3.2 \approx 60. A dramatic reduction — at the cost of more gates per Trotter step (factor ~5).

Result. First-order Trotter on the 3×3 Ising model for t = 10, \epsilon = 10^{-2}: roughly 2 \times 10^7 gates. Fourth-order Trotter: roughly 3 \times 10^3 gates. A 10,000× reduction, which is why the next chapter exists.

Example 2: 3×3 Ising gate countA bar chart comparing three simulation schemes: first-order Trotter, second-order Trotter, fourth-order Trotter. Gate counts shown on a log scale.Gate count: 3×3 Ising model, t=10, ε=0.0110²10⁴10⁶10⁸gates2·10⁷1st-ordern = 3·10⁵ steps2·10⁵2nd-ordern ≈ 10003·10³4th-ordern ≈ 60
Gate count for simulating the 3×3 Ising model at $t = 10$ to precision $10^{-2}$, using three Trotter orders. First-order needs ~$2 \times 10^7$ gates; higher orders (next chapter) reduce this by four orders of magnitude.

What this shows. First-order Trotter is simple and always works, but its gate count explodes for long simulations. Knowing the number of terms, the time, and the precision lets you estimate in advance whether a simulation is feasible on available hardware.

Common confusions

Going deeper

You now have first-order Trotter: the formula, the error bound, the gate count, the picture. That is enough to write a working quantum simulator for any k-local Hamiltonian. What follows goes into the structure below the bound — Baker-Campbell-Hausdorff corrections, tighter commutator-scaling bounds, the modern post-Trotter landscape (LCU, qDRIFT, qubitisation), and applications to quantum chemistry where real gate-count engineering happens.

Baker-Campbell-Hausdorff corrections

The formula e^A e^B = e^{A + B + \tfrac{1}{2}[A, B] + \tfrac{1}{12}([A, [A, B]] - [B, [A, B]]) + \ldots} is the Baker-Campbell-Hausdorff expansion. Every term in the BCH series is a nested commutator of A and B. When you Trotterise with small step \Delta t = t/n, each slice introduces a correction

\Delta t \cdot (A + B) + \tfrac{\Delta t^2}{2}[A, B] + O(\Delta t^3)

in the exponent. Summing over n slices, the linear part gives the desired t(A + B); the quadratic commutator term gives a total error \propto n \cdot \Delta t^2 = t^2/n, which is the first-order Trotter error. Pushing BCH to third order gives the leading O(t^3/n^2) correction after careful reshuffling — which is the error of second-order (symmetric) Trotter. Higher-order Suzuki formulas kill these corrections systematically.

Commutator-scaling bounds — tighter than worst-case

The bound \|U_1 - e^{-iHt}\| \leq \tfrac{t^2}{2n} \sum_{j < k} \|[h_j, h_k]\| is tighter than the worst-case bound O(J^2 \Lambda^2 t^2 / n) because many commutators vanish. For a nearest-neighbour 2-local Hamiltonian on an N-qubit lattice, only terms sharing a qubit fail to commute — there are O(N) such non-zero commutators, not O(N^2). Childs et al. 2019 [6] proved commutator-based Trotter bounds with explicit constants, showing that physical Hamiltonians often need far fewer steps than worst-case analysis predicts. This "commutator scaling" is why real-world quantum-chemistry simulations use Trotter in practice.

Post-Trotter methods

Three modern algorithms improve over Trotter.

The hierarchy: Trotter is simplest and most versatile; LCU is better scaling but needs more ancilla qubits; qubitisation is optimal but requires the most complex block-encoding machinery.

Application to quantum chemistry

For real molecules, first-order Trotter applied to the molecular Hamiltonian (Part 15 later chapters) gives gate counts of order 10^{10}10^{14} for chemical accuracy — well out of reach of NISQ. Much of the last decade of quantum-chemistry-on-QC research is about engineering the constant factors. Key tricks:

A 2020 paper by Campbell and co-authors estimated that simulating FeMoCo — the nitrogen-fixation enzyme active site — needs around 10^{10} gates with heavily optimised Trotter. With qubitisation, that drops to 10^{8}. Both are beyond NISQ; fault tolerance is required. The open question is whether the hardware gets there before the classical competition (tensor networks, DMRG extended to larger systems) solves the same problems.

Indian research connections

TIFR's theoretical condensed-matter group has published on Trotter-based simulation of Hubbard models; IISc Bangalore's quantum chemistry group (Mukhopadhyay, Mandal, and others) has VQE + Trotter hybrid-algorithm papers. India's National Quantum Mission explicitly lists Hamiltonian simulation as a priority use case for the future fault-tolerant quantum computing effort.

Where this leads next

References

  1. Wikipedia, Lie product formula (Trotter product formula) — the original Trotter 1959 identity and its extensions.
  2. Seth Lloyd, Universal Quantum Simulators, Science 273, 1073 (1996) — DOI:10.1126/science.273.5278.1073.
  3. Nielsen and Chuang, Quantum Computation and Quantum Information, §4.7 — Cambridge University Press.
  4. John Preskill, Lecture Notes on Quantum Computation, Chapter 7 — theory.caltech.edu/~preskill/ph229.
  5. Berry, Childs, Kothari, Hamiltonian simulation with nearly optimal dependence on all parameters (2015) — arXiv:1501.01715.
  6. Childs, Su, Tran, Wiebe, Zhu, A Theory of Trotter Error (2019) — arXiv:1912.08854.