Qubitization and Block Encoding

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

Block encoding is the trick that turns any operator A with \|A\| \leq 1 (not just unitaries) into something a quantum circuit can apply. Find a unitary U on ancilla \otimes data such that, in block-matrix form on the ancilla split |0\rangle_a vs |\neq 0\rangle_a,

U \;=\; \begin{pmatrix} A & \cdot \\ \cdot & \cdot \end{pmatrix}.

The operator A sits as the top-left block of the larger unitary. Apply U, project the ancilla onto |0\rangle, and you have applied A to the data register. LCU (chapter 133) is one recipe for producing block encodings.

Qubitisation (Low-Chuang 2017) is what you do with a block encoding of a Hermitian H/\alpha. Define the walk operator

W \;=\; (2|0\rangle\langle 0|_a \otimes I_d - I)\,U,

a reflection about the |0\rangle-ancilla subspace, composed with U. The walk operator W has eigenvalues e^{\pm i\arccos(\lambda/\alpha)} for each eigenvalue \lambda of H — the spectrum of H is encoded in the phase of W. Phase estimation on W therefore extracts \lambda/\alpha to any precision.

Combined with quantum signal processing, qubitisation implements e^{-iHt} at optimal gate count O(\alpha t + \log(1/\epsilon)) — matching the Berry lower bound. This is the state of the art. Every quantum-chemistry and Hamiltonian-simulation result published since 2020 uses a qubitised, block-encoded algorithm, because no other approach is this efficient.

You met LCU in chapter 133: prepare-select-unprepare, and a non-unitary linear combination \sum_i \alpha_i U_i gets applied to the data register conditional on the ancilla returning to |0\rangle. The success probability is 1/\alpha^2, amplitude amplification boosts it to 1/\alpha, and the whole thing is a probabilistic gadget.

Now zoom out. What LCU really did was embed a non-unitary operator V = \sum_i \alpha_i U_i into a larger unitary W on ancilla-plus-data space, with V/\alpha occupying the top-left block of W. The post-selection was just extracting that block. Forget the post-selection for a moment and look at W itself: it is a perfectly good unitary, implementable as a quantum circuit, which somehow encodes a non-unitary operator inside its structure.

That observation turns out to be the single most fertile idea in modern quantum algorithms. It generalises. You can build unitaries whose top-left block is any operator A with \|A\| \leq 1, not just a sum of unitaries. You can compose such unitaries to apply polynomials of A. You can process the ancilla with a cleverly chosen sequence of rotations and make the whole construction implement e^{-iAt}, or A^{-1}, or any other smooth function of A — at optimal cost.

The framework is block encoding. The specific construction for Hamiltonian simulation — and the algorithm that saturates the Berry 2014 lower bound on gate count — is qubitisation, introduced by Guang Hao Low and Isaac Chuang in 2017 [1]. This chapter is their story.

Block encoding — the definition

Start with the operator you want to apply. Call it A; it is an operator on n data qubits (2^n-dimensional space) with spectral norm \|A\| \leq 1. The constraint \|A\| \leq 1 is necessary: a unitary preserves vector norms, so nothing hidden inside a unitary can amplify norms beyond 1. If your operator has \|A\| > 1, rescale: work with A/\alpha for some \alpha \geq \|A\|, and track the factor of \alpha explicitly.

The block-encoding statement

An (\alpha, a, \epsilon)-block encoding of an operator A is a unitary U on a + n qubits (that is, a ancilla qubits plus the n data qubits) such that

\bigl\|A - \alpha\,(\langle 0|^{\otimes a} \otimes I_d)\,U\,(|0\rangle^{\otimes a} \otimes I_d)\bigr\| \;\leq\; \epsilon.

Read this in pieces. The sandwich (\langle 0|^{\otimes a}\otimes I_d)\,U\,(|0\rangle^{\otimes a}\otimes I_d) is an operator on the data register alone — it takes a data state |\psi\rangle, tensors with |0\rangle^{\otimes a} on the ancilla, applies U, and projects the ancilla back onto |0\rangle^{\otimes a}. If U were unitary without any ancilla entanglement, this would give back |\psi\rangle with some probability. In general, though, U mixes ancilla and data, and the sandwich extracts a specific linear operator on the data register.

That extracted operator, multiplied by the scaling \alpha, equals A up to error \epsilon. In block form:

U \;=\; \begin{pmatrix} A/\alpha & \cdot \\ \cdot & \cdot \end{pmatrix}

on the ancilla split between "|0\rangle^{\otimes a}" (the top-left block) and "everything else." The dots are not arbitrary — they must make U a unitary — but they are not constrained beyond that.

Why the constraint \|A\| \leq \alpha is necessary

Suppose \|A\| > \alpha. Then there is some unit vector |\psi\rangle with \|A|\psi\rangle\| > \alpha. Put that vector on the data register with |0\rangle^{\otimes a} on the ancilla and apply U. The resulting state |0\rangle^{\otimes a} \otimes A|\psi\rangle/\alpha + \text{(orthogonal)} would have the |0\rangle_a-branch norm equal to \|A|\psi\rangle\|/\alpha > 1. But unitaries preserve norms: the total norm stays 1, so the |0\rangle_a-branch cannot exceed 1. Contradiction. So \alpha must be at least \|A\|.

This is the "normalisation tax" of block encoding. The bigger \alpha you use, the easier the block encoding is to build (LCU gives \alpha = \sum_i \alpha_i), but the more the encoded operator is watered down — and the longer the subsequent algorithms take, because they scale with \alpha.

Block encoding in one picture. The unitary $U$ acts on ancilla tensor data. Split the rows and columns according to whether the ancilla is in $|0\rangle^{\otimes a}$ or not. The top-left block, corresponding to ancilla $= |0\rangle$ both before and after, is $A/\alpha$. The other three blocks are filled in with whatever entries are needed to make $U$ unitary overall.

How to get a block encoding

LCU. Chapter 133 builds a block encoding of any operator V = \sum_i \alpha_i U_i with normalisation constant \alpha = \sum_i \alpha_i. The PREP-SELECT-PREP† circuit is the block encoding.

Sparse Hamiltonians. For a Hamiltonian H with at most d non-zero entries per row, accessible through oracle queries, a block encoding exists with \alpha = d\,\|H\|_{\max} and O(\log N) ancilla qubits, using Berry-Childs 2015 techniques.

Purified density matrices. Given a quantum state |\psi\rangle on system + reference, the reduced density matrix \rho on the system has a natural block encoding with \alpha = 1 using the purification itself.

Fermionic Hamiltonians. Second-quantised molecular Hamiltonians have clever block encodings with \alpha much smaller than the naive LCU sum of coefficients — this is the "double factorisation" line of work (Motta-Ye-McClean 2020, von Burg-Tezuka 2021) that brought FeMoco gate-count estimates down by orders of magnitude.

Each construction trades different quantities: \alpha (bigger is worse for subsequent algorithms), ancilla count (usually logarithmic in problem size), and circuit depth (always a concern). The best block encoding for a given problem is an active research question.

From block encoding to spectral access — the idea of qubitisation

Suppose H is a Hermitian operator with \|H\| \leq \alpha, and you have a block encoding U of H/\alpha with a single-qubit ancilla (for simplicity; the construction generalises). The data register holds an arbitrary state; the ancilla starts in |0\rangle.

The question. Given U, how do you extract the spectrum of H? How do you apply e^{-iHt} without the post-selection penalty of LCU?

The observation that starts qubitisation. Look at what U does to states of the form |0\rangle_a|\lambda\rangle_d, where |\lambda\rangle_d is an eigenvector of H with eigenvalue \lambda. The block-encoding property says

So the |0\rangle_a-component of U|0\rangle_a|\lambda\rangle_d has amplitude \lambda/\alpha on |\lambda\rangle_d. The rest of the amplitude — with weight \sqrt{1 - (\lambda/\alpha)^2} — sits on states with the ancilla not in |0\rangle.

Write

U|0\rangle_a|\lambda\rangle_d \;=\; \frac{\lambda}{\alpha}|0\rangle_a|\lambda\rangle_d + \sqrt{1 - (\lambda/\alpha)^2}\,|\perp_\lambda\rangle,

where |\perp_\lambda\rangle is some state orthogonal to |0\rangle_a \otimes \text{(data)}.

Now here is the magical part: the vectors |0\rangle_a|\lambda\rangle_d and |\perp_\lambda\rangle span a 2-dimensional invariant subspace of a certain pair of reflections. Inside this 2D plane, U composed with a simple ancilla reflection acts as a rotation by a specific angle \theta_\lambda = \arccos(\lambda/\alpha).

Translation: the spectrum of H shows up as rotation angles in a family of 2D planes, one per eigenvalue of H. This is the trick. It turns an operator's spectrum (generally hard to access via unitaries) into a family of rotations (directly accessible via phase estimation).

The walk operator W

Define the ancilla reflection

R_a \;=\; 2|0\rangle\langle 0|_a \otimes I_d - I.

Acting on any state of the form |0\rangle_a|\phi\rangle_d, R_a returns |0\rangle_a|\phi\rangle_d. Acting on any state orthogonal to |0\rangle_a \otimes \text{(data)}, R_a flips the sign. It is the reflection about the |0\rangle_a-ancilla subspace.

The walk operator is

W \;=\; R_a \cdot U.

The claim of qubitisation is that W, restricted to the 2D plane spanned by |0\rangle_a|\lambda\rangle_d and a certain partner vector |\perp_\lambda\rangle, is a 2D rotation by angle 2\theta_\lambda where \cos\theta_\lambda = \lambda/\alpha.

Proving the rotation structure

Fix an eigenvector |\lambda\rangle_d of H. Define

|g_\lambda\rangle \;=\; |0\rangle_a|\lambda\rangle_d \qquad\text{("good" state for eigenvalue $\lambda$)}.

The block encoding gives

U|g_\lambda\rangle \;=\; \frac{\lambda}{\alpha}|g_\lambda\rangle + \sqrt{1 - (\lambda/\alpha)^2}\,|b_\lambda\rangle

where |b_\lambda\rangle is the normalised component orthogonal to |g_\lambda\rangle in the image. Note |b_\lambda\rangle has the ancilla not in |0\rangle (it is orthogonal to |0\rangle_a \otimes \text{(data)} when |\lambda/\alpha| < 1).

W|g_\lambda\rangle \;=\; R_a\,U|g_\lambda\rangle \;=\; \frac{\lambda}{\alpha}|g_\lambda\rangle - \sqrt{1 - (\lambda/\alpha)^2}\,|b_\lambda\rangle.

Why this is a rotation: in the basis \{|g_\lambda\rangle, |b_\lambda\rangle\}, the action of W on |g_\lambda\rangle is (\cos\theta_\lambda)|g_\lambda\rangle - (\sin\theta_\lambda)|b_\lambda\rangle, where \cos\theta_\lambda = \lambda/\alpha and \sin\theta_\lambda = \sqrt{1 - (\lambda/\alpha)^2}. That is a rotation by -\theta_\lambda in the g-b plane. To close the argument you need to show W is also a rotation on |b_\lambda\rangle and stays inside the same 2D plane — this is a calculation using the fact that U^\dagger acting on |b_\lambda\rangle reconstructs a state of the form (\lambda/\alpha)|b_\lambda\rangle + (\text{something})|g_\lambda\rangle because H is Hermitian. Low and Chuang's 2017 paper carries this through; the upshot is that W restricted to the 2D (|g_\lambda\rangle, |b_\lambda\rangle) plane is a rotation by angle 2\theta_\lambda (not \theta_\lambda — the factor of 2 comes from composing W with itself to close the orbit).

In matrix form on the (|g_\lambda\rangle, |b_\lambda\rangle) basis,

W \;=\; \begin{pmatrix} \cos 2\theta_\lambda & -\sin 2\theta_\lambda \\ \sin 2\theta_\lambda & \cos 2\theta_\lambda \end{pmatrix}, \qquad \cos\theta_\lambda = \frac{\lambda}{\alpha}.

Therefore W has eigenvalues e^{\pm 2i\theta_\lambda} inside this 2D plane — or equivalently (writing \phi_\lambda = 2\theta_\lambda) eigenvalues e^{\pm i\phi_\lambda} where \cos(\phi_\lambda/2) = \lambda/\alpha. The spectrum of H is encoded in the phase of W.

The full spectral decomposition of W

Across all eigenvalues of H, the data Hilbert space decomposes into 2D invariant planes of W. On each plane, W is a rotation by \pm 2\theta_\lambda. The full W acts as a direct sum of 2D rotations:

W \;=\; \bigoplus_{\lambda} R(2\theta_\lambda)_{\text{plane}_\lambda}.

This is the "qubitisation" name: the dynamics of the larger ancilla-data system decompose into independent 2-level (qubit-like) systems, one per eigenvalue of H.

Qubitisation turns the spectrum of $H$ into a family of phases of the walk operator $W$. Each eigenvalue $\lambda$ of $H$ produces a pair of eigenvalues $e^{\pm i\phi_\lambda}$ of $W$, where $\phi_\lambda = 2\arccos(\lambda/\alpha)$. The map from $\lambda$ to $\phi_\lambda$ is monotone on $[-\alpha, \alpha]$, so knowing $\phi_\lambda$ lets you recover $\lambda$.

Hamiltonian simulation via qubitisation

You now have the machinery. Given a block encoding of H/\alpha:

Construct W = R_a \cdot U. One extra gate beyond U.
The eigenvalues of W are e^{\pm i\phi_\lambda} where \cos(\phi_\lambda/2) = \lambda/\alpha.

The most direct algorithm uses phase estimation (chapter 73) on W. Applying W for T steps and running phase estimation gives each eigenvalue \phi_\lambda of W to precision O(1/T), from which \lambda = \alpha\cos(\phi_\lambda/2) is recovered.

But the sharpest result comes from quantum signal processing (QSP). QSP takes a block encoding (or a walk operator) and, by interleaving W with single-qubit rotations on the ancilla, constructs a new unitary that implements any smooth function of H. Specifically, there is a QSP circuit using O(\alpha t + \log(1/\epsilon)) applications of W that produces a block encoding of e^{-iHt} up to error \epsilon.

\text{gate count for simulating $e^{-iHt}$:} \quad O(\alpha t + \log(1/\epsilon)).

This is the Berry lower bound: any algorithm for simulating e^{-iHt} to precision \epsilon must use \Omega(\alpha t + \log(1/\epsilon)/\log\log(1/\epsilon)) queries to the block encoding. Qubitisation + QSP matches the lower bound up to a \log\log factor — it is optimal.

The historical progression

To appreciate how much qubitisation improves matters, stack up the successive Hamiltonian-simulation gate counts as a function of the complexity parameter \tau = \alpha t and error \epsilon.

The gate-count progression for Hamiltonian simulation. Trotter's $\tau^2/\epsilon$ becomes Suzuki's $\tau^{1+1/k}/\epsilon^{1/k}$ at higher orders. LCU Taylor brings the inverse-precision dependence to logarithmic while keeping a $\tau$ prefactor. Qubitisation + QSP achieves additive $\tau + \log(1/\epsilon)$ — matching the Berry lower bound up to a $\log\log$ factor.

The jump from LCU Taylor to qubitisation (row 3 to row 4) is from multiplicative \tau \cdot \log(1/\epsilon) to additive \tau + \log(1/\epsilon). When \epsilon is small and \tau is large, both terms matter, but you never pay more than their sum — whereas LCU Taylor would charge you their product. For chemistry-scale \tau \sim 10^6 and \epsilon \sim 10^{-4}, qubitisation saves a factor of \log(10^4) \approx 13 in gate count right off the bat, and the constants inside the O(\cdot) are small.

Worked example — block-encoding (X + Z)/2

Example 1 — block-encode $A = (X + Z)/2$ on one qubit

Setup. You have a 1-qubit operator A = (X + Z)/2. It is Hermitian (A^\dagger = A), its operator norm is \|A\| = 1/\sqrt 2 < 1. You want a block encoding with one ancilla qubit and \alpha as small as possible.

Step 1 — recognise this is LCU. Write A = \tfrac{1}{2}\cdot X + \tfrac{1}{2}\cdot Z, which is an LCU with coefficients \alpha_0 = \alpha_1 = 1/2, \alpha = 1. The LCU circuit from chapter 133 is PREP = H on the ancilla, SELECT = controlled-X and anti-controlled-Z, PREP† = H.

Step 2 — write down U. The full unitary is

U \;=\; (H \otimes I)\,\bigl(|0\rangle\langle 0| \otimes Z + |1\rangle\langle 1| \otimes X\bigr)\,(H \otimes I).

Expand the outer Hadamards on the ancilla. Let \{|0\rangle, |1\rangle\} be ancilla basis, \{|0\rangle, |1\rangle\} also the data basis (context clarifies). The action of U on |0\rangle_a|\psi\rangle_d:

H \otimes I takes |0\rangle_a|\psi\rangle_d to \tfrac{1}{\sqrt 2}(|0\rangle_a + |1\rangle_a)|\psi\rangle_d.

The SELECT part applies Z to the |0\rangle_a-part of the data and X to the |1\rangle_a-part, giving \tfrac{1}{\sqrt 2}(|0\rangle_a Z|\psi\rangle + |1\rangle_a X|\psi\rangle).

The second H \otimes I maps |0\rangle_a \to \tfrac{1}{\sqrt 2}(|0\rangle_a + |1\rangle_a) and |1\rangle_a \to \tfrac{1}{\sqrt 2}(|0\rangle_a - |1\rangle_a). Collecting the |0\rangle_a-component:

\langle 0|_a\,U\,|0\rangle_a|\psi\rangle_d \;=\; \tfrac{1}{\sqrt 2}\cdot\tfrac{1}{\sqrt 2}(Z|\psi\rangle + X|\psi\rangle) \;=\; \tfrac{1}{2}(X + Z)|\psi\rangle \;=\; A|\psi\rangle.

Step 3 — verify the block encoding. The block-encoding condition \langle 0|_a U |0\rangle_a = A/\alpha requires \alpha \geq \|A\|. Here \|A\| = 1/\sqrt 2 and the LCU sum gives \alpha = 1. The left-hand side of the check is A and the right-hand side is A/\alpha = A. Match. So U is a (1, 1, 0)-block encoding of A.

U \;=\; \begin{pmatrix} A & \cdot \\ \cdot & \cdot \end{pmatrix}.

Step 4 — what the other blocks are. Explicit matrix (in the basis |00\rangle, |01\rangle, |10\rangle, |11\rangle with ancilla first):

U = \tfrac{1}{2}\begin{pmatrix} 1 & 1 & 1 & -1 \\ 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & 1 \\ -1 & 1 & -1 & -1 \end{pmatrix}.

The top-left 2\times 2 block is A = \tfrac{1}{2}\begin{pmatrix}1 & 1 \\ 1 & -1\end{pmatrix}, which is indeed (X + Z)/2. Check U^\dagger U = I: straightforward column-dot-product exercise. The other three blocks are determined uniquely (up to freedom within SELECT construction) by requiring unitarity.

Result. A = (X + Z)/2 is block-encoded as the top-left 2\times 2 block of a 4\times 4 unitary U. The normalisation constant is \alpha = 1 (the LCU \ell_1 norm of the coefficients), which is slightly larger than the optimal \|A\| = 1/\sqrt 2 but close enough for a first example. Sharper block encodings of the same A exist with \alpha = 1/\sqrt 2, at the cost of a more intricate preparation step.

What this shows. A 1-qubit Hermitian operator with norm < 1 lifts into a 2-qubit unitary where it occupies the top-left block. Applying U to |0\rangle_a|\psi\rangle_d and post-selecting ancilla = |0\rangle implements A with success probability \|A|\psi\rangle\|^2/\alpha^2. The rest of qubitisation — the walk operator, the spectrum encoding, QSP — operates on exactly this U.

Example 2 — the walk operator for $H = Z/2$

Setup. Take the simple 1-qubit Hamiltonian H = Z/2, which has eigenvalues \pm 1/2 with eigenvectors |0\rangle and |1\rangle respectively. You have a block encoding U of H with \alpha = 1/2 (so H/\alpha = Z, which is itself a unitary — take U = I_a \otimes Z, i.e., apply Z to the data regardless of the ancilla; then the top-left block is Z = H/\alpha).

Step 1 — walk operator. W = R_a \cdot U = (2|0\rangle\langle 0|_a - I)(I_a \otimes Z) = (2|0\rangle\langle 0|_a \otimes Z) - (I_a \otimes Z). Write this as a 4\times 4 matrix in \{|0\rangle, |1\rangle\}_a \otimes \{|0\rangle, |1\rangle\}_d. On |0\rangle_a, the ancilla reflection acts as identity, so W|0\rangle_a|\psi\rangle_d = |0\rangle_a Z|\psi\rangle_d. On |1\rangle_a, the ancilla reflection acts as -I, so W|1\rangle_a|\psi\rangle_d = -|1\rangle_a Z|\psi\rangle_d. Matrix form:

W \;=\; \begin{pmatrix} Z & 0 \\ 0 & -Z \end{pmatrix} \;=\; \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

Step 2 — eigenvalues of W. The diagonal matrix above has eigenvalues 1, -1, -1, 1, or equivalently e^{0}, e^{i\pi}, e^{i\pi}, e^{0}. Compare to qubitisation's prediction: the eigenvalues of H are \lambda_\pm = \pm 1/2, so \lambda_\pm/\alpha = \pm 1, and \theta_\pm = \arccos(\pm 1) = 0 or \pi. The walk eigenvalues should be e^{\pm 2i\theta_\pm} = e^0 = 1 for \lambda = +1/2 and e^{\pm 2i\pi} = e^{0} or e^{2i\pi} both equal to 1? Hmm — this degenerate case is why this trivial example is a poor illustration of the non-trivial rotation structure.

Why the degeneracy: we chose \alpha = \|H\| = 1/2, which makes \lambda/\alpha = \pm 1, putting the rotation angle at the endpoints where the 2D plane degenerates. A more interesting block encoding uses \alpha > \|H\|, say \alpha = 1. Then U is the one from Example 1 (with A = Z), \lambda/\alpha = \pm 1/2, \theta_\pm = \arccos(\pm 1/2) = \pi/3 or 2\pi/3, and W's eigenvalues are e^{\pm 2\pi i/3} and e^{\pm 4\pi i/3} — non-trivial rotations whose phase encodes the original eigenvalue.

Step 3 — redo with \alpha = 1. Block-encode Z/2 using the LCU Z/2 = \tfrac{1}{2} Z, a single-term LCU with a trivial PREP = I that sets the ancilla to |0\rangle deterministically. That gives the \alpha = 1/2 case again. Use instead Z/2 = \tfrac{1}{4}I + \tfrac{1}{4}I + \tfrac{1}{4}Z + \tfrac{1}{4}(-(I-Z)) — there are multiple LCU decompositions. The point: different decompositions give different \alpha; choosing \alpha > \|H\| deliberately is what makes qubitisation's rotation structure visible.

Step 4 — the general statement. For a generic block encoding with \alpha slightly larger than \|H\|, each eigenvalue \lambda of H gives a walk operator eigenvalue e^{\pm i\phi_\lambda} with \cos(\phi_\lambda/2) = \lambda/\alpha. Running phase estimation on W returns \phi_\lambda to precision O(1/T), from which \lambda is recovered.

Result. The walk operator W for a block-encoded Hamiltonian has its spectrum encoded in the phases. For H = Z/2 with \alpha = 1, you get walk-eigenvalues at \pm 120^\circ and \pm 240^\circ on the unit circle, from which the original eigenvalues \pm 1/2 can be read off.

What this shows. A Hamiltonian whose spectrum you want to understand (Z/2, trivially here but imagine a complicated many-body H in the real use case) becomes a walk operator whose phase estimation gives you the spectrum. The walk operator is a single quantum circuit — no post-selection, no probabilistic failure. This is the magic of qubitisation: the non-unitary H/\alpha becomes the unitary W, with the spectrum preserved.

Why this is the current state of the art

Three properties make qubitisation + QSP the leading Hamiltonian-simulation algorithm:

Optimality. The gate count O(\alpha t + \log(1/\epsilon)) matches the Berry lower bound (up to \log\log). No future algorithm can be asymptotically faster.

Determinism. Unlike LCU's post-selection, qubitisation produces e^{-iHt} as a unitary embedded in a larger unitary — with no probabilistic failure. Amplitude amplification is still used inside the construction, but it is wrapped into a deterministic wrapper so the user never sees failed runs.

Generality. The same framework handles Hermitian H (via qubitised walk operators), non-Hermitian matrix inversion (HHL-style, via QSP applied to A^\dagger A), linear systems, amplitude estimation, and ground-state energy estimation. Gilyén-Su-Low-Wiebe 2019 [2] showed that essentially every known quantum algorithm fits inside this framework, under the name Quantum Singular Value Transformation (QSVT).

This is why modern chemistry and materials-science quantum algorithms — the FeMoco estimates that brought gate counts down from 10^{14} to 10^9, the Fe-catalysis papers from Querion, the lattice-gauge-theory simulations in high-energy physics — all use qubitised block-encoded algorithms. Trotter and LCU are pedagogical stepping stones now; production algorithms live in qubitisation space.

Applications and impact

Quantum chemistry. Molecular Hamiltonians decompose into fermionic terms; double-factorisation block encodings (Motta-Ye 2020, von Burg-Tezuka 2021) keep \alpha as small as possible, pushing FeMoco and related-enzyme estimates into \sim 10^9-T-gate regimes that may be reachable by fault-tolerant machines of the 2030s.

Condensed matter. 2D Hubbard-model simulations on qubitised architectures have gate-count estimates several orders of magnitude better than Trotter. Phase-diagram exploration at hundreds of sites is a target for early fault-tolerant hardware.

High-energy physics. Lattice gauge theories at non-zero chemical potential — classically blocked by the fermion sign problem — map naturally to qubitised block encodings.

Linear systems. The HHL algorithm's modern form uses QSP on a block encoding of A^\dagger A to apply A^{-1}, reaching optimal query complexity.

The common thread: every problem that used to require a custom quantum algorithm now slots into the qubitisation + QSP framework. It is the "unified theory" of quantum algorithms in the 2020s.

Common confusions

"Block encoding is the same as embedding in a bigger unitary." Yes, but with a specific normalisation: A appears as the top-left block scaled by \alpha. The \alpha is not cosmetic — it determines the downstream algorithm cost. Two block encodings of the same A with different \alpha lead to algorithms with different gate counts.
"Qubitisation is a Trotter scheme." No — qubitisation has nothing to do with splitting exponentials. It takes an already-constructed block encoding and manipulates its spectrum via a reflection. The block encoding itself may have been built by Trotter, LCU, or any other method; qubitisation then operates on it.
"Optimal means fastest in practice." "Optimal" here means optimal asymptotic scaling. The constants hidden in the O(\cdot) notation can be substantial, and a Trotter algorithm with small constants sometimes wins on concrete examples where the asymptotic regime hasn't kicked in. The 2024 line of work comparing qubitisation to Trotter-based methods found qubitisation winning only once \tau = \alpha t exceeds a few hundred — fine for chemistry, but not automatic.
"Block encoding works for any operator." Only for operators with \|A\| \leq \alpha. If your operator has arbitrarily large norm, rescale with \alpha. The rescaling appears as a multiplicative factor in the subsequent gate count, so you don't want \alpha much larger than necessary.
"Qubitisation requires fault tolerance." The algorithm is expressed as a circuit, and circuits need error correction to run reliably — so yes, useful qubitisation applications need fault-tolerant hardware. Small-scale demos on NISQ devices exist (1–2 ancilla qubits, few-qubit Hamiltonians) but the asymptotic advantages kick in only at scales that require error correction.
"QSP is a separate algorithm from qubitisation." They are usually presented together because QSP is what you do after qubitising to get the optimal gate count. Qubitisation gives you the walk operator; QSP processes it into the function of H you actually want (e^{-iHt}, H^{-1}, \mathrm{sign}(H - E_0), etc.). In recent papers the whole package is called "QSVT" (quantum singular value transformation).

Going deeper

If you came here for the big picture — block encoding puts non-unitary operators inside unitaries, and qubitisation makes the spectrum visible as walk-operator phases — you have it. What follows is the formal block-encoding definition with the \epsilon tolerance, Low-Chuang's proof sketch that the walk operator genuinely rotates by 2\theta_\lambda, a one-paragraph sketch of quantum signal processing, the Szegedy-walk connection that motivated the name "qubitisation", and a comparison of qubitisation to Trotter in practice for small systems.

The formal block-encoding definition

A unitary U on a + n qubits is an (\alpha, a, \epsilon)-block encoding of an n-qubit operator A if

\bigl\|A - \alpha\,(\langle 0|^{\otimes a}_a \otimes I_d)\,U\,(|0\rangle^{\otimes a}_a \otimes I_d)\bigr\|_{\mathrm{op}} \;\leq\; \epsilon.

Here \|\cdot\|_{\mathrm{op}} is the operator norm (largest singular value). The (\alpha, a, \epsilon) tuple is standard: \alpha is the normalisation, a the ancilla count, \epsilon the precision.

Properties preserved: unitaries compose, block encodings compose with specific rules. If U_A is an (\alpha_A, a_A, \epsilon_A)-block encoding of A and U_B is an (\alpha_B, a_B, \epsilon_B)-block encoding of B, then there is a circuit producing an (\alpha_A + \alpha_B, a_A + a_B + 1, \alpha_A\epsilon_B + \alpha_B\epsilon_A)-block encoding of A + B (LCU construction) and an (\alpha_A\alpha_B, a_A + a_B, \alpha_A\epsilon_B + \alpha_B\epsilon_A)-block encoding of AB (sequential composition). These compositional rules are what make the framework scale.

The proof sketch — why W rotates by 2\theta_\lambda

Low and Chuang [1] prove the qubitisation rotation theorem by explicit calculation in the 2D invariant subspace. Start with |g_\lambda\rangle = |0\rangle_a|\lambda\rangle_d as before. The block encoding gives U|g_\lambda\rangle = (\lambda/\alpha)|g_\lambda\rangle + \sqrt{1 - (\lambda/\alpha)^2}\,|b_\lambda\rangle. The key additional claim is U|b_\lambda\rangle = -\sqrt{1 - (\lambda/\alpha)^2}\,|g_\lambda\rangle + (\lambda/\alpha)|b_\lambda\rangle — the 2D plane \{|g_\lambda\rangle, |b_\lambda\rangle\} is invariant under U.

Proving this requires using the hermiticity of H (or more generally the fact that U came from a Hermitian block encoding with a specific structure called the "qubitisation property"). The argument is about six lines of linear algebra: apply U^\dagger to both sides of the block-encoding identity, use \lambda\langle\lambda|'s hermiticity, and identify the relevant component. The full proof is in Low-Chuang 2017, §2.

Once the 2D plane is invariant, the matrix of U restricted to it (in the \{|g_\lambda\rangle, |b_\lambda\rangle\} basis) is \begin{pmatrix}\lambda/\alpha & -\sqrt{1-(\lambda/\alpha)^2} \\ \sqrt{1-(\lambda/\alpha)^2} & \lambda/\alpha\end{pmatrix}. The reflection R_a acts as \begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix} (identity on |g_\lambda\rangle which has ancilla in |0\rangle; sign flip on |b_\lambda\rangle which does not). Their product W = R_a U is the rotation matrix \begin{pmatrix}\cos 2\theta_\lambda & -\sin 2\theta_\lambda \\ \sin 2\theta_\lambda & \cos 2\theta_\lambda\end{pmatrix} with \cos\theta_\lambda = \lambda/\alpha. The factor of 2 in the angle comes from composing two reflections (the R_a implicit in U being a block-encoding, plus the explicit R_a in W).

Quantum signal processing — what it does

Given a walk operator W whose spectrum is \{e^{\pm i\phi_\lambda}\}, QSP builds a new unitary U_P whose action on the good subspace applies a specified polynomial P(\lambda/\alpha) = P(\cos(\phi_\lambda/2)) to each eigenvalue. The construction is:

Interleave W with single-qubit rotations R_z(\phi_k) on the ancilla, for a sequence of chosen phases \phi_1, \phi_2, \ldots, \phi_d.
The resulting unitary block-encodes the degree-d polynomial P(x) = \sum_k c_k T_k(x) (Chebyshev expansion) where the coefficients depend on the phase sequence.

For e^{-iHt}: choose P to approximate \cos(\tau t x) - i\sin(\tau t x) where x = \lambda/\alpha. The degree needed for error \epsilon is d = O(\alpha t + \log(1/\epsilon)). Phase-sequence computation is done classically and is efficient.

For A^{-1}: choose P(x) \approx 1/x (with appropriate regularisation avoiding x = 0). Degree scales with condition number.

The framework unifies most known quantum algorithms into one: given a block encoding of the relevant operator, choose the polynomial, compute the phase sequence, run the interleaved circuit. Shor, Grover, HHL, amplitude amplification, amplitude estimation, Hamiltonian simulation, ground-state preparation — all fit.

The Szegedy-walk connection

The name "qubitisation" was coined by Low and Chuang to emphasise that the 2D invariant subspaces are qubit-like systems. Mario Szegedy's 2004 work on quantum walks had already shown a related structure: classical Markov chains lift to quantum walks whose spectrum is related to the classical transition matrix's spectrum via a similar \arccos transform. Qubitisation is the Hamiltonian-simulation analogue of the Szegedy walk, using block encoding in place of the Markov-chain transition matrix.

The practical upshot: many results originally proved for Szegedy walks (hitting times, mixing times, search problems) translate directly to qubitised Hamiltonian simulation. Szegedy-walk-based search (Magniez-Nayak-Roland-Santha 2007) is a cousin of qubitised spectrum estimation — the same underlying machinery, different application.

Qubitisation versus Trotter in practice

Trotter-based algorithms have good constants and are simple to implement. For small systems (n \lesssim 30 qubits, \tau \lesssim 100), Trotter can beat qubitisation concretely because the \log and \log\log factors in qubitisation come with non-trivial multiplicative constants from the block-encoding overhead. A 2021 benchmark by Childs-Maslov-Nam-Ross-Su on specific chemistry Hamiltonians found Trotter and qubitisation within a small constant factor up to \tau \sim 10^3; qubitisation wins unambiguously only for \tau \gtrsim 10^4, corresponding to larger molecules and tighter error bars.

For this reason, production quantum-chemistry software (Microsoft Azure Quantum, Querion Cirq, Compustar Qiskit Chemistry) maintains both Trotter and qubitisation modes, selecting per problem instance. The "right" algorithm is not always qubitisation — but qubitisation is always the asymptotic champion, and everything else is legacy.

Applications in chemistry — the FeMoco estimates

Reiher-Wiebe-Svore-Wecker-Troyer 2017 estimated \sim 10^{14} T-gates for a FeMoco ground-state calculation to chemical accuracy, using a Trotter-based algorithm. That number made the application look hopeless for any machine of the next century. Subsequent work brought it down by ~5 orders of magnitude:

Babbush-Berry-McClean-Neven 2018 — first qubitisation-based estimates, ~10^{12} T-gates.
von Burg-Tezuka et al. 2021 — double-factorised block encodings, ~10^{10} T-gates for FeMoco.
Lee-Berry et al. 2021 — tensor-hypercontraction block encodings, ~10^{9} T-gates.

Each improvement comes from finding a better block encoding with smaller \alpha or smaller ancilla count, then running the same qubitisation + QSP pipeline. The algorithm is fixed; the science is in the block encoding.

These numbers may still come down further. Useful fault-tolerant FeMoco simulation is a target for the late 2020s to early 2030s — when sufficient logical qubit counts arrive, the block encoding + qubitisation machinery is ready to run on them.

Where this leads next

Linear Combination of Unitaries — chapter 133, the first block-encoding construction and the LCU prehistory of qubitisation.
Trotter-Suzuki Decomposition — chapter 131, the older Hamiltonian-simulation paradigm that qubitisation supersedes asymptotically.
Quantum Phase Estimation — the subroutine that, applied to W, extracts H's spectrum.
Feynman Revisited — chapter 129, the big-picture framing for quantum simulation.
HHL Algorithm — solving linear systems via QSP on a block encoding of A^\dagger A.
Simulating Chemistry — chapter 135, where qubitisation meets fermionic Hamiltonians.

References

Guang Hao Low and Isaac L. Chuang, Hamiltonian Simulation by Qubitization (2017) — arXiv:1610.06546.
András Gilyén, Yuan Su, Guang Hao Low, Nathan Wiebe, Quantum singular value transformation and beyond (2019) — arXiv:1806.01838.
Guang Hao Low, Hamiltonian Simulation with Nearly Optimal Dependence on Spectral Norm (2019) — arXiv:1807.03967.
John Preskill, Lecture Notes on Quantum Computation, Chapter 5 — theory.caltech.edu/~preskill/ph229.
Wikipedia, Quantum signal processing.
Nielsen and Chuang, Quantum Computation and Quantum Information, §4.7 — Cambridge University Press.