In short

Every single-qubit unitary U — every 2\times 2 matrix with U^\dagger U = I — can be written as

U \;=\; e^{i\alpha}\, R_z(\beta)\, R_y(\gamma)\, R_z(\delta)

for four real numbers \alpha, \beta, \gamma, \delta. Three rotations about just two axes (z and y) cover every rotation of the Bloch sphere. The fourth parameter is an overall phase, invisible on its own but necessary when the unitary sits inside a larger controlled circuit. This is the ZYZ decomposition, and it is the hinge between theory and hardware: a real superconducting qubit calibrates two native rotations, and every gate the programmer asks for gets compiled down to this 3-rotation sandwich. For hardware that only supports a discrete set like \{H, T\}, the Solovay-Kitaev theorem says you can approximate any U to accuracy \varepsilon using only O(\log^c(1/\varepsilon)) gates.

You have been meeting new single-qubit gates for five chapters now — X, Y, Z, H, S, T, and the continuous family R_x(\theta), R_y(\theta), R_z(\theta). At some point the sensible reader stops and asks: when does this zoo end? Do I have to memorise a new gate every time I open a paper?

The answer is no. The zoo ends here. There is, in fact, only one gate — a general 2 \times 2 unitary matrix with four real parameters — and every specific gate you have met is just a particular choice of those four parameters. Every U you will ever write down on a single qubit can be built from three rotations about two axes plus a global phase. That is the content of the chapter.

This is not a small claim. It means a quantum engineer at Google or IBM does not need to build a Hadamard gate, a T gate, a phase gate, and a rotation-by-\pi/7-about-the-weird-axis gate as four separate physical operations. They need to calibrate two rotation types (usually R_x(\theta) and R_z(\theta)) and the compiler handles the rest. Every single-qubit gate is rewritten, at compile time, as this three-rotation sandwich, which is then mapped onto whatever the hardware can physically do. The algorithm designer gets to work in the abstract space of "any unitary I want"; the hardware only ever sees two kinds of rotation.

This chapter builds the decomposition from the Bloch-sphere picture — where the theorem is obvious — down to the algebraic form that a compiler actually uses. Then the payoff: a worked decomposition of the Hadamard into ZYZ Euler angles, a worked hardware compilation of a small R_y rotation into native R_x and R_z gates, and a preview of the Solovay-Kitaev theorem that handles the discrete-gate case.

The picture — every qubit gate is a Bloch rotation

The Bloch sphere chapter built the picture already; pull it back into focus.

Every pure single-qubit state is a point on a unit sphere. Every single-qubit gate is a rotation of that sphere: an orientation-preserving orthogonal transformation of the 3D ball of possible Bloch vectors. A rotation of 3D space is determined by three numbers: an axis direction (two angles to specify a direction in 3D) plus a rotation angle (one more number). Three numbers, three degrees of freedom.

So every single-qubit unitary — up to global phase — carries exactly three real parameters. And any three numbers that specify a 3D rotation can be specified in many ways. One of the oldest ways is Euler angles: instead of giving one axis and one angle, give three successive rotations about two fixed axes (like a gimbal).

This is the Bloch-sphere-level content of the ZYZ theorem. Every rotation of the sphere can be performed as:

  1. Spin around the z-axis by some angle \delta.
  2. Tip around the y-axis by some angle \gamma.
  3. Spin around the z-axis again by some angle \beta.

Three spins, two axes. Every rotation, ever.

[Figure: ZYZ rotation of the Bloch sphere. Three Bloch spheres in a row show the successive rotations R_z(\delta) (spin around z), R_y(\gamma) (tip around y), and R_z(\beta) (spin around z again). Three rotations, two axes, every orientation of the sphere reachable: U = e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta).]
The ZYZ decomposition. Any rotation of the Bloch sphere is three successive rotations — spin around $z$, tip around $y$, spin around $z$ — followed by an invisible overall phase. This covers every possible single-qubit gate.

Why three rotations suffice: you need to specify two things for any orientation of the sphere. First, where the "north pole" |0\rangle ends up: two angles on the target sphere, the polar angle set by R_y(\gamma) and the azimuth set by R_z(\beta). (The first rotation, R_z(\delta), leaves the pole where it is.) Second, how the sphere is turned about the pole before the pole is carried to its destination: one angle, set by R_z(\delta). Two plus one is three. That is the whole counting argument.

The theorem — stated carefully

Now the algebraic statement the compiler actually uses.

ZYZ decomposition theorem

Let U be any 2 \times 2 unitary matrix (so U^\dagger U = I). Then there exist four real numbers \alpha, \beta, \gamma, \delta such that

U \;=\; e^{i\alpha}\, R_z(\beta)\, R_y(\gamma)\, R_z(\delta)

where

R_z(\theta) \;=\; \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix}, \qquad R_y(\theta) \;=\; \begin{pmatrix} \cos(\theta/2) & -\sin(\theta/2) \\ \sin(\theta/2) & \cos(\theta/2) \end{pmatrix}.

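For a generic U, and with \gamma restricted to [0, \pi], the angles are unique up to the usual periodicities (shifting \beta or \delta by 2\pi flips the sign of the product, which \alpha can absorb). They can be computed from the four matrix entries of U by a straightforward algorithm. The only degenerate cases are \gamma = 0 and \gamma = \pi, treated as an edge case below.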

Reading the theorem. The matrix U has four complex entries, which is eight real numbers. The constraint U^\dagger U = I imposes four equations, leaving four free real parameters. Those four parameters are \alpha, \beta, \gamma, \delta — the global phase and the three Euler angles. The theorem promises you can always find them, and gives a concrete parameterisation in the process.

Why this is called "universal single-qubit": because every one-qubit unitary is reachable by picking the right (\alpha, \beta, \gamma, \delta). If a hardware device can execute R_z(\theta) and R_y(\theta) for arbitrary continuous \theta, it can execute any single-qubit gate. You never need a separate "Hadamard instruction" or "T instruction" — those are just specific choices of the four angles.

There is nothing magical about choosing z-y-z. The same job can be done with X-Y-X, or Z-X-Z, or any other pair of perpendicular axes. (For two axes that are merely non-parallel, three rotations are not always enough and a longer alternating product is needed.) ZYZ is a common convention because it matches how Qiskit's U3 gate is written, and most quantum SDKs use some version of this three-rotation form.

Why three rotations suffice — the counting argument

You saw the counting glimpse above. Write it out carefully, because the degree-of-freedom count is how you know the theorem isn't accidentally over- or under-determined.

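A general 2 \times 2 unitary has four real parameters. A complex 2 \times 2 matrix has 8 real entries (4 complex numbers, 2 reals each). The unitary constraint U^\dagger U = I says that the two columns of U are orthonormal, which imposes:

  1. Normalisation of the first column: 1 real equation.
  2. Normalisation of the second column: 1 real equation.
  3. Orthogonality of the two columns: 1 complex equation, which is 2 real equations.

Total: 4 real equations on 8 real numbers, leaving 8 - 4 = 4 free parameters. That matches the four numbers (\alpha, \beta, \gamma, \delta) in the ZYZ form. Coincidence? Not at all: the ZYZ form has exactly as many parameters as U(2) has dimensions.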

Why R_z(\beta) R_y(\gamma) R_z(\delta) covers the three angles of SU(2): SU(2) is the special unitary group — unitaries with determinant +1. It has 3 real parameters. Bloch-sphere rotations are exactly SU(2) (up to a double cover), and Euler angles are the classical coordinates on it. The fourth parameter \alpha is the determinant phase, which SU(2) has forced to 1 but U(2) allows to roam.

The global phase \alpha separately. The matrix e^{i\alpha} I commutes with everything, so multiplying by it doesn't affect how the gate interacts with other gates — except when the gate is placed inside a controlled operation (more on this in the next chapter). In isolation, \alpha is invisible: a measurement can't see it, because probabilities are |\text{amplitude}|^2 and the |e^{i\alpha}|^2 = 1 factor drops out.
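The invisibility of \alpha in isolation can be checked numerically. The following is a minimal sketch using only the Python standard library; the input state and the value alpha = 0.73 are arbitrary illustrative choices, not values from this chapter.

```python
import cmath
import math

def apply(u, psi):
    """Apply a 2x2 matrix u to a 2-component state psi."""
    return [u[0][0] * psi[0] + u[0][1] * psi[1],
            u[1][0] * psi[0] + u[1][1] * psi[1]]

def probabilities(psi):
    """Measurement probabilities in the computational basis."""
    return [abs(amp) ** 2 for amp in psi]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

alpha = 0.73  # arbitrary illustrative global phase
H_phased = [[cmath.exp(1j * alpha) * x for x in row] for row in H]

psi = [0.6, 0.8j]  # an arbitrary normalised input state

p_plain = probabilities(apply(H, psi))
p_phased = probabilities(apply(H_phased, psi))
print(p_plain)
print(p_phased)
```

The two printed lists agree to floating-point precision: the factor e^{i\alpha} multiplies every amplitude and drops out of every |amplitude|^2.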

The algorithm — computing the Euler angles

Given a specific U = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, you can pull the angles out by direct algebra.

Step 1. Extract the global phase. The determinant of U is some complex number with modulus 1, so \det(U) = e^{2i\alpha} for some real \alpha. Compute \alpha = \tfrac{1}{2}\arg(\det U). Now define the phase-stripped matrix \tilde U = e^{-i\alpha} U. This \tilde U has \det(\tilde U) = 1 — it is an element of SU(2).

Step 2. Multiply out the ZYZ form. For \tilde U in SU(2), write it as \tilde U = R_z(\beta) R_y(\gamma) R_z(\delta) and multiply out. The result is:

\tilde U \;=\; \begin{pmatrix} e^{-i(\beta+\delta)/2}\cos(\gamma/2) & -e^{-i(\beta-\delta)/2}\sin(\gamma/2) \\ e^{i(\beta-\delta)/2}\sin(\gamma/2) & e^{i(\beta+\delta)/2}\cos(\gamma/2) \end{pmatrix}.

Why this matrix-multiply works: R_z(\delta) on the right multiplies each column by a diagonal phase, then R_y(\gamma) mixes the rows with cosine and sine, then R_z(\beta) on the left multiplies each row by another diagonal phase. Track the phases and you get the four-entry pattern above.
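Written out as a worked intermediate step, with c = \cos(\gamma/2) and s = \sin(\gamma/2) as shorthand (notation introduced here only for brevity), the rightmost two factors give

R_y(\gamma)\, R_z(\delta) \;=\; \begin{pmatrix} c\,e^{-i\delta/2} & -s\,e^{i\delta/2} \\ s\,e^{-i\delta/2} & c\,e^{i\delta/2} \end{pmatrix},

and multiplying the first row by e^{-i\beta/2} and the second row by e^{i\beta/2} (the action of R_z(\beta) on the left) gives

R_z(\beta)\, R_y(\gamma)\, R_z(\delta) \;=\; \begin{pmatrix} c\,e^{-i(\beta+\delta)/2} & -s\,e^{-i(\beta-\delta)/2} \\ s\,e^{i(\beta-\delta)/2} & c\,e^{i(\beta+\delta)/2} \end{pmatrix}.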

Step 3. Read off \gamma from the magnitudes.

|\tilde U_{11}| = \cos(\gamma/2), \qquad |\tilde U_{21}| = \sin(\gamma/2).

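So \gamma/2 = \arctan\!\big(|\tilde U_{21}| \,/\, |\tilde U_{11}|\big), taken to be \pi/2 when \tilde U_{11} = 0, and \gamma is between 0 and \pi.

Step 4. Read off \beta + \delta and \beta - \delta from the phases. The top-right entry is -e^{-i(\beta-\delta)/2}\sin(\gamma/2), and you already know \sin(\gamma/2) from step 3. So the phase of -\tilde U_{12} / \sin(\gamma/2) gives you -(\beta - \delta)/2. The bottom-left entry \tilde U_{21} carries the same information with the opposite sign, so it cannot supply the second equation. That comes from the top-left entry e^{-i(\beta+\delta)/2}\cos(\gamma/2): the phase of \tilde U_{11} / \cos(\gamma/2) gives you -(\beta + \delta)/2. Combine the two to recover \beta and \delta separately.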

That's the whole algorithm. Every step is one line of arithmetic. A compiler can execute it in microseconds.

Edge case (at \gamma = 0 or \gamma = \pi). The decomposition becomes degenerate: the two R_z rotations merge into one. At \gamma = 0 the off-diagonal entries vanish and only the sum \beta + \delta is determined; at \gamma = \pi the diagonal entries vanish and only the difference \beta - \delta is determined. Compilers pick \delta = 0 by convention in these cases. This is a measure-zero corner of the parameter space; don't worry about it on first pass.
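The four steps, including the edge case, can be sketched in a few lines of standard-library Python. The function names zyz_angles and zyz_compose are hypothetical helpers written for this illustration, not part of any SDK, and the tolerance used to detect the degenerate cases is an arbitrary choice.

```python
import cmath
import math

def rz(t):
    """R_z(t) as a 2x2 nested list."""
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def ry(t):
    """R_y(t) as a 2x2 nested list."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def zyz_compose(alpha, beta, gamma, delta):
    """Build e^{i alpha} R_z(beta) R_y(gamma) R_z(delta)."""
    m = matmul(rz(beta), matmul(ry(gamma), rz(delta)))
    phase = cmath.exp(1j * alpha)
    return [[phase * x for x in row] for row in m]

def zyz_angles(u, tol=1e-12):
    """Return (alpha, beta, gamma, delta) for a 2x2 unitary u."""
    (a, b), (c, d) = u
    # Step 1: global phase from the determinant, then strip it off.
    alpha = 0.5 * cmath.phase(a * d - b * c)
    k = cmath.exp(-1j * alpha)
    a, b = k * a, k * b  # top row of the SU(2) matrix
    # Step 3: gamma from magnitudes (|U11| = cos, |U12| = sin of gamma/2).
    gamma = 2 * math.atan2(abs(b), abs(a))
    # Edge cases: only one combination of beta and delta is fixed; set delta = 0.
    if abs(b) < tol:   # gamma = 0, only beta + delta is determined
        return alpha, -2 * cmath.phase(a), gamma, 0.0
    if abs(a) < tol:   # gamma = pi, only beta - delta is determined
        return alpha, -2 * cmath.phase(-b), gamma, 0.0
    # Step 4: phases of U11 and -U12 give beta + delta and beta - delta.
    plus = -2 * cmath.phase(a)
    minus = -2 * cmath.phase(-b)
    return alpha, (plus + minus) / 2, gamma, (plus - minus) / 2

s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]
angles = zyz_angles(hadamard)
print(angles)
print(zyz_compose(*angles))
```

Feeding the returned angles back into zyz_compose should reproduce the input matrix up to rounding, which is a convenient self-check for any unitary you try.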

Worked example 1 — Hadamard in ZYZ angles

Time to actually decompose a gate you know by heart. Take the Hadamard,

H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.

Find (\alpha, \beta, \gamma, \delta) such that H = e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta).

Example 1: Decompose the Hadamard gate into ZYZ angles

Step 1. Compute the global phase \alpha.

\det H = \tfrac{1}{2}(1 \cdot (-1) - 1 \cdot 1) = \tfrac{1}{2}(-2) = -1 = e^{i\pi}.

So 2\alpha = \pi, giving \alpha = \pi/2. Why this works: the determinant of a product of rotations is 1 (rotations have determinant 1), and e^{i\alpha} contributes e^{2i\alpha} to the determinant (one factor from each column). So \det U = e^{2i\alpha}, and the global phase is half the argument of the determinant.

Step 2. Strip the phase. Define \tilde H = e^{-i\pi/2} H = -i H = \frac{-i}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} -i & -i \\ -i & i \end{pmatrix}. Check: \det(\tilde H) = \tfrac{1}{2}((-i)(i) - (-i)(-i)) = \tfrac{1}{2}(1 - (-1)) = 1. Good — \tilde H \in SU(2).

Step 3. Find \gamma from the magnitudes.

|\tilde H_{11}| = \tfrac{1}{\sqrt{2}}, \quad |\tilde H_{21}| = \tfrac{1}{\sqrt{2}}.

So \cos(\gamma/2) = 1/\sqrt{2}, giving \gamma/2 = \pi/4 and \gamma = \pi/2. Why: the Hadamard maps the z-pole to the equator, so geometrically the y-tipping angle is \pi/2 (a 90-degree tip). This makes sense — the middle rotation tips the pole exactly onto the equator, and the two z-rotations fine-tune the azimuth.

Step 4. Find \beta and \delta from the phases. The top-right entry of \tilde H is -i/\sqrt{2}. The predicted form is -e^{-i(\beta-\delta)/2}\sin(\gamma/2) = -e^{-i(\beta-\delta)/2} \cdot \tfrac{1}{\sqrt{2}}. Match:

-e^{-i(\beta-\delta)/2} \cdot \tfrac{1}{\sqrt{2}} = -\tfrac{i}{\sqrt{2}} \;\;\Rightarrow\;\; e^{-i(\beta-\delta)/2} = i \;\;\Rightarrow\;\; -(\beta - \delta)/2 = \pi/2 \;\;\Rightarrow\;\; \beta - \delta = -\pi.

The top-left entry is -i/\sqrt{2}. The predicted form is e^{-i(\beta+\delta)/2}\cos(\gamma/2) = e^{-i(\beta+\delta)/2}/\sqrt{2}. Match:

e^{-i(\beta+\delta)/2}/\sqrt{2} = -i/\sqrt{2} \;\;\Rightarrow\;\; e^{-i(\beta+\delta)/2} = -i \;\;\Rightarrow\;\; -(\beta+\delta)/2 = -\pi/2 \;\;\Rightarrow\;\; \beta+\delta = \pi.

Step 5. Solve the two linear equations. From \beta + \delta = \pi and \beta - \delta = -\pi:

\beta = 0, \qquad \delta = \pi.

Here R_z(\beta) = R_z(0) = I, so the final z-rotation does nothing. That is allowed: the Hadamard happens to need only two of the three rotations.

Why the answer is not unique: the phase equations fix \beta + \delta and \beta - \delta only modulo 4\pi, and shifting either one by 4\pi changes \beta and \delta by 2\pi each without changing the product. Every such choice gives the same matrix. The symmetric triple (\pi/2, \pi/2, \pi/2) also decomposes the Hadamard, but in the Z-X-Z form, H = e^{i\pi/2} R_z(\pi/2) R_x(\pi/2) R_z(\pi/2), not in ZYZ.

Step 6. Verify. Take (\alpha, \beta, \gamma, \delta) = (\pi/2,\, 0,\, \pi/2,\, \pi). Then

R_z(\pi) = \begin{pmatrix}-i & 0 \\ 0 & i\end{pmatrix}, \quad R_y(\pi/2) = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & -1 \\ 1 & 1\end{pmatrix}, \quad R_y(\pi/2)\, R_z(\pi) = \frac{1}{\sqrt{2}}\begin{pmatrix}-i & -i \\ -i & i\end{pmatrix} = \tilde H.

Multiply by e^{i\pi/2} = i, and the result is H.

Result.

\boxed{\; H \;=\; e^{i\pi/2} \cdot R_z(0) \cdot R_y(\pi/2) \cdot R_z(\pi) \;=\; e^{i\pi/2}\, R_y(\pi/2)\, R_z(\pi). \;}
[Figure: Hadamard as rotations. A single wire with an H gate on the left equals, on the right, R_z(\pi) followed by R_y(\pi/2), with the global phase e^{i\pi/2} (invisible in isolation) noted alongside. Circuit order: apply R_z(\pi) first, then R_y(\pi/2); the trivial R_z(0) is omitted. Matrix order is the reverse: $H = e^{i\pi/2} R_y(\pi/2) R_z(\pi)$.]
The Hadamard, unpacked. Two non-trivial rotations about two axes, plus one global phase. The rotation $R_y(\pi/2)$ is the only one that changes the magnitudes of the amplitudes; the $z$-rotation only adjusts phases.

What this shows. A gate you have used all chapter, H, is really a short sequence of elementary rotations in disguise. Nothing about its physics or its matrix has changed. What has changed is your ability to implement it on hardware that only knows how to do R_y and R_z: the compiler reads off the angles (\beta, \gamma, \delta) = (0, \pi/2, \pi) and \alpha = \pi/2, fires the corresponding calibrated rotations, and the Hadamard happens.

Worked example 2 — compiling a rotation to native X/Z gates

Real superconducting quantum hardware at companies like IBM and Google implements R_x(\theta) and R_z(\theta) natively — R_x via a microwave pulse that drives the qubit transition, and R_z essentially for free by reassigning phase labels in software (the "virtual-Z" trick). R_y is not a native gate: it has to be compiled into R_x and R_z.

Take a small rotation U = R_y(\pi/4), which written as a matrix is

R_y(\pi/4) = \begin{pmatrix}\cos(\pi/8) & -\sin(\pi/8) \\ \sin(\pi/8) & \cos(\pi/8)\end{pmatrix}.

Find a native-gate (R_x, R_z only) sequence that realises this.

Example 2: Compile $R_y(\pi/4)$ to native $R_x$ and $R_z$ gates

Step 1. Use the identity

R_y(\theta) = R_z(\pi/2)\, R_x(\theta)\, R_z(-\pi/2).

Why this identity holds: read the product right to left, in the order the gates act. R_z(-\pi/2) rotates the Bloch sphere's y-axis onto the x-axis, then R_x(\theta) does the \theta-rotation (now about what was originally the y direction), then R_z(\pi/2) rotates back. Conjugating a rotation by another rotation rotates the axis of the first, a standard fact about rotation matrices. Swapping the two z-rotations would rotate about -y instead and give R_y(-\theta).

Step 2. Plug in \theta = \pi/4.

R_y(\pi/4) \;=\; R_z(\pi/2)\, R_x(\pi/4)\, R_z(-\pi/2).

Why: nothing mysterious, just the identity from Step 1 with \theta = \pi/4. The middle gate is the only "real" rotation; the two z-rotations are virtual.

Step 3. Count the hardware cost. R_z(\pm \pi/2) on superconducting qubits costs zero physical time, because it is done by bookkeeping in software. So the entire three-gate sequence costs one microwave pulse (R_x(\pi/4)) and two updates to the phase-tracking register. The compilation adds two gates on paper but no extra pulses: the only wall-clock cost is the single R_x pulse.

[Figure: R_y compiled to native R_x and virtual R_z. Top row: the requested gate R_y(\pi/4). Bottom row, the native sequence in circuit order: R_z(-\pi/2) (virtual, free), then R_x(\pi/4) (physical pulse), then R_z(\pi/2) (virtual, free).]
Compiling an $R_y$ rotation to an $R_x$-and-$R_z$ native gate set. The dashed boxes are virtual $z$-rotations, applied in software via phase tracking and costing no physical time. The solid box is the real microwave pulse that drives the transition.

Result. R_y(\pi/4) = R_z(\pi/2) R_x(\pi/4) R_z(-\pi/2), compiling to one pulse plus two phase updates on typical superconducting hardware.

What this shows. The abstract theorem — "any U is three rotations about two axes" — has a very concrete hardware consequence: the compiler can rewrite any unitary in a form that the specific hardware knows how to execute. When you ask IBM's cloud quantum processor to apply "R_y(\pi/4)" to a qubit, this is literally what happens behind the scenes.
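A quick numerical check of the conjugation identity, written in matrix order as R_y(\theta) = R_z(\pi/2) R_x(\theta) R_z(-\pi/2), is sketched below with the standard library only. It also shows that the opposite ordering of the two z-rotations produces R_y(-\theta), which is an easy sign slip to make. The helper names are hypothetical.

```python
import cmath
import math

def rz(t):
    """R_z(t) as a 2x2 nested list."""
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def rx(t):
    """R_x(t) as a 2x2 nested list."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(t):
    """R_y(t) as a 2x2 nested list."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def max_diff(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

theta = math.pi / 4
half_pi = math.pi / 2

# Matrix order: the rightmost factor acts first on the qubit.
native = matmul(rz(half_pi), matmul(rx(theta), rz(-half_pi)))
swapped = matmul(rz(-half_pi), matmul(rx(theta), rz(half_pi)))

print(max_diff(native, ry(theta)))    # difference from R_y(theta)
print(max_diff(swapped, ry(-theta)))  # difference from R_y(-theta)
```

Both printed differences are at the level of floating-point rounding.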

Common unitaries and their ZYZ angles — a reference table

Keep this table near your desk when you are writing quantum circuits by hand. Verify any entry by multiplying out — the algebra is always a clean exercise.

Table: common single-qubit gates and their ZYZ decompositions.

gate    α      β       γ      δ      note
I       0      0       0      0      identity
X       π/2    −π/2    π      π/2    Pauli X
Y       π/2    0       π      0      Pauli Y
Z       π/2    π       0      0      Pauli Z
H       π/2    0       π/2    π      Hadamard
S       π/4    π/2     0      0      phase π/2
T       π/8    π/4     0      0      phase π/4
The ZYZ parameters $(\alpha, \beta, \gamma, \delta)$ for common single-qubit gates, satisfying $U = e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta)$. Any standard unitary is a specific choice of four angles.

Notice how X and Y have \gamma = \pi (a full y-tip by 180°), while Z, S and T have \gamma = 0 (no y-tip: they are pure z-rotations, so the ZYZ form collapses to a single z). The Hadamard is the only entry with an intermediate tip, \gamma = \pi/2, which matches its role as the "mixer" of the computational and plus-minus bases.
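"Verify any entry by multiplying out" is easy to automate. The sketch below rebuilds e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta) for one valid angle set per gate and compares it with the gate's matrix. The angle sets are written out explicitly in the code; because the decomposition is not unique, other sets can be equally valid. The helper names are hypothetical.

```python
import cmath
import math

def rz(t):
    """R_z(t) as a 2x2 nested list."""
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def ry(t):
    """R_y(t) as a 2x2 nested list."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def zyz(alpha, beta, gamma, delta):
    """Build e^{i alpha} R_z(beta) R_y(gamma) R_z(delta)."""
    m = matmul(rz(beta), matmul(ry(gamma), rz(delta)))
    phase = cmath.exp(1j * alpha)
    return [[phase * x for x in row] for row in m]

pi = math.pi
s = 1 / math.sqrt(2)

# gate name -> ((alpha, beta, gamma, delta), target matrix)
TABLE = {
    "I": ((0, 0, 0, 0), [[1, 0], [0, 1]]),
    "X": ((pi / 2, -pi / 2, pi, pi / 2), [[0, 1], [1, 0]]),
    "Y": ((pi / 2, 0, pi, 0), [[0, -1j], [1j, 0]]),
    "Z": ((pi / 2, pi, 0, 0), [[1, 0], [0, -1]]),
    "H": ((pi / 2, 0, pi / 2, pi), [[s, s], [s, -s]]),
    "S": ((pi / 4, pi / 2, 0, 0), [[1, 0], [0, 1j]]),
    "T": ((pi / 8, pi / 4, 0, 0), [[1, 0], [0, cmath.exp(1j * pi / 4)]]),
}

def max_error(angles, target):
    """Largest entrywise difference between zyz(*angles) and target."""
    built = zyz(*angles)
    return max(abs(built[i][j] - target[i][j])
               for i in range(2) for j in range(2))

errors = {name: max_error(angles, target)
          for name, (angles, target) in TABLE.items()}
for name, err in errors.items():
    print(name, err < 1e-12)
```

Keeping the global phase \alpha in the comparison matters here: dropping it would let a sign error such as -X pass for X.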

Exact vs approximate — the discrete gate problem

Everything above assumes you have access to R_z(\theta) and R_y(\theta) for arbitrary continuous angles \theta. Real fault-tolerant hardware does not have this luxury. In a fault-tolerant architecture (the kind you need for Shor's algorithm on a scale that factors realistic RSA keys), the only gates you can do reliably are a discrete set — typically the Clifford group plus the T gate. That is \{H, S, \text{CNOT}, T\}, and nothing else.

Call this set G. Since G is a finite set, and you can only compose finitely many gates in finite time, any sequence you build from G is one of countably many unitaries. But the single-qubit unitaries U(2) form a four-dimensional continuum — uncountably many. So you will never get every U exactly from \{H, T\} alone.

The question becomes: how closely can you approximate an arbitrary U with a sequence from \{H, T\}? Can you make the error \|U - U_\text{approx}\| as small as you like? And if so, how many gates does the approximation need?

The Solovay-Kitaev theorem is the answer, and it is spectacular.

Solovay-Kitaev theorem (informal)

Let G be any set of single-qubit gates that generates a dense subgroup of SU(2) (the Clifford+T gate set is one example, but so is \{H, T\} alone). Then for any target unitary U and any accuracy \varepsilon > 0, there exists a sequence of gates from G of length O(\log^c(1/\varepsilon)) that approximates U to within \varepsilon, where c is a small constant (about 3.97 for the algorithm of Dawson and Nielsen [4], and closer to 3 for other constructions).

What this actually says, translated. Take a target accuracy of 10^{-10} (more than good enough for any real quantum algorithm). Then \ln(10^{10}) \approx 23, so \log^3(1/\varepsilon) \approx 23^3 \approx 1.2 \times 10^4: a gate count on the order of ten thousand, times a constant that depends on the gate set and the algorithm. The number grows polynomially in \log(1/\varepsilon), not polynomially in 1/\varepsilon itself, which would be catastrophic. Logarithms are kind.

This is the theoretical foundation for how fault-tolerant quantum computing is even possible. Without Solovay-Kitaev, the fact that fault-tolerant architectures can only do a discrete set of gates would be a showstopper: you couldn't approximate arbitrary rotations cheaply, so you couldn't do the Fourier transform, so Shor's algorithm would be a paper tiger. With Solovay-Kitaev, it's just an engineering cost — a logarithmic overhead that you pay gladly.
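To get a feel for what "approximating U with words in H and T" means, here is a brute-force search over all short words. This is not the Solovay-Kitaev algorithm, only an exhaustive illustration that is feasible for short lengths; the target R_y(0.3), the length limits, and the phase-insensitive distance used are all illustrative choices.

```python
import cmath
import math
from itertools import product

s = 1 / math.sqrt(2)
GATES = {
    "H": [[s, s], [s, -s]],
    "T": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
}
IDENTITY = [[1, 0], [0, 1]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def distance(u, v):
    """Phase-insensitive distance sqrt(1 - |tr(u^dagger v)| / 2)."""
    overlap = sum(complex(u[i][j]).conjugate() * v[i][j]
                  for i in range(2) for j in range(2))
    return math.sqrt(max(0.0, 1 - abs(overlap) / 2))

def best_word(target, max_len):
    """Exhaustively search words over {H, T} up to max_len letters."""
    best_dist, best = distance(IDENTITY, target), ""
    for n in range(1, max_len + 1):
        for word in product("HT", repeat=n):
            m = IDENTITY
            for g in word:  # the leftmost letter acts first
                m = matmul(GATES[g], m)
            d = distance(m, target)
            if d < best_dist:
                best_dist, best = d, "".join(word)
    return best_dist, best

theta = 0.3  # illustrative target: R_y(0.3)
target = [[math.cos(theta / 2), -math.sin(theta / 2)],
          [math.sin(theta / 2), math.cos(theta / 2)]]

for max_len in (4, 8, 10):
    d, word = best_word(target, max_len)
    print(max_len, round(d, 4), word)
```

The best distance can only shrink as the allowed length grows, but the search cost doubles with every extra letter. The point of Solovay-Kitaev is to reach small \varepsilon without that exponential search.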

Practical compilation in Qiskit — the U3 gate

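IBM's Qiskit has a gate called U3(θ, φ, λ) that is, up to global phase, exactly the ZYZ decomposition:

U_3(\theta, \phi, \lambda) \;=\; e^{i(\phi+\lambda)/2}\, R_z(\phi)\, R_y(\theta)\, R_z(\lambda) \;=\; \begin{pmatrix} \cos(\theta/2) & -e^{i\lambda}\sin(\theta/2) \\ e^{i\phi}\sin(\theta/2) & e^{i(\phi+\lambda)}\cos(\theta/2) \end{pmatrix}.

In the notation of this chapter, \theta = \gamma, \phi = \beta and \lambda = \delta, and the prefactor e^{i(\phi+\lambda)/2} is the global phase that makes the top-left entry real. Any single-qubit unitary you write in Qiskit can be reduced to a gate of this form with three specific angles (newer releases call it the U gate). The Qiskit decompose() function exposes this: ask it to decompose a Hadamard, and the result is equivalent to U3(π/2, 0, π) (one conventional form), which is the ZYZ triple \beta = 0, \gamma = \pi/2, \delta = \pi.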

Compiler toolchains such as Cirq, tket, and Qiskit's transpiler all ultimately rest on this decomposition. When you learn it, you learn the common step that every single-qubit compilation has to pass through.
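The relation between the U3 matrix and the ZYZ product can be checked without installing any SDK. The sketch below types in the U3 matrix from its standard definition and compares it with e^{i(\phi+\lambda)/2} R_z(\phi) R_y(\theta) R_z(\lambda); the function u3 here is a hand-written stand-in, not a call into Qiskit, and the test angles are arbitrary.

```python
import cmath
import math

def rz(t):
    """R_z(t) as a 2x2 nested list."""
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def ry(t):
    """R_y(t) as a 2x2 nested list."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def u3(theta, phi, lam):
    """The U3 matrix, typed in by hand from its definition."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]

def zyz_with_phase(theta, phi, lam):
    """e^{i(phi+lam)/2} R_z(phi) R_y(theta) R_z(lam)."""
    m = matmul(rz(phi), matmul(ry(theta), rz(lam)))
    phase = cmath.exp(0.5j * (phi + lam))
    return [[phase * x for x in row] for row in m]

def max_diff(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

s2 = 1 / math.sqrt(2)
H = [[s2, s2], [s2, -s2]]

print(max_diff(u3(0.3, 1.2, -0.7), zyz_with_phase(0.3, 1.2, -0.7)))
print(max_diff(u3(math.pi / 2, 0, math.pi), H))
```

Both differences are at rounding level, so U3(π/2, 0, π) reproduces the Hadamard exactly, global phase included.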

Why this matters — the Indian context

Active quantum groups at IISc Bangalore and IIT Bombay are using exactly this decomposition in Variational Quantum Eigensolver (VQE) experiments — circuits that parameterise a quantum state with continuous angles (\theta_1, \theta_2, \ldots) and minimise energy by adjusting them, used to simulate small molecules relevant to Indian pharmaceutical research.

VQE circuits often build their ansatz (a parameterised trial wavefunction) out of R_y(\theta) rotations because cycling through ZYZ angles is a natural way to sweep SU(2). Every parameter the optimiser tweaks is one of these angles. At the hardware level, each rotation decomposes to one R_x pulse plus two virtual R_z updates, which is cheap.

The ZYZ decomposition guarantees that three angles per qubit are enough to reach every single-qubit rotation, so the parameter sweep is 3-dimensional per qubit, tractable and differentiable. The parameter-shift rule, a technique for computing exact gradients of VQE circuits, relies on each parameter entering through a rotation of the form e^{-i\theta P/2} with P a Pauli operator, which is exactly the form of R_x, R_y and R_z.

Common confusions

Going deeper

If you are just here to know that every single-qubit gate is three rotations, you have it. The rest of this section goes deeper: the general KAK decomposition for multi-qubit unitaries, the explicit Bloch-sphere rotation algorithm that finds Euler angles, a more careful statement and proof sketch of Solovay-Kitaev, virtual-Z gates on real superconducting hardware, and the parameter-shift rule that makes variational circuits differentiable.

KAK and cosine-sine decomposition for multi-qubit gates

ZYZ is the single-qubit version of a much more general theorem. For an arbitrary n-qubit unitary U \in U(2^n), the KAK decomposition (Cartan-style) breaks U into a product of the form K_1 A K_2: two factors from a simpler subgroup with a commuting (Cartan) factor sandwiched between them.

For two qubits, KAK gives U = (A_1 \otimes A_2) \cdot V \cdot (A_3 \otimes A_4) where each A_i is a single-qubit rotation (three ZYZ angles) and V is an entangling gate parameterised by three angles, V = \exp(i(\alpha_x XX + \alpha_y YY + \alpha_z ZZ)). Total: 4 \times 3 + 3 = 15 real parameters, matching \dim SU(4) = 15.

Every two-qubit unitary, no matter how tangled, compiles to at most three CNOTs plus single-qubit rotations. This is Vidal & Dawson's result (2004). Three CNOTs is the worst case, and SWAP is an example that needs all three; many useful gates need fewer (controlled-Z needs one, iSWAP two).

The cosine-sine decomposition is the numerical-linear-algebra algorithm that the transpiler uses. Given a 2^n \times 2^n unitary, it recursively splits U into block-diagonal rotations and anti-diagonal mixing terms, down to single-qubit levels. Every major quantum SDK has an implementation.

The ZYZ decomposition you learned in this chapter is the base case of this recursion.

The Bloch-sphere algorithm for computing Euler angles

A more geometric way to find (\beta, \gamma, \delta) for a given unitary U (without going through determinants and phase-stripping) is this:

  1. Compute where U sends the north pole. The Bloch vector of |0\rangle is (0, 0, 1). After U, it lands at some (x, y, z) with x^2 + y^2 + z^2 = 1. Find this by computing U|0\rangle and then taking the Bloch-vector formula \vec{r} = \langle \psi | \vec{\sigma} | \psi \rangle.

  2. \gamma is the polar angle: \gamma = \arccos(z). This is the angle to tip down.

  3. \beta is the azimuth of that landing point: \beta = \operatorname{atan2}(y, x). The first rotation R_z(\delta) does not move the pole, R_y(\gamma) tips it into the x-z plane, and R_z(\beta) then swings it to its final azimuth.

  4. \delta is the residual z-rotation needed to match the full unitary. It is fixed by where U sends a second state, for example the equatorial state |+\rangle: undo R_z(\beta) and R_y(\gamma) on the image of |+\rangle, and the azimuth that remains is \delta.

The algebra works out identically to the determinant-and-phase approach, but the geometric derivation is more satisfying.
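The first part of the geometric recipe, reading \gamma and \beta off the image of the north pole, can be sketched as follows. In the ZYZ form the image of |0\rangle has polar angle \gamma and azimuth \beta, since R_z(\delta) leaves the pole fixed. The test angles are arbitrary and the helper names are hypothetical.

```python
import cmath
import math

def rz(t):
    """R_z(t) as a 2x2 nested list."""
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def ry(t):
    """R_y(t) as a 2x2 nested list."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def zyz(alpha, beta, gamma, delta):
    """Build e^{i alpha} R_z(beta) R_y(gamma) R_z(delta)."""
    m = matmul(rz(beta), matmul(ry(gamma), rz(delta)))
    phase = cmath.exp(1j * alpha)
    return [[phase * x for x in row] for row in m]

def bloch_vector(psi):
    """Bloch vector (x, y, z) of a normalised state psi = (a, b)."""
    a, b = complex(psi[0]), complex(psi[1])
    cross = a.conjugate() * b
    return 2 * cross.real, 2 * cross.imag, abs(a) ** 2 - abs(b) ** 2

# Arbitrary illustrative angles, with gamma in (0, pi) and beta in (-pi, pi].
alpha, beta, gamma, delta = 0.4, 1.1, 0.7, -2.0
U = zyz(alpha, beta, gamma, delta)

# U|0> is the first column of U.
image_of_pole = [U[0][0], U[1][0]]
x, y, z = bloch_vector(image_of_pole)

gamma_rec = math.acos(max(-1.0, min(1.0, z)))  # polar angle
beta_rec = math.atan2(y, x)                    # azimuth
print(gamma_rec, beta_rec)
```

The recovered values match the \gamma and \beta that went in, while \alpha and \delta leave the image of the pole untouched. That is why a second state is needed to pin down \delta.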

Solovay-Kitaev — statement and proof sketch

The full theorem, in its cleanest form:

Solovay-Kitaev theorem. Let G be a set of single-qubit gates closed under inversion and generating a dense subgroup of SU(2). Then there is a universal constant c (between about 3 and 4 for the standard constructions) such that for any target unitary U \in SU(2) and any \varepsilon > 0, there exists a sequence g_1 g_2 \ldots g_L of elements from G with L = O(\log^c(1/\varepsilon)) such that \|U - g_1 g_2 \ldots g_L\| \leq \varepsilon. Furthermore, this sequence can be found by a classical algorithm in time polylogarithmic in 1/\varepsilon.

Proof sketch. Start with any sequence V_0 that approximates U to some fixed accuracy \varepsilon_0 (achievable because G is dense; \varepsilon_0 must be small enough for the recursion to converge). Compute the "error" \Delta = U V_0^\dagger, which is close to the identity. Write \Delta as a group commutator A B A^\dagger B^\dagger of two unitaries that are themselves close to the identity, and approximate A and B by gate sequences of accuracy \varepsilon_0. Because of the commutator structure, the first-order errors in A and B cancel, and the resulting approximation to U has error O(\varepsilon_0^{3/2}), better than the \varepsilon_0 you started with. Iterate: each level raises the error to the power 3/2 while multiplying the sequence length by a constant factor. After O(\log \log(1/\varepsilon)) levels the error is below \varepsilon, and the total gate count works out to O(\log^c(1/\varepsilon)).

The key insight is that group commutators "amplify" small corrections into still-smaller corrections, essentially by cancelling the leading-order error terms. The idea goes back to Solovay and Kitaev in the mid-1990s; Dawson and Nielsen [4] gave the cleanest modern proof and explicit algorithms.

For a practical implementation, the Dawson-Nielsen paper gives pseudocode that runs in reasonable time for \varepsilon down to about 10^{-10} with the Clifford+T gate set — good enough for essentially every foreseeable quantum algorithm.
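The recursion in the proof sketch can be turned into a small calculation. The sketch below assumes the bookkeeping of the Dawson-Nielsen construction [4], in which each level maps the error \varepsilon to c_approx \cdot \varepsilon^{3/2} and multiplies the sequence length by 5. The starting accuracy, the constant c_approx = 1, and the base length of 10 gates are hypothetical round numbers chosen for illustration, not values from the paper.

```python
import math

def sk_levels_and_length(eps, eps0=0.1, c_approx=1.0, base_len=10):
    """Count recursion levels and gates needed to reach accuracy eps.

    Assumes error -> c_approx * error**1.5 and length -> 5 * length
    at every level (illustrative constants, see the lead-in).
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if c_approx ** 2 * eps0 >= 1:
        raise ValueError("eps0 too large: the recursion would not converge")
    levels, err, length = 0, eps0, base_len
    while err > eps:
        err = c_approx * err ** 1.5
        length *= 5
        levels += 1
    return levels, length

# Length grows like 5**n while log(1/error) grows like 1.5**n,
# so length ~ log(1/eps) ** (ln 5 / ln 1.5).
exponent = math.log(5) / math.log(1.5)
print(round(exponent, 2))

for eps in (1e-3, 1e-6, 1e-10):
    print(eps, sk_levels_and_length(eps))
```

The printed exponent is where the figure of roughly 3.97 comes from. The number of levels grows only like \log\log(1/\varepsilon), which is why a handful of levels already reaches 10^{-10} under these assumptions.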

Virtual-Z gates and why ZYZ fits superconducting hardware

A superconducting transmon qubit is (to first approximation) a nonlinear LC oscillator whose two lowest energy levels are labelled |0\rangle and |1\rangle. A microwave pulse resonant with the |0\rangle \leftrightarrow |1\rangle transition drives population back and forth, effecting an R_x(\theta) or R_y(\theta) rotation depending on the pulse phase. But R_z(\theta) rotations — rotations about the z-axis — are free, because rotating about z is just changing the reference phase of the rotating frame in which the qubit lives.

McKay, Wood, Sheldon, Chow, Gambetta (2017) formalised this as the virtual-Z gate. Instead of waiting some time for a z-rotation to accumulate, the compiler updates a software register that tracks the phase of all subsequent R_x and R_y pulses applied to that qubit. When later gates fire, they fire at an adjusted phase that is equivalent to having applied the R_z — but no real physical pulse was emitted. It costs zero nanoseconds.

This is why the ZYZ decomposition is so convenient for superconducting hardware: the two R_z's bracketing the middle rotation are free, so every single-qubit gate reduces to one physical rotation plus phase bookkeeping. In practice the middle rotation is usually compiled further, because a virtual-Z turns an R_x pulse into an R_y pulse: hardware such as IBM's calibrates a fixed \pi/2 pulse and builds a general gate from two of them interleaved with three virtual-Z's. The choice of z-y-z (rather than z-x-z) is largely a matter of convention; the same logic works with y replaced by x.

The parameter-shift rule for variational circuits

In variational quantum algorithms (VQE, QAOA, quantum machine learning), you have a parameterised circuit U(\vec{\theta}) and you want to compute the gradient \partial \langle H \rangle / \partial \theta_k of some expectation value with respect to each parameter \theta_k. Classical automatic differentiation does not work on quantum hardware (no backpropagation through a collapsed measurement).

The parameter-shift rule says: if your parameter \theta_k controls a gate of the form R_y(\theta_k) (or R_x(\theta_k) or R_z(\theta_k)), then

\frac{\partial \langle H \rangle}{\partial \theta_k} = \tfrac{1}{2}\big(\langle H \rangle_{\theta_k + \pi/2} - \langle H \rangle_{\theta_k - \pi/2}\big).

Two extra circuit runs give you the exact gradient. No finite-difference error, no approximation. The proof uses the fact that R_y(\theta) has generator \tfrac{1}{2}Y with eigenvalues \pm\tfrac{1}{2}, and that a Pauli operator with two distinct eigenvalues has this specific shift structure.

This is why variational circuits are built out of exactly the rotation gates you've been learning: R_x, R_y and R_z (and multi-qubit rotations generated by Pauli products) have a generator with just two eigenvalues, \pm\tfrac{1}{2}, and that is what makes the two-term parameter-shift rule work. A parameterised unitary with a more complicated generator needs a more elaborate rule. Much of near-term variational quantum computing rests on these rotation gates.
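The rule is easy to test on the smallest possible example: one qubit, the circuit R_y(\theta) applied to |0\rangle, and the observable Z. The expectation value is then \cos\theta, so the exact gradient is -\sin\theta. This is a minimal sketch with an arbitrary test angle; on hardware each expectation would be estimated from repeated measurements rather than computed exactly.

```python
import math

def expectation_z(theta):
    """<psi|Z|psi> for |psi> = R_y(theta)|0> = (cos(theta/2), sin(theta/2))."""
    amp0, amp1 = math.cos(theta / 2), math.sin(theta / 2)
    return amp0 ** 2 - amp1 ** 2

def parameter_shift_gradient(f, theta):
    """Two-term parameter-shift rule with shifts of +pi/2 and -pi/2."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))

def finite_difference_gradient(f, theta, h=1e-3):
    """Central finite difference, shown for comparison only."""
    return (f(theta + h) - f(theta - h)) / (2 * h)

theta = 0.7  # arbitrary test angle
exact = -math.sin(theta)
shift = parameter_shift_gradient(expectation_z, theta)
finite = finite_difference_gradient(expectation_z, theta)

print(abs(shift - exact))   # rounding-level error
print(abs(finite - exact))  # small but nonzero truncation error
```

The shift rule lands on the exact derivative even though the two evaluation points are far apart, whereas the finite difference carries a truncation error that depends on the step size.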

Where this leads next

References

  1. Nielsen and Chuang, Quantum Computation and Quantum Information, §4.2 (single-qubit operations and the ZYZ theorem) — Cambridge University Press.
  2. John Preskill, Lecture Notes on Quantum Computation, Ch. 4 (universal gates) — theory.caltech.edu/~preskill/ph229.
  3. Wikipedia, Euler angles — the classical geometry of three-rotation decompositions of SO(3) and SU(2).
  4. Christopher M. Dawson and Michael A. Nielsen, The Solovay-Kitaev algorithm (2005) — arXiv:quant-ph/0505030. The clean modern proof and algorithm.
  5. Qiskit Documentation, U-gate (U3) reference — the practical software interface to the ZYZ decomposition.
  6. Wikipedia, Single-qubit gates / Universality — concise reference for the set of universal single-qubit gates.