In short

The S gate is a 90° rotation of the Bloch sphere about the z-axis; the T gate is a 45° rotation about the same axis. Both are diagonal in the computational basis: S = \text{diag}(1, i) and T = \text{diag}(1, e^{i\pi/4}). S leaves |0\rangle alone and multiplies |1\rangle by i; T leaves |0\rangle alone and multiplies |1\rangle by e^{i\pi/4}. S is the square root of Z and T is the fourth root: S^2 = Z, T^4 = Z, T^8 = I. Their inverses are distinct from themselves: S^\dagger = \text{diag}(1, -i), T^\dagger = \text{diag}(1, e^{-i\pi/4}). Together with H and CNOT, S generates the Clifford group — a set of gates that is powerful but classically simulable (Gottesman–Knill theorem). Adding T makes the gate set universal. That is why T is famously "expensive" on a real fault-tolerant quantum computer: it cannot be implemented transversally alongside the Cliffords in most error-correcting codes; it must be injected via magic-state distillation, a protocol that typically consumes hundreds of physical operations per single reliable T.

You have met the Pauli Z gate — the 180° rotation about the z-axis, which multiplies |1\rangle by -1 and leaves |0\rangle alone. A clean, self-inverse gate: Z twice is the identity. But 180° is a coarse move. What if you wanted a smaller rotation about the same axis — a 90° phase flip, or a 45° one?

That is exactly what S and T deliver. S is half of Z: rotate by only 90° about the z-axis and you have multiplied |1\rangle by i instead of by -1. Two S gates back-to-back complete the half-circle and give you Z. T is half of S: rotate 45° about z and you multiply |1\rangle by e^{i\pi/4}. Four Ts give Z, and eight Ts return to the identity.

That makes S the "square root of Z" and T the "fourth root of Z" — or equivalently, "the square root of S." They are the next two gates in the single-qubit zoo after the Paulis and the Hadamard, and they carry a weight disproportionate to their simple matrices. The T gate, in particular, is the single most consequential gate in fault-tolerant quantum computing theory: it is the one gate that escapes the Clifford group and unlocks the full universe of quantum algorithms — at a fault-tolerance cost that drives almost every resource estimate for quantum advantage.

This chapter lays out both gates across all four pictures — matrix, action on basis states, circuit symbol, and Bloch sphere — then explains the Clifford/non-Clifford distinction that makes T the load-bearing gate of the field.

The matrices and what they do

Write them down once:

S \;=\; \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \qquad T \;=\; \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}.

Two diagonal 2\times 2 matrices. The top-left entry is 1 in both cases, so |0\rangle is untouched. The bottom-right entry is a complex phase — i = e^{i\pi/2} for S, and e^{i\pi/4} for T. That phase is what distinguishes them from the identity and from each other.

Why diagonal matrices represent phase-only rotations: a diagonal matrix \text{diag}(a, b) acts on \alpha|0\rangle + \beta|1\rangle by multiplying \alpha by a and \beta by b. If |a| = |b| = 1 (as here — they are phases), the probabilities |\alpha|^2 and |\beta|^2 are unchanged, only the phases shift. No bit-flip happens; this is pure phase manipulation.
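This phase-only behaviour is easy to verify numerically. The sketch below (Python with numpy, which this chapter does not otherwise assume) applies S and T to an arbitrary state and confirms the measurement probabilities never move:

```python
import numpy as np

# The two phase gates in the convention used above:
# S = diag(1, i), T = diag(1, e^{i*pi/4}).
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])

# An arbitrary normalised state alpha|0> + beta|1>.
psi = np.array([0.6, 0.8j])

probs_before = np.abs(psi) ** 2
probs_after_S = np.abs(S @ psi) ** 2
probs_after_T = np.abs(T @ psi) ** 2

# Pure-phase gates shift phases only; probabilities are untouched.
print(np.allclose(probs_before, probs_after_S))  # True
print(np.allclose(probs_before, probs_after_T))  # True
```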

Figure: the S and T matrices side by side, each labelled with the rotation it performs about the z-axis.
The two phase gates. $S$ adds a $\pi/2$ phase (multiplication by $i$) to the amplitude of $|1\rangle$; $T$ adds a $\pi/4$ phase. Both are identities on $|0\rangle$. Two $S$s give $Z$; four $T$s give $Z$; eight $T$s return to the identity.

The other common names for these gates are the phase gate (S, sometimes denoted P(\pi/2)) and the \pi/8 gate (T). The "\pi/8" naming for T is a historical accident: T is sometimes written as \text{diag}(e^{-i\pi/8}, e^{i\pi/8}) — which differs from the form above only by a global phase of e^{-i\pi/8}. Since global phases are unobservable, both forms describe the same gate. In this track, we use the cleaner form T = \text{diag}(1, e^{i\pi/4}) throughout.

Action on the computational basis

Multiply S and T into |0\rangle and |1\rangle and you get:

S|0\rangle = |0\rangle, \qquad S|1\rangle = i|1\rangle.
T|0\rangle = |0\rangle, \qquad T|1\rangle = e^{i\pi/4}|1\rangle.

Why |0\rangle is untouched in both cases: the top-left entry of both matrices is 1. Multiplying any column vector that has 0 in its second slot by a diagonal matrix whose second entry is a phase leaves the first entry unchanged — and |0\rangle = (1, 0)^T has 0 in its second slot. So only the amplitude of |1\rangle ever picks up a phase.

On a single computational-basis state, S and T do nothing visible — the result is the same state multiplied by a global phase, which cannot be measured. You will only see the effect of S or T when the qubit is in a superposition, where the phase on |1\rangle becomes a relative phase against the amplitude on |0\rangle.

Action on |+⟩ and |−⟩

Apply S to |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle):

S|+\rangle = \tfrac{1}{\sqrt{2}}(S|0\rangle + S|1\rangle) = \tfrac{1}{\sqrt{2}}(|0\rangle + i|1\rangle) = |+i\rangle.

Why this is called |+i\rangle: this is the standard notation for the state on the +y axis of the Bloch sphere. It is the equator-state one-quarter-turn counter-clockwise from |+\rangle, with an i in the |1\rangle amplitude instead of a +1 or -1.

Apply S to |-\rangle:

S|-\rangle = \tfrac{1}{\sqrt{2}}(S|0\rangle - S|1\rangle) = \tfrac{1}{\sqrt{2}}(|0\rangle - i|1\rangle) = |-i\rangle.

So S rotates the equator by 90°: |+\rangle \to |+i\rangle, |-\rangle \to |-i\rangle. Four applications of S bring you all the way around the equator back to the start — which matches S^4 = I.
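A quick numerical check of the quarter-turn picture (a numpy sketch, assumed available):

```python
import numpy as np

S = np.diag([1, 1j])
plus = np.array([1, 1]) / np.sqrt(2)       # |+>
minus = np.array([1, -1]) / np.sqrt(2)     # |->
plus_i = np.array([1, 1j]) / np.sqrt(2)    # |+i>
minus_i = np.array([1, -1j]) / np.sqrt(2)  # |-i>

print(np.allclose(S @ plus, plus_i))    # True: S|+> = |+i>
print(np.allclose(S @ minus, minus_i))  # True: S|-> = |-i>

# Four quarter-turns close the loop: S^4 = I.
print(np.allclose(np.linalg.matrix_power(S, 4), np.eye(2)))  # True
```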

For T, the analogous calculation gives:

T|+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle).

This is a state 45° along the equator, between |+\rangle and |+i\rangle. It has no standard single-letter name the way |+\rangle, |-\rangle, |+i\rangle, |-i\rangle do — though in fault-tolerance contexts it is exactly the "magic state" |T\rangle = T|+\rangle that you will meet later in this chapter. And that is precisely why T is useful: it reaches states the Paulis and H and S cannot.

Picture: the Bloch sphere

Every diagonal 2\times 2 unitary whose top-left entry is 1 is a rotation about the z-axis on the Bloch sphere. S is a 90° rotation; T is a 45° rotation. The Pauli Z, for comparison, is the 180° rotation you already know.

Figure: two Bloch spheres side by side, showing S as a 90° rotation about the z-axis (carrying |+⟩ to |+i⟩) and T as a 45° rotation (carrying |+⟩ midway to |+i⟩).
$S$ rotates the Bloch sphere $90°$ about the $z$-axis, carrying $|+\rangle$ to $|+i\rangle$. $T$ rotates $45°$, carrying $|+\rangle$ to a state halfway between $|+\rangle$ and $|+i\rangle$. Both leave the $z$-axis points $|0\rangle$ and $|1\rangle$ fixed.

Two points fall out immediately:

The poles are fixed points of both rotations. Any rotation about the z-axis leaves the north and south poles stationary, which matches the algebra: S|0\rangle = |0\rangle, T|0\rangle = |0\rangle, and |1\rangle is the same state as i|1\rangle or e^{i\pi/4}|1\rangle up to a global phase. Nothing observable changes on a pole.

Equator states rotate into other equator states. |+\rangle sits on the equator at the +x point; S takes it to the +y point (|+i\rangle); T takes it to the point halfway between. All points on the equator are connected by phase-only rotations, which is why those rotations never change the |0\rangle-vs-|1\rangle measurement outcome — only the relative phase between the two amplitudes, which is an equator position.

Picture: circuit symbols

In a quantum circuit, S and T are drawn as labelled boxes on a single wire — exactly like H, X, Y, Z.

Figure: circuit symbols for S and T — a single wire labelled |ψ⟩ passing through a labelled box. S adds a π/2 phase to |1⟩; T adds a π/4 phase.
Circuit notation for $S$ and $T$. A single labelled box on one wire. When you see a $T$ in a real circuit-decomposition report, pay attention: it is the most expensive gate in the diagram.

The inverses S^\dagger and T^\dagger are drawn the same way, with a dagger (\dagger) added to the label: S^\dagger is the box "S†", and T^\dagger is the box "T†". Both are valid gates in a circuit and are used just as often as their undaggered counterparts — a +90° rotation and a -90° rotation are equally useful building blocks.

Relationships between S, T, and Z

You have already seen the slogan — S is the square root of Z, T is the fourth root of Z. Verify it explicitly.

S^2 = Z. Multiply S by itself:

S^2 = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & i^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = Z.

Why this worked: multiplying two diagonal matrices just multiplies their diagonal entries. i \cdot i = i^2 = -1, which is precisely the bottom-right entry of Z. Two 90° rotations about the same axis combine into one 180° rotation about that axis — the algebra says the same thing.

T^2 = S. Multiply T by itself:

T^2 = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} = S.

Why e^{i\pi/4} \cdot e^{i\pi/4} = e^{i\pi/2}: the exponents add — this is the defining rule of the complex exponential, e^a \cdot e^b = e^{a+b}. And e^{i\pi/2} = \cos(\pi/2) + i\sin(\pi/2) = 0 + i \cdot 1 = i, by Euler's formula.

Combining: T^4 = (T^2)^2 = S^2 = Z, and T^8 = Z^2 = I. So eight T gates in a row undo themselves. Geometrically: eight 45° rotations about z amount to a single 360° rotation, which is the identity.

T as the square root of S. Since T^2 = S, we say T = \sqrt{S}. And since S = \sqrt{Z}, we have T = \sqrt{\sqrt{Z}} = Z^{1/4}. T is the fourth root of the Pauli Z gate, just as S is the square root of it.
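All four root relations take a few lines of numpy to confirm (an illustrative sketch, not part of the chapter's required toolchain):

```python
import numpy as np

S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])
Z = np.diag([1, -1])

print(np.allclose(S @ S, Z))                                 # S^2 = Z
print(np.allclose(T @ T, S))                                 # T^2 = S
print(np.allclose(np.linalg.matrix_power(T, 4), Z))          # T^4 = Z
print(np.allclose(np.linalg.matrix_power(T, 8), np.eye(2)))  # T^8 = I
```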

Inverses: S† and T†

Every unitary has an inverse (its conjugate transpose, U^\dagger), and for diagonal unitaries the inverse is easy: just conjugate each diagonal phase.

S^\dagger = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}, \qquad T^\dagger = \begin{pmatrix} 1 & 0 \\ 0 & e^{-i\pi/4} \end{pmatrix}.

Why this is the inverse: the conjugate transpose of a diagonal matrix is the same matrix with each diagonal entry replaced by its complex conjugate. \overline{i} = -i (the conjugate of a + bi is a - bi, and the conjugate of i = 0 + 1i is 0 - 1i = -i). Similarly \overline{e^{i\pi/4}} = e^{-i\pi/4}. So S^\dagger rotates by -90° about z, and T^\dagger rotates by -45°.

Check that S \cdot S^\dagger = I explicitly:

S \cdot S^\dagger = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & i \cdot (-i) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I.

Why i \cdot (-i) = 1: by definition i^2 = -1, so i \cdot (-i) = -i^2 = -(-1) = 1. Equivalently, i = e^{i\pi/2} and -i = e^{-i\pi/2}, and e^{i\pi/2} \cdot e^{-i\pi/2} = e^0 = 1.

The critical thing to notice: S \neq S^\dagger and T \neq T^\dagger. Unlike the Paulis and H (which are their own inverses — they are Hermitian-and-unitary), S and T are not Hermitian. You must apply their daggered versions to undo them. This is a real source of bugs in student calculations and in circuit compilation pipelines: applying S twice gives you Z, not I.
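The pitfall is easy to demonstrate numerically (numpy sketch):

```python
import numpy as np

S = np.diag([1, 1j])
Z = np.diag([1, -1])
S_dag = S.conj().T  # conjugate transpose: diag(1, -i)

print(np.allclose(S @ S_dag, np.eye(2)))  # True: S S† = I, the real inverse
print(np.allclose(S @ S, Z))              # True: S S = Z, NOT the identity
print(np.allclose(S, S_dag))              # False: S is not self-inverse
```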

Worked examples

Example 1: T acting on |+⟩

Compute T|+\rangle where |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle).

Step 1. Use linearity.

T|+\rangle = T\cdot\tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = \tfrac{1}{\sqrt{2}}(T|0\rangle + T|1\rangle).

Why linearity applies: every quantum gate is a linear operator. Applying T to a sum of kets is the sum of T applied to each ket. This is the property that lets all of quantum computation be written in terms of matrices.

Step 2. Substitute the action on basis states.

\tfrac{1}{\sqrt{2}}(T|0\rangle + T|1\rangle) = \tfrac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle).

Why: T|0\rangle = |0\rangle (top-left entry of T is 1), and T|1\rangle = e^{i\pi/4}|1\rangle (bottom-right entry is e^{i\pi/4}). Substitute and factor.

Step 3. Write the result in closed form.

T|+\rangle = \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{e^{i\pi/4}}{\sqrt{2}}|1\rangle.

Step 4. Sanity-check the probabilities.

  • Probability of measuring 0: |1/\sqrt{2}|^2 = 1/2.
  • Probability of measuring 1: |e^{i\pi/4}/\sqrt{2}|^2 = |e^{i\pi/4}|^2 \cdot 1/2 = 1 \cdot 1/2 = 1/2.
  • Sum = 1. Correctly normalised. Why |e^{i\pi/4}|^2 = 1: any complex exponential e^{i\theta} has modulus 1 (its real part is \cos\theta, its imaginary part is \sin\theta, and \cos^2\theta + \sin^2\theta = 1). So a pure-phase factor never changes a probability.

Result. T|+\rangle = \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{e^{i\pi/4}}{\sqrt{2}}|1\rangle — a state with 50-50 measurement probabilities in the computational basis, but a specific relative phase e^{i\pi/4} between its two amplitudes. Geometrically, it sits on the Bloch equator at the point 45° counter-clockwise from |+\rangle, halfway to |+i\rangle.

Figure: a single Bloch sphere showing |+⟩ at the +x equator point and T|+⟩ 45° counter-clockwise along the equator, midway to |+i⟩.
Applying $T$ to $|+\rangle$ produces a new state on the Bloch equator, $45°$ from $|+\rangle$ on the way to $|+i\rangle$. The computational-basis probabilities are unchanged (still 50-50), but the relative phase is now $e^{i\pi/4}$.
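Example 1 checks out numerically (numpy sketch):

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])
plus = np.array([1, 1]) / np.sqrt(2)

out = T @ plus
probs = np.abs(out) ** 2
print(np.allclose(probs, [0.5, 0.5]))  # True: still a 50-50 state

# The relative phase between the |1> and |0> amplitudes is pi/4.
rel_phase = np.angle(out[1] / out[0])
print(np.isclose(rel_phase, np.pi / 4))  # True
```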

Example 2: The H T H circuit

Apply the circuit H \cdot T \cdot H to |0\rangle. This sequence — Hadamard, then T, then another Hadamard — is one of the simplest non-trivial circuits in quantum computing.

Step 1. First Hadamard on |0\rangle.

H|0\rangle = |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle).

Why: the Hadamard turns |0\rangle into the equal superposition |+\rangle — you saw this in the Hadamard chapter.

Step 2. Apply T.

T|+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle).

Why: as you computed in Example 1. T inserts the phase e^{i\pi/4} on the |1\rangle amplitude.

Step 3. Apply the second Hadamard. Use linearity and H|0\rangle = |+\rangle, H|1\rangle = |-\rangle.

H\cdot\tfrac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle) = \tfrac{1}{\sqrt{2}}(H|0\rangle + e^{i\pi/4}H|1\rangle) = \tfrac{1}{\sqrt{2}}(|+\rangle + e^{i\pi/4}|-\rangle).

Why: linearity of H, then substitution of its action on the computational basis. The relative phase e^{i\pi/4} is unaffected — it is a scalar multiplying the |1\rangle piece.

Step 4. Expand |+\rangle and |-\rangle back into |0\rangle and |1\rangle and collect.

\tfrac{1}{\sqrt{2}}\Big(\tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) + e^{i\pi/4}\tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\Big)
= \tfrac{1}{2}\big((1 + e^{i\pi/4})|0\rangle + (1 - e^{i\pi/4})|1\rangle\big).

Why: distribute and collect like terms. The \tfrac{1}{\sqrt{2}} \cdot \tfrac{1}{\sqrt{2}} = \tfrac{1}{2} comes from the two normalisations.

Step 5. Examine the state on the Bloch sphere. Check measurement probabilities.

  • |1 + e^{i\pi/4}|^2 = (1 + \cos(\pi/4))^2 + \sin^2(\pi/4) = 1 + 2\cos(\pi/4) + \cos^2(\pi/4) + \sin^2(\pi/4) = 2 + \sqrt{2}.
  • |1 - e^{i\pi/4}|^2 = (1 - \cos(\pi/4))^2 + \sin^2(\pi/4) = 2 - \sqrt{2}.
  • Probability of 0 is (2 + \sqrt 2)/4 \approx 0.854. Probability of 1 is (2 - \sqrt 2)/4 \approx 0.146. Why these are correct: |1 + e^{i\theta}|^2 = (1+\cos\theta)^2 + \sin^2\theta = 2 + 2\cos\theta, and |1 - e^{i\theta}|^2 = 2 - 2\cos\theta. With \theta = \pi/4 and \cos(\pi/4) = \sqrt{2}/2, you get the numbers above. Divide by 4 = 2^2 (the normalisation squared) to get probabilities, and 0.854 + 0.146 = 1.

Result. H\,T\,H\,|0\rangle is a biased superposition: measurement gives 0 with probability (2 + \sqrt{2})/4 \approx 85.4\% and 1 with probability (2 - \sqrt{2})/4 \approx 14.6\%. On the Bloch sphere, the state lies in the y-z plane, tilted 45° away from the +z pole — exactly what a 45° rotation about the x-axis does to |0\rangle. This is "conjugating a z-rotation by a basis-change that swaps x and z": H T H is effectively a 45° rotation about the x-axis.

Figure: the HTH circuit — a single wire labelled |0⟩ passing through three boxes in sequence, H, T, H. Effective operation: a 45° rotation about the x-axis, with P(0) ≈ 0.854 and P(1) ≈ 0.146.
The $HTH$ circuit. By sandwiching $T$ (a $z$-rotation) between two Hadamards, you effectively convert it into an $x$-rotation by the same angle. The biased output probabilities are a direct fingerprint of the $45°$ rotation axis not aligning with either measurement axis.
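The whole calculation collapses to one matrix product in numpy (sketch, as before):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])

ket0 = np.array([1, 0])
out = H @ T @ H @ ket0  # rightmost gate acts first

p0, p1 = np.abs(out) ** 2
print(round(p0, 3), round(p1, 3))  # 0.854 0.146

# Match the closed forms (2 ± sqrt 2)/4 derived in Step 5.
print(np.isclose(p0, (2 + np.sqrt(2)) / 4))  # True
print(np.isclose(p1, (2 - np.sqrt(2)) / 4))  # True
```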

What this shows. H U H is the "basis-change sandwich" — wrapping a gate U with two Hadamards converts it into the gate you would get by swapping x and z. Here, T = R_z(\pi/4) (up to a global phase), so HTH = R_x(\pi/4). This trick generalises: HZH = X, HXH = Z, HYH = -Y, and now HTH is the x-axis 45° rotation. When a real compiler needs an x-rotation but the hardware only provides z-rotations, it surrounds the z-rotation with Hadamards.

Why S and T are special: Clifford and non-Clifford

Every gate you have met so far — X, Y, Z, H, S — shares a special property: it belongs to the Clifford group. What that means, concretely, is: if you take any Pauli matrix P and compute U P U^\dagger for a Clifford U, the result is always another Pauli matrix (possibly with a minus sign or factor of i).

You saw this in the last chapter. HXH = Z, HZH = X, HYH = -Y. Similarly, conjugating by S gives SXS^\dagger = Y, SYS^\dagger = -X, SZS^\dagger = Z. In every case, Paulis get mapped to Paulis.
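These conjugation identities can be checked mechanically (numpy sketch):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1])
S = np.diag([1, 1j])
Sd = S.conj().T

# Conjugation by the Clifford gate S maps Paulis to (signed) Paulis.
print(np.allclose(S @ X @ Sd, Y))   # True: S X S† =  Y
print(np.allclose(S @ Y @ Sd, -X))  # True: S Y S† = -X
print(np.allclose(S @ Z @ Sd, Z))   # True: S Z S† =  Z
```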

Figure: the Clifford gates (H, S, CNOT, X, Y, Z) are efficiently simulable on classical hardware (Gottesman–Knill theorem); T is the one non-Clifford addition needed for universality. Clifford + T is a universal gate set: any unitary can be approximated to arbitrary accuracy by a sequence of these gates (Solovay–Kitaev theorem).
The Clifford group — generated by $H$, $S$, and CNOT — is everything except the $T$ gate in this picture. Clifford-only circuits, however complex they look, are classically simulable in polynomial time. The moment you add a single $T$ gate, the simulation becomes (believed to be) classically hard, and the gate set becomes universal for quantum computing.

The Gottesman–Knill theorem

Here is the punchline that makes Cliffords different from everything else: a quantum circuit built entirely from Clifford gates, starting from a computational-basis state and ending with computational-basis measurement, can be simulated efficiently on a classical computer.

This result — the Gottesman–Knill theorem [5] — says that no amount of Clifford-only quantum circuitry gives any computational advantage over a laptop. A circuit with a million H's, S's, CNOTs, and Paulis is no harder to simulate classically than a circuit with ten. The Clifford group is, in some sense, "classically tame."

Why Cliffords are classically tractable: Clifford gates preserve the Pauli structure of a state. If you describe the initial state by which Paulis commute with it (the "stabiliser formalism"), each Clifford gate just updates this description by mapping Paulis to Paulis — a quick bookkeeping update. A classical computer can track this updated description in polynomial time, and it uses the updated stabilisers to compute measurement probabilities efficiently.

This is why Cliffords are, somewhat paradoxically, the "easy" quantum gates. They create superposition, they create entanglement (via CNOT), they do lots of quantum-looking things — but none of that "quantum-looking" activity is computationally powerful on its own.

T breaks the Clifford ceiling

Enter the T gate. Under conjugation, T does not map Paulis to Paulis. Check what T X T^\dagger gives:

T X T^\dagger = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & e^{-i\pi/4} \end{pmatrix}.

Work through it and you get exactly \tfrac{1}{\sqrt{2}}(X + Y) — a linear combination of Paulis, not a single Pauli with some phase. That is what it means to be non-Clifford: conjugation leaves the Pauli group and lands outside it.
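You can confirm both claims numerically: that T X T^\dagger equals (X + Y)/\sqrt{2}, and that no single Pauli, with any phase, matches it (numpy sketch):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1])
I = np.eye(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])

conj = T @ X @ T.conj().T

# T X T† is an equal mixture of X and Y, not a single Pauli.
print(np.allclose(conj, (X + Y) / np.sqrt(2)))  # True

# No phased Pauli (phase in {1, -1, i, -i}) equals it.
is_pauli = any(np.allclose(conj, ph * P)
               for P in (I, X, Y, Z) for ph in (1, -1, 1j, -1j))
print(is_pauli)  # False
```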

Adding T to the Clifford set has two consequences:

  1. The gate set becomes universal. With \{H, S, T\} plus CNOT, you can approximate any single-qubit or multi-qubit unitary to arbitrary accuracy. The Solovay–Kitaev theorem makes this efficient: any target unitary can be approximated to within \epsilon by a sequence of O(\log^c(1/\epsilon)) gates from this set. The exact value of c is an area of ongoing improvement; the important thing is that the cost is polylogarithmic.

  2. Classical simulation stops working. A circuit with just one T gate, let alone many, is no longer known to be efficiently simulable classically. (Widely believed, though not proven unconditionally — a proof would resolve major open questions in complexity theory.)

So T is the single gate that takes your otherwise-classical-feeling Clifford circuit and gives it genuine quantum computational power.

Why T is "expensive" in practice

In a fault-tolerant quantum computer — one that corrects its own errors using quantum error correction — most Clifford gates can be implemented transversally: you apply the gate to each physical qubit of an encoded logical qubit, and the error-correction code naturally protects the operation. Transversal gates are cheap: one logical gate = a few physical gates, and errors do not propagate uncontrollably.

But T cannot be transversal in most codes (the Eastin–Knill theorem states roughly that no code can implement a universal gate set transversally; one gate must be done in a costlier way). In the dominant error-correction scheme (surface code), T is implemented via magic-state distillation: you prepare a large number of low-quality copies of a special "magic state" |T\rangle = T|+\rangle, use Clifford operations and measurements to distil them into fewer, higher-quality copies, and consume one high-quality copy to implement one reliable T gate on a logical qubit.

The cost: to produce one reliable T gate, you typically need to prepare and distil hundreds to thousands of noisy magic states. Recent resource estimates for breaking 2048-bit RSA via Shor's algorithm — the big-ticket application of quantum computing — are dominated by the cost of T-gate injection. One such estimate (Gidney & Ekerå, 2019) put the dominant cost at roughly 2.7 billion Toffoli gates, each paid for in distilled magic states, with the distillation factories that supply them built from thousands of physical qubits apiece. The enormous qubit counts you see in quantum-computing roadmaps (20 million, 100 million) are mostly about having enough parallel magic-state factories to feed the T-gate demand.

That is why T is the "expensive" gate. In a textbook circuit, T costs the same as S or H — one clock tick, one box in the diagram. In a fault-tolerant implementation, T is where your budget goes.

Practical upshot

When a quantum compiler decomposes a unitary into the standard gate set \{H, S, T, \text{CNOT}\}, it counts T-count and T-depth as the primary cost metrics. An algorithm with T-count 100 is cheaper to run fault-tolerantly than one with T-count 10,000, even if they have the same total gate count. This metric has driven an entire sub-field of quantum circuit compilation — people work hard to rewrite circuits to use fewer Ts, because every T saved is potentially thousands of physical operations saved downstream.


Going deeper

If you are here for the single-qubit gate zoo, you have S and T. The key takeaways: S is a 90° z-rotation (S = \sqrt{Z}), T is a 45° z-rotation (T = \sqrt{S}), and T is the "expensive" non-Clifford gate that makes universality possible. The rest of this section digs into the Clifford group structure that makes T the lever, the Solovay–Kitaev theorem that formalises "universality to arbitrary precision," the cost of magic-state distillation as it appears in real resource estimates, and a brief mention of T counts in famous algorithms like Shor's factoring.

The Clifford+T universal gate set

The Clifford+T gate set \{H, S, T, \text{CNOT}\} is the standard universal gate set used in most of the quantum-computing literature. Why this particular collection?

Some hardware platforms natively support different sets, and there is a compilation step that re-expresses everything in terms of the native gates. When you see "T count" reported for an algorithm, it implicitly assumes Clifford+T.

The Solovay–Kitaev theorem

The Solovay–Kitaev theorem makes precise what "universal" means. It says: given any single-qubit unitary U and any desired precision \epsilon > 0, there exists a sequence of gates from \{H, S, T, H^\dagger, S^\dagger, T^\dagger\} — say, g_1 g_2 \cdots g_L — of length L = O(\log^c(1/\epsilon)) for some constant c, such that \|U - g_1 g_2 \cdots g_L\| < \epsilon.

In plain terms: you can approximate any rotation you like, to any precision you like, using only a polylogarithmic number of Clifford+T gates. The original Solovay–Kitaev bound had c = 3.97; modern improvements bring it close to c = 1.

This matters because it says universality is not just a matter of "can you get there" (existence) but of "can you get there efficiently" (polylogarithmic cost). The practical effect: any unitary you would ever want to apply in an algorithm can be compiled into a Clifford+T sequence of manageable length.
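To make "approximate any rotation" concrete, here is a toy brute-force search (illustrative only; real compilers use far smarter number-theoretic synthesis, not enumeration). It enumerates products of I, H, and T and tracks how close they get to R_z(\pi/8), a rotation outside the gate set itself:

```python
import numpy as np
from itertools import product
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
I2 = np.eye(2)

def dist(U, V):
    """Distance up to global phase: 0 exactly when U = e^{i phi} V."""
    return np.sqrt(max(0.0, 1 - abs(np.trace(U.conj().T @ V)) / 2))

# Target: R_z(pi/8), not itself expressible as a single Clifford+T gate.
theta = np.pi / 8
target = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

# Best approximation over all gate sequences of each length.
best = {}
for length in (2, 4, 6, 8):
    best[length] = min(dist(target, reduce(np.matmul, seq))
                       for seq in product((I2, H, T), repeat=length))

# Longer sequences never do worse (I is in the set), and they creep
# toward the target, as Solovay-Kitaev guarantees.
print(best[8] <= best[6] <= best[4] <= best[2])  # True
```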

Magic-state distillation, briefly

The magic-state distillation protocol (Bravyi & Kitaev 2005) [6] produces one high-fidelity |T\rangle = T|+\rangle state from many low-fidelity copies, using only Clifford operations and computational-basis measurements. The output error rate decreases quadratically (or faster) with each round of distillation, so a constant number of rounds suffices to produce arbitrarily high-quality magic states.

But the input cost is high: typical protocols consume 15 or more noisy magic states per round to produce one distilled state, and multiple rounds of distillation are stacked to drive the error rate below the surface-code threshold. The result is a factory: several hundred to several thousand physical qubits producing one reliable T gate per distillation cycle. For an algorithm needing millions of T gates (like Shor's factoring for RSA-2048), you need a bank of such factories running in parallel.

This is why resource estimates for quantum advantage are dominated by T-count: the Clifford part of the algorithm is almost free in terms of physical qubits (it adds only a few overhead per logical gate), while each T gate drags in an entire magic-state factory.

T-counts in famous algorithms

Some representative numbers, to give a sense of scale.

These numbers are what fault-tolerant hardware targets are calibrated against. When you see "10 million qubits needed for Shor," the majority of those qubits are magic-state distillation factories supplying T gates, not data qubits holding the computation's state.

The dihedral coset problem

One last connection for the more advanced reader. Non-Clifford gate resource counting is closely tied to the dihedral coset problem — a problem in computational group theory that is believed classically hard and for which quantum algorithms offer substantial speedups. This connection reveals that the T gate's non-Clifford magic is, at heart, the same magic that lets quantum computers solve certain group-theoretic problems classical computers cannot. It is too technical to fully develop here, but the point is: the T gate is not an arbitrary choice. It is deeply connected to the algebraic structure that makes quantum speedups possible.

The National Quantum Mission dimension

India's National Quantum Mission (launched 2023 with a ₹6000 crore budget over 8 years) explicitly includes fault-tolerant quantum computing as a research pillar, with magic-state distillation and T-gate-efficient compilation among the topics funded. Indian researchers at IIT Madras, TIFR, and the Raman Research Institute have active work on error-correction codes and on circuit compilation optimising T-counts for the superconducting and trapped-ion platforms being developed domestically. The T gate is, in a very practical sense, where the scaling challenge sits — and it is what a substantial fraction of India's quantum-computing research budget is paying to understand better.


References

  1. Wikipedia, Quantum logic gate — Phase and T gates — matrix forms, circuit symbols, and Clifford-group placement.
  2. Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §4.2 and §10.6 on fault tolerance — Cambridge University Press.
  3. John Preskill, Lecture Notes on Quantum Computation, Ch. 7 (fault tolerance and the Clifford hierarchy) — theory.caltech.edu/~preskill/ph229.
  4. Qiskit Textbook, Single Qubit Gates — hands-on S, T examples with a live simulator.
  5. Wikipedia, Gottesman–Knill theorem — why Clifford circuits are classically simulable.
  6. Sergey Bravyi and Alexei Kitaev, Universal quantum computation with ideal Clifford gates and noisy ancillas (2005) — arXiv:quant-ph/0403025. The origin of magic-state distillation.