In short
A quantum channel \mathcal E, the most general physically realisable one-shot operation on a density matrix, has the Kraus representation
\mathcal E(\rho) = \sum_k K_k\,\rho\,K_k^\dagger, \qquad \sum_k K_k^\dagger K_k = I.
The operators \{K_k\} are called Kraus operators. The completeness condition \sum_k K_k^\dagger K_k = I is what guarantees trace preservation. Unitary gates are the special case of a single Kraus K_0 = U. Unread projective measurements are the case K_m = P_m. Noise channels — bit-flip \{\sqrt{1-p}\,I, \sqrt p\,X\}, phase-flip \{\sqrt{1-p}\,I, \sqrt p\,Z\}, depolarizing, amplitude damping — all fit the same template. Stinespring's theorem gives the physical meaning: every quantum channel is equivalent to a unitary on the system-plus-environment, followed by tracing out the environment. One framework, every operation, derived from a single picture.
You have seen three kinds of thing happen to a density matrix. A unitary gate sandwiches it: \rho \mapsto U\rho U^\dagger. A recorded measurement projects and renormalises: \rho \mapsto P_m\rho P_m/p(m). An unread measurement sums the projected pieces: \rho \mapsto \sum_m P_m\rho P_m. Three different expressions, apparently doing three different things.
But look at them again. Each one sandwiches \rho between some operator on the left and its dagger on the right, then sums over some index. That is the shape of the rule. And there is a single framework that captures not only these three but also every kind of noise a real quantum device can suffer — amplitude damping, phase damping, depolarizing noise, bit-flip errors, coherent over-rotations, everything — inside the same template. That framework is the Kraus representation, and this chapter develops it end to end.
The payoff is considerable. You will learn to write down a bit-flip channel, a phase-flip channel, a depolarizing channel, and an amplitude-damping channel as a few explicit 2\times 2 matrices. You will see how each corresponds to a physical process that a superconducting qubit or a trapped ion can actually experience. And you will meet Stinespring's theorem, which says that every quantum channel is secretly a unitary on a bigger system — with the part you cannot see, the environment, traced out. No new axioms needed; the ones you already have do the job.
The Kraus form
Start from the shape of the rule you want.
Quantum channel — Kraus representation
A quantum channel on a finite-dimensional Hilbert space \mathcal H is a map \mathcal E : \mathcal D(\mathcal H) \to \mathcal D(\mathcal H) from density matrices to density matrices, of the form
\mathcal E(\rho) = \sum_k K_k\,\rho\,K_k^\dagger,
where the Kraus operators \{K_k\} satisfy the completeness relation
\sum_k K_k^\dagger K_k = I.
The Kraus operators are linear maps \mathcal H \to \mathcal H; there is no constraint that they be unitary, Hermitian, or anything beyond the completeness relation itself.
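The completeness relation is the entire trace-preserving condition, compressed. Take the trace of \mathcal E(\rho):
\text{tr}\,\mathcal E(\rho) = \sum_k \text{tr}(K_k\rho K_k^\dagger) = \sum_k \text{tr}(K_k^\dagger K_k\,\rho) = \text{tr}\Big(\big(\textstyle\sum_k K_k^\dagger K_k\big)\rho\Big) = \text{tr}(\rho) = 1.
Why the trace slides through: \text{tr}(AB) = \text{tr}(BA) (cyclic property), so \text{tr}(K_k \rho K_k^\dagger) = \text{tr}(K_k^\dagger K_k\,\rho). The rest is linearity and the completeness relation. Without the completeness relation, the output trace is \text{tr}\big((\sum_k K_k^\dagger K_k)\rho\big), which in general is not 1; impose it, and unit trace is automatic.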
Hermiticity of the output: \left(\sum_k K_k\rho K_k^\dagger\right)^\dagger = \sum_k (K_k^\dagger)^\dagger \rho^\dagger K_k^\dagger = \sum_k K_k \rho K_k^\dagger, using \rho^\dagger = \rho and the double-dagger rule. Hermitian in, Hermitian out.
Positive semi-definiteness: for any |\phi\rangle, \langle\phi|\mathcal E(\rho)|\phi\rangle = \sum_k \langle\phi|K_k\rho K_k^\dagger|\phi\rangle = \sum_k \langle K_k^\dagger\phi|\rho|K_k^\dagger\phi\rangle \geq 0, because each term is a non-negative number (positive semi-definite \rho applied to the vector K_k^\dagger|\phi\rangle). PSD in, PSD out.
So the Kraus form guarantees all three density-matrix axioms — Hermiticity, PSD, unit trace — by construction. You always end up with a valid quantum state.
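The three checks above can be run numerically. The sketch below is a minimal illustration, assuming NumPy; the helper names (apply_channel, satisfies_completeness, is_density_matrix, normalise_kraus) are hypothetical, not from any library. It builds a deliberately generic Kraus set, neither unitary nor Hermitian, by rescaling arbitrary matrices B_k as K_k = B_k S^{-1/2} with S = \sum_k B_k^\dagger B_k, which enforces the completeness relation, and then confirms that the output is a valid state.

```python
import numpy as np

def apply_channel(kraus, rho):
    """Kraus form: E(rho) = sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def satisfies_completeness(kraus, atol=1e-10):
    """Completeness relation: sum_k K_k^dagger K_k = I."""
    n = kraus[0].shape[1]
    return np.allclose(sum(K.conj().T @ K for K in kraus), np.eye(n), atol=atol)

def is_density_matrix(rho, atol=1e-10):
    """The three axioms: Hermitian, positive semi-definite, unit trace."""
    hermitian = np.allclose(rho, rho.conj().T, atol=atol)
    psd = np.linalg.eigvalsh((rho + rho.conj().T) / 2).min() >= -atol
    unit_trace = abs(np.trace(rho) - 1) < atol
    return bool(hermitian and psd and unit_trace)

def normalise_kraus(raw):
    """Turn arbitrary matrices B_k into a valid Kraus set K_k = B_k S^{-1/2},
    where S = sum_k B_k^dagger B_k is assumed invertible."""
    S = sum(B.conj().T @ B for B in raw)
    w, U = np.linalg.eigh(S)
    S_inv_sqrt = U @ np.diag(w ** -0.5) @ U.conj().T
    return [B @ S_inv_sqrt for B in raw]

rng = np.random.default_rng(0)
raw = [rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)) for _ in range(3)]
kraus = normalise_kraus(raw)   # generic: neither unitary nor Hermitian

B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = B @ B.conj().T / np.trace(B @ B.conj().T)   # a random full-rank state

print(satisfies_completeness(kraus), is_density_matrix(apply_channel(kraus, rho)))
```

The rescaling trick works because S^{-1/2} S S^{-1/2} = I, so any collection of matrices with invertible S can be turned into a channel.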
Every special case you already know
The Kraus form is not a new kind of operation; it is the general expression of which everything you have seen is a special case.
- Unitary gate. One Kraus operator: K_0 = U. Completeness: K_0^\dagger K_0 = U^\dagger U = I. Action: \mathcal E(\rho) = U\rho U^\dagger. Exactly the unitary sandwich rule.
- Unread projective measurement. One Kraus operator per outcome: K_m = P_m. Completeness: \sum_m P_m^\dagger P_m = \sum_m P_m^2 = \sum_m P_m = I (the projectors are Hermitian and idempotent, and a complete set of orthogonal projectors sums to the identity). Action: \mathcal E(\rho) = \sum_m P_m \rho P_m. Exactly the unread-measurement rule from the previous chapter.
- Identity channel (do nothing). Single Kraus K_0 = I. Action: \mathcal E(\rho) = \rho.
- Noise channels. Several Kraus operators, each describing a different noise event; in the simplest cases each one is a unitary scaled by the square root of that event's probability. You will see four examples below.
Stinespring's theorem — where Kraus operators come from
The Kraus form was not pulled from thin air. It has a physical origin: every quantum channel is the shadow of a unitary on a larger system. This is Stinespring's dilation theorem, first proved in 1955.
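The setup. Take the system S and add an environment E (an auxiliary register of sufficient size). Initialise E in some fixed pure state |0\rangle_E. Apply a unitary U_{SE} to the joint system. Trace out E. The result, a map on \rho_S alone, is a quantum channel, and by direct calculation it has the Kraus form:
\mathcal E(\rho_S) = \text{tr}_E\big[U_{SE}\,(\rho_S\otimes|0\rangle\langle 0|_E)\,U_{SE}^\dagger\big] = \sum_k K_k\,\rho_S\,K_k^\dagger,
where the Kraus operators are extracted by picking a basis \{|k\rangle_E\} of the environment and defining
K_k = \langle k|_E\,U_{SE}\,|0\rangle_E,
an operator on \mathcal H_S alone.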
Why this produces a valid Kraus set: the completeness relation \sum_k K_k^\dagger K_k = \sum_k\langle 0|U^\dagger|k\rangle\langle k|U|0\rangle = \langle 0|U^\dagger U|0\rangle_E = \langle 0|I|0\rangle_E = I_S, using \sum_k|k\rangle\langle k| = I_E (completeness of the environment basis). So Stinespring's construction automatically gives Kraus operators satisfying the trace-preservation condition.
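A numerical sketch of the Stinespring construction, assuming NumPy and the ordering \mathcal H_S\otimes\mathcal H_E for the joint space (an illustrative choice): draw a random joint unitary, read off K_k = \langle k|_E U_{SE}|0\rangle_E as blocks of the matrix, and compare the traced-out evolution with the Kraus sum. The random unitary comes from a QR decomposition, which is enough for a check but is not claimed to be Haar-distributed.

```python
import numpy as np

d_S, d_E = 2, 4   # one system qubit, a 4-dimensional environment (2 ancilla qubits)

# A random joint unitary U_SE on S (x) E, from the QR decomposition of a complex Gaussian matrix.
rng = np.random.default_rng(1)
A = rng.normal(size=(d_S * d_E,) * 2) + 1j * rng.normal(size=(d_S * d_E,) * 2)
U_SE, _ = np.linalg.qr(A)

# K_k = <k|_E U_SE |0>_E. With ordering S (x) E, the row index is s' * d_E + k and the
# column index is s * d_E + 0, so a 4-index reshape exposes the blocks directly.
U4 = U_SE.reshape(d_S, d_E, d_S, d_E)   # indices [s', k, s, e]
kraus = [U4[:, k, :, 0] for k in range(d_E)]

def partial_trace_env(M):
    """tr_E of an operator on S (x) E."""
    return np.einsum('iaja->ij', M.reshape(d_S, d_E, d_S, d_E))

# Compare the Stinespring picture with the Kraus form on a random state.
B = rng.normal(size=(d_S, d_S)) + 1j * rng.normal(size=(d_S, d_S))
rho = B @ B.conj().T / np.trace(B @ B.conj().T)
env0 = np.zeros((d_E, d_E))
env0[0, 0] = 1.0   # |0><0|_E
stinespring = partial_trace_env(U_SE @ np.kron(rho, env0) @ U_SE.conj().T)
kraus_form = sum(K @ rho @ K.conj().T for K in kraus)
```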
Stinespring's theorem is a two-way street: every Kraus representation comes from some unitary-plus-environment pair, and every unitary-plus-environment pair gives a Kraus representation. The two views are equivalent. The Kraus form is an operational, compact way to write down a channel; the Stinespring form is a physical way to understand it.
The environment size
How big must the environment be? Not very, and you can be precise. Every channel on an n-dimensional system can be written with at most n^2 Kraus operators: a minimal Kraus set is linearly independent, and it lives in the n^2-dimensional space of linear maps on \mathcal H_S. The environment needs one basis state per Kraus operator, so \dim\mathcal H_E = n^2 always suffices to dilate any channel.
For a qubit (n = 2), the environment needs at most 4 dimensions, that is, 2 ancilla qubits. Every one-qubit channel can be realised by preparing at most 2 ancilla qubits in a fixed state, applying a joint unitary to the qubit and the ancillae, and discarding the ancillae. This is the practical recipe for simulating arbitrary quantum noise inside a clean quantum circuit.
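The construction also runs the other way. Given N Kraus operators, the map V|\psi\rangle = \sum_k K_k|\psi\rangle\otimes|k\rangle_E uses one environment basis state per Kraus operator and satisfies V^\dagger V = \sum_k K_k^\dagger K_k = I; an isometry like this can always be completed to a unitary on S\otimes E acting on |\psi\rangle\otimes|0\rangle_E. The sketch below (NumPy assumed; helper names hypothetical) does this for the four-operator depolarizing channel, where the environment has dimension 4, i.e. two ancilla qubits.

```python
import numpy as np

def depolarizing_kraus(p):
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - 3 * p / 4) * I] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

def stinespring_isometry(kraus):
    """V|psi> = sum_k K_k|psi> (x) |k>_E, one environment basis state per Kraus operator."""
    N = len(kraus)
    V = 0
    for k, K in enumerate(kraus):
        e_k = np.zeros((N, 1))
        e_k[k, 0] = 1.0
        V = V + np.kron(K, e_k)   # shape (d_S * N, d_S), ordering S (x) E
    return V

kraus = depolarizing_kraus(0.3)
V = stinespring_isometry(kraus)   # environment dimension 4 = two ancilla qubits
```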
Non-uniqueness
Here is a strange fact. The same channel \mathcal E can be written with different Kraus sets. Two sets \{K_k\} and \{K_k'\} produce the same channel if and only if (after padding the shorter set with zero operators) they are related by a unitary change of environment basis:
K_k' = \sum_j V_{kj}\,K_j, \qquad V \text{ unitary}.
Under this change, the Kraus form of \mathcal E is unchanged: \sum_k K_k' \rho K_k'^\dagger = \sum_k (\sum_j V_{kj} K_j) \rho (\sum_l \bar V_{kl} K_l^\dagger) = \sum_{jl}\left(\sum_k V_{kj}\bar V_{kl}\right) K_j \rho K_l^\dagger = \sum_{jl}\delta_{jl} K_j\rho K_l^\dagger = \sum_j K_j\rho K_j^\dagger = \mathcal E(\rho), using \sum_k V_{kj}\bar V_{kl} = \sum_k (V^\dagger)_{lk} V_{kj} = (V^\dagger V)_{lj} = \delta_{jl}.
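A quick numerical check of this invariance, assuming NumPy: mix the two bit-flip Kraus operators with a 2\times 2 rotation V (the angle is an arbitrary illustrative value) and confirm that the channel's output does not change, even though the individual operators do.

```python
import numpy as np

p, theta = 0.2, 0.7   # flip probability and an arbitrary mixing angle (illustrative values)
I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
kraus = [np.sqrt(1 - p) * I, np.sqrt(p) * X]

# A 2x2 unitary V acting on the Kraus index: K'_k = sum_j V_kj K_j.
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
kraus_mixed = [sum(V[k, j] * kraus[j] for j in range(2)) for k in range(2)]

def apply_channel(ks, rho):
    return sum(K @ rho @ K.conj().T for K in ks)

rho = np.array([[0.6, 0.3 + 0.2j], [0.3 - 0.2j, 0.4]])   # a valid qubit state
```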
Why this non-uniqueness is physical: it corresponds to choosing a different basis for the environment in Stinespring's picture. The environment basis is a choice of "what to call outcome k"; the physical channel does not depend on that choice. So the Kraus set is defined only up to an environment basis change, the same ambiguity as the non-uniqueness of ensemble decompositions of \rho you met in the density operator chapter.
In practice, two Kraus sets for the same channel often look very different. The depolarizing channel can be written in Pauli form (Kraus operators proportional to I, X, Y, Z) or in a rotated form related to it by a 4\times 4 unitary on the Pauli index. Both describe the same physical noise.
Four noise channels you will meet everywhere
Four specific quantum channels come up constantly in quantum-computing practice. Each has a one-line physical story and a mathematical description with two to four Kraus operators.
Bit-flip channel
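With probability p, apply the Pauli X gate (a bit-flip). With probability 1 - p, do nothing. The Kraus operators are
K_0 = \sqrt{1-p}\,I, \qquad K_1 = \sqrt{p}\,X = \sqrt{p}\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}.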
Completeness check: K_0^\dagger K_0 + K_1^\dagger K_1 = (1-p)I + p\,X^\dagger X = (1-p)I + pI = I, using X^\dagger X = X^2 = I (the Pauli X is self-inverse and Hermitian). Passes.
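Physical origin: the bit-flip channel is the quantum analogue of a classical bit error, where a transistor or logical bit accidentally gets set to the wrong value. In superconducting qubits, X-type errors arise from processes that move population between |0\rangle and |1\rangle, such as relaxation, thermal excitation, and cross-talk between qubits at the same frequency; the symmetric bit-flip channel is an idealised model of this population-exchange noise.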
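As an illustration (a sketch assuming NumPy, with hypothetical helper names), the bit-flip channel acts on the Bloch vector by leaving r_x alone and multiplying r_y and r_z by 1 - 2p, because X commutes with itself and anticommutes with Y and Z.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bit_flip(rho, p):
    """Kraus operators sqrt(1 - p) I and sqrt(p) X, summed out explicitly."""
    return (1 - p) * rho + p * X @ rho @ X

def bloch(rho):
    """Bloch vector components r_i = tr(rho sigma_i)."""
    return np.real([np.trace(rho @ P) for P in (X, Y, Z)])

def from_bloch(r):
    return 0.5 * (np.eye(2) + r[0] * X + r[1] * Y + r[2] * Z)

p = 0.1
r = np.array([0.3, -0.5, 0.6])   # |r| <= 1, so this is a valid state
r_out = bloch(bit_flip(from_bloch(r), p))
```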
Phase-flip channel
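With probability p, apply Pauli Z. With probability 1 - p, do nothing. Kraus operators:
K_0 = \sqrt{1-p}\,I, \qquad K_1 = \sqrt{p}\,Z = \sqrt{p}\begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}.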
The effect: with probability p, the sign of the |1\rangle amplitude flips, which is what Z|1\rangle = -|1\rangle means. The populations |0\rangle\langle 0| and |1\rangle\langle 1| are unchanged by the phase flip; only the off-diagonal coherences change, getting multiplied by (1-p) - p = 1 - 2p, which suppresses them.
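The coherence suppression is easy to check numerically. A minimal sketch, assuming NumPy, applying the phase-flip Kraus operators to |+\rangle\langle +|:

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def phase_flip(rho, p):
    """Kraus operators sqrt(1 - p) I and sqrt(p) Z, summed out explicitly."""
    return (1 - p) * rho + p * Z @ rho @ Z

rho_plus = 0.5 * np.ones((2, 2))   # |+><+|: populations 1/2, coherences 1/2
p = 0.25
out = phase_flip(rho_plus, p)      # diagonal unchanged, off-diagonals scaled by 1 - 2p
```

At p = 1/2 the factor 1 - 2p vanishes and all coherence is gone.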
Physical origin: pure dephasing, the physical mechanism where the qubit's phase randomises without any population transfer. It is the dominant noise process in NMR and a large component of noise in superconducting systems. The characteristic timescale is T_2, the dephasing time, and it limits how deep a circuit can run before coherence is lost.
Depolarizing channel
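With probability p, replace the state with the maximally mixed state I/2. With probability 1 - p, do nothing. One compact Kraus set:
K_0 = \sqrt{1 - \tfrac{3p}{4}}\,I, \qquad K_1 = \sqrt{\tfrac{p}{4}}\,X, \qquad K_2 = \sqrt{\tfrac{p}{4}}\,Y, \qquad K_3 = \sqrt{\tfrac{p}{4}}\,Z.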
Why the scaling works: expanding \mathcal E(\rho) = (1 - \tfrac{3p}{4})\rho + \tfrac{p}{4}(X\rho X + Y\rho Y + Z\rho Z) and using the Pauli identity \tfrac{1}{4}(\rho + X\rho X + Y\rho Y + Z\rho Z) = \tfrac{I}{2}\text{tr}(\rho) = \tfrac{I}{2}, you get \mathcal E(\rho) = (1-p)\rho + p\tfrac{I}{2}. With probability 1 - p, do nothing; with probability p, output the maximally mixed state. The algebraic coefficients \tfrac{3p}{4} and \tfrac{p}{4} are exactly what makes the Pauli identity reproduce this story.
Geometrically, the depolarizing channel shrinks the Bloch vector uniformly toward the origin: \vec r \mapsto (1 - p)\vec r. At p = 0, do nothing. At p = 1, collapse to the origin (maximally mixed). Intermediate p gives a uniform contraction — the Bloch ball shrinks to a smaller concentric ball, and every state moves proportionally toward I/2.
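A numerical check of both statements, assuming NumPy (helper names are illustrative): the four-operator Kraus sum equals (1-p)\rho + p\,I/2, and the output Bloch vector is (1-p)\vec r.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarizing_kraus(p):
    return [np.sqrt(1 - 3 * p / 4) * I] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

def apply_channel(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def bloch(rho):
    return np.real([np.trace(rho @ P) for P in (X, Y, Z)])

p = 0.3
r = np.array([0.2, 0.4, -0.8])   # |r| <= 1
rho = 0.5 * (I + r[0] * X + r[1] * Y + r[2] * Z)
out = apply_channel(depolarizing_kraus(p), rho)
```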
Physical origin: the depolarizing channel is the most "symmetric" noise model — it treats all directions on the Bloch ball equally. Real hardware rarely has exactly depolarizing noise, but the depolarizing model is the standard textbook proxy for "generic" noise and is used in error-correction analyses where the noise is assumed isotropic.
Amplitude damping channel
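The qubit spontaneously decays from |1\rangle to |0\rangle with probability \gamma; if it does not decay, it keeps its superposition, but with the |1\rangle amplitude reweighted by \sqrt{1-\gamma}. Kraus operators:
K_0 = \begin{pmatrix}1 & 0 \\ 0 & \sqrt{1-\gamma}\end{pmatrix}, \qquad K_1 = \begin{pmatrix}0 & \sqrt\gamma \\ 0 & 0\end{pmatrix} = \sqrt\gamma\,|0\rangle\langle 1|.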
Completeness check: K_0^\dagger K_0 = \begin{pmatrix}1 & 0 \\ 0 & 1 - \gamma\end{pmatrix}, K_1^\dagger K_1 = \begin{pmatrix}0 & 0 \\ 0 & \gamma\end{pmatrix}, sum = I. Passes.
Physical story: K_1 = \sqrt\gamma\,|0\rangle\langle 1| is the operator "decay from |1\rangle to |0\rangle with probability amplitude \sqrt\gamma." It happens with probability \gamma on |1\rangle, probability 0 on |0\rangle. K_0 is what happens in the other branch — "did not decay" — which has amplitude 1 on |0\rangle and amplitude \sqrt{1 - \gamma} on |1\rangle.
This is the T_1 process — energy relaxation. The characteristic timescale T_1 controls how quickly a qubit in |1\rangle loses its excitation. For superconducting qubits T_1 is typically 10–500 microseconds in current hardware; for trapped ions it can be seconds; for NMR nuclear spins it can be tens of seconds. In every platform, T_1 is one of the dominant experimental limitations and one of the headline numbers you quote when you publish a new qubit design.
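A sketch assuming NumPy (helper names hypothetical). It checks the action of the Kraus pair on a state with coherence, and that two damping steps compose into one with 1 - \gamma = (1-\gamma_1)(1-\gamma_2). That composition rule is what makes the common model \gamma(t) = 1 - e^{-t/T_1} consistent for a constant decay rate; treat that model, and the T_1 and t values used, as illustrative assumptions.

```python
import numpy as np

def amplitude_damping_kraus(gamma):
    """K0 = diag(1, sqrt(1 - gamma)), K1 = sqrt(gamma) |0><1|."""
    K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
    K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return [K0, K1]

def apply_channel(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

rho = np.array([[0.3, 0.4], [0.4, 0.7]])   # a valid state with coherence
g1, g2 = 0.2, 0.35

# One step: |1> population 0.7 -> 0.7 (1 - g1), coherence scaled by sqrt(1 - g1).
damped = apply_channel(amplitude_damping_kraus(g1), rho)

# Two successive steps equal one step with 1 - gamma = (1 - g1)(1 - g2).
two_steps = apply_channel(amplitude_damping_kraus(g2), damped)
one_step = apply_channel(amplitude_damping_kraus(1 - (1 - g1) * (1 - g2)), rho)

# Assumed exponential model gamma(t) = 1 - exp(-t / T1); illustrative values in seconds.
T1, t = 100e-6, 30e-6
excited = apply_channel(amplitude_damping_kraus(1 - np.exp(-t / T1)), rho)[1, 1]
```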
Worked examples
Example 1 — Bit-flip channel with $p = 0.1$ applied to $|0\rangle\langle 0|$
Compute \mathcal E(|0\rangle\langle 0|) for the bit-flip channel with flip probability p = 0.1.
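Step 1. Write the Kraus operators.
K_0 = \sqrt{0.9}\,I, \qquad K_1 = \sqrt{0.1}\,X.
Step 2. Compute K_0\rho K_0^\dagger for \rho = |0\rangle\langle 0| = \begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix}.
K_0\rho K_0^\dagger = 0.9\,I\rho I = \begin{pmatrix}0.9 & 0 \\ 0 & 0\end{pmatrix} = 0.9|0\rangle\langle 0|.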
Why this branch gives 0.9|0\rangle\langle 0|: the "did not flip" branch has probability amplitude \sqrt{0.9}, so its contribution to the density matrix scales by \sqrt{0.9}\cdot\sqrt{0.9} = 0.9.
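Step 3. Compute K_1\rho K_1^\dagger. Since X|0\rangle = |1\rangle,
X|0\rangle\langle 0|X = (X|0\rangle)(X|0\rangle)^\dagger = |1\rangle\langle 1| = \begin{pmatrix}0 & 0 \\ 0 & 1\end{pmatrix}.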
So K_1\rho K_1^\dagger = 0.1\cdot X\rho X = 0.1|1\rangle\langle 1|.
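Step 4. Add the two branches.
\rho' = \mathcal E(\rho) = 0.9|0\rangle\langle 0| + 0.1|1\rangle\langle 1| = \begin{pmatrix}0.9 & 0 \\ 0 & 0.1\end{pmatrix}.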
Step 5. Check. Trace: 0.9 + 0.1 = 1. Check. Hermitian: the matrix is diagonal and real. Check. Eigenvalues 0.9, 0.1, both positive. Check. This is a valid density matrix — and it is a classical 90-10 mixture of |0\rangle and |1\rangle.
Result. Bit-flip with p = 0.1 on |0\rangle\langle 0| gives 0.9|0\rangle\langle 0| + 0.1|1\rangle\langle 1|. Purity \text{tr}(\rho'^2) = 0.81 + 0.01 = 0.82, down from the input purity of 1. A pure state has become a slightly mixed classical distribution.
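The same arithmetic in a few lines, as a sketch assuming NumPy:

```python
import numpy as np

p = 0.1
X = np.array([[0.0, 1.0], [1.0, 0.0]])
K0, K1 = np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * X
rho = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|

# Real matrices, so the dagger is just the transpose.
rho_out = K0 @ rho @ K0.T + K1 @ rho @ K1.T
purity = np.trace(rho_out @ rho_out)
eigenvalues = np.linalg.eigvalsh(rho_out)   # ascending order
```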
What this shows. Kraus operators are the bookkeeping for "with some probability this happens, with some other probability that happens." The bit-flip channel is the quantum version of a noisy classical wire, and the Kraus form makes the noise budget explicit: two operators, two probabilities, one output density matrix.
Example 2 — Depolarizing channel with $p = 1$ destroys any state
Compute \mathcal E(\rho) for the depolarizing channel with p = 1 applied to an arbitrary qubit state \rho.
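Step 1. Plug p = 1 into the depolarizing Kraus operators. Since \sqrt{1 - 3/4} = \sqrt{1/4} = 1/2,
K_0 = \tfrac12 I, \qquad K_1 = \tfrac12 X, \qquad K_2 = \tfrac12 Y, \qquad K_3 = \tfrac12 Z.
Step 2. Apply to \rho. Write \rho in its Bloch-vector form \rho = \tfrac{1}{2}(I + r_x X + r_y Y + r_z Z) for some Bloch vector \vec r. The channel gives
\mathcal E(\rho) = \tfrac14\left(\rho + X\rho X + Y\rho Y + Z\rho Z\right).
Step 3. Use the Pauli identity. For any 2\times 2 matrix \rho,
\tfrac14\left(\rho + X\rho X + Y\rho Y + Z\rho Z\right) = \tfrac{I}{2}\,\text{tr}(\rho).
Why this identity holds: expand \rho in the Pauli basis as \tfrac12(\text{tr}(\rho)\,I + c_x X + c_y Y + c_z Z). Conjugating by I, X, Y, Z leaves the identity component unchanged. Each non-identity Pauli commutes with I and with itself but anticommutes with the other two, so under the four conjugations it picks up signs +1, +1, -1, -1 and cancels from the sum. Only the identity component survives, which is \tfrac{I}{2}\text{tr}(\rho). For a density matrix, \text{tr}(\rho) = 1, so the result is I/2.
Step 4. Conclude.
\mathcal E(\rho) = \tfrac{I}{2} \quad \text{for every input state } \rho.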
Result. The depolarizing channel at p = 1 sends every qubit state — pure or mixed, any Bloch vector — to the maximally mixed state I/2. All information about the input has been erased. Purity drops to 1/2 (the minimum); the Bloch vector becomes \vec 0.
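A numerical sketch (NumPy assumed) that feeds a random pure state through the p = 1 Kraus operators I/2, X/2, Y/2, Z/2:

```python
import numpy as np

I = np.eye(2, dtype=complex)
paulis = [I,
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def fully_depolarize(rho):
    """p = 1 depolarizing channel: Kraus operators I/2, X/2, Y/2, Z/2."""
    return sum((P / 2) @ rho @ (P / 2).conj().T for P in paulis)

rng = np.random.default_rng(2)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())   # a random pure state, purity 1
out = fully_depolarize(rho)
```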
What this shows. A single channel parameter (p in the depolarizing channel) controls how much information is lost. At p = 0, nothing happens. At p = 1, everything is destroyed. In between, the Bloch vector shrinks linearly — a simple, tractable noise model that is the go-to choice for benchmarking error-correction protocols, even when real hardware noise is more complicated.
Common confusions
- "Kraus operators must be unitary." No. For a channel with N > 1 nonzero Kraus operators, none of the K_k can be unitary: if one were, K_j^\dagger K_j = I, and the completeness relation would leave \sum_{k\neq j} K_k^\dagger K_k = 0, forcing every other Kraus operator to vanish. They can be rescaled unitaries, such as \sqrt p\,X in the bit-flip channel. Unitarity is the special single-Kraus case. Non-unitary Kraus operators are where the interesting physics (noise, decoherence, measurement) lives.
- "Kraus operators are Hermitian." Usually not. The projectors P_m for unread measurement are Hermitian, but for amplitude damping, K_1 = \sqrt\gamma|0\rangle\langle 1| is not Hermitian (its dagger is \sqrt\gamma|1\rangle\langle 0|, which is different). Hermiticity is not required; only the completeness relation is.
- "A channel and a gate are the same thing." A gate is a unitary, the simplest kind of channel, with one Kraus operator. A channel in general can be a unitary, a measurement, a noise process, or any combination. The word "gate" in quantum-computing practice usually means "something you can implement in the circuit as a building block": unitaries plus measurements. A "channel" is the broader mathematical object that also includes noise.
- "A 1-Kraus channel is unitary." Yes. This is the only way to have a single Kraus operator and still satisfy completeness, because K_0^\dagger K_0 = I is the unitary condition. Single-Kraus channels are exactly unitaries.
- "The Kraus representation is unique." No. Two Kraus sets \{K_k\} and \{K_k'\} describe the same channel if and only if they are related by a unitary on the environment index, after padding the shorter set with zeros (see the "non-uniqueness" subsection above). For a given channel, there is a minimal Kraus representation (with the fewest operators), and that minimum is the Kraus rank of the channel, a quantity bounded above by d^2 where d = \dim\mathcal H.
- "\sum_k K_k K_k^\dagger = I." No. The completeness relation is \sum_k K_k^\dagger K_k = I (dagger on the left factor). The reverse sum is \sum_k K_k K_k^\dagger = \mathcal E(I), so it equals I exactly when the channel is unital (maps I to I). The bit-flip, phase-flip, and depolarizing channels are unital; amplitude damping is not, since it sends I to \text{diag}(1+\gamma, 1-\gamma). The trace-preservation condition is specifically the dagger-on-the-left form.
Going deeper
If you are here for the Kraus form of a channel, the Stinespring picture, and the four canonical noise channels (bit-flip, phase-flip, depolarizing, amplitude damping), you have the core. The rest of this section develops the formal CPTP framework, complete positivity, the Choi matrix, and channel capacity.
CPTP — completely positive, trace preserving
The formal definition of a quantum channel is a completely positive, trace-preserving (CPTP) linear map on density matrices.
- Trace-preserving (TP): \text{tr}(\mathcal E(\rho)) = \text{tr}(\rho) = 1 for all valid inputs. Guaranteed by the completeness relation \sum_k K_k^\dagger K_k = I.
- Positive: \mathcal E maps positive-semi-definite operators to positive-semi-definite operators. Every valid density matrix goes to a valid density matrix.
- Completely positive (CP): even when \mathcal E acts on only part of a bipartite system — \mathcal E \otimes \text{id}_B on \rho_{AB} — the result is still positive-semi-definite. This is strictly stronger than positivity and rules out pathological maps like the matrix transpose, which is positive but not completely positive.
Why complete positivity matters: if \mathcal E is not completely positive, applying it to one half of an entangled pair can produce a matrix with negative eigenvalues — not a valid density matrix. The transpose map is the textbook example: (T \otimes I) applied to a Bell state gives a matrix with negative eigenvalues, even though T on one qubit alone is positive. Physically-realisable operations cannot do this, so complete positivity is mandatory.
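The transpose example can be checked directly. A minimal sketch, assuming NumPy: applying T to one qubit of |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt2 gives half the SWAP matrix, whose spectrum contains -1/2, while T on a single-qubit state such as |+\rangle\langle +| leaves it positive.

```python
import numpy as np

phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
rho = np.outer(phi, phi)

# (T (x) id): transpose the first qubit only, swapping indices a <-> a' in [a, b, a', b'].
rho_pt = rho.reshape(2, 2, 2, 2).transpose(2, 1, 0, 3).reshape(4, 4)

eig_full = np.linalg.eigvalsh(rho)      # valid state: all eigenvalues >= 0
eig_pt = np.linalg.eigvalsh(rho_pt)     # contains -1/2: not a valid state
eig_single = np.linalg.eigvalsh(np.array([[0.5, 0.5], [0.5, 0.5]]).T)   # T alone stays PSD
```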
Choi-Kraus theorem (1975): every CPTP linear map has a Kraus representation, and conversely, every Kraus representation gives a CPTP linear map. So "CPTP map" and "Kraus representation" are two names for the same mathematical object. The Kraus form is the concrete realisation; CPTP is the axiomatic characterisation.
The Choi matrix
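A deep identification. Every linear map \mathcal E : \mathcal B(\mathcal H) \to \mathcal B(\mathcal H) on operators of an n-dimensional space corresponds bijectively to an operator on \mathcal H \otimes \mathcal H, an n^2-dimensional space, via the Choi-Jamiolkowski isomorphism. Specifically,
J(\mathcal E) = (\mathcal E \otimes \text{id})\big(|\Omega\rangle\langle\Omega|\big) = \sum_{i,j=1}^{n} \mathcal E(|i\rangle\langle j|) \otimes |i\rangle\langle j|, \qquad |\Omega\rangle = \sum_{i=1}^{n} |i\rangle\otimes|i\rangle,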
where |\Omega\rangle is the unnormalised maximally entangled state. J(\mathcal E) is called the Choi matrix of \mathcal E.
The beauty: \mathcal E is completely positive if and only if J(\mathcal E) is positive semi-definite, and trace preserving if and only if tracing out the output factor of J(\mathcal E) leaves the identity (so \text{tr}\,J(\mathcal E) = n). Complete positivity, which seems like a subtle condition on the map, becomes simple positive-semi-definiteness of a matrix. This is why the Choi matrix is so widely used in quantum-channel theory: it converts questions about maps into questions about matrices.
From Choi to Kraus. Diagonalise the Choi matrix: J(\mathcal E) = \sum_k |\phi_k\rangle\langle\phi_k| (spectral decomposition with eigenvalues absorbed into the |\phi_k\rangle). Reshape each eigenvector |\phi_k\rangle (an n^2-dimensional vector) back into an n\times n matrix K_k. Those are the Kraus operators, and the number of nonzero eigenvalues is the Kraus rank. The recipe is a single n^2 \times n^2 eigendecomposition: routine for a few qubits, though its cost grows as n^6.
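A sketch of the Choi-to-Kraus recipe, assuming NumPy and the convention J(\mathcal E) = \sum_{ij}\mathcal E(|i\rangle\langle j|)\otimes|i\rangle\langle j| (output factor first). With that ordering, entry a n + i of an eigenvector scaled by \sqrt{\lambda} is K[a, i], so a row-major reshape recovers the Kraus operator; other orderings need a transposed reshape. The function names are hypothetical.

```python
import numpy as np

def choi_matrix(kraus):
    """J(E) = sum_{ij} E(|i><j|) (x) |i><j|, output factor first."""
    n = kraus[0].shape[1]
    J = np.zeros((n * n, n * n), dtype=complex)
    for i in range(n):
        for j in range(n):
            E_ij = np.zeros((n, n), dtype=complex)
            E_ij[i, j] = 1.0
            J += np.kron(sum(K @ E_ij @ K.conj().T for K in kraus), E_ij)
    return J

def kraus_from_choi(J, tol=1e-12):
    """Eigendecompose J and reshape each sqrt(lambda) * eigenvector into an n x n matrix."""
    n = int(round(np.sqrt(J.shape[0])))
    eigenvalues, eigenvectors = np.linalg.eigh(J)
    return [np.sqrt(lam) * vec.reshape(n, n)
            for lam, vec in zip(eigenvalues, eigenvectors.T) if lam > tol]

def apply_channel(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

gamma = 0.3   # amplitude damping, used as the test channel
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
         np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]

J = choi_matrix(kraus)
recovered = kraus_from_choi(J)   # Kraus rank = number of nonzero eigenvalues

rho = np.array([[0.6, 0.2 - 0.3j], [0.2 + 0.3j, 0.4]])   # a valid qubit state
```

The recovered operators need not equal the originals entry by entry; they are related to them by the unitary freedom discussed earlier, and they define the same channel.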
Numerical libraries such as QuTiP can convert a channel between Kraus, Choi, and superoperator representations, and the Choi matrix is the basis for many numerical channel-fidelity calculations.
Channel fidelity and diamond norm
How close are two channels \mathcal E_1 and \mathcal E_2? Two common measures:
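Entanglement fidelity: F_e(\mathcal E_1, \mathcal E_2) = \text{tr}(J(\mathcal E_1)J(\mathcal E_2))/n^2, direct in terms of the Choi matrices. This formula holds when \mathcal E_2 is a unitary channel, the usual case of comparing a noisy gate against its ideal version; for two general noisy channels the same expression is not a fidelity (for two identical fully depolarizing qubit channels it gives 1/4, not 1).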
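A sketch assuming NumPy and the row-major convention J(\mathcal E) = \sum_k \text{vec}(K_k)\,\text{vec}(K_k)^\dagger, which equals (\mathcal E\otimes\text{id})(|\Omega\rangle\langle\Omega|) when the output factor comes first. For a unitary target U, the formula reduces to \sum_k|\text{tr}(U^\dagger K_k)|^2/n^2, which gives 1 - p for the bit-flip channel against the identity. Helper names are hypothetical.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def choi_from_kraus(kraus):
    """J(E) = sum_k vec(K_k) vec(K_k)^dagger, with row-major vec."""
    return sum(np.outer(K.reshape(-1), K.reshape(-1).conj()) for K in kraus)

def overlap_fidelity(kraus1, kraus2):
    """tr(J(E1) J(E2)) / n^2: a fidelity only when E2 is unitary."""
    n = kraus1[0].shape[0]
    return np.trace(choi_from_kraus(kraus1) @ choi_from_kraus(kraus2)).real / n**2

def depolarizing_kraus(q):
    return [np.sqrt(1 - 3 * q / 4) * I] + [np.sqrt(q / 4) * P for P in (X, Y, Z)]

p = 0.1
bit_flip = [np.sqrt(1 - p) * I, np.sqrt(p) * X]

F_bit_flip = overlap_fidelity(bit_flip, [I])                       # 1 - p
F_depol = overlap_fidelity(depolarizing_kraus(0.2), [I])           # 1 - 3(0.2)/4
F_two_full_depol = overlap_fidelity(depolarizing_kraus(1.0),
                                    depolarizing_kraus(1.0))       # 1/4, not 1
```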
Diamond norm distance: \|\mathcal E_1 - \mathcal E_2\|_\diamond = \max_\rho \|(\mathcal E_1 \otimes \text{id})(\rho) - (\mathcal E_2 \otimes \text{id})(\rho)\|_1. The maximisation is over all states on the system together with an ancilla of the same dimension, and the norm is the trace (Schatten-1) norm on the output. The diamond norm is the operationally "correct" distance: with equal priors, the optimal probability of distinguishing two channels in a single use, by any quantum experiment including ones that use auxiliary entanglement, is \tfrac12 + \tfrac14\|\mathcal E_1 - \mathcal E_2\|_\diamond.
Diamond-norm distances underpin the thresholds for fault-tolerant quantum computing. The threshold theorem states: if each gate is within diamond-norm distance \varepsilon of its ideal version, with \varepsilon below some threshold, then arbitrarily long fault-tolerant computation is possible. Threshold estimates range from around 10^{-4} in conservative analyses to around 10^{-2} for surface codes under simple noise models, and this number is a headline engineering target for quantum-computing hardware.
Channel capacity
A classical channel has a capacity: the maximum number of bits per use at which information can be sent reliably (Shannon 1948: the mutual information between input and output, maximised over input distributions). A quantum channel has multiple capacities, because "information" can be classical bits, quantum states, or shared entanglement. The main ones:
- Classical capacity C: classical bits sent through a quantum channel per use.
- Quantum capacity Q: qubits of quantum state sent per use.
- Entanglement-assisted classical capacity C_E: classical bits per use, given free shared entanglement. Often dramatically larger than C.
Each has a formula in terms of the Kraus operators or Choi matrix (Holevo, Schumacher-Westmoreland, Lloyd-Shor-Devetak, Bennett-Shor-Smolin-Thapliyal). These are the analogues of Shannon's theorem, and they tell you, for a specific noise model, how much you can do. The depolarizing channel's quantum capacity is zero for p \geq 1/3 (total Pauli-error probability 3p/4 \geq 1/4): above that noise level the channel cannot transmit quantum information at all, and its exact capacity at lower noise is still unknown. The amplitude-damping channel's quantum capacity has an intricate dependence on \gamma and vanishes for \gamma \geq 1/2. Questions like these are active research.
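The amplitude-damping dependence on \gamma can be made concrete. The sketch below assumes NumPy and takes as given, without derivation here, the known single-letter formula Q(\gamma) = \max_{x\in[0,1]}\,[h_2((1-\gamma)x) - h_2(\gamma x)] for \gamma < 1/2 (and Q = 0 for \gamma \geq 1/2), where h_2 is the binary entropy. The maximisation is a simple grid search, not an optimised solver.

```python
import numpy as np

def h2(x):
    """Binary entropy in bits; clipping makes h2(0) and h2(1) numerically ~0."""
    x = np.clip(x, 1e-15, 1 - 1e-15)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def amplitude_damping_Q(gamma, grid=10001):
    """Quantum capacity of amplitude damping, using the formula quoted above
    (taken as given): max over x in [0, 1] of h2((1-gamma) x) - h2(gamma x)."""
    if gamma >= 0.5:
        return 0.0
    x = np.linspace(0.0, 1.0, grid)
    return float(np.max(h2((1 - gamma) * x) - h2(gamma * x)))

for gamma in (0.0, 0.1, 0.25, 0.4, 0.5):
    print(f"gamma = {gamma:.2f}  Q ~ {amplitude_damping_Q(gamma):.3f}")
```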
Where this leads next
- Quantum Channels — the category of CPTP maps, their composition, inverses, and the full information-theoretic framework.
- Stinespring Dilation — a dedicated chapter on the unitary-plus-environment construction that underlies every Kraus representation.
- Bit-flip Channel, Depolarizing Channel, Amplitude Damping — each canonical noise channel gets a chapter of its own.
- Lindblad Master Equation — the continuous-time version of a Kraus channel.
- Evolution of Rho — the previous chapter, from which this one generalises.
- Partial Trace — the operation that carries Stinespring's unitary down to a channel on the system.
References
- Wikipedia, Quantum operation — definitions, Kraus form, Stinespring dilation, examples.
- Nielsen and Chuang, Quantum Computation and Quantum Information, §8.2 (the operator-sum representation) — Cambridge University Press.
- John Preskill, Lecture Notes on Quantum Computation, Ch. 3 — theory.caltech.edu/~preskill/ph229.
- John Watrous, The Theory of Quantum Information (2018), Ch. 2 — cs.uwaterloo.ca/~watrous/TQI.
- Wikipedia, Stinespring dilation theorem — the 1955 result that every channel lifts to a unitary on a larger system.
- Wikipedia, Kraus operators — summary of Kraus's 1971 theorem, including the non-uniqueness and minimal-Kraus structure.