A Preview of Density Matrices

In short

Every quantum state so far has been a ket |\psi\rangle — a single, perfectly-known pure state. The density matrix \rho is the full description that works even when the state is not perfectly known. For a pure state, \rho = |\psi\rangle\langle\psi| — just the outer product, a rank-1 projector. For a mixed state — "with probability p_1 the state is |\psi_1\rangle, with probability p_2 the state is |\psi_2\rangle, and so on" — the density matrix is the convex combination \rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|. Every measurement rule you knew still works: p(m) = \text{tr}(P_m \rho), expectation value \langle A\rangle = \text{tr}(A\rho). You need density matrices whenever there is noise, whenever you hold one half of an entangled pair, whenever you are unsure which pure state was prepared. The test for pure vs mixed is the purity, \text{tr}(\rho^2): equal to 1 for pure states, strictly less for mixed. This chapter is the preview; Part 13 will give the full treatment.

Every state you have written down in this track has been a ket. A single complex unit vector: |\psi\rangle = \alpha|0\rangle + \beta|1\rangle, |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle), and so on. You apply unitaries to it, you measure it, you trace over one half of it. The ket is the object; the ket is the state.

That works beautifully for situations where the state is perfectly known. Perfect preparation, zero noise, no part of the system hidden from you. Which — if you are being honest with yourself — is a situation that never actually arises.

Here are three situations where the ket fails as a description.

You prepare a qubit in |0\rangle, but your control pulse has a small random phase drift. Half the time the state is |0\rangle; the other half it is some other state |\phi\rangle you don't know exactly. The true state of the qubit is not any single ket.
You and your friend share a Bell pair |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle). Your friend takes their qubit to Chennai; you keep yours in Delhi. You want a description of your qubit alone. As the partial-trace chapter showed, there is no pure-state ket that describes just your half — the joint state is entangled.
An NMR quantum computer runs the same pulse sequence on 10^{17} molecules in parallel. Each molecule is a different physical qubit; thermal fluctuations mean they are not all in the same state. You want the single mathematical object that describes "what the ensemble looks like."

In all three cases, the ket formalism has run out. What you actually have is a distribution over kets — a probabilistic mixture, or a reduced piece of something entangled, or an average over microscopic configurations. The object that handles all of them at once is the density matrix. This chapter is the first proper look at it. Part 13 will give the full treatment; here you will build enough intuition to know when to reach for \rho and what it gives you.

The density matrix of a pure state

Start with the easy case, where the density matrix is just a dressed-up version of the ket.

Density matrix of a pure state

If |\psi\rangle is a pure state, its density matrix is the outer product

\rho \;=\; |\psi\rangle\langle\psi|.

This is a square matrix: the column |\psi\rangle times the row \langle\psi|. It contains exactly the same information as the ket (up to a physically unobservable global phase).

Concretely. For |\psi\rangle = \alpha|0\rangle + \beta|1\rangle,

\rho \;=\; |\psi\rangle\langle\psi| \;=\; \begin{pmatrix}\alpha \\ \beta\end{pmatrix}\begin{pmatrix}\alpha^* & \beta^*\end{pmatrix} \;=\; \begin{pmatrix}|\alpha|^2 & \alpha\beta^* \\ \alpha^*\beta & |\beta|^2\end{pmatrix}.

Why outer product, not inner: the inner product \langle\psi|\psi\rangle is a single complex number (in fact, 1 for a unit vector). The outer product |\psi\rangle\langle\psi| is a square matrix — a column times a row. You need a matrix to hold the information of a state in a form you can act on with operators.

Try it on the equator state |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle). The column is \tfrac{1}{\sqrt{2}}(1, 1)^T, the row is \tfrac{1}{\sqrt{2}}(1, 1), and

\rho_+ \;=\; |+\rangle\langle+| \;=\; \frac{1}{2}\begin{pmatrix}1 \\ 1\end{pmatrix}\begin{pmatrix}1 & 1\end{pmatrix} \;=\; \frac{1}{2}\begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix}.

Every entry is 1/2. The diagonal entries — 1/2 and 1/2 — are the probabilities of measuring 0 and 1 in the computational basis (the Born rule, in a new language). The off-diagonal entries — also 1/2 — carry the phase relationship between the two amplitudes. If the state had been |-\rangle instead of |+\rangle, those off-diagonal entries would have been -1/2, not +1/2. The sign is where "this is a superposition, not a random mixture" lives.

The pure-state density matrix $\rho = |\psi\rangle\langle\psi|$ is a column times a row, producing a $2\times 2$ matrix. Diagonal entries are the probabilities $|\alpha|^2$, $|\beta|^2$. Off-diagonal entries carry the phase — the part that distinguishes a superposition from a statistical mixture.
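As a quick numerical check of the outer-product recipe, here is a minimal NumPy sketch. NumPy and the helper name `pure_density_matrix` are assumptions made for this illustration; the chapter itself does not prescribe any library.

```python
import numpy as np

def pure_density_matrix(ket):
    """Return |psi><psi| for a state vector, normalising it first."""
    ket = np.asarray(ket, dtype=complex)
    ket = ket / np.linalg.norm(ket)
    # Column times conjugated row: entry (i, j) is ket[i] * conj(ket[j]).
    return np.outer(ket, ket.conj())

plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

rho_plus = pure_density_matrix(plus)
rho_minus = pure_density_matrix(minus)

print(rho_plus.real)   # every entry close to 0.5
print(rho_minus.real)  # diagonal close to 0.5, off-diagonal close to -0.5
```

The two matrices share a diagonal and differ only in the sign of the off-diagonal entries, which is where the relative phase lives.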

Three properties of every density matrix

Whether pure or mixed, a density matrix \rho always satisfies three properties. Check them against the \rho_+ above.

Hermitian: \rho^\dagger = \rho. The conjugate transpose equals itself. For \rho_+, every entry is real (1/2), and the matrix is clearly symmetric, so yes.
Trace one: \text{tr}(\rho) = 1. Summing the diagonal gives 1. For \rho_+: \tfrac{1}{2} + \tfrac{1}{2} = 1.
Positive semidefinite: \langle\phi|\rho|\phi\rangle \geq 0 for every |\phi\rangle. Equivalently, all eigenvalues are \geq 0. The eigenvalues of \rho_+ are 1 and 0, both non-negative.

Why these three properties: Hermiticity makes the eigenvalues and the diagonal entries real, so they can represent probabilities. Trace 1 means those probabilities sum to 1. Positive semidefiniteness means \langle\phi|\rho|\phi\rangle, the probability of finding the system in |\phi\rangle, is never negative, whichever basis you measure in. Together these three are the exact mathematical conditions a matrix has to satisfy to represent a valid quantum state, pure or mixed.
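The three conditions can be checked mechanically. The sketch below is one possible way to do it with NumPy; the function name `is_density_matrix` and the tolerance are illustrative choices, not part of any standard API.

```python
import numpy as np

def is_density_matrix(rho, tol=1e-9):
    """Check Hermiticity, unit trace, and positive semidefiniteness."""
    rho = np.asarray(rho, dtype=complex)
    hermitian = np.allclose(rho, rho.conj().T, atol=tol)
    trace_one = abs(np.trace(rho) - 1) < tol
    # eigvalsh assumes a Hermitian input, so only consult it if that check passed.
    psd = hermitian and bool(np.all(np.linalg.eigvalsh(rho) >= -tol))
    return bool(hermitian and trace_one and psd)

rho_plus = 0.5 * np.array([[1, 1], [1, 1]])

# Hypothetical counterexample: real diagonal summing to 1, Hermitian,
# but the off-diagonal is too large, so one eigenvalue is negative.
too_coherent = np.array([[0.5, 0.6], [0.6, 0.5]])

print(is_density_matrix(rho_plus))      # True
print(is_density_matrix(too_coherent))  # False
```

The second matrix has eigenvalues 1.1 and -0.1: its diagonal looks like a fair coin, yet it would assign a negative probability to the outcome |-\rangle.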

Mixed states — a convex combination

Now the reason density matrices exist at all.

Suppose you run an experiment in which, with probability p_1, your qubit was prepared in pure state |\psi_1\rangle; with probability p_2, in |\psi_2\rangle; and so on. The p_i are classical probabilities — they sum to 1, they are non-negative, and they represent genuine ignorance on your part about which pure state the qubit is actually in. (Maybe the preparation device has noise. Maybe someone else prepared it and rolled a die. Maybe you lost track.)

The right description of this situation is the mixed state

\rho \;=\; \sum_i p_i\, |\psi_i\rangle\langle\psi_i|.

A convex combination of pure-state density matrices, weighted by the classical probabilities.

Mixed-state density matrix

A mixed state with preparation ensemble \{(p_i, |\psi_i\rangle)\} has density matrix

\rho \;=\; \sum_i p_i\, |\psi_i\rangle\langle\psi_i|, \qquad p_i \geq 0, \qquad \sum_i p_i = 1.

This is not the same as the superposition \sum_i \sqrt{p_i}\,|\psi_i\rangle. A mixture is classical ignorance; a superposition is a quantum object. They behave differently under measurement.

The canonical example — the one you should burn into memory — is the 50-50 mixture of |0\rangle and |1\rangle. With probability 1/2 the state is |0\rangle; with probability 1/2 it is |1\rangle. The density matrix:

\rho_{\text{mix}} \;=\; \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| \;=\; \tfrac{1}{2}\begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix} + \tfrac{1}{2}\begin{pmatrix}0 & 0 \\ 0 & 1\end{pmatrix} \;=\; \tfrac{1}{2}\begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} \;=\; \frac{I}{2}.

Why it is I/2 and not something more exotic: the two projectors |0\rangle\langle 0| and |1\rangle\langle 1| live on the diagonal, with 0 off-diagonal. Adding them with equal weights produces a diagonal matrix with 1/2 down the diagonal — which is exactly \tfrac{1}{2} times the identity matrix. No phase information survives, because the individual pure states |0\rangle and |1\rangle have no phases relative to each other.

The state I/2 is called the maximally mixed state. It represents complete classical ignorance: every basis you measure in gives 50-50 outcomes. It is the quantum analogue of "I know nothing about this qubit."

The maximally mixed state of a qubit is the convex combination $\tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| = I/2$. The classical probabilities $p_i$ multiply the rank-1 projectors $|\psi_i\rangle\langle\psi_i|$, and the sum is a valid density matrix.

Mixture vs superposition: the distinction you must burn in

This is the single most important distinction in this chapter. Compare:

Superposition |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) has density matrix \rho_+ = \tfrac{1}{2}\begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix}. Off-diagonal entries are 1/2.
Mixture \{(1/2, |0\rangle), (1/2, |1\rangle)\} has density matrix \rho_{\text{mix}} = \tfrac{1}{2}\begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix}. Off-diagonal entries are 0.

Both give 50-50 when measured in the computational basis — the diagonals are identical, and the diagonals are the computational-basis probabilities.

But measure them in the plus-minus basis. Your hardware measures in the z-basis, so first you apply a Hadamard and then measure. Equivalently, compute \langle + | \rho | + \rangle and \langle - | \rho | - \rangle for each.

For the superposition \rho_+: \langle + | \rho_+ | + \rangle = 1, \langle - | \rho_+ | - \rangle = 0. You get + every time. No randomness at all.

For the mixture \rho_{\text{mix}}: \langle + | (I/2) | + \rangle = \tfrac{1}{2}, \langle - | (I/2) | - \rangle = \tfrac{1}{2}. You get 50-50, same as any other basis.

The off-diagonal entries (1/2 for the superposition, 0 for the mixture) are where the difference lives. They are called coherences, and they are the mathematical fingerprint of quantum superposition in the chosen basis. The mixture I/2 has zero coherence in every basis; |+\rangle has the largest off-diagonal magnitude, 1/2, that a valid qubit state allows. Decoherence in a noisy qubit is literally the off-diagonal entries shrinking towards zero over time.
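The comparison above is easy to reproduce numerically. This is a small NumPy sketch under the chapter's definitions; the helper names `projector` and `prob` are invented for the example.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

def projector(ket):
    """Rank-1 projector |ket><ket|."""
    return np.outer(ket, ket.conj())

def prob(rho, ket):
    """<ket| rho |ket>: probability of finding the state `ket`."""
    return float(np.real(ket.conj() @ rho @ ket))

rho_plus = projector(plus)                               # superposition
rho_mix = 0.5 * projector(ket0) + 0.5 * projector(ket1)  # mixture, I/2

for name, rho in [("superposition", rho_plus), ("mixture", rho_mix)]:
    z_probs = (prob(rho, ket0), prob(rho, ket1))
    x_probs = (prob(rho, plus), prob(rho, minus))
    print(name, "z-basis:", z_probs, "x-basis:", x_probs)
```

Both states give roughly (0.5, 0.5) in the z-basis, but only the mixture does so in the x-basis; the superposition gives approximately (1, 0) there.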

The purity test

Given a density matrix \rho, how do you tell if it is pure (represents a single ket) or mixed (a genuine probabilistic mixture)?

The cleanest test is the purity, defined as

\gamma(\rho) \;=\; \text{tr}(\rho^2).

Purity

The purity of a density matrix \rho on a d-dimensional space is

\gamma(\rho) \;=\; \text{tr}(\rho^2), \qquad \frac{1}{d} \leq \gamma(\rho) \leq 1.

It equals 1 if and only if \rho is pure; it equals 1/d (the minimum) for the maximally mixed state \rho = I/d.

Why pure states give purity one

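Start with a pure state \rho = |\psi\rangle\langle\psi|. Then

\rho^2 \;=\; |\psi\rangle\langle\psi|\psi\rangle\langle\psi| \;=\; |\psi\rangle\,(1)\,\langle\psi| \;=\; |\psi\rangle\langle\psi| \;=\; \rho.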

Why \langle\psi|\psi\rangle = 1 drops out: the middle piece is the inner product of |\psi\rangle with itself, which is 1 because kets are unit vectors. Multiplying by 1 does nothing, so you are left with the original \rho.

Taking the trace, \text{tr}(\rho^2) = \text{tr}(\rho) = 1. So every pure state has purity 1.

Why the maximally mixed state gives 1/d

Now take \rho = I/d, the maximally mixed state on a d-dimensional space (for a qubit, d = 2 and \rho = I/2). Then

\rho^2 \;=\; \frac{I^2}{d^2} \;=\; \frac{I}{d^2},

and \text{tr}(\rho^2) = \text{tr}(I/d^2) = d/d^2 = 1/d.

For a single qubit this is 1/2. So I/2 has purity 1/2, half the maximum. Every mixed qubit state lies somewhere between 1/2 and 1 on the purity scale, with 1/2 being "maximally mixed" and 1 being "pure."

The purity $\text{tr}(\rho^2)$ ranks states on a single axis. A pure state sits at $1$. The maximally mixed state $I/2$ sits at $1/2$ (the minimum for a qubit). Every physical state lies somewhere between these extremes.

The purity test is one line of algebra — square the matrix, sum the diagonal — and it tells you the full story. This is why density matrices are the honest way to describe a quantum system: the "pure vs mixed" question has a clean, computable answer.
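A minimal sketch of the purity test in NumPy follows. The function name `purity` is an illustrative choice; the loop over d simply checks the 1/d lower bound on a few maximally mixed states.

```python
import numpy as np

def purity(rho):
    """tr(rho^2): 1 for pure states, 1/d for the maximally mixed state."""
    rho = np.asarray(rho, dtype=complex)
    return float(np.real(np.trace(rho @ rho)))

rho_plus = 0.5 * np.array([[1, 1], [1, 1]])
print("pure |+>:", purity(rho_plus))

for d in (2, 4, 8):
    maximally_mixed = np.eye(d) / d
    print("d =", d, "purity of I/d:", purity(maximally_mixed))
```

The first value should come out at 1 and the others at 1/2, 1/4 and 1/8, up to floating-point rounding.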

Partial traces give mixed states

Here is the moment density matrices stop being a formal dressing-up exercise and become the only sensible description.

Recall from the partial-trace chapter: if you hold one qubit of the Bell state |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) and want to describe your qubit alone, you compute \rho_A = \text{tr}_B(|\Phi^+\rangle\langle\Phi^+|). The result, computed there line by line, is

\rho_A \;=\; \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| \;=\; \frac{I}{2}.

The maximally mixed state. Your half of a Bell pair looks classically random — every measurement you do on your qubit alone gives 50-50 outcomes, regardless of basis.

But your qubit was never classically random. There is a perfectly pure, perfectly definite joint state |\Phi^+\rangle describing both halves together. The randomness appears only because you threw away half the information (the correlation with qubit B). Entanglement is the mechanism that turns a pure joint state into mixed reduced states.

This is why the partial trace is the moment density matrices become unavoidable: there is no single-qubit ket that describes your half of |\Phi^+\rangle. No unit vector in \mathbb{C}^2 gives 50-50 outcomes in every basis. Only a mixed state does. Density matrices are not optional here — they are the only object big enough to hold "my half of an entangled pair."
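The Bell-pair calculation can be reproduced in a few lines. The sketch below assumes the basis ordering |00\rangle, |01\rangle, |10\rangle, |11\rangle with qubit A as the first index, and uses `numpy.einsum` to sum over qubit B; it is one way to write a partial trace, not the only one.

```python
import numpy as np

# |Phi+> = (|00> + |11>)/sqrt(2) in the basis order |00>, |01>, |10>, |11>.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_ab = np.outer(bell, bell.conj())

# Reshape the 4x4 matrix to indices (a, b, a', b'), then set b = b' and sum.
rho_a = np.einsum("abcb->ac", rho_ab.reshape(2, 2, 2, 2))

purity_joint = float(np.real(np.trace(rho_ab @ rho_ab)))
purity_reduced = float(np.real(np.trace(rho_a @ rho_a)))

print(rho_a.real)      # close to I/2
print(purity_joint)    # close to 1: the joint state is pure
print(purity_reduced)  # close to 0.5: the reduced state is maximally mixed
```

The joint state has purity 1 while the half you hold has purity 1/2, which is the numerical signature of entanglement described above.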

The measurement rules, restated in density-matrix form

Every rule you know for kets has a density-matrix version. Here they are, side by side.

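Quantity	Ket form	Density-matrix form
Probability of outcome m	$p(m) = \langle\psi|P_m|\psi\rangle$	$p(m) = \text{tr}(P_m \rho)$
Post-measurement state	$\dfrac{P_m|\psi\rangle}{\sqrt{p(m)}}$	$\dfrac{P_m \rho P_m}{p(m)}$
Expectation value of A	$\langle A\rangle = \langle\psi|A|\psi\rangle$	$\langle A\rangle = \text{tr}(A\rho)$
Time evolution by U	$|\psi\rangle \mapsto U|\psi\rangle$	$\rho \mapsto U\rho U^\dagger$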

Why each density-matrix rule reduces to the ket rule for pure states: substitute \rho = |\psi\rangle\langle\psi|. The probability rule becomes \text{tr}(P_m|\psi\rangle\langle\psi|); cycle the trace to get \langle\psi|P_m|\psi\rangle. Same for the others. The density-matrix formulas are strict generalisations — they agree with the ket rules on pure states and extend to mixed ones.

In particular: every quantum computation you have ever written in terms of kets and unitaries can be rewritten in terms of density matrices and sandwich operations U\rho U^\dagger. The density-matrix formulation is equivalent to the ket formulation on pure states, and strictly more expressive on mixed states.
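To see the ket rules and the density-matrix rules agree on a pure state, here is a small NumPy sketch. The test state 0.6|0\rangle + 0.8i|1\rangle is an arbitrary choice for illustration, and the post-measurement rule used for \rho is the standard form P_m \rho P_m / p(m).

```python
import numpy as np

# Hypothetical test state 0.6|0> + 0.8i|1>, chosen only for illustration.
psi = np.array([0.6, 0.8j])
rho = np.outer(psi, psi.conj())

P0 = np.array([[1, 0], [0, 0]], dtype=complex)               # projector onto |0>
Z = np.array([[1, 0], [0, -1]], dtype=complex)               # observable
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # unitary

# Probability of outcome 0: <psi|P0|psi> versus tr(P0 rho).
p_ket = np.real(psi.conj() @ P0 @ psi)
p_rho = np.real(np.trace(P0 @ rho))

# Expectation value of Z: <psi|Z|psi> versus tr(Z rho).
z_ket = np.real(psi.conj() @ Z @ psi)
z_rho = np.real(np.trace(Z @ rho))

# Post-measurement state: P0|psi>/sqrt(p) versus P0 rho P0 / p.
post_ket = P0 @ psi / np.sqrt(p_ket)
post_rho = P0 @ rho @ P0 / p_rho

# Time evolution: U|psi> versus U rho U^dagger.
psi_out = H @ psi
rho_out = H @ rho @ H.conj().T

print(np.isclose(p_ket, p_rho), np.isclose(z_ket, z_rho))
print(np.allclose(np.outer(post_ket, post_ket.conj()), post_rho))
print(np.allclose(np.outer(psi_out, psi_out.conj()), rho_out))
```

Each comparison converts the ket-side result into a density matrix (where needed) and checks that it matches the \rho-side result.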

What genuinely needs density matrices

Four kinds of situations where kets fail and \rho is the only honest description.

Noisy preparation and decoherence. Real quantum hardware does not prepare perfect pure states. A laser pulse has phase jitter; a superconducting qubit sees its environment. Every noisy process — dephasing, amplitude damping, thermal noise — is described by a completely-positive, trace-preserving map on density matrices (a "quantum channel"). Kets cannot describe noise without averaging, and averaging produces a mixture, which is a density matrix.

One half of an entangled state. As you just saw, the partial trace over an entangled state produces a mixed state. If you hold one qubit of a Bell pair or one qubit of a three-particle GHZ state, the density matrix of your part is the only complete description of what you hold.

Ensembles and statistical mixtures. An NMR quantum computer runs the same pulse on about 10^{17} molecules in a test tube [1]. Thermal fluctuations mean the molecules are in a classical distribution over pure states. The measurement signal (a macroscopic magnetisation) depends on the ensemble-average density matrix \rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|, where the p_i come from the Boltzmann distribution. Indian NMR labs at TIFR and IISc Bangalore have decades of experience extracting density matrices from ensemble readouts — a technique called quantum state tomography — and it is how the earliest experimental verifications of small quantum algorithms were done.

Open-system dynamics. A closed quantum system evolves by |\psi\rangle \mapsto U|\psi\rangle. A system coupled to an environment does not — it leaks information and accumulates noise. The honest way to describe this is the Lindblad master equation, written in terms of \rho, which you will meet in Part 13 on density matrices and open systems.

In each of these situations, insisting on ket descriptions forces you to lie about what is actually happening. The density matrix is the truthful object.

Worked examples

Example 1: Pure density matrix for $(|0\rangle + i|1\rangle)/\sqrt{2}$

Compute the density matrix of the pure state

|\psi\rangle \;=\; \frac{|0\rangle + i|1\rangle}{\sqrt{2}}

(which is the Bloch-sphere state |+i\rangle, sitting on the +y equator point). Verify that the resulting matrix is Hermitian, has trace 1, and has purity 1.

Step 1. Write |\psi\rangle as a column and \langle\psi| as a row.

|\psi\rangle \;=\; \frac{1}{\sqrt{2}}\begin{pmatrix}1 \\ i\end{pmatrix}, \qquad \langle\psi| \;=\; \frac{1}{\sqrt{2}}\begin{pmatrix}1 & -i\end{pmatrix}.

Why the second entry flips sign: the bra is the conjugate transpose of the ket. Complex conjugation turns i into -i. This is why Hermitian conjugates always flip imaginary parts.

Step 2. Multiply to form the outer product.

\rho \;=\; |\psi\rangle\langle\psi| \;=\; \frac{1}{2}\begin{pmatrix}1 \\ i\end{pmatrix}\begin{pmatrix}1 & -i\end{pmatrix} \;=\; \frac{1}{2}\begin{pmatrix}1\cdot 1 & 1\cdot(-i) \\ i\cdot 1 & i\cdot(-i)\end{pmatrix} \;=\; \frac{1}{2}\begin{pmatrix}1 & -i \\ i & 1\end{pmatrix}.

Why i\cdot(-i) = 1: the product i\cdot(-i) = -i^2 = -(-1) = 1. This is the standard identity that keeps diagonal entries of density matrices real and non-negative.

Step 3. Check Hermiticity: \rho^\dagger = \rho. The conjugate transpose first transposes (swap rows and columns), then conjugates every entry. Transposing gives \begin{pmatrix}1 & i \\ -i & 1\end{pmatrix}; conjugating sends i \to -i and -i \to i, producing \begin{pmatrix}1 & -i \\ i & 1\end{pmatrix}. Put the 1/2 back in front and you get \rho again, so \rho^\dagger = \rho. Hermitian. Why Hermiticity matters: diagonal entries are automatically real (because they equal their own conjugates). Any valid density matrix must have real diagonal probabilities, so Hermiticity is a built-in sanity check.

Step 4. Check the trace: \text{tr}(\rho) = \tfrac{1}{2}(1 + 1) = 1. Trace one — as required.

Step 5. Check the purity: \text{tr}(\rho^2) \stackrel{?}{=} 1. Compute

\rho^2 \;=\; \frac{1}{4}\begin{pmatrix}1 & -i \\ i & 1\end{pmatrix}\begin{pmatrix}1 & -i \\ i & 1\end{pmatrix} \;=\; \frac{1}{4}\begin{pmatrix}1 + (-i)(i) & 1\cdot(-i) + (-i)\cdot 1 \\ i\cdot 1 + 1\cdot i & i\cdot(-i) + 1\end{pmatrix}.

The top-left entry is \tfrac{1}{4}(1 + 1) = \tfrac{1}{2}. The top-right is \tfrac{1}{4}(-i - i) = -i/2. The bottom-left is \tfrac{1}{4}(2i) = i/2. The bottom-right is \tfrac{1}{2}. So

\rho^2 \;=\; \frac{1}{2}\begin{pmatrix}1 & -i \\ i & 1\end{pmatrix} \;=\; \rho.

The square equals \rho itself — exactly what a pure state does. Taking the trace, \text{tr}(\rho^2) = 1. Pure. Why \rho^2 = \rho for pure states: the identity |\psi\rangle\langle\psi|\cdot|\psi\rangle\langle\psi| = |\psi\rangle\langle\psi| follows because the inner product \langle\psi|\psi\rangle = 1 collapses the middle. Rank-1 projectors are always idempotent.

Result. \rho = \tfrac{1}{2}\begin{pmatrix}1 & -i \\ i & 1\end{pmatrix}. Hermitian, trace 1, purity 1, rank 1 — a valid pure-state density matrix for |+i\rangle.

The pure state $|+i\rangle$ sits on the surface of the Bloch sphere. Every pure state lives on the sphere's surface; the density matrix has purity $1$.

What this shows. For a pure state, \rho = |\psi\rangle\langle\psi| is a rank-1 Hermitian matrix with trace 1 and purity 1. Its diagonal gives computational-basis probabilities; its off-diagonals carry the phase information that marks it as a genuine superposition rather than a mixture.
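The five steps of Example 1 can be confirmed numerically. This is a short NumPy sketch; the variable names are illustrative.

```python
import numpy as np

psi = np.array([1, 1j]) / np.sqrt(2)  # |+i> = (|0> + i|1>)/sqrt(2)
rho = np.outer(psi, psi.conj())       # Step 2: outer product

expected = 0.5 * np.array([[1, -1j], [1j, 1]])

hermitian = np.allclose(rho, rho.conj().T)      # Step 3
trace = np.real(np.trace(rho))                  # Step 4
idempotent = np.allclose(rho @ rho, rho)        # Step 5: rho^2 = rho
purity = np.real(np.trace(rho @ rho))
eigenvalues = np.linalg.eigvalsh(rho)           # rank 1: one eigenvalue is 1

print(np.allclose(rho, expected), hermitian, idempotent)
print(trace, purity, eigenvalues)
```

The eigenvalues come out close to 0 and 1, which is another way of saying the matrix is a rank-1 projector.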

Example 2: A mixture that is NOT a superposition

Form the density matrix of the following preparation: with probability \tfrac{1}{2} the qubit was prepared in |0\rangle, with probability \tfrac{1}{2} in |+\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle). Then compare it with the density matrix of the superposition \tfrac{1}{2}|0\rangle + \tfrac{1}{2}|+\rangle (unnormalised; let's fix normalisation below) to show they are genuinely different objects.

Step 1. Write the two pure-state density matrices.

|0\rangle\langle 0| \;=\; \begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix}, \qquad |+\rangle\langle+| \;=\; \tfrac{1}{2}\begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix}.

Why two different matrices: |0\rangle is a pole of the Bloch sphere; |+\rangle is on the equator. They are distinct pure states, and their density matrices are distinct rank-1 projectors.

Step 2. Form the convex combination with weights (1/2, 1/2).

\rho_{\text{mix}} \;=\; \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|+\rangle\langle+| \;=\; \tfrac{1}{2}\begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix} + \tfrac{1}{4}\begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix} \;=\; \begin{pmatrix}3/4 & 1/4 \\ 1/4 & 1/4\end{pmatrix}.

Why the top-left entry is 3/4 and not 1/2: the mixture has two "sources" for getting outcome 0 — the |0\rangle component contributes 1\cdot 1/2, and the |+\rangle component contributes 1/2 \cdot 1/2 = 1/4. Together that is 3/4.

Step 3. Now form the density matrix of the superposition

|\chi\rangle \;=\; \mathcal N\cdot(|0\rangle + |+\rangle) \;=\; \mathcal N\cdot\left(|0\rangle + \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\right) \;=\; \mathcal N\cdot\left(\tfrac{\sqrt{2}+1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle\right).

Compute the normalisation \mathcal N. The squared norm of the unnormalised vector is \tfrac{(\sqrt{2}+1)^2 + 1}{2} = \tfrac{3 + 2\sqrt{2} + 1}{2} = 2 + \sqrt{2}, so \mathcal N = 1/\sqrt{2 + \sqrt{2}}. Then

|\chi\rangle \;=\; \frac{1}{\sqrt{2(2+\sqrt{2})}}\begin{pmatrix}\sqrt{2}+1 \\ 1\end{pmatrix}.

The amplitudes are \alpha = (\sqrt{2}+1)/\sqrt{2(2+\sqrt{2})} and \beta = 1/\sqrt{2(2+\sqrt{2})}. Why we had to normalise: the sum |0\rangle + |+\rangle is not a unit vector, so dividing by the norm is necessary before calling it a quantum state. Without normalisation, the Born rule doesn't give probabilities summing to 1.

Step 4. Form the superposition's density matrix \rho_{\text{sup}} = |\chi\rangle\langle\chi|. The diagonal entries are |\alpha|^2 = (\sqrt{2}+1)^2/(2(2+\sqrt{2})) = (3 + 2\sqrt{2})/(4 + 2\sqrt{2}); simplify by multiplying top and bottom by (2 - \sqrt{2}) or just evaluate numerically: |\alpha|^2 \approx 0.854, |\beta|^2 \approx 0.146. The key point is that these are different from the mixture's diagonal entries (3/4, 1/4).

More importantly: the off-diagonal entry of \rho_{\text{sup}} is \alpha\beta^* \approx 0.354, not 1/4. The superposition has different coherences than the mixture.

Step 5. Compute the purity of each. For the mixture:

\rho_{\text{mix}}^2 \;=\; \begin{pmatrix}3/4 & 1/4 \\ 1/4 & 1/4\end{pmatrix}^2 \;=\; \begin{pmatrix}10/16 & 4/16 \\ 4/16 & 2/16\end{pmatrix} \;=\; \begin{pmatrix}5/8 & 1/4 \\ 1/4 & 1/8\end{pmatrix}

with trace 5/8 + 1/8 = 6/8 = 3/4. So \text{tr}(\rho_{\text{mix}}^2) = 3/4 — less than 1. Genuinely mixed.

For the superposition \rho_{\text{sup}}, a direct calculation (or the shortcut "all pure states have purity 1") gives \text{tr}(\rho_{\text{sup}}^2) = 1. Pure.

Result. \rho_{\text{mix}} = \begin{pmatrix}3/4 & 1/4 \\ 1/4 & 1/4\end{pmatrix} with purity 3/4; the superposition |\chi\rangle = \mathcal N(|0\rangle + |+\rangle) has a different matrix with purity 1. They are different quantum objects, and a ket-only description could never express the first of them.

The mixture $\tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|+\rangle\langle+|$ sits **inside** the Bloch ball, not on the surface, because its purity is less than $1$. The superposition $\mathcal N(|0\rangle + |+\rangle)$ sits on the surface, because its purity is $1$. Pure states live on the sphere; mixed states live strictly inside.

What this shows. "Average the states" and "add the states" are fundamentally different operations. The mixture is a classical probabilistic combination (purity <1, sits inside the Bloch ball). The superposition is a quantum linear combination (purity =1, sits on the surface). A density-matrix description makes the distinction visible on the page; ket notation alone could not even represent the mixture.
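Example 2 can be checked the same way. The sketch below builds both objects with NumPy and compares their entries and purities; the helper `projector` is an illustrative name.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

def projector(ket):
    """Rank-1 projector |ket><ket|."""
    return np.outer(ket, ket.conj())

# "Average the states": classical 50-50 mixture of |0> and |+>.
rho_mix = 0.5 * projector(ket0) + 0.5 * projector(plus)

# "Add the states": normalised superposition of |0> and |+>.
chi = ket0 + plus
chi = chi / np.linalg.norm(chi)
rho_sup = projector(chi)

purity_mix = float(np.real(np.trace(rho_mix @ rho_mix)))
purity_sup = float(np.real(np.trace(rho_sup @ rho_sup)))

print(rho_mix.real)  # close to [[0.75, 0.25], [0.25, 0.25]]
print(rho_sup.real)  # diagonal near 0.854 and 0.146, off-diagonal near 0.354
print(purity_mix, purity_sup)
```

The mixture's purity comes out at 3/4 and the superposition's at 1, matching Step 5.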

Common confusions

"\rho is a state vector." No. A state vector is a ket, a column of complex numbers. A density matrix is a matrix: a square grid of complex numbers, built by taking an outer product (for pure states) or a convex combination of outer products (for mixed states). You cannot add a density matrix to a ket, and you cannot apply a unitary to a density matrix by writing U\rho; the correct operation is U\rho U^\dagger, the sandwich.
"Mixed just means superposed." The single sharpest error in this chapter. A superposition \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = |+\rangle is a pure state with a coherent phase relationship between its two components. A mixture \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| = I/2 is classical ignorance about which pure state was prepared. They give different outcomes when measured in the x-basis: the superposition always gives +, the mixture gives 50-50. For this pair the computational-basis off-diagonal entries of \rho are the diagnostic (1/2 for the superposition, 0 for the mixture). In general a mixture can have nonzero off-diagonal entries, as Example 2 shows, so the basis-independent test is the purity.
"A mixed state means the qubit is entangled with something." Usually yes, but not necessarily. If you hold one half of a Bell pair, your reduced state is I/2, mixed because of entanglement with your friend's qubit. If you have a noisy qubit that is 50-50 in |0\rangle versus |1\rangle from classical thermal noise (no hidden partner), your state is also I/2, mixed for a completely different reason. The density matrix I/2 cannot tell the two sources apart. The purification theorem (coming in Part 13) says every mixed state can be written as a pure state on a larger system by adding a fictional partner, but whether that partner actually exists in your lab is a separate physical question.
"Purity below 1 means broken." No. A purity below 1 just means the state is not perfectly known to you, or is coupled to something else. Every real quantum system has purity strictly less than 1 once you include environmental noise. The maximally mixed state has purity 1/d, which is the minimum, and it is a perfectly valid state, just one carrying no information.
"I/2 is not a real state because it is just the identity." I/2 is a perfectly valid density matrix. It is Hermitian, has trace 1, and has non-negative eigenvalues (1/2 and 1/2). It represents maximum uncertainty about a qubit: every measurement in every basis gives 50-50. You can prepare it: flip a fair classical coin, and prepare |0\rangle or |1\rangle accordingly. Alternatively: take one half of any maximally entangled two-qubit state.
"Adding density matrices needs to be done carefully." True: only convex combinations are allowed. A convex combination \rho = p_1\rho_1 + p_2\rho_2 with p_1, p_2 \geq 0 and p_1 + p_2 = 1 is a valid density matrix. An unweighted sum \rho_1 + \rho_2 has trace 2, not 1, so it is not a density matrix. The "convex" part of "convex combination" is non-negotiable.

Going deeper

If you are just here to know why density matrices exist and what they compute, you have it — pure states are |\psi\rangle\langle\psi|, mixed states are convex combinations, every measurement rule becomes a trace. The rest of this section previews what Part 13 will develop: the Bloch-vector representation of a qubit density matrix, the non-uniqueness of the ensemble decomposition, the Kraus operator picture of noise, and the deep connection to classical statistical mechanics.

The full treatment — Part 13

Part 13 of this track is the full density-matrix formalism: the definition axioms (Hermitian, trace 1, positive semidefinite, in that order), the full set of measurement rules, the no-signalling theorem (mixed states can't carry classical information faster than light even in the presence of entanglement), and the characterisation of quantum channels as completely positive trace-preserving maps. Everything in the current chapter is a shadow of that treatment. Use this chapter to build the picture; when you hit Part 13, you will see the same ideas laid out with full rigour.

Bloch-vector representation

For a single qubit, every density matrix has a beautiful geometric parameterisation:

\rho \;=\; \frac{I + \vec{r}\cdot\vec{\sigma}}{2} \;=\; \frac{1}{2}\begin{pmatrix}1 + r_z & r_x - i r_y \\ r_x + i r_y & 1 - r_z\end{pmatrix},

where \vec{\sigma} = (\sigma_x, \sigma_y, \sigma_z) are the three Pauli matrices (see the next chapter) and \vec{r} = (r_x, r_y, r_z) is a real 3-vector with |\vec{r}| \leq 1. This is called the Bloch vector of the state.

The magic: |\vec{r}| = 1 iff \rho is pure (the state sits on the surface of the Bloch sphere), and |\vec{r}| < 1 iff \rho is mixed (the state sits strictly inside the Bloch sphere, in the Bloch ball). The Bloch sphere you already know — where pure states live — is just the surface of the full three-dimensional ball of density matrices.

In this parameterisation: the maximally mixed state I/2 corresponds to \vec{r} = 0, the centre of the ball. The pure states |0\rangle, |1\rangle, |+\rangle, |-\rangle, |+i\rangle, |-i\rangle correspond to the six unit vectors +\hat z, -\hat z, +\hat x, -\hat x, +\hat y, -\hat y on the surface, in that order. The mixture from Example 2, \rho_{\text{mix}} = \begin{pmatrix}3/4 & 1/4 \\ 1/4 & 1/4\end{pmatrix}, has \vec{r} = (1/2, 0, 1/2) and |\vec{r}| = 1/\sqrt{2}, so it sits strictly inside the ball.

The purity has a beautiful Bloch-vector form:

\text{tr}(\rho^2) \;=\; \frac{1 + |\vec{r}|^2}{2}.

Pure states (|\vec{r}| = 1) give purity 1, the maximally mixed state (|\vec{r}| = 0) gives purity 1/2, and every other state interpolates smoothly in between.

This is Part 13's workhorse picture. The full ball — not just the sphere — is the state space of one qubit, once you admit mixed states.
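A short NumPy sketch of the Bloch-vector picture follows. It uses the standard relation r_k = \text{tr}(\rho\,\sigma_k) to read the vector off a density matrix; the function names `bloch_vector` and `from_bloch` are invented for this illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(rho):
    """r_k = tr(rho sigma_k) for k = x, y, z."""
    return np.array([np.real(np.trace(rho @ P)) for P in (X, Y, Z)])

def from_bloch(r):
    """rho = (I + r . sigma) / 2."""
    return 0.5 * (I2 + r[0] * X + r[1] * Y + r[2] * Z)

rho_plus_i = 0.5 * np.array([[1, -1j], [1j, 1]])        # Example 1
rho_mix = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)  # Example 2

for name, rho in [("|+i>", rho_plus_i), ("Example 2 mixture", rho_mix)]:
    r = bloch_vector(rho)
    purity = float(np.real(np.trace(rho @ rho)))
    # Compare tr(rho^2) with (1 + |r|^2) / 2.
    print(name, r, purity, (1 + r @ r) / 2)
```

For |+i\rangle the vector is approximately (0, 1, 0), on the surface; for the Example 2 mixture it is approximately (0.5, 0, 0.5), inside the ball.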

Non-uniqueness of the ensemble

One of the deepest facts about mixed states: the preparation ensemble is not unique. The same density matrix can arise from many different classical distributions.

Example: \rho = I/2 can be prepared as

\tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| (flip a fair coin, prepare |0\rangle or |1\rangle), or
\tfrac{1}{2}|+\rangle\langle+| + \tfrac{1}{2}|-\rangle\langle-| (flip a coin, prepare |+\rangle or |-\rangle), or
\tfrac{1}{4}|0\rangle\langle 0| + \tfrac{1}{4}|1\rangle\langle 1| + \tfrac{1}{4}|+\rangle\langle+| + \tfrac{1}{4}|-\rangle\langle-|, or infinitely many others.

All three preparations produce the exact same density matrix I/2, which means all three are physically indistinguishable by any measurement on the single qubit. The classical distribution behind the mixture is not an observable — only the matrix \rho is.

This non-uniqueness is the mathematical content of the statement that mixed states are equivalence classes of preparation procedures. The density matrix erases the distinction between "classical randomness this way" and "classical randomness that way," as long as both give the same measurement statistics.
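The three decompositions of I/2 listed above can be compared directly. This is a minimal NumPy sketch; `ensemble_to_rho` is a hypothetical helper that sums p_i |\psi_i\rangle\langle\psi_i| over a list of pairs.

```python
import numpy as np

s = 1 / np.sqrt(2)
ket0, ket1 = np.array([1, 0]), np.array([0, 1])
plus, minus = s * np.array([1, 1]), s * np.array([1, -1])

def ensemble_to_rho(ensemble):
    """Sum p_i |psi_i><psi_i| over a list of (probability, ket) pairs."""
    return sum(p * np.outer(ket, np.conj(ket)) for p, ket in ensemble)

ensembles = {
    "z coin": [(0.5, ket0), (0.5, ket1)],
    "x coin": [(0.5, plus), (0.5, minus)],
    "four states": [(0.25, ket0), (0.25, ket1), (0.25, plus), (0.25, minus)],
}

for name, ensemble in ensembles.items():
    rho = ensemble_to_rho(ensemble)
    print(name, np.allclose(rho, np.eye(2) / 2))
```

All three lines should print True: different preparation recipes, one density matrix.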

Kraus operators — a preview

Quantum noise is described by the Kraus operator-sum formalism: a noisy process acting on a state \rho transforms it as

\rho \;\mapsto\; \sum_k K_k\, \rho\, K_k^\dagger,

where the operators \{K_k\} satisfy \sum_k K_k^\dagger K_k = I. Examples you will meet: the bit-flip channel K_0 = \sqrt{1-p}\,I, K_1 = \sqrt{p}\,X (with probability p, flip the qubit). The amplitude-damping channel models energy loss in a qubit falling from |1\rangle to |0\rangle. The depolarising channel replaces the state by I/2 with probability p, \rho \mapsto (1-p)\rho + p\,I/2, so it pulls every state toward the centre of the Bloch ball.

None of these maps takes kets to kets. They take density matrices to density matrices. A ket goes in; a density matrix, possibly mixed, comes out — because noise is a channel into which information is lost. This is the formal language of real hardware.
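As a concrete instance of the operator-sum form, here is a sketch of the bit-flip channel in NumPy. The flip probability p = 0.25 is a made-up value, and the function names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def bit_flip_kraus(p):
    """Kraus operators K0 = sqrt(1-p) I, K1 = sqrt(p) X."""
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * X]

def apply_channel(rho, kraus_ops):
    """rho -> sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

p = 0.25  # hypothetical flip probability
kraus = bit_flip_kraus(p)

# Completeness relation: sum_k K_k^dagger K_k should equal the identity.
completeness = sum(K.conj().T @ K for K in kraus)

rho_in = np.array([[1, 0], [0, 0]], dtype=complex)  # pure |0><0|
rho_out = apply_channel(rho_in, kraus)
purity_out = float(np.real(np.trace(rho_out @ rho_out)))

print(np.allclose(completeness, I2))
print(rho_out.real)  # close to diag(0.75, 0.25)
print(purity_out)    # below 1: a pure state went in, a mixed state came out
```

The input has purity 1; the output has purity 0.625 while the trace stays at 1, which is the "ket goes in, density matrix comes out" behaviour described above.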

The classical analogue

Before quantum mechanics, statistical mechanics had already worked out the mathematics of ensembles. The 1920 Saha ionisation equation, written by Meghnad Saha in Calcutta, computes the thermal equilibrium of ionised atoms in stellar atmospheres as a classical probability distribution over energy states: "with probability p_i \propto e^{-E_i/kT}, the atom is in state i." Swap "state" for "quantum state" and "probability" for "classical weight," and Saha's ensemble is a diagonal density matrix \rho = \sum_i p_i |i\rangle\langle i| — a density matrix with zero off-diagonal entries, which is the purely-classical corner of the quantum state space.

Quantum density matrices generalise classical ensembles by allowing off-diagonal entries — coherences — that classical distributions cannot have. A diagonal density matrix is a classical mixture; a non-diagonal one is quantum. The density-matrix formalism thus unifies classical statistical mechanics and quantum mechanics in one formal object. This is why Kolmogorov's classical probability theory is a "commutative" special case of the non-commutative probability theory induced by quantum observables.

Indian NMR quantum computing experiments at TIFR and IISc in the early 2000s [4] used this picture directly: the magnetic moments of the 10^{17} nuclear spins in a sample behave as a classical Boltzmann ensemble, but the quantum algorithms run on their deviation density matrix — the off-diagonal part that carries genuine quantum coherence. Reading out the result meant doing quantum state tomography: measuring enough expectation values to reconstruct \rho.

This is where density matrices stop being a formal preview and become the daily working object of experimental quantum computing.

Where this leads next

Pauli X, Y, Z — the three Pauli matrices, which form the basis for the Bloch-vector representation of a qubit density matrix.
Density matrices — the full introduction (Part 13) — the rigorous definition, the three axioms, and the full set of measurement rules.
Decoherence — an introduction — how the off-diagonal entries of \rho shrink as a qubit interacts with its environment.
Quantum channels — the Kraus operator formalism; noisy processes as maps on density matrices.
The Bloch sphere — the surface where pure states live; the Bloch ball is the full state space of mixed states.
The partial trace — where density matrices first became unavoidable.

References

[1] Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §2.4 (the density operator). Cambridge University Press.
[2] John Preskill, Lecture Notes on Quantum Computation, Ch. 2 (foundations of quantum theory). theory.caltech.edu/~preskill/ph229.
[3] Wikipedia, Density matrix: definition, properties, and the Bloch-vector picture.
[4] Wikipedia, Nuclear magnetic resonance quantum computer: the ensemble-density-matrix interpretation used in early experimental QC at Indian institutions.
[5] John Watrous, The Theory of Quantum Information (2018), Ch. 2. cs.uwaterloo.ca/~watrous/TQI.
[6] Qiskit Textbook, Density matrices and mixed states: worked computations and code.