Purification — padho-wiki

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

Every mixed state \rho on a Hilbert space \mathcal H is secretly the marginal of some pure state |\psi\rangle on a larger space \mathcal H \otimes \mathcal H' — that is, \rho = \text{tr}_{\mathcal H'}(|\psi\rangle\langle\psi|). The larger space \mathcal H' (the "purifying ancilla") can always be chosen with \dim\mathcal H' \geq \text{rank}(\rho). The explicit construction uses the spectral decomposition \rho = \sum_i p_i |u_i\rangle\langle u_i| and builds |\psi\rangle = \sum_i \sqrt{p_i}\,|u_i\rangle \otimes |u_i\rangle — a pure state whose reduction on the first factor is \rho. Purifications are not unique: |\psi\rangle and (I \otimes U)|\psi\rangle have the same reduction for any unitary U on \mathcal H'. Uhlmann's theorem sharpens this: the fidelity between two density operators equals the maximum overlap of their purifications on a shared extended space, F(\rho, \sigma) = \max |\langle\psi_\rho|\psi_\sigma\rangle|. Philosophically: "The Church of the Larger Hilbert Space" — every mixed state becomes pure once you include its purifying ancilla. This reframing underpins security proofs in quantum cryptography, channel-capacity theorems in quantum information, and the Stinespring dilation theorem for quantum channels.

In chapter 104 you met mixed states in three guises. The classical mixture: you prepared |0\rangle with probability 1/2 and |1\rangle with probability 1/2, and forgot which. The partial measurement: you measured one half of a joint state and threw away the classical outcome. The entangled subsystem: you traced out the other half of a Bell pair and were left with I/2. Three scenarios, three different origins, one common mathematical object — the density matrix \rho = I/2 in each case.

The purification theorem says those three origins are secretly one. Every mixed state — no matter how it was produced — can be viewed as the reduction of some pure state on a larger Hilbert space. The classical uncertainty that looked like an intrinsic feature of the preparation can be rewritten as entanglement with a fictional ancilla you chose to ignore. The mixed state is not the fundamental object; it is the shadow on \mathcal H of a pure state on \mathcal H \otimes \mathcal H'.

This is more than a mathematical curiosity. It is one of the most widely used proof techniques in quantum information theory: the trick that unifies quantum cryptography, channel-capacity theorems, open-system dynamics, and the Stinespring dilation of quantum channels. If you learn one "conversion move" in this subject, let it be this one: whenever you have a mixed state, replace it with its purification and work on the larger pure-state space. Many of the cleanest proofs in the field start this way.

The theorem

Purification theorem

Let \rho be a density operator on a Hilbert space \mathcal H of dimension d. Then there exists an ancilla Hilbert space \mathcal H' (with \dim\mathcal H' \geq \text{rank}(\rho)) and a pure state |\psi\rangle_{\mathcal H \otimes \mathcal H'} such that

\rho \;=\; \text{tr}_{\mathcal H'}\bigl(|\psi\rangle\langle\psi|\bigr).

Any such |\psi\rangle is called a purification of \rho. Purifications are not unique: if |\psi\rangle is one, so is (I_{\mathcal H} \otimes U)|\psi\rangle for any unitary U on \mathcal H', and every purification of \rho on the same \mathcal H \otimes \mathcal H' arises this way.

Minimum ancilla size: \dim\mathcal H' \geq \text{rank}(\rho) is the tight bound. If \rho is rank r you need at least an r-dimensional ancilla; the ancilla can be larger (nothing stops you from adding unused dimensions), but anything smaller cannot purify \rho because the rank of a reduction equals the Schmidt rank of its purification.

The purification picture: a mixed state $\rho$ on $\mathcal H$ is the reduction of some pure state $|\psi\rangle$ on a larger $\mathcal H \otimes \mathcal H'$. The larger state is always available, whether or not the ancilla $\mathcal H'$ is physically real.

The explicit construction

You want a pure state |\psi\rangle \in \mathcal H \otimes \mathcal H' whose reduction on \mathcal H is a given \rho. Here is the construction — short, mechanical, and universal.

Step 1: spectral-decompose \rho

Every density operator admits a spectral decomposition:

\rho \;=\; \sum_{i=1}^{r} p_i\,|u_i\rangle\langle u_i|,

where \{|u_i\rangle\} is an orthonormal set of eigenvectors in \mathcal H, the p_i > 0 are the non-zero eigenvalues, \sum_i p_i = 1, and r = \text{rank}(\rho). Why the rank is the number of strictly positive eigenvalues: zero eigenvalues correspond to directions \rho doesn't touch; they can be ignored in the sum without losing any information about \rho.

Step 2: pick an ancilla and an orthonormal basis

Take \mathcal H' to be any Hilbert space of dimension \geq r. The simplest choice: \mathcal H' = \mathcal H itself (then \dim\mathcal H' = d \geq r automatically). Fix any orthonormal basis \{|e_i\rangle\}_{i=1}^r of an r-dimensional subspace of \mathcal H'.

Step 3: write the purification

Define

|\psi\rangle \;=\; \sum_{i=1}^r \sqrt{p_i}\,|u_i\rangle_{\mathcal H} \otimes |e_i\rangle_{\mathcal H'}.

This is a unit-norm vector in \mathcal H \otimes \mathcal H'. Check normalisation:

\langle\psi|\psi\rangle \;=\; \sum_{i,j} \sqrt{p_i p_j}\,\langle u_j|u_i\rangle\langle e_j|e_i\rangle \;=\; \sum_{i,j}\sqrt{p_i p_j}\,\delta_{ij}\delta_{ij} \;=\; \sum_i p_i \;=\; 1. \checkmark

Why both inner products collapse to Kronecker deltas: \{|u_i\rangle\} and \{|e_i\rangle\} are orthonormal sets, so \langle u_j|u_i\rangle = \delta_{ij} and \langle e_j|e_i\rangle = \delta_{ij}. The double delta forces i = j, leaving \sum_i p_i, which is 1 because \rho has unit trace.

Step 4: verify that \text{tr}_{\mathcal H'}(|\psi\rangle\langle\psi|) = \rho

Expand the outer product:

|\psi\rangle\langle\psi| \;=\; \sum_{i,j} \sqrt{p_i p_j}\,|u_i\rangle\langle u_j|_{\mathcal H} \otimes |e_i\rangle\langle e_j|_{\mathcal H'}.

Apply the partial-trace rule for product operators, \text{tr}_{\mathcal H'}(X \otimes Y) = X \cdot \text{tr}(Y), to each term of the double sum:

\text{tr}_{\mathcal H'}(|\psi\rangle\langle\psi|) \;=\; \sum_{i,j} \sqrt{p_i p_j}\,|u_i\rangle\langle u_j|\cdot\text{tr}(|e_i\rangle\langle e_j|) \;=\; \sum_{i,j}\sqrt{p_i p_j}\,|u_i\rangle\langle u_j|\cdot\delta_{ij}

=\; \sum_i p_i\,|u_i\rangle\langle u_i| \;=\; \rho. \checkmark

Why the trace of the ancilla factor collapses to \delta_{ij}: \text{tr}(|e_i\rangle\langle e_j|) = \langle e_j|e_i\rangle = \delta_{ij} — the Kronecker delta kills every off-diagonal term in the double sum, leaving only the diagonal contributions.

The reduction is exactly \rho. You have purified it.

The three-step recipe: spectral-decompose $\rho$, pick an ancilla basis, then form the purification as the amplitude-matched superposition $\sum_i \sqrt{p_i}|u_i\rangle|e_i\rangle$. Its reduction on $\mathcal H$ is the original $\rho$.
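
The recipe can be checked numerically. What follows is a minimal NumPy sketch rather than a reference implementation: the function names `purify` and `reduce_to_system`, the tolerance `tol`, and the example matrix are illustrative choices that do not come from the text, and the ancilla basis |e_i\rangle is assumed to be the standard basis of \mathbb C^r.

```python
import numpy as np

def purify(rho, tol=1e-12):
    """Return (psi, r): a minimum-ancilla purification of rho and its ancilla dimension."""
    p, u = np.linalg.eigh(rho)            # Step 1: eigenvalues and eigenvectors (columns)
    keep = p > tol                        # drop zero eigenvalues
    p, u = p[keep], u[:, keep]
    r = p.size                            # rank of rho = minimum ancilla dimension
    e = np.eye(r)                         # Step 2: ancilla basis |e_i> (assumed standard)
    # Step 3: |psi> = sum_i sqrt(p_i) |u_i> (x) |e_i>
    psi = sum(np.sqrt(p[i]) * np.kron(u[:, i], e[i]) for i in range(r))
    return psi, r

def reduce_to_system(psi, d_sys):
    """Partial trace of |psi><psi| over the ancilla (second tensor factor)."""
    m = psi.reshape(d_sys, -1)            # m[i, j] = amplitude of |i>|j>
    return m @ m.conj().T

# Illustrative density matrix with eigenvalues 0.7 and 0.3.
rho = np.array([[0.5, 0.2], [0.2, 0.5]])
psi, r = purify(rho)

print("ancilla dimension:", r)
print("norm squared:", np.vdot(psi, psi).real)
print("reduction matches rho:", np.allclose(reduce_to_system(psi, 2), rho))
```

Reshaping the state vector into a d \times r amplitude matrix M turns the partial trace over the ancilla into the product M M^\dagger, which is why Step 4 becomes a single line.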

Minimum-rank purification

The construction above uses r = \text{rank}(\rho) ancilla dimensions — the minimum. If you add dummy dimensions to \mathcal H' (beyond the ones indexed in the sum), the purification still works, you just have unused orthogonal directions in the ancilla. The minimum bound is tight: if \dim\mathcal H' < r, no |\psi\rangle can purify \rho, because the Schmidt rank of any bipartite pure state is bounded by the smaller subsystem dimension, and the Schmidt rank of |\psi\rangle equals the rank of both reductions.

For the maximally mixed qubit \rho = I/2 on \mathbb C^2, rank is 2, so the minimum ancilla is \mathbb C^2, that is, another qubit. Purification on two qubits: |\psi\rangle = \tfrac{1}{\sqrt 2}(|0\rangle|0\rangle + |1\rangle|1\rangle) = |\Phi^+\rangle. The Bell state is a minimum-ancilla purification of the maximally mixed qubit (unique only up to a unitary on the ancilla). Worked in Example 1 below.
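
The tightness argument can be illustrated numerically. The sketch below assumes a hypothetical setup that is not in the text: a random pure state on \mathbb C^3 \otimes \mathbb C^2, so the ancilla is deliberately smaller than the system. The Schmidt coefficients are read off as the singular values of the amplitude matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d_sys, d_anc = 3, 2                          # ancilla smaller than the system

# Random pure state on C^3 (x) C^2, stored as a 3 x 2 amplitude matrix.
m = rng.normal(size=(d_sys, d_anc)) + 1j * rng.normal(size=(d_sys, d_anc))
m /= np.linalg.norm(m)                       # Frobenius norm = norm of the state vector

# Schmidt coefficients are the singular values; there are at most min(3, 2) = 2.
schmidt = np.linalg.svd(m, compute_uv=False)
schmidt_rank = int(np.sum(schmidt > 1e-10))

rho_sys = m @ m.conj().T                     # reduction on the 3-dimensional system
rank_rho = np.linalg.matrix_rank(rho_sys, tol=1e-10)

print("Schmidt rank:", schmidt_rank)
print("rank of system reduction:", rank_rho)
```

Whatever state is drawn, the reduction on the qutrit has rank at most 2, so no state on this space can purify a full-rank qutrit density matrix.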

Non-uniqueness

Purifications are not unique. Two sources of non-uniqueness:

Ancilla unitaries

If |\psi\rangle \in \mathcal H \otimes \mathcal H' purifies \rho, so does (I_{\mathcal H} \otimes U)|\psi\rangle for every unitary U on \mathcal H'. Check:

\text{tr}_{\mathcal H'}\bigl((I \otimes U)|\psi\rangle\langle\psi|(I \otimes U^\dagger)\bigr) \;=\; \text{tr}_{\mathcal H'}\bigl(|\psi\rangle\langle\psi|\bigr) \;=\; \rho,

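because the partial trace over \mathcal H' is cyclic with respect to operators that act only on \mathcal H': \text{tr}_{\mathcal H'}\bigl((I \otimes U)X(I \otimes U^\dagger)\bigr) = \text{tr}_{\mathcal H'}\bigl(X(I \otimes U^\dagger U)\bigr) = \text{tr}_{\mathcal H'}(X) for any operator X on \mathcal H \otimes \mathcal H'. Why ancilla unitaries don't change the reduction on \mathcal H: from the no-signalling corollary of the partial trace, anything the holder of the ancilla (Bob) does to it, including applying a unitary, leaves the system holder's (Alice's) reduction unchanged. Ancilla unitaries are "inside" the tracing, and they evaporate.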
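
A quick numerical check of this invariance, as a sketch under stated assumptions: the starting state \sqrt{2/3}|00\rangle + \sqrt{1/3}|11\rangle and the random unitary (the Q factor of a random complex matrix) are illustrative choices, and `reduce_to_system` is a helper name introduced here.

```python
import numpy as np

def reduce_to_system(psi, d_sys):
    """Partial trace of |psi><psi| over the ancilla (second tensor factor)."""
    m = psi.reshape(d_sys, -1)
    return m @ m.conj().T

# Illustrative purification: sqrt(2/3)|00> + sqrt(1/3)|11>.
psi = np.zeros(4, dtype=complex)
psi[0], psi[3] = np.sqrt(2 / 3), np.sqrt(1 / 3)

# A random ancilla unitary: the Q factor of a random complex matrix is unitary.
rng = np.random.default_rng(1)
z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
U, _ = np.linalg.qr(z)

psi_rot = np.kron(np.eye(2), U) @ psi        # (I (x) U)|psi>

rho_before = reduce_to_system(psi, 2)
rho_after = reduce_to_system(psi_rot, 2)
print("reduction unchanged:", np.allclose(rho_before, rho_after))
```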

Ancilla size

You can always enlarge \mathcal H' by tensoring on extra dimensions. If |\psi\rangle \in \mathcal H \otimes \mathcal H' purifies \rho, then |\psi\rangle \otimes |0\rangle_{\mathcal H''} on \mathcal H \otimes \mathcal H' \otimes \mathcal H'' also purifies \rho. Different Hilbert space, same marginal on \mathcal H.

The uniqueness statement

Given a fixed \mathcal H', all purifications on \mathcal H \otimes \mathcal H' are related by ancilla unitaries. More formally: if |\psi_1\rangle, |\psi_2\rangle \in \mathcal H \otimes \mathcal H' both purify the same \rho, there is a unitary U on \mathcal H' with |\psi_2\rangle = (I_{\mathcal H} \otimes U)|\psi_1\rangle. This is the most useful uniqueness result: in practice, you pick any convenient purification and know that every other purification (on the same ancilla) differs by a rotation of the ancilla alone.

Non-uniqueness: infinitely many pure states $|\psi_i\rangle$ on $\mathcal H \otimes \mathcal H'$ all reduce to the same $\rho$ on $\mathcal H$. They are all related by ancilla unitaries $(I \otimes U)$ acting on the second factor. The reduction is invariant.

Uhlmann's theorem — purifications and fidelity

One of the deepest consequences of the purification construction is Uhlmann's theorem, which connects the operator-level distance between two mixed states to a pure-state overlap on a shared extended space.

Uhlmann's theorem

For any two density operators \rho, \sigma on \mathcal H, their fidelity is

F(\rho, \sigma) \;=\; \text{tr}\sqrt{\sqrt{\rho}\,\sigma\,\sqrt{\rho}}.

Uhlmann's theorem states:

F(\rho, \sigma) \;=\; \max_{|\psi_\rho\rangle, |\psi_\sigma\rangle}\,|\langle\psi_\rho|\psi_\sigma\rangle|,

where the maximum is over all purifications |\psi_\rho\rangle, |\psi_\sigma\rangle on any shared extended Hilbert space \mathcal H \otimes \mathcal H'. The theorem reduces to F(\rho, \sigma)^2 = |\langle\psi|\phi\rangle|^2 for pure states, the familiar overlap.

Why Uhlmann's theorem is beautiful

The fidelity F(\rho, \sigma) is an operator-level quantity — it requires taking a matrix square root, multiplying, tracing, and taking another square root. Messy. Uhlmann's theorem rewrites it as a geometric quantity: the maximum inner product of two pure states on a larger space. Pure-state inner products are trivial to compute (just \langle\psi|\phi\rangle, a complex number); the mixed-state fidelity inherits all of pure-state geometry once you purify.

This is the power move: whenever you need to bound or compute a distance between mixed states, purify and work with pure-state overlaps. Security proofs in quantum cryptography (QKD protocols such as BB84), channel-capacity theorems in quantum information theory (coding theorems, converse bounds), and the triangle inequality for the fidelity-based angle \arccos F(\rho, \sigma) all use this pattern. (The triangle inequality for trace distance itself follows directly from the norm and needs no purification.)

The "same extended space" clause

Uhlmann's theorem requires both purifications to live in the same \mathcal H \otimes \mathcal H', with the ancilla dimension large enough for both (so \dim\mathcal H' \geq \max(\text{rank}\rho, \text{rank}\sigma)). Then the maximum overlap — achieved by choosing the right ancilla unitary — is exactly the fidelity. If you only allow one fixed pair of purifications, the overlap can be smaller; the theorem says you get the actual fidelity by optimising.
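
The optimisation over ancilla unitaries can be carried out explicitly for small examples. The sketch below makes several assumptions that are not in the text: the two qubit states are arbitrary illustrative matrices, purifications are stored as amplitude matrices \sqrt{\rho}\,W with the system as the row index, and the optimal unitary is read off from a singular value decomposition of \sqrt{\rho}\sqrt{\sigma}.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, sigma):
    """Direct formula F = tr sqrt( sqrt(rho) sigma sqrt(rho) )."""
    s = psd_sqrt(rho)
    return float(np.trace(psd_sqrt(s @ sigma @ s)).real)

def purification(rho, W=None):
    """Amplitude matrix sqrt(rho) @ W; its reduction on the system is rho for unitary W."""
    d = rho.shape[0]
    return psd_sqrt(rho) @ (np.eye(d) if W is None else W)

# Illustrative qubit states (unit trace, positive definite).
rho = np.array([[0.7, 0.1], [0.1, 0.3]])
sigma = np.array([[0.4, -0.2], [-0.2, 0.6]])

# SVD: sqrt(rho) sqrt(sigma) = L diag(S) Rh.
L, S, Rh = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma))
W_opt = Rh.conj().T @ L.conj().T             # ancilla rotation aligning the purifications

m_rho = purification(rho)
m_sigma_naive = purification(sigma)           # one fixed pair, no optimisation
m_sigma_best = purification(sigma, W_opt)     # optimised over the ancilla unitary

# np.vdot flattens both matrices and conjugates the first: <psi_rho|psi_sigma>.
overlap_naive = abs(np.vdot(m_rho, m_sigma_naive))
overlap_best = abs(np.vdot(m_rho, m_sigma_best))
F = fidelity(rho, sigma)

print("fixed-pair overlap:", overlap_naive)
print("optimised overlap: ", overlap_best)
print("direct fidelity:   ", F)
```

The fixed pair gives an overlap no larger than the fidelity, and the optimised pair reaches it, which is the content of the "same extended space" clause.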

Uhlmann's formula in practice

For computing fidelity, Uhlmann's characterisation is rarely faster than the direct formula. But for proving things about fidelity — symmetry F(\rho, \sigma) = F(\sigma, \rho), multiplicativity under tensor products, the data-processing inequality F(\mathcal E(\rho), \mathcal E(\sigma)) \geq F(\rho, \sigma) — purification-based arguments are almost always cleaner than the operator-theoretic ones. The Fuchs-van de Graaf inequalities 1 - F \leq T \leq \sqrt{1 - F^2} (relating fidelity and trace distance) are easiest to prove via purification as well.
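
The Fuchs-van de Graaf inequalities can be spot-checked on random states. This is a numerical sanity check, not a proof; it assumes the convention F = \text{tr}\sqrt{\sqrt{\rho}\,\sigma\,\sqrt{\rho}} used in this article (not its square), and the random-state generator, seed, and sample count are arbitrary choices made for the sketch.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, sigma):
    s = psd_sqrt(rho)
    return float(np.trace(psd_sqrt(s @ sigma @ s)).real)

def trace_distance(rho, sigma):
    """T = (1/2) * sum of absolute eigenvalues of rho - sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

def random_density_matrix(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T                     # positive semidefinite by construction
    return rho / np.trace(rho).real

rng = np.random.default_rng(7)
checks = []
for _ in range(200):
    rho, sigma = random_density_matrix(3, rng), random_density_matrix(3, rng)
    F, T = fidelity(rho, sigma), trace_distance(rho, sigma)
    lower_ok = 1 - F <= T + 1e-9
    upper_ok = T <= np.sqrt(max(0.0, 1 - F**2)) + 1e-9
    checks.append(lower_ok and upper_ok)

all_hold = all(checks)
print("1 - F <= T <= sqrt(1 - F^2) on all samples:", all_hold)
```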

The Church of the Larger Hilbert Space

"The Church of the Larger Hilbert Space" (the phrase is John Smolin's, circulated among quantum information theorists in the 1990s) is the working philosophy that follows from the purification theorem. Its central claim:

Every mixed state is secretly a pure state. Every noisy channel is secretly a unitary. Every open-system dynamics is secretly closed-system dynamics on a larger space. Whenever you see mixture, noise, or non-unitarity, add the ancilla that explains it, and proceed as if everything were clean.

This is not a statement about nature — it is a statement about mathematics. The "ancilla" may or may not be a real physical system. For a classically mixed qubit (the coin-flip preparation of chapter 104), the ancilla is fictional: the purification puts the qubit in an entangled state with a ghost "coin qubit" that doesn't exist in the lab. For a decohering qubit coupled to a bath of photons, the ancilla is physical: it's the photon modes of the electromagnetic field. The theorem treats both cases identically, and that is its power — one mathematical picture serves every physical origin.

Three proof-technique wins from purification

Quantum channels as unitaries. The Stinespring dilation theorem says every quantum channel \mathcal E (CPTP map) can be written as \mathcal E(\rho) = \text{tr}_E(U(\rho \otimes |0\rangle\langle 0|_E)U^\dagger) for some ancilla E and unitary U on the combined space. This is the channel-level version of the Church of the Larger Hilbert Space: every noisy operation is a unitary operation followed by partial-tracing out the environment. Once you believe this, you can analyse noisy channels using only unitary mathematics — the partial trace does all the "noise" work at the end.
Security proofs in QKD. In a quantum key distribution protocol, the eavesdropper Eve's most general strategy is modelled as follows: Eve holds a purification of everything Alice and Bob don't control. By making Eve's state the purifying ancilla, you can bound her information without having to enumerate every possible eavesdropping attack — a single purification captures all of them. The security proof of BB84 (Bennett-Brassard 1984) is much cleaner in the purification picture than in the original measurement-by-measurement analysis.
Channel capacity theorems. The coherent information — the main quantity in quantum channel coding — is most naturally defined via purifications: I_c(\rho, \mathcal E) = S(\mathcal E(\rho)) - S(\mathcal E_c(\rho)), where \mathcal E_c is the channel's complementary channel, obtainable from the Stinespring dilation by tracing out the system and keeping the environment. The purification picture makes the complementary channel natural and the proofs of capacity theorems tractable.

Worked examples

Example 1: Purifying the maximally mixed qubit

Purify the maximally mixed qubit state \rho = I/2 on \mathcal H = \mathbb C^2. Show that the Bell state |\Phi^+\rangle = \tfrac{1}{\sqrt 2}(|00\rangle + |11\rangle) is a purification, and verify by computing the partial trace.

Step 1. Spectral-decompose \rho. In the computational basis:

\rho \;=\; \tfrac{1}{2} I \;=\; \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1|.

Eigenvalues p_0 = p_1 = 1/2, eigenvectors |u_0\rangle = |0\rangle and |u_1\rangle = |1\rangle. Rank is 2. Why the eigenbasis is the computational basis: I/2 is proportional to the identity, and the identity has every orthonormal basis as an eigenbasis. The choice is arbitrary; the computational basis is convenient.

Step 2. Choose an ancilla of dimension \geq 2. The minimum choice: \mathcal H' = \mathbb C^2 — another qubit. Use the same computational basis: |e_0\rangle = |0\rangle, |e_1\rangle = |1\rangle.

Step 3. Build the purification by the formula:

|\psi\rangle \;=\; \sqrt{p_0}\,|u_0\rangle|e_0\rangle + \sqrt{p_1}\,|u_1\rangle|e_1\rangle \;=\; \tfrac{1}{\sqrt 2}|0\rangle|0\rangle + \tfrac{1}{\sqrt 2}|1\rangle|1\rangle \;=\; \frac{|00\rangle + |11\rangle}{\sqrt 2} \;=\; |\Phi^+\rangle.

The Bell state |\Phi^+\rangle pops out — no coincidence, because |\Phi^+\rangle is the Schmidt-decomposition purification of the maximally mixed qubit, and the construction is exactly Schmidt.

Step 4. Verify \text{tr}_{\mathcal H'}(|\Phi^+\rangle\langle\Phi^+|) = I/2. This was Example 1 of the partial-trace-revisited chapter: tracing out the second qubit of |\Phi^+\rangle gives I/2. The check passes.

Step 5. Not unique — another purification. Consider |\psi'\rangle = (I \otimes X)|\Phi^+\rangle = \tfrac{1}{\sqrt 2}(|01\rangle + |10\rangle) = |\Psi^+\rangle — another Bell state. Its reduction is still I/2 (the second Bell state has the same A-marginal as the first). More generally, (I \otimes U)|\Phi^+\rangle for any single-qubit unitary U is also a purification of I/2, and there are infinitely many distinct such purifications. Why all four Bell states have the same A-reduction: they are pairwise related by ancilla unitaries (X, Z, XZ applied to the second qubit), so by the non-uniqueness theorem, they must share a reduction. Every maximally entangled two-qubit pure state purifies I/2.

Result. The Bell state |\Phi^+\rangle = \tfrac{1}{\sqrt 2}(|00\rangle + |11\rangle) is a minimum-ancilla purification of the maximally mixed qubit. The other three Bell states, and infinitely many other two-qubit pure states, are equally valid purifications related to it by ancilla unitaries.
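
Steps 4 and 5 can be confirmed numerically. A minimal NumPy sketch follows; the dictionary layout and helper name are illustrative, and the other three Bell states are generated exactly as described above, by applying X, Z, and XZ to the second qubit of |\Phi^+\rangle.

```python
import numpy as np

def reduce_to_system(psi, d_sys=2):
    """Partial trace of |psi><psi| over the second qubit."""
    m = psi.reshape(d_sys, -1)
    return m @ m.conj().T

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

phi_plus = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

# Ancilla unitaries applied to the second qubit of |Phi+>.
bell = {
    "Phi+": phi_plus,
    "Psi+": np.kron(I2, X) @ phi_plus,
    "Phi-": np.kron(I2, Z) @ phi_plus,
    "Psi-": np.kron(I2, X @ Z) @ phi_plus,
}

reductions = {name: reduce_to_system(psi) for name, psi in bell.items()}
for name, rho in reductions.items():
    print(name, "reduces to I/2:", np.allclose(rho, I2 / 2))
```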

The maximally mixed qubit purifies to the Bell state on two qubits. The mixed-state "50-50 coin flip" is re-expressed as the pure-state entanglement of a Bell pair; the ancilla qubit carries the "which branch" information that the original $\rho$ had integrated over.

What this shows. The classical-looking randomness of I/2 is fully captured by entanglement with an invisible partner. If you believe |\Phi^+\rangle is a pure, deterministic, classically-complete state of two qubits (which it is), then the apparent randomness of the single qubit I/2 is not fundamental — it is the view from inside half of a pure whole. The Church of the Larger Hilbert Space, in miniature.

Example 2: Purifying a non-uniform qubit mixture

Purify the qubit mixture \rho = \tfrac{2}{3}|0\rangle\langle 0| + \tfrac{1}{3}|1\rangle\langle 1|. Work out the minimum-ancilla purification and verify by computing the partial trace.

Step 1. Identify the spectral decomposition. The state is already in its eigenbasis — it is diagonal in \{|0\rangle, |1\rangle\} with eigenvalues p_0 = 2/3, p_1 = 1/3. Rank is 2.

Step 2. Pick a 2-dimensional ancilla. Use \mathcal H' = \mathbb C^2 with basis \{|0\rangle, |1\rangle\}.

Step 3. Apply the formula:

|\psi\rangle \;=\; \sqrt{\tfrac{2}{3}}\,|0\rangle_{\mathcal H}|0\rangle_{\mathcal H'} + \sqrt{\tfrac{1}{3}}\,|1\rangle_{\mathcal H}|1\rangle_{\mathcal H'} \;=\; \sqrt{\tfrac{2}{3}}\,|00\rangle + \sqrt{\tfrac{1}{3}}\,|11\rangle.

Why the amplitudes are \sqrt{p_i} and not p_i: the reduction formula is \sum p_i |u_i\rangle\langle u_i|, and taking the outer product |\psi\rangle\langle\psi| of a sum \sum_i \sqrt{p_i}|u_i\rangle|e_i\rangle gives cross-terms \sqrt{p_i p_j}|u_i\rangle\langle u_j| \otimes |e_i\rangle\langle e_j|. Tracing the ancilla factor enforces i = j and collapses \sqrt{p_i p_i} = p_i. The square root in the amplitude is what produces the probability p_i in the density matrix.

Step 4. Verify. Compute |\psi\rangle\langle\psi|:

|\psi\rangle\langle\psi| \;=\; \tfrac{2}{3}|00\rangle\langle 00| + \sqrt{\tfrac{2}{9}}|00\rangle\langle 11| + \sqrt{\tfrac{2}{9}}|11\rangle\langle 00| + \tfrac{1}{3}|11\rangle\langle 11|.

Trace out the ancilla (second qubit). The rule \text{tr}_{\mathcal H'}(|ab\rangle\langle cd|) = \delta_{bd}|a\rangle\langle c| kills the cross-terms (because b = 0, d = 1 or vice versa in those):

\text{tr}_{\mathcal H'}(|\psi\rangle\langle\psi|) \;=\; \tfrac{2}{3}|0\rangle\langle 0| + \tfrac{1}{3}|1\rangle\langle 1| \;=\; \rho. \checkmark

Step 5. Purity check. The original \rho has \text{tr}(\rho^2) = (2/3)^2 + (1/3)^2 = 4/9 + 1/9 = 5/9 \approx 0.556. The purification |\psi\rangle\langle\psi| has \text{tr}((|\psi\rangle\langle\psi|)^2) = 1 (it is pure, as it must be). The step from mixed to pure happened by adding the ancilla that records the "which eigenvector" information.

Result. |\psi\rangle = \sqrt{2/3}|00\rangle + \sqrt{1/3}|11\rangle is a minimum-ancilla purification of \rho = (2/3)|0\rangle\langle 0| + (1/3)|1\rangle\langle 1|. It is not the only purification — any (I \otimes U)|\psi\rangle is another — but it is the simplest.
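
The arithmetic in Steps 4 and 5 can be reproduced numerically. In this sketch the variable names are illustrative, and the partial trace is written as an index contraction that mirrors the rule \text{tr}_{\mathcal H'}(|ab\rangle\langle cd|) = \delta_{bd}|a\rangle\langle c|.

```python
import numpy as np

# |psi> = sqrt(2/3)|00> + sqrt(1/3)|11>, in the basis order |00>, |01>, |10>, |11>.
psi = np.array([np.sqrt(2 / 3), 0.0, 0.0, np.sqrt(1 / 3)])

proj = np.outer(psi, psi.conj())             # |psi><psi|, a 4 x 4 matrix

# Index the projector as [a, b, c, d] = <ab|proj|cd> and contract b with d.
rho = np.einsum("abcb->ac", proj.reshape(2, 2, 2, 2))

cross_term = proj[0, 3]                      # coefficient of |00><11|
purity_rho = np.trace(rho @ rho).real        # purity of the mixed state
purity_psi = np.trace(proj @ proj).real      # purity of the purification

print("reduction:\n", rho)
print("cross-term coefficient:", cross_term)
print("tr(rho^2):", purity_rho, " tr((|psi><psi|)^2):", purity_psi)
```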

A non-uniform qubit mixture — probability $2/3$ of $|0\rangle$, $1/3$ of $|1\rangle$ — purifies to an entangled two-qubit pure state with Schmidt coefficients $\sqrt{2/3}$ and $\sqrt{1/3}$. The Schmidt coefficients of the purification are the square roots of the eigenvalues of the original mixed state.

What this shows. For a general mixed state, the purification is an entangled pure state on a two-factor Hilbert space, and the Schmidt coefficients of the purification are the square roots of the eigenvalues of \rho. This is the Schmidt-decomposition-purification duality: every mixed-state eigenvalue profile corresponds to a Schmidt-coefficient profile on the purification. The more non-uniform the mixed state, the more "peaked" the Schmidt spectrum, and the less entangled the purification is — F(\rho, |u_{\max}\rangle\langle u_{\max}|) = \sqrt{p_{\max}}, big when \rho is nearly pure.
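
The link between eigenvalue profile and entanglement can be made quantitative with the entropy of the squared Schmidt coefficients, which equals the von Neumann entropy of \rho. The sketch below is illustrative only: the three eigenvalue profiles and the function names are assumptions made here, and the purification is taken in the diagonal form \sum_i \sqrt{p_i}|i\rangle|i\rangle.

```python
import numpy as np

def schmidt_purification(p):
    """Amplitude matrix of sum_i sqrt(p_i)|i>|i> for an eigenvalue list p."""
    return np.diag(np.sqrt(np.asarray(p, dtype=float)))

def entanglement_entropy(m):
    """Entropy in bits of the squared Schmidt coefficients of amplitude matrix m."""
    s = np.linalg.svd(m, compute_uv=False)
    q = s[s > 1e-12] ** 2
    return float(-np.sum(q * np.log2(q)))

profiles = {
    "uniform": [0.5, 0.5],
    "example 2": [2 / 3, 1 / 3],
    "nearly pure": [0.99, 0.01],
}
entropies = {name: entanglement_entropy(schmidt_purification(p))
             for name, p in profiles.items()}

# Schmidt coefficients of the Example 2 purification (SVD returns them in descending order).
schmidt = np.linalg.svd(schmidt_purification([2 / 3, 1 / 3]), compute_uv=False)

for name, value in entropies.items():
    print(f"{name}: {value:.3f} bits")
```

The uniform profile gives a maximally entangled purification (1 bit), and the entropy falls towards zero as \rho approaches a pure state.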

Common confusions

"The purification is the state." No. A purification is a pure state whose reduction gives \rho. There are infinitely many purifications — related by ancilla unitaries and by ancilla enlargement. The density matrix \rho is the physically observable object; its purifications are mathematical scaffolding that extend it. When physicists say "the purification," they usually mean a specific convenient choice (the Schmidt-decomposition one), but that choice is no more fundamental than any other.
"Purification requires the ancilla to be the same size as the system." Not quite. The ancilla must have dimension \geq \text{rank}(\rho). For a pure \rho (rank 1), the "ancilla" is 1-dimensional — just a scalar — and the "purification" is the pure state itself. For a full-rank mixed \rho on a d-dimensional space, the ancilla must be at least d-dimensional (typically just a copy of \mathcal H). The equality case is the Schmidt-decomposition purification with minimum ancilla.
"The purifying environment is physically real." Purification is a mathematical theorem; the "environment" \mathcal H' need not correspond to any physical system. Sometimes it does — for a decohering qubit coupled to photon modes, the photon field really is the purifying environment. Sometimes it doesn't — for a classically-uncertain preparation, the "ancilla" is a fictional coin-qubit that was never entangled with anything. The theorem treats both cases identically because the mathematics doesn't care about physical realisation.
"If you purify, you haven't really done anything." You have. You have replaced a mixed-state problem with an equivalent pure-state problem on a larger space, and pure-state problems are often much easier. The partial trace at the end recovers the original mixed state, but in between, you can use all the tools that only work on pure states: Schmidt decomposition, inner-product overlaps, unitary dynamics without dissipation, and so on. The Church of the Larger Hilbert Space is a working philosophy precisely because the "same answer in the end" equivalence buys you clean tools in the middle.
"Purification is the inverse of partial trace." Not quite an inverse — partial trace is many-to-one (many purifications give the same \rho), so purification cannot be its inverse in the usual sense. Purification is a right inverse: for any \rho you can pick some purification that traces back to \rho, but the choice is not unique. The correct statement is: partial trace is surjective onto density operators, and purification picks a preimage.
"Uhlmann's theorem works only for minimum-dimension ancillas." No — Uhlmann's theorem works for any shared extended space that is large enough to purify both states. Larger ancillas give the same fidelity; the optimisation over ancilla unitaries does all the work. Larger ancillas just give you more unused dimensions.

Going deeper

You have the theorem, the construction, the non-uniqueness, and the Church of the Larger Hilbert Space. The rest of this section proves the purification theorem in full, discusses Uhlmann's theorem in more detail, surveys the Stinespring dilation (the channel analogue of purification), and connects purification to the entanglement theory and quantum-statistical arguments it unlocks.

Formal proof of the purification theorem

The construction |\psi\rangle = \sum_i \sqrt{p_i}|u_i\rangle|e_i\rangle already proves existence, but the uniqueness-up-to-ancilla-unitary part deserves its own argument. Suppose |\psi_1\rangle, |\psi_2\rangle \in \mathcal H \otimes \mathcal H' both purify the same \rho, and \mathcal H' is large enough. Use the Schmidt decomposition of each: |\psi_k\rangle = \sum_i \lambda_i |u_i\rangle \otimes |v_i^{(k)}\rangle, with the same Schmidt coefficients \lambda_i and system kets |u_i\rangle (because both reduce to \rho, and \rho determines \lambda_i^2 = p_i and |u_i\rangle uniquely, up to degeneracy).

The only remaining freedom is in the ancilla kets |v_i^{(1)}\rangle and |v_i^{(2)}\rangle, both orthonormal in \mathcal H'. Any two orthonormal sets of the same size in \mathcal H' are related by a unitary on \mathcal H': there exists U such that U|v_i^{(1)}\rangle = |v_i^{(2)}\rangle for each i. Then (I \otimes U)|\psi_1\rangle = |\psi_2\rangle. Done.

This proof makes clear why ancilla unitaries are the exact source of non-uniqueness: they are the freedom in choosing which orthonormal basis of \mathcal H' the purification lives in.
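
For small cases the connecting unitary can be computed directly. The following sketch assumes a full-rank \rho and an ancilla of the same dimension as the system, so the amplitude matrices are square and invertible; the function name `ancilla_unitary` and the two example purifications (one built on the ancilla basis \{|0\rangle, |1\rangle\}, the other on \{|+\rangle, |-\rangle\}) are illustrative choices.

```python
import numpy as np

def ancilla_unitary(psi1, psi2, d_sys):
    """Solve psi2 = (I (x) U) psi1 for U. Assumes full-rank reduction and dim H' = dim H."""
    m1 = psi1.reshape(d_sys, -1)
    m2 = psi2.reshape(d_sys, -1)
    # (I (x) U) acts on amplitude matrices as m -> m @ U.T, so U.T = m1^{-1} m2.
    return np.linalg.solve(m1, m2).T

a, b = np.sqrt(2 / 3), np.sqrt(1 / 3)
ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)

# Two purifications of rho = (2/3)|0><0| + (1/3)|1><1| with different ancilla bases.
psi1 = a * np.kron(ket0, ket0) + b * np.kron(ket1, ket1)
psi2 = a * np.kron(ket0, plus) + b * np.kron(ket1, minus)

U = ancilla_unitary(psi1, psi2, d_sys=2)
print("U is unitary:", np.allclose(U.conj().T @ U, np.eye(2)))
print("(I (x) U) psi1 == psi2:", np.allclose(np.kron(np.eye(2), U) @ psi1, psi2))
```

Here U comes out as the Hadamard matrix, the change of basis from \{|0\rangle, |1\rangle\} to \{|+\rangle, |-\rangle\} on the ancilla. For rank-deficient \rho the linear solve no longer applies, and U has to be completed on the unused ancilla directions.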

Uhlmann's theorem — proof sketch

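Given \rho, \sigma on \mathcal H, take \mathcal H' = \mathcal H and construct the canonical purifications |\psi_\rho\rangle = (\sqrt{\rho} \otimes I)|I\rangle\!\rangle and |\psi_\sigma\rangle = (\sqrt{\sigma} \otimes I)|I\rangle\!\rangle, where |I\rangle\!\rangle = \sum_i |i\rangle|i\rangle is the unnormalised maximally entangled state. The square root sits on the system factor: tracing out the ancilla of (\sqrt{\rho} \otimes I)|I\rangle\!\rangle gives \sqrt{\rho}\sqrt{\rho} = \rho, whereas (I \otimes \sqrt{\rho})|I\rangle\!\rangle would reduce to \rho^T on the system. By the uniqueness result above, every purification of \rho on \mathcal H \otimes \mathcal H' is |\psi_\rho^U\rangle = (\sqrt{\rho} \otimes U)|I\rangle\!\rangle for some unitary U on \mathcal H'. Using \langle\!\langle I|(A \otimes B)|I\rangle\!\rangle = \text{tr}(A B^T), the overlap is

\langle\psi_\rho^U|\psi_\sigma^V\rangle \;=\; \text{tr}\bigl(\sqrt{\rho}\sqrt{\sigma}\,W\bigr), \qquad W = (U^\dagger V)^T.

As U and V range over all unitaries, so does W, so maximising |\langle\psi_\rho^U|\psi_\sigma^V\rangle| reduces to maximising |\text{tr}(\sqrt{\rho}\sqrt{\sigma}\,W)| over unitary W. The polar-decomposition argument gives

\max_W |\text{tr}(\sqrt{\rho}\sqrt{\sigma}\,W)| \;=\; \text{tr}|\sqrt{\rho}\sqrt{\sigma}| \;=\; \text{tr}\sqrt{\sqrt{\rho}\sigma\sqrt{\rho}} \;=\; F(\rho, \sigma).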

The proof finishes. Full details can be found in Watrous or Nielsen-Chuang; the key technical tool is the matrix polar decomposition.

Stinespring dilation — the channel analogue

Every quantum channel \mathcal E: \mathcal B(\mathcal H) \to \mathcal B(\mathcal H) that is CPTP can be represented as

\mathcal E(\rho) \;=\; \text{tr}_E\bigl(U(\rho \otimes |0\rangle\langle 0|_E)U^\dagger\bigr),

for some environment space \mathcal H_E and unitary U on \mathcal H \otimes \mathcal H_E. This is Stinespring's dilation theorem (Stinespring 1955, pre-dating quantum information theory by decades). Operationally: every noisy operation is a unitary acting on the system plus an initially-|0\rangle ancilla, followed by discarding the ancilla.

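The connection to purification: if \rho purifies to |\psi\rangle \in \mathcal H \otimes \mathcal H' and \mathcal E dilates to U, then \mathcal E(\rho) = \text{tr}_{\mathcal H' E}\bigl((U \otimes I_{\mathcal H'})(|\psi\rangle\langle\psi| \otimes |0\rangle\langle 0|_E)(U^\dagger \otimes I_{\mathcal H'})\bigr), with the tensor factors ordered so that U acts on the system and the environment while the purifying ancilla \mathcal H' is left alone. Both \mathcal H' and E are traced out at the end, so the two constructions combine cleanly. The Church of the Larger Hilbert Space thus embraces channels as well as states: every mixed-state dynamic is a pure-state dynamic plus partial trace at the end.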
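
A concrete dilation helps make this tangible. The sketch below uses the single-qubit phase-flip channel \mathcal E(\rho) = (1-q)\rho + q\,Z\rho Z as an illustrative example that does not appear in the text; the particular unitary (a rotation of the environment qubit followed by a Z on the system controlled by the environment) is one possible dilation among many, and the function names are chosen here.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
P0 = np.diag([1.0, 0.0])                     # |0><0| on the environment
P1 = np.diag([0.0, 1.0])                     # |1><1| on the environment

def phase_flip_via_dilation(rho, q):
    """tr_E( U (rho (x) |0><0|_E) U^dagger ) for one choice of dilation U."""
    c, s = np.sqrt(1 - q), np.sqrt(q)
    Ry = np.array([[c, -s], [s, c]])         # sends |0>_E to c|0> + s|1>
    # Factor order is system (x) environment; apply Z to the system when E is |1>.
    U = (np.kron(I2, P0) + np.kron(Z, P1)) @ np.kron(I2, Ry)
    joint = U @ np.kron(rho, P0) @ U.conj().T
    # Index as [a, j, b, k] = <a j|joint|b k> and contract the environment indices.
    return np.einsum("ajbj->ab", joint.reshape(2, 2, 2, 2)), U

def phase_flip_via_kraus(rho, q):
    """The same channel written directly as (1 - q) rho + q Z rho Z."""
    return (1 - q) * rho + q * (Z @ rho @ Z)

rho_in = np.array([[0.5, 0.5], [0.5, 0.5]])  # |+><+|
out_dilation, U = phase_flip_via_dilation(rho_in, q=0.25)
out_kraus = phase_flip_via_kraus(rho_in, q=0.25)

print("U is unitary:", np.allclose(U.conj().T @ U, np.eye(4)))
print("dilation matches direct formula:", np.allclose(out_dilation, out_kraus))
```

All of the noise enters through the final partial trace: the joint evolution is unitary, and the off-diagonal terms of the system shrink only once the environment is discarded.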

Purification and entanglement theory

The purification theorem is the foundational ingredient in the definition of entanglement of formation and other entanglement measures. For a bipartite mixed state \rho_{AB}, the entanglement of formation is defined by optimising the average entanglement entropy over all ensemble decompositions of \rho_{AB} — and each ensemble decomposition corresponds to a classical-quantum purification on a larger space. The purification picture makes the optimisation tractable and gives closed-form answers in the few cases where they exist (e.g. Wootters' formula for two-qubit entanglement of formation).

Quantum cryptography security proofs

In the security proof of BB84 (Bennett-Brassard quantum key distribution), the eavesdropper Eve's most general state is modelled as the purification of the state Alice and Bob share after their protocol. This single move replaces "consider every possible measurement Eve could make" with "consider the pure state she shares with Alice and Bob, and bound her information about the key via the von Neumann entropy of her marginal." The purification picture is what makes BB84 provably secure against arbitrary quantum attacks, not just the simple intercept-and-resend attack the original paper considered.

The operational meaning

A different way of stating the theorem: every classical ignorance about a quantum system can be reframed as quantum correlation with an external system. The sharpest version of this is the quantum de Finetti theorem (for permutation-invariant mixed states, the classical ignorance approaches entanglement with a hidden pure-state ensemble in a specific asymptotic sense), but the intuition is present already in the purification theorem: what looks like "I don't know which state I have" is mathematically indistinguishable from "I have half of a pure state and someone else has the other half."

Where this leads next

Partial trace revisited — the operation that closes the loop: partial-trace a purification, recover the mixed state.
Density operator — the mixed-state object being purified.
Stinespring dilation — the channel-level analogue of purification: every noisy dynamic is a unitary on a larger space.
Quantum channels — CPTP maps, where Stinespring dilation is the core structural theorem.
Schmidt decomposition — the decomposition that makes the purification construction mechanical.
Bell states — the minimum-ancilla purification of the maximally mixed qubit, and the canonical entangled pairs.

References

Wikipedia, Purification of quantum state — theorem statement and examples.
Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §2.5 (purifications and the Schmidt decomposition) — Cambridge University Press.
John Preskill, Lecture Notes on Quantum Computation, Ch. 3 (purifications and the Church of the Larger Hilbert Space) — theory.caltech.edu/~preskill/ph229.
John Watrous, The Theory of Quantum Information (2018), §2.2 (purifications and Uhlmann's theorem) — cs.uwaterloo.ca/~watrous/TQI.
Armin Uhlmann, The 'transition probability' in the state space of a *-algebra (1976) — summary at Wikipedia: Uhlmann's theorem.
W. Forrest Stinespring, Positive functions on C*-algebras (1955) — summary at Wikipedia: Stinespring dilation theorem.