The Partial Trace — padho-wiki


In short

The partial trace is the mathematical operation that takes a joint state of two quantum systems and produces the best possible description of just one of them, when you ignore the other. On a product state, the partial trace gives you back the pure state of the subsystem you kept — nothing is lost. On an entangled state, something strange happens: the partial trace of the Bell state |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) over qubit B is the maximally mixed state I/2 — a classical-looking 50-50 coin for qubit A. Entanglement shows up as "there is no pure state for the part when you look at it alone." This is the mathematical origin of the informal slogan that entanglement means the whole contains more than the sum of its parts.

Here is the setup that this chapter exists to answer.

Alice and Bob share a pair of qubits. Alice holds qubit A; Bob holds qubit B, maybe in a lab 100 kilometres away. The two qubits are in some joint state |\psi\rangle_{AB} — you know what the joint state is, you wrote it down in the last chapter. Now Alice wants to know: what is the state of my qubit? Not "what is the state of the whole system" — just her part. She is going to run some experiments on her single qubit, measure it, apply gates to it, and she needs to know the best mathematical description of it.

If the joint state happens to be a product, |\psi\rangle_{AB} = |a\rangle_A \otimes |b\rangle_B, the answer is easy: Alice's qubit is in state |a\rangle. Bob's half is irrelevant; the two halves are mathematically separate.

But what if the joint state is entangled? What if it is the Bell state |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle)? The joint state has no tensor-product decomposition — that was the whole point of calling it entangled. So there is no single-qubit ket |a\rangle that describes Alice's qubit. What does Alice have, then?

The operation that answers this is the partial trace. It is the rigorous version of "ignore Bob and describe only Alice." Its output, in general, is not a pure-state ket but a more general object called a density matrix — and on entangled inputs that density matrix represents a genuinely mixed state, not any pure state. This is where density matrices stop being a formal convenience and become the only sensible way to describe a subsystem of an entangled pair.

By the end of this chapter you will know how to compute a partial trace by hand, you will have traced out one side of a Bell state and seen the I/2 come out of the algebra, and you will understand why the partial trace is the technical content of every sentence that starts with "when you look at one half of an entangled pair..."

A warm-up: the density matrix of a pure state

Before the partial trace, one small detour to introduce the object the trace operates on.

For any quantum state |\psi\rangle, the density matrix of that state is the outer product

\rho \;=\; |\psi\rangle\langle\psi|.

Why define this: a density matrix packages exactly the same information as the ket |\psi\rangle — no more, no less, as long as the state is pure — but in a form that can also describe mixtures of different pure states (coming in later chapters). Writing every pure state as |\psi\rangle\langle\psi| gets you used to the format before the more general object shows up.

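If |\psi\rangle = \alpha|0\rangle + \beta|1\rangle, then

\rho = |\alpha|^2\,|0\rangle\langle 0| + \alpha\beta^*\,|0\rangle\langle 1| + \alpha^*\beta\,|1\rangle\langle 0| + |\beta|^2\,|1\rangle\langle 1|.

In matrix form: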

\rho = \begin{pmatrix} |\alpha|^2 & \alpha\beta^* \\ \alpha^*\beta & |\beta|^2 \end{pmatrix}.

Reading the matrix. The diagonal entries |\alpha|^2 and |\beta|^2 are the probabilities of measuring 0 and 1 — exactly what the Born rule predicts for the ket |\psi\rangle. The off-diagonal entries \alpha\beta^* and \alpha^*\beta carry the phase information — the coherence between the two basis states, which is what distinguishes a genuine superposition from a classical mixture of 0 and 1. The diagonal tells you the measurement probabilities; the off-diagonal tells you "this is a pure superposition, not a statistical ensemble."

Two properties of density matrices that you should have in your hands before the partial trace:

Trace equals 1: \text{tr}(\rho) = |\alpha|^2 + |\beta|^2 = 1, because the state is normalised. This is true for every density matrix.
Pure states have \rho^2 = \rho: applying |\psi\rangle\langle\psi| twice pulls out a factor of \langle\psi|\psi\rangle = 1 in the middle and gives back the same matrix. Equivalently, \text{tr}(\rho^2) = 1. This characterises pure states; for a general density matrix, \text{tr}(\rho^2) \leq 1, with equality iff \rho is pure.
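
As a quick numerical check of these two properties, here is a minimal NumPy sketch. The amplitudes alpha and beta are arbitrary illustrative values, not taken from the text; any normalised pair would do.

```python
import numpy as np

# Hypothetical example amplitudes (assumption): |alpha|^2 + |beta|^2 = 1.
alpha, beta = 0.6, 0.8j

psi = np.array([alpha, beta])        # ket |psi> = alpha|0> + beta|1>
rho = np.outer(psi, psi.conj())      # rho = |psi><psi|

trace = np.trace(rho).real           # normalisation: should be 1
purity = np.trace(rho @ rho).real    # tr(rho^2): should be 1 for a pure state
is_projector = np.allclose(rho @ rho, rho)

print(rho)
print("trace =", trace, " purity =", purity, " rho^2 == rho:", is_projector)
```

The off-diagonal entry rho[0, 1] is alpha times the complex conjugate of beta, matching the matrix form above.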

A two-qubit pure state |\psi\rangle_{AB} has its own density matrix \rho_{AB} = |\psi\rangle_{AB}\langle\psi|_{AB}, which is a 4 \times 4 matrix acting on the 4-dimensional joint Hilbert space. That is the object the partial trace will act on.

What the partial trace is, informally

Suppose you have a two-qubit density matrix \rho_{AB} and you care only about qubit A. You want to produce a single-qubit density matrix \rho_A that reproduces every measurement statistic that depends only on qubit A.

The operation that does this is written \rho_A = \text{tr}_B(\rho_{AB}) — "trace out subsystem B." The letter B tells you which subsystem you are ignoring (tracing over), and what comes out is a density matrix on the other subsystem.

The partial trace takes a joint state and returns the best description of the subsystem you keep, when the rest is ignored. The output is a density matrix, not a ket.

The operational meaning: any measurement Alice performs on qubit A alone has outcome statistics that are completely determined by \rho_A, no matter what Bob does with qubit B. If Alice measures in the computational basis, the probability of 0 is \langle 0 | \rho_A | 0 \rangle; the probability of 1 is \langle 1 | \rho_A | 1 \rangle. If she measures in any other basis \{|\phi\rangle, |\phi^\perp\rangle\}, the probabilities are \langle \phi | \rho_A | \phi\rangle and \langle \phi^\perp | \rho_A | \phi^\perp\rangle. The reduced matrix \rho_A is the complete book on Alice's qubit.

The partial trace — the mechanical rule

Now the algebra. The partial trace is defined on outer-product building blocks by

\text{tr}_B\bigl(|a\rangle\langle b| \otimes |c\rangle\langle d|\bigr) \;=\; \langle d | c\rangle \cdot |a\rangle\langle b|.

Why this is the rule: the outer product |c\rangle\langle d| is the B-part of a two-qubit operator; tracing it "collapses" it to a single number by taking the trace of a single-qubit operator. The trace of |c\rangle\langle d| alone is \text{tr}(|c\rangle\langle d|) = \langle d|c\rangle (cyclic property of the trace). So tracing out B of the composite operator is: keep the A-part unchanged, replace the B-part by its trace.

Extend by linearity. Any two-qubit density matrix can be written as a sum of building blocks |i\rangle\langle j| \otimes |k\rangle\langle l| with complex coefficients P_{ij,kl}:

\rho_{AB} \;=\; \sum_{i,j,k,l} P_{ij,kl}\,|i\rangle\langle j| \otimes |k\rangle\langle l|.

Apply the rule term by term:

\text{tr}_B(\rho_{AB}) = \sum_{i,j,k,l} P_{ij,kl} \langle l|k\rangle\,|i\rangle\langle j|.

If the basis on B is the orthonormal computational basis, \langle l | k\rangle = \delta_{lk}, which is zero unless l = k. The sums over k and l collapse to a single sum over k:

\text{tr}_B(\rho_{AB}) = \sum_{i,j} \Bigl(\sum_k P_{ij,kk}\Bigr) |i\rangle\langle j|.

Why the inner sum over k is "the trace" part of "partial trace": you are summing over the diagonal index of the B-part while keeping the A-part intact. That is a trace of the B-block, done for each fixed (i,j) pair on A.

A clean way to think about it using the matrix block structure. Any 4 \times 4 two-qubit matrix can be partitioned into a 2 \times 2 grid of 2 \times 2 blocks:

\rho_{AB} = \begin{pmatrix} M_{00} & M_{01} \\ M_{10} & M_{11} \end{pmatrix}

where each M_{ij} is a 2 \times 2 matrix acting on B, labelled by which basis state of A sits in the row and column of the big matrix. The partial trace over B then takes the trace of each 2 \times 2 block:

\text{tr}_B(\rho_{AB}) = \begin{pmatrix} \text{tr}(M_{00}) & \text{tr}(M_{01}) \\ \text{tr}(M_{10}) & \text{tr}(M_{11}) \end{pmatrix}.

That is the computational recipe, and you will use it twice in the worked examples.

Viewing a 4×4 two-qubit density matrix as a 2×2 grid of 2×2 blocks, the partial trace over B replaces each block by its trace.
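
Both forms of the recipe, the index formula \sum_k P_{ij,kk} and the block-trace picture, fit in a few lines of NumPy. This is a minimal sketch, assuming the basis ordering |00\rangle, |01\rangle, |10\rangle, |11\rangle used in this article (qubit A is the first label). The function names and the random test matrix are illustrative choices, not part of the text.

```python
import numpy as np

def partial_trace_b_blocks(rho_ab):
    """Trace out qubit B of a 4x4 matrix by tracing each 2x2 block M_ij."""
    rho_a = np.zeros((2, 2), dtype=complex)
    for i in range(2):
        for j in range(2):
            block = rho_ab[2 * i:2 * i + 2, 2 * j:2 * j + 2]  # M_ij acts on B
            rho_a[i, j] = np.trace(block)
    return rho_a

def partial_trace_b_index(rho_ab):
    """Same operation via the index formula sum_k P_{ij,kk}."""
    # After the reshape the axes are (i, k, j, l): A row, B row, A column, B column.
    p = rho_ab.reshape(2, 2, 2, 2)
    return np.trace(p, axis1=1, axis2=3)   # sum over k = l

# A random valid two-qubit density matrix (assumption: used only as test data).
rng = np.random.default_rng(0)
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho_ab = g @ g.conj().T          # positive semidefinite by construction
rho_ab /= np.trace(rho_ab)       # normalise to trace 1

rho_a = partial_trace_b_blocks(rho_ab)
print(np.allclose(rho_a, partial_trace_b_index(rho_ab)))
```

The reshape version generalises more easily, because the same call works for any pair of dimensions once the shape (d_A, d_B, d_A, d_B) is supplied.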

The payoff: tracing out one qubit of a Bell state

Now comes the result that makes this whole construction famous. Take the Bell state

|\Phi^+\rangle = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr).

Form its density matrix. Using |\psi\rangle\langle\psi| with |\psi\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle):

\rho_{\Phi^+} = \tfrac{1}{2}\bigl(|00\rangle + |11\rangle\bigr)\bigl(\langle 00| + \langle 11|\bigr).

Distribute. Four terms come out:

\rho_{\Phi^+} = \tfrac{1}{2}\bigl(|00\rangle\langle 00| + |00\rangle\langle 11| + |11\rangle\langle 00| + |11\rangle\langle 11|\bigr).

In the basis \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\} the 4 \times 4 matrix is

\rho_{\Phi^+} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.

Reading the matrix. Only four of the sixteen entries are non-zero: the four corners of the matrix. The two diagonal corners give the measurement probabilities of |00\rangle and |11\rangle (both 1/2); the two anti-diagonal corners are the coherence between them — the quantum superposition that makes this a Bell state and not a classical 50-50 mixture of |00\rangle and |11\rangle.

Now take the partial trace over B using the block-trace rule. Divide the matrix into four 2\times 2 blocks:

\rho_{\Phi^+} = \tfrac{1}{2}\begin{pmatrix} \begin{array}{cc|cc} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{array} \end{pmatrix}.

The four blocks are:

M_{00} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad M_{01} = \tfrac{1}{2}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad M_{10} = \tfrac{1}{2}\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad M_{11} = \tfrac{1}{2}\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.

Take the trace of each:

\text{tr}(M_{00}) = \tfrac{1}{2}, \quad \text{tr}(M_{01}) = 0, \quad \text{tr}(M_{10}) = 0, \quad \text{tr}(M_{11}) = \tfrac{1}{2}.

Assemble:

\rho_A = \text{tr}_B(\rho_{\Phi^+}) = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} = \tfrac{1}{2} I.

Why the off-diagonal blocks had zero trace: M_{01} and M_{10} had their only non-zero entries on the anti-diagonal, so tracing (summing the diagonal) gives zero. The coherence of the Bell state lived in these off-diagonal blocks; the partial trace throws that coherence away and leaves only the diagonal probabilities.

The result is the maximally mixed state, \rho_A = I/2. It is the density matrix that assigns probability 1/2 to every basis state and has no off-diagonal coherence — the density matrix of a fair coin. For Alice, looking only at her qubit of the Bell pair, the experimental predictions are indistinguishable from those of a randomly prepared single qubit: 50% chance of measuring 0, 50% chance of 1, in any basis you like.

Starting from the Bell state $|\Phi^+\rangle$ and tracing out qubit B gives $\rho_A = I/2$: every measurement of qubit A alone looks like a fair coin flip.
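
The hand calculation can be reproduced numerically. The sketch below builds \rho_{\Phi^+}, traces out B, and evaluates \langle\phi|\rho_A|\phi\rangle for one measurement direction. The angles theta and phase are arbitrary illustrative values (an assumption, not from the text); changing them should not change the probability.

```python
import numpy as np

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # |Phi+> in the basis 00, 01, 10, 11
rho_ab = np.outer(bell, bell.conj())         # 4x4 joint density matrix

# Trace out B: axes after the reshape are (A row, B row, A column, B column).
rho_a = np.trace(rho_ab.reshape(2, 2, 2, 2), axis1=1, axis2=3)

# Probability of the outcome |phi> when Alice measures her qubit alone.
theta, phase = 0.7, 1.3                      # hypothetical measurement direction
phi = np.array([np.cos(theta), np.exp(1j * phase) * np.sin(theta)])
p_phi = np.vdot(phi, rho_a @ phi).real       # <phi| rho_A |phi>

print(rho_a.real)
print("P(phi) =", p_phi)
```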

Compare this to the joint state |\Phi^+\rangle. The joint state has perfect correlation: if Alice and Bob both measure in the computational basis, they always get the same bit — 00 or 11, never 01 or 10. That is a specific, testable, two-party property of the pair. But if Alice measures alone and does not hear from Bob, all she can say is "50% 0, 50% 1, indistinguishable from noise." The information about the correlation is there in the joint state, but it is invisible from just one side. You need both halves, or at least classical communication between them, to see it.

This is the main lesson of the partial trace: entanglement is the phenomenon that the whole pair has a pure, definite state while each half, viewed in isolation, looks like classical randomness. The global order is there; it just cannot be extracted from a single marginal.

Purity and the detection of entanglement

The quantity \text{tr}(\rho^2) is called the purity of a density matrix. It has an immediate physical meaning:

\text{tr}(\rho^2) = 1 iff \rho is pure (i.e. \rho = |\psi\rangle\langle\psi| for some |\psi\rangle).
\text{tr}(\rho^2) < 1 iff \rho is mixed.
The minimum possible purity on a d-dimensional system is 1/d, achieved by the maximally mixed state I/d.

Compute it for the reduced state of the Bell pair, \rho_A = I/2:

\rho_A^2 = (I/2)^2 = I/4, \qquad \text{tr}(\rho_A^2) = \text{tr}(I/4) = 2/4 = 1/2.

Exactly the minimum possible for a qubit. And \text{tr}(\rho_A^2) = 1/2 < 1, so \rho_A is mixed.

Contrast with a product state: if |\psi\rangle_{AB} = |a\rangle_A \otimes |b\rangle_B, the reduced state \rho_A is the pure state |a\rangle\langle a|, and \text{tr}(\rho_A^2) = 1.

The rule is beautifully clean:

For a pure joint state |\psi\rangle_{AB}, the reduced state \rho_A is pure if and only if |\psi\rangle_{AB} is a product state; and \rho_A is mixed if and only if |\psi\rangle_{AB} is entangled.

So the partial trace gives you an entanglement detector: trace out one subsystem, check whether the reduction is pure. Pure \Leftrightarrow product. Mixed \Leftrightarrow entangled. The amount by which the purity drops below 1 measures how entangled the state is — this will be made precise in later chapters via the entanglement entropy, S(\rho_A) = -\text{tr}(\rho_A \log \rho_A).

The marginals of a product state stay pure; the marginals of an entangled state go mixed. Partial trace is the entanglement detector.
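
A sketch of the detector in code, under the assumption that the input is a two-qubit pure state given as a length-4 vector. The third test state, \cos t\,|00\rangle + \sin t\,|11\rangle with t = 0.3, is a hypothetical example of a partially entangled state added for illustration.

```python
import numpy as np

def reduced_purity(psi_ab):
    """Purity tr(rho_A^2) of qubit A for a two-qubit pure state psi_ab."""
    rho_ab = np.outer(psi_ab, psi_ab.conj())
    rho_a = np.trace(rho_ab.reshape(2, 2, 2, 2), axis1=1, axis2=3)
    return np.trace(rho_a @ rho_a).real

plus = np.array([1, 1]) / np.sqrt(2)
zero = np.array([1, 0])

product = np.kron(plus, zero)                      # |+>|0>, a product state
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)         # |Phi+>, maximally entangled
t = 0.3                                            # hypothetical angle
partial = np.array([np.cos(t), 0, 0, np.sin(t)])   # cos t |00> + sin t |11>

for name, psi in [("product", product), ("bell", bell), ("partial", partial)]:
    print(f"{name:8s} purity of rho_A = {reduced_purity(psi):.4f}")
```

The product state should report purity 1, the Bell state 1/2, and the partially entangled state a value strictly in between.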

Why the partial trace matters — a catalogue

The partial trace is not a specialised tool for Bell states. It shows up every time a subsystem is discarded or ignored, and this happens all the time:

Discarded ancilla qubits. Many quantum algorithms use scratch ("ancilla") qubits to hold intermediate state. At the end of the algorithm you want the ancilla thrown away — and the effective state of the data qubits is the partial trace of the full state over the ancilla. If the ancilla ends up entangled with the data, tracing it out leaves the data qubits in a mixed state, which can corrupt the algorithm.

Measurement of one subsystem. When you measure only qubit B and ignore the classical outcome, the state of qubit A, averaged over the unread outcomes, is exactly the partial trace over B. This is sometimes called the "non-selective" or "unread-measurement" evolution.

Noise and decoherence. Every real qubit is weakly coupled to its environment — stray electromagnetic fields, vibrations of the fridge, photons leaking in. The environment becomes entangled with the qubit over time; tracing it out leaves the qubit in a mixed state. This is the partial-trace description of decoherence, and it is the central practical obstacle to building fault-tolerant quantum computers. Every quantum error correction scheme is, in one way or another, a scheme for fighting the damage the partial trace does to quantum information.

Reduced density matrices in condensed-matter physics. When a many-body system has a ground state, the reduced state of a subregion — obtained by tracing out the rest of the system — encodes the entanglement structure of the ground state. Entanglement entropy of reduced states is a central diagnostic in the study of topological phases of matter and tensor-network representations of ground states.

Quantum communication protocols. Alice and Bob share a resource state across a channel; what Alice has locally is her reduced state, and similarly for Bob. The rates at which they can send classical or quantum information through the channel are expressible entirely in terms of these reductions.

The partial trace is one of the handful of operations (alongside unitary evolution, measurement, and state preparation) that underlie every formal statement about open quantum systems.
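
The "measurement of one subsystem" entry can be checked numerically. The sketch below takes a random two-qubit pure state (test data only, an assumption), lets Bob either apply a random unitary or perform an unread computational-basis measurement on B, and compares Alice's reduced state before and after. The helper name reduced_a is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def reduced_a(rho_ab):
    """Trace out qubit B of a two-qubit density matrix."""
    return np.trace(rho_ab.reshape(2, 2, 2, 2), axis1=1, axis2=3)

# Random two-qubit pure state (generically entangled).
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho_ab = np.outer(psi, psi.conj())

# (a) Bob applies a random unitary to B alone (Q factor of a QR decomposition).
u_b, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
op = np.kron(np.eye(2), u_b)                       # I_A tensor U_B
rho_after_unitary = op @ rho_ab @ op.conj().T

# (b) Bob measures B in the computational basis and nobody reads the outcome.
projectors = [np.kron(np.eye(2), np.diag(d)) for d in ([1, 0], [0, 1])]
rho_after_measurement = sum(p @ rho_ab @ p for p in projectors)

unchanged_by_unitary = np.allclose(reduced_a(rho_after_unitary), reduced_a(rho_ab))
unchanged_by_measurement = np.allclose(reduced_a(rho_after_measurement), reduced_a(rho_ab))
joint_changed = not np.allclose(rho_after_measurement, rho_ab)

print(unchanged_by_unitary, unchanged_by_measurement, joint_changed)
```

The joint state does change under Bob's unread measurement, but the reduced state on A does not, which is the sense in which the partial trace describes what Alice sees.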

Example 1 — partial trace of a product state

Setup. Let the joint state be the product |\psi\rangle_{AB} = |+\rangle_A \otimes |0\rangle_B. Compute \rho_{AB}, then trace out B and identify what you get.

Step 1 — form the joint density matrix. Write the joint state explicitly:

|+\rangle_A |0\rangle_B = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes |0\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |10\rangle).

Why write it in the joint basis: the partial trace acts on the 4\times 4 joint density matrix, so you need the state expanded in the four-element basis \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}.

Step 2 — compute \rho_{AB}. The density matrix is |\psi\rangle\langle\psi|:

\rho_{AB} = \tfrac{1}{2}(|00\rangle + |10\rangle)(\langle 00| + \langle 10|) = \tfrac{1}{2}(|00\rangle\langle 00| + |00\rangle\langle 10| + |10\rangle\langle 00| + |10\rangle\langle 10|).

In matrix form on the basis \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}:

\rho_{AB} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

Step 3 — extract the 2×2 blocks. The four blocks are

M_{00} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad M_{01} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},

M_{10} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad M_{11} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.

Why all four blocks look the same here: qubit B is in the definite state |0\rangle, so its density matrix is the projector |0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, and the joint density matrix factors as \rho_{AB} = \rho_A \otimes \rho_B — each block is a multiple of \rho_B.

Step 4 — trace each block.

\text{tr}(M_{00}) = \tfrac{1}{2}, \quad \text{tr}(M_{01}) = \tfrac{1}{2}, \quad \text{tr}(M_{10}) = \tfrac{1}{2}, \quad \text{tr}(M_{11}) = \tfrac{1}{2}.

Step 5 — assemble.

\rho_A = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}.

Step 6 — recognise the result. This is the density matrix of the pure state |+\rangle:

|+\rangle\langle +| = \tfrac{1}{2}(|0\rangle + |1\rangle)(\langle 0| + \langle 1|) = \tfrac{1}{2}(|0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| + |1\rangle\langle 1|) = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}.

Result. \rho_A = |+\rangle\langle +| — a pure state, with purity \text{tr}(\rho_A^2) = 1.

What this shows. The reduced state of a product state is pure, and it is exactly the single-qubit state that appeared in the product: |+\rangle on A, as expected. Nothing was lost. The partial trace is only "destructive" when there is entanglement to destroy; for a product state it is the identity operation on the marginal.

The partial trace of a product state gives back the pure state of the kept subsystem. No information is lost; there was never any to lose.
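
Example 1 can be verified in a few lines, including the factorisation \rho_{AB} = \rho_A \otimes \rho_B noted in Step 3. This is a minimal sketch assuming the same basis ordering as the worked example; the variable names are illustrative.

```python
import numpy as np

plus = np.array([1, 1]) / np.sqrt(2)
zero = np.array([1, 0])

psi_ab = np.kron(plus, zero)                 # |+>_A tensor |0>_B
rho_ab = np.outer(psi_ab, psi_ab.conj())

rho4 = rho_ab.reshape(2, 2, 2, 2)            # axes: (A row, B row, A col, B col)
rho_a = np.trace(rho4, axis1=1, axis2=3)     # trace out B
rho_b = np.trace(rho4, axis1=0, axis2=2)     # trace out A

# For a product state the two marginals rebuild the joint state exactly.
factorises = np.allclose(np.kron(rho_a, rho_b), rho_ab)

print(rho_a)
print(rho_b)
print("rho_AB == rho_A (x) rho_B:", factorises)
```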

Example 2 — partial trace of the Bell state $|\Phi^-\rangle$

Setup. Take a different Bell state, |\Phi^-\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle - |11\rangle) — same as |\Phi^+\rangle except for a minus sign on the |11\rangle branch. Trace out B.

Step 1 — form \rho_{AB}. Distribute the outer product:

\rho_{\Phi^-} = \tfrac{1}{2}(|00\rangle - |11\rangle)(\langle 00| - \langle 11|) = \tfrac{1}{2}(|00\rangle\langle 00| - |00\rangle\langle 11| - |11\rangle\langle 00| + |11\rangle\langle 11|).

In the \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\} basis:

\rho_{\Phi^-} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}.

Why the only difference from \rho_{\Phi^+} is the sign of the two corner off-diagonal entries: the minus sign in the ket multiplies the cross-terms |00\rangle\langle 11| and |11\rangle\langle 00| in the outer-product expansion.

Step 2 — extract the blocks.

M_{00} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad M_{01} = \tfrac{1}{2}\begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}, \quad M_{10} = \tfrac{1}{2}\begin{pmatrix} 0 & 0 \\ -1 & 0 \end{pmatrix}, \quad M_{11} = \tfrac{1}{2}\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.

Step 3 — trace each. The off-diagonal blocks M_{01} and M_{10} have zero diagonal, so their traces are zero. The diagonal blocks give \text{tr}(M_{00}) = \text{tr}(M_{11}) = 1/2.

Step 4 — assemble.

\rho_A = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} = I/2.

Result. Same as for |\Phi^+\rangle: the reduced state of A is maximally mixed, I/2.

What this shows. The sign that distinguishes |\Phi^+\rangle from |\Phi^-\rangle is a relative phase between |00\rangle and |11\rangle. It is absolutely observable — it changes the joint state's response to bases other than the computational — but it lives entirely in the cross-terms of the density matrix, and the partial trace throws those terms away. So Alice cannot tell the two Bell states apart by measuring her qubit alone: both reduce to I/2 on her side. She would need classical communication with Bob, or joint measurements, to distinguish them. This is the "no-communication" face of entanglement: the two Bell states share all their information between the two parties, and neither party can extract any of it unilaterally.
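
To make the point concrete, the sketch below checks that the two Bell states have the same reduced state on A while a joint observable tells them apart. The choice of X \otimes X as the joint observable is an illustrative assumption; any joint measurement sensitive to the relative phase would serve.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
XX = np.kron(X, X)                           # joint observable X tensor X

def reduced_a(psi):
    """Reduced state of qubit A for a two-qubit pure state."""
    rho = np.outer(psi, psi.conj())
    return np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
phi_minus = np.array([1, 0, 0, -1]) / np.sqrt(2)

same_marginal = np.allclose(reduced_a(phi_plus), reduced_a(phi_minus))
xx_plus = np.vdot(phi_plus, XX @ phi_plus).real     # <Phi+| XX |Phi+>
xx_minus = np.vdot(phi_minus, XX @ phi_minus).real  # <Phi-| XX |Phi->

print("same rho_A:", same_marginal)
print("<XX> on Phi+:", xx_plus, " <XX> on Phi-:", xx_minus)
```

The expectation values have opposite signs, so the relative phase is observable in joint statistics even though neither marginal carries it.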

Common confusions

"Tracing out destroys information." Slightly misleading. The global state \rho_{AB} still exists and holds all the information. Tracing out B produces \rho_A, which holds only the information accessible to A. If you keep both reductions \rho_A and \rho_B, you do not recover \rho_{AB} in general — the correlations that live in the joint state are missing from the pair of marginals. So the partial trace "throws away correlations," not "destroys information." The information is there; it has just moved to the correlations between the parts, which a one-sided view cannot see.
"Partial trace is the same as measurement." No. Measurement is a random event: it delivers a classical outcome and updates the quantum state conditionally on what outcome was seen. Partial trace is deterministic and unconditional: it averages over the other subsystem rather than reading it. Operationally, partial trace is equivalent to measuring the other subsystem and then forgetting what outcome you got — not to "measuring nothing," and not to "measuring and remembering."
"Every mixed state is entangled with something." False as stated; true in a richer sense. Mixed states can arise from:
- Classical ignorance: the experimenter prepared |0\rangle with probability 1/2 and |1\rangle with probability 1/2, gave you the qubit without telling you which. The state is I/2, but there is no partner system holding entanglement.
- Entanglement: you hold one half of a Bell pair whose other half is somewhere else.
Both cases look identical from your side (I/2 is I/2). There is a useful fact called purification: every mixed state can be viewed as the reduction of some pure state on a larger system. That purification is mathematically constructible but not physically necessary: not every mixed state in the world actually came from tracing out a partner.
"The reduced state on A depends on what Bob does." No. This is the direct content of the no-communication theorem. If the joint state is \rho_{AB}, the reduction \rho_A is fixed, and applying any operation on B alone (a unitary, a measurement whose outcome Alice is not told, throwing the qubit away) does not change \rho_A. Bob's choices cannot send information to Alice through their shared state; her marginal is what it is. The partial trace is the formal statement of this, and it is a sharp and beautiful theorem. Example 2 gave an illustration: Bob can turn |\Phi^+\rangle into |\Phi^-\rangle by applying Z to his qubit, yet \rho_A = I/2 in both cases.
"The partial trace works only on qubits." No. The partial trace is defined for any bipartite Hilbert space \mathcal{H}_A \otimes \mathcal{H}_B of any dimensions. The formula \text{tr}_B(|a\rangle\langle b| \otimes |c\rangle\langle d|) = \langle d|c\rangle\,|a\rangle\langle b| is the same; the bookkeeping just has more basis states. Continuous-variable systems (wavefunctions on the line) have a partial trace too, written as an integral over the traced-out variable.

Going deeper

If you came here to understand what the partial trace is and why tracing out half of a Bell state gives you classical-looking randomness, you have it. The rest of the chapter covers the formal definition in more generality, the tight connection to the Schmidt decomposition that will let you read off the reduction without computing blocks by hand, why the partial trace is the exact mathematical content of decoherence, and the purification theorem — the statement that every mixed state can be viewed as a marginal of a pure state on a bigger system.

Reduced density matrices — the formal definition

Let \mathcal{H}_A and \mathcal{H}_B be finite-dimensional Hilbert spaces. Given a density matrix \rho_{AB} on \mathcal{H}_A \otimes \mathcal{H}_B and an orthonormal basis \{|k\rangle_B\} of \mathcal{H}_B, the partial trace over B is the linear map defined by

\rho_A = \text{tr}_B(\rho_{AB}) = \sum_k \bigl(I_A \otimes \langle k|_B\bigr)\,\rho_{AB}\,\bigl(I_A \otimes |k\rangle_B\bigr).

The formula is basis-independent: if you chose a different orthonormal basis of \mathcal{H}_B and redid the sum, you would get the same \rho_A. Linearity: \text{tr}_B is linear in \rho_{AB}. Completeness: \text{tr}_A \circ \text{tr}_B = \text{tr} (the full trace), which is consistent with the fact that tracing out all subsystems returns a scalar (the total trace).

The trace-preserving property \text{tr}(\rho_A) = \text{tr}(\rho_{AB}) = 1 follows immediately: summing \langle j | \rho_A | j\rangle over a basis of A gives the full trace of \rho_{AB}. So \rho_A is always a valid density matrix — positive semidefinite, Hermitian, trace 1.
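
The formal definition translates directly into code. The sketch below implements the sum over k for arbitrary dimensions and checks basis independence by comparing the standard basis of \mathcal{H}_B with a randomly rotated one. The dimensions (2 and 3), the function name, and the random test data are assumptions made for illustration.

```python
import numpy as np

def partial_trace_b(rho_ab, dim_a, basis_b):
    """sum_k (I_A tensor <k|) rho_AB (I_A tensor |k>).

    The columns of basis_b are an orthonormal basis of H_B.
    """
    eye_a = np.eye(dim_a)
    rho_a = np.zeros((dim_a, dim_a), dtype=complex)
    for k in range(basis_b.shape[1]):
        ket = basis_b[:, [k]]                # column vector |k>, shape (d_B, 1)
        op = np.kron(eye_a, ket)             # I_A tensor |k>, shape (d_A*d_B, d_A)
        rho_a += op.conj().T @ rho_ab @ op
    return rho_a

rng = np.random.default_rng(2)
dim_a, dim_b = 2, 3
g = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
rho_ab = g @ g.conj().T                      # positive semidefinite
rho_ab /= np.trace(rho_ab)                   # trace 1

standard = np.eye(dim_b)
rotated, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))

rho_a_std = partial_trace_b(rho_ab, dim_a, standard)
rho_a_rot = partial_trace_b(rho_ab, dim_a, rotated)

print(np.allclose(rho_a_std, rho_a_rot))     # basis independence
print(np.trace(rho_a_std).real)              # trace preservation
```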

Schmidt decomposition and a faster way to compute reductions

Any bipartite pure state has a Schmidt decomposition (chapter 40):

|\psi\rangle_{AB} = \sum_i \lambda_i\,|u_i\rangle_A \otimes |v_i\rangle_B, \qquad \sum_i \lambda_i^2 = 1, \quad \lambda_i \geq 0.

The \{|u_i\rangle\} and \{|v_i\rangle\} are orthonormal (in their respective spaces) and the \lambda_i are real non-negative Schmidt coefficients. Using the Schmidt form, the partial trace is immediate:

\rho_A = \text{tr}_B\Bigl(\sum_{i,j} \lambda_i \lambda_j |u_i\rangle\langle u_j|_A \otimes |v_i\rangle\langle v_j|_B\Bigr) = \sum_{i,j} \lambda_i \lambda_j \langle v_j | v_i\rangle_B |u_i\rangle\langle u_j|_A = \sum_i \lambda_i^2 |u_i\rangle\langle u_i|_A,

using the orthonormality of the |v_i\rangle to kill the cross-terms. The reduced state \rho_A is already diagonal in the Schmidt basis \{|u_i\rangle\}, with eigenvalues \lambda_i^2. Same story for \rho_B = \sum_i \lambda_i^2 |v_i\rangle\langle v_i|_B — same spectrum!

This is a strong statement: the two reductions \rho_A and \rho_B of a bipartite pure state have identical non-zero spectra, equal to the squared Schmidt coefficients. They can live in spaces of different dimensions, but their entanglement contents are encoded in the same list of numbers. And the purity \text{tr}(\rho_A^2) = \sum_i \lambda_i^4 is a function of the Schmidt coefficients alone.

The entanglement entropy, which will be developed in its own chapter, is the von Neumann entropy of the reduction: S(\rho_A) = -\sum_i \lambda_i^2 \log_2 \lambda_i^2. For a product state there is one non-zero \lambda_i equal to 1, and the entropy is 0. For the Bell state |\Phi^+\rangle there are two Schmidt coefficients equal to 1/\sqrt{2}, so \lambda_i^2 = 1/2 each, and the entropy is -2 \cdot (1/2) \log_2(1/2) = 1 bit, the maximum possible for a two-qubit pure state.
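
Numerically, the Schmidt coefficients are the singular values of the coefficient matrix C_{ab} = \langle ab|\psi\rangle, so the statements above can be checked with a singular value decomposition. This is a sketch under that standard identification; the dimensions (2 and 3), the random state, and the helper name entropy_bits are illustrative assumptions.

```python
import numpy as np

def entropy_bits(probs, tol=1e-12):
    """Von Neumann entropy in bits from a list of eigenvalues."""
    p = probs[probs > tol]                   # drop zeros: 0 log 0 = 0
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(3)
dim_a, dim_b = 2, 3
c = rng.normal(size=(dim_a, dim_b)) + 1j * rng.normal(size=(dim_a, dim_b))
c /= np.linalg.norm(c)                       # Frobenius norm 1 = normalised state

schmidt = np.linalg.svd(c, compute_uv=False) # Schmidt coefficients, descending
rho_a = c @ c.conj().T                       # tr_B |psi><psi|
rho_b = c.T @ c.conj()                       # tr_A |psi><psi|

eig_a = np.linalg.eigvalsh(rho_a)            # ascending order
eig_b = np.linalg.eigvalsh(rho_b)            # ascending order, one extra zero

# Bell state |Phi+>: coefficient matrix is the identity over sqrt(2).
bell_c = np.eye(2) / np.sqrt(2)
bell_entropy = entropy_bits(np.linalg.svd(bell_c, compute_uv=False) ** 2)

print("lambda^2:", schmidt ** 2)
print("spec(rho_A):", eig_a, " spec(rho_B):", eig_b)
print("Bell entropy (bits):", bell_entropy)
```

The three-dimensional side has one extra eigenvalue, which is zero; the non-zero spectra coincide.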

Decoherence as environmental partial tracing

The canonical model of decoherence runs like this. You have a qubit in some state \alpha|0\rangle + \beta|1\rangle. It is weakly coupled to an environment (a microwave field, a stray phonon, a photon wandering through). Over time the system-environment pair evolves under unitary dynamics into an entangled joint state:

\alpha|0\rangle_S|E_0\rangle + \beta|1\rangle_S|E_1\rangle,

where |E_0\rangle, |E_1\rangle are normalised (possibly orthogonal) environment states that record "which qubit state I saw." The reduced state of the system, obtained by tracing out the environment, is

\rho_S = |\alpha|^2\,|0\rangle\langle 0| + \alpha\beta^*\,\langle E_1|E_0\rangle\,|0\rangle\langle 1| + \alpha^*\beta\,\langle E_0|E_1\rangle\,|1\rangle\langle 0| + |\beta|^2\,|1\rangle\langle 1|.

The off-diagonal coherences are multiplied by the environment overlap \langle E_0 | E_1\rangle (or its complex conjugate). If the environment states are nearly orthogonal (the environment "knows" the qubit's state to high distinguishability), this overlap is nearly zero, and the reduced density matrix becomes nearly diagonal:

\rho_S \approx |\alpha|^2 |0\rangle\langle 0| + |\beta|^2 |1\rangle\langle 1|.

A classical statistical mixture of |0\rangle and |1\rangle — no superposition left, no interference possible. That is decoherence. It is not a new phenomenon on top of partial traces; it is precisely the partial trace of an entangled system-environment state in which the environment has recorded "which" information about the system. Quantum error correction is the art of keeping the system-environment entanglement weak enough that the partial-trace damage stays small — or, when it inevitably grows, redistributing the information across more qubits so the damage becomes recoverable.
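
A toy version of this model shows the coherence shrinking with the environment overlap. The sketch assumes a two-dimensional environment with |E_0\rangle = (1, 0) and |E_1\rangle = (\cos\theta, \sin\theta), so that \langle E_0|E_1\rangle = \cos\theta. That parametrisation, the amplitudes, and the function name are illustrative assumptions, not a model of any real hardware.

```python
import numpy as np

def reduced_system_state(alpha, beta, e0, e1):
    """Trace the environment out of alpha|0>|E0> + beta|1>|E1>."""
    s0, s1 = np.array([1, 0]), np.array([0, 1])
    joint = alpha * np.kron(s0, e0) + beta * np.kron(s1, e1)
    rho = np.outer(joint, joint.conj())
    d_env = len(e0)
    # Axes after the reshape: (S row, E row, S column, E column).
    return np.trace(rho.reshape(2, d_env, 2, d_env), axis1=1, axis2=3)

alpha = beta = 1 / np.sqrt(2)
e0 = np.array([1.0, 0.0])

for angle in (0.0, np.pi / 4, np.pi / 2):
    e1 = np.array([np.cos(angle), np.sin(angle)])   # overlap <E0|E1> = cos(angle)
    rho_s = reduced_system_state(alpha, beta, e0, e1)
    print(f"overlap {np.vdot(e0, e1):.3f} -> |coherence| {abs(rho_s[0, 1]):.3f}")
```

At zero angle the environment has learned nothing and the qubit keeps its full coherence; at a right angle the environment states are orthogonal and the reduced state is diagonal.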

The purification theorem

Every mixed density matrix \rho_A on \mathcal{H}_A can be purified: there exists a Hilbert space \mathcal{H}_{A'} and a pure state |\psi\rangle_{AA'} such that \rho_A = \text{tr}_{A'}(|\psi\rangle\langle\psi|_{AA'}). The purification |\psi\rangle_{AA'} is unique up to a unitary on \mathcal{H}_{A'}, and \dim(\mathcal{H}_{A'}) can always be chosen equal to the rank of \rho_A.

This is one of the deepest and most useful facts in quantum information. It says: every classical-looking uncertainty ("qubit is |0\rangle with probability 1/2 and |1\rangle with probability 1/2") can be reframed as "qubit is entangled with some external system, whose state we have not told you about." The Church of the Larger Hilbert Space, as the joke goes: mixedness is just entanglement with something you chose to ignore. Whether that "something" is a real physical system (the environment, another qubit) or a fictional bookkeeping device depends on the scenario — but the mathematical unification is complete, and it is the lens through which almost every modern treatment of noise, channels, and error correction is written.
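
One standard way to build a purification is from the eigendecomposition \rho_A = \sum_i p_i |u_i\rangle\langle u_i|, taking |\psi\rangle_{AA'} = \sum_i \sqrt{p_i}\,|u_i\rangle_A \otimes |i\rangle_{A'}. The sketch below follows that construction. For simplicity it assumes an ancilla of the same dimension as A rather than the minimal rank-sized one, and the example matrix is a hypothetical mixed state chosen for illustration.

```python
import numpy as np

def purify(rho):
    """Return a pure state on A tensor A' whose reduction on A is rho."""
    probs, vecs = np.linalg.eigh(rho)        # columns of vecs are eigenvectors
    probs = np.clip(probs, 0.0, None)        # guard against tiny negative round-off
    d = len(probs)
    psi = np.zeros(d * d, dtype=complex)
    for i in range(d):
        ancilla = np.eye(d)[i]               # basis state |i> on A'
        psi += np.sqrt(probs[i]) * np.kron(vecs[:, i], ancilla)
    return psi

# Hypothetical mixed qubit state: Hermitian, trace 1, positive definite.
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])

psi = purify(rho)
rho_joint = np.outer(psi, psi.conj())
recovered = np.trace(rho_joint.reshape(2, 2, 2, 2), axis1=1, axis2=3)

print(np.allclose(recovered, rho))           # tracing out A' returns rho
print(np.trace(rho_joint @ rho_joint).real)  # purity of the joint state
```

Applying any unitary to the ancilla gives another valid purification of the same \rho_A, which is the freedom described in the theorem.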

Where this leads next

Density matrices — introduction — the full machinery of density matrices, including how to describe mixed states directly, how to evolve them under unitaries and channels, and why they are the fundamental objects of quantum information theory.
Bell states — the four maximally entangled two-qubit states, their properties, and the protocols (teleportation, superdense coding, device-independent cryptography) they enable.
Schmidt decomposition — the theorem that diagonalises every bipartite pure state and makes the partial trace a one-line computation.
Decoherence — introduction — how environmental partial tracing explains the loss of quantum coherence in real hardware, and what error correction does about it.
Entanglement entropy — the von Neumann entropy of a reduction, which measures how entangled a bipartite pure state is.
No-communication theorem — the rigorous proof, built on partial traces, that Alice's marginal cannot change under anything Bob does alone.
