Partial Trace Revisited


In short

The partial trace \text{tr}_B(\rho_{AB}) is the operation that takes a joint density operator on \mathcal H_A \otimes \mathcal H_B and returns the reduced state on \mathcal H_A. Its rule on outer products is \text{tr}_B(|a\rangle\langle a'| \otimes |b\rangle\langle b'|) = \langle b'|b\rangle\,|a\rangle\langle a'|, extended by linearity. Equivalently, \text{tr}_B(\rho_{AB}) = \sum_j (I_A \otimes \langle j|_B)\rho_{AB}(I_A \otimes |j\rangle_B) for any orthonormal basis \{|j\rangle_B\}. The partial trace is the unique map that preserves every A-local expectation value: \text{tr}\bigl(A \cdot \text{tr}_B(\rho_{AB})\bigr) = \text{tr}\bigl((A\otimes I_B)\,\rho_{AB}\bigr) for every observable A on \mathcal H_A. For a product state \rho_A \otimes \rho_B the rule collapses to \rho_A; for a Bell state |\Phi^+\rangle, \text{tr}_B gives I/2 (maximally mixed); for GHZ, tracing out one qubit leaves a classical mixture \tfrac{1}{2}|00\rangle\langle 00| + \tfrac{1}{2}|11\rangle\langle 11| with no coherence. Purity of the reduction measures entanglement: if the joint state is pure, \rho_A is pure iff |\psi\rangle_{AB} is a product. Multi-subsystem traces compose — \text{tr}_{BC} = \text{tr}_B \circ \text{tr}_C = \text{tr}_C \circ \text{tr}_B.

You already met the partial trace once, in chapter 9, before you had the density-operator machinery. You computed \text{tr}_B on a Bell state, saw I/2 pop out, and understood, at the level of a formula, what "ignore Bob's qubit" means. That chapter earned its keep: it introduced the operation, the Bell-state calculation, and the purity test for entanglement. But it built everything on block-trace mechanics, with the density matrix smuggled in as a formal convenience.

Now you have the density operator as a first-class object. Hermitian. Positive semi-definite. Trace one. The Bloch ball. The full measurement ruleset. Against that backdrop, the partial trace looks different — it is not a recipe for one special kind of matrix; it is a linear map \text{tr}_B : \mathcal D(\mathcal H_A \otimes \mathcal H_B) \to \mathcal D(\mathcal H_A) with a sharp, almost inevitable, universal property. This chapter revisits partial trace from that vantage point: formal definition, uniqueness, computational rules, then three worked traces on Bell, GHZ, and W states.

By the end you will be able to take any density matrix of two or more subsystems, trace out whichever parts you don't care about, and read off what the observer who keeps the rest actually sees.

What you already know, in one paragraph

If Alice and Bob share a state \rho_{AB} on \mathcal H_A \otimes \mathcal H_B, and Alice performs any measurement or operation that acts only on her subsystem, the outcome statistics are completely determined by a single-subsystem object \rho_A on \mathcal H_A. The map \rho_{AB} \mapsto \rho_A is the partial trace over B, written \rho_A = \text{tr}_B(\rho_{AB}). Nothing Bob does to his qubit — unitary, measurement, discarding — changes \rho_A. Bob's half of the world is invisible from Alice's side once the partial trace has been taken.

The partial trace turns a joint state into the reduced state on the retained subsystem. Once $\rho_A$ is in hand, every single-subsystem prediction about Alice's side follows from it — and only from it.

This is the picture. The rest of the chapter is the machinery and the universal property that nails down why "partial trace" is not just a convenient operation but the only one that can play this role.

The formal definition

Fix finite-dimensional Hilbert spaces \mathcal H_A and \mathcal H_B with dimensions d_A and d_B, and let \{|j\rangle_B\}_{j=0}^{d_B - 1} be any orthonormal basis of \mathcal H_B. The partial trace over B is the linear map

\text{tr}_B \colon \mathcal B(\mathcal H_A \otimes \mathcal H_B) \to \mathcal B(\mathcal H_A),\qquad \text{tr}_B(X) \;=\; \sum_{j=0}^{d_B - 1} \bigl(I_A \otimes \langle j|_B\bigr)\,X\,\bigl(I_A \otimes |j\rangle_B\bigr).

Here \mathcal B(\mathcal H) is the space of linear operators on \mathcal H — in finite dimensions, just the matrices of the right size. The expression (I_A \otimes \langle j|_B) is a "half-partial" map that turns a vector in \mathcal H_A \otimes \mathcal H_B into a vector in \mathcal H_A by contracting its B-component against \langle j|_B. Squeezing X between (I_A \otimes \langle j|_B) on the left and (I_A \otimes |j\rangle_B) on the right reduces X to a matrix on \mathcal H_A, and the sum over j runs that reduction through every basis vector of \mathcal H_B.
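As a rough sketch, the definition can be written in index form. The storage convention (nested Python lists, with row or column index i*d_B + k standing for |i\rangle_A|k\rangle_B) and the name `partial_trace_B` are assumptions made for this illustration, not something fixed by the text.

```python
def partial_trace_B(X, dA, dB):
    """Index form of sum_k (I_A (x) <k|_B) X (I_A (x) |k>_B), with (x) the tensor product.

    X is a (dA*dB) x (dA*dB) nested list. Row or column index i*dB + k
    stands for the product basis vector |i>_A |k>_B (assumed convention).
    """
    return [
        [sum(X[i * dB + k][j * dB + k] for k in range(dB)) for j in range(dA)]
        for i in range(dA)
    ]


# An arbitrary (non-Hermitian) 4x4 operator with entries 0..15, to stress
# that tr_B is defined on every operator, not only on density matrices.
X = [[4 * r + c for c in range(4)] for r in range(4)]

reduced = partial_trace_B(X, 2, 2)
print(reduced)  # [[5, 9], [21, 25]]
```

Each output entry collects the two B-diagonal entries of one 2 \times 2 block of X, which is all the sandwich formula does once a basis is fixed.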

The rule on outer products

The definition above is correct but dense. The working rule — the one you use when computing by hand — is that the partial trace acts on outer-product tiles as

\text{tr}_B\bigl(|a\rangle\langle a'|_A \otimes |b\rangle\langle b'|_B\bigr) \;=\; \langle b'|b\rangle \cdot |a\rangle\langle a'|_A.

Why the inner product \langle b'|b\rangle appears and not \langle b|b'\rangle: the trace of |b\rangle\langle b'| as a standalone operator on \mathcal H_B is \langle b'|b\rangle by the cyclic property. Tracing out B replaces the B-factor |b\rangle\langle b'| by its trace \langle b'|b\rangle, a number, and leaves the A-factor |a\rangle\langle a'| untouched. Multiplication: (\text{number}) \cdot (\text{operator on } A).

Because any operator on \mathcal H_A \otimes \mathcal H_B can be written as a linear combination of outer-product tiles, this rule plus linearity defines \text{tr}_B on everything.
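The tile rule can be checked numerically on kets that are not orthogonal, so that the inner product \langle b'|b\rangle is neither 0 nor 1. This is a minimal sketch; the helper names (`outer`, `inner`, `kron`, `partial_trace_B`) and the list-based storage are illustrative assumptions.

```python
def outer(u, v):
    """|u><v| for vectors given as lists of numbers."""
    return [[x * complex(y).conjugate() for y in v] for x in u]


def inner(u, v):
    """<u|v>."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))


def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def partial_trace_B(X, dA, dB):
    return [
        [sum(X[i * dB + k][j * dB + k] for k in range(dB)) for j in range(dA)]
        for i in range(dA)
    ]


s = 2 ** -0.5
a, a2 = [1, 0], [0, 1]   # |a> = |0>, |a'> = |1>
b, b2 = [s, s], [1, 0]   # |b> = |+>, |b'> = |0>, so <b'|b> = 1/sqrt(2)

tile = kron(outer(a, a2), outer(b, b2))      # |a><a'| (x) |b><b'|
lhs = partial_trace_B(tile, 2, 2)            # tr_B of the tile
rhs = [[inner(b2, b) * x for x in row] for row in outer(a, a2)]  # <b'|b> |a><a'|

rule_holds = all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(rule_holds)  # True
```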

Basis-independence

You might worry that the formula \text{tr}_B(X) = \sum_j (I_A \otimes \langle j|_B) X (I_A \otimes |j\rangle_B) depends on the choice of basis \{|j\rangle_B\}. It does not. If \{|j'\rangle_B\} is any other orthonormal basis, expand each new vector in the old one, |j'\rangle = \sum_k |k\rangle\langle k|j'\rangle. The sum over j' then produces the factor \sum_{j'} \langle l|j'\rangle\langle j'|k\rangle = \langle l|k\rangle = \delta_{lk}, because \sum_{j'} |j'\rangle\langle j'| = I_B is the resolution of the identity in any orthonormal basis, and what remains is exactly the sum over the old basis. Computing the partial trace in two different bases therefore gives the same answer. Why this matters: if the answer depended on the basis, the partial trace would not be a well-defined operation on states, only on states-plus-a-basis. Physics does not care about which orthonormal frame you pick for Bob; the partial trace inherits that indifference.
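A quick numerical check of basis-independence, assuming the same nested-list storage as elsewhere: the sandwich formula is evaluated with three different orthonormal bases of a qubit \mathcal H_B. The function name `partial_trace_B_in_basis` and the test state are hypothetical choices for this sketch.

```python
def outer(u, v):
    """|u><v| for vectors given as lists of numbers."""
    return [[x * complex(y).conjugate() for y in v] for x in u]


def partial_trace_B_in_basis(X, dA, basis):
    """sum_j (I_A (x) <j|) X (I_A (x) |j>) for an explicit orthonormal basis of H_B."""
    dB = len(basis)
    out = [[0j] * dA for _ in range(dA)]
    for vec in basis:
        for i in range(dA):
            for i2 in range(dA):
                out[i][i2] += sum(
                    complex(vec[k]).conjugate() * X[i * dB + k][i2 * dB + k2] * vec[k2]
                    for k in range(dB)
                    for k2 in range(dB)
                )
    return out


s = 2 ** -0.5
t = 3 ** -0.5
psi = [t, t, 0, t]                       # (|00> + |01> + |11>)/sqrt(3)
rho = outer(psi, psi)

computational = [[1, 0], [0, 1]]         # |0>, |1>
hadamard = [[s, s], [s, -s]]             # |+>, |->
circular = [[s, 1j * s], [s, -1j * s]]   # (|0> + i|1>)/sqrt(2), (|0> - i|1>)/sqrt(2)

results = [
    partial_trace_B_in_basis(rho, 2, basis)
    for basis in (computational, hadamard, circular)
]
# All three should agree with rho_A = [[2/3, 1/3], [1/3, 1/3]] up to rounding.
```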

What comes out is a density operator

If \rho_{AB} is Hermitian, positive semi-definite, and trace one, so is \text{tr}_B(\rho_{AB}). All three inherit directly. Hermiticity: \text{tr}_B commutes with the \dagger operation because it is built from Hermitian-conjugated pieces. Positive semi-definiteness: for any |\phi\rangle_A,

\langle\phi|_A\,\text{tr}_B(\rho_{AB})\,|\phi\rangle_A \;=\; \sum_j \langle\phi|_A\langle j|_B\,\rho_{AB}\,|\phi\rangle_A|j\rangle_B \;\geq\; 0,

because each term is the expectation value of the positive semi-definite \rho_{AB} in the product vector |\phi\rangle_A|j\rangle_B. Unit trace: \text{tr}(\text{tr}_B(\rho_{AB})) = \text{tr}(\rho_{AB}) = 1 by the full-trace identity \text{tr}_A\text{tr}_B = \text{tr}. So the output of the partial trace is always a valid reduced state.

The universal property — why this is the "right" definition

Here is the non-obvious thing. Many linear maps \mathcal B(\mathcal H_A \otimes \mathcal H_B) \to \mathcal B(\mathcal H_A) exist. Why pick this one? The answer is a short theorem that pins it down uniquely.

The universal property of the partial trace

Let \rho_{AB} be any density operator on \mathcal H_A \otimes \mathcal H_B. The partial trace \text{tr}_B is the unique linear map such that, for every operator A on \mathcal H_A,

\text{tr}\bigl(A \cdot \text{tr}_B(\rho_{AB})\bigr) \;=\; \text{tr}\bigl((A \otimes I_B)\,\rho_{AB}\bigr).

In words: the expectation value of any A-local observable, computed on \rho_A = \text{tr}_B(\rho_{AB}), agrees with the expectation value of that observable computed on the joint state \rho_{AB} (by padding A to A \otimes I_B). No other map on density operators has this property.

Why this property forces the definition

Suppose some other map f: \mathcal B(\mathcal H_A \otimes \mathcal H_B) \to \mathcal B(\mathcal H_A) also satisfies \text{tr}(A \cdot f(\rho_{AB})) = \text{tr}((A \otimes I_B)\rho_{AB}) for every Hermitian A. Then

\text{tr}\bigl(A \cdot (f(\rho_{AB}) - \text{tr}_B(\rho_{AB}))\bigr) \;=\; 0 \qquad \text{for every Hermitian } A.

Why this forces f = \text{tr}_B on \rho_{AB}: if the trace of A against an operator M vanishes for every Hermitian A, then M is the zero operator. (Pick A to be each of the basis matrices |i\rangle\langle j| + |j\rangle\langle i| and i(|i\rangle\langle j| - |j\rangle\langle i|) — you get every entry of M one at a time.) So f(\rho_{AB}) = \text{tr}_B(\rho_{AB}).

In plain language: the partial trace is the only way to get a subsystem density operator that reproduces every A-local measurement statistic. If you want a map from joint states to reduced states that is consistent with the measurement rule \langle A\rangle_\rho = \text{tr}(A\rho) for observables that only touch A, you have no choice. The partial trace is forced.

This is the sense in which the partial trace is the "right" way to describe subsystems. It is not a convention; it is the answer to a question — "what reduced state preserves all A-local predictions?" — that has exactly one solution.
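The universal property is easy to test numerically. The sketch below picks an arbitrary entangled two-qubit state (an assumption of this example) and compares \text{tr}(A\,\rho_A) with \text{tr}((A \otimes I_B)\rho_{AB}) for the three Pauli observables; all helper names are illustrative.

```python
def outer(u, v):
    return [[x * complex(y).conjugate() for y in v] for x in u]


def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def partial_trace_B(X, dA, dB):
    return [
        [sum(X[i * dB + k][j * dB + k] for k in range(dB)) for j in range(dA)]
        for i in range(dA)
    ]


I2 = [[1, 0], [0, 1]]
paulis = {
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

t = 3 ** -0.5
psi = [t, 1j * t, 0, t]          # (|00> + i|01> + |11>)/sqrt(3), chosen arbitrarily
rho_AB = outer(psi, psi)
rho_A = partial_trace_B(rho_AB, 2, 2)

checks = {}
for name, P in paulis.items():
    local = trace(matmul(P, rho_A))               # tr(A rho_A)
    joint = trace(matmul(kron(P, I2), rho_AB))    # tr((A (x) I_B) rho_AB)
    checks[name] = (local, joint)
```

For this state the two numbers agree for each Pauli, for instance both Z values come out as 1/3 up to rounding. Since the Paulis together with the identity span the 2 \times 2 Hermitian matrices, agreement on them fixes \rho_A completely, which is the uniqueness argument in miniature.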

The no-signalling corollary

A direct consequence: if Bob applies any unitary U_B to his subsystem, or even measures it without telling Alice the outcome, Alice's reduced state \rho_A is unchanged. Compute \text{tr}_B((I_A \otimes U_B)\rho_{AB}(I_A \otimes U_B^\dagger)): operators that act only on B can be cycled inside the partial trace, and U_B^\dagger U_B = I_B collapses the expression to \text{tr}_B(\rho_{AB}) = \rho_A. The same argument covers a measurement with operators M_m whose outcome is discarded, since \sum_m M_m^\dagger M_m = I_B. Why this is no-signalling: if Bob could change \rho_A by acting on his qubit alone, Alice could detect it by measuring her qubit, giving them a way to communicate instantaneously. The universal property of the partial trace rules this out algebraically. The same fact, read at the level of physics, is the no-communication theorem.
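A small numerical illustration of the corollary, under two assumptions made for this sketch: Bob's unitary is a Hadamard gate, and his measurement is a projective Z measurement whose outcome is discarded (so Alice's description sums over both branches). Helper names are hypothetical.

```python
def outer(u, v):
    return [[x * complex(y).conjugate() for y in v] for x in u]


def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def partial_trace_B(X, dA, dB):
    return [
        [sum(X[i * dB + k][j * dB + k] for k in range(dB)) for j in range(dA)]
        for i in range(dA)
    ]


def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


s, t = 2 ** -0.5, 3 ** -0.5
I2 = [[1, 0], [0, 1]]
H = [[s, s], [s, -s]]                          # Bob's unitary
P0, P1 = [[1, 0], [0, 0]], [[0, 0], [0, 1]]    # Bob's Z-measurement projectors

psi = [t, t, 0, t]                             # (|00> + |01> + |11>)/sqrt(3)
rho_AB = outer(psi, psi)
rho_A = partial_trace_B(rho_AB, 2, 2)


def apply_on_B(K, rho):
    """(I_A (x) K) rho (I_A (x) K^dagger)."""
    IK = kron(I2, K)
    return matmul(matmul(IK, rho), dagger(IK))


after_unitary = apply_on_B(H, rho_AB)
# Measurement with the outcome discarded: add the two unnormalised branches.
after_measure = [
    [x + y for x, y in zip(r0, r1)]
    for r0, r1 in zip(apply_on_B(P0, rho_AB), apply_on_B(P1, rho_AB))
]

unchanged_by_unitary = close(partial_trace_B(after_unitary, 2, 2), rho_A)
unchanged_by_measurement = close(partial_trace_B(after_measure, 2, 2), rho_A)
```

Both flags come out true: the joint state changes, but Alice's reduction does not.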

Computational rules in practice

A small zoo of identities makes partial traces easy to compute, once you know the rule on outer products.

Product states

For a product state \rho_{AB} = \rho_A \otimes \rho_B,

\text{tr}_B(\rho_A \otimes \rho_B) \;=\; \rho_A \cdot \text{tr}(\rho_B) \;=\; \rho_A.

Why the trace of \rho_B enters and then becomes 1: expand \rho_B = \sum_i q_i |e_i\rangle\langle e_i|_B, apply the rule on each tile, sum. The result is \rho_A \cdot \sum_i q_i = \rho_A \cdot 1 = \rho_A, because \rho_B is a density operator and its trace is exactly 1.

Product states are the "trivial" case: tracing out doesn't mix anything, and the surviving subsystem just keeps its own state. The interesting phenomenon — reduction of pure to mixed — happens only when the joint state is entangled.
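A short check of the product-state rule, with two single-qubit density matrices invented for this sketch (any valid pair would do):

```python
def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def partial_trace_B(X, dA, dB):
    return [
        [sum(X[i * dB + k][j * dB + k] for k in range(dB)) for j in range(dA)]
        for i in range(dA)
    ]


# Illustrative inputs: Hermitian, trace one, positive determinant.
rho_A = [[0.75, 0.25], [0.25, 0.25]]
rho_B = [[0.6, 0.2j], [-0.2j, 0.4]]

reduced = partial_trace_B(kron(rho_A, rho_B), 2, 2)

# Each entry of rho_A is multiplied by tr(rho_B) = 0.6 + 0.4 = 1.
matches = all(abs(reduced[i][j] - rho_A[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(matches)  # True
```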

Composition and order invariance

For a three-subsystem state \rho_{ABC}, tracing out B and C gives

\text{tr}_{BC}(\rho_{ABC}) \;=\; \text{tr}_B\bigl(\text{tr}_C(\rho_{ABC})\bigr) \;=\; \text{tr}_C\bigl(\text{tr}_B(\rho_{ABC})\bigr).

The order of tracing out does not matter. You can trace out B first, then C, or C first, then B, and you get the same \rho_A.

Why the order doesn't matter: each partial trace is a sum over a basis of the subsystem being traced out, so tracing out two different subsystems is a double sum over two independent bases, and the order of a finite double sum can be swapped. Formally, \text{tr}_{BC} = \text{tr}_B \circ \text{tr}_C = \text{tr}_C \circ \text{tr}_B because the two partial-trace operations act on disjoint tensor factors.

This generalises: for \rho_{A_1 \ldots A_n}, you can trace out any subset S \subseteq \{1,\ldots,n\} in any order and the result is the same \rho_{A_{\bar S}} on the remaining subsystems.
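Order invariance can be sketched with a generic routine that traces out the k-th factor of a multi-party operator. The function `trace_out`, its `dims` argument, and the deliberately asymmetric three-qubit test state are all assumptions made for this example.

```python
from math import prod


def outer(u, v):
    return [[x * complex(y).conjugate() for y in v] for x in u]


def trace_out(rho, dims, k):
    """Trace out subsystem k of an operator on subsystems with dimensions dims."""
    left, mid, right = prod(dims[:k]), dims[k], prod(dims[k + 1:])
    n = left * right
    out = [[0] * n for _ in range(n)]
    for p in range(left):
        for r in range(right):
            for q in range(left):
                for s in range(right):
                    # Full index = (left index * mid + traced index) * right + right index.
                    out[p * right + r][q * right + s] = sum(
                        rho[(p * mid + m) * right + r][(q * mid + m) * right + s]
                        for m in range(mid)
                    )
    return out


amps = [1, 2, 3, 0, 0, 1, 2, 1]                 # unnormalised amplitudes on |abc>
norm = sum(x * x for x in amps) ** 0.5
psi = [x / norm for x in amps]
rho_ABC = outer(psi, psi)

rho_AC = trace_out(rho_ABC, [2, 2, 2], 1)       # trace out B first ...
rho_A_via_B_then_C = trace_out(rho_AC, [2, 2], 1)   # ... then C

rho_AB = trace_out(rho_ABC, [2, 2, 2], 2)       # trace out C first ...
rho_A_via_C_then_B = trace_out(rho_AB, [2, 2], 1)   # ... then B

same = all(
    abs(rho_A_via_B_then_C[i][j] - rho_A_via_C_then_B[i][j]) < 1e-12
    for i in range(2)
    for j in range(2)
)
print(same)  # True
```

Note that after the first trace the surviving subsystems are renumbered, which is why the second call uses index 1 in both orders.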

The block-trace recipe

The cleanest way to compute a partial trace by hand, for a matrix written in the computational basis, is the block-trace rule you saw in chapter 9. Any two-qubit 4 \times 4 density matrix can be partitioned into a 2 \times 2 grid of 2 \times 2 blocks:

\rho_{AB} \;=\; \begin{pmatrix} M_{00} & M_{01} \\ M_{10} & M_{11} \end{pmatrix},

indexed by the basis of A. Tracing out B replaces each block by its trace:

\text{tr}_B(\rho_{AB}) \;=\; \begin{pmatrix} \text{tr}(M_{00}) & \text{tr}(M_{01}) \\ \text{tr}(M_{10}) & \text{tr}(M_{11}) \end{pmatrix}.

Tracing out A instead sums the diagonal blocks:

\text{tr}_A(\rho_{AB}) \;=\; M_{00} + M_{11}.

Why the asymmetry: "tracing out A" keeps B, and the B-block at A-index (i,i) is M_{ii}. Summing over i is the trace over A. "Tracing out B" keeps A, and the A-element at (i,j) is the trace of the B-block M_{ij}. Different operation, different formula; both are special cases of the outer-product rule.

You will use both variants in the worked examples below.
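The two recipes can be sketched directly on a 4 \times 4 matrix. The example state below, an equal mixture of |0\rangle\langle 0| \otimes |+\rangle\langle +| and |1\rangle\langle 1| \otimes |0\rangle\langle 0|, is chosen for this illustration because its two reductions differ; the function names are hypothetical.

```python
def blocks(rho):
    """Split a 4x4 matrix into 2x2 blocks M[i][j], indexed by the basis of A."""
    return [
        [[row[2 * j:2 * j + 2] for row in rho[2 * i:2 * i + 2]] for j in range(2)]
        for i in range(2)
    ]


def trace_out_B(rho):
    """Replace every block by its trace."""
    M = blocks(rho)
    return [[M[i][j][0][0] + M[i][j][1][1] for j in range(2)] for i in range(2)]


def trace_out_A(rho):
    """Sum the diagonal blocks M00 + M11."""
    M = blocks(rho)
    return [[M[0][0][r][c] + M[1][1][r][c] for c in range(2)] for r in range(2)]


# 0.5 * |0><0| (x) |+><+|  +  0.5 * |1><1| (x) |0><0|
rho_AB = [
    [0.25, 0.25, 0, 0],
    [0.25, 0.25, 0, 0],
    [0, 0, 0.5, 0],
    [0, 0, 0, 0],
]

print(trace_out_B(rho_AB))  # [[0.5, 0], [0, 0.5]]
print(trace_out_A(rho_AB))  # [[0.75, 0.25], [0.25, 0.25]]
```

Alice's reduction is maximally mixed while Bob's is not, so the asymmetry between the two formulas is visible in the output.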

Two block-trace recipes. Tracing out $B$ (right arrow) replaces every 2×2 block by its scalar trace. Tracing out $A$ (left arrow) sums the diagonal blocks. Both are instances of the outer-product rule applied to a full 4×4 matrix.

Purity, entanglement, and the structure of the reduced state

A single scalar — the purity \text{tr}(\rho_A^2) of the reduction — separates entangled joint pure states from product joint pure states. This is the main diagnostic tool you get from partial tracing.

The three cases for a bipartite pure state

Start with a pure joint state \rho_{AB} = |\psi\rangle\langle\psi|_{AB}, and let \rho_A = \text{tr}_B(\rho_{AB}).

\rho_A pure, \rho_B pure iff |\psi\rangle_{AB} is a product state. In this case |\psi\rangle = |a\rangle_A \otimes |b\rangle_B, \rho_A = |a\rangle\langle a|, \rho_B = |b\rangle\langle b|, and there is no entanglement.
\rho_A mixed, \rho_B mixed iff |\psi\rangle_{AB} is entangled. The purity \text{tr}(\rho_A^2) < 1 measures the depth of entanglement: it falls to its minimum 1/d, with d = \min(d_A, d_B), for a maximally entangled state, and equals 1 only at the product boundary.
The mixed-pure case (one reduction pure, the other mixed) never occurs for a bipartite pure state. From the Schmidt decomposition |\psi\rangle_{AB} = \sum_i \lambda_i |u_i\rangle_A \otimes |v_i\rangle_B, both reductions \rho_A = \sum_i \lambda_i^2 |u_i\rangle\langle u_i| and \rho_B = \sum_i \lambda_i^2 |v_i\rangle\langle v_i| have the same non-zero spectrum \{\lambda_i^2\}. So they have the same purity, same rank, same entropy.

Why the two reductions have identical spectra: the Schmidt decomposition is the SVD of the amplitude tensor c_{ij} of |\psi\rangle_{AB} = \sum_{ij} c_{ij} |i\rangle_A|j\rangle_B. The left and right singular vectors are the Schmidt kets; the singular values \lambda_i are squared to give the eigenvalues of both reductions. Same singular values → same eigenvalues on both sides.

Purity as entanglement detector

For a bipartite pure state:

\text{tr}(\rho_A^2) \;=\; 1 \;\iff\; |\psi\rangle_{AB} \text{ is a product state (not entangled)}.

\text{tr}(\rho_A^2) \;<\; 1 \;\iff\; |\psi\rangle_{AB} \text{ is entangled}.

This is the test. You can run it mechanically: compute \rho_A by partial trace, square it, take the trace, compare to 1. The distance 1 - \text{tr}(\rho_A^2) measures roughly how entangled you are; the precise entanglement measure is the von Neumann entropy S(\rho_A) = -\text{tr}(\rho_A \log \rho_A), zero for products and increasing with entanglement.
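The mechanical version of the test might look like the sketch below. It computes the purity of both reductions for three two-qubit pure states chosen for illustration; the helper names are assumptions.

```python
def outer(u, v):
    return [[x * complex(y).conjugate() for y in v] for x in u]


def partial_trace_B(X, dA, dB):
    return [
        [sum(X[i * dB + k][j * dB + k] for k in range(dB)) for j in range(dA)]
        for i in range(dA)
    ]


def partial_trace_A(X, dA, dB):
    return [
        [sum(X[i * dB + k][i * dB + m] for i in range(dA)) for m in range(dB)]
        for k in range(dB)
    ]


def purity(rho):
    """tr(rho^2), computed as sum_ij rho_ij * rho_ji."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n)).real


def reduction_purities(psi):
    rho = outer(psi, psi)
    return purity(partial_trace_B(rho, 2, 2)), purity(partial_trace_A(rho, 2, 2))


s = 2 ** -0.5
states = {
    "product |0>|+>": [s, s, 0, 0],
    "Bell |Phi+>": [s, 0, 0, s],
    "partly entangled": [0.8 ** 0.5, 0, 0, 0.2 ** 0.5],
}
report = {name: reduction_purities(psi) for name, psi in states.items()}
for name, (pur_A, pur_B) in report.items():
    print(f"{name}: tr(rho_A^2) = {pur_A:.3f}, tr(rho_B^2) = {pur_B:.3f}")
```

Up to rounding, the product state gives purity 1, the Bell state gives 1/2, and the partly entangled state gives 0.8^2 + 0.2^2 = 0.68. In each case the two reductions have the same purity, as the Schmidt argument predicts.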

For mixed joint states, the test breaks down: a mixed \rho_{AB} can have mixed reductions without being entangled (e.g. the two-qubit product state I/2 \otimes I/2 = I/4 has maximally mixed reductions, but it is not entangled). Separating mixed-state entanglement from classical mixture is a harder problem, and the partial-trace purity test is not enough. For bipartite pure states, though, the test is exact, which is why purity-of-reduction is the cleanest entanglement diagnostic in the textbook.

Worked examples

Example 1: Partial trace on the Bell state $|\Phi^+\rangle$

Compute \rho_A = \text{tr}_B(|\Phi^+\rangle\langle\Phi^+|) for

|\Phi^+\rangle \;=\; \frac{|00\rangle + |11\rangle}{\sqrt 2}.

Verify the answer by the universal property — check that \text{tr}(A\,\rho_A) = \text{tr}((A\otimes I)\rho_{AB}) for the observables A = Z and A = X. Interpret: the reduction is maximally mixed; the Bell state's entanglement is maximal.

Step 1. Form the density matrix. Distribute the outer product:

\rho_{\Phi^+} \;=\; \tfrac{1}{2}\bigl(|00\rangle + |11\rangle\bigr)\bigl(\langle 00| + \langle 11|\bigr) \;=\; \tfrac{1}{2}\bigl(|00\rangle\langle 00| + |00\rangle\langle 11| + |11\rangle\langle 00| + |11\rangle\langle 11|\bigr).

Step 2. Apply the outer-product rule term by term. Each term is a tile |ab\rangle\langle cd| = |a\rangle\langle c|_A \otimes |b\rangle\langle d|_B, and the rule gives \text{tr}_B(|a\rangle\langle c|_A \otimes |b\rangle\langle d|_B) = \langle d|b\rangle\,|a\rangle\langle c|_A.

\text{tr}_B(|00\rangle\langle 00|) \;=\; \langle 0|0\rangle\cdot|0\rangle\langle 0|_A \;=\; |0\rangle\langle 0|_A.

\text{tr}_B(|00\rangle\langle 11|) \;=\; \langle 1|0\rangle\cdot|0\rangle\langle 1|_A \;=\; 0.

\text{tr}_B(|11\rangle\langle 00|) \;=\; \langle 0|1\rangle\cdot|1\rangle\langle 0|_A \;=\; 0.

\text{tr}_B(|11\rangle\langle 11|) \;=\; \langle 1|1\rangle\cdot|1\rangle\langle 1|_A \;=\; |1\rangle\langle 1|_A.

Why the cross-terms vanish: the coherence |00\rangle\langle 11| has B-factor |0\rangle\langle 1|_B, and \text{tr}(|0\rangle\langle 1|) = \langle 1|0\rangle = 0 because |0\rangle, |1\rangle are orthogonal. The off-diagonal coherences of the Bell state live entirely in these cross-terms, and the partial trace kills them.

Step 3. Assemble the sum.

\rho_A \;=\; \tfrac{1}{2}\bigl(|0\rangle\langle 0|_A + |1\rangle\langle 1|_A\bigr) \;=\; \tfrac{1}{2}\,I_A \;=\; \frac{I}{2}.

Step 4. Verify via the universal property for A = Z.

\text{tr}(Z \cdot I/2) \;=\; \tfrac{1}{2}\,\text{tr}(Z) \;=\; 0.

And from the joint state:

\text{tr}\bigl((Z \otimes I)\,\rho_{\Phi^+}\bigr) \;=\; \langle\Phi^+|\,Z\otimes I\,|\Phi^+\rangle \;=\; \tfrac{1}{2}(\langle 00| + \langle 11|)(Z\otimes I)(|00\rangle + |11\rangle).

Apply (Z\otimes I)|00\rangle = |00\rangle and (Z\otimes I)|11\rangle = -|11\rangle:

= \tfrac{1}{2}(\langle 00| + \langle 11|)(|00\rangle - |11\rangle) \;=\; \tfrac{1}{2}(1 - 1) \;=\; 0. \checkmark

Why both sides vanish: \rho_A = I/2 is symmetric in the \{|0\rangle, |1\rangle\} basis, so any traceless observable like Z averages to zero. Independently, the joint state |\Phi^+\rangle has perfect correlation between A and B, so "Z on A alone" has zero expectation — the +1 outcome from |00\rangle cancels the -1 outcome from |11\rangle.

Step 5. Verify for A = X.

\text{tr}(X \cdot I/2) \;=\; \tfrac{1}{2}\text{tr}(X) \;=\; 0.

From the joint state: (X \otimes I)|00\rangle = |10\rangle, (X\otimes I)|11\rangle = |01\rangle, so

\text{tr}\bigl((X \otimes I)\rho_{\Phi^+}\bigr) \;=\; \tfrac{1}{2}(\langle 00| + \langle 11|)(|10\rangle + |01\rangle) \;=\; \tfrac{1}{2}(0 + 0) \;=\; 0. \checkmark

Step 6. Check purity. \text{tr}(\rho_A^2) = \text{tr}((I/2)^2) = \text{tr}(I/4) = 2/4 = 1/2. That is the minimum possible for a qubit — the reduction is maximally mixed, the Bloch vector is \vec 0, and by the entanglement-detector rule, |\Phi^+\rangle is maximally entangled.

Result. \rho_A = I/2. Alice's reduced state on her half of the Bell pair is the center of the Bloch ball — indistinguishable, by local measurement, from a fair-coin ensemble \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| or any other ensemble with \vec r = 0.

The Bell state $|\Phi^+\rangle$ is a pure joint state. Its reduction to a single qubit is the maximally mixed state $I/2$ — the origin of the Bloch ball. All of the coherence of the joint state lived in the off-diagonal blocks, which partial trace discarded. Alice sees a fair coin.

What this shows. The Bell state is maximally entangled in the sharpest possible sense: the local description on either side carries no information at all. Every A-local observable A averages to \text{tr}(A)/2: its traceless part contributes zero and only its identity component survives, so the statistics are indistinguishable from pure randomness. The quantum structure of the Bell pair lives entirely in the correlations between the two halves, invisible to any single-side viewer.
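The tile-by-tile bookkeeping of Example 1 can be mirrored in a few lines. The dictionary layout (string labels for kets and bras) is an assumption of this sketch, chosen to look like the hand calculation.

```python
# Step 1: |Phi+><Phi+| as a dictionary {(ket, bra): coefficient}.
terms = {
    ("00", "00"): 0.5,
    ("00", "11"): 0.5,
    ("11", "00"): 0.5,
    ("11", "11"): 0.5,
}

# Steps 2 and 3: each tile |ab><cd| contributes <d|b> |a><c| to rho_A.
rho_A = [[0.0, 0.0], [0.0, 0.0]]
for (ket, bra), coeff in terms.items():
    a, b = int(ket[0]), int(ket[1])
    c, d = int(bra[0]), int(bra[1])
    overlap = 1.0 if b == d else 0.0   # <d|b> for computational basis kets
    rho_A[a][c] += coeff * overlap

# Steps 4 to 6: local expectation values and purity.
exp_Z = rho_A[0][0] - rho_A[1][1]      # tr(Z rho_A)
exp_X = rho_A[0][1] + rho_A[1][0]      # tr(X rho_A)
purity = sum(rho_A[i][j] * rho_A[j][i] for i in range(2) for j in range(2))

print(rho_A, exp_Z, exp_X, purity)  # [[0.5, 0.0], [0.0, 0.5]] 0.0 0.0 0.5
```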

Example 2: Tracing out one qubit of the GHZ state

The three-qubit GHZ state is

|\text{GHZ}\rangle \;=\; \frac{|000\rangle + |111\rangle}{\sqrt 2}.

Trace out qubit 3 (the third subsystem) and describe the reduced two-qubit state \rho_{12}. Compare to the W state reduction for contrast.

Step 1. Form the density matrix. Distribute:

\rho_{\text{GHZ}} \;=\; \tfrac{1}{2}\bigl(|000\rangle\langle 000| + |000\rangle\langle 111| + |111\rangle\langle 000| + |111\rangle\langle 111|\bigr).

Step 2. Apply the outer-product rule, tracing on the third subsystem. Each term |abc\rangle\langle a'b'c'| = |ab\rangle\langle a'b'|_{12} \otimes |c\rangle\langle c'|_3, and the rule gives \text{tr}_3(|ab\rangle\langle a'b'|_{12} \otimes |c\rangle\langle c'|_3) = \langle c'|c\rangle\,|ab\rangle\langle a'b'|_{12}.

\text{tr}_3(|000\rangle\langle 000|) = \langle 0|0\rangle\,|00\rangle\langle 00| \;=\; |00\rangle\langle 00|,

\text{tr}_3(|000\rangle\langle 111|) = \langle 1|0\rangle\,|00\rangle\langle 11| \;=\; 0,

\text{tr}_3(|111\rangle\langle 000|) = \langle 0|1\rangle\,|11\rangle\langle 00| \;=\; 0,

\text{tr}_3(|111\rangle\langle 111|) = \langle 1|1\rangle\,|11\rangle\langle 11| \;=\; |11\rangle\langle 11|.

Why the GHZ cross-terms vanish the same way as Bell cross-terms: the third qubit distinguishes |0\rangle from |1\rangle perfectly in the joint state, and tracing it out throws away the coherence between |000\rangle and |111\rangle. The relative phase that distinguished |\text{GHZ}\rangle from \tfrac{1}{\sqrt 2}(|000\rangle - |111\rangle) lived in the coherence; it is gone after partial trace.

Step 3. Assemble the reduction on qubits 1, 2.

\rho_{12} \;=\; \tfrac{1}{2}\bigl(|00\rangle\langle 00|_{12} + |11\rangle\langle 11|_{12}\bigr).

In the basis \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\} the matrix is

\rho_{12} \;=\; \tfrac{1}{2}\begin{pmatrix}1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1\end{pmatrix}.

Step 4. Read the result. \rho_{12} is a classical mixture — a 50{-}50 blend of |00\rangle and |11\rangle, with no coherence between them (no off-diagonal corner entries). This looks very different from the Bell state, which is an equal-amplitude coherent superposition of |00\rangle and |11\rangle on two qubits.

Step 5. Check purity.

\rho_{12}^2 \;=\; \tfrac{1}{4}\bigl(|00\rangle\langle 00| + |11\rangle\langle 11|\bigr), \qquad \text{tr}(\rho_{12}^2) \;=\; \tfrac{1}{4} + \tfrac{1}{4} \;=\; \tfrac{1}{2}.

Genuinely mixed. Rank 2. The reduction is not pure, so the GHZ state is entangled in the (qubit 3) vs (qubits 1, 2) bipartition — exactly as expected.

Step 6. Is \rho_{12} itself entangled across qubit 1 vs qubit 2? No: it is a convex combination of two product states |00\rangle\langle 00| = |0\rangle\langle 0|_1 \otimes |0\rangle\langle 0|_2 and |11\rangle\langle 11| = |1\rangle\langle 1|_1 \otimes |1\rangle\langle 1|_2, each of which is separable. So \rho_{12} is a separable mixed state — classically correlated, but not entangled. This is the remarkable feature of GHZ: when you trace out one party, the remaining bipartite state loses all quantum correlations and becomes classical.

Result. \rho_{12} = \tfrac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|) — a classical mixture, rank 2, purity 1/2, separable across qubits 1 and 2.

Tracing out one qubit of the GHZ state leaves a purely classical 50-50 mixture of $|00\rangle$ and $|11\rangle$ on the remaining two qubits. The $(|00\rangle, |11\rangle)$ correlation survives; the quantum coherence between them does not. The reduced state is separable — classically correlated, not entangled.
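The GHZ reduction can be reproduced numerically. This is a sketch assuming real amplitudes and the usual binary ordering of basis states, with a hypothetical helper `trace_out_last_qubit`.

```python
def trace_out_last_qubit(rho):
    """Trace out the last qubit: (rho_rest)[r][c] = sum_k rho[2r + k][2c + k]."""
    n = len(rho) // 2
    return [
        [rho[2 * r][2 * c] + rho[2 * r + 1][2 * c + 1] for c in range(n)]
        for r in range(n)
    ]


s = 2 ** -0.5
ghz = [s, 0, 0, 0, 0, 0, 0, s]                # (|000> + |111>)/sqrt(2)
rho_ghz = [[x * y for y in ghz] for x in ghz]  # real amplitudes, no conjugation needed

rho_12 = trace_out_last_qubit(rho_ghz)

populations = [rho_12[i][i] for i in range(4)]   # close to [0.5, 0, 0, 0.5]
coherence = rho_12[0][3]                         # <00| rho_12 |11>, exactly 0
purity = sum(rho_12[i][j] * rho_12[j][i] for i in range(4) for j in range(4))
```

The corner entry of the three-qubit matrix, `rho_ghz[0][7]`, is about 1/2, but the corresponding corner `rho_12[0][3]` of the reduction is zero: the coherence sits between basis states that differ on qubit 3, so the trace never picks it up.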

Comparison with the W state. The three-qubit W state is

|\text{W}\rangle \;=\; \frac{|001\rangle + |010\rangle + |100\rangle}{\sqrt 3}.

Tracing out qubit 3 of the W state, by the same rule, gives

\rho_{12}^{\text{W}} \;=\; \frac{1}{3}\bigl(|01\rangle\langle 01| + |01\rangle\langle 10| + |10\rangle\langle 01| + |10\rangle\langle 10| + |00\rangle\langle 00|\bigr),

which has a non-trivial off-diagonal entry between |01\rangle and |10\rangle and is itself entangled across qubits 1 and 2.

How this comes out: the |100\rangle piece keeps its |10\rangle on qubits 1-2, with the |0\rangle on qubit 3 collapsing to \langle 0|0\rangle = 1. Similarly for |010\rangle \to |01\rangle, with qubit-3 |0\rangle collapsing. The |001\rangle piece keeps |00\rangle on qubits 1-2, with the qubit-3 |1\rangle collapsing to \langle 1|1\rangle = 1. All diagonal terms contribute; the |001\rangle vs |010\rangle and |001\rangle vs |100\rangle cross-terms vanish (qubit-3 mismatch), but the |010\rangle vs |100\rangle cross-term survives because both have qubit-3 state |0\rangle.

This is the famous structural contrast: GHZ entanglement is all-or-nothing, so tracing out one party collapses it to a classical mixture; W entanglement is robust, so tracing out one party leaves a still-entangled pair on the others.
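The W-state reduction, and the claim that it stays entangled, can be checked with a short sketch. The entanglement check uses the Peres-Horodecki (partial transpose) criterion, which the text itself does not invoke: if the partial transpose of a two-qubit state has a negative eigenvalue, the state is entangled. Helper names are hypothetical.

```python
def trace_out_last_qubit(rho):
    """Trace out the last qubit: (rho_rest)[r][c] = sum_k rho[2r + k][2c + k]."""
    n = len(rho) // 2
    return [
        [rho[2 * r][2 * c] + rho[2 * r + 1][2 * c + 1] for c in range(n)]
        for r in range(n)
    ]


t = 3 ** -0.5
w = [0, t, t, 0, t, 0, 0, 0]                 # (|001> + |010> + |100>)/sqrt(3)
rho_w = [[x * y for y in w] for x in w]      # real amplitudes
rho_12 = trace_out_last_qubit(rho_w)

coherence = rho_12[1][2]                     # <01| rho_12 |10>, about 1/3
purity = sum(rho_12[i][j] * rho_12[j][i] for i in range(4) for j in range(4))  # about 5/9

# Partial transpose on qubit 2: <ab| rho^T2 |a'b'> = <ab'| rho |a'b>.
pt = [
    [rho_12[2 * (r // 2) + c % 2][2 * (c // 2) + r % 2] for c in range(4)]
    for r in range(4)
]
# Principal 2x2 block of the partial transpose on {|00>, |11>}. A negative
# determinant here forces a negative eigenvalue of the partial transpose.
minor = pt[0][0] * pt[3][3] - pt[0][3] * pt[3][0]   # about -1/9
```

Running the same partial-transpose check on the GHZ reduction \tfrac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|) gives a non-negative matrix, consistent with that state being separable.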

Common confusions

"Partial trace destroys information." Half right. It discards information about subsystem B, and about the correlations between A and B. It does not change the joint state \rho_{AB}: that state still exists, still carries every correlation it ever did. Partial trace is a map that outputs a smaller description; it does not mutate the input. If you still have \rho_{AB} in hand, no information is lost; if you only have \rho_A, then yes, everything about B and the correlations is gone.
"Partial trace is the same as summing over B's values." In the computational basis this looks true: (\rho_A)_{ij} = \sum_k (\rho_{AB})_{ik, jk}, which looks like "sum over the diagonal index of B." But the actual operation is basis-independent: you can trace in any orthonormal basis of \mathcal H_B and get the same answer. The "sum" structure is how the outer-product rule happens to render in one basis; it is not the definition.
"Order matters when tracing out multiple subsystems." No. \text{tr}_B \text{tr}_C = \text{tr}_C \text{tr}_B = \text{tr}_{BC}. Tracing out subsystems B and C, in any order, gives the reduced state \rho_A. The proof is that each partial trace is a sum over a basis of its subsystem, and the two basis sums, over B and over C, commute because they act on different tensor factors.
"If \rho_{AB} is pure and \rho_A is pure, then |\psi\rangle_{AB} = |\psi\rangle_A \otimes |\psi\rangle_B." True in one direction ("product state → pure reductions"), and true in reverse ("pure reduction → product state") when the joint is pure. The combination of "joint pure + reduction pure" is a clean characterisation of bipartite product states. For mixed joint states, though, purity of the reduction stops being an entanglement test: a separable mixed state generally has mixed reductions, so you need finer tools (entanglement measures, separability tests).
"Every mixed single-qubit state must come from tracing out something." Not literally — a classically-uncertain preparation (the coin-flip qubit) gives I/2 without any physical environment. But every mixed state has a purification (next chapter) — an abstract "extended" pure state on a larger space from which the mixed state is obtained by tracing out. That purification may or may not correspond to a real physical system. Mathematically, the partial-trace picture is universal; physically, it is one of several valid origins for mixture.
"The partial trace is only for qubits." The partial trace is defined on any bipartite Hilbert space \mathcal H_A \otimes \mathcal H_B of any dimensions, including infinite-dimensional continuous-variable systems (where the sum becomes an integral). The outer-product rule is the same.

Going deeper

If you came here to know what partial trace is, when it matters, and how to compute tr_B on Bell, GHZ and W states — you have it. The rest of the section explores the deeper structure: a proof of uniqueness from the completely-positive-trace-preserving (CPTP) map framework, the operator-Schmidt decomposition, the Choi-Jamiolkowski isomorphism, partial trace as a quantum channel, and information-theoretic characterisations.

Uniqueness of partial trace — the full argument

The universal property \text{tr}(A \cdot \text{tr}_B(\rho_{AB})) = \text{tr}((A\otimes I)\rho_{AB}) can be read as: there is a unique linear map \mathcal E such that \text{tr}(A \cdot \mathcal E(\rho_{AB})) = \text{tr}((A \otimes I)\rho_{AB}) for every A. The argument has three pieces. (1) Existence: the map \text{tr}_B defined by the outer-product rule satisfies the identity. Straight calculation on tiles. (2) Uniqueness: if \mathcal E, \mathcal E' both satisfy the identity, then \text{tr}(A(\mathcal E - \mathcal E')(\rho_{AB})) = 0 for every A; choose A ranging over a basis of Hermitian matrices (e.g. Pauli-tensor basis for qubits) and conclude (\mathcal E - \mathcal E')(\rho_{AB}) = 0. (3) Range over all \rho_{AB}: since density operators span the whole operator space, \mathcal E = \mathcal E' as linear maps.

A cleaner statement: the partial trace is the dual (in the trace-pairing \langle A, M\rangle = \text{tr}(A^\dagger M)) of the embedding A \mapsto A \otimes I from \mathcal B(\mathcal H_A) \to \mathcal B(\mathcal H_A \otimes \mathcal H_B). That embedding is a linear injection; its adjoint is the partial trace. This framing is what generalises to the operator-Schmidt and Choi-Jamiolkowski structures below.

Operator-Schmidt decomposition

For any operator X on \mathcal H_A \otimes \mathcal H_B, there is an operator-Schmidt decomposition analogous to the state-Schmidt form:

X \;=\; \sum_k s_k\,A_k \otimes B_k,

where \{A_k\} and \{B_k\} are orthonormal in the Hilbert-Schmidt inner product (\langle A, A'\rangle = \text{tr}(A^\dagger A')) and s_k \geq 0. The partial trace in this basis is immediate:

\text{tr}_B(X) \;=\; \sum_k s_k\,A_k\,\text{tr}(B_k).

Since \text{tr}(B_k) = \sqrt{d_B}\,\langle I/\sqrt{d_B}, B_k\rangle, each term survives only through the overlap of B_k with the normalised identity. If one of the B_k is proportional to the identity, the others, being Hilbert-Schmidt orthogonal to it, are traceless, and only that one term survives. This is why the partial trace of a "noise" operator typically discards every traceless piece and leaves only the identity component on the traced-out subsystem.

Choi-Jamiolkowski and partial trace as a channel

The partial trace \text{tr}_B : \mathcal B(\mathcal H_A \otimes \mathcal H_B) \to \mathcal B(\mathcal H_A) is a completely positive, trace-preserving (CPTP) map — a quantum channel. It preserves Hermiticity, positivity, and trace; it extends completely positively to any reference system (\text{tr}_B \otimes I_R maps \rho_{ABR} to \rho_{AR} and is itself positive). So partial trace is a special case of the general quantum-channel framework developed in chapters 108-110 on open quantum evolution.

Through the Choi-Jamiolkowski isomorphism, the partial trace corresponds to the operator

J_{\text{tr}_B} \;=\; I_B \otimes |I_A\rangle\!\rangle\langle\!\langle I_A|,

where |I_A\rangle\!\rangle = \sum_i |i\rangle_A|i\rangle_{A'} is the (unnormalised) maximally entangled vector in \mathcal H_A \otimes \mathcal H_{A'}, with A the input copy and A' the output copy. Dividing by d_A d_B gives a unit-trace state: maximally mixed on B, maximally entangled between A and A'. The channel-state duality makes the partial trace one of the "simplest" CPTP maps: its Kraus operators are just the I_A \otimes \langle j|_B from the definition, with no dephasing noise, only a clean contraction on the B-index.

Information-theoretic meaning

From quantum information theory: the partial trace is the operation of forgetting a subsystem. If \rho_{AB} has von Neumann entropy S(\rho_{AB}), the reduction satisfies the subadditivity inequality

S(\rho_A) + S(\rho_B) \;\geq\; S(\rho_{AB}),

with equality iff \rho_{AB} = \rho_A \otimes \rho_B (i.e., no correlations). The mutual information I(A:B) = S(\rho_A) + S(\rho_B) - S(\rho_{AB}) is exactly the information destroyed when you replace the joint state with the pair of marginals — the amount by which partial tracing of each side separately loses the correlation structure.

For pure joint states the bookkeeping is exact: S(\rho_{AB}) = 0 (pure), so I(A:B) = 2 S(\rho_A) = 2 S(\rho_B), twice the entanglement entropy. The mutual information double-counts the entanglement, and partial tracing is the way that counting gets concretely realised.
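For two-qubit pure states the entropy bookkeeping can be done in closed form, since a 2 \times 2 Hermitian matrix has eigenvalues fixed by its trace and determinant. The sketch below assumes logarithms in base 2 (entropy in bits), which the text leaves unspecified, and uses I(A:B) = 2 S(\rho_A) for a pure joint state; function names are illustrative.

```python
import math


def reduced_A(psi):
    """rho_A for a two-qubit pure state with amplitudes psi[2*i + k]."""
    return [
        [sum(psi[2 * i + k] * complex(psi[2 * j + k]).conjugate() for k in range(2))
         for j in range(2)]
        for i in range(2)
    ]


def eigvals_2x2(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return [(tr + disc) / 2, (tr - disc) / 2]


def entropy_bits(rho):
    """Von Neumann entropy in bits, skipping zero eigenvalues."""
    return -sum(lam * math.log2(lam) for lam in eigvals_2x2(rho) if lam > 1e-15)


s = 2 ** -0.5
states = {
    "product": [1, 0, 0, 0],
    "Bell": [s, 0, 0, s],
    "partly entangled": [0.8 ** 0.5, 0, 0, 0.2 ** 0.5],
}
# For a pure joint state S(rho_AB) = 0, so I(A:B) = 2 * S(rho_A).
mutual_info = {name: 2 * entropy_bits(reduced_A(psi)) for name, psi in states.items()}
```

The Bell state gives S(\rho_A) = 1 bit and I(A:B) = 2 bits, the product state gives 0, and the partly entangled state lands in between.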

Indian labs doing partial-trace experiments

Partial-trace-based tomography is daily work at the Indian NMR quantum computing groups (TIFR Mumbai and IIT Madras), where multi-qubit NMR states are characterised by reconstructing reduced density matrices via Pauli-expectation measurements. The partial-trace operation is what turns a full n-qubit dataset, a 2^n \times 2^n density matrix, into the single-qubit and two-qubit marginals that reveal local noise sources. More recently, groups at the Raman Research Institute, Bangalore have used partial-trace-based entanglement witnesses in polarisation-qubit photonic experiments. When Indian experimentalists report "single-qubit fidelities" in a multi-qubit setup, the underlying object is always a partial-traced reduction: the whole edifice of noise characterisation rests on partial-trace arithmetic.

Where this leads next

Purification — the inverse of partial trace: every mixed state is the reduction of some pure state on a larger space.
Partial trace (chapter 9) — the original introduction, with block-trace mechanics and the Bell-state reduction.
Density operator — the object on which the partial trace acts.
Schmidt decomposition — the decomposition that makes partial-trace computation immediate for bipartite pure states.
GHZ and W states — the three-qubit entangled states whose partial traces showed up in Example 2.
Evolution of \rho — closed-system dynamics, and the forward reference to channels where partial trace becomes a special CPTP map.
