In short

Every bipartite pure state |\psi\rangle_{AB} has a Schmidt decomposition — a canonical form where the amplitudes become diagonal against a special choice of bases:

|\psi\rangle_{AB} \;=\; \sum_{i=1}^{r} \lambda_i\,|u_i\rangle_A \otimes |v_i\rangle_B,

with \lambda_i > 0, \sum_i \lambda_i^2 = 1, and \{|u_i\rangle\}, \{|v_i\rangle\} orthonormal on their respective sides. The integer r is the Schmidt rank — the length of the sum. Rank 1 means the state is a product (no entanglement); rank \geq 2 means the state is entangled. The \lambda_i, called Schmidt coefficients, are the singular values of the amplitude matrix, and their squares are the eigenvalues of the reduced density matrix \rho_A. The entanglement entropy S(\rho_A) = -\sum_i \lambda_i^2 \log_2 \lambda_i^2 measures, in bits, how entangled the state is. Schmidt decomposition is the bipartite entanglement story in one canonical form — the analogue, for quantum states, of diagonalising a matrix.

You have met the Bell state |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) — two terms, symmetric, beautiful. You have met an arbitrary two-qubit state |\psi\rangle = c_{00}|00\rangle + c_{01}|01\rangle + c_{10}|10\rangle + c_{11}|11\rangle — four amplitudes, no obvious structure, entangled or not depending on whether the amplitudes factor.

Stand those two examples next to each other and a question appears. The Bell state's amplitudes show their structure on sight — you can see the entanglement in the fact that only |00\rangle and |11\rangle appear, and with equal amplitude. The generic state's amplitudes hide whatever structure it has behind four complex numbers. Is there a way to always write a bipartite pure state in a form where the entanglement structure is visible? Where you can read off "this state is a product" or "this state is entangled with such-and-such strength" from the coefficients alone?

Yes. The form is the Schmidt decomposition, and it says something remarkable: for any bipartite pure state, there exist orthonormal bases on each side — not the computational basis in general, but some other basis tailored to the state — in which the amplitudes are real, non-negative, and diagonal. The off-diagonal entries are all zero. The state becomes a sum of "matched" product terms, one per diagonal entry, with the size of each entry measuring how much that term contributes.

This is the entanglement analogue of diagonalising a matrix. It exists for every bipartite pure state. It is unique up to degeneracies. It is the bridge between the amplitude picture of a state and the spectral picture of its reduced density matrix. And it is the first tool you should reach for whenever someone says the phrase "bipartite entanglement."

What the theorem says

Let Alice hold a d_A-dimensional system and Bob hold a d_B-dimensional system. A joint pure state |\psi\rangle_{AB} lives in the tensor-product Hilbert space \mathcal{H}_A \otimes \mathcal{H}_B of dimension d_A d_B.

Schmidt decomposition

For every pure state |\psi\rangle_{AB} \in \mathcal{H}_A \otimes \mathcal{H}_B, there exist orthonormal bases \{|u_1\rangle, \dots, |u_{d_A}\rangle\} of \mathcal{H}_A and \{|v_1\rangle, \dots, |v_{d_B}\rangle\} of \mathcal{H}_B, and non-negative real numbers \lambda_1 \geq \lambda_2 \geq \cdots \geq 0, such that

|\psi\rangle_{AB} \;=\; \sum_{i=1}^{\min(d_A, d_B)} \lambda_i\,|u_i\rangle_A \otimes |v_i\rangle_B,

with \sum_i \lambda_i^2 = 1. The number r of strictly positive \lambda_i is the Schmidt rank of |\psi\rangle_{AB}; the \lambda_i are the Schmidt coefficients.

Reading the theorem. Three things happen at once. First, the double sum over d_A \cdot d_B amplitudes collapses to a single sum over at most \min(d_A, d_B) terms. Second, the amplitudes are real and non-negative — every complex phase has been absorbed into the basis vectors. Third, the bases on A and B are matched in pairs: the i-th basis vector on A is matched with the i-th basis vector on B, and no other. No cross-terms. The amplitude matrix is diagonal.

That last fact is the full content of the theorem. For a two-qubit state, the generic amplitude matrix is 2\times 2 with four independent complex entries; the Schmidt form is a diagonal matrix with two non-negative real entries. The reduction in complexity is the reduction of entanglement to its bare bones.

[Figure: Schmidt decomposition as a sum of orthogonal product pairs. A generic two-qubit state c₀₀|00⟩ + c₀₁|01⟩ + c₁₀|10⟩ + c₁₁|11⟩ with four complex amplitudes is mapped by the SVD to the Schmidt form λ₁|u₁⟩_A ⊗ |v₁⟩_B + λ₂|u₂⟩_A ⊗ |v₂⟩_B + …, in which each term pairs the i-th basis vector on A with the i-th on B.]
Every bipartite pure state, no matter how complex its amplitude table, admits a canonical diagonal form: a sum of paired product terms with real non-negative weights $\lambda_i$.

The picture: what "diagonal" looks like

Write the joint state's amplitudes as a matrix. For two qubits,

|\psi\rangle_{AB} = \sum_{i,j} c_{ij}\,|i\rangle_A \otimes |j\rangle_B, \qquad C = \begin{pmatrix}c_{00} & c_{01} \\ c_{10} & c_{11}\end{pmatrix}.

The matrix C is the amplitude matrix — rows indexed by Alice's basis states, columns by Bob's. In general C has four complex entries.

If C were diagonal — say, C = \text{diag}(\lambda_1, \lambda_2) with \lambda_i \geq 0 — then the state is already in Schmidt form in the computational basis:

|\psi\rangle_{AB} = \lambda_1 |0\rangle_A|0\rangle_B + \lambda_2 |1\rangle_A|1\rangle_B.

If C is not diagonal, you can always diagonalise it, not by a similarity transformation (which forces the same change of basis on both sides and does not exist for every matrix) but by the singular-value decomposition (SVD), which splits C as C = U\,\Sigma\,V^\dagger, where U and V are unitary matrices and \Sigma is diagonal with non-negative entries. The Schmidt basis \{|u_i\rangle\} consists of the columns of U, the Schmidt basis \{|v_i\rangle\} consists of the complex conjugates of the columns of V (equivalently, the rows of V^\dagger), and the Schmidt coefficients are the diagonal entries of \Sigma.

This is not an accident of the notation: the SVD is the Schmidt decomposition, rewritten in matrix language instead of ket language. Any standard linear algebra library can compute a Schmidt decomposition with a single function call (numpy's svd, scipy's linalg.svd, MATLAB's svd): you feed it the amplitude matrix, it returns U, the singular values, and V (NumPy and SciPy return V^\dagger rather than V), and the Schmidt decomposition falls out immediately.
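As a minimal sketch of that single function call, the snippet below reshapes a state vector into its amplitude matrix and runs NumPy's SVD. The function name schmidt_decompose and the example amplitudes are illustrative choices, and the code assumes the amplitude c_{ij} is stored at index i·d_B + j of the state vector.

```python
import numpy as np

def schmidt_decompose(psi, d_a, d_b):
    """Return (coeffs, u_vecs, v_vecs) for a state vector on A (dim d_a) x B (dim d_b).

    Assumes psi[i * d_b + j] holds the amplitude c_ij.
    """
    C = np.asarray(psi, dtype=complex).reshape(d_a, d_b)   # amplitude matrix
    U, s, Vh = np.linalg.svd(C, full_matrices=False)       # C = U @ diag(s) @ Vh
    u_vecs = U.T   # row k is |u_k>, i.e. column k of U
    v_vecs = Vh    # row k is |v_k>, i.e. row k of V^dagger
    return s, u_vecs, v_vecs

# A generic (made-up) two-qubit state with complex amplitudes
psi = np.array([0.5, 0.5j, -0.5, 0.5])
s, u, v = schmidt_decompose(psi, 2, 2)

# Rebuild the state from its Schmidt form as a sanity check
psi_rebuilt = sum(s[k] * np.kron(u[k], v[k]) for k in range(len(s)))
print("Schmidt coefficients:", s)
print("reconstruction matches:", np.allclose(psi_rebuilt, psi))
```

Note that NumPy hands back V^\dagger (here Vh), so the Schmidt vectors on B are its rows; no extra conjugation is needed.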

The SVD connection — the proof sketch

Here is the full argument in five lines, because the theorem is short enough to deserve a proof you can see in one sitting.

Start with an arbitrary joint state:

|\psi\rangle_{AB} = \sum_{i,j} c_{ij}\,|i\rangle_A|j\rangle_B.

Collect the c_{ij} into a matrix C. By the SVD theorem (a theorem of linear algebra, true for any complex matrix), there exist unitary U (of size d_A \times d_A) and V (of size d_B \times d_B), and a diagonal matrix \Sigma with non-negative entries \sigma_i, such that C = U \Sigma V^\dagger. Write the entries: c_{ij} = \sum_k U_{ik}\,\sigma_k\,\overline{V_{jk}}.

Substitute back:

|\psi\rangle_{AB} = \sum_{i,j,k} U_{ik}\,\sigma_k\,\overline{V_{jk}}\,|i\rangle_A|j\rangle_B = \sum_k \sigma_k \Bigl(\sum_i U_{ik}|i\rangle_A\Bigr) \otimes \Bigl(\sum_j \overline{V_{jk}}|j\rangle_B\Bigr).

Define |u_k\rangle_A = \sum_i U_{ik}|i\rangle_A and |v_k\rangle_B = \sum_j \overline{V_{jk}}|j\rangle_B. Because U and V are unitary, their columns form orthonormal bases, so \{|u_k\rangle\} and \{|v_k\rangle\} are orthonormal. Setting \lambda_k = \sigma_k:

|\psi\rangle_{AB} = \sum_k \lambda_k\,|u_k\rangle_A \otimes |v_k\rangle_B.

Why this is everything: the SVD decomposes any matrix into "unitary times diagonal times unitary." The two unitaries become the Schmidt bases, the diagonal becomes the Schmidt coefficients, and the overall sum structure becomes the diagonal pairing. The existence of an SVD for every complex matrix is a theorem of linear algebra; applying it to the amplitude matrix is the existence proof of the Schmidt decomposition. No quantum-mechanical content is needed — the whole theorem is linear algebra in disguise.

Normalisation: the state |\psi\rangle_{AB} is a unit vector, which means \sum_{ij} |c_{ij}|^2 = 1. The Frobenius norm of C equals the sum of squared singular values: \|C\|_F^2 = \sum_i \sigma_i^2. Therefore \sum_i \lambda_i^2 = 1, matching the theorem statement.
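The proof can be checked numerically on a rectangular case. The sketch below assumes a randomly drawn 2×3 amplitude matrix (the seed and dimensions are arbitrary) and confirms the three claims: the Schmidt vectors are orthonormal, the squared coefficients sum to one, and the Schmidt form reproduces the original state.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, arbitrary choice
d_a, d_b = 2, 3
C = rng.normal(size=(d_a, d_b)) + 1j * rng.normal(size=(d_a, d_b))
C /= np.linalg.norm(C)                    # Frobenius norm 1 means a normalised state

U, s, Vh = np.linalg.svd(C, full_matrices=False)

# |u_k> = column k of U; |v_k> = row k of Vh, with entries conj(V_jk)
u = [U[:, k] for k in range(len(s))]
v = [Vh[k, :] for k in range(len(s))]

# Rebuild the state as sum_k sigma_k |u_k> (x) |v_k>
psi_rebuilt = sum(s[k] * np.kron(u[k], v[k]) for k in range(len(s)))

# Gram matrices should be identities if the Schmidt vectors are orthonormal
gram_u = np.array([[np.vdot(a, b) for b in u] for a in u])
gram_v = np.array([[np.vdot(a, b) for b in v] for a in v])

print("sum of squared singular values:", np.sum(s**2))
print("state recovered:", np.allclose(psi_rebuilt, C.reshape(-1)))
```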

[Figure: Amplitude matrix and its SVD. The amplitude matrix C (entries c₀₀, c₀₁, c₁₀, c₁₁) factors as U (unitary) · Σ (diagonal, entries λ₁, λ₂) · V† (unitary): the columns of U give |u_i⟩, the diagonal of Σ gives λ_i, and the rows of V† give |v_i⟩, yielding the Schmidt form Σ λᵢ|uᵢ⟩|vᵢ⟩.]
The amplitude matrix $C$ of a bipartite pure state factors by SVD as $U \Sigma V^\dagger$. The columns of $U$ become the Schmidt basis on A, the diagonal of $\Sigma$ gives the Schmidt coefficients, and the rows of $V^\dagger$ become the Schmidt basis on B.

Schmidt rank and what it means

The Schmidt rank r is the number of strictly positive \lambda_i in the decomposition. Equivalently, it is the matrix rank of the amplitude matrix C.

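Two bounds trap the rank: 1 \leq r \leq \min(d_A, d_B). The lower bound holds because a normalised state has at least one non-zero Schmidt coefficient; the upper bound holds because the sum in the theorem has at most \min(d_A, d_B) terms.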

For two qubits, d_A = d_B = 2, so r \in \{1, 2\}. Exactly two possibilities.

Rank 1 means product state. If only one \lambda_i is non-zero — say \lambda_1 — the decomposition collapses to |\psi\rangle = \lambda_1 |u_1\rangle_A \otimes |v_1\rangle_B. Since \lambda_1^2 = 1 (normalisation), \lambda_1 = 1. So |\psi\rangle = |u_1\rangle \otimes |v_1\rangle — a single tensor product. No entanglement.

Rank \geq 2 means entangled. Two or more non-zero terms, each a distinct product, means the sum cannot be collapsed to a single product. That is the definition of entanglement (ch.18). So Schmidt rank is a clean binary test: rank 1 ⇔ product; rank > 1 ⇔ entangled.
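The rank test is easy to sketch in code: count the singular values of the amplitude matrix that exceed a small tolerance. The helper name schmidt_rank and the tolerance 1e-12 are assumptions made for illustration; a numerical threshold is needed because floating-point singular values are rarely exactly zero.

```python
import numpy as np

def schmidt_rank(psi, d_a, d_b, tol=1e-12):
    """Number of singular values of the amplitude matrix above tol."""
    C = np.asarray(psi, dtype=complex).reshape(d_a, d_b)
    s = np.linalg.svd(C, compute_uv=False)
    return int(np.sum(s > tol))

plus = np.array([1, 1]) / np.sqrt(2)
product = np.kron(plus, np.array([1, 0]))      # |+>|0>, a product state
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)     # |Phi+>

print("product state rank:", schmidt_rank(product, 2, 2))
print("Bell state rank:", schmidt_rank(bell, 2, 2))
```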

For two qubits this is as fine a classification as rank gives. For higher dimensions the rank gets more expressive — a qutrit-qutrit system can have Schmidt ranks 1, 2, or 3, with 3 being "more entangled" than 2 in a precise sense the rank alone captures.

Why rank isn't the end of the story: two qutrit-qutrit states can both have Schmidt rank 3 but differ wildly in how evenly the three Schmidt coefficients are distributed. A state with (\lambda_1, \lambda_2, \lambda_3) \approx (0.99, 0.1, 0.1) has rank 3 but is almost a product. A state with \lambda_i = 1/\sqrt{3} each is maximally entangled at rank 3. The rank counts how many terms appear; the distribution of the \lambda_i says how "spread out" the entanglement is. The full list of \lambda_i (which includes the rank) determines the state up to local unitaries.
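To put numbers on the two rank-3 cases, here is a small sketch that computes the entanglement entropy from a list of Schmidt coefficients. The helper entanglement_entropy is a hypothetical name; it renormalises its input, since (0.99, 0.1, 0.1) is only approximately normalised.

```python
import numpy as np

def entanglement_entropy(coeffs):
    """Entropy in bits of the squared (renormalised) Schmidt coefficients."""
    p = np.asarray(coeffs, dtype=float) ** 2
    p = p / p.sum()          # renormalise so the probabilities sum to 1
    p = p[p > 0]             # drop zeros: 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

skewed = entanglement_entropy([0.99, 0.1, 0.1])
uniform = entanglement_entropy([1, 1, 1])     # renormalised to 1/sqrt(3) each

print("skewed rank-3 state:", skewed, "bits")
print("uniform rank-3 state:", uniform, "bits")
```

Both states have rank 3, yet the skewed one carries only a small fraction of the \log_2 3 bits that the uniform one reaches.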

Connection to the reduced density matrix

Recall from the partial-trace chapter (ch.9) that \rho_A = \text{tr}_B(|\psi\rangle\langle\psi|_{AB}) is the reduced state of Alice's side when Bob is ignored. The Schmidt decomposition makes the spectrum of \rho_A transparent.

Compute \rho_A using the Schmidt form. Let |\psi\rangle = \sum_i \lambda_i |u_i\rangle_A|v_i\rangle_B. Then

|\psi\rangle\langle\psi| = \sum_{i,j} \lambda_i \lambda_j\,|u_i\rangle\langle u_j|_A \otimes |v_i\rangle\langle v_j|_B.

Trace over B:

\rho_A = \text{tr}_B(|\psi\rangle\langle\psi|) = \sum_{i,j} \lambda_i \lambda_j\,\langle v_j|v_i\rangle\,|u_i\rangle\langle u_j|_A.

But \{|v_i\rangle\} is orthonormal, so \langle v_j|v_i\rangle = \delta_{ij}. Only diagonal terms survive:

\rho_A = \sum_i \lambda_i^2\,|u_i\rangle\langle u_i|_A.

This is \rho_A's spectral decomposition — written with the Schmidt basis \{|u_i\rangle\} as eigenvectors and the squared Schmidt coefficients \lambda_i^2 as eigenvalues. The Schmidt decomposition is simultaneously the canonical form of the state and the diagonalisation of \rho_A. The eigenvalues of \rho_A are exactly \lambda_1^2, \lambda_2^2, \ldots — the squares of the Schmidt coefficients.

Why this matters operationally: computing the Schmidt decomposition doesn't require actually running SVD every time. If you want the Schmidt coefficients of a known state, you can instead compute \rho_A (take the partial trace over B), diagonalise it to get eigenvalues p_i, and then \lambda_i = \sqrt{p_i}. The two routes — SVD on the amplitude matrix, or diagonalisation of the reduced density matrix — give the same \lambda_i. Whichever is easier for a given problem, use it.

A non-obvious corollary: \rho_A and \rho_B share the same non-zero eigenvalues. From the Schmidt form, \rho_B = \sum_i \lambda_i^2 |v_i\rangle\langle v_i| — same \lambda_i^2 as eigenvalues, just in a different eigenbasis. So tracing out either side gives the same spectrum. This is the Schmidt decomposition's most beautiful consequence: the "entanglement content" is symmetric between the two parties, even though the bases \{|u_i\rangle\} and \{|v_i\rangle\} might be entirely different.
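Both routes, and the shared spectrum of \rho_A and \rho_B, can be checked with a short sketch. It assumes a random 2×3 amplitude matrix (arbitrary seed) and takes the partial traces with einsum over the four-index form of |\psi\rangle\langle\psi|.

```python
import numpy as np

rng = np.random.default_rng(1)             # arbitrary seed
d_a, d_b = 2, 3
C = rng.normal(size=(d_a, d_b)) + 1j * rng.normal(size=(d_a, d_b))
C /= np.linalg.norm(C)

psi = C.reshape(-1)
rho = np.outer(psi, psi.conj())            # |psi><psi|
rho4 = rho.reshape(d_a, d_b, d_a, d_b)     # indices (i, j, i', j')
rho_a = np.einsum("ijkj->ik", rho4)        # trace over B
rho_b = np.einsum("ijil->jl", rho4)        # trace over A

s = np.linalg.svd(C, compute_uv=False)     # route 1: SVD of the amplitude matrix
eig_a = np.sort(np.linalg.eigvalsh(rho_a))[::-1]   # route 2: spectrum of rho_A
eig_b = np.sort(np.linalg.eigvalsh(rho_b))[::-1]

print("lambda^2 from SVD :", s**2)
print("eigenvalues rho_A :", eig_a)
print("eigenvalues rho_B :", eig_b)        # same non-zero values, plus one zero
```

Because d_B = 3 here, \rho_B has one extra eigenvalue, and it is zero up to rounding.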

Worked example 1 — Schmidt form of a Bell state

Example 1 — the Bell state's Schmidt decomposition is staring at you

Find the Schmidt decomposition of |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle), and compute the resulting eigenvalues of \rho_A and the entanglement entropy.

Step 1 — Write the amplitude matrix. In the computational basis, the |00\rangle, |01\rangle, |10\rangle, |11\rangle coefficients are \tfrac{1}{\sqrt 2}, 0, 0, \tfrac{1}{\sqrt 2}. Arranging as a 2\times 2 matrix (rows indexed by Alice's bit, columns by Bob's):

C = \begin{pmatrix} c_{00} & c_{01} \\ c_{10} & c_{11}\end{pmatrix} = \begin{pmatrix} \tfrac{1}{\sqrt 2} & 0 \\ 0 & \tfrac{1}{\sqrt 2}\end{pmatrix} = \tfrac{1}{\sqrt 2} I.

Step 2 — Read off the SVD. The matrix is already diagonal. Its SVD is C = I \cdot \text{diag}(\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2}) \cdot I, i.e., U = V = I and \Sigma = \text{diag}(\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2}). Why no work is needed: the Bell state is already written in a form where the amplitudes pair up — |00\rangle and |11\rangle, each with the same coefficient. The diagonal amplitude matrix is the fingerprint of a state that's already in Schmidt form. Most states aren't this lucky.

Step 3 — Identify Schmidt basis and coefficients. The columns of U = I are |u_1\rangle = |0\rangle, |u_2\rangle = |1\rangle; the columns of V = I are |v_1\rangle = |0\rangle, |v_2\rangle = |1\rangle. The Schmidt coefficients are both 1/\sqrt{2}.

|\Phi^+\rangle = \tfrac{1}{\sqrt 2}|0\rangle_A|0\rangle_B + \tfrac{1}{\sqrt 2}|1\rangle_A|1\rangle_B.

Step 4 — Compute the reduced density matrix. Using \rho_A = \sum_i \lambda_i^2 |u_i\rangle\langle u_i|:

\rho_A = \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| = \tfrac{1}{2}I.

The eigenvalues are \tfrac{1}{2}, \tfrac{1}{2} — exactly \lambda_i^2.

Step 5 — Compute the entanglement entropy.

S(\rho_A) = -\sum_i \lambda_i^2 \log_2 \lambda_i^2 = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = -\tfrac{1}{2}(-1) - \tfrac{1}{2}(-1) = 1.

One bit of entanglement. This is the maximum possible entanglement between two qubits — the Bell state saturates the bound.

Result. The Bell state |\Phi^+\rangle has Schmidt rank 2, Schmidt coefficients (\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}), reduced eigenvalues (\tfrac{1}{2}, \tfrac{1}{2}), and entanglement entropy 1 bit — maximally entangled.

What this shows. The Bell state wears its Schmidt form on its sleeve. Because it was already written in "matched pair" form with equal coefficients, the SVD is trivial. The maximality of its entanglement — a Bell state is the most entangled two-qubit state — corresponds directly to the equal distribution of the Schmidt coefficients: for any fixed rank r, the entanglement entropy is maximised when all \lambda_i are equal, and \log_2 r is the ceiling. For r = 2, the ceiling is 1 bit; Bell states hit it.
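The steps of Example 1 can be reproduced numerically. This is a minimal sketch assuming NumPy; the variable names are illustrative.

```python
import numpy as np

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)     # |Phi+> in the computational basis
C = bell.reshape(2, 2)                         # Step 1: amplitude matrix

s = np.linalg.svd(C, compute_uv=False)         # Steps 2-3: Schmidt coefficients
rho_a = C @ C.conj().T                         # Step 4: reduced density matrix
p = s ** 2
entropy = -np.sum(p * np.log2(p))              # Step 5: entanglement entropy in bits

print("Schmidt coefficients:", s)
print("rho_A:\n", rho_a)
print("entropy (bits):", entropy)
```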

[Figure: Bar charts of the eigenvalues of ρ_A. Product state: λ₁² = 1, λ₂² = 0, S = 0 bits. Bell state: λ₁² = λ₂² = ½, S = 1 bit, the two-qubit maximum.]
Eigenvalues of $\rho_A$ (which are the squared Schmidt coefficients) for a product state versus a Bell state. The product state has a single non-zero eigenvalue of $1$; the Bell state has two equal eigenvalues of $\tfrac{1}{2}$, saturating the entropy bound.

Worked example 2 — a state that requires real computation

Example 2 — Schmidt decomposition of $\tfrac{1}{\sqrt{3}}(|00\rangle + |01\rangle + |10\rangle)$

Find the Schmidt decomposition of |\psi\rangle = \tfrac{1}{\sqrt{3}}(|00\rangle + |01\rangle + |10\rangle). This state looks symmetric but is not already in Schmidt form.

Step 1 — Write the amplitude matrix. The coefficients are c_{00} = c_{01} = c_{10} = \tfrac{1}{\sqrt 3} and c_{11} = 0:

C = \tfrac{1}{\sqrt 3}\begin{pmatrix} 1 & 1 \\ 1 & 0\end{pmatrix}.

Step 2 — Compute the reduced density matrix \rho_A. Shortcut: \rho_A = C C^\dagger. Here C^\dagger = \tfrac{1}{\sqrt 3}\begin{pmatrix}1 & 1 \\ 1 & 0\end{pmatrix} (real, so transpose conjugate equals transpose).

\rho_A = C C^\dagger = \tfrac{1}{3}\begin{pmatrix}1 & 1 \\ 1 & 0\end{pmatrix}\begin{pmatrix}1 & 1 \\ 1 & 0\end{pmatrix} = \tfrac{1}{3}\begin{pmatrix}2 & 1 \\ 1 & 1\end{pmatrix}.

Why the shortcut works: \rho_A = \text{tr}_B(|\psi\rangle\langle\psi|), and in components (\rho_A)_{ii'} = \sum_j c_{ij}\,\overline{c_{i'j}} = (C C^\dagger)_{ii'}. Tracing out B is therefore a single matrix product, which saves a lot of time in practice.

Step 3 — Find the eigenvalues of \rho_A. Solve \det(\rho_A - p I) = 0:

\det\begin{pmatrix} \tfrac{2}{3} - p & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{1}{3} - p\end{pmatrix} = \bigl(\tfrac{2}{3} - p\bigr)\bigl(\tfrac{1}{3} - p\bigr) - \tfrac{1}{9} = 0.

Expand:

p^2 - p + \bigl(\tfrac{2}{9} - \tfrac{1}{9}\bigr) = p^2 - p + \tfrac{1}{9} = 0.

Apply the quadratic formula:

p = \frac{1 \pm \sqrt{1 - \tfrac{4}{9}}}{2} = \frac{1 \pm \sqrt{\tfrac{5}{9}}}{2} = \frac{1 \pm \tfrac{\sqrt 5}{3}}{2} = \frac{3 \pm \sqrt 5}{6}.

So p_1 = \frac{3 + \sqrt 5}{6} \approx 0.873 and p_2 = \frac{3 - \sqrt 5}{6} \approx 0.127.

Normalisation check: p_1 + p_2 = \frac{6}{6} = 1. Good — the trace is 1 as it must be.

Step 4 — Schmidt coefficients. \lambda_i = \sqrt{p_i}:

\lambda_1 = \sqrt{\tfrac{3 + \sqrt 5}{6}} \approx 0.934, \qquad \lambda_2 = \sqrt{\tfrac{3 - \sqrt 5}{6}} \approx 0.357.

Sum of squares: \lambda_1^2 + \lambda_2^2 = p_1 + p_2 = 1. Normalised.

Step 5 — Rank and entropy. Both \lambda_i are strictly positive, so the Schmidt rank is 2 — the state is entangled but not maximally so. Entanglement entropy:

S = -p_1 \log_2 p_1 - p_2 \log_2 p_2 \approx -0.873 \log_2(0.873) - 0.127 \log_2(0.127) \approx 0.550 \text{ bits}.

About half a bit. The state is genuinely entangled, but well short of a Bell state's full bit.

Result. Schmidt rank 2, Schmidt coefficients (0.934, 0.357), entanglement entropy \approx 0.550 bits.

What this shows. A symmetric-looking amplitude expression can hide an unbalanced Schmidt decomposition. The state \tfrac{1}{\sqrt 3}(|00\rangle + |01\rangle + |10\rangle) has three equal amplitudes in the computational basis, but its Schmidt form has one dominant coefficient (\lambda_1 \approx 0.934) and one small one (\lambda_2 \approx 0.357) — much more "product-like" than "maximally entangled." Schmidt coefficients tell the truth that amplitude listings disguise. Any time a state looks symmetric but entangled, compute the Schmidt coefficients — the imbalance (or lack thereof) is where the entanglement story actually lives.
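The hand computation in Example 2 can be cross-checked with a short sketch that follows the same route (\rho_A = C C^\dagger, then eigenvalues) and compares it against a direct SVD. It assumes NumPy and uses illustrative variable names.

```python
import numpy as np

C = np.array([[1, 1], [1, 0]]) / np.sqrt(3)     # amplitude matrix from Step 1
rho_a = C @ C.conj().T                          # Step 2: rho_A = C C^dagger

p = np.sort(np.linalg.eigvalsh(rho_a))[::-1]    # Step 3: eigenvalues, largest first
lam = np.sqrt(p)                                # Step 4: Schmidt coefficients
entropy = -np.sum(p * np.log2(p))               # Step 5: entropy in bits

s = np.linalg.svd(C, compute_uv=False)          # independent route via SVD

print("eigenvalues of rho_A:", p)
print("Schmidt coefficients:", lam, "vs SVD:", s)
print("entropy (bits):", entropy)
```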

[Figure: Uneven Schmidt coefficients. Bar chart with λ₁² ≈ 0.87 and λ₂² ≈ 0.13; a dashed line at ½ marks where balanced coefficients would sit. Entropy ≈ 0.55 bits.]
The Schmidt eigenvalues of $\tfrac{1}{\sqrt 3}(|00\rangle + |01\rangle + |10\rangle)$: one large ($\approx 0.87$), one small ($\approx 0.13$). The state is entangled, but far from maximally so — its entropy is about half of a Bell state's.

Common confusions

Going deeper

You have the theorem, the SVD-based proof, the bipartite-only caveat, and two worked examples at either end of the entanglement spectrum. What follows is the deeper technical story: the exact role of the SVD, Schmidt rank as an LOCC monotone, the purification theorem as a consequence, entanglement spectra in many-body physics, matrix-product states as an iterated-Schmidt construction, and why multipartite Schmidt-like theorems are elusive.

Why the SVD always exists

The singular-value decomposition C = U \Sigma V^\dagger holds for every complex matrix C, rectangular or square, of any size. The standard proof constructs \Sigma's diagonal entries as the square roots of eigenvalues of the Hermitian matrix C^\dagger C, which is always non-negative definite; hence its eigenvalues are non-negative and their square roots are real. The eigenvectors of C^\dagger C form the columns of V; then U is constructed to make U \Sigma V^\dagger = C work out.
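The construction in that proof can be sketched directly. The example below assumes a full-rank matrix (the amplitude matrix of Example 2), so that every \sigma_k is positive and U can be obtained by dividing C v_k by \sigma_k; a rank-deficient matrix would need the remaining columns of U filled in by orthonormal completion, which this sketch does not do.

```python
import numpy as np

C = np.array([[1, 1], [1, 0]], dtype=complex) / np.sqrt(3)   # full-rank example

# Eigendecomposition of the Hermitian, positive semidefinite matrix C^dagger C
evals, V = np.linalg.eigh(C.conj().T @ C)     # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]               # reorder to descending
evals, V = evals[order], V[:, order]

sigma = np.sqrt(np.clip(evals, 0, None))      # singular values (clip guards rounding)
U = (C @ V) / sigma                           # column k is C v_k / sigma_k (sigma_k > 0)

C_rebuilt = U @ np.diag(sigma) @ V.conj().T
print("U is unitary:", np.allclose(U.conj().T @ U, np.eye(2)))
print("C recovered:", np.allclose(C_rebuilt, C))
```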

For infinite-dimensional Hilbert spaces (continuous-variable quantum systems — photonic modes, harmonic oscillators), a Schmidt-like decomposition still exists for "reasonable" states, using the singular-value decomposition of compact operators from functional analysis. The sum over i becomes potentially countably infinite, but the \lambda_i still converge to zero fast enough that \sum \lambda_i^2 = 1 holds. The basic theorem survives.

Schmidt rank as an LOCC invariant

Local operations and classical communication (LOCC) — the protocol framework from chapter 39 — cannot increase the Schmidt rank of a bipartite pure state. If Alice and Bob share a state of Schmidt rank r, no sequence of local gates, measurements, and classical communication can produce a state of Schmidt rank r+1 or higher on their qubits. Schmidt rank is LOCC-monotone downward: it can only decrease or stay the same.

This has an operational consequence. A pure state of Schmidt rank r can be converted by LOCC into any pure state of rank \leq r with non-zero probability, but never into a state of higher rank; a maximally entangled state of rank r can be converted into any state of rank \leq r with certainty. In particular, r = 2 states (Bell states and their cousins) are the smallest resource for any protocol that needs entanglement at all; r = 1 (product states) cannot do anything that requires entanglement.

For asymptotic conversions — when Alice and Bob share many copies of one state and want to convert them to many copies of another — the right invariant is not the Schmidt rank but the entanglement entropy. Under LOCC, rate-optimal conversions of pure states are governed by ratios of entropies: n copies of state |\psi\rangle with entropy S_\psi can be converted by LOCC into approximately n S_\psi / S_\phi copies of state |\phi\rangle with entropy S_\phi, in the large-n limit (Bennett, Bernstein, Popescu, Schumacher, 1996). Entropy is the asymptotic currency of bipartite entanglement, and Schmidt decomposition gives you that currency directly.
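As a worked instance of the rate formula, the sketch below estimates how many Bell pairs n copies of the Example 2 state are worth in the large-n limit. The copy count n = 1000 and the helper name entropy_bits are illustrative assumptions; the result is the asymptotic estimate n S_\psi / S_\phi, not an exact finite-n yield.

```python
import numpy as np

def entropy_bits(coeffs):
    """Entanglement entropy in bits from a list of Schmidt coefficients."""
    p = np.asarray(coeffs, dtype=float) ** 2
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Source state: (|00> + |01> + |10>) / sqrt(3); target state: a Bell pair
C_psi = np.array([[1, 1], [1, 0]]) / np.sqrt(3)
s_psi = entropy_bits(np.linalg.svd(C_psi, compute_uv=False))
s_phi = entropy_bits([1 / np.sqrt(2), 1 / np.sqrt(2)])

n = 1000                                   # illustrative number of copies
approx_yield = n * s_psi / s_phi           # asymptotic estimate only

print("S_psi =", s_psi, "bits; S_phi =", s_phi, "bits")
print("approximate Bell pairs from", n, "copies:", approx_yield)
```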

The purification theorem

Schmidt decomposition enables an elegant structural result. Let \rho_A be any mixed state on a system A — say, \rho_A = \sum_i p_i |a_i\rangle\langle a_i| (spectral decomposition). The purification theorem says: there exists a larger system AB and a pure state |\psi\rangle_{AB} such that \text{tr}_B(|\psi\rangle\langle\psi|) = \rho_A.

The construction is immediate from Schmidt. Set |\psi\rangle_{AB} = \sum_i \sqrt{p_i}\,|a_i\rangle_A \otimes |b_i\rangle_B, where \{|b_i\rangle\} is any orthonormal basis of an ancilla B of dimension at least equal to the rank of \rho_A. Then by direct calculation, tracing out B gives back \rho_A. The Schmidt coefficients of this pure state are \sqrt{p_i} — the "square roots of the probabilities."
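The construction can be sketched in a few lines. The mixed state below is a made-up example, and the ancilla basis \{|b_i\rangle\} is taken to be the computational basis, which is one valid choice among many.

```python
import numpy as np

rho_a = np.array([[0.7, 0.2], [0.2, 0.3]])    # illustrative mixed state (trace 1, positive)
p, a_vecs = np.linalg.eigh(rho_a)             # rho_a = sum_i p_i |a_i><a_i|
d = len(p)
b_vecs = np.eye(d)                            # ancilla basis |b_i>: computational basis

# |psi> = sum_i sqrt(p_i) |a_i> (x) |b_i>
psi = sum(np.sqrt(p[i]) * np.kron(a_vecs[:, i], b_vecs[i]) for i in range(d))

C = psi.reshape(d, d)                         # amplitude matrix of the purification
rho_back = C @ C.conj().T                     # trace out the ancilla B

print("purification is normalised:", np.isclose(np.vdot(psi, psi), 1.0))
print("partial trace returns rho_A:", np.allclose(rho_back, rho_a))
```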

Every mixed state "comes from" a pure state on a larger system. In quantum information, this means you can always (mentally, or in theory) promote any mixed state to a pure one at the cost of adding fictitious degrees of freedom — an ancilla, an environment, a reference system. Many proofs use this trick to reduce statements about mixed states to statements about pure states.

Entanglement spectrum in many-body physics

In quantum many-body physics, bipartition a large system into regions A and B — say, the left half and right half of a one-dimensional chain — and compute \rho_A. Its spectrum (eigenvalues) is called the entanglement spectrum of the state across the cut. The entanglement entropy is one number derived from the spectrum; the full spectrum contains much more information.

Li and Haldane (2008) showed that the entanglement spectrum of the fractional quantum Hall state reveals the edge-mode structure of the phase: a ground-state calculation that replaces the traditional need for boundary analysis. This turned the Schmidt spectrum from an entanglement-measurement tool into a diagnostic for topological order. The area law for gapped phases, proved by Hastings (2007) for one-dimensional systems, says that the entanglement entropy of a region scales with the area of its boundary, not the region's volume, a scaling reminiscent of black-hole entropy. Schmidt decomposition sits at the heart of these results.

Matrix-product states and iterated Schmidt

The matrix-product state (MPS) ansatz for one-dimensional quantum systems applies Schmidt decomposition site by site. Take a chain of N qubits in a pure state |\psi\rangle. Bipartition after the first qubit: Schmidt-decompose, keep only the largest \chi Schmidt coefficients (truncation). Bipartition after the second qubit: Schmidt-decompose what's left, keep only \chi coefficients. Continue along the chain.

The result is a compressed representation of |\psi\rangle as a product of matrices of size at most \chi \times \chi (with physical indices for the local states). The compression is lossless if the Schmidt rank across every cut is at most \chi, and it remains an accurate approximation when the discarded Schmidt coefficients are small, which is the case for ground states of gapped one-dimensional Hamiltonians, thanks to the area law. Density-matrix renormalization group (DMRG) and related one-dimensional tensor-network methods are, at their core, applications of iterated Schmidt decomposition.
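The site-by-site procedure can be sketched as a loop of SVDs. The function names to_mps and from_mps, the tolerance 1e-12, and the 4-qubit GHZ test state are illustrative assumptions; this is a bare-bones version without the renormalisation or canonical-form bookkeeping that production tensor-network codes use.

```python
import numpy as np

def to_mps(psi, n_sites, chi):
    """Split an n-qubit state vector into tensors of shape (left, 2, right)."""
    tensors = []
    rest = np.asarray(psi, dtype=complex).reshape(1, -1)   # (bond, remaining sites)
    for _ in range(n_sites - 1):
        bond = rest.shape[0]
        M = rest.reshape(bond * 2, -1)             # bipartition after the current site
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        keep = min(chi, int(np.sum(s > 1e-12)))    # keep at most chi coefficients
        tensors.append(U[:, :keep].reshape(bond, 2, keep))
        rest = s[:keep, None] * Vh[:keep]          # pass Sigma V^dagger to the right
    tensors.append(rest.reshape(rest.shape[0], 2, 1))
    return tensors

def from_mps(tensors):
    """Contract the bond indices to recover the full state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)

ghz = np.zeros(16)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)                  # 4-qubit GHZ state

mps = to_mps(ghz, 4, chi=2)
print("bond dimensions:", [t.shape[2] for t in mps[:-1]])
print("lossless at chi=2:", np.allclose(from_mps(mps), ghz))
print("lossless at chi=1:", np.allclose(from_mps(to_mps(ghz, 4, chi=1)), ghz))
```

The GHZ state has Schmidt rank 2 across every cut, so \chi = 2 reproduces it exactly while \chi = 1 discards half of the weight.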

The multipartite gap

Schmidt decomposition works because the SVD works — and the SVD is fundamentally a two-index theorem. The amplitude "matrix" has two indices (one per party). Three parties yield an amplitude "tensor" c_{ijk} with three indices, and no direct analogue of the SVD exists for general tensors. There is no canonical way to diagonalise a three-index tensor into a sum of products of three orthonormal vectors, with real non-negative weights, that works for every tensor.

This is why chapter 39's multipartite classification isn't a Schmidt story. The GHZ and W classes are distinguished by polynomial invariants (the 3-tangle and friends), not by singular values. Partial generalisations exist — the HOSVD (higher-order SVD), the tensor-train decomposition, the PARAFAC decomposition — each captures some of the Schmidt intuition in higher dimensions, but none exactly generalises the clean bipartite story. Schmidt decomposition is not just a tool; it is tied to the fundamental two-party-ness of bipartite entanglement.

Where this leads next

References

  1. Wikipedia, Schmidt decomposition — the theorem, the proof sketch, and pointers to applications.
  2. Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §2.5 on Schmidt decomposition and purifications — Cambridge University Press.
  3. John Preskill, Lecture Notes on Quantum Computation, Ch. 4 on bipartite entanglement — theory.caltech.edu/~preskill/ph229.
  4. Wikipedia, Singular value decomposition — the linear-algebra backbone of Schmidt decomposition.
  5. H. Li and F. D. M. Haldane, Entanglement Spectrum as a Generalization of Entanglement Entropy (2008) — arXiv:0805.0332. The entanglement spectrum as a phase diagnostic.
  6. John Watrous, The Theory of Quantum Information (2018), Ch. 2 on bipartite pure states — cs.uwaterloo.ca/~watrous/TQI.