In short
The von Neumann entropy of a density operator \rho is

S(\rho) = -\text{tr}(\rho \log_2 \rho) = -\sum_i \lambda_i \log_2 \lambda_i,
where \{\lambda_i\} are the eigenvalues of \rho — so S(\rho) is just the Shannon entropy of \rho's eigenvalues. For a pure state, \rho = |\psi\rangle\langle\psi| has eigenvalues (1, 0, \ldots, 0), so S = 0: no classical uncertainty. For the maximally mixed state \rho = I/d, every eigenvalue is 1/d, so S = \log_2 d: maximum uncertainty. For a qubit, S runs from 0 (Bloch-sphere surface) to 1 bit (Bloch-ball centre), varying with Bloch-vector length as S = H\bigl(\tfrac{1 + |\vec r|}{2}\bigr) where H is the binary entropy. The joint entropy S(A, B) of a bipartite density operator \rho_{AB} is -\text{tr}(\rho_{AB}\log\rho_{AB}); the conditional entropy S(A | B) = S(A, B) - S(B) — and the shocker: for entangled states, S(A | B) can be negative. No classical conditional entropy can ever go below zero. A Bell state has S(A | B) = -1. Subadditivity S(A, B) \leq S(A) + S(B) and strong subadditivity S(A, B, C) + S(B) \leq S(A, B) + S(B, C) (Lieb-Ruskai 1973) are the two master inequalities; almost every quantum-information theorem sits downstream of them.
Classical Shannon entropy asks: how uncertain are you about a random variable? Quantum von Neumann entropy asks the same question, but for a density operator — an object that encodes both quantum superposition and classical ignorance. The answer is a single non-negative number, measured in bits, that you can compute by diagonalising \rho and running the eigenvalues through the Shannon formula.
This chapter builds that picture, then shows where the quantum version diverges sharply from the classical one. The key departure: conditional entropy can be negative. A Bell state has S(AB) = 0 (the joint state is pure, no uncertainty) but S(B) = 1 (each qubit alone is maximally mixed), so S(A | B) = S(AB) - S(B) = -1. A negative amount of classical uncertainty. That number is not a mistake; it is the formal signature of entanglement, and it will reappear in every quantum-information derivation you ever meet.
From eigenvalues to entropy
You already have the machinery. The density operator chapter showed that every density matrix \rho has a spectral decomposition

\rho = \sum_i \lambda_i |u_i\rangle\langle u_i|,
where \{|u_i\rangle\} is an orthonormal eigenbasis and the eigenvalues \{\lambda_i\} form a classical probability distribution. That distribution is the bridge from quantum to classical: apply Shannon entropy to it, and you have defined a quantum entropy.
Von Neumann entropy
For a density operator \rho on a finite-dimensional Hilbert space, the von Neumann entropy is

S(\rho) = -\text{tr}(\rho \log_2 \rho) = -\sum_i \lambda_i \log_2 \lambda_i,
where \{\lambda_i\} are the eigenvalues of \rho and the convention 0 \log_2 0 = 0 is used. The unit is bits. S(\rho) is the Shannon entropy of the probability distribution formed by \rho's eigenvalues.
Reading the definition. \text{tr}(\rho \log_2 \rho) is shorthand for "apply \log_2 to \rho in its eigenbasis, then take the trace." Explicitly: diagonalise \rho, replace each diagonal entry \lambda_i with \log_2 \lambda_i, multiply element-wise with the diagonal of \rho, sum. The result is the expected value of \log_2 \rho under \rho, and the negative of that expected value is the entropy.
Why the matrix logarithm is well-defined here: \rho is Hermitian and positive semi-definite, so its eigenvalues are non-negative reals. The function f(x) = x \log_2 x extends continuously to x = 0 (with f(0) = 0), so even zero eigenvalues cause no trouble. The matrix f(\rho) is Hermitian with eigenvalues f(\lambda_i), and the trace picks up exactly \sum_i f(\lambda_i).
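The recipe in the last two paragraphs — diagonalise, then run the eigenvalues through the Shannon formula — is a few lines of NumPy. A minimal sketch (the function name and the numerical-zero tolerance are our own choices):

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -sum_i lambda_i log2 lambda_i, in bits."""
    lam = np.linalg.eigvalsh(rho)   # rho is Hermitian, so eigvalsh applies
    lam = lam[lam > 1e-12]          # convention 0 log2 0 = 0: drop (numerical) zeros
    return float(-np.sum(lam * np.log2(lam)) + 0.0)

pure = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|: eigenvalues (1, 0)
mixed = np.eye(2) / 2                       # I/2:    eigenvalues (1/2, 1/2)
print(von_neumann_entropy(pure))    # 0.0
print(von_neumann_entropy(mixed))   # 1.0
```

Filtering out zero eigenvalues before taking the logarithm is exactly the 0 \log_2 0 = 0 convention in code form.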
Picture: every density matrix reduces to a probability distribution
Two quick observations from the picture:
- Unitary evolution preserves entropy. S(U\rho U^\dagger) = S(\rho) because unitary conjugation does not change the eigenvalues — it only rotates the eigenbasis. Closed-system quantum dynamics cannot create or destroy entropy; measurement, a non-unitary operation, is the exception.
- Mixing two different density operators with the same eigenvalue spectrum but different eigenbases gives a state with larger entropy (by concavity). This is the quantum version of the classical fact that mixing two distributions cannot decrease entropy.
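The first observation is a one-line numerical check: conjugating by any unitary leaves the spectrum, and hence the entropy, untouched. A sketch with a Haar-random unitary (helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_bits(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

# Random mixed state: normalise A A^dagger to unit trace
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = a @ a.conj().T
rho /= np.trace(rho).real

# Haar-random unitary from the QR decomposition of a Ginibre matrix
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
q, r = np.linalg.qr(g)
u = q * (np.diag(r) / np.abs(np.diag(r)))   # fix column phases

print(abs(entropy_bits(u @ rho @ u.conj().T) - entropy_bits(rho)) < 1e-9)  # True
```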
The extremes — pure states and maximally mixed states
Before diving into inequalities, two special cases anchor the scale.
Pure states have S = 0
If \rho = |\psi\rangle\langle\psi| is a pure state, the spectral decomposition is \rho = 1 \cdot |\psi\rangle\langle\psi| — one eigenvalue equal to 1, all others zero. Then

S(\rho) = -1 \log_2 1 - \sum_{i > 1} 0 \log_2 0 = 0.
Why this is zero and not "undefined" even though \log_2 0 = -\infty: the convention 0 \log_2 0 = 0 is the limit \lim_{x \to 0^+} x \log_2 x = 0 (a standard L'Hôpital computation). Zero eigenvalues contribute zero entropy. The Shannon formula extends continuously to distributions with zero-probability events.
Pure states have zero von Neumann entropy, and conversely: S(\rho) = 0 iff \rho is pure. This is the quantum analogue of "a deterministic random variable has zero Shannon entropy."
The maximally mixed state has S = \log_2 d
For \rho = I/d on a d-dimensional space, every eigenvalue equals 1/d. Then

S(I/d) = -\sum_{i=1}^{d} \frac{1}{d} \log_2 \frac{1}{d} = \log_2 d.
For a single qubit (d = 2), this is 1 bit. For two qubits (d = 4), 2 bits. For n qubits (d = 2^n), n bits. The maximally mixed state on n qubits has the entropy of n fair coin flips — it is, in entropy terms, equivalent to a uniform classical distribution over all 2^n basis states.
And \log_2 d is the maximum: for any density operator on a d-dimensional space,

0 \leq S(\rho) \leq \log_2 d,
with the lower bound at pure states and the upper bound only at I/d. Every other state sits strictly in between.
Why \log_2 d is the maximum: the eigenvalues form a probability distribution on d outcomes, and Shannon entropy of a distribution on d outcomes is bounded by \log_2 d, achieved iff the distribution is uniform. The uniform eigenvalue distribution corresponds exactly to \rho = I/d.
Bloch-ball interpretation for a qubit
For a single qubit, you already know (Bloch ball) that \rho = \tfrac{1}{2}(I + \vec r \cdot \vec\sigma) with |\vec r| \leq 1. The eigenvalues of \rho are

\lambda_\pm = \frac{1 \pm |\vec r|}{2}.
Plugging into the Shannon formula,

S(\rho) = H\!\left(\frac{1 + |\vec r|}{2}\right),
where H(\cdot) is the binary entropy. Three landmarks:
- |\vec r| = 1 (pure, Bloch-sphere surface): eigenvalues (1, 0), S = 0.
- |\vec r| = 0 (maximally mixed, Bloch-ball centre): eigenvalues (1/2, 1/2), S = 1 bit.
- |\vec r| = 1/2: eigenvalues (3/4, 1/4), S \approx 0.811 bits.
So for a qubit, S(\rho) and |\vec r| carry the same information: they are equivalent, monotonically-related measures of how far the state is from pure. Entropy is the information-theoretic face of the same geometric fact.
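The claim that S(\rho) depends on the Bloch vector only through its length is easy to verify: build \rho from \vec r, diagonalise directly, and compare with H\bigl((1 + |\vec r|)/2\bigr). A sketch (function names are ours):

```python
import numpy as np

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def qubit_entropy_from_bloch(r):
    """S(rho) = H((1 + |r|)/2) for rho = (I + r.sigma)/2."""
    return binary_entropy((1 + np.linalg.norm(r)) / 2)

# Direct route: build rho from the Bloch vector and diagonalise
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

r = np.array([0.3, 0.0, 0.4])            # |r| = 0.5
rho = 0.5 * (np.eye(2) + r[0] * sx + r[1] * sy + r[2] * sz)
lam = np.linalg.eigvalsh(rho)            # (0.25, 0.75)
direct = float(-np.sum(lam * np.log2(lam)))
print(direct, qubit_entropy_from_bloch(r))   # both ~0.811 bits
```

This reproduces the third landmark above: |\vec r| = 1/2 gives eigenvalues (3/4, 1/4) and S \approx 0.811 bits.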
Joint entropy of a bipartite state
Suppose you have two systems A and B with a joint density operator \rho_{AB} on the tensor product space \mathcal{H}_A \otimes \mathcal{H}_B. The joint von Neumann entropy is the obvious generalisation:

S(A, B) = S(\rho_{AB}) = -\text{tr}(\rho_{AB} \log_2 \rho_{AB}).
The marginal states are the reduced density operators \rho_A = \text{tr}_B(\rho_{AB}) and \rho_B = \text{tr}_A(\rho_{AB}), and their entropies are S(A) = S(\rho_A) and S(B) = S(\rho_B).
The shocker — pure bipartite states
Here is a fact with no classical analogue. If \rho_{AB} is a pure state, so \rho_{AB} = |\psi\rangle_{AB}\langle\psi|_{AB}, then S(A, B) = 0. But the reduced states \rho_A and \rho_B can be mixed — and if |\psi\rangle_{AB} is entangled, they are mixed, with S(A), S(B) > 0.
In the classical world, the joint distribution p(x, y) is at least as uncertain as the marginal p(y) — you cannot be more certain about (X, Y) together than about Y alone. So classically H(X, Y) \geq H(Y), always.
Quantumly, this rule fails. A Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt 2 has
- S(AB) = 0 (pure joint state, zero uncertainty),
- \rho_A = \rho_B = I/2 (tracing out one qubit of a Bell state gives a maximally mixed single qubit),
- S(A) = S(B) = 1 bit.
So S(A, B) = 0 < 1 = S(B). The joint system is more determined than its parts. This is the information-theoretic signature of entanglement: whole more certain than its pieces.
The classical intuition fails because entanglement correlates the two subsystems in a non-classical way. The correlation shows up not in the marginals but in the joint state's purity — and only the quantum formalism tracks the difference.
Conditional entropy — and the negative-entropy surprise
The classical conditional entropy H(Y | X) has a clean interpretation: the uncertainty of Y remaining after X is known. It satisfies 0 \leq H(Y | X) \leq H(Y). The quantum definition copies the classical chain rule:
Quantum conditional entropy
For a bipartite density operator \rho_{AB}, the quantum conditional entropy is

S(A | B) = S(A, B) - S(B).
Unlike the classical case, S(A | B) can be negative. It is negative exactly when \rho_{AB} has entanglement that the classical marginal data cannot account for.
Reading the definition. S(A | B) reads as "the joint uncertainty minus the uncertainty of B alone" — in the classical Venn picture, it is the crescent of the A circle outside the B circle. In the quantum picture, the "crescent" can have negative area: the B circle is larger than the joint, because the joint is pure while the marginals are mixed.
For the Bell state, S(A | B) = -1
Take \rho_{AB} = |\Phi^+\rangle\langle\Phi^+|, a pure Bell state. Then:
- S(A, B) = 0 (pure).
- \rho_B = I/2, so S(B) = 1 bit.
- S(A | B) = S(A, B) - S(B) = 0 - 1 = -1 bit.
Negative one bit of conditional entropy. Classically impossible. Quantumly real.
Why negative conditional entropy is possible at all: the classical derivation of H(Y | X) \geq 0 uses a chain-rule identity H(X, Y) = H(X) + H(Y | X) together with H(X, Y) \geq H(X). The second inequality relies on H being a function of a joint probability distribution — and in the classical world, "probability of the pair" cannot drop below "probability of a single part." In the quantum world, a joint state can be pure (one-dimensional support) while its marginals span the whole space (full-rank, high-dimensional support). That is impossible classically.
The "coherent information" interpretation
Negative conditional entropy has a formal name in quantum information theory: -S(A | B) = S(B) - S(A, B) is the coherent information I_c(A \rangle B) from A to B. It measures quantum information surviving transmission: its maximisation over channel inputs underlies the quantum channel capacity. For the Bell state, I_c = +1 bit, matching the intuition that a Bell pair holds one qubit of quantum information in its correlations that neither side alone can read off.
This interpretation is pursued properly in the joint-conditional-entropy chapter; for now, the takeaway is that negative conditional entropy is a positive quantity of quantum resources. The minus sign is measuring "how entangled", not "how uncertain".
Subadditivity and strong subadditivity
Two inequalities govern almost every calculation in quantum information theory. They are the von Neumann analogues of a pair of classical results, but their proofs are far harder.
Subadditivity
S(A, B) \leq S(A) + S(B),

with equality iff \rho_{AB} = \rho_A \otimes \rho_B (the subsystems are uncorrelated).
This is the quantum version of "the joint entropy of two classical variables is at most the sum of their individual entropies." Unlike the classical case, though, the left side can be much smaller — even zero, for entangled pure states — while the right side stays positive. The gap
I(A; B) = S(A) + S(B) - S(A, B)

is the quantum mutual information — the subject of its own chapter. For a Bell state, I(A; B) = 1 + 1 - 0 = 2 bits, already larger than the classical bound I(X; Y) \leq \min(H(X), H(Y)) allows for any pair of variables with the same marginals. That extra bit is purely quantum correlation.
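That 2-versus-1 gap can be checked numerically: compute I(A; B) for a Bell state and for the classically correlated mixture (|00\rangle\langle 00| + |11\rangle\langle 11|)/2, which has the same marginals. A minimal NumPy sketch (helper names are ours; partial traces are done by reshaping into a rank-4 tensor):

```python
import numpy as np

def entropy_bits(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def mutual_info(rho_ab):
    t = rho_ab.reshape(2, 2, 2, 2)             # indices (a, b, a', b')
    rho_a = np.trace(t, axis1=1, axis2=3)      # trace out B
    rho_b = np.trace(t, axis1=0, axis2=2)      # trace out A
    return entropy_bits(rho_a) + entropy_bits(rho_b) - entropy_bits(rho_ab)

# Bell state |Phi+><Phi+| in the basis |00>, |01>, |10>, |11>
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(psi, psi)

# Classically correlated mixture: same marginals (I/2), no entanglement
classical = np.diag([0.5, 0.0, 0.0, 0.5])

print(round(mutual_info(bell), 6))       # 2.0
print(round(mutual_info(classical), 6))  # 1.0
```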
Strong subadditivity
The deeper inequality involves three systems A, B, C:

S(A, B, C) + S(B) \leq S(A, B) + S(B, C).
This was conjectured by Lanford and Robinson in 1968 and proved by Elliott Lieb and Mary Beth Ruskai in 1973 [arXiv:math-ph/0205013]. The proof is hard — it rests on Lieb's concavity theorem (his resolution of the Wigner-Yanase-Dyson conjecture), and no elementary derivation is known. Strong subadditivity is the single most important inequality in quantum information theory. Every major result — channel capacities, data-processing inequalities, the monotonicity of relative entropy, security proofs of QKD — descends from it.
One direct consequence: the conditional mutual information
I(A; C \mid B) = S(A, B) + S(B, C) - S(A, B, C) - S(B)

is always non-negative, exactly mirroring the classical inequality.
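Strong subadditivity is easy to test (though not to prove): draw random three-qubit states and check the inequality each time. A sketch — the generic `ptrace` helper is our own:

```python
import numpy as np

rng = np.random.default_rng(7)

def entropy_bits(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def ptrace(rho, dims, keep):
    """Partial trace over the subsystems NOT listed in `keep` (sorted indices)."""
    n = len(dims)
    t = rho.reshape(dims + dims)               # n ket axes, then n bra axes
    m = n
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + m)  # trace subsystem i's ket/bra pair
        m -= 1
    d = int(np.prod([dims[i] for i in keep]))
    return t.reshape(d, d)

dims = [2, 2, 2]  # qubits A=0, B=1, C=2
for _ in range(100):
    a = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
    rho = a @ a.conj().T
    rho /= np.trace(rho).real
    lhs = entropy_bits(rho) + entropy_bits(ptrace(rho, dims, [1]))                # S(ABC) + S(B)
    rhs = entropy_bits(ptrace(rho, dims, [0, 1])) + entropy_bits(ptrace(rho, dims, [1, 2]))  # S(AB) + S(BC)
    assert lhs <= rhs + 1e-9
print("strong subadditivity held for 100 random states")
```

A hundred random states proves nothing, of course — that is exactly why the Lieb-Ruskai theorem matters — but it makes the inequality concrete.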
Worked examples
Example 1: $S(\rho)$ for $|0\rangle$, $I/2$, and the maximally mixed $n$-qubit state
Setup. Compute the von Neumann entropy of three canonical states: the pure single-qubit |0\rangle, the maximally mixed single qubit I/2, and the maximally mixed n-qubit state I/2^n.
Step 1 — |0\rangle, a pure state.
The eigenvalues are (1, 0). Apply Shannon: S = -1\log_2 1 - 0\log_2 0 = 0 - 0 = 0 bits. Why every pure state gives zero: in its eigenbasis, a pure state's density matrix is \text{diag}(1, 0, \ldots, 0), and \sum_i \lambda_i \log_2 \lambda_i = 1 \cdot 0 = 0. The quantum state is known exactly, so no classical bits of uncertainty.
Step 2 — I/2, the maximally mixed qubit.
Eigenvalues are (1/2, 1/2). Apply Shannon: S = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = \tfrac{1}{2} + \tfrac{1}{2} = 1 bit. One full classical bit of uncertainty — the qubit is as ignorant-of-itself as a fair coin.
Step 3 — I/2^n, the maximally mixed n-qubit state. Eigenvalues are (1/2^n, 1/2^n, \ldots, 1/2^n), all 2^n of them. Shannon:

S = -\sum_{i=1}^{2^n} \frac{1}{2^n} \log_2 \frac{1}{2^n} = \log_2 2^n = n \text{ bits}.
Step 4 — interpret. For n = 2 qubits, S(I/4) = 2 bits. For n = 10, S(I/1024) = 10 bits. n maximally mixed qubits carry exactly n bits of von Neumann entropy — the same as n independent fair classical coins. The entropy scales linearly with the number of qubits for the maximally mixed case.
What this shows. The von Neumann entropy gives a single clean number per state. For pure states, S = 0: a quantum computer in a pure state is, in information-theoretic terms, perfectly known. For maximally mixed states, S = \log_2 d scales with the Hilbert-space dimension — more qubits, more bits of ignorance.
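All three steps of this example can be run in a few lines, including the linear scaling of Step 4 (helper name ours):

```python
import numpy as np

def entropy_bits(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

for n in range(1, 6):
    d = 2 ** n
    print(n, entropy_bits(np.eye(d) / d))   # entropy equals n bits for each n
```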
Example 2: $S(A|B) = -1$ for the Bell state
Setup. Verify directly that the Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt 2 has S(A, B) = 0, S(A) = S(B) = 1, and therefore S(A | B) = -1. Interpret the result.
Step 1 — build the joint density matrix.

\rho_{AB} = |\Phi^+\rangle\langle\Phi^+| = \tfrac{1}{2}\bigl(|00\rangle + |11\rangle\bigr)\bigl(\langle 00| + \langle 11|\bigr).

Expanding as a 4 \times 4 matrix in the basis |00\rangle, |01\rangle, |10\rangle, |11\rangle:

\rho_{AB} = \frac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.
Why the four non-zero entries land in the corners: the outer product (|00\rangle + |11\rangle)(\langle 00| + \langle 11|) gives four terms — |00\rangle\langle 00|, |00\rangle\langle 11|, |11\rangle\langle 00|, |11\rangle\langle 11| — each with coefficient 1/2. These correspond to the (1,1), (1,4), (4,1), and (4,4) entries.
Step 2 — joint entropy S(A, B). The matrix above is a rank-1 projector (the outer product of a unit vector with itself), so it has eigenvalues (1, 0, 0, 0). Apply Shannon: S(A, B) = 0. Pure state, zero uncertainty.
Step 3 — partial trace to get \rho_B.

\rho_B = \text{tr}_A(\rho_{AB}) = \langle 0|_A\, \rho_{AB}\, |0\rangle_A + \langle 1|_A\, \rho_{AB}\, |1\rangle_A.
Computing each term:
- \langle 0|_A \rho_{AB} |0\rangle_A picks up contributions from the |0\rangle_A\langle 0|_A parts of \rho_{AB}. The only such contribution is \tfrac{1}{2}|0\rangle_B\langle 0|_B.
- \langle 1|_A \rho_{AB} |1\rangle_A picks up \tfrac{1}{2}|1\rangle_B\langle 1|_B.
So \rho_B = \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| = I/2. The Bell-state partial trace is maximally mixed on subsystem B.
Step 4 — marginal entropy. From Example 1, S(I/2) = 1 bit. By the Bell state's symmetry under A \leftrightarrow B, \rho_A = I/2 as well, and S(A) = 1 bit.
Step 5 — conditional entropy. By definition,

S(A | B) = S(A, B) - S(B) = 0 - 1 = -1 \text{ bit}.
Negative one bit.
Step 6 — interpret. The joint Bell state is known perfectly (pure, S = 0), but each qubit individually looks completely random (marginal mixed, S = 1). Knowing B does not reduce uncertainty about A toward zero; it reduces it to a negative number. The classical interpretation of "uncertainty of A given B" breaks down. The formal replacement is the coherent information I_c(A\rangle B) = -S(A | B) = +1 bit: the amount of quantum information encoded in the correlations that no classical marginal can see.
Every pure bipartite state has S(A|B) = -S(B); for a maximally entangled state, S(B) = \log_2 d, making the conditional entropy as negative as it can possibly be.
What this shows. The classical rule "you cannot be more uncertain about A alone than about (A, B) together" is a theorem about probability distributions. In the quantum world, with density operators replacing probabilities and entanglement replacing mere correlation, the rule is false. The sign of S(A | B) is a litmus test for entanglement — negative iff the state has enough entanglement to make the joint more determined than each part.
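Steps 1-5 of this example compress into a few lines of NumPy (helper names ours):

```python
import numpy as np

def entropy_bits(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

# Step 1: |Phi+> = (|00> + |11>)/sqrt(2) in the basis |00>, |01>, |10>, |11>
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_ab = np.outer(psi, psi)

# Step 2: joint entropy — rank-1 projector, so S(A,B) = 0
s_ab = entropy_bits(rho_ab)

# Steps 3-4: trace out A; the marginal is I/2 with S(B) = 1 bit
rho_b = np.trace(rho_ab.reshape(2, 2, 2, 2), axis1=0, axis2=2)
s_b = entropy_bits(rho_b)

# Step 5: conditional entropy
print(round(s_ab - s_b, 6))   # -1.0
```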
Common confusions
- "S(\rho) is the entanglement of \rho." Not quite — S(\rho) is the classical uncertainty in \rho's eigenbasis, which for a mixed state could come from classical mixing, entanglement with an environment, or a combination. The entanglement entropy is specifically S(\rho_A) where \rho_{AB} is a pure bipartite state; there S(\rho_A) = S(\rho_B) (the entropy of the Schmidt coefficients) does directly measure entanglement. For an arbitrary \rho without a bipartition, S is not a measure of entanglement.
- "Pure states have zero information." Wrong. Pure states have zero classical uncertainty — zero von Neumann entropy. But a pure state |\psi\rangle can encode an enormous amount of quantum information in its amplitudes. Think of the entanglement spread across a large-n-qubit Schrödinger-cat state: S = 0, yet the state is exquisitely structured. Zero entropy means "no ignorance about which state", not "no information."
- "Negative entropy is a bookkeeping error." No, it is a structural feature of quantum information. S(A | B) < 0 is always a valid number; it is the signature that the joint state has quantum correlations exceeding what any classical joint distribution can produce.
- "Subadditivity means joint entropy always decreases." Subadditivity is an upper bound — S(A, B) \leq S(A) + S(B). It says the joint entropy is at most the sum, not that it always decreases. For uncorrelated product states equality holds; for entangled states the joint can be far below the sum; it is never above.
- "Strong subadditivity has a one-line proof." No. It took Lieb and Ruskai serious effort in the early 1970s, and the proof runs through Lieb's concavity theorem rather than any elementary trick. Most textbooks cite the result without proof, which is appropriate — it is a foundational theorem, not a routine calculation.
- "\log in S(\rho) = -\text{tr}(\rho \log\rho) is base-e." It depends on the author. For units of bits use \log_2 (as in this article). For units of nats use \log_e. Nielsen and Chuang use base 2; Preskill sometimes mixes. Whenever you see S(\rho) quoted as a number, check the base convention.
Going deeper
If you just need the definition S(\rho) = -\text{tr}(\rho \log \rho), the pure-state and maximally-mixed-state extremes, and the fact that S(A|B) can be negative for entangled states, you have the essentials. The remainder treats the deeper structure — purification and the relation to Schmidt decomposition, the operational meaning via typical subspaces (the quantum AEP), the relative entropy and its monotonicity under CPTP maps, and a sketch of how strong subadditivity underpins channel capacities.
Purification and the Schmidt number
Every mixed state \rho_A on a system A can be viewed as the reduced state of some pure state |\psi\rangle_{AR} on a larger system A \otimes R, where R is a suitable "reference" or "environment" system. This is the purification theorem. The Schmidt decomposition of |\psi\rangle_{AR} is

|\psi\rangle_{AR} = \sum_i \sqrt{\lambda_i}\, |u_i\rangle_A |v_i\rangle_R,
where \{|u_i\rangle_A\} and \{|v_i\rangle_R\} are orthonormal bases of A and R, and \{\lambda_i\} are the eigenvalues of \rho_A (also of \rho_R). Then

S(\rho_A) = S(\rho_R) = -\sum_i \lambda_i \log_2 \lambda_i.
The entropy of the reduced state equals the entropy of its purification on the complement. For a pure entangled state, S(\rho_A) measures the entanglement between A and R: it is the entanglement entropy, and it quantifies how many "ebits" (maximally entangled qubit pairs) the state is worth.
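The equality S(\rho_A) = S(\rho_R) reflects the shared Schmidt spectrum, which numerically is just the squared singular values of the amplitude matrix M with |\psi\rangle = \sum_{ij} M_{ij} |i\rangle_A |j\rangle_R. A sketch with unequal subsystem dimensions (all names ours):

```python
import numpy as np

rng = np.random.default_rng(5)

def entropy_bits(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

# Random pure state |psi>_{AR}: amplitude matrix M with dim A = 3, dim R = 5
M = rng.normal(size=(3, 5)) + 1j * rng.normal(size=(3, 5))
M /= np.linalg.norm(M)                # normalise <psi|psi> = 1

rho_a = M @ M.conj().T                # tr_R |psi><psi|, a 3x3 matrix
rho_r = M.conj().T @ M                # same spectrum as tr_A |psi><psi| (its transpose)

# Schmidt coefficients: squared singular values of M
lam = np.linalg.svd(M, compute_uv=False) ** 2
schmidt_entropy = float(-np.sum(lam * np.log2(lam)))

print(entropy_bits(rho_a), entropy_bits(rho_r), schmidt_entropy)  # all three agree
```

Note that \rho_R is 5-dimensional while \rho_A is 3-dimensional, yet the entropies match: the extra dimensions of R carry only zero eigenvalues.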
Quantum relative entropy
The quantum analogue of the Kullback-Leibler divergence is

S(\rho \,\|\, \sigma) = \text{tr}(\rho \log_2 \rho) - \text{tr}(\rho \log_2 \sigma),
defined for \rho, \sigma with \text{supp}(\rho) \subseteq \text{supp}(\sigma). It is non-negative, zero iff \rho = \sigma, and — critically — monotonic under completely positive trace-preserving (CPTP) maps:

S(\Phi(\rho) \,\|\, \Phi(\sigma)) \leq S(\rho \,\|\, \sigma)
for any CPTP map \Phi. This is the data-processing inequality for quantum relative entropy, and it is equivalent to strong subadditivity (they imply each other). Every meaningful inequality in quantum information — including channel capacity bounds, entanglement monotonicity, and QKD security — is some repackaging of this single monotonicity theorem.
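Monotonicity can be probed numerically with a simple CPTP map such as the depolarising channel. A sketch (helper names ours; `logm_h` is a matrix log restricted to Hermitian positive-definite inputs, which is all we need for full-rank states):

```python
import numpy as np

rng = np.random.default_rng(3)

def logm_h(m):
    """Matrix log base 2 of a positive-definite Hermitian matrix."""
    lam, u = np.linalg.eigh(m)
    return u @ np.diag(np.log2(lam)) @ u.conj().T

def rel_entropy(rho, sigma):
    """S(rho || sigma) = tr rho (log rho - log sigma), in bits."""
    return float(np.trace(rho @ (logm_h(rho) - logm_h(sigma))).real)

def depolarize(rho, p):
    """CPTP map: keep the state with prob 1-p, replace by I/d with prob p."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d

def random_state(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

rho, sigma = random_state(2), random_state(2)
before = rel_entropy(rho, sigma)
after = rel_entropy(depolarize(rho, 0.3), depolarize(sigma, 0.3))
print(before >= 0, after <= before + 1e-9)   # True True
```

Processing both states through the same channel can only shrink their relative entropy — the data-processing inequality in action.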
Typical subspaces and the quantum AEP
Just as Shannon's source coding theorem is powered by the asymptotic equipartition property, quantum source coding (Schumacher 1995) is powered by the quantum AEP. For n independent copies of \rho, the state \rho^{\otimes n} has a "typical subspace" \mathcal{T}_\epsilon^{(n)} \subseteq \mathcal{H}^{\otimes n} of dimension \approx 2^{n S(\rho)} that carries almost all of the state's weight. This is why S(\rho) is the quantum source-coding rate: you can reliably compress a long stream of copies of \rho down to nS(\rho) qubits, and no fewer. Schumacher's theorem formalises this and earns von Neumann entropy its status as the fundamental quantum information measure.
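The quantum AEP can be seen numerically without building 2^n-dimensional matrices: for \rho = \text{diag}(p, 1-p), the eigenvalues of \rho^{\otimes n} are p^k(1-p)^{n-k} with binomial multiplicities, so the weight of the typical subspace reduces to a binomial tail sum. A sketch (the width \epsilon = 0.05 is our own choice; convergence to 1 is slow, as the AEP only promises the n \to \infty limit):

```python
import numpy as np
from math import lgamma, log

p = 0.9
S = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # ~0.469 bits
eps = 0.05

for n in (100, 1000, 10000):
    ks = np.arange(n + 1)
    # -(1/n) log2 of the eigenvalue p^k (1-p)^(n-k): its "entropy rate"
    rate = -(ks * np.log2(p) + (n - ks) * np.log2(1 - p)) / n
    # total weight at each k: C(n,k) p^k (1-p)^(n-k), computed in log space
    log_w = (np.array([lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) for k in ks])
             + ks * log(p) + (n - ks) * log(1 - p))
    w = np.exp(log_w)
    typical = np.abs(rate - S) < eps           # eigenvalues near 2^{-nS}
    print(n, w[typical].sum())                 # weight in the typical subspace -> 1
```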
An Indian connection — Raman scattering and phonon entropies
In experimental quantum information at Indian Institute of Science (IISc) Bangalore and TIFR Mumbai, von Neumann entropies of light-matter density matrices are routinely measured in Raman-scattering experiments. The Raman effect itself (discovered by C. V. Raman in 1928, earning the 1930 Nobel Prize) is a textbook instance where the joint photon-phonon density matrix after scattering has higher entropy than the initial photon state — the extra entropy comes from the entanglement between the scattered photon and the excited phonon mode. Indian groups regularly report Raman-entropy numbers in nats and convert to bits via \log_2 e \approx 1.443 when publishing. This is where von Neumann entropy stops being a theoretical convenience and becomes a daily-reported experimental observable.
Where this leads next
- Shannon entropy recap — the classical parent of von Neumann entropy; its definitions, interpretations, and Shannon's noisy-channel theorem.
- Joint and conditional entropy — deeper treatment of S(A, B), S(A | B), negative conditional entropy, and coherent information.
- Quantum mutual information — I(A;B) = S(A) + S(B) - S(A,B), the total-correlation measure that survives to the quantum setting.
- Strong subadditivity — the Lieb-Ruskai theorem and its role as the master inequality.
- Entanglement of formation — an operational entanglement measure built on von Neumann entropies of purifications.
- Density operator — the prerequisite object on which the von Neumann entropy is defined.
References
- John von Neumann, Mathematische Grundlagen der Quantenmechanik (1932; English edition 1955) — Princeton University Press.
- Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §11.3 (Von Neumann entropy) — Cambridge University Press.
- Elliott H. Lieb and Mary Beth Ruskai, Proof of the strong subadditivity of quantum-mechanical entropy (1973) — arXiv:math-ph/0205013 (later reprint).
- John Preskill, Lecture Notes on Quantum Computation, Ch. 10 (Quantum information theory) — theory.caltech.edu/~preskill/ph229.
- Mark M. Wilde, Quantum Information Theory (2nd ed., 2017) — arXiv:1106.1445.
- Wikipedia, Von Neumann entropy — definitions, properties, subadditivity, and strong-subadditivity references.