In short

For a bipartite density operator \rho_{AB}, the joint von Neumann entropy is

S(A, B) \;=\; -\text{tr}\bigl(\rho_{AB} \log_2 \rho_{AB}\bigr),

the Shannon entropy of the eigenvalues of \rho_{AB}. The conditional entropy copies the classical chain rule:

S(A | B) \;=\; S(A, B) - S(B).

Classically H(Y | X) \geq 0 always. Quantumly, S(A | B) can be negative. A Bell state has S(A, B) = 0 (pure), S(B) = 1 (locally maximally mixed), and therefore S(A | B) = -1 bit. That minus sign is the fingerprint of entanglement. Its absolute value is the coherent information I_c(A \rangle B) = -S(A | B), a positive quantum resource that bounds how many qubits A can send to B through any quantum channel. Two universal inequalities govern everything: subadditivity S(A,B) \leq S(A) + S(B) and strong subadditivity S(A | B, C) \leq S(A | B) — equivalently S(A, B, C) + S(B) \leq S(A, B) + S(B, C), proved by Lieb and Ruskai in 1973. Almost every theorem in quantum Shannon theory — channel capacities, the data-processing inequality, QKD security proofs — is a repackaging of one of these two.

You already have the von Neumann entropy of a single density operator: diagonalise, run eigenvalues through the Shannon formula, read off a number in bits. This chapter does the same job for two systems side-by-side — system A and system B, with a joint state \rho_{AB} that could be correlated, entangled, or neither. The questions are classical-looking: how much total uncertainty is there in the pair? How much uncertainty about A remains once you know B? How many bits do A and B share in common?

The answers are mostly the expected quantum generalisations of Shannon's definitions — with one sharp exception. The conditional entropy S(A | B) can be negative, and the sign flip is not a mistake. It is the formal signature that \rho_{AB} is entangled in a way no classical joint distribution could match. By the end of the chapter that negative number will have a name (coherent information), a sign-corrected interpretation (a quantum resource you can send through a channel), and a role as the central quantity of the quantum source-coding and channel-coding theorems you will meet in the next two chapters.

Joint entropy — the picture before the formula

Two systems, A and B. A joint state \rho_{AB} living on the tensor-product Hilbert space \mathcal{H}_A \otimes \mathcal{H}_B. You want one number measuring the total uncertainty of the pair.

The recipe is the same as for a single system: diagonalise \rho_{AB}, collect the eigenvalues \{\mu_k\} (a probability distribution on d_A \cdot d_B outcomes), and apply Shannon.

Joint von Neumann entropy

For a bipartite density operator \rho_{AB} on \mathcal{H}_A \otimes \mathcal{H}_B, the joint von Neumann entropy is

S(A, B) \;\equiv\; S(\rho_{AB}) \;=\; -\text{tr}\bigl(\rho_{AB} \log_2 \rho_{AB}\bigr) \;=\; -\sum_k \mu_k \log_2 \mu_k,

where \{\mu_k\} are the eigenvalues of \rho_{AB} (a probability distribution, since \rho_{AB} is trace-1 and positive semi-definite). The unit is bits.

Reading the definition. The joint entropy forgets, momentarily, that \rho_{AB} lives on a tensor product. It treats \rho_{AB} as one big density operator on a d_A d_B-dimensional Hilbert space and asks the von Neumann question of it. The tensor structure only re-enters the moment you trace out a subsystem to get \rho_A or \rho_B — and compare the joint number to the marginals.

[Figure: joint entropy as the Shannon entropy of joint eigenvalues. A three-panel flow: the joint density \rho_{AB} on \mathcal{H}_A \otimes \mathcal{H}_B (Hermitian, trace 1, dimension d_A \cdot d_B) is diagonalised into eigenvalues \mu_1, \mu_2, \ldots with \sum_k \mu_k = 1, and Shannon's formula gives S(A, B) = -\sum_k \mu_k \log_2 \mu_k in bits, with 0 \leq S(A, B) \leq \log_2(d_A d_B).]
The joint entropy $S(A, B)$ is the von Neumann entropy of $\rho_{AB}$ treated as a density operator on the full product space. The tensor-product structure only matters when you compare $S(A, B)$ to the marginals $S(A)$ and $S(B)$ — the comparison is where entanglement shows up.
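As a quick numerical sketch (NumPy is an implementation choice here, and the helper name is mine, not the text's), the recipe "diagonalise, then apply Shannon" is a few lines:

```python
import numpy as np

def joint_entropy(rho_AB):
    """S(A,B) in bits: diagonalise rho_AB, run the eigenvalues through Shannon."""
    mu = np.linalg.eigvalsh(rho_AB)      # Hermitian, so eigenvalues are real
    mu = mu[mu > 1e-12]                  # convention: 0 log 0 = 0
    return float(-np.sum(mu * np.log2(mu)))

# Maximally mixed two-qubit state: all four eigenvalues are 1/4, so S = log2(4)
print(joint_entropy(np.eye(4) / 4))      # 2.0
```

Note that the tensor-product structure plays no role in the computation itself, exactly as the text says: the function sees one big density matrix.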

Two quick consequences follow from the definition alone. For a product state \rho_{AB} = \rho_A \otimes \rho_B, the joint eigenvalues are the products \lambda_i^A \lambda_j^B of the marginal eigenvalues, so

S(A, B) \;=\; -\sum_{i,j} \lambda_i^A \lambda_j^B \log_2(\lambda_i^A \lambda_j^B) \;=\; S(A) + S(B).

Why the product splits: \log_2(\lambda_i^A \lambda_j^B) = \log_2\lambda_i^A + \log_2\lambda_j^B, and the double sum factorises because \sum_i \lambda_i^A = \sum_j \lambda_j^B = 1. This mirrors the classical fact that independent random variables have H(X, Y) = H(X) + H(Y).

For anything other than a product state, the joint entropy is strictly less than S(A) + S(B) — the gap measures correlation.

Marginals and partial trace — a two-line reminder

To talk about S(A) alone you need \rho_A, the reduced state on A. Built by partial trace:

\rho_A \;=\; \text{tr}_B(\rho_{AB}) \;=\; \sum_{b} \langle b |_B \rho_{AB} | b \rangle_B,

where \{|b\rangle_B\} is any orthonormal basis of B. The operation is "sum over the B-diagonal." The density operator chapter derived it from first principles; here it is a tool. \rho_B is defined symmetrically by tracing out A.

Once you have \rho_A and \rho_B, their entropies S(A) = S(\rho_A) and S(B) = S(\rho_B) are just single-system von Neumann entropies.
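The "sum over the B-diagonal" is a one-liner numerically. A hedged NumPy sketch (the function name and the reshape-based implementation are mine; the index convention assumes the usual Kronecker ordering of the joint matrix):

```python
import numpy as np

def partial_trace_B(rho_AB, dA, dB):
    """rho_A = tr_B(rho_AB): reshape to indices (a, b, a', b') and sum the B-diagonal."""
    return np.einsum('abcb->ac', rho_AB.reshape(dA, dB, dA, dB))

# Bell state |Phi+> = (|00> + |11>)/sqrt(2): the marginal is maximally mixed
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_AB = np.outer(psi, psi)
print(partial_trace_B(rho_AB, 2, 2))     # I/2, the maximally mixed qubit
```

The repeated `b` index in the einsum string is exactly the \sum_b \langle b| \cdot |b\rangle of the definition.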

Subadditivity — the easy master inequality

For any bipartite state \rho_{AB},

\boxed{\;S(A, B) \;\leq\; S(A) + S(B)\;}

with equality iff \rho_{AB} = \rho_A \otimes \rho_B (the subsystems are uncorrelated). This is subadditivity, the direct quantum analogue of the classical H(X, Y) \leq H(X) + H(Y).

A short proof using relative entropy

The cleanest route is via quantum relative entropy. Define the relative entropy

S(\rho \| \sigma) \;=\; \text{tr}(\rho \log_2 \rho) - \text{tr}(\rho \log_2 \sigma) \;\geq\; 0,

where the non-negativity is Klein's inequality (1931), proved using the concavity of the logarithm. Now set \rho = \rho_{AB} and \sigma = \rho_A \otimes \rho_B:

S(\rho_{AB} \| \rho_A \otimes \rho_B) \;=\; -S(A, B) - \text{tr}\bigl(\rho_{AB}\log_2(\rho_A \otimes \rho_B)\bigr).

Since \log_2(\rho_A \otimes \rho_B) = (\log_2 \rho_A) \otimes I_B + I_A \otimes (\log_2 \rho_B), the second trace splits:

\text{tr}\bigl(\rho_{AB}\log_2(\rho_A \otimes \rho_B)\bigr) \;=\; \text{tr}(\rho_A \log_2\rho_A) + \text{tr}(\rho_B \log_2\rho_B) \;=\; -S(A) - S(B).

Why the trace splits this way: \text{tr}(\rho_{AB} (X_A \otimes I_B)) = \text{tr}_A(\rho_A X_A) after tracing out B using the definition of the partial trace. The \log_2 \rho_A piece only sees the A-marginal; the \log_2 \rho_B piece only sees the B-marginal. Entanglement contributes nothing to these trace expressions because they are linear in \rho_{AB}.

Combining,

S(\rho_{AB} \| \rho_A \otimes \rho_B) \;=\; S(A) + S(B) - S(A, B).

Relative entropy is non-negative — so S(A) + S(B) - S(A, B) \geq 0, which is subadditivity. Equality in Klein's inequality happens iff \rho_{AB} = \rho_A \otimes \rho_B, giving the equality condition.

The quantity

I(A ; B) \;=\; S(A) + S(B) - S(A, B) \;\geq\; 0

is the quantum mutual information, the subject of the next chapter. It is the "gap" in subadditivity, and it measures the total amount of correlation — classical plus quantum — between A and B.
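The subadditivity gap can be checked directly on the Bell state, where it takes its maximal two-qubit value. A verification sketch (helper names are mine; tolerances absorb floating-point noise in the eigendecomposition):

```python
import numpy as np

def S(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]                      # drop zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(ev * np.log2(ev)))

psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # |Phi+>
rho = np.outer(psi, psi)
r = rho.reshape(2, 2, 2, 2)
rho_A = np.einsum('abcb->ac', r)             # trace out B
rho_B = np.einsum('abad->bd', r)             # trace out A

gap = S(rho_A) + S(rho_B) - S(rho)           # I(A;B) = 1 + 1 - 0
print(gap)                                   # approximately 2 bits
```

Two bits of total correlation from a single shared pair: one classical bit's worth and one quantum bit's worth, as the mutual-information chapter will unpack.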

Conditional entropy — and the negative-entropy surprise

Classically, the chain rule H(X, Y) = H(X) + H(Y | X) defines conditional entropy via subtraction:

H(Y | X) \;=\; H(X, Y) - H(X).

It measures the uncertainty of Y remaining after X is known. It is always non-negative, and in fact 0 \leq H(Y | X) \leq H(Y).

The quantum definition copies the formula:

Quantum conditional entropy

For a bipartite state \rho_{AB}, the quantum conditional entropy is

S(A | B) \;=\; S(A, B) - S(B).

Unlike the classical case, S(A | B) can be negative. It is negative exactly when \rho_{AB} has stronger-than-classical correlations — i.e. entanglement.

Why the quantum version can go below zero

The classical proof that H(Y | X) \geq 0 runs through H(X, Y) \geq H(X), which itself rests on the fact that a joint probability distribution on (X, Y) is at least as uncertain as any marginal — knowing the pair (x, y) is at least as informative as knowing x.

Quantumly, that chain breaks. A pure entangled joint state |\psi\rangle_{AB} has S(A, B) = 0, yet tracing out one side can produce a mixed marginal with S(B) > 0. The "joint" is more determined than the "parts." No classical distribution can be like this — in classical probability, marginalising always loses information, never creates it.

Why pure-state marginals can be mixed: any pure bipartite state |\psi\rangle_{AB} has a Schmidt decomposition |\psi\rangle = \sum_i \sqrt{\lambda_i}\,|u_i\rangle_A |v_i\rangle_B. Tracing out B leaves \rho_A = \sum_i \lambda_i |u_i\rangle\langle u_i|, which has entropy -\sum_i \lambda_i \log_2 \lambda_i = S(B). Unless one \lambda_i = 1 (the state is a product state), the marginal is mixed and has strictly positive entropy. Entanglement creates marginal entropy even when the joint state is pure.
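The Schmidt coefficients of a pure state are the squared singular values of the state vector reshaped into a d_A x d_B matrix, so the claim S(A) = S(B) can be verified by SVD. A sketch (the 2 x 3 dimensions and the random state are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
psi = rng.normal(size=6) + 1j * rng.normal(size=6)
psi /= np.linalg.norm(psi)                       # random pure state on C^2 (x) C^3

# Schmidt coefficients lambda_i: squared singular values of the dA x dB reshape
lam = np.linalg.svd(psi.reshape(2, 3), compute_uv=False) ** 2
lam = lam[lam > 1e-12]
S_A = float(-np.sum(lam * np.log2(lam)))          # = S(B) for any pure state

# Cross-check against the eigenvalues of rho_A = tr_B |psi><psi| = M M^dagger
M = psi.reshape(2, 3)
ev = np.linalg.eigvalsh(M @ M.conj().T)
ev = ev[ev > 1e-12]
print(abs(S_A - float(-np.sum(ev * np.log2(ev)))) < 1e-9)   # True
```

Unless the reshaped matrix has rank one (a product state), both marginals come out mixed, exactly as the Schmidt argument predicts.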

Bell state, worked

The cleanest worked example is the Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt 2.

\rho_B \;=\; \text{tr}_A |\Phi^+\rangle\langle \Phi^+| \;=\; \tfrac{1}{2}|0\rangle\langle 0|_B + \tfrac{1}{2}|1\rangle\langle 1|_B \;=\; \tfrac{I}{2}.

So S(B) = 1 bit, while the joint state is pure, so S(A, B) = 0. Hence

S(A | B) \;=\; S(A, B) - S(B) \;=\; 0 - 1 \;=\; -1 \text{ bit}.

Negative one bit of conditional uncertainty. Classically impossible. Quantumly routine.

[Figure: Bell state entropies. A bar chart showing S(A,B) = 0 (pure), S(A) = 1 bit, S(B) = 1 bit, and S(A|B) = -1 bit as a bar extending below zero; negative conditional entropy is the signature of entanglement.]
The Bell state compresses the paradox onto one chart. The joint is pure ($S(A,B) = 0$), each piece is fully random ($S(A) = S(B) = 1$), and the conditional entropy is $-1$ bit — a classically impossible value that crisply encodes the presence of entanglement.

The pure-state formula S(A | B) = -S(A)

For any pure bipartite state |\psi\rangle_{AB}, the Schmidt decomposition gives S(A) = S(B) (same eigenvalues) and S(A, B) = 0. Therefore

S(A | B) \;=\; S(A, B) - S(B) \;=\; -S(A).

Pure-state conditional entropy is exactly the negative of the marginal entropy. The more entangled the pure state (larger S(A)), the more negative the conditional entropy. Unentangled pure states have S(A) = 0, giving S(A | B) = 0, matching the classical intuition. Maximally entangled pure states saturate at S(A | B) = -\log_2 d, one negative bit per Schmidt level.

The coherent information — reading the minus sign correctly

"Negative uncertainty" sounds like nonsense. The right way to read it is to flip the sign and give the positive quantity a name.

Coherent information

For a bipartite state \rho_{AB}, the coherent information from A to B is

I_c(A \rangle B) \;=\; -S(A | B) \;=\; S(B) - S(A, B).

For a pure entangled state, I_c(A\rangle B) = S(A) \geq 0. More generally I_c can take either sign; it is positive when \rho_{AB} has enough quantum correlation to beat classical joint distributions.

Operational meaning. The coherent information quantifies how much quantum information about A is effectively stored in B — information that is not accessible from B alone (which would be classical mutual information) but that is jointly recoverable if A and B are combined. For the Bell state, I_c = +1 bit, which matches the fact that a shared Bell pair is worth exactly one "ebit" — one unit of quantum entanglement, enough for one qubit of teleportation [quantum-teleportation].
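To see that I_c really does take either sign for mixed states, one can interpolate between a Bell pair and white noise. A hedged sketch (this one-parameter mixing family is an illustration of mine, not a construction from the text):

```python
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(psi, psi)

def coherent_info(p):
    """I_c(A>B) = S(B) - S(A,B) for the mixture p * Bell + (1 - p) * I/4."""
    rho = p * bell + (1 - p) * np.eye(4) / 4
    rho_B = np.einsum('abad->bd', rho.reshape(2, 2, 2, 2))
    return S(rho_B) - S(rho)

print(coherent_info(1.0))   # analytically +1 bit: a clean ebit
print(coherent_info(0.0))   # analytically -1 bit: maximally mixed, no quantum correlation
```

Somewhere between the two endpoints the sign flips; positive coherent information requires enough surviving entanglement.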

The coherent information will reappear in the next two chapters: in the quantum source-coding theorem, and as the quantity whose regularised maximum gives the quantum capacity of a channel.

So the negative sign of S(A | B) is not pathology — it is, after sign-flipping, the number that quantifies the "qubit-pipe width" of any quantum channel.

Strong subadditivity — the master inequality

Subadditivity bounds S(A, B) in terms of the marginals. Strong subadditivity adds a third system and is drastically harder to prove — and drastically more powerful.

Strong subadditivity (Lieb-Ruskai 1973)

For any tripartite state \rho_{ABC},

S(A, B, C) + S(B) \;\leq\; S(A, B) + S(B, C).

Equivalently, written in terms of conditional entropies,

S(A | B, C) \;\leq\; S(A | B),

which reads "conditioning on more cannot increase conditional entropy." Proved by Elliott Lieb and Mary Beth Ruskai in 1973.

Reading the equivalence. Subtract S(B, C) from both sides of the first form: S(A, B, C) - S(B, C) \leq S(A, B) - S(B), i.e. S(A | B, C) \leq S(A | B). The left side is the conditional entropy of A given (B, C); the right side given only B. Adding information (C) can only reduce uncertainty about A — as it should.

Why it is hard

The classical version is one line:

H(A | B, C) \;\leq\; H(A | B) \quad \text{(classical)}

follows immediately from the non-negativity of the classical conditional mutual information, I(A; C | B) = H(A | B) - H(A | B, C) \geq 0, which is a one-line consequence of Jensen's inequality applied to the concave logarithm.

The quantum version has no such short proof. In 1971, Oscar Lanford and Derek Robinson conjectured it. In 1973, Lieb and Ruskai [arXiv:math-ph/0205013] proved it, building on Lieb's concavity theorem (a deep extension of the Golden-Thompson inequality \text{tr}(e^{A + B}) \leq \text{tr}(e^A e^B) for Hermitian matrices) together with intricate convexity estimates. The proof stood alone for two decades as the deepest known result in quantum information theory. Modern proofs (Ruskai-Effros, Nielsen-Petz) have simplified it somewhat, but it remains a non-trivial theorem.

What it buys you

Nearly every structural theorem in quantum information theory descends from strong subadditivity: the non-negativity of conditional mutual information, the data-processing inequality, the monotonicity of relative entropy, and the capacity and security theorems built on them.

If you remember one inequality from this chapter, make it strong subadditivity. Every other result sits downstream.
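Strong subadditivity cannot be proved numerically, but it can be spot-checked on random states, which is a useful sanity test for any entropy code. A sketch (the partial-trace helper and the random-state construction are my own scaffolding):

```python
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def keep_only(rho, dims, keep):
    """Partial trace over every subsystem not listed in `keep`."""
    n = len(dims)
    r = rho.reshape(dims + dims)             # row indices, then column indices
    for k in sorted(set(range(n)) - set(keep), reverse=True):
        r = np.trace(r, axis1=k, axis2=k + n)  # trace out subsystem k
        n -= 1
    d = int(np.prod([dims[k] for k in keep]))
    return r.reshape(d, d)

rng = np.random.default_rng(1)
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho = G @ G.conj().T
rho /= np.trace(rho).real                    # random full-rank three-qubit state
dims = (2, 2, 2)

lhs = S(rho) + S(keep_only(rho, dims, [1]))                              # S(A,B,C) + S(B)
rhs = S(keep_only(rho, dims, [0, 1])) + S(keep_only(rho, dims, [1, 2]))  # S(A,B) + S(B,C)
print(lhs <= rhs + 1e-9)                     # True on every state, by Lieb-Ruskai
```

Re-running with different seeds never produces a violation; that is the theorem at work, not luck.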

[Figure: subadditivity and strong subadditivity, side by side. Left box: S(A,B) \leq S(A) + S(B), equality iff uncorrelated, proved via Klein's inequality. Right box: S(A,B,C) + S(B) \leq S(A,B) + S(B,C), equivalently S(A|B,C) \leq S(A|B) (Lieb & Ruskai, 1973), which in turn yields I(A;B) \geq 0, I(A;C|B) \geq 0, the data-processing inequality, and all channel capacities.]
Subadditivity and strong subadditivity. The first has a single-page proof from Klein's inequality; the second is a landmark theorem whose proof stood essentially unsimplified for two decades. Between them, they imply channel capacity theorems, data-processing inequalities, security proofs for QKD, and the monotonicity of relative entropy under noise.

Worked examples

Example 1 — Bell state: $S(A, B) = 0$, $S(A | B) = -1$

Setup. Compute joint entropy, marginals, marginal entropies, and conditional entropy for the Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt 2. Interpret each number.

Step 1 — the joint density matrix. Write |\Phi^+\rangle in the computational basis \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}:

|\Phi^+\rangle \;=\; \tfrac{1}{\sqrt 2}\bigl(|00\rangle + |11\rangle\bigr) \;=\; \tfrac{1}{\sqrt 2}\begin{pmatrix}1 \\ 0 \\ 0 \\ 1\end{pmatrix}.

The joint density \rho_{AB} = |\Phi^+\rangle\langle \Phi^+| is the outer product of this column with its row transpose:

\rho_{AB} \;=\; \tfrac{1}{2}\begin{pmatrix}1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1\end{pmatrix}.

Why only the four corners are non-zero: |\Phi^+\rangle\langle \Phi^+| has entries (|\Phi^+\rangle)_i \cdot (|\Phi^+\rangle)_j^*. The column vector has non-zero entries only at positions 1 (for |00\rangle) and 4 (for |11\rangle); products of these non-zero entries land at positions (1,1), (1,4), (4,1), (4,4) of the matrix. All other entries are products with zero.

Step 2 — joint entropy S(A, B). The matrix is a rank-1 projector onto |\Phi^+\rangle, so its eigenvalues are (1, 0, 0, 0). Apply Shannon:

S(A, B) \;=\; -1 \cdot \log_2 1 - 3 \cdot (0 \log_2 0) \;=\; 0, \quad \text{using the convention } 0 \log_2 0 = 0.

Pure joint state \Rightarrow zero joint entropy. No classical uncertainty about which joint state you have.

Step 3 — marginal \rho_B by partial trace.

\rho_B \;=\; \text{tr}_A(\rho_{AB}) \;=\; \langle 0|_A \rho_{AB} |0\rangle_A + \langle 1|_A \rho_{AB} |1\rangle_A.

Compute each:

  • \langle 0|_A \rho_{AB} |0\rangle_A: pick the |0\rangle\langle 0|_A block of \rho_{AB}, which is the upper-left 2 \times 2 submatrix. From the matrix above, the (|00\rangle, |01\rangle) rows and columns give \tfrac{1}{2}\,\text{diag}(1, 0) = \tfrac{1}{2}|0\rangle\langle 0|_B.
  • \langle 1|_A \rho_{AB} |1\rangle_A: the |1\rangle\langle 1|_A block is the lower-right 2 \times 2 submatrix, giving \tfrac{1}{2}\,\text{diag}(0, 1) = \tfrac{1}{2}|1\rangle\langle 1|_B.
  • The off-diagonal A-blocks (like |0\rangle\langle 1|_A) do not contribute to the partial trace because \langle a | a'\rangle_A = \delta_{aa'}.

Summing,

\rho_B \;=\; \tfrac{1}{2}|0\rangle\langle 0|_B + \tfrac{1}{2}|1\rangle\langle 1|_B \;=\; \frac{I_B}{2}.

The marginal is maximally mixed — a single Bell-state qubit, viewed alone, is a fair coin.

Step 4 — marginal entropy. S(B) = S(I/2) = 1 bit. By A \leftrightarrow B symmetry, S(A) = 1 bit too.

Step 5 — conditional entropy.

S(A | B) \;=\; S(A, B) - S(B) \;=\; 0 - 1 \;=\; -1 \text{ bit}.

Step 6 — coherent information. I_c(A \rangle B) = -S(A | B) = +1 bit. The Bell state carries one ebit of quantum correlation — exactly the amount required for one round of teleportation.

What this shows. The Bell state compresses all the paradoxes: the joint is totally determined, each piece is totally undetermined, conditioning makes things "more negative," and the negative value names a positive quantum resource. Every computation in quantum Shannon theory eventually reduces to a variant of this calculation.
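The whole worked example fits in a few lines of NumPy, and reproduces Steps 1 through 6 end to end (a verification sketch; the helper names are mine):

```python
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # |Phi+> in the computational basis
rho_AB = np.outer(psi, psi)                          # the four-corner matrix of Step 1
r = rho_AB.reshape(2, 2, 2, 2)
rho_A = np.einsum('abcb->ac', r)                     # Step 3: trace out B
rho_B = np.einsum('abad->bd', r)                     # symmetric: trace out A

S_AB, S_B = S(rho_AB), S(rho_B)
print(S_AB, S_B, S_AB - S_B)   # approximately 0, 1, -1: one negative conditional bit
```

The last printed value is S(A | B) = -1 bit, i.e. I_c(A \rangle B) = +1 ebit.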

Example 2 — A product state: all quantities vanish

Setup. Compute S(A, B), S(A), S(B), and S(A | B) for the product state

\rho_{AB} \;=\; \rho_A \otimes \rho_B, \qquad \rho_A \;=\; \tfrac{3}{4}|0\rangle\langle 0| + \tfrac{1}{4}|1\rangle\langle 1|, \qquad \rho_B \;=\; \tfrac{1}{2}|+\rangle\langle +| + \tfrac{1}{2}|-\rangle\langle -|.

Step 1 — marginals already in diagonal form. \rho_A is diagonal in the computational basis with eigenvalues (3/4, 1/4). \rho_B is diagonal in the \{|+\rangle, |-\rangle\} basis with eigenvalues (1/2, 1/2); note \rho_B = I/2 in disguise — writing it as a mix of |+\rangle\langle +| and |-\rangle\langle -| gives the same operator as mixing |0\rangle and |1\rangle with weights (1/2, 1/2).

Step 2 — marginal entropies.

S(A) \;=\; H(3/4) \;=\; -\tfrac{3}{4}\log_2 \tfrac{3}{4} - \tfrac{1}{4}\log_2 \tfrac{1}{4} \;\approx\; 0.811 \text{ bits}.

Why this numerical value: \log_2 (3/4) = \log_2 3 - 2 \approx 1.585 - 2 = -0.415 and \log_2(1/4) = -2. So S(A) = -(3/4)(-0.415) - (1/4)(-2) = 0.311 + 0.500 = 0.811 bits. This is the standard binary-entropy value at p = 3/4.

S(B) \;=\; H(1/2) \;=\; 1 \text{ bit}.

Step 3 — joint entropy via the product-state rule. Because \rho_{AB} = \rho_A \otimes \rho_B, the joint eigenvalues are products of marginal eigenvalues: \{3/8, 3/8, 1/8, 1/8\}. Apply Shannon:

S(A, B) \;=\; -2 \cdot \tfrac{3}{8}\log_2\tfrac{3}{8} - 2 \cdot \tfrac{1}{8}\log_2\tfrac{1}{8}.

Compute: \log_2(3/8) = \log_2 3 - 3 \approx -1.415 and \log_2(1/8) = -3. So

S(A, B) \;=\; -2(3/8)(-1.415) - 2(1/8)(-3) \;=\; 1.061 + 0.750 \;=\; 1.811 \text{ bits}.

Sanity-check against the additivity rule: S(A) + S(B) = 0.811 + 1 = 1.811. Matches exactly, as product states must.

Step 4 — conditional entropy.

S(A | B) \;=\; S(A, B) - S(B) \;=\; 1.811 - 1 \;=\; 0.811 \text{ bits} \;=\; S(A).

Knowing B gave no information about A — conditional entropy equals marginal entropy. This is the hallmark of an uncorrelated pair: conditioning does nothing, because there is nothing to condition on.

Step 5 — quantum mutual information.

I(A ; B) \;=\; S(A) + S(B) - S(A, B) \;=\; 0.811 + 1 - 1.811 \;=\; 0.

Zero mutual information. No correlation, classical or quantum.

What this shows. For a product state, all the quantum-specific phenomena vanish. Conditional entropy is non-negative and equals the marginal; mutual information is zero; subadditivity is saturated. The only reason the quantum numbers differ from the classical ones is entanglement. Uncorrelated quantum states behave exactly like independent classical random variables.
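The same verification for the product state of this example (a sketch; \rho_B is entered directly as I/2, using the basis-independence noted in Step 1):

```python
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

rho_A = np.diag([0.75, 0.25])
rho_B = np.eye(2) / 2                  # (|+><+| + |-><-|)/2 is I/2 in any basis
rho_AB = np.kron(rho_A, rho_B)         # joint eigenvalues {3/8, 3/8, 1/8, 1/8}

print(round(S(rho_A), 3))              # 0.811
print(round(S(rho_AB), 3))             # 1.811
print(round(S(rho_AB) - S(rho_B), 3))  # 0.811 = S(A): conditioning does nothing
```

Subadditivity is saturated to floating-point precision, as it must be for a product state.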

[Figure: product-state entropies. Horizontal bars showing S(A,B) \approx 1.811 bits, S(A) \approx 0.811 bits, S(B) = 1 bit, and S(A|B) \approx 0.811 bits = S(A); all positive, signalling no correlation.]
For a product state, conditional entropy equals marginal entropy and mutual information is zero. Subadditivity holds with equality: $S(A,B) = S(A) + S(B)$. Nothing surprising happens because there is nothing to be surprised by.


Going deeper

If you just need S(A, B) = -\text{tr}(\rho_{AB}\log \rho_{AB}), the chain rule S(A|B) = S(A,B) - S(B), the fact that S(A|B) < 0 signals entanglement, and the coherent information I_c = -S(A|B) as its sign-corrected avatar, you have the essentials. The rest of this section treats the Araki-Lieb inequality, continuity bounds, the monotonicity of relative entropy equivalence to SSA, the role of coherent information in quantum channel capacity, and why strong subadditivity is not a consequence of convexity alone.

The Araki-Lieb triangle inequality

Alongside subadditivity, a lower bound on joint entropy:

|S(A) - S(B)| \;\leq\; S(A, B) \;\leq\; S(A) + S(B).

The left inequality is Araki-Lieb (1970). It guarantees that the joint entropy can never drop below the imbalance between the marginals, however entangled the state. For a state with equal marginal entropies S(A) = S(B), the two bounds read 0 \leq S(A, B) \leq 2 S(A); pure entangled states saturate the left (their joint entropy vanishes, and Schmidt symmetry forces S(A) = S(B)), while product states with equally mixed marginals saturate the right. The combination of subadditivity and Araki-Lieb is called the entropic triangle inequality.

Strong subadditivity equivalent forms

The inequality S(A, B, C) + S(B) \leq S(A, B) + S(B, C) has several equivalent statements, each useful in different contexts:

  1. Conditional form: S(A | B, C) \leq S(A | B) — conditioning on more cannot increase uncertainty.
  2. Mutual information form: I(A; C | B) \geq 0 — conditional mutual information is non-negative.
  3. Data processing inequality (DPI): I(A; B) \geq I(A; \Phi(B)) for any CPTP map \Phi on B — information about A cannot increase under local processing of B.
  4. Monotonicity of relative entropy: S(\Phi(\rho) \| \Phi(\sigma)) \leq S(\rho \| \sigma) for CPTP \Phi.

All four statements are equivalent to SSA and to each other; Lindblad (1975) proved the equivalences. The monotonicity-of-relative-entropy form is the one that generalises most smoothly to infinite-dimensional systems, quantum field theory, and operator-algebraic settings.

Continuity: the Fannes-Audenaert inequality

How much can S(\rho) change when \rho is perturbed? The Fannes-Audenaert inequality bounds the change by the trace-distance:

|S(\rho) - S(\sigma)| \;\leq\; T \log_2(d - 1) + h(T),

where T = \tfrac{1}{2}\|\rho - \sigma\|_1 is the trace distance, d is the dimension, and h is the binary entropy. Plugging in for conditional entropy gives the Alicki-Fannes inequality

|S(A|B)_\rho - S(A|B)_\sigma| \;\leq\; 4T \log_2 d_A + 2 h(T),

ensuring that conditional entropy is a continuous function of the state. Useful when \rho_{AB} is known only approximately (as in any experiment).
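A numerical spot-check of the Fannes-Audenaert bound (a sketch: the qutrit state and the perturbation are arbitrary choices of mine, with d = 3):

```python
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def h(t):
    """Binary entropy, with h(0) = h(1) = 0."""
    return 0.0 if t <= 0 or t >= 1 else float(-t * np.log2(t) - (1 - t) * np.log2(1 - t))

rng = np.random.default_rng(2)
G = rng.normal(size=(3, 3))
rho = G @ G.T
rho /= np.trace(rho)                          # random qutrit state
sigma = 0.9 * rho + 0.1 * np.eye(3) / 3       # a nearby perturbed state

T = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()   # trace distance
bound = T * np.log2(3 - 1) + h(T)
print(abs(S(rho) - S(sigma)) <= bound)        # True: the entropy change respects the bound
```

The bound is typically loose for small perturbations, which is exactly what makes it useful: entropy simply cannot jump under a small trace-distance change.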

Coherent information and quantum channel capacity

The Lloyd-Shor-Devetak theorem (first announced by Lloyd in 1997, proved rigorously by Shor 2002 and Devetak 2005 [arXiv:quant-ph/0304127]) identifies the quantum capacity of a channel \mathcal{N} as

Q(\mathcal{N}) \;=\; \lim_{n \to \infty} \frac{1}{n}\max_{\rho_{A^n}} I_c(A^n \rangle B^n)_{\mathcal{N}^{\otimes n}(\rho)}.

This is the regularised coherent information. The regularisation (limit over many channel uses) is necessary because I_c is not additive: there exist channels \mathcal{N}_1, \mathcal{N}_2 where I_c of the tensor product strictly exceeds the sum — a phenomenon called superadditivity of coherent information. Smith and Yard (2008) showed the extreme version: two channels each with Q = 0 can combine to give Q > 0. Classical channels have no analogue.

The Petz recovery map and approximate SSA

Petz (1986) showed that equality in SSA — S(A | B, C) = S(A | B) — holds iff there is a "quantum Markov chain" structure: a CPTP map \mathcal{R}: B \to BC satisfying \mathcal{R}(\rho_{AB}) = \rho_{ABC}. This \mathcal{R} is the Petz recovery map. Fawzi-Renner (2015) [arXiv:1410.0664] proved an approximate version: if I(A; C | B) is small, the Petz map approximately recovers \rho_{ABC} from \rho_{AB}, with explicit error bounds in trace distance. This is the modern tool powering many recent results in quantum many-body physics and holography.

Indian research connections

Quantum information groups at HRI (Harish-Chandra Research Institute, Allahabad) and IIT Madras have produced significant work on entropic inequalities and multipartite coherent information. Ujjwal Sen's group at HRI has developed conditional-entropy measures for multipartite entanglement quantification. At IISc Bangalore, the quantum gravity community uses SSA routinely to derive constraints on holographic entanglement entropies (Ryu-Takayanagi surfaces must satisfy SSA, a non-trivial check on any proposed bulk geometry). The Raman Research Institute in Bengaluru has performed quantum-optics experiments that report two-photon conditional entropies directly, with values agreeing with theoretical predictions to within 10^{-3} bits.


References

  1. Elliott H. Lieb and Mary Beth Ruskai, Proof of the strong subadditivity of quantum-mechanical entropy (1973) — arXiv:math-ph/0205013 (reprint).
  2. Mark M. Wilde, Quantum Information Theory (2nd ed., 2017), Ch. 11–14 (Entropy inequalities and quantum capacities) — arXiv:1106.1445.
  3. John Preskill, Lecture Notes on Quantum Computation, Ch. 10 (Quantum information theory) — theory.caltech.edu/~preskill/ph229.
  4. Igor Devetak, The private classical capacity and quantum capacity of a quantum channel (2005) — arXiv:quant-ph/0304127.
  5. Omar Fawzi and Renato Renner, Quantum conditional mutual information and approximate Markov chains (2015) — arXiv:1410.0664.
  6. Wikipedia, Strong subadditivity of quantum entropy — statements, equivalent forms, and proof sketch.