In short
For a bipartite density operator \rho_{AB}, the joint von Neumann entropy S(A, B) = -\text{tr}(\rho_{AB} \log_2 \rho_{AB}) is
the Shannon entropy of the eigenvalues of \rho_{AB}. The conditional entropy copies the classical chain rule: S(A | B) = S(A, B) - S(B).
Classically H(Y | X) \geq 0 always. Quantumly, S(A | B) can be negative. A Bell state has S(A, B) = 0 (pure), S(B) = 1 (locally maximally mixed), and therefore S(A | B) = -1 bit. That minus sign is the fingerprint of entanglement. Its absolute value is the coherent information I_c(A \rangle B) = -S(A | B), a positive quantum resource that bounds how many qubits A can send to B through any quantum channel. Two universal inequalities govern everything: subadditivity S(A,B) \leq S(A) + S(B) and strong subadditivity S(A | B, C) \leq S(A | B) — equivalently S(A, B, C) + S(B) \leq S(A, B) + S(B, C), proved by Lieb and Ruskai in 1973. Almost every theorem in quantum Shannon theory — channel capacities, the data-processing inequality, QKD security proofs — is a repackaging of one of these two.
You already have the von Neumann entropy of a single density operator: diagonalise, run eigenvalues through the Shannon formula, read off a number in bits. This chapter does the same job for two systems side-by-side — system A and system B, with a joint state \rho_{AB} that could be correlated, entangled, or neither. The questions are classical-looking: how much total uncertainty is there in the pair? How much uncertainty about A remains once you know B? How many bits do A and B share in common?
The answers are mostly the expected quantum generalisations of Shannon's definitions — with one sharp exception. The conditional entropy S(A | B) can be negative, and the sign flip is not a mistake. It is the formal signature that \rho_{AB} is entangled in a way no classical joint distribution could match. By the end of the chapter that negative number will have a name (coherent information), a sign-corrected interpretation (a quantum resource you can send through a channel), and a role as the central quantity of the quantum source-coding and channel-coding theorems you will meet in the next two chapters.
Joint entropy — the picture before the formula
Two systems, A and B. A joint state \rho_{AB} living on the tensor-product Hilbert space \mathcal{H}_A \otimes \mathcal{H}_B. You want one number measuring the total uncertainty of the pair.
The recipe is the same as for a single system: diagonalise \rho_{AB}, collect the eigenvalues \{\mu_k\} (a probability distribution on d_A \cdot d_B outcomes), and apply Shannon.
Joint von Neumann entropy
For a bipartite density operator \rho_{AB} on \mathcal{H}_A \otimes \mathcal{H}_B, the joint von Neumann entropy is
S(A, B) = -\text{tr}(\rho_{AB} \log_2 \rho_{AB}) = -\sum_k \mu_k \log_2 \mu_k,
where \{\mu_k\} are the eigenvalues of \rho_{AB} (a probability distribution, since \rho_{AB} is trace-1 and positive semi-definite). The unit is bits.
Reading the definition. The joint entropy forgets, momentarily, that \rho_{AB} lives on a tensor product. It treats \rho_{AB} as one big density operator on a d_A d_B-dimensional Hilbert space and asks the von Neumann question of it. The tensor structure only re-enters the moment you trace out a subsystem to get \rho_A or \rho_B — and compare the joint number to the marginals.
Two quick consequences from the definition alone:
- Pure joint states have S(A, B) = 0. If \rho_{AB} = |\psi\rangle_{AB}\langle\psi|_{AB}, the joint density is rank-1, eigenvalues are (1, 0, 0, \ldots), entropy is 0. The joint system is perfectly known.
- Product states split additively. If \rho_{AB} = \rho_A \otimes \rho_B, the eigenvalues of the product are the pairwise products of the marginals' eigenvalues: \mu_{ij} = \lambda_i^A \lambda_j^B. Then
S(A, B) = -\sum_{ij} \lambda_i^A \lambda_j^B \log_2(\lambda_i^A \lambda_j^B) = S(A) + S(B).
Why the product splits: \log_2(\lambda_i^A \lambda_j^B) = \log_2\lambda_i^A + \log_2\lambda_j^B, and the double sum factorises because \sum_j \lambda_j^B = 1. This mirrors the classical fact that independent random variables have H(X, Y) = H(X) + H(Y).
For anything other than a product state, the joint entropy is strictly less than S(A) + S(B) — the gap measures correlation.
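Both facts can be checked numerically. A minimal sketch with NumPy (the `entropy` helper and the example states are illustrative choices, not from the text):

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits: diagonalise, then apply Shannon."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # convention: 0 log 0 = 0
    return float(-np.sum(evals * np.log2(evals)))

# Marginals: a biased qubit and a maximally mixed qubit
rho_A = np.diag([0.75, 0.25])
rho_B = np.eye(2) / 2

# Product state: the joint entropy splits additively
rho_prod = np.kron(rho_A, rho_B)
assert np.isclose(entropy(rho_prod), entropy(rho_A) + entropy(rho_B))

# Entangled state: the joint entropy falls strictly below S(A) + S(B)
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)     # Bell state |Phi+>
rho_bell = np.outer(phi, phi)
print(entropy(rho_prod), entropy(rho_bell))   # ~1.811 vs ~0
```

The Bell state's marginals each contribute one bit, yet its joint entropy is zero — the maximal possible gap for two qubits.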
Marginals and partial trace — a two-line reminder
To talk about S(A) alone you need \rho_A, the reduced state on A. Built by partial trace:
\rho_A = \text{tr}_B\,\rho_{AB} = \sum_b \langle b|_B\, \rho_{AB}\, |b\rangle_B,
where \{|b\rangle_B\} is any orthonormal basis of B. The operation is "sum over the B-diagonal." The density operator chapter derived it from first principles; here it is a tool. \rho_B is defined symmetrically by tracing out A.
Once you have \rho_A and \rho_B, their entropies S(A) = S(\rho_A) and S(B) = S(\rho_B) are just single-system von Neumann entropies.
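In code, the partial trace is a one-liner once the joint matrix is reshaped into a four-index tensor (a sketch; the helper names are mine):

```python
import numpy as np

def partial_trace_B(rho_AB, dA, dB):
    """Sum over the B-diagonal: rho_A[i, j] = sum_b rho_AB[(i, b), (j, b)]."""
    return np.einsum('ibjb->ij', rho_AB.reshape(dA, dB, dA, dB))

def partial_trace_A(rho_AB, dA, dB):
    """Symmetrically, trace out A to get rho_B."""
    return np.einsum('aiaj->ij', rho_AB.reshape(dA, dB, dA, dB))

# Check on the Bell state |Phi+>: both marginals are maximally mixed
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(phi, phi)
assert np.allclose(partial_trace_B(rho, 2, 2), np.eye(2) / 2)
assert np.allclose(partial_trace_A(rho, 2, 2), np.eye(2) / 2)
```

The reshape splits each row and column index into an (A, B) pair; the repeated einsum index performs the "sum over the B-diagonal" from the formula above.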
Subadditivity — the easy master inequality
For any bipartite state \rho_{AB},
S(A, B) \leq S(A) + S(B),
with equality iff \rho_{AB} = \rho_A \otimes \rho_B (the subsystems are uncorrelated). This is subadditivity, the direct quantum analogue of the classical H(X, Y) \leq H(X) + H(Y).
A short proof using relative entropy
The cleanest route is via quantum relative entropy. Define the relative entropy
S(\rho \| \sigma) = \text{tr}(\rho \log_2 \rho) - \text{tr}(\rho \log_2 \sigma).
It is non-negative, S(\rho \| \sigma) \geq 0, with equality iff \rho = \sigma — Klein's inequality (1931), proved by the concavity of \log. Now set \rho = \rho_{AB} and \sigma = \rho_A \otimes \rho_B:
S(\rho_{AB} \| \rho_A \otimes \rho_B) = \text{tr}(\rho_{AB} \log_2 \rho_{AB}) - \text{tr}(\rho_{AB} \log_2(\rho_A \otimes \rho_B)) = -S(A, B) - \text{tr}(\rho_{AB} \log_2(\rho_A \otimes \rho_B)).
Since \log_2(\rho_A \otimes \rho_B) = (\log_2 \rho_A) \otimes I_B + I_A \otimes (\log_2 \rho_B), the second trace splits:
\text{tr}(\rho_{AB} \log_2(\rho_A \otimes \rho_B)) = \text{tr}(\rho_A \log_2 \rho_A) + \text{tr}(\rho_B \log_2 \rho_B) = -S(A) - S(B).
Why the trace splits this way: \text{tr}(\rho_{AB} (X_A \otimes I_B)) = \text{tr}_A(\rho_A X_A) after tracing out B using the definition of the partial trace. The \log_2 \rho_A piece only sees the A-marginal; the \log_2 \rho_B piece only sees the B-marginal. Entanglement contributes nothing to these trace expressions because they are linear in \rho_{AB}.
Combining,
S(\rho_{AB} \| \rho_A \otimes \rho_B) = S(A) + S(B) - S(A, B).
Relative entropy is non-negative — so S(A) + S(B) - S(A, B) \geq 0, which is subadditivity. Equality in Klein's inequality happens iff \rho_{AB} = \rho_A \otimes \rho_B, giving the equality condition.
The quantity
I(A; B) = S(A) + S(B) - S(A, B)
is the quantum mutual information, the subject of the next chapter. It is the "gap" in subadditivity, and it measures the total amount of correlation — classical plus quantum — between A and B.
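The identity S(\rho_{AB} \| \rho_A \otimes \rho_B) = S(A) + S(B) - S(A, B) can be verified numerically. A sketch on a full-rank classically correlated state (the state and helper names are mine):

```python
import numpy as np

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def rel_entropy(rho, sigma):
    """S(rho || sigma) in bits; assumes both states are full rank."""
    def log2m(m):
        w, v = np.linalg.eigh(m)
        return v @ np.diag(np.log2(w)) @ v.conj().T
    return float(np.trace(rho @ (log2m(rho) - log2m(sigma))).real)

# Noisy classically correlated two-qubit state (full rank, so the logs exist)
rho_AB = np.diag([0.4, 0.1, 0.1, 0.4])
rho_A = np.diag([0.5, 0.5])   # marginal over B
rho_B = np.diag([0.5, 0.5])   # marginal over A

gap = entropy(rho_A) + entropy(rho_B) - entropy(rho_AB)   # = I(A;B)
assert np.isclose(rel_entropy(rho_AB, np.kron(rho_A, rho_B)), gap)
assert gap >= 0                                            # subadditivity
```

The gap here is about 0.28 bits of purely classical correlation — positive, as Klein's inequality guarantees.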
Conditional entropy — and the negative-entropy surprise
Classically, the chain rule H(X, Y) = H(X) + H(Y | X) defines conditional entropy via subtraction:
H(Y | X) = H(X, Y) - H(X).
It measures the uncertainty of Y remaining after X is known. It is always non-negative, and in fact 0 \leq H(Y | X) \leq H(Y).
The quantum definition copies the formula:
Quantum conditional entropy
For a bipartite state \rho_{AB}, the quantum conditional entropy is
S(A | B) = S(A, B) - S(B).
Unlike the classical case, S(A | B) can be negative. It is negative exactly when \rho_{AB} has stronger-than-classical correlations — i.e. entanglement.
Why the quantum version can go below zero
The classical proof that H(Y | X) \geq 0 runs through H(X, Y) \geq H(X), which itself rests on the fact that a joint probability distribution on (X, Y) is at least as uncertain as any marginal — knowing the pair (x, y) is at least as informative as knowing x.
Quantumly, that chain breaks. A pure entangled joint state |\psi\rangle_{AB} has S(A, B) = 0, yet tracing out one side can produce a mixed marginal with S(B) > 0. The "joint" is more determined than the "parts." No classical distribution can be like this — in classical probability, marginalising always loses information, never creates it.
Why pure-state marginals can be mixed: any pure bipartite state |\psi\rangle_{AB} has a Schmidt decomposition |\psi\rangle = \sum_i \sqrt{\lambda_i}\,|u_i\rangle_A |v_i\rangle_B. Tracing out B leaves \rho_A = \sum_i \lambda_i |u_i\rangle\langle u_i|, which has entropy -\sum_i \lambda_i \log_2 \lambda_i = S(B). Unless one \lambda_i = 1 (the state is a product state), the marginal is mixed and has strictly positive entropy. Entanglement creates marginal entropy even when the joint state is pure.
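The Schmidt coefficients of any pure bipartite state fall out of an SVD of the reshaped amplitude vector, confirming S(A) = S(B) numerically. A sketch (the example amplitudes are arbitrary):

```python
import numpy as np

# An arbitrary two-qubit pure state, normalised
psi = np.array([0.8, 0.1, 0.2, 0.9])
psi = psi / np.linalg.norm(psi)

# Reshape amplitudes into a dA x dB matrix; the squared singular values
# are the Schmidt coefficients lambda_i
M = psi.reshape(2, 2)
lam = np.linalg.svd(M, compute_uv=False) ** 2

# Both marginals have exactly these eigenvalues, so S(A) = S(B)
rho_A = M @ M.conj().T          # = tr_B |psi><psi|
rho_B = M.T @ M.conj()          # = tr_A |psi><psi|
assert np.allclose(np.sort(np.linalg.eigvalsh(rho_A)), np.sort(lam))
assert np.allclose(np.sort(np.linalg.eigvalsh(rho_B)), np.sort(lam))

S = float(-np.sum(lam * np.log2(lam)))   # marginal entropy of either side
print(S)   # > 0 unless psi happens to be a product state
```

For a product state one singular value is 1 and the other 0; any other amplitude pattern yields two positive Schmidt coefficients and hence mixed marginals.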
Bell state, worked
The cleanest worked example is the Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt 2.
- Joint entropy. The joint state is pure, so S(A, B) = 0.
- Marginal \rho_B. Tracing out A,
\rho_B = \text{tr}_A\, |\Phi^+\rangle\langle\Phi^+| = \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| = \tfrac{I}{2}.
So S(B) = 1 bit.
- Conditional entropy. S(A | B) = S(A, B) - S(B) = 0 - 1 = -1 bit.
Negative one bit of conditional uncertainty. Classically impossible. Quantumly routine.
The pure-state formula S(A | B) = -S(A)
For any pure bipartite state |\psi\rangle_{AB}, the Schmidt decomposition gives S(A) = S(B) (same eigenvalues) and S(A, B) = 0. Therefore
S(A | B) = S(A, B) - S(B) = -S(B) = -S(A).
Pure-state conditional entropy is exactly the negative of the marginal entropy. The more entangled the pure state (larger S(A)), the more negative the conditional entropy. Unentangled pure states have S(A) = 0, giving S(A | B) = 0, matching the classical intuition. Maximally entangled pure states of Schmidt rank d saturate at S(A | B) = -\log_2 d — one negative bit for every qubit's worth of entanglement.
The coherent information — reading the minus sign correctly
"Negative uncertainty" sounds like nonsense. The right way to read it is to flip the sign and give the positive quantity a name.
Coherent information
For a bipartite state \rho_{AB}, the coherent information from A to B is
I_c(A \rangle B) = -S(A | B) = S(B) - S(A, B).
For a pure entangled state, I_c(A\rangle B) = S(A) \geq 0. More generally I_c can take either sign; it is positive when \rho_{AB} has enough quantum correlation to beat classical joint distributions.
Operational meaning. The coherent information quantifies how much quantum information about A is effectively stored in B — information that is not accessible from B alone (which would be classical mutual information) but that is jointly recoverable if A and B are combined. For the Bell state, I_c = +1 bit, which matches the fact that a shared Bell pair is worth exactly one "ebit" — one unit of quantum entanglement, enough for one qubit of teleportation [quantum-teleportation].
The coherent information will reappear in the next two chapters as:
- The rate of entanglement distillation from noisy shared states: the hashing bound guarantees at least I_c(A \rangle B) ebits per copy of \rho_{AB}.
- The quantum channel capacity Q(\mathcal{N}) = \lim_{n \to \infty} \tfrac{1}{n} \max_{\rho} I_c(\rho, \mathcal{N}^{\otimes n}), maximised over input states across n channel uses (the Lloyd-Shor-Devetak theorem).
So the negative sign of S(A | B) is not pathology — it is, after sign-flipping, the number that quantifies the "qubit-pipe width" of any quantum channel.
Strong subadditivity — the master inequality
Subadditivity bounds S(A, B) in terms of the marginals. Strong subadditivity adds a third system and is drastically harder to prove — and drastically more powerful.
Strong subadditivity (Lieb-Ruskai 1973)
For any tripartite state \rho_{ABC},
S(A, B, C) + S(B) \leq S(A, B) + S(B, C).
Equivalently, written in terms of conditional entropies,
S(A | B, C) \leq S(A | B),
which reads "conditioning on more cannot increase conditional entropy." Proved by Elliott Lieb and Mary Beth Ruskai in 1973.
Reading the equivalence. Subtract S(B, C) from both sides of the first form: S(A, B, C) - S(B, C) \leq S(A, B) - S(B), i.e. S(A | B, C) \leq S(A | B). The left side is the conditional entropy of A given (B, C); the right side given only B. Adding information (C) can only reduce uncertainty about A — as it should.
Why it is hard
The classical version is one line:
H(X | Y, Z) \leq H(X | Y)
follows immediately: H(X | Y) is a p(y)-weighted average of entropies H(X | Y = y), each conditional distribution p(x | y) is itself a p(z | y)-weighted average of the finer distributions p(x | y, z), and Shannon entropy is concave — so averaging distributions can only raise entropy, i.e. H(X | Y = y) \geq \sum_z p(z | y)\, H(X | Y = y, Z = z).
The quantum version has no such short proof. In 1971, Oscar Lanford and Derek Robinson conjectured it. In 1973, Lieb and Ruskai [arXiv:math-ph/0205013] proved it, building on Lieb's concavity theorem — the joint concavity of (A, B) \mapsto \text{tr}(K^\dagger A^t K B^{1-t}) for 0 \leq t \leq 1, a deep strengthening in the circle of results around the Golden-Thompson inequality \text{tr}(e^{A + B}) \leq \text{tr}(e^A e^B) — plus intricate convexity estimates. For two decades it stood as the deepest known result in quantum information theory. Modern proofs (Nielsen-Petz, Effros) have simplified it somewhat, but it remains a non-trivial theorem.
What it buys you
Nearly every structural theorem in quantum information theory descends from strong subadditivity:
- Monotonicity of relative entropy under CPTP maps: S(\Phi(\rho) \| \Phi(\sigma)) \leq S(\rho \| \sigma).
- Data processing inequality for quantum mutual information: I(A; B) \geq I(A; \Phi(B)) for any CPTP map \Phi on B.
- Holevo bound on classical information extractable from a quantum source [holevo-bound].
- Quantum channel capacity theorems (classical, quantum, entanglement-assisted).
- QKD security proofs (BB84, E91) — all ultimately rely on strong subadditivity.
If you remember one inequality from this chapter, make it strong subadditivity. Every other result sits downstream.
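Strong subadditivity can be spot-checked on random tripartite states. A sketch with a general partial-trace helper (helper names and the seed are mine; a numerical check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def ptrace(rho, dims, keep):
    """Partial trace over every subsystem not listed in `keep`."""
    r = rho.reshape(dims + dims)
    for k in sorted(set(range(len(dims))) - set(keep), reverse=True):
        r = np.trace(r, axis1=k, axis2=k + r.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return r.reshape(d, d)

# Random full-rank three-qubit state rho_ABC = G G† / tr(G G†)
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho = G @ G.conj().T
rho = rho / np.trace(rho).real

dims = (2, 2, 2)                      # subsystems A=0, B=1, C=2
lhs = entropy(rho) + entropy(ptrace(rho, dims, [1]))
rhs = entropy(ptrace(rho, dims, [0, 1])) + entropy(ptrace(rho, dims, [1, 2]))
assert lhs <= rhs + 1e-9              # S(A,B,C) + S(B) <= S(A,B) + S(B,C)
print(rhs - lhs)                      # = I(A;C|B) >= 0
```

Running this over many random seeds never violates the inequality — which is reassuring but, of course, no substitute for Lieb-Ruskai.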
Worked examples
Example 1 — Bell state: S(A, B) = 0, S(A | B) = -1
Setup. Compute joint entropy, marginals, marginal entropies, and conditional entropy for the Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt 2. Interpret each number.
Step 1 — the joint density matrix. Write |\Phi^+\rangle in the computational basis \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}:
|\Phi^+\rangle = \tfrac{1}{\sqrt 2}(1, 0, 0, 1)^T.
The joint density \rho_{AB} = |\Phi^+\rangle\langle \Phi^+| is the outer product of this column with its conjugate-transpose row:
\rho_{AB} = \tfrac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.
Why only the four corners are non-zero: |\Phi^+\rangle\langle \Phi^+| has entries (|\Phi^+\rangle)_i \cdot (|\Phi^+\rangle)_j^*. The column vector has non-zero entries only at positions 1 (for |00\rangle) and 4 (for |11\rangle); products of these non-zero entries land at positions (1,1), (1,4), (4,1), (4,4) of the matrix. All other entries are products with zero.
Step 2 — joint entropy S(A, B). The matrix is a rank-1 projector onto |\Phi^+\rangle, so its eigenvalues are (1, 0, 0, 0). Apply Shannon:
S(A, B) = -1 \cdot \log_2 1 = 0 \text{ bits}.
Pure joint state \Rightarrow zero joint entropy. No classical uncertainty about which joint state you have.
Step 3 — marginal \rho_B by partial trace.
\rho_B = \text{tr}_A\, \rho_{AB} = \langle 0|_A\, \rho_{AB}\, |0\rangle_A + \langle 1|_A\, \rho_{AB}\, |1\rangle_A.
Compute each:
- \langle 0|_A \rho_{AB} |0\rangle_A: pick the |0\rangle\langle 0|_A block of \rho_{AB}, which is the upper-left 2 \times 2 submatrix. From the matrix above, the (|00\rangle, |01\rangle) rows and columns give \tfrac{1}{2}\,\text{diag}(1, 0) = \tfrac{1}{2}|0\rangle\langle 0|_B.
- \langle 1|_A \rho_{AB} |1\rangle_A: the |1\rangle\langle 1|_A block is the lower-right 2 \times 2 submatrix, giving \tfrac{1}{2}\,\text{diag}(0, 1) = \tfrac{1}{2}|1\rangle\langle 1|_B.
- The off-diagonal A-blocks (like |0\rangle\langle 1|_A) do not contribute to the partial trace because \langle a | a'\rangle_A = \delta_{aa'}.
Summing,
\rho_B = \tfrac{1}{2}|0\rangle\langle 0|_B + \tfrac{1}{2}|1\rangle\langle 1|_B = \tfrac{I}{2}.
The marginal is maximally mixed — a single Bell-state qubit, viewed alone, is a fair coin.
Step 4 — marginal entropy. S(B) = S(I/2) = 1 bit. By A \leftrightarrow B symmetry, S(A) = 1 bit too.
Step 5 — conditional entropy.
S(A | B) = S(A, B) - S(B) = 0 - 1 = -1 \text{ bit}.
Step 6 — coherent information. I_c(A \rangle B) = -S(A | B) = +1 bit. The Bell state carries one ebit of quantum correlation — exactly the amount required for one round of teleportation.
What this shows. The Bell state compresses all the paradoxes: the joint is totally determined, each piece is totally undetermined, conditioning makes things "more negative," and the negative value names a positive quantum resource. Every computation in quantum Shannon theory eventually reduces to a variant of this calculation.
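The whole worked example fits in a few lines of NumPy (a sketch reproducing the numbers above; helper names are mine):

```python
import numpy as np

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)            # |Phi+> as a column
rho_AB = np.outer(phi, phi)                          # Step 1: four-corner matrix

S_AB = entropy(rho_AB)                               # Step 2: 0 (pure state)
rho_B = np.einsum('aiaj->ij', rho_AB.reshape(2, 2, 2, 2))   # Step 3: trace out A
S_B = entropy(rho_B)                                 # Step 4: 1 bit (I/2)

S_cond = S_AB - S_B                                  # Step 5: conditional entropy
I_c = -S_cond                                        # Step 6: coherent information
print(round(S_cond, 6), round(I_c, 6))               # -1.0 1.0
```

The rounding only mops up floating-point noise; the exact values are -1 and +1 bit.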
Example 2 — A product state: all quantities vanish
Setup. Compute S(A, B), S(A), S(B), and S(A | B) for the product state
\rho_{AB} = \rho_A \otimes \rho_B, \qquad \rho_A = \tfrac{3}{4}|0\rangle\langle 0| + \tfrac{1}{4}|1\rangle\langle 1|, \qquad \rho_B = \tfrac{1}{2}|+\rangle\langle +| + \tfrac{1}{2}|-\rangle\langle -|.
Step 1 — marginals already in diagonal form. \rho_A is diagonal in the computational basis with eigenvalues (3/4, 1/4). \rho_B is diagonal in the \{|+\rangle, |-\rangle\} basis with eigenvalues (1/2, 1/2); note \rho_B = I/2 in disguise — writing it as a mix of |+\rangle\langle +| and |-\rangle\langle -| gives the same operator as mixing |0\rangle and |1\rangle with weights (1/2, 1/2).
Step 2 — marginal entropies.
S(A) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811 \text{ bits}, \qquad S(B) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}.
Why this numerical value: \log_2 (3/4) = \log_2 3 - 2 \approx 1.585 - 2 = -0.415 and \log_2(1/4) = -2. So S(A) = -(3/4)(-0.415) - (1/4)(-2) = 0.311 + 0.500 = 0.811 bits. This is the standard binary-entropy value at p = 3/4.
Step 3 — joint entropy via the product-state rule. Because \rho_{AB} = \rho_A \otimes \rho_B, the joint eigenvalues are products of marginal eigenvalues: \{3/8, 3/8, 1/8, 1/8\}. Apply Shannon:
S(A, B) = -2 \cdot \tfrac{3}{8}\log_2\tfrac{3}{8} - 2 \cdot \tfrac{1}{8}\log_2\tfrac{1}{8}.
Compute: \log_2(3/8) = \log_2 3 - 3 \approx -1.415 and \log_2(1/8) = -3. So
S(A, B) \approx 2 \cdot \tfrac{3}{8}(1.415) + 2 \cdot \tfrac{1}{8}(3) = 1.061 + 0.750 = 1.811 \text{ bits}.
Sanity-check against the additivity rule: S(A) + S(B) = 0.811 + 1 = 1.811. Matches exactly, as product states must.
Step 4 — conditional entropy.
S(A | B) = S(A, B) - S(B) = 1.811 - 1 = 0.811 \text{ bits} = S(A).
Knowing B gave no information about A — conditional entropy equals marginal entropy. This is the hallmark of an uncorrelated pair: conditioning does nothing, because there is nothing to condition on.
Step 5 — quantum mutual information.
I(A; B) = S(A) + S(B) - S(A, B) = 0.811 + 1 - 1.811 = 0.
Zero mutual information. No correlation, classical or quantum.
What this shows. For a product state, all the quantum-specific phenomena vanish. Conditional entropy is non-negative and equals the marginal; mutual information is zero; subadditivity is saturated. The only reason the quantum numbers differ from the classical ones is entanglement. Uncorrelated quantum states behave exactly like independent classical random variables.
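Example 2 in code (a sketch; the basis change for \rho_B is spelled out to show it really is I/2 in disguise):

```python
import numpy as np

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

rho_A = np.diag([0.75, 0.25])
rho_B = 0.5 * np.outer(plus, plus) + 0.5 * np.outer(minus, minus)
assert np.allclose(rho_B, np.eye(2) / 2)        # I/2 in disguise

rho_AB = np.kron(rho_A, rho_B)                  # product state
S_A, S_B, S_AB = entropy(rho_A), entropy(rho_B), entropy(rho_AB)

assert np.isclose(S_AB, S_A + S_B)              # subadditivity exactly saturated
assert np.isclose(S_AB - S_B, S_A)              # S(A|B) = S(A): conditioning does nothing
print(round(S_A, 3), round(S_AB, 3))            # 0.811 1.811
```

Every assertion here is the code form of one line of the worked example.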
Common confusions
-
"Negative conditional entropy means uncertainty below zero." No — "negative uncertainty" is not a physical quantity. The right reading is that S(A | B) is the arithmetic difference S(A, B) - S(B), and for quantum joint states that difference can go negative. The operational meaning lives in the sign-flipped coherent information I_c(A \rangle B) = -S(A | B), which measures a positive quantum resource.
-
"S(A | B) = entropy of \rho_{A | B} for some conditional state \rho_{A | B}." Wrong — there is no single "conditional density operator" in quantum theory. Classical conditional entropy can be written as H(Y | X) = \sum_x p(x) H(Y | X = x), an average of entropies of conditional distributions. The quantum version has no analogous decomposition — the formula S(A | B) = S(A, B) - S(B) is the primary definition, not a consequence.
-
"Strong subadditivity is just subadditivity repeated." It is not. Subadditivity applied twice gives S(A, B, C) \leq S(A) + S(B) + S(C), which is weaker than strong subadditivity. SSA involves overlapping subsystems (B appearing on both sides) and is a genuinely deeper statement requiring operator-convexity arguments.
-
"Coherent information is always positive." Only for pure states is I_c \geq 0 automatic. For mixed states I_c can be positive, zero, or negative. The maximum over input states of I_c across uses of a channel is the channel's quantum capacity — so I_c being non-negative on some input is what makes a channel useful for quantum transmission at all.
-
"Subadditivity says joint entropy decreases under correlation." Subadditivity is an upper bound: S(A, B) \leq S(A) + S(B). It says joint is at most the sum; it does not say joint decreases as correlation is added. In fact for a product state the equality holds; as you add classical correlation the joint drops (so the gap I(A; B) increases); as you add entanglement, the joint can drop all the way to zero while the marginals stay fixed.
-
"S(A | B) depends only on \rho_A and \rho_B." No — it depends on the joint state \rho_{AB}. Different joint states with the same marginals can have very different conditional entropies. The Bell state and the classically correlated state \tfrac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|) both have maximally mixed marginals but conditional entropies of -1 and 0 respectively.
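The last point is easy to verify: the Bell state and the classically correlated mixture have identical marginals but different conditional entropies (a sketch):

```python
import numpy as np

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def trace_out_A(rho):
    return np.einsum('aiaj->ij', rho.reshape(2, 2, 2, 2))

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
bell = np.outer(phi, phi)                       # entangled
cc = np.diag([0.5, 0.0, 0.0, 0.5])              # (|00><00| + |11><11|)/2

# Identical maximally mixed marginals...
assert np.allclose(trace_out_A(bell), trace_out_A(cc))

# ...but very different conditional entropies S(A|B)
for rho in (bell, cc):
    print(round(entropy(rho) - entropy(trace_out_A(rho)), 6))   # -1.0, then 0.0
```

Knowing only the marginals cannot distinguish the two states; the conditional entropy lives in the joint state.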
Going deeper
If you just need S(A, B) = -\text{tr}(\rho_{AB}\log \rho_{AB}), the chain rule S(A|B) = S(A,B) - S(B), the fact that S(A|B) < 0 signals entanglement, and the coherent information I_c = -S(A|B) as its sign-corrected avatar, you have the essentials. The rest of this section treats the Araki-Lieb inequality, continuity bounds, the monotonicity of relative entropy equivalence to SSA, the role of coherent information in quantum channel capacity, and why strong subadditivity is not a consequence of convexity alone.
The Araki-Lieb triangle inequality
Alongside subadditivity, a lower bound on joint entropy:
|S(A) - S(B)| \leq S(A, B) \leq S(A) + S(B).
The left inequality is Araki-Lieb (1970). It says the joint entropy can never fall below the imbalance between the marginals — even entangled states cannot push S(A, B) beneath |S(A) - S(B)|. For a pure state |\psi\rangle_{AB}, Schmidt symmetry gives S(A) = S(B), so the lower bound reads 0 \leq S(A, B) and is saturated exactly: pure states have S(A, B) = 0. The upper bound is saturated by product states. The combination of subadditivity and Araki-Lieb is called the entropic triangle inequality.
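A quick numerical check of the triangle inequality on a random two-qubit mixed state (a sketch; the seed and helper are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

# Random full-rank two-qubit state
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho = rho / np.trace(rho).real

r = rho.reshape(2, 2, 2, 2)
S_A = entropy(np.einsum('ibjb->ij', r))   # trace out B
S_B = entropy(np.einsum('aiaj->ij', r))   # trace out A
S_AB = entropy(rho)

# Entropic triangle inequality: |S(A) - S(B)| <= S(A,B) <= S(A) + S(B)
assert abs(S_A - S_B) <= S_AB + 1e-9 <= S_A + S_B + 2e-9
```

Repeating over many random states, both bounds hold every time, with the lower bound tightest for nearly pure states.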
Strong subadditivity equivalent forms
The inequality S(A, B, C) + S(B) \leq S(A, B) + S(B, C) has several equivalent statements, each useful in different contexts:
- Conditional form: S(A | B, C) \leq S(A | B) — conditioning on more cannot increase uncertainty.
- Mutual information form: I(A; C | B) \geq 0 — conditional mutual information is non-negative.
- Data processing inequality (DPI): I(A; B) \geq I(A; \Phi(B)) for any CPTP map \Phi on B — information about A cannot increase under local processing of B.
- Monotonicity of relative entropy: S(\Phi(\rho) \| \Phi(\sigma)) \leq S(\rho \| \sigma) for CPTP \Phi.
All four statements are equivalent to SSA and to each other; Lindblad (1975) proved the equivalences. The monotonicity-of-relative-entropy form is the one that generalises most smoothly to infinite-dimensional systems, quantum field theory, and operator-algebraic settings.
Continuity: the Fannes-Audenaert inequality
How much can S(\rho) change when \rho is perturbed? The Fannes-Audenaert inequality bounds the change by the trace distance:
|S(\rho) - S(\sigma)| \leq T \log_2(d - 1) + h(T),
where T = \tfrac{1}{2}\|\rho - \sigma\|_1 is the trace distance, d is the dimension, and h is the binary entropy. Plugging in for conditional entropy gives the Alicki-Fannes inequality
|S(A | B)_\rho - S(A | B)_\sigma| \leq 4 T \log_2 d_A + 2 h(T),
ensuring that conditional entropy is a continuous function of the state. Useful when \rho_{AB} is known only approximately (as in any experiment).
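The Fannes-Audenaert bound is easy to check on a perturbed state (a sketch; the commuting example states are arbitrary choices of mine):

```python
import numpy as np

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def h2(t):
    """Binary entropy in bits."""
    return 0.0 if t <= 0 or t >= 1 else float(-t * np.log2(t) - (1 - t) * np.log2(1 - t))

d = 4
rho = np.diag([0.4, 0.3, 0.2, 0.1])
sigma = np.diag([0.38, 0.32, 0.19, 0.11])      # a small perturbation of rho

T = 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))   # trace distance
bound = T * np.log2(d - 1) + h2(T)
assert abs(entropy(rho) - entropy(sigma)) <= bound
print(abs(entropy(rho) - entropy(sigma)), bound)            # actual change vs bound
```

For this perturbation the actual entropy change is roughly an order of magnitude below the bound — Fannes-Audenaert is a worst-case guarantee, not a tight estimate.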
Coherent information and quantum channel capacity
The Lloyd-Shor-Devetak theorem (first announced by Lloyd in 1997, proved rigorously by Shor in 2002 and Devetak in 2005 [arXiv:quant-ph/0304127]) identifies the quantum capacity of a channel \mathcal{N} as
Q(\mathcal{N}) = \lim_{n \to \infty} \tfrac{1}{n} \max_{\rho} I_c(\rho, \mathcal{N}^{\otimes n}).
This is the regularised coherent information. The regularisation (limit over many channel uses) is necessary because I_c is not additive: there exist channels \mathcal{N}_1, \mathcal{N}_2 where I_c of the tensor product strictly exceeds the sum — a phenomenon called superadditivity of coherent information. Smith and Yard (2008) showed the extreme version: two channels each with Q = 0 can combine to give Q > 0. Classical channels have no analogue.
The Petz recovery map and approximate SSA
Petz (1986) showed that equality in SSA — S(A | B, C) = S(A | B) — holds iff there is a "quantum Markov chain" structure: a CPTP map \mathcal{R}: B \to BC satisfying \mathcal{R}(\rho_{AB}) = \rho_{ABC}. This \mathcal{R} is the Petz recovery map. Fawzi-Renner (2015) [arXiv:1410.0664] proved an approximate version: if I(A; C | B) is small, the Petz map approximately recovers \rho_{ABC} from \rho_{AB}, with explicit error bounds in trace distance. This is the modern tool powering many recent results in quantum many-body physics and holography.
Indian research connections
Quantum information groups at HRI (Harish-Chandra Research Institute, Allahabad) and IIT Madras have produced significant work on entropic inequalities and multipartite coherent information. Ujjwal Sen's group at HRI has developed conditional-entropy measures for multipartite entanglement quantification. At IISc Bangalore, the quantum gravity community uses SSA routinely to derive constraints on holographic entanglement entropies (Ryu-Takayanagi surfaces must satisfy SSA, a non-trivial check on any proposed bulk geometry). The Raman Research Institute in Bengaluru has run quantum-optics experiments that measure two-photon conditional entropies directly, reporting agreement with theoretical predictions to within 10^{-3} bits.
Where this leads next
- Quantum mutual information — the non-negative gap in subadditivity, I(A;B) = S(A) + S(B) - S(A,B), as the total-correlation measure.
- Coherent information — the sign-flipped conditional entropy as the operational measure of quantum communication capacity.
- Strong subadditivity — deeper treatment of the Lieb-Ruskai theorem and its many equivalent forms.
- Quantum channel capacities — how coherent information defines the quantum capacity via the Lloyd-Shor-Devetak theorem.
- Holevo bound — classical information extractable from a quantum source, a consequence of SSA.
- Von Neumann entropy — the single-system prerequisite that all of this builds on.
References
- Elliott H. Lieb and Mary Beth Ruskai, Proof of the strong subadditivity of quantum-mechanical entropy (1973) — arXiv:math-ph/0205013 (reprint).
- Mark M. Wilde, Quantum Information Theory (2nd ed., 2017), Ch. 11–14 (Entropy inequalities and quantum capacities) — arXiv:1106.1445.
- John Preskill, Lecture Notes on Quantum Computation, Ch. 10 (Quantum information theory) — theory.caltech.edu/~preskill/ph229.
- Igor Devetak, The private classical capacity and quantum capacity of a quantum channel (2005) — arXiv:quant-ph/0304127.
- Omar Fawzi and Renato Renner, Quantum conditional mutual information and approximate Markov chains (2015) — arXiv:1410.0664.
- Wikipedia, Strong subadditivity of quantum entropy — statements, equivalent forms, and proof sketch.