In short
The quantum mutual information of a bipartite state \rho_{AB} is
the gap in subadditivity. It is always non-negative (subadditivity), zero iff \rho_{AB} = \rho_A \otimes \rho_B, and it measures the total amount of correlation — classical plus quantum — between A and B. Classically, I(X:Y) \leq \min(H(X), H(Y)). Quantumly, I(A:B) can reach 2 \min(S(A), S(B)) — twice the classical ceiling. A Bell state saturates: S(A) = S(B) = 1 and S(A,B) = 0 give I(A:B) = 2 bits. The doubling is entanglement, bit-counted. The quantum mutual information can be decomposed as I(A:B) = J(A:B)_{\text{classical}} + D(A:B)_{\text{quantum}}, where the quantum discord D(A:B) isolates the non-classical part and vanishes iff \rho_{AB} is classical on (at least) one side. In the channel setting, I(A:B) under an entanglement-assisted transmission through \mathcal{N} gives the entanglement-assisted classical capacity C_E = \max I(A:B) — the cleanest capacity formula in quantum Shannon theory (Bennett-Shor-Smolin-Thapliyal, 1999). Every meaningful correlation measure in quantum information sits downstream of I(A:B).
In the previous chapter you met the joint and conditional entropies S(A, B) and S(A | B). You saw one inequality — subadditivity S(A, B) \leq S(A) + S(B) — that bounds the joint in terms of the marginals. This chapter names and studies the gap in that inequality. The gap is the quantum mutual information, and it is the single most important correlation measure in quantum information theory.
The classical story is familiar: I(X; Y) = H(X) + H(Y) - H(X, Y) measures the reduction in the uncertainty of Y once X is learned; equivalently, it measures how much information X and Y share. The quantum version copies the formula but breaks the ceiling. Classically, correlations max out at \min(H(X), H(Y)) — no random variable can share more bits with another than it has to offer. Quantumly, correlations can reach twice that ceiling, because quantum states carry entanglement-mediated correlation on top of every classical bit. The Bell state's I(A:B) = 2 bits — with only one qubit per side — is the clean numerical witness.
The gap in subadditivity — picture first
Subadditivity says the joint entropy is at most the sum of the marginals:

S(A, B) \leq S(A) + S(B).
The gap between the two sides is a non-negative number that tells you how correlated the two systems are. Zero gap means no correlation (product state). Positive gap means correlation — either classical (like a pair of coupled dice), quantum (like a Bell pair), or some mixture of the two.
With that picture in mind, the formal definition.
Quantum mutual information
For a bipartite density operator \rho_{AB}, the quantum mutual information is

I(A : B) = S(A) + S(B) - S(A, B),
where S(A) = S(\text{tr}_B \rho_{AB}), S(B) = S(\text{tr}_A \rho_{AB}), and S(A, B) = S(\rho_{AB}) are von Neumann entropies. Equivalently, I(A:B) = S(\rho_{AB} \| \rho_A \otimes \rho_B) — the relative entropy from the joint to the product of marginals.
Reading the definition. The subtraction S(A) + S(B) - S(A, B) quantifies "how much of the marginal uncertainty is shared." If the joint is fully product-like (no correlation), the joint entropy equals the sum of marginals and the mutual info is zero. If the joint is pure and entangled, the joint entropy is zero but the marginals are both non-zero, and the mutual info reaches its maximum.
The relative-entropy form. Subtracting S(\rho_A \otimes \rho_B) = S(A) + S(B) from inside the relative entropy gives I(A:B) = S(\rho_{AB} \| \rho_A \otimes \rho_B). This is the more fundamental definition: the quantum mutual information measures how far \rho_{AB} is from being a product state, using the relative-entropy "distance." Relative entropy is non-negative, which re-proves subadditivity as a corollary.
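The definition is easy to check numerically. Below is a minimal sketch in NumPy; the helper names `von_neumann_entropy`, `partial_trace`, and `mutual_information` are ours for illustration, not a library API.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho log2 rho) in bits, computed from the eigenvalues."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]          # 0 log 0 = 0 by convention
    return float(-np.sum(lam * np.log2(lam)))

def partial_trace(rho_ab, dA, dB, keep):
    """Reduced state of A (keep='A') or B (keep='B') for a dA*dB joint state."""
    r = rho_ab.reshape(dA, dB, dA, dB)
    return np.einsum('ijkj->ik', r) if keep == 'A' else np.einsum('ijil->jl', r)

def mutual_information(rho_ab, dA, dB):
    """I(A:B) = S(A) + S(B) - S(A,B) in bits."""
    S_A = von_neumann_entropy(partial_trace(rho_ab, dA, dB, 'A'))
    S_B = von_neumann_entropy(partial_trace(rho_ab, dA, dB, 'B'))
    return S_A + S_B - von_neumann_entropy(rho_ab)

# Classically correlated state (|00><00| + |11><11|)/2: one shared bit.
rho_cc = np.diag([0.5, 0.0, 0.0, 0.5])
print(mutual_information(rho_cc, 2, 2))   # -> 1.0
```

A product state fed to the same function returns zero, matching the equality condition of subadditivity.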
Properties — the fast tour
Non-negativity: I(A:B) \geq 0
By subadditivity (equivalently, by non-negativity of relative entropy),

I(A : B) \geq 0,
with equality iff \rho_{AB} = \rho_A \otimes \rho_B. Zero mutual information iff the two systems are uncorrelated.
Symmetry: I(A:B) = I(B:A)
Immediate from the definition — swapping A and B leaves S(A) + S(B) - S(A, B) unchanged.
Chain rule with conditional entropy
Rearranging S(A | B) = S(A, B) - S(B):

I(A : B) = S(A) - S(A | B) = S(B) - S(B | A).
The quantum mutual information is the reduction in A's uncertainty once B is "known" (with the quantum caveat that S(A|B) can itself be negative, in which case I(A:B) > S(A) — more on this below).
Monotonicity under local operations
For any CPTP map \Phi acting on B alone,

I(A : B)_{(\mathrm{id}_A \otimes \Phi)(\rho)} \leq I(A : B)_{\rho}.
Local processing of one system cannot increase the correlation with the other. This is the data processing inequality for mutual information, directly equivalent to strong subadditivity.
Additivity on tensor products
I(A_1 A_2 : B_1 B_2)_{\rho \otimes \sigma} = I(A_1 : B_1)_{\rho} + I(A_2 : B_2)_{\sigma}.

Independent pairs contribute independent mutual information. (For joint states that are not of product form across the pairs, this equality can fail.)
The maximum — classical vs quantum ceilings
Classically, the mutual information between two random variables is bounded above by the smaller of the two marginal entropies:

I(X : Y) \leq \min(H(X), H(Y)).
This is because H(X | Y) \geq 0 classically, so I(X:Y) = H(X) - H(X | Y) \leq H(X), and symmetrically \leq H(Y).
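The classical ceiling can be spot-checked numerically: random joint distributions never exceed \min(H(X), H(Y)), and perfect correlation saturates it. A short sketch (helper names ours):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def classical_mutual_information(p_xy):
    """I(X:Y) = H(X) + H(Y) - H(X,Y) for a joint pmf given as a matrix."""
    return H(p_xy.sum(axis=1)) + H(p_xy.sum(axis=0)) - H(p_xy.ravel())

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.random((3, 4))
    p /= p.sum()
    I = classical_mutual_information(p)
    # never above the smaller marginal entropy
    assert I <= min(H(p.sum(axis=1)), H(p.sum(axis=0))) + 1e-9

# Perfect correlation saturates the ceiling: I = H(X) = min(H(X), H(Y)).
p_corr = np.eye(3) / 3
print(classical_mutual_information(p_corr))   # -> log2(3) ≈ 1.585
```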
Quantumly the ceiling is doubled. For any bipartite state \rho_{AB},

I(A : B) \leq 2 \min(S(A), S(B)).
Why the quantum ceiling is higher
The classical argument fails because S(A | B) can be negative. Start from I(A:B) = S(A) - S(A | B). The most negative S(A | B) can be is -S(A) (achieved by pure entangled states). Plug in the extreme:

I(A : B) \leq S(A) - (-S(A)) = 2 S(A).
Symmetrically, I(A:B) \leq 2 S(B). So I(A:B) \leq 2\min(S(A), S(B)), doubling the classical bound.
Why S(A | B) \geq -S(A): writing S(A | B) = S(A, B) - S(B) and using the Araki-Lieb inequality S(A, B) \geq |S(A) - S(B)|, you get S(A | B) \geq |S(A) - S(B)| - S(B). If S(A) \leq S(B), the right-hand side equals -S(A) exactly; if S(A) > S(B), it equals S(A) - 2 S(B), which is still \geq -S(A). The bound is saturated by pure entangled states, for which S(A) = S(B) and S(A, B) = 0.
Saturating the bound: the Bell state
For the Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt 2:
- S(A) = S(B) = 1 bit.
- S(A, B) = 0.
- I(A : B) = 1 + 1 - 0 = 2 bits.
The Bell state is a two-qubit system — each side has a Hilbert space of dimension 2, so the classical max of \min(S(A), S(B)) is 1 bit. But the quantum mutual information is two bits. That extra bit is pure entanglement, bit-counted.
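The two-bit value is quick to reproduce numerically. A self-contained sketch (the helper `S` is ours, not a library function):

```python
import numpy as np

def S(rho):
    """von Neumann entropy in bits."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

# |Phi+> = (|00> + |11>)/sqrt(2), basis ordered |00>, |01>, |10>, |11>.
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho_ab = np.outer(phi, phi)

r = rho_ab.reshape(2, 2, 2, 2)
rho_a = r.trace(axis1=1, axis2=3)   # tr_B -> I/2
rho_b = r.trace(axis1=0, axis2=2)   # tr_A -> I/2

I_ab = S(rho_a) + S(rho_b) - S(rho_ab)
print(round(I_ab, 9))   # -> 2.0  (= 1 + 1 - 0)
```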
Total correlation: classical plus quantum
Quantum mutual information measures all correlation between A and B, whatever its origin. This raises a natural question: how much of I(A:B) is "classical" and how much is "quantum"?
The Groisman-Popescu-Winter decomposition
Groisman, Popescu, and Winter (2005) [arXiv:quant-ph/0410091] gave an operational answer. Consider how much randomness (fresh classical noise) you must add to destroy correlations:
- The minimum classical noise required to destroy the quantum correlations in \rho_{AB} is the entanglement of formation E_F(A:B) — or more precisely its regularised version.
- The minimum additional classical noise required to destroy the classical correlations (after the quantum ones are gone) is the classical correlation J(A:B).
- Total noise required to fully decorrelate is I(A:B).
Hence the split:

I(A : B) = J(A : B) + D(A : B).
The quantum part D(A:B) is called the quantum discord. It is zero iff \rho_{AB} is classically correlated on at least one side — meaning there exists a local measurement on A (or B) after which the joint becomes a classical-classical distribution without losing any correlation information. Product states have J = D = 0. Classically-correlated mixed states like \tfrac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|) have J > 0, D = 0. Entangled pure states have D > 0. Mixed entangled states generally have both J, D > 0.
The Bell state splits cleanly
For the Bell state:
- Total: I(A:B) = 2 bits.
- Classical correlation: J(A:B) = 1 bit (the correlation visible after measuring one side).
- Quantum discord: D(A:B) = 1 bit.
- Entanglement of formation: E_F(A:B) = 1 ebit.
So the Bell state is exactly half classical correlation and half quantum, by the discord split. That half-and-half pattern is a clean feature of maximally entangled pure states.
The channel capacity connection
Quantum mutual information has an especially clean operational role in entanglement-assisted communication. Suppose Alice and Bob share pre-shared entanglement before using a noisy channel \mathcal{N} — how many classical bits per channel use can Alice reliably send?
Theorem (Bennett, Shor, Smolin, Thapliyal, 1999 [arXiv:quant-ph/9904023]). The entanglement-assisted classical capacity of a quantum channel \mathcal{N} is

C_E(\mathcal{N}) = \max_{\rho_{A'}} I(A : B),
where \rho_{AB} = (I_A \otimes \mathcal{N})(|\psi\rangle_{AA'}\langle\psi|) is the output of the channel on one half of a purification |\psi\rangle_{AA'} of the input \rho_{A'}. The maximum is over all input states.
Why this is remarkable. Most quantum capacity formulas require regularisation — a limit over many channel uses — because the relevant quantity is not additive (coherent information, private information, Holevo information of compound channels). The entanglement-assisted capacity is an exception: I(A:B) is additive on tensor-product channels, so a single-letter formula works. This makes C_E the cleanest of all quantum capacity theorems. It is the quantum analogue of Shannon's classical capacity formula C = \max I(X:Y) — down to the letter.
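As a concrete instance, consider the qubit depolarizing channel \mathcal{N}_p(\rho) = (1 - p)\rho + p\, I/2. By the channel's symmetry, the maximally entangled input is optimal, so C_E can be evaluated on a Bell state directly. A sketch under that assumption (helper names ours):

```python
import numpy as np

def S(rho):
    """von Neumann entropy in bits."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
bell = np.outer(phi, phi)

def C_E_depolarizing(p):
    # (I ⊗ N_p) on |Phi+><Phi+| gives the isotropic state
    # (1 - p)|Phi+><Phi+| + p I/4, whose marginals are both I/2.
    rho_ab = (1 - p) * bell + p * np.eye(4) / 4
    return 1.0 + 1.0 - S(rho_ab)     # S(A) = S(B) = 1 bit

print(C_E_depolarizing(0.0))   # noiseless limit: 2 bits (superdense coding)
print(C_E_depolarizing(1.0))   # fully depolarizing: 0 bits
```

The curve interpolates monotonically between the noiseless value of 2 bits and zero for the completely depolarizing channel.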
Different capacities of the same channel satisfy

Q(\mathcal{N}) \leq C(\mathcal{N}) \leq C_E(\mathcal{N}),
where Q is the quantum capacity (via coherent information), C is the plain classical capacity (Holevo), and C_E is the entanglement-assisted capacity. Pre-shared entanglement never reduces capacity and can strictly increase it.
Worked examples
Example 1 — Bell state: $I(A:B) = 2$ bits
Setup. Compute the quantum mutual information of the Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt 2 directly. Interpret what the two-bit value means.
Step 1 — joint entropy. The joint state is pure: \rho_{AB} = |\Phi^+\rangle\langle\Phi^+|, so S(A, B) = 0.
Step 2 — marginal entropies. Tracing out B gives \rho_A = I/2 with S(A) = 1 bit. By symmetry S(B) = 1 bit. (The partial-trace computation is worked in detail in the joint-conditional-entropy chapter.)
Step 3 — mutual information.

I(A : B) = S(A) + S(B) - S(A, B) = 1 + 1 - 0 = 2 \text{ bits}.
Step 4 — interpret the bits. The Bell state is a two-qubit system: each side has a two-dimensional Hilbert space, so any classical probability distribution on (A, B) would have I(X : Y) \leq 1 bit. The quantum value is twice that.
The "extra" bit beyond the classical ceiling is entanglement. You can see it concretely in the decomposition:
- Classical correlation J = 1 bit — the fact that a computational-basis measurement on one qubit predicts the other's outcome.
- Quantum discord D = 1 bit — the additional correlation in the coherences (the off-diagonal entries of \rho_{AB}) that no classical-only measurement can capture.
Step 5 — connection to superdense coding. The entanglement-assisted capacity of a noiseless qubit channel is C_E = 2 bits per channel use — a single qubit can transmit 2 classical bits if Alice and Bob share a Bell pair (superdense coding). The I(A:B) = 2 of a Bell state is exactly the C_E value, consistent with the BSST capacity formula.
What this shows. The doubling of mutual information is the information-theoretic fingerprint of entanglement. Any bipartite state with I(A:B) > \min(S(A), S(B)) must be entangled — classical correlations alone cannot push past that threshold.
Example 2 — Product state: $I(A:B) = 0$
Setup. Take an uncorrelated product state

\rho_{AB} = \rho_A \otimes \rho_B, \qquad \rho_A = \tfrac{3}{4} |0\rangle\langle 0| + \tfrac{1}{4} |1\rangle\langle 1|, \qquad \rho_B = \tfrac{I}{2}.
Compute the mutual information and verify it vanishes.
Step 1 — marginal entropies. \rho_A has eigenvalues (3/4, 1/4), so S(A) = H(3/4) \approx 0.811 bits. \rho_B = I/2 so S(B) = 1 bit.
Step 2 — joint entropy. Because \rho_{AB} is a product, its eigenvalues are the products of marginal eigenvalues:

\left\{ \tfrac{3}{8}, \tfrac{3}{8}, \tfrac{1}{8}, \tfrac{1}{8} \right\}.

Apply Shannon:

S(A, B) = -2 \cdot \tfrac{3}{8} \log_2 \tfrac{3}{8} - 2 \cdot \tfrac{1}{8} \log_2 \tfrac{1}{8} \approx 1.811 \text{ bits}.
Why the four eigenvalues pair up: the product has two copies of 3/8 (from 3/4 \cdot 1/2, one for each basis vector of B) and two copies of 1/8 (from 1/4 \cdot 1/2). The double-counting is automatic because \rho_B = I/2 is degenerate. Numerically: \log_2(3/8) \approx -1.415 and \log_2(1/8) = -3. So S(A, B) = -2(3/8)(-1.415) - 2(1/8)(-3) = 1.061 + 0.750 = 1.811 bits.
Step 3 — mutual information.

I(A : B) = S(A) + S(B) - S(A, B) \approx 0.811 + 1 - 1.811 = 0.
Exactly zero, as expected for a product state.
Step 4 — alternate check via relative entropy. For a product state, \rho_{AB} = \rho_A \otimes \rho_B, and so S(\rho_{AB} \| \rho_A \otimes \rho_B) = S(\rho_{AB} \| \rho_{AB}) = 0 directly. No relative-entropy distance from the product means no correlation.
What this shows. Zero mutual information is the quantum certification of "truly uncorrelated." Any I(A:B) > 0 signals correlation — the sign-test of whether \rho_{AB} factors. If you ever derive I(A:B) < 0, you have made an error somewhere; subadditivity forbids it.
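The arithmetic in Example 2 can be confirmed in a few lines (a sketch; `S` is our helper, not a library function):

```python
import numpy as np

def S(rho):
    """von Neumann entropy in bits."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

rho_a = np.diag([0.75, 0.25])
rho_b = np.eye(2) / 2
rho_ab = np.kron(rho_a, rho_b)     # eigenvalues 3/8, 3/8, 1/8, 1/8

print(round(S(rho_a), 3))                              # -> 0.811
print(round(S(rho_ab), 3))                             # -> 1.811
print(round(S(rho_a) + S(rho_b) - S(rho_ab), 12))      # -> 0.0
```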
Common confusions
- "I(A:B) measures entanglement." Not quite — it measures total correlation. A classically correlated state like \tfrac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|) has I(A:B) = 1 bit but is not entangled (it is a probabilistic mixture of product states). To isolate the purely quantum part, use the entanglement of formation E_F or the quantum discord D. I(A:B) > 0 is necessary but not sufficient for entanglement.
- "I(A:B) > \min(S(A), S(B)) always means entanglement." This direction is correct: exceeding the classical ceiling is a sufficient condition for entanglement. The converse fails — some entangled mixed states (like Werner states with low singlet fraction) can have I(A:B) < \min(S(A), S(B)) while still being entangled. The ceiling is a one-way witness.
- "I(A:B) is always \leq \log_2(d_A d_B)." More precisely, I(A:B) \leq 2\min(\log_2 d_A, \log_2 d_B). For a two-qubit system, I \leq 2 \log_2 2 = 2 bits. For a two-qutrit system, I \leq 2\log_2 3 \approx 3.17 bits. The "extra" headroom above \log_2 d is exactly the entanglement contribution.
- "The entanglement-assisted capacity C_E requires a new theorem for each channel." It does not. Because I(A:B) is additive, the single-letter formula C_E = \max I(A:B) works for any channel with no regularisation. This is unlike the quantum capacity Q, classical capacity C, and private capacity P, which all require regularisation in general. C_E is the uniquely clean capacity in quantum Shannon theory.
- "Mutual information is a metric." It is not — I does not satisfy a triangle inequality, nor does it vanish on identical arguments: for a perfectly correlated classical copy, I(A:A') = S(A), and for a pure entangled extension it reaches 2 S(A). The relative-entropy form I(A:B) = S(\rho_{AB} \| \rho_A \otimes \rho_B) is a divergence from the product state, and relative entropy is itself asymmetric in its arguments.
- "I(A:B) = 0 means the systems are independent." True — I(A:B) = 0 iff \rho_{AB} = \rho_A \otimes \rho_B. This is the quantum notion of independence. Note the ordering of strengths: a state can have zero discord (no quantum correlation) while still carrying classical correlation with I > 0, whereas I(A:B) = 0 forces every correlation, classical and quantum, to vanish. Zero mutual information is stronger than zero discord.
Going deeper
If you just need I(A:B) = S(A) + S(B) - S(A,B) \geq 0, the doubling I \leq 2\min(S(A), S(B)) with the Bell state as the saturating example, and the classical/quantum split via discord, you have the essentials. The rest treats quantum discord in detail, the Groisman-Popescu-Winter operational interpretation, the entanglement-assisted capacity and the reverse Shannon theorem, the additivity of I(A:B) on tensor channels, and the continuity (Alicki-Fannes) bound.
Quantum discord — the precise definition
For a bipartite state \rho_{AB}, define two classically-motivated forms of mutual information:

I(A : B) = S(A) + S(B) - S(A, B), \qquad J(A : B) = S(B) - \min_{\{M_i^A\}} \sum_i p_i \, S(B \mid i),

where \{M_i^A\} ranges over all POVMs on A, p_i is the probability of outcome i, and S(B | i) is the entropy of B conditioned on measurement outcome i. The first is the symmetric joint-entropy formula; the second asks "after measuring A optimally, how much does it reduce B's uncertainty?" Classically these agree. Quantumly they can differ. The difference is the quantum discord:

D(A : B) = I(A : B) - J(A : B),
introduced by Ollivier and Zurek (2001) [arXiv:quant-ph/0105072]. Zero discord characterises classical-quantum states — states of the form \rho_{AB} = \sum_i p_i |i\rangle\langle i|_A \otimes \rho_B^i, for some orthonormal basis \{|i\rangle_A\}.
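For two qubits, the POVM optimisation is often restricted to rank-1 projective measurements, which a coarse grid search can approximate. A sketch under that simplification (all names ours; the grid is a numerical proxy, not the exact optimisation):

```python
import numpy as np

def S(rho):
    """von Neumann entropy in bits."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def discord_A(rho_ab, n_grid=40):
    """Estimate D(A:B) for a two-qubit state by grid search over rank-1
    projective measurements on qubit A."""
    r = rho_ab.reshape(2, 2, 2, 2)
    rho_a = r.trace(axis1=1, axis2=3)
    rho_b = r.trace(axis1=0, axis2=2)
    I_ab = S(rho_a) + S(rho_b) - S(rho_ab)
    best_J = -np.inf
    for th in np.linspace(0, np.pi / 2, n_grid):
        for ph in np.linspace(0, 2 * np.pi, n_grid, endpoint=False):
            # Orthonormal measurement basis {m0, m1} on A.
            m0 = np.array([np.cos(th), np.exp(1j * ph) * np.sin(th)])
            m1 = np.array([np.sin(th), -np.exp(1j * ph) * np.cos(th)])
            avg_cond = 0.0
            for m in (m0, m1):
                M = np.kron(np.outer(m, m.conj()), np.eye(2))
                p = np.real(np.trace(M @ rho_ab))
                if p > 1e-12:
                    post = (M @ rho_ab @ M) / p
                    rho_b_i = post.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)
                    avg_cond += p * S(rho_b_i)
            best_J = max(best_J, S(rho_b) - avg_cond)
    return I_ab - best_J

phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
bell = np.outer(phi, phi).astype(complex)
print(round(discord_A(bell), 6))    # -> 1.0 (Bell state: D = 1 bit)

cc = np.diag([0.5, 0.0, 0.0, 0.5]).astype(complex)
print(round(discord_A(cc), 6))      # -> 0.0 (classical correlations only)
```

For the Bell state every measurement basis already attains J = 1, so the grid is exact there; for generic mixed states the grid only lower-bounds J (and hence upper-bounds D).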
The entanglement-assisted capacity in full
The Bennett-Shor-Smolin-Thapliyal theorem's full statement: for a quantum channel \mathcal{N} with input system A' and output system B,

C_E(\mathcal{N}) = \max_{\rho_{A'}} I(A : B), \qquad \rho_{AB} = (I_A \otimes \mathcal{N})(|\psi\rangle\langle\psi|_{AA'}),
with |\psi\rangle_{AA'} any purification of \rho_{A'}. The theorem is proved via the quantum reverse Shannon theorem (Bennett, Devetak, Harrow, Shor, Winter, 2009 [arXiv:0912.5537]), which shows that any quantum channel can be simulated using a perfect classical channel plus shared entanglement at the rate C_E.
Continuity and robustness
The Alicki-Fannes-Winter inequality bounds how much I(A:B) can change under small perturbations. One convenient form, obtained by combining the Fannes-Audenaert bound on S(A) with Winter's bound on S(A | B) (h_2 denotes the binary entropy), is

|I(A : B)_\rho - I(A : B)_\sigma| \leq 3 T \log_2 d_A + h_2(T) + (1 + T)\, h_2\!\left(\tfrac{T}{1+T}\right),
where T = \tfrac{1}{2}\|\rho - \sigma\|_1 is trace distance. This matters in experiments — your estimated I from a tomographically-reconstructed \rho_{AB} is close to the true value whenever tomography is accurate.
Mutual information and holographic entanglement entropy
In the AdS/CFT correspondence (holographic duality), entanglement entropies of CFT regions are computed by Ryu-Takayanagi surfaces in the dual gravitational geometry. Mutual information between two regions is then the combination I(A:B) = S(A) + S(B) - S(A, B), and its positivity is a non-trivial geometric consequence of the area law. Strong subadditivity and SSA-equivalent bounds place real constraints on any proposed bulk geometry. This has become a workhorse calculation for quantum gravity researchers — the Indian quantum gravity group at ICTS-TIFR Bengaluru has contributed several results on mutual information in black-hole backgrounds and its bearing on the Page curve.
Why I(A:B) is additive but Q(\mathcal{N}) is not
Mutual information of a tensor-product state satisfies I(A_1 A_2 : B_1 B_2) = I(A_1 : B_1) + I(A_2 : B_2) for product \rho_{A_1 B_1} \otimes \rho_{A_2 B_2}. This cleanly implies the single-letter entanglement-assisted capacity. Coherent information, by contrast, is superadditive: there exist channels for which I_c(\mathcal{N}_1 \otimes \mathcal{N}_2) > I_c(\mathcal{N}_1) + I_c(\mathcal{N}_2), forcing the regularisation in Q = \lim_n \tfrac{1}{n} \max I_c(\mathcal{N}^{\otimes n}). The fundamental reason: coherent information can be sensitive to entangled inputs across channel uses, while mutual information (via entanglement-assisted capacity) already includes unlimited pre-shared entanglement and has nothing to gain from extra structure.
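The additivity claim is easy to verify numerically on random product states (a sketch; helper names ours):

```python
import numpy as np

def S(rho):
    """von Neumann entropy in bits."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def mutual_information(rho_ab, dA, dB):
    r = rho_ab.reshape(dA, dB, dA, dB)
    return S(r.trace(axis1=1, axis2=3)) + S(r.trace(axis1=0, axis2=2)) - S(rho_ab)

rng = np.random.default_rng(7)

def random_state(d):
    """Random density matrix from a Ginibre matrix G: rho = G G† / tr(G G†)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.real(np.trace(rho))

rho1, rho2 = random_state(4), random_state(4)   # two-qubit states on A1B1, A2B2

# np.kron orders subsystems as A1 B1 A2 B2; permute to the A1 A2 : B1 B2 cut.
joint = np.kron(rho1, rho2).reshape([2] * 8)
joint = joint.transpose(0, 2, 1, 3, 4, 6, 5, 7).reshape(16, 16)

lhs = mutual_information(joint, 4, 4)
rhs = mutual_information(rho1, 2, 2) + mutual_information(rho2, 2, 2)
print(abs(lhs - rhs) < 1e-8)   # -> True
```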
Indian research context — NQM and correlation measures
The National Quantum Mission (2023, ₹6000 crore) funds substantial quantum-information research across Indian institutions. The Harish-Chandra Research Institute (HRI) Allahabad group, led for years by Arun Pati and Ujjwal Sen, has contributed several results on multipartite quantum discord, the relationship between I(A:B) and entanglement monotones, and the use of mutual information as a diagnostic in quantum thermodynamics. At TIFR Mumbai and IIT Bombay, experimental groups routinely report two-qubit mutual information values from tomographic reconstructions of photonic and superconducting-qubit Bell pairs, with discrepancies from the theoretical 2-bit Bell-state value quantifying the decoherence in the device.
Where this leads next
- Entanglement of formation — the regularised cost of creating \rho_{AB} from ebits, a clean entanglement monotone distinct from mutual information.
- Coherent information — the sign-flipped conditional entropy that parameterises the quantum capacity Q(\mathcal{N}).
- Quantum channel capacities — the full capacity hierarchy Q \leq C \leq C_E and the additivity properties of each.
- Holevo bound — classical information extractable from a quantum ensemble, the entropy-based capacity of ensemble encoding.
- Joint and conditional entropy — the entropy-pair from which I(A:B) was built.
- Strong subadditivity — the master inequality that guarantees I(A:B) \geq 0 and propagates through every quantum Shannon theorem.
References
- Charles H. Bennett, Peter W. Shor, John A. Smolin, Ashish V. Thapliyal, Entanglement-assisted classical capacity of noisy quantum channels (1999) — arXiv:quant-ph/9904023.
- Mark M. Wilde, Quantum Information Theory (2nd ed., 2017), Ch. 11, 13, 21 — arXiv:1106.1445.
- Harold Ollivier and Wojciech H. Zurek, Quantum discord: a measure of the quantumness of correlations (2001) — arXiv:quant-ph/0105072.
- Berry Groisman, Sandu Popescu, Andreas Winter, Quantum, classical, and total amount of correlations in a quantum state (2005) — arXiv:quant-ph/0410091.
- John Preskill, Lecture Notes on Quantum Computation, Ch. 10 (Quantum information theory) — theory.caltech.edu/~preskill/ph229.
- Wikipedia, Quantum mutual information — definitions, properties, and references.