In short
Given two quantum states \rho and \sigma on the same Hilbert space, there are two standard numbers that quantify their relationship. Fidelity asks how close they are: F(\rho,\sigma) = \bigl(\text{tr}\sqrt{\sqrt\rho\,\sigma\,\sqrt\rho}\bigr)^2, a number in [0, 1] that equals 1 iff \rho = \sigma and reduces to |\langle\psi|\phi\rangle|^2 for pure states. Trace distance asks how distinguishable they are: D(\rho,\sigma) = \tfrac{1}{2}\,\text{tr}|\rho-\sigma|, also in [0, 1], equal to 1 iff the states have orthogonal supports. Both are preserved by unitaries and monotone under quantum channels. Their operational meanings are sharp: fidelity equals the maximum squared overlap of purifications on a shared extended space (Uhlmann), and trace distance equals 2p_{\text{correct}} - 1 where p_{\text{correct}} is the best single-shot probability of telling \rho from \sigma. They are related by the Fuchs-van de Graaf inequalities 1 - \sqrt F \leq D \leq \sqrt{1 - F}. Fidelity is the metric experimental papers quote when reporting how well a state was prepared; trace distance is the metric error-correction thresholds and cryptographic security proofs are written in.
You run a quantum circuit on a real device. You wanted the state \rho_{\text{target}} = |\psi\rangle\langle\psi| with |\psi\rangle = \tfrac{1}{\sqrt 2}(|00\rangle + |11\rangle) — a Bell pair. You ask a tomography routine to reconstruct what you actually got, and it hands you back a 4\times 4 matrix \rho_{\text{actual}} that is nearly but not exactly \rho_{\text{target}}. Some elements are slightly off. The eigenvalues are not quite (1, 0, 0, 0); they are more like (0.97, 0.02, 0.008, 0.002).
How close did you get? That is the question this chapter answers. "Close" is a quantitative word, and there are two standard quantitative answers: fidelity and trace distance. They measure different things, they have different operational meanings, and the right tool depends on the question you are asking. A tomography paper will quote fidelity; an error-correction paper will quote trace distance; a security proof will convert a measured fidelity into a trace-distance bound; a cross-platform comparison will use both. Learn them together.
Two questions, two numbers
Before any formulas, separate the two questions cleanly.
"How close are \rho and \sigma?" The answer is fidelity F(\rho, \sigma). F = 1 means the states are identical. F = 0 means they are as far apart as the formalism allows. Higher is better. Fidelity is the quantity experimentalists most often report because it has a clean pure-state limit: if both states are pure, F = |\langle\psi|\phi\rangle|^2, the familiar Born-rule overlap.
"How distinguishable are \rho and \sigma?" The answer is trace distance D(\rho, \sigma). D = 0 means you can't tell them apart by any measurement. D = 1 means a single copy is enough to distinguish them with certainty. Higher is worse if you wanted them equal; higher is better if you wanted to tell them apart. Trace distance is the quantity cryptographic security proofs and error-correction thresholds are usually written in, because it bounds how well any measurement, whether made by an adversary or by an error-detection check, can tell the actual state from the ideal one.
Both are numbers in [0, 1]. Both collapse to familiar classical objects in limits. Both obey strong invariance and monotonicity properties. But they are not the same number, and the relationship between them has its own theorem (the Fuchs-van de Graaf inequalities, below).
Fidelity
Fidelity takes its cleanest form for pure states, which is where you should meet it first.
Pure-state fidelity — the Born overlap
For two pure states |\psi\rangle and |\phi\rangle, the fidelity is just the squared modulus of their inner product:
F(|\psi\rangle, |\phi\rangle) = |\langle\psi|\phi\rangle|^2.
This is exactly the Born-rule probability you already know: if you prepared |\phi\rangle and measured in a basis containing |\psi\rangle, the probability of getting the outcome |\psi\rangle is |\langle\psi|\phi\rangle|^2. Why the square: the inner product \langle\psi|\phi\rangle is a complex amplitude; the observable quantity is the probability |\langle\psi|\phi\rangle|^2, and it is this probability that is 1 when the states are identical and 0 when they are orthogonal.
So for pure states, fidelity has an immediate physical meaning. Suppose you hold |\phi\rangle and test whether it is really |\psi\rangle by measuring in an orthonormal basis \{|\psi\rangle, \ldots\} that contains |\psi\rangle; the state passes if the outcome is |\psi\rangle. Fidelity is the probability that it passes. Identical states pass with probability 1; orthogonal states always fail.
Mixed-state fidelity — the general formula
For two general density operators, pure-state overlap is no longer defined (you can't take an inner product of density matrices directly; they are operators, not vectors). Fidelity generalises via a surprising but beautiful formula:
Fidelity
The fidelity between two density operators \rho, \sigma on the same Hilbert space is
F(\rho,\sigma) = \bigl(\text{tr}\sqrt{\sqrt\rho\,\sigma\,\sqrt\rho}\bigr)^2.
It satisfies 0 \leq F(\rho,\sigma) \leq 1, with F = 1 iff \rho = \sigma and F = 0 iff \rho, \sigma have orthogonal supports (equivalently, \rho\sigma = 0).
The formula has two matrix square roots and a trace, and at first sight is heavy. Notice three things.
First, the definition is symmetric: F(\rho, \sigma) = F(\sigma, \rho), even though the formula is not manifestly symmetric. This is a theorem (proof via Uhlmann's characterisation below), and it is deep — fidelity measures the relationship between states, not the order you supply them in.
Second, when \rho = |\psi\rangle\langle\psi| is pure, the matrix \sqrt\rho = \rho (a rank-1 projector is its own square root), and \sqrt\rho\,\sigma\,\sqrt\rho = |\psi\rangle\langle\psi|\sigma|\psi\rangle\langle\psi| = \langle\psi|\sigma|\psi\rangle\,|\psi\rangle\langle\psi|. The square root of that rank-1 operator is \sqrt{\langle\psi|\sigma|\psi\rangle}\,|\psi\rangle\langle\psi|, whose trace is \sqrt{\langle\psi|\sigma|\psi\rangle}. Squaring: F(|\psi\rangle\langle\psi|, \sigma) = \langle\psi|\sigma|\psi\rangle. Why this matters: when one state is pure, fidelity reduces to the expectation value of the pure state's projector in the mixed state — a single matrix element, not the full double-square-root machinery. Most experimental fidelities are of this "compare to a target pure state" flavour, and this simpler formula is what actually gets computed.
Third, when both \rho and \sigma are pure, the formula further reduces to F(|\psi\rangle\langle\psi|, |\phi\rangle\langle\phi|) = |\langle\psi|\phi\rangle|^2 — the pure-state Born overlap from the previous subsection. The general formula is a strict generalisation.
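The definition is easy to evaluate numerically. Below is a minimal sketch in Python with NumPy. The helpers `psd_sqrt`, `fidelity`, and `proj` are illustrative names, not a library API, and the matrix square root is taken through an eigendecomposition, which assumes Hermitian positive semi-definite inputs. The sketch checks the two reductions just derived, plus the symmetry claim, on a pair of non-commuting mixed states chosen for illustration.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a Hermitian positive semi-definite matrix via eigh."""
    a = (a + a.conj().T) / 2          # symmetrise away round-off
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)         # clip tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.conj().T


def fidelity(rho, sigma):
    """Squared fidelity F = (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2)


def proj(psi):
    """Density matrix |psi><psi| of a normalised state vector."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())


ket0 = np.array([1, 0], dtype=complex)
ket_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

# Illustrative mixed states (not from the text); they do not commute.
sigma = 0.7 * proj(ket0) + 0.3 * proj(ket_plus)
rho = 0.6 * proj(ket_plus) + 0.4 * np.eye(2) / 2

# Pure target: F reduces to the expectation value <psi|sigma|psi>.
F_general = fidelity(proj(ket0), sigma)
F_shortcut = np.vdot(ket0, sigma @ ket0).real      # 0.7 + 0.3 * 0.5 = 0.85

# Both pure: F reduces to |<psi|phi>|^2 = 1/2.
F_pure = fidelity(proj(ket0), proj(ket_plus))

print(F_general, F_shortcut, F_pure)
print(fidelity(rho, sigma), fidelity(sigma, rho))  # equal: F is symmetric
```

For a pure target, the one-line shortcut `F_shortcut` is what you would normally compute. The general routine is only needed when both states are mixed.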
Uhlmann's theorem — the operational meaning
Where does the awkward \sqrt{\sqrt\rho\,\sigma\,\sqrt\rho} come from? The clean answer is Uhlmann's theorem, which recharacterises fidelity in terms of pure states on a larger space.
Uhlmann's theorem
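Let \rho, \sigma be density operators on \mathcal H. Let \mathcal H' be an ancilla space with \dim\mathcal H' \geq \max(\text{rank}\,\rho, \text{rank}\,\sigma). Then
F(\rho,\sigma) = \max_{|\psi_\rho\rangle,\,|\psi_\sigma\rangle} |\langle\psi_\rho|\psi_\sigma\rangle|^2,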
where the maximum is over all purifications of \rho and \sigma in \mathcal H \otimes \mathcal H'.
Read that carefully. Every density operator \rho on \mathcal H admits purifications — pure states |\psi_\rho\rangle \in \mathcal H \otimes \mathcal H' whose reduction on \mathcal H is \rho (see Purification). These purifications are not unique; ancilla unitaries generate all of them. Uhlmann's theorem says: pick the pair of purifications that overlap the most. The squared magnitude of that maximum overlap is exactly F(\rho, \sigma).
Why this is beautiful: pure-state overlaps are trivial to compute (\langle\psi_\rho|\psi_\sigma\rangle is just a complex number). The operator-level formula with its nested square roots is an indirect computation of this maximum — the same number, reached via matrix algebra rather than ancilla optimisation. Uhlmann's theorem says the two routes always agree.
And the operational reading: fidelity is "the best match the two mixed states can achieve when you are allowed to choose how they sit inside a larger pure-state world." Symmetry F(\rho,\sigma) = F(\sigma,\rho) is immediate from this characterisation, because |\langle\psi_\rho|\psi_\sigma\rangle|^2 = |\langle\psi_\sigma|\psi_\rho\rangle|^2. Monotonicity under channels (listed among the properties below) has a short purification-based proof.
Properties of fidelity
- Range. 0 \leq F(\rho, \sigma) \leq 1.
- Equality. F(\rho,\sigma) = 1 \iff \rho = \sigma.
- Orthogonality. F(\rho,\sigma) = 0 iff the supports of \rho and \sigma are orthogonal subspaces (no non-zero vector is in both).
- Symmetry. F(\rho,\sigma) = F(\sigma,\rho).
- Unitary invariance. F(U\rho U^\dagger, U\sigma U^\dagger) = F(\rho,\sigma) for any unitary U.
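- Monotonicity under channels. F(\mathcal E(\rho), \mathcal E(\sigma)) \geq F(\rho,\sigma) for every CPTP map \mathcal E. Why the direction: a channel can never make states less similar or more distinguishable, because information is lost, not created. A noisy channel acting on two different states typically moves them toward each other (a unitary channel leaves F unchanged).
- Joint concavity (of \sqrt F). For probabilities p_i summing to 1, \sqrt{F\bigl(\sum_i p_i\rho_i, \sum_i p_i\sigma_i\bigr)} \geq \sum_i p_i \sqrt{F(\rho_i, \sigma_i)}. The squared fidelity F itself is concave in each argument separately but not jointly concave. For example, take the commuting qutrit states \rho_1 = \sigma_1 = |0\rangle\langle 0|, \rho_2 = |1\rangle\langle 1|, \sigma_2 = |2\rangle\langle 2| with p_1 = p_2 = 1/2: the mixtures have F = 1/4, while \sum_i p_i F(\rho_i,\sigma_i) = 1/2.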
- Multiplicativity under tensor products. F(\rho_1\otimes\rho_2, \sigma_1\otimes\sigma_2) = F(\rho_1,\sigma_1)\,F(\rho_2,\sigma_2).
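Two of these properties can be spot-checked numerically with a plain NumPy sketch. Monotonicity is tested with a qubit depolarizing channel \rho \mapsto (1-p)\rho + pI/2, which is one convenient CPTP map, not the only one. Multiplicativity is tested with `np.kron`. The states and helper names are illustrative assumptions, and a passing check is evidence, not a proof.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a Hermitian positive semi-definite matrix via eigh."""
    a = (a + a.conj().T) / 2
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def fidelity(rho, sigma):
    """Squared fidelity F = (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2)


def depolarize(rho, p):
    """Qubit depolarizing channel rho -> (1 - p) rho + p tr(rho) I/2 (CPTP for 0 <= p <= 1)."""
    return (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2


# Two illustrative full-rank qubit states that do not commute.
rho = np.array([[0.9, 0.2], [0.2, 0.1]], dtype=complex)
sigma = np.array([[0.3, -0.1j], [0.1j, 0.7]], dtype=complex)

# Monotonicity: F never drops as the depolarizing strength p grows.
F0 = fidelity(rho, sigma)
ps = np.linspace(0.0, 1.0, 11)
F_after = [fidelity(depolarize(rho, p), depolarize(sigma, p)) for p in ps]
print("F before:", round(F0, 4))
print("F after:", [round(f, 4) for f in F_after])

# Multiplicativity over tensor products.
rho2, sigma2 = depolarize(rho, 0.3), depolarize(sigma, 0.6)
lhs = fidelity(np.kron(rho, rho2), np.kron(sigma, sigma2))
rhs = F0 * fidelity(rho2, sigma2)
print("F(kron) =", lhs, " product =", rhs)
```

At p = 1 both states become I/2, so the last fidelity in the list is 1.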
Trace distance
Trace distance starts from a different idea: the L^1-norm on matrices, adapted to the quantum setting.
The definition and the one-norm
Trace distance
The trace distance between two density operators \rho, \sigma on the same Hilbert space is
D(\rho,\sigma) = \tfrac{1}{2}\|\rho-\sigma\|_1 = \tfrac{1}{2}\,\text{tr}|\rho-\sigma|,
where |A| = \sqrt{A^\dagger A} and the one-norm is \|A\|_1 = \text{tr}|A|. It satisfies 0 \leq D(\rho, \sigma) \leq 1, with D = 0 iff \rho = \sigma and D = 1 iff \rho, \sigma have orthogonal supports.
The factor of \tfrac{1}{2} normalises the range to [0, 1] for states (without it, two orthogonal pure states would give \|\rho-\sigma\|_1 = 2, not 1).
Because \rho - \sigma is Hermitian (the difference of two Hermitian operators is Hermitian), |\rho - \sigma| = \sqrt{(\rho-\sigma)^2} has the same eigenvectors as \rho - \sigma, with each eigenvalue replaced by its absolute value. So if \rho - \sigma has eigenvalues \{\lambda_i\}, then \text{tr}|\rho-\sigma| = \sum_i |\lambda_i|, and
D(\rho,\sigma) = \tfrac{1}{2}\sum_i |\lambda_i|.
Why the sum of absolute eigenvalues: the one-norm of a Hermitian operator is the sum of the absolute values of its eigenvalues, because the matrix |H| has those as its (non-negative) eigenvalues. This is the quantum analogue of |x_1| + |x_2| + \cdots, the classical L^1 norm.
Notice the nice special case. The eigenvalues of \rho - \sigma sum to \text{tr}(\rho-\sigma) = 1 - 1 = 0, so the positive eigenvalues and the negative eigenvalues have equal total magnitude. If you split \rho - \sigma = P - Q into its positive and negative parts (both PSD, with orthogonal supports), then \text{tr}(P) = \text{tr}(Q), and
D(\rho,\sigma) = \tfrac{1}{2}\bigl(\text{tr}(P) + \text{tr}(Q)\bigr) = \text{tr}(P).
This is the Jordan decomposition of the difference, and it is what connects trace distance to measurement probabilities.
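The eigenvalue recipe and the Jordan decomposition translate directly into a few lines of NumPy. In this sketch the helper names are illustrative. `trace_distance` sums the absolute eigenvalues of \rho - \sigma, and `jordan_parts` builds P and Q from the positive and negative eigenvalues. Together they let you confirm that D = \text{tr}(P) = \text{tr}(Q) and that P and Q have orthogonal supports.

```python
import numpy as np


def trace_distance(rho, sigma):
    """D = (1/2) * sum_i |lambda_i| over the eigenvalues of the Hermitian matrix rho - sigma."""
    lam = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * float(np.sum(np.abs(lam)))


def jordan_parts(rho, sigma):
    """Split rho - sigma = P - Q into PSD parts with orthogonal supports."""
    lam, vecs = np.linalg.eigh(rho - sigma)
    P = (vecs * np.clip(lam, 0.0, None)) @ vecs.conj().T    # keep positive eigenvalues
    Q = (vecs * np.clip(-lam, 0.0, None)) @ vecs.conj().T   # flip and keep negative ones
    return P, Q


# Two illustrative qubit states.
rho = np.array([[0.9, 0.2], [0.2, 0.1]], dtype=complex)
sigma = np.array([[0.3, -0.1j], [0.1j, 0.7]], dtype=complex)

D = trace_distance(rho, sigma)        # rho - sigma has eigenvalues +/- sqrt(0.41)
P, Q = jordan_parts(rho, sigma)
print(D, np.trace(P).real, np.trace(Q).real)
print(np.allclose(P - Q, rho - sigma), np.allclose(P @ Q, 0))

# Orthogonal pure states are perfectly distinguishable: D = 1.
D_orth = trace_distance(np.diag([1.0, 0.0]), np.diag([0.0, 1.0]))
print(D_orth)
```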
Pure-state trace distance
For two pure states, a direct calculation gives
D(|\psi\rangle, |\phi\rangle) = \sqrt{1 - |\langle\psi|\phi\rangle|^2}.
The derivation. Let \rho = |\psi\rangle\langle\psi|, \sigma = |\phi\rangle\langle\phi|. The matrix \rho - \sigma lives in the two-dimensional span of \{|\psi\rangle, |\phi\rangle\}; outside this span, \rho - \sigma is zero. Inside it, you can compute eigenvalues by going to an orthonormal basis of the span and writing the 2\times 2 matrix explicitly. The result is eigenvalues \pm\sqrt{1 - |\langle\psi|\phi\rangle|^2} with zero elsewhere; summing absolute values gives 2\sqrt{1 - |\langle\psi|\phi\rangle|^2}, and dividing by 2 yields the formula.
So for pure states, F = |\langle\psi|\phi\rangle|^2 and D = \sqrt{1 - F}. The two metrics are exactly related: knowing one determines the other. The story is different for mixed states — see the Fuchs-van de Graaf inequalities below.
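The derivation can be checked numerically. This sketch draws random pure states in dimension 4 (normalised complex Gaussian vectors) and compares the full spectrum of |\psi\rangle\langle\psi| - |\phi\rangle\langle\phi| with the predicted one: \pm\sqrt{1 - |\langle\psi|\phi\rangle|^2} and zeros. The dimension, sample count, and seed are arbitrary choices.

```python
import numpy as np


def random_ket(dim, rng):
    """Normalised complex Gaussian vector: a Haar-random pure state."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)


rng = np.random.default_rng(seed=7)
max_gap = 0.0      # worst deviation of the spectrum from the prediction
max_D_gap = 0.0    # worst deviation of D from sqrt(1 - |<psi|phi>|^2)
for _ in range(200):
    psi, phi = random_ket(4, rng), random_ket(4, rng)
    diff = np.outer(psi, psi.conj()) - np.outer(phi, phi.conj())
    lam = np.linalg.eigvalsh(diff)                 # ascending order
    a = np.sqrt(1 - abs(np.vdot(psi, phi)) ** 2)
    expected = np.array([-a, 0.0, 0.0, a])         # two nonzero eigenvalues, two zeros
    max_gap = max(max_gap, float(np.max(np.abs(lam - expected))))
    D = 0.5 * float(np.sum(np.abs(lam)))
    max_D_gap = max(max_D_gap, abs(D - a))
print("largest spectral deviation:", max_gap)
print("largest deviation in D:", max_D_gap)
```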
Operational meaning — the distinguishing task
Trace distance has the sharpest operational meaning of any quantum-state distance. Here is the game.
A referee flips a fair coin and, depending on the outcome, hands you one copy of either \rho or \sigma. You know what \rho and \sigma are; you just don't know which one you received. You may perform any measurement (projective, POVM, anything) and must then guess which state you hold. The best achievable success probability is
p_{\text{correct}} = \tfrac{1}{2} + \tfrac{1}{2}D(\rho,\sigma) = \tfrac{1}{2}\bigl(1 + D(\rho,\sigma)\bigr).
If \rho = \sigma (D = 0), you can do no better than chance: p_{\text{correct}} = 1/2. If \rho and \sigma have orthogonal supports (D = 1), some measurement tells them apart perfectly: p_{\text{correct}} = 1. In between, the optimal success probability grows linearly with D.
Sketch of why. Decompose \rho - \sigma = P - Q with P, Q \geq 0 supported on orthogonal subspaces. The optimal measurement is the two-outcome projective measurement \{\Pi_+, \Pi_-\}, where \Pi_+ projects onto the support of P and \Pi_- = I - \Pi_+: guess \rho on outcome + and \sigma on outcome -. Weighting each case by its prior 1/2, the probability of guessing \rho when it was \rho plus the probability of guessing \sigma when it was \sigma is \tfrac{1}{2}\text{tr}(\Pi_+\rho) + \tfrac{1}{2}\text{tr}(\Pi_-\sigma) = \tfrac{1}{2}(1 + \text{tr}(\Pi_+(\rho-\sigma))) = \tfrac{1}{2}(1 + \text{tr}(P)) = \tfrac{1}{2} + \tfrac{1}{2}D(\rho, \sigma). Why no other measurement does better: any strategy guesses \rho on some POVM element 0 \leq E \leq I, and its success probability is \tfrac{1}{2}(1 + \text{tr}(E(\rho-\sigma))), where \text{tr}(E(\rho-\sigma)) = \text{tr}(EP) - \text{tr}(EQ) \leq \text{tr}(EP) \leq \text{tr}(P) = D(\rho,\sigma). Intuitively, the optimal measurement separates the eigenspaces where \rho dominates (\Pi_+) from those where \sigma dominates (\Pi_-), which is exactly the information the trace distance captures.
This is the Helstrom bound. Every security proof, every state-discrimination protocol, every distinguishing argument in quantum information theory eventually collides with this formula. Trace distance is the natural language for "how much can the adversary (or the error) tell?"
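The Helstrom measurement is straightforward to construct numerically. In this sketch the function `helstrom` and the two qubit states are illustrative assumptions. \Pi_+ is built from the eigenvectors of \rho - \sigma with positive eigenvalues, and its success probability is compared with \tfrac{1}{2} + \tfrac{1}{2}D. A batch of random projective measurements is then checked, and none of them does better.

```python
import numpy as np


def trace_distance(rho, sigma):
    """D = (1/2) * sum of |eigenvalues| of rho - sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))


def helstrom(rho, sigma):
    """Projector onto the positive eigenspace of rho - sigma, and its success
    probability when rho and sigma are equally likely."""
    lam, vecs = np.linalg.eigh(rho - sigma)
    pos = vecs[:, lam > 0]                     # eigenvectors where rho dominates
    pi_plus = pos @ pos.conj().T
    pi_minus = np.eye(rho.shape[0]) - pi_plus
    p = 0.5 * np.trace(pi_plus @ rho).real + 0.5 * np.trace(pi_minus @ sigma).real
    return pi_plus, float(p)


rho = np.array([[0.9, 0.2], [0.2, 0.1]], dtype=complex)
sigma = np.array([[0.3, -0.1j], [0.1j, 0.7]], dtype=complex)

pi_plus, p_helstrom = helstrom(rho, sigma)
D = trace_distance(rho, sigma)
print(p_helstrom, 0.5 + 0.5 * D)

# Random projective measurements {|e><e|, I - |e><e|} never beat Helstrom,
# whichever way the two outcomes are assigned to rho and sigma.
rng = np.random.default_rng(seed=1)
best_random = 0.0
for _ in range(2000):
    e = rng.normal(size=2) + 1j * rng.normal(size=2)
    e /= np.linalg.norm(e)
    E = np.outer(e, e.conj())
    p = 0.5 * np.trace(E @ rho).real + 0.5 * (1 - np.trace(E @ sigma).real)
    best_random = max(best_random, p, 1 - p)
print(best_random)
```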
Properties of trace distance
- Range. 0 \leq D(\rho,\sigma) \leq 1.
- Metric axioms. D(\rho,\sigma) = 0 \iff \rho = \sigma; D(\rho,\sigma) = D(\sigma,\rho); D(\rho,\tau) \leq D(\rho,\sigma) + D(\sigma,\tau) (triangle inequality).
- Unitary invariance. D(U\rho U^\dagger, U\sigma U^\dagger) = D(\rho,\sigma).
- Monotonicity under channels. D(\mathcal E(\rho), \mathcal E(\sigma)) \leq D(\rho,\sigma) for every CPTP map \mathcal E. Why the direction: channels only lose information, so whatever distinguishing power you had before a channel, you can't have more after it. Monotonicity goes the opposite way for trace distance (decreases) than for fidelity (increases), because the two metrics have opposite orientations — high F means similar, high D means different.
- Joint convexity. D\bigl(\sum_i p_i \rho_i, \sum_i p_i \sigma_i\bigr) \leq \sum_i p_i D(\rho_i, \sigma_i).
- Supremum over POVMs. D(\rho,\sigma) = \sup_{\{E_m\}} \tfrac{1}{2}\sum_m |\text{tr}(E_m\rho) - \text{tr}(E_m\sigma)|, where the supremum is over all POVMs. This characterisation is the distinguishing-task result in formal clothing.
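Two of these properties, channel monotonicity and the triangle inequality, are easy to spot-check. This sketch uses the qubit amplitude-damping channel as an example CPTP map, written with its standard Kraus operators K_0 = \text{diag}(1, \sqrt{1-\gamma}) and K_1 = \sqrt\gamma\,|0\rangle\langle 1|. The test states are random full-rank states GG^\dagger/\text{tr}(GG^\dagger). As before, a passing check is evidence, not a proof.

```python
import numpy as np


def trace_distance(rho, sigma):
    """D = (1/2) * sum of |eigenvalues| of rho - sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))


def amplitude_damping(rho, gamma):
    """Qubit amplitude-damping channel applied through its two Kraus operators."""
    k0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    k1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T


def random_density_matrix(dim, rng):
    """G G^dagger / tr(G G^dagger) for a complex Gaussian matrix G."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real


rng = np.random.default_rng(seed=3)
monotone_ok = triangle_ok = True
for _ in range(500):
    rho, sigma, tau = (random_density_matrix(2, rng) for _ in range(3))
    gamma = rng.uniform(0.0, 1.0)
    before = trace_distance(rho, sigma)
    after = trace_distance(amplitude_damping(rho, gamma), amplitude_damping(sigma, gamma))
    monotone_ok = monotone_ok and after <= before + 1e-12
    triangle_ok = triangle_ok and (
        trace_distance(rho, tau) <= trace_distance(rho, sigma) + trace_distance(sigma, tau) + 1e-12
    )
print("monotone under amplitude damping:", monotone_ok)
print("triangle inequality:", triangle_ok)
```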
Fuchs-van de Graaf — how fidelity and trace distance relate
Two metrics, one underlying "difference" between states. How do they constrain each other?
Fuchs-van de Graaf inequalities
For any two density operators \rho, \sigma,
1 - \sqrt{F(\rho,\sigma)} \leq D(\rho,\sigma) \leq \sqrt{1 - F(\rho,\sigma)}.
For pure states the right inequality is tight: D = \sqrt{1 - F}. For mixed states the bounds are not tight in general.
The lower bound D \geq 1 - \sqrt F says that low fidelity forces a large trace distance. If F is close to 0, then D is close to 1, so the states are almost perfectly distinguishable. Rearranged as F \geq (1 - D)^2, the same inequality says that a small trace distance guarantees a high fidelity. In practice, the more useful statement is the upper bound.
The upper bound D \leq \sqrt{1 - F} says that high fidelity forces the trace distance to be small. This is the direction experimentalists care about. If you measure a fidelity of F = 0.99 against a target state, then D \leq \sqrt{0.01} = 0.1. So no measurement, whether by an adversary or anything else, can tell your prepared state from the target with a single-shot success probability above \tfrac{1}{2} + \tfrac{1}{2}(0.1) = 0.55. Fidelity is the cheaper number to measure (for a pure target, a single expectation value of a projector), and Fuchs-van de Graaf turns it into a trace-distance bound automatically.
The upper bound is an equality for every pair of pure states (D = \sqrt{1 - F}). The lower bound is not, except in the trivial cases F = 0 and F = 1; Example 1 below shows the gap. For mixed states there is generally slack in both bounds, and one metric can be much more informative than the other for specific pairs.
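Both inequalities can be spot-checked numerically. This sketch uses random full-rank qutrit states GG^\dagger/\text{tr}(GG^\dagger) as test cases, which is an arbitrary choice. It records the smallest slack in each bound over many pairs, then confirms that a pure pair turns the upper bound into an equality. Helper names are illustrative.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a Hermitian positive semi-definite matrix via eigh."""
    a = (a + a.conj().T) / 2
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def fidelity(rho, sigma):
    """Squared fidelity (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2)


def trace_distance(rho, sigma):
    """D = (1/2) * sum of |eigenvalues| of rho - sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))


def random_density_matrix(dim, rng):
    """G G^dagger / tr(G G^dagger) for a complex Gaussian matrix G."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real


rng = np.random.default_rng(seed=5)
lower_slack, upper_slack = [], []
for _ in range(500):
    rho, sigma = random_density_matrix(3, rng), random_density_matrix(3, rng)
    F, D = fidelity(rho, sigma), trace_distance(rho, sigma)
    lower_slack.append(D - (1 - np.sqrt(F)))           # >= 0 if the lower bound holds
    upper_slack.append(np.sqrt(max(0.0, 1 - F)) - D)   # >= 0 if the upper bound holds
print("smallest slacks:", min(lower_slack), min(upper_slack))

# Pure pair: the upper bound is attained.
psi = np.array([1, 0], dtype=complex)
phi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)
F_pure = abs(np.vdot(psi, phi)) ** 2
D_pure = trace_distance(np.outer(psi, psi.conj()), np.outer(phi, phi.conj()))
print(D_pure, np.sqrt(1 - F_pure))
```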
Worked examples
Example 1: $|0\rangle$ vs $|+\rangle$ — computing both metrics on two pure states
Compute F and D between the two pure single-qubit states |0\rangle and |+\rangle = \tfrac{1}{\sqrt 2}(|0\rangle + |1\rangle). Both are pure, so both formulas collapse to their pure-state versions — and the two numbers will be related by D = \sqrt{1-F} exactly.
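Step 1. Compute the inner product. Why start here: for pure states, fidelity is the squared modulus of the inner product, so the inner product is the one number that determines both F and D.
\langle 0|+\rangle = \tfrac{1}{\sqrt 2}\bigl(\langle 0|0\rangle + \langle 0|1\rangle\bigr) = \tfrac{1}{\sqrt 2}.
Step 2. Compute fidelity.
F(|0\rangle, |+\rangle) = |\langle 0|+\rangle|^2 = \tfrac{1}{2}.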
The fidelity is 1/2 — not great, but not zero. The two states have a substantial overlap.
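Step 3. Compute trace distance via the pure-state formula.
D(|0\rangle, |+\rangle) = \sqrt{1 - |\langle 0|+\rangle|^2} = \sqrt{1 - \tfrac{1}{2}} = \tfrac{1}{\sqrt 2} \approx 0.707.
Step 4. Verify the distinguishing interpretation. If a referee hands you one copy of either |0\rangle or |+\rangle (each with probability 1/2), the best single-shot probability of guessing correctly is
p_{\text{correct}} = \tfrac{1}{2} + \tfrac{1}{2}\cdot\tfrac{1}{\sqrt 2} = \tfrac{1}{2} + \tfrac{1}{2\sqrt 2} \approx 0.854.
That is over 85\%: much better than chance, because the states are far from identical, but not 100\%, because they are not orthogonal. The optimal (Helstrom) measurement projects onto the eigenvectors of \rho - \sigma = |0\rangle\langle 0| - |+\rangle\langle +|, which has eigenvalues \pm 1/\sqrt 2. The positive eigenvector, the outcome on which you guess |0\rangle, is \cos(\pi/8)|0\rangle - \sin(\pi/8)|1\rangle; the negative one is \sin(\pi/8)|0\rangle + \cos(\pi/8)|1\rangle. Each measurement vector sits at angle \pi/8 from its own state, tilted away from the other state. On the Bloch sphere, where |0\rangle and |+\rangle are 90^\circ apart, the measurement axis points along the difference of their Bloch vectors, so each outcome direction is 45^\circ from its state.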
Step 5. Sanity-check with the Fuchs-van de Graaf inequalities. The lower bound gives D \geq 1 - \sqrt F = 1 - 1/\sqrt 2 \approx 0.293. The upper bound gives D \leq \sqrt{1-F} = 1/\sqrt 2 \approx 0.707. The true value D = 1/\sqrt 2 saturates the upper bound — as expected for pure states.
Result. F(|0\rangle, |+\rangle) = 1/2 and D(|0\rangle, |+\rangle) = 1/\sqrt 2 \approx 0.707. The two numbers are exactly related by D = \sqrt{1-F} because both states are pure.
What this shows. For two pure qubit states, the full story of "how close" and "how distinguishable" is contained in the single angle between their Bloch vectors. Fidelity and trace distance are just two different trigonometric functions of that angle — both metrics are legitimate, and for pure states they are redundant.
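The whole example fits in a few lines of NumPy. This sketch recomputes F, D, and the optimal success probability. It also extracts the Helstrom measurement vector as the eigenvector of \rho - \sigma with the positive eigenvalue.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho, sigma = np.outer(ket0, ket0.conj()), np.outer(ket_plus, ket_plus.conj())

F = abs(np.vdot(ket0, ket_plus)) ** 2          # pure-state fidelity
lam, vecs = np.linalg.eigh(rho - sigma)        # eigenvalues -1/sqrt(2), +1/sqrt(2)
D = 0.5 * float(np.sum(np.abs(lam)))           # trace distance
p_correct = 0.5 + 0.5 * D                      # Helstrom success probability
helstrom_vector = vecs[:, np.argmax(lam)]      # outcome on which to guess |0>

print(f"F = {F:.4f}, D = {D:.4f}, p_correct = {p_correct:.4f}")
# F = 0.5000, D = 0.7071, p_correct = 0.8536
print("Fuchs-van de Graaf:", 1 - np.sqrt(F), "<=", D, "<=", np.sqrt(1 - F))
print("Helstrom vector:", helstrom_vector)
```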
Example 2: $I/2$ vs $|0\rangle\langle 0|$ — maximally mixed versus a pure state
Compute F and D between the maximally mixed qubit \rho = I/2 and the pure state \sigma = |0\rangle\langle 0|. One sits at the centre of the Bloch ball, the other at the north pole on its surface: they are clearly different, yet they are not orthogonal.
Step 1. Compute the difference matrix.
\rho - \sigma = \tfrac{1}{2}\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} - \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} = \begin{pmatrix}-1/2 & 0\\ 0 & 1/2\end{pmatrix}.
Step 2. Compute trace distance. The eigenvalues of \rho - \sigma are \pm 1/2; the sum of their absolute values is 1; divide by 2:
D(I/2, |0\rangle\langle 0|) = \tfrac{1}{2}.
Step 3. Compute fidelity using the pure-target simplification. Since \sigma = |0\rangle\langle 0| is pure, F(\rho, \sigma) = \langle 0|\rho|0\rangle. And \langle 0|(I/2)|0\rangle = 1/2, so
F(I/2, |0\rangle\langle 0|) = \tfrac{1}{2}.
Step 4. Check the Fuchs-van de Graaf bounds. \sqrt F = 1/\sqrt 2 \approx 0.707. Lower bound: D \geq 1 - 1/\sqrt 2 \approx 0.293. Upper bound: D \leq \sqrt{1 - 1/2} = 1/\sqrt 2 \approx 0.707. The true value D = 1/2 lies strictly inside the bounds and saturates neither, because \rho is mixed. Why the bounds aren't tight here: the upper bound is guaranteed to be tight only for pure states. Mixed states can produce (F, D) pairs strictly inside the allowed region, so F alone does not pin down D.
Step 5. Check the distinguishing interpretation. The best single-shot success probability is 1/2 + 1/4 = 3/4. If someone hands you either a pure |0\rangle or a maximally mixed state (each with probability 1/2), you can guess correctly 75\% of the time by measuring in the computational basis: if you see 0, guess |0\rangle; if you see 1, guess the mixed state. When it was the pure |0\rangle, you always see 0 and are always right. When it was the mixed state, the outcome is random, so you are right with probability 1/2. The overall success probability is \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot\tfrac{1}{2} = 3/4, matching 1/2 + D/2.
Result. F(I/2, |0\rangle\langle 0|) = 1/2 and D(I/2, |0\rangle\langle 0|) = 1/2. Notice: both metrics give the same number here, but that is an arithmetic coincidence for this specific pair, not a general fact.
What this shows. Fidelity and trace distance measure different things, and for mixed states they do not determine each other. The Fuchs-van de Graaf inequalities give a range, not a formula. When you report an experimental fidelity of 0.99, the implied trace distance is anywhere in [1 - \sqrt{0.99}, \sqrt{0.01}] \approx [0.005, 0.1], a 20\times range. If you need a tight statement about distinguishability, compute or bound D directly; which number to report depends on the question you are asking.
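The same numbers can be reproduced with a short NumPy sketch; the helper names are illustrative. It evaluates F both from the general formula and from the pure-target shortcut, and computes D from the eigenvalues of \rho - \sigma. It also evaluates the computational-basis guessing strategy from Step 5 and prints the trace-distance window implied by F = 0.99.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a Hermitian positive semi-definite matrix via eigh."""
    a = (a + a.conj().T) / 2
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def fidelity(rho, sigma):
    """Squared fidelity (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2)


rho = np.eye(2, dtype=complex) / 2                  # maximally mixed
sigma = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|

D = 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))
F = fidelity(rho, sigma)
F_shortcut = rho[0, 0].real                         # <0|rho|0>, pure-target shortcut

# Computational-basis strategy: outcome 0 -> guess |0>, outcome 1 -> guess I/2.
p_strategy = 0.5 * sigma[0, 0].real + 0.5 * rho[1, 1].real

print(F, F_shortcut, D, p_strategy)
print("bounds:", 1 - np.sqrt(F), "<=", D, "<=", np.sqrt(1 - F))

# Range of trace distances compatible with F = 0.99.
print(1 - np.sqrt(0.99), np.sqrt(1 - 0.99))
```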
Applications
The reason these metrics are worth the algebra is that they anchor the most important practical calculations in quantum computing.
- Hardware verification. When a lab announces "we prepared a Bell state with fidelity 0.99", they mean F(\rho_{\text{lab}}, |\Phi^+\rangle\langle\Phi^+|) \approx 0.99. That single number compresses an entire tomographic reconstruction of a 4\times 4 density matrix into one interpretable benchmark. Comparisons between superconducting and trapped-ion machines often come down to comparing fidelities for the same standard state.
- Tomography validation. Quantum state tomography reconstructs an unknown \rho from many measurements. The output has uncertainty, and the confidence region around the reconstructed \rho is typically described in trace distance (because trace distance bounds distinguishability, which is what a statistician controls).
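- Error-correction thresholds. The threshold theorem says that below a certain physical error rate, arbitrarily long quantum computations are possible. The "error rate" here is most naturally a distinguishability measure between the ideal and the noisy operation: trace distance for states, or its channel version, the diamond norm, for gates. Published thresholds (about 10^{-2} for the surface code under commonly used circuit-level noise models) are error-probability numbers of this kind, not fidelity numbers. The reason is that trace distance adds up cleanly when errors compound: by the triangle inequality and monotonicity, the total error of a sequence of operations is at most the sum of the individual errors.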
- Cryptographic security. Quantum key distribution protocols like BB84 prove security against an arbitrary adversary by bounding trace distance between the real protocol's output and an ideal key. If D(\rho_{\text{real}}, \rho_{\text{ideal}}) \leq \epsilon, no adversary (however powerful) can distinguish the real session from the ideal with advantage greater than \epsilon. The Fuchs-van de Graaf upper bound is how experimental fidelity measurements translate into \epsilon-security claims.
- Benchmark protocols. Randomised benchmarking and gate-set tomography report average gate fidelities (the fidelity between ideal and actual output states, averaged over pure input states) as a single-number summary of gate quality. For each input state, the associated trace-distance bound follows automatically via Fuchs-van de Graaf.
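To connect the hardware-verification bullet with the opening example, here is a sketch of the calculation. The opening example described the tomography output only by its spectrum (0.97, 0.02, 0.008, 0.002). This code additionally assumes, purely for illustration, that the eigenvectors are the four Bell states, with the largest weight on |\Phi^+\rangle. Under that assumption the fidelity against |\Phi^+\rangle is 0.97 and the exact trace distance is 0.03, inside the Fuchs-van de Graaf window.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Bell states in the basis |00>, |01>, |10>, |11>.
phi_plus = np.array([s, 0, 0, s], dtype=complex)
phi_minus = np.array([s, 0, 0, -s], dtype=complex)
psi_plus = np.array([0, s, s, 0], dtype=complex)
psi_minus = np.array([0, s, -s, 0], dtype=complex)


def proj(v):
    """Density matrix |v><v|."""
    return np.outer(v, v.conj())


# Hypothetical reconstructed state: spectrum (0.97, 0.02, 0.008, 0.002) from the
# opening example, with eigenvectors ASSUMED to be the Bell states.
rho_lab = (0.97 * proj(phi_plus) + 0.02 * proj(psi_plus)
           + 0.008 * proj(phi_minus) + 0.002 * proj(psi_minus))
target = proj(phi_plus)

F = np.vdot(phi_plus, rho_lab @ phi_plus).real     # pure-target shortcut <Phi+|rho|Phi+>
D = 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho_lab - target))))
print(f"F = {F:.4f}, D = {D:.4f}")                 # F = 0.9700, D = 0.0300
print(f"Fuchs-van de Graaf window: [{1 - np.sqrt(F):.4f}, {np.sqrt(1 - F):.4f}]")
```

With different (non-Bell) eigenvectors, the same spectrum would give different values of F and D. That is why the eigenvector assumption is stated explicitly.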
At TIFR and IIT Madras, experimental quantum computing groups validating NMR, trapped-ion, and superconducting-qubit platforms quote fidelities against target Bell states, GHZ states, and prepared magic states as the standard benchmark. The published fidelities on the most advanced Indian platforms as of 2025 sit around 0.95-0.99 for two-qubit entangled states. Via Fuchs-van de Graaf, that range only brackets the trace distance. D \leq \sqrt{1-F} gives upper bounds of roughly 0.1 to 0.22, while D \geq 1 - \sqrt F gives lower bounds of roughly 0.005 to 0.025. Against an error threshold of order 10^{-2}, the best-case reading (D \approx 0.005 at F = 0.99) falls below it, while every upper bound lies well above it. The same fidelity tells both stories, depending on which bound you translate it into.
Common confusions
-
"Fidelity equals |\langle\psi|\phi\rangle|, not |\langle\psi|\phi\rangle|^2." This is a convention split. Some references, including Nielsen and Chuang and Watrous, define F' = \text{tr}\sqrt{\sqrt\rho\sigma\sqrt\rho} without squaring, so that pure states give F' = |\langle\psi|\phi\rangle|; the Fuchs-van de Graaf inequalities then read 1 - F' \leq D \leq \sqrt{1 - F'^2}. This wiki uses the squared convention throughout, which is also common in experimental papers. Always check which convention a paper is using before quoting numbers between sources.
-
"Fidelity and trace distance measure the same thing." They do not. For mixed states, the same F value is consistent with a range of D values (Fuchs-van de Graaf). For pure states they are rigidly related (D = \sqrt{1-F}), but for general mixed states one metric does not determine the other. When a paper reports only one, it is leaving room for the other to be anywhere in the compatible range.
-
"Both metrics capture the same information about closeness." No. Fidelity has a strong pure-state motivation (Born overlap); trace distance has a strong operational motivation (distinguishing task). An experimental paper usually reports fidelity. A security proof usually works in trace distance. Translating between them is routine but not lossless.
-
"Monotonicity means channels always make states strictly more similar." Not quite. F never decreases under channels and D never increases, but a unitary channel leaves both unchanged, so "no less similar" is the accurate statement. The direction reflects the underlying physics: a channel can lose information but cannot create it, so states that were different before the channel cannot be more different after it, only equally different or less.
-
"Trace distance of 0.5 means a 50\% chance of distinguishing." Not quite. D = 0.5 means the best single-shot probability of distinguishing is 1/2 + 1/4 = 0.75, not 0.5. The formula is p_{\text{correct}} = (1 + D)/2, not p_{\text{correct}} = D. Chance alone already gives p_{\text{correct}} = 1/2; trace distance sets the advantage over chance, which is D/2, rather than the raw probability.
-
"Fidelity is the inner product of density matrices." Density matrices do have a natural inner product, the Hilbert-Schmidt inner product \text{tr}(\rho^\dagger\sigma), but it is a different object. It agrees with fidelity when one state is pure (\text{tr}(|\psi\rangle\langle\psi|\sigma) = \langle\psi|\sigma|\psi\rangle), but not in general: for \rho = \sigma = I/2, \text{tr}(\rho\sigma) = 1/2 while F = 1. Fidelity is defined via the nested square-root construction and, by Uhlmann, equals the maximum squared pure-state overlap on a larger space.
Going deeper
If you are here for the definitions of fidelity and trace distance, the pure-state special cases, the two operational meanings (Uhlmann for F, Helstrom for D), and the Fuchs-van de Graaf inequalities, you have the package. The rest of this section digs into the proof of Uhlmann's theorem, the diamond norm for channels, the Bures metric as an infinitesimal fidelity, and how these metrics plug into tomography and security proofs.
Uhlmann's theorem — proof sketch
The strategy uses the polar decomposition. Fix any purifications |\psi_\rho\rangle, |\psi_\sigma\rangle \in \mathcal H \otimes \mathcal H' of \rho, \sigma (constructed via the spectral-decomposition recipe of the purification chapter). Any other purifications differ by an ancilla unitary: (I \otimes U)|\psi_\rho\rangle and (I \otimes V)|\psi_\sigma\rangle for unitaries U, V on \mathcal H'. The overlap is
\langle\psi_\rho|(I \otimes U^\dagger V)|\psi_\sigma\rangle.
Choose fixed purifications |\psi_\rho\rangle = \sum_i \sqrt{p_i}\,|u_i\rangle|e_i\rangle and |\psi_\sigma\rangle = \sum_j \sqrt{q_j}\,|v_j\rangle|e_j\rangle using the same ancilla basis \{|e_i\rangle\} and the spectral decompositions \rho = \sum_i p_i|u_i\rangle\langle u_i|, \sigma = \sum_j q_j|v_j\rangle\langle v_j|. Then the overlap becomes \text{tr}(A W^{T}), with the transpose taken in the ancilla basis. Here W = U^\dagger V is unitary, and A is the specific operator with matrix elements A_{ij} = \sqrt{p_i q_j}\,\langle u_i|v_j\rangle = \langle u_i|\sqrt\rho\sqrt\sigma|v_j\rangle, which has the same singular values as \sqrt\rho\sqrt\sigma. Since W^T ranges over all unitaries as W does, maximising |\text{tr}(AW^T)| is a standard matrix optimisation: the maximum is \|A\|_1 = \text{tr}|A|, the sum of the singular values of A, attained by a unitary read off from the singular value (or polar) decomposition of A. Finally, \|A\|_1 = \|\sqrt\rho\sqrt\sigma\|_1 = \text{tr}\sqrt{\sqrt\rho\,\sigma\,\sqrt\rho} = \sqrt{F(\rho,\sigma)}, because (\sqrt\rho\sqrt\sigma)(\sqrt\rho\sqrt\sigma)^\dagger = \sqrt\rho\,\sigma\,\sqrt\rho. Squaring gives the fidelity. Full details in Nielsen and Chuang §9.2 or Preskill Ch.5.
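The proof sketch can be checked numerically. This sketch assumes the particular purification (\sqrt\rho \otimes I)\sum_k |k\rangle|k\rangle, which uses the same ancilla basis for both states. With it, the overlap under an ancilla unitary W is \text{tr}(AW^{T}) with A = \sqrt\rho\sqrt\sigma, and the singular value decomposition of A gives the optimal W. The code confirms that this optimum equals F and that random ancilla unitaries never exceed it. Helper names and the random test states are illustrative.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a Hermitian positive semi-definite matrix via eigh."""
    a = (a + a.conj().T) / 2
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def fidelity(rho, sigma):
    """Squared fidelity (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2)


def random_density_matrix(dim, rng):
    """G G^dagger / tr(G G^dagger) for a complex Gaussian matrix G."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real


def purify(rho):
    """Purification (sqrt(rho) kron I) sum_k |k>|k>, ancilla of the same dimension."""
    d = rho.shape[0]
    omega = np.eye(d).reshape(d * d)            # unnormalised sum_k |k>|k>, system index first
    return np.kron(psd_sqrt(rho), np.eye(d)) @ omega


def overlap(psi_rho, psi_sigma, W):
    """<psi_rho| (I kron W) |psi_sigma> for an ancilla unitary W."""
    d = W.shape[0]
    return np.vdot(psi_rho, np.kron(np.eye(d), W) @ psi_sigma)


def random_unitary(d, rng):
    """Unitary Q from the QR decomposition of a complex Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q


rng = np.random.default_rng(seed=11)
rho, sigma = random_density_matrix(3, rng), random_density_matrix(3, rng)
psi_rho, psi_sigma = purify(rho), purify(sigma)

# Overlap = tr(A W^T) with A = sqrt(rho) sqrt(sigma). From the SVD A = U diag(s) Vh,
# choosing W^T = Vh^dagger U^dagger makes the overlap equal sum(s) = ||A||_1.
A = psd_sqrt(rho) @ psd_sqrt(sigma)
U, s, Vh = np.linalg.svd(A)
W_best = (Vh.conj().T @ U.conj().T).T
best = abs(overlap(psi_rho, psi_sigma, W_best)) ** 2

random_best = max(abs(overlap(psi_rho, psi_sigma, random_unitary(3, rng))) ** 2
                  for _ in range(500))
print(best, fidelity(rho, sigma), random_best)
```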
Beyond fidelity and trace distance — the diamond norm
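The two metrics above measure distance between states. There is also a metric for distance between channels, the diamond norm:
\|\mathcal E_1 - \mathcal E_2\|_\diamond = \sup_\rho \bigl\|(\mathcal E_1 \otimes \text{id})(\rho) - (\mathcal E_2 \otimes \text{id})(\rho)\bigr\|_1,
where the supremum is over all density operators \rho on \mathcal H \otimes \mathcal H (the system plus a reference of the same dimension). The diamond norm is the operational channel-distinguishability metric. With one use of the unknown channel and equal priors, the best probability of telling \mathcal E_1 from \mathcal E_2, over all strategies including those that entangle the input with a reference, is \tfrac{1}{2} + \tfrac{1}{4}\|\mathcal E_1 - \mathcal E_2\|_\diamond. For replacement channels that discard their input and output fixed states \rho and \sigma, it reduces to \|\rho - \sigma\|_1 = 2D(\rho,\sigma). It is the natural language for channel-level error-correction thresholds.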
The Bures metric — an infinitesimal fidelity
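The Bures metric is the infinitesimal form of fidelity. It comes from the Bures distance
D_B(\rho,\sigma) = \sqrt{2\bigl(1 - \sqrt{F(\rho,\sigma)}\bigr)},
and is given by ds_B^2 = D_B(\rho, \rho + d\rho)^2.
Fidelity itself is not a distance (it equals 1, not 0, for identical states), and trace distance, although a genuine metric, does not come from a Riemannian structure. The Bures distance is both a distance and Riemannian: it satisfies the triangle inequality, and its infinitesimal form defines a Riemannian metric on the space of density operators. In this metric, \rho and \sigma are close iff F is close to 1. The Bures metric is used in quantum metrology, the theory of how precisely a parameter can be estimated from quantum measurements. For a family \rho_\theta it equals one quarter of the quantum Fisher information F_Q, ds_B^2 = \tfrac{1}{4}F_Q\,d\theta^2, and the inverse of F_Q bounds the variance of any unbiased estimator of \theta via the quantum Cramér-Rao inequality.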
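Here is a numerical check of the Bures-metric relation to the quantum Fisher information F_Q, under stated assumptions. The family is a qubit whose Bloch vector of fixed length r < 1 rotates by an angle \theta in the x-z plane; for this family the quantum Fisher information is r^2, a standard qubit result assumed here rather than derived in this chapter. The sketch estimates F_Q from a finite-difference Bures distance, 4D_B^2/d\theta^2. Helper names are illustrative.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a Hermitian positive semi-definite matrix via eigh."""
    a = (a + a.conj().T) / 2
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def fidelity(rho, sigma):
    """Squared fidelity (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2)


def bures_distance(rho, sigma):
    """D_B = sqrt(2 (1 - sqrt(F))), with F the squared fidelity used in this chapter."""
    return float(np.sqrt(max(0.0, 2 * (1 - np.sqrt(fidelity(rho, sigma))))))


X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rotated_state(theta, r):
    """Qubit with Bloch vector r * (sin theta, 0, cos theta); r < 1 keeps it full rank."""
    return 0.5 * (np.eye(2) + r * (np.sin(theta) * X + np.cos(theta) * Z))


r, theta, dtheta = 0.6, 0.4, 1e-3
dB = bures_distance(rotated_state(theta, r), rotated_state(theta + dtheta, r))
qfi_estimate = 4 * dB**2 / dtheta**2      # from ds_B^2 = F_Q dtheta^2 / 4
print("finite-difference F_Q:", qfi_estimate, " assumed exact value r^2:", r**2)
```

A full-rank family (r < 1) is used on purpose: for pure states, the eigenvalue-based square root amplifies round-off in the near-zero eigenvalues, which would spoil a finite-difference estimate this small.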
Quantum tomography — reading a state
Quantum state tomography is the experimental procedure of reconstructing an unknown density operator \rho from repeated measurements. You measure expectation values of a complete set of observables (for a qubit, \langle\sigma_x\rangle, \langle\sigma_y\rangle, \langle\sigma_z\rangle; for n qubits, 4^n - 1 Pauli strings) and solve for \rho. The reconstructed \hat\rho is an estimate: it has statistical error from finite samples, and it may even fail to be a valid density operator (positive semi-definite) if the statistics are noisy. Post-processing with maximum-likelihood estimation returns a valid \hat\rho close to the raw estimate. Fidelity against a target state \rho_{\text{target}} is then the one-number benchmark. In Indian NMR quantum computing (for example, the groups at IISc Bangalore and IISER Pune), tomographic reconstruction of deviation density matrices has been the daily experimental currency for two decades, with each new algorithm's output reported as a fidelity against the ideal output state.
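Below is a minimal single-qubit sketch of this pipeline, with several simplifying assumptions. The "true" state is a hypothetical noisy |+\rangle, and each Pauli expectation value is estimated from simulated \pm 1 outcomes. \hat\rho comes from linear inversion, \hat\rho = \tfrac{1}{2}(I + \langle X\rangle X + \langle Y\rangle Y + \langle Z\rangle Z), and a crude eigenvalue-clipping step stands in for maximum-likelihood estimation (it is not MLE). The fidelity against the pure target is then \langle +|\hat\rho|+\rangle. Function names, shot count, and seed are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = (X, Y, Z)


def linear_inversion(expectations):
    """Single-qubit linear-inversion estimate rho = (I + <X>X + <Y>Y + <Z>Z) / 2."""
    return 0.5 * (I2 + sum(e * P for e, P in zip(expectations, PAULIS)))


def simulate_expectation(rho, pauli, shots, rng):
    """Average of `shots` simulated +/-1 outcomes of measuring `pauli` on `rho`."""
    p_plus = float(np.clip(0.5 * (1 + np.trace(rho @ pauli).real), 0.0, 1.0))
    n_plus = rng.binomial(shots, p_plus)
    return (2 * n_plus - shots) / shots


def clip_to_state(rho):
    """Crude repair (not maximum likelihood): zero negative eigenvalues, renormalise."""
    w, v = np.linalg.eigh((rho + rho.conj().T) / 2)
    w = np.clip(w, 0.0, None)
    return (v * (w / w.sum())) @ v.conj().T


# Hypothetical "true" state: |+> mixed with 5% white noise.
ket_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
target = np.outer(ket_plus, ket_plus.conj())
rho_true = 0.95 * target + 0.05 * I2 / 2

rng = np.random.default_rng(seed=2025)
estimates = [simulate_expectation(rho_true, P, shots=10_000, rng=rng) for P in PAULIS]
rho_hat = clip_to_state(linear_inversion(estimates))

# Pure target, so the fidelity is just <+|rho_hat|+>.
F_hat = np.vdot(ket_plus, rho_hat @ ket_plus).real
F_true = np.vdot(ket_plus, rho_true @ ket_plus).real
print("estimated fidelity:", F_hat, " true fidelity:", F_true)
```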
Monotonicity as a structural theorem
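The fact that both F and D are monotone under CPTP channels (F goes up, D goes down) is called the data-processing inequality. It is the quantum analogue of the classical fact that processing a random variable cannot increase its distinguishability from another random variable. For F, the proof goes via Stinespring dilation and purification. A channel is a unitary on a larger space followed by a partial trace; the unitary leaves fidelity unchanged, and discarding a subsystem cannot lower it. The reason is that any purification of the larger state is also a purification of the reduced one, so Uhlmann's maximum is taken over a larger set. For D, the proof follows from the measurement characterisation: any measurement made after the channel is also a measurement on the input states, so the optimal distinguishing probability cannot increase. Data processing is the single deepest property both metrics share, and it is why one-shot quantum protocols can be reasoned about using a handful of state distances rather than a blizzard of measurement statistics.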
Indian context — NMR quantum computing and fidelity benchmarks
The Indian NMR quantum computing programme (Anil Kumar's group at IISc Bangalore, T. S. Mahesh at IISER Pune, and collaborators) spent the 2000s developing tomographic reconstruction techniques for deviation density matrices on liquid-state NMR qubits. Published implementations of Deutsch-Jozsa, Grover, or Shor on small NMR processors were evaluated by computing the fidelity of the experimental output state against the theoretical ideal. Fidelities around 0.95-0.99 on 3–7-qubit NMR experiments established these techniques as the standard benchmark long before superconducting and trapped-ion platforms caught up. The metric's use in the NISQ era on those newer platforms is continuous with this earlier Indian work.
Where this leads next
- Density operator — the mathematical object that F and D take as inputs.
- Purification — the larger-Hilbert-space construction underpinning Uhlmann's theorem.
- Kraus representation — quantum channels, on which both metrics are monotone.
- Standard channels — the bit-flip, phase-flip, depolarizing, and amplitude-damping channels, whose strength is naturally measured in fidelity and trace distance.
- Quantum tomography — reconstructing unknown states from measurements, using fidelity as the standard benchmark.
- CPTP maps — the general framework of quantum channels, on which data-processing inequalities for F and D live.
References
- Wikipedia, Fidelity of quantum states — definition, pure-state limit, Uhlmann's theorem.
- Wikipedia, Trace distance — definition, operational meaning, Helstrom bound.
- Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §9.2 (distance measures for quantum information) — Cambridge University Press.
- John Preskill, Lecture Notes on Quantum Computation, Ch. 3 and Ch. 5 — theory.caltech.edu/~preskill/ph229.
- Armin Uhlmann, The "transition probability" in the state space of a ⋆-algebra (1976), Reports on Mathematical Physics — DOI:10.1016/0034-4877(76)90060-4.
- John Watrous, The Theory of Quantum Information (2018), Ch. 3 (similarity and distance among states and channels) — cs.uwaterloo.ca/~watrous/TQI.