Bell's Theorem and CHSH

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

In 1964, John Bell proved that no local hidden-variable theory — any classical model in which distant particles carry pre-existing outcomes determined by local variables — can reproduce all the correlations of quantum mechanics. The cleanest test is the CHSH inequality: a specific combination of four correlation expectation values, S = \langle A_0 B_0\rangle + \langle A_0 B_1\rangle + \langle A_1 B_0\rangle - \langle A_1 B_1\rangle. Every local classical theory is forced by algebra to satisfy |S| \leq 2. Quantum mechanics, on a Bell state with the right measurement angles, reaches S = 2\sqrt{2} \approx 2.828 — the Tsirelson bound. Aspect (1982), then Hensen, Giustina, and Shalm (all 2015, loophole-free) measured values clearly above 2, closing the question experimentally. Bell violation does not enable faster-than-light signalling — the no-communication theorem forbids it — but it is the operational core of device-independent quantum cryptography, of the security argument behind satellite-based QKD, and of the sharpest statement we have of what is quantum about quantum mechanics.

In 1935, three physicists in Princeton wrote a paper whose title was a question. Albert Einstein, Boris Podolsky, and Nathan Rosen asked: can quantum-mechanical description of physical reality be considered complete? Their answer was no. Quantum mechanics, they argued, must be an incomplete description of the world, because it seemed to allow two particles, prepared together and separated by any distance, to produce correlated measurement outcomes that no classical story could explain without invoking instantaneous influence across space. Einstein later gave that influence its famous dismissive label: "spooky action at a distance."

The EPR paper, as it is now known, did not kill quantum mechanics. But it did sharpen the question: if quantum mechanics gives correct predictions, is there some deeper, classical-looking theory underneath? One in which each particle secretly carries "hidden variables" — instructions it received at preparation — that determine outcomes locally, without any action at a distance?

For nearly thirty years, that question sat open as a matter of philosophy. Then, in 1964, an Irish physicist at CERN named John Stewart Bell turned it into a theorem. He showed that any local hidden-variable theory must satisfy a specific inequality on measurement correlations — and that quantum mechanics, in certain experiments, predicts a violation. The question was no longer philosophical. It was an experimental target.

This chapter tells you what Bell proved, derives the cleanest form of the inequality (the CHSH inequality, due to Clauser, Horne, Shimony, and Holt in 1969), shows you why a Bell state reaches 2\sqrt{2}, and surveys the experiments — culminating in the 2015 loophole-free trio — that put the question to rest. It also does an honest hype-check on the word "non-local," which popular accounts persistently misread as "faster than light."

The setup — a two-player game

Picture the experiment as a game, because that is exactly what it is. Two players, Alice and Bob, are separated. They cannot communicate during the game. A referee gives each of them a single-bit input: Alice gets x \in \{0, 1\}, Bob gets y \in \{0, 1\}. Each of them must output a single bit: Alice outputs a \in \{-1, +1\}, Bob outputs b \in \{-1, +1\}. (We use \pm 1 instead of \{0, 1\} because the algebra of expectation values is cleaner.)

The referee scores them by computing a specific correlation quantity after many rounds:

S = \langle A_0 B_0\rangle + \langle A_0 B_1\rangle + \langle A_1 B_0\rangle - \langle A_1 B_1\rangle,

where \langle A_x B_y\rangle is the average product of Alice's output times Bob's output, taken over all rounds where Alice got input x and Bob got input y. The three plus signs and one minus sign are deliberate, and they are the whole reason the game is non-trivial.

The CHSH setup. A source emits pairs of particles. Alice and Bob, far apart, each receive one. Each chooses one of two measurement settings. Each produces an output in $\{-1, +1\}$. The CHSH quantity $S$ combines the four correlation averages with three plus signs and one minus sign.
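The referee's scoring rule can be sketched in a few lines of Python. The function name `chsh_value` and the record layout `(x, y, a, b)` are illustrative assumptions, and the toy records are made up purely to exercise the arithmetic.

```python
from collections import defaultdict

def chsh_value(rounds):
    """Estimate S from an iterable of (x, y, a, b) records.

    x, y are the inputs in {0, 1}; a, b are the outputs in {-1, +1}.
    Raises KeyError if some input pair (x, y) never occurs.
    """
    totals = defaultdict(int)   # running sum of a*b for each (x, y)
    counts = defaultdict(int)   # number of rounds seen for each (x, y)
    for x, y, a, b in rounds:
        totals[x, y] += a * b
        counts[x, y] += 1
    # <A_x B_y> is the average product over rounds with inputs (x, y).
    E = {xy: totals[xy] / counts[xy] for xy in counts}
    return E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]

# Made-up toy data: two rounds per input pair.
toy_rounds = [
    (0, 0, +1, +1), (0, 0, -1, -1),
    (0, 1, +1, +1), (0, 1, +1, -1),
    (1, 0, -1, -1), (1, 0, +1, +1),
    (1, 1, +1, -1), (1, 1, +1, +1),
]
print(chsh_value(toy_rounds))  # 1 + 0 + 1 - 0 = 2.0
```

A real analysis would also attach a statistical uncertainty to each correlator; this sketch only shows how the three plus signs and one minus sign enter.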

Two questions hang over the game:

What is the best Alice and Bob can do classically? Meaning: before the game, they agree on any strategy whatsoever — they may share as much classical randomness as they like, any pre-arranged lookup table, any hidden variables that were set at the source. But during the game, they cannot communicate. What is the maximum value of S they can achieve?
What is the best they can do quantumly? Meaning: they may share any entangled quantum state prepared at the source, and their measurement boxes are quantum devices. What is the maximum value of S then?

Bell's theorem is the statement that the answers to these two questions are different, and that the difference is experimentally measurable.

The classical bound — |S| \leq 2

This is where the algebra pays for itself. A local hidden-variable theory is any model in which Alice's output depends only on (i) her input x and (ii) some hidden variable \lambda shared with Bob at preparation, and Bob's output depends only on y and \lambda. The locality is the key clause: Alice's output does not depend on Bob's input, and vice versa.

In such a model, let A_x(\lambda) \in \{-1, +1\} be Alice's output given input x and hidden variable \lambda, and similarly B_y(\lambda) \in \{-1, +1\}. The correlation is the average over the distribution of \lambda:

\langle A_x B_y\rangle = \int A_x(\lambda)\,B_y(\lambda)\,\rho(\lambda)\,d\lambda,

where \rho(\lambda) is the probability distribution over hidden variables.

Now look at the combination S(\lambda) = A_0(\lambda)B_0(\lambda) + A_0(\lambda)B_1(\lambda) + A_1(\lambda)B_0(\lambda) - A_1(\lambda)B_1(\lambda), evaluated for a single value of \lambda. Factor the first two terms and the last two:

S(\lambda) = A_0(\lambda)\bigl[B_0(\lambda) + B_1(\lambda)\bigr] + A_1(\lambda)\bigl[B_0(\lambda) - B_1(\lambda)\bigr].

Why factor this way: B_0, B_1 \in \{-1, +1\}. Their sum B_0 + B_1 is \pm 2 if they agree, 0 if they disagree. Their difference B_0 - B_1 is the opposite: 0 if they agree, \pm 2 if they disagree. So for any given \lambda exactly one of the two bracketed quantities is non-zero, and the other is automatically zero.

The surviving bracket is \pm 2 and it is multiplied by A_0 or A_1, each of which is \pm 1, so |S(\lambda)| = 2 for every \lambda. Averaging over \lambda cannot push the magnitude past that, because the average of a quantity confined to [-2, +2] stays in [-2, +2]:

|S| = \left|\int S(\lambda)\,\rho(\lambda)\,d\lambda\right| \leq \int |S(\lambda)|\,\rho(\lambda)\,d\lambda = 2.

That is the CHSH inequality: for any local hidden-variable theory, |S| \leq 2. The derivation took four lines. It holds for any shared randomness, any lookup table, any strategy built on any classical resource, no matter how elaborate.

Why classical $|S| \leq 2$. For any hidden variable $\lambda$, either $B_0$ and $B_1$ agree — making the first bracket $\pm 2$ and the second $0$ — or they disagree, swapping the roles. Exactly one bracket survives; the other vanishes.
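The bound can also be checked by brute force, since a deterministic local strategy is just a choice of the four values A_0, A_1, B_0, B_1. The sketch below enumerates all 16 of them and then averages them with arbitrary weights, which is what a distribution \rho(\lambda) amounts to. The helper names and the random seed are illustrative choices.

```python
from itertools import product
import random

def chsh(a0, a1, b0, b1):
    """S(lambda) for one deterministic assignment of the four outputs."""
    return a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1

# All 2**4 = 16 deterministic local strategies.
strategies = list(product((-1, +1), repeat=4))
values = [chsh(*s) for s in strategies]
print(sorted(set(values)))  # [-2, 2]

# A hidden-variable model is a probability mixture of these strategies.
rng = random.Random(0)                       # fixed seed, arbitrary choice
weights = [rng.random() for _ in strategies]
mixed_S = sum(w * v for w, v in zip(weights, values)) / sum(weights)
print(abs(mixed_S) <= 2)  # True
```

Every deterministic strategy gives exactly \pm 2, so any mixture lands in [-2, +2].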

Example 1 — the best classical strategy saturates $S = 2$

Suppose Alice and Bob agree in advance that each of them will always output +1, whatever input they receive. This is deterministic: no randomness, no tricks. Compute the correlators:

\langle A_0 B_0\rangle = (+1)(+1) = +1, \quad \langle A_0 B_1\rangle = +1, \quad \langle A_1 B_0\rangle = +1, \quad \langle A_1 B_1\rangle = +1.

Then

S = 1 + 1 + 1 - 1 = 2.

The all-+1 strategy saturates the classical bound. The minus sign in the CHSH combination is the whole reason they cannot do better: it asks the correlations to be aligned on three measurement-setting pairs and anti-aligned on the fourth, which is algebraically incompatible with fixed classical answers.

What this shows. Saturating |S| = 2 is easy for a classical team; exceeding it is impossible. If Alice and Bob ever produce S > 2 in the lab, their behaviour is not explicable by any local classical strategy — no matter how they prepare, no matter what random bits they share, no matter what lookup tables they use. That is the content of Bell's theorem in one sentence.

The quantum bound — |S| \leq 2\sqrt 2

Now let Alice and Bob share the Bell state |\Phi^+\rangle = \tfrac{1}{\sqrt 2}(|00\rangle + |11\rangle). Each of them will measure along a direction on the Bloch sphere: Alice in direction \vec{a}_x (depending on her input x \in \{0, 1\}), Bob in direction \vec{b}_y.

The observable "measure spin along direction \vec{n}" is the Pauli operator \vec{n} \cdot \vec{\sigma} = n_x X + n_y Y + n_z Z, where X, Y, Z are the Pauli matrices. Its eigenvalues are \pm 1, which is why we chose \{-1, +1\} as the output alphabet.

For the Bell state, with both measurement axes in the x-z plane of the Bloch sphere (the only case this chapter needs), the correlation of Alice measuring along \vec{a} and Bob along \vec{b} is the textbook result (derived from a direct computation of \langle \Phi^+ |\, (\vec{a}\cdot\vec\sigma) \otimes (\vec{b}\cdot\vec\sigma)\, |\Phi^+\rangle):

\langle A(\vec a)\, B(\vec b)\rangle_{\Phi^+} = \vec{a} \cdot \vec{b} = \cos\theta_{ab},

where \theta_{ab} is the angle between the two measurement axes. The correlation is just the dot product, that is, the cosine of the angle between the two Bloch-sphere directions. (For general axes the y-components enter with a minus sign, \langle A B\rangle_{\Phi^+} = a_x b_x - a_y b_y + a_z b_z, because \langle Y \otimes Y\rangle_{\Phi^+} = -1.)

Why the correlation is \vec a \cdot \vec b on |\Phi^+\rangle: the Bell state has a rotational structure, (U \otimes U^*)|\Phi^+\rangle = |\Phi^+\rangle for any single-qubit unitary U. Rotations about the y-axis are real matrices, so U^* = U for them, and rotating both axes together within the x-z plane leaves the state unchanged. This symmetry forces the expectation value to depend only on the angle between the measurement axes, not on their absolute orientation. A direct calculation in the computational basis nails down the dependence as the cosine. The cleanest derivation runs through Schmidt decomposition; the details can wait for ch.22.

Now pick four directions, one for each setting:

Alice: \vec a_0 = \hat z (measure Z), \vec a_1 = \hat x (measure X).
Bob: \vec b_0 = \tfrac{1}{\sqrt 2}(\hat z + \hat x) (measure (Z+X)/\sqrt 2), \vec b_1 = \tfrac{1}{\sqrt 2}(\hat z - \hat x) (measure (Z-X)/\sqrt 2).

The dot products for each Alice-Bob pair work out to:

\vec a_0 \cdot \vec b_0 = \hat z \cdot \tfrac{1}{\sqrt 2}(\hat z + \hat x) = \tfrac{1}{\sqrt 2}.
\vec a_0 \cdot \vec b_1 = \hat z \cdot \tfrac{1}{\sqrt 2}(\hat z - \hat x) = \tfrac{1}{\sqrt 2}.
\vec a_1 \cdot \vec b_0 = \hat x \cdot \tfrac{1}{\sqrt 2}(\hat z + \hat x) = \tfrac{1}{\sqrt 2}.
\vec a_1 \cdot \vec b_1 = \hat x \cdot \tfrac{1}{\sqrt 2}(\hat z - \hat x) = -\tfrac{1}{\sqrt 2}.

Plug into the CHSH sum:

S = \tfrac{1}{\sqrt 2} + \tfrac{1}{\sqrt 2} + \tfrac{1}{\sqrt 2} - \bigl(-\tfrac{1}{\sqrt 2}\bigr) = \tfrac{4}{\sqrt 2} = 2\sqrt 2.

S = 2\sqrt 2 \approx 2.828, above the classical bound of 2 by a factor of \sqrt 2. The quantum strategy beats every conceivable classical one.

The optimal CHSH configuration. Alice measures along $Z$ or $X$; Bob measures along the two axes $45°$ away from each. Three correlators give $+1/\sqrt 2$, one gives $-1/\sqrt 2$, and the CHSH sum (with its three plus signs and one minus sign) aligns all four contributions constructively to give $2\sqrt 2$.
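The four dot products and their CHSH sum are easy to reproduce numerically. The sketch below takes the dot-product rule for granted, keeps Alice's axes at z and x, and places Bob's two axes symmetrically at angle \phi on either side of z. That one-parameter family is an assumption made for illustration, and \phi = 45° recovers the settings above.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def chsh_for_angle(phi):
    """S when Bob's two axes sit at +phi and -phi from z, in the x-z plane.

    Vectors are (x, y, z) components; each correlator is a dot product.
    """
    a = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]            # a_0 = z, a_1 = x
    b = [(math.sin(phi), 0.0, math.cos(phi)),         # b_0
         (-math.sin(phi), 0.0, math.cos(phi))]        # b_1
    return dot(a[0], b[0]) + dot(a[0], b[1]) + dot(a[1], b[0]) - dot(a[1], b[1])

S_text = chsh_for_angle(math.pi / 4)   # the settings used in the text
print(S_text, 2 * math.sqrt(2))

# Sweep Bob's half-angle in whole degrees to see where S peaks.
best_deg = max(range(0, 91), key=lambda d: chsh_for_angle(math.radians(d)))
print(best_deg)  # 45
```

Within this family S = 2(\cos\phi + \sin\phi), which is why the peak sits at 45°.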

Example 2 — computing $\langle A_0 B_0\rangle$ for the Bell state

Verify directly that measuring Z on Alice's qubit and (Z+X)/\sqrt 2 on Bob's qubit of |\Phi^+\rangle gives correlation 1/\sqrt 2.

Step 1 — The operator. Alice measures Z; Bob measures (Z+X)/\sqrt 2. The joint observable is

A_0 \otimes B_0 = Z \otimes \frac{Z+X}{\sqrt 2} = \frac{1}{\sqrt 2}\bigl(Z \otimes Z + Z \otimes X\bigr).

Step 2 — Evaluate each term on the state. First Z \otimes Z:

(Z \otimes Z)\tfrac{1}{\sqrt 2}(|00\rangle + |11\rangle) = \tfrac{1}{\sqrt 2}\bigl((+1)(+1)|00\rangle + (-1)(-1)|11\rangle\bigr) = |\Phi^+\rangle.

Why Z \otimes Z is the identity on |\Phi^+\rangle: both components, |00\rangle and |11\rangle, have even parity (bit sums 0+0 = 0 and 1+1 = 2). Z \otimes Z multiplies a basis state by (-1)^{\text{parity}}, which is +1 on even-parity states. So (Z \otimes Z)|\Phi^+\rangle = |\Phi^+\rangle, and immediately \langle \Phi^+|Z \otimes Z|\Phi^+\rangle = \langle \Phi^+|\Phi^+\rangle = 1.

Next Z \otimes X, where X flips Bob's bit and Z contributes a sign from Alice's bit:

(Z \otimes X)\tfrac{1}{\sqrt 2}(|00\rangle + |11\rangle) = \tfrac{1}{\sqrt 2}\bigl((+1)|01\rangle + (-1)|10\rangle\bigr) = \tfrac{1}{\sqrt 2}(|01\rangle - |10\rangle) = |\Psi^-\rangle.

Then \langle \Phi^+|Z \otimes X|\Phi^+\rangle = \langle \Phi^+ | \Psi^-\rangle = 0, because |\Phi^+\rangle and |\Psi^-\rangle are orthogonal Bell basis states.

Step 3 — Assemble.

\langle A_0 B_0\rangle = \langle \Phi^+|A_0 \otimes B_0|\Phi^+\rangle = \tfrac{1}{\sqrt 2}\bigl(\langle \Phi^+|Z\otimes Z|\Phi^+\rangle + \langle \Phi^+|Z \otimes X|\Phi^+\rangle\bigr) = \tfrac{1}{\sqrt 2}(1 + 0) = \tfrac{1}{\sqrt 2}.

Result. \langle A_0 B_0\rangle = 1/\sqrt 2, matching \vec a_0 \cdot \vec b_0 = \cos 45°. The dot-product formula was a shortcut; Example 2 is the direct computation under the hood.

What this shows. The quantum correlation formula \langle A(\vec a)B(\vec b)\rangle = \vec a \cdot \vec b for the Bell state (with both axes in the x-z plane, as here) is not a definition; it is a computable consequence of the state, the Pauli operators, and the tensor-product structure. Every CHSH experiment that reports a value of S is, in effect, repeating this calculation in hardware.
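The same calculation can be repeated with explicit matrices. The sketch below uses plain nested lists, which is enough because Z, X, and |\Phi^+\rangle are all real. The helper names (`kron`, `expectation`, `axis`) are illustrative, and the basis ordering 00, 01, 10, 11 is a convention chosen here.

```python
import math

Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]

def kron(A, B):
    """Kronecker product of two matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def expectation(state, M):
    """<state|M|state> for a real state vector and a real matrix."""
    Mv = [sum(m * s for m, s in zip(row, state)) for row in M]
    return sum(s * mv for s, mv in zip(state, Mv))

def axis(theta):
    """cos(theta) Z + sin(theta) X: spin along an axis in the x-z plane."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * Z[i][j] + s * X[i][j] for j in range(2)] for i in range(2)]

r = 1 / math.sqrt(2)
phi_plus = [r, 0.0, 0.0, r]          # amplitudes in the order 00, 01, 10, 11

# Example 2, term by term.
zz = expectation(phi_plus, kron(Z, Z))
zx = expectation(phi_plus, kron(Z, X))
corr_00 = (zz + zx) / math.sqrt(2)

# All four correlators for the optimal settings, then the CHSH sum.
A = [axis(0.0), axis(math.pi / 2)]             # A_0 = Z, A_1 = X
B = [axis(math.pi / 4), axis(-math.pi / 4)]    # B_0 = (Z+X)/sqrt2, B_1 = (Z-X)/sqrt2
E = {(x, y): expectation(phi_plus, kron(A[x], B[y])) for x in (0, 1) for y in (0, 1)}
S = E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]
print(zz, zx, corr_00, S)
```

Up to floating-point rounding, the printed values are 1, 0, 1/\sqrt 2, and 2\sqrt 2.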

The Tsirelson bound — why not beyond 2\sqrt 2?

A natural next question: could some other quantum state or set of measurements push S higher than 2\sqrt 2? The answer is no: 2\sqrt 2 is the maximum quantum value, regardless of state, regardless of measurements, for the standard CHSH combination. This is known as the Tsirelson bound (Boris Tsirelson, 1980).

The proof is not hard in outline. Treat A_0, A_1, B_0, B_1 as operators with eigenvalues in \{-1, +1\} (so A_x^2 = I, B_y^2 = I) and with [A_x, B_y] = 0 (Alice's and Bob's operators commute because they act on disjoint factors). Then compute the square of the CHSH operator \hat S = A_0(B_0 + B_1) + A_1(B_0 - B_1):

\hat S^2 = 4 I - [A_0, A_1][B_0, B_1].

The second term's norm is at most 4, since each commutator has operator norm at most 2 (\|[A_0, A_1]\| \leq 2\|A_0\|\,\|A_1\| = 2, and likewise for Bob). So \hat S^2 \leq 8\,I, giving \|\hat S\| \leq 2\sqrt 2. Clean identity, clean bound. The bound is saturated when both commutators are as large as they can be, which is what anticommuting Pauli observables achieve: [X, Z] = -2iY has norm exactly 2, and that puts the ceiling at exactly 2\sqrt 2.
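The operator identity can be spot-checked numerically for the optimal settings used earlier. The sketch below builds the two-qubit operators as nested lists (everything is real for these settings) and compares both sides entry by entry. It checks one instance and does not replace the algebraic proof; the helper names are illustrative.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lincomb(c1, A, c2, B):
    """Entrywise c1*A + c2*B."""
    return [[c1 * a + c2 * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def commutator(A, B):
    return lincomb(1, matmul(A, B), -1, matmul(B, A))

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]
r = 1 / math.sqrt(2)

# Optimal settings from the text, embedded in the two-qubit space.
A0, A1 = kron(Z, I2), kron(X, I2)
B0 = kron(I2, lincomb(r, Z, r, X))     # (Z + X)/sqrt2 on Bob's qubit
B1 = kron(I2, lincomb(r, Z, -r, X))    # (Z - X)/sqrt2 on Bob's qubit

# CHSH operator S = A0 (B0 + B1) + A1 (B0 - B1).
S = lincomb(1, matmul(A0, lincomb(1, B0, 1, B1)),
            1, matmul(A1, lincomb(1, B0, -1, B1)))

lhs = matmul(S, S)
rhs = lincomb(4, kron(I2, I2), -1, matmul(commutator(A0, A1), commutator(B0, B1)))
max_diff = max(abs(l - m) for rl, rm in zip(lhs, rhs) for l, m in zip(rl, rm))

phi_plus = [r, 0.0, 0.0, r]
S_phi = [sum(m * s for m, s in zip(row, phi_plus)) for row in S]
mean_S = sum(p * q for p, q in zip(phi_plus, S_phi))
mean_S2 = sum(q * q for q in S_phi)    # <S^2> = |S phi|^2 because S is symmetric here
print(max_diff, mean_S, mean_S2)
```

For these settings the expectation of \hat S^2 on |\Phi^+\rangle comes out as 8, so the Bell state sits exactly at the ceiling.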

Three bounds on the CHSH quantity. Classical local hidden-variable theories live at or below $2$. Quantum mechanics reaches up to $2\sqrt 2$ and no higher. The algebraic maximum $4$ is reachable by hypothetical theories ("PR-boxes") that are non-signalling but more powerful than quantum mechanics; no such theory is realised in nature.

So CHSH has three bounds, not two:

|S| \leq 2 — local hidden-variable theories.
|S| \leq 2\sqrt 2 — quantum mechanics (Tsirelson bound).
|S| \leq 4 — algebraic maximum, allowed by non-signalling alone but not realised in quantum mechanics.

The gap between 2\sqrt 2 and 4 is a curious thing. It is possible to write down a fictitious theory — the PR-box, after Popescu and Rohrlich (1994) — that saturates |S| = 4, remains non-signalling (so doesn't violate relativity), but is not quantum mechanics. Why does nature stop at 2\sqrt 2 rather than 4? This is an open question in quantum foundations. It is a hint that quantum mechanics is more constrained than pure non-signalling alone would require, and the reasons are not fully understood.

The experiments — Aspect 1982 and the 2015 loophole-free trio

The CHSH inequality was proposed in 1969 as a cleaner experimental target than Bell's original 1964 inequality. Freedman and Clauser reported the first experimental violation of a Bell-type inequality in 1972; the best-known early test is Alain Aspect's 1982 work at Orsay, France. Using polarisation-entangled photon pairs from a calcium cascade, Aspect and collaborators measured S \approx 2.70 \pm 0.02, which is (2.70 - 2)/0.02 = 35 standard deviations above the classical bound.

But the Aspect experiment, and every subsequent experiment for decades, had loopholes. A loophole is a specific way a dedicated sceptic might still defend a local hidden-variable interpretation, by pointing to an imperfection in the experiment. Three loopholes mattered:

The detection loophole. If only a small fraction of particle pairs are detected (photon detectors were inefficient for a long time), a clever hidden-variable theory could "choose" which pairs to present, biasing the statistics.
The locality (or communication) loophole. If Alice's and Bob's setting choices and readouts are not space-like separated, a signal travelling at or below light speed could in principle carry information between them during the experiment.
The freedom-of-choice loophole. If Alice's and Bob's settings are not truly random but determined in advance by some process correlated with the source, a hidden-variable theory could still fit the data.

Closing all three loopholes simultaneously took three decades of experimental progress: better photon detectors, space-like separation of measurement events, and verifiably random choice of settings. In 2015, three experiments achieved it:

Hensen et al., Delft (August 2015) — entangled electron spins in diamond nitrogen-vacancy centres, 1.3 km apart, reported S = 2.42 \pm 0.20 [1].
Giustina et al., Vienna (December 2015) — entangled photon pairs with 75\% detection efficiency, reported violation of a Bell inequality by more than eleven standard deviations.
Shalm et al., NIST (December 2015) — entangled photons, over 180 m, reported S = 2.35, with over seven-standard-deviation significance after closing all three loopholes.

Three independent experiments on two kinds of platform (electron spins in one, photons in the other two), in three independent labs, all reporting loophole-free Bell-inequality violations within the same year. The question was settled. Local hidden-variable theories are wrong. Quantum mechanics makes the right predictions, and those predictions exceed anything classical physics can accommodate.

Measured CHSH values in selected experiments, with the classical bound at $2$ and the quantum Tsirelson bound at $2\sqrt 2$. All four measurements lie above the classical bound, though with very different margins: Aspect's quoted value is about 35 standard deviations clear, Hensen's $2.42 \pm 0.20$ about two. Hensen, Giustina, and Shalm (all 2015) closed all three major loopholes simultaneously.
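How far a reported value sits above the classical bound is a one-line calculation. The sketch below applies it to the two results quoted above with an uncertainty. It is a naive Gaussian estimate, and the function name is illustrative; published loophole-free analyses typically report p-values derived under weaker statistical assumptions.

```python
def sigmas_above(S, sigma, bound=2.0):
    """How many quoted standard deviations S sits above the classical bound."""
    return (S - bound) / sigma

aspect = sigmas_above(2.70, 0.02)   # figures as quoted in the text
hensen = sigmas_above(2.42, 0.20)
print(round(aspect, 1), round(hensen, 1))  # 35.0 2.1
```

The contrast is the point: closing every loophole at once cost the first loophole-free test most of its statistical margin.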

What Bell violation means — and what it does not mean

The result is clean and the temptation to over-read it is enormous. Take the temperature down and state carefully what has been proved.

What is established: no local hidden-variable theory can reproduce quantum mechanical predictions for CHSH-type experiments. The world cannot be described by a model in which (a) each particle carries pre-determined outcomes as local properties, and (b) Alice's measurement outcome depends only on her local property and her measurement choice. One of (a) or (b) — or both — must fail.

What is not established: that anything travels faster than light. Bell violation is perfectly compatible with special relativity. Careful: in the literature, the word "non-local" is used in a specific technical sense — violating the Bell inequality — not in the colloquial sense of "one event affects another at spacelike separation." Those are different statements. The no-communication theorem (seen in ch.18) forbids using a Bell state to send information; Bell violation is a statement about correlations that need no causal influence to arise.

What Bell violation also does not prove: that "reality" doesn't exist, that "the moon isn't there when nobody looks," or any of the other sweeping metaphysical claims that hover around quantum foundations. What Bell proved is narrower and sharper: any hidden-variable model attached to quantum mechanics must either reject locality (in the technical sense) or reject counterfactual definiteness (the claim that measurements have definite outcomes even when not performed). Most practising physicists interpret this as: quantum mechanics is not a local classical theory with hidden extras. The deeper philosophical fallout depends on which interpretation you pick — and interpretations of quantum mechanics are a choose-your-own-adventure genre, not an empirical distinction.

Hype check. Bell's theorem does not prove faster-than-light signalling is possible. It does not say the two particles are "communicating instantaneously." It does not say Alice's measurement "changes" Bob's particle. What it says is: the joint probability distribution over Alice's and Bob's measurement outcomes cannot be reproduced by any classical model that respects locality. The correlation is real; the causation is not. Every Bell test uses post-hoc classical communication to compare outcomes; no test has ever shown, or could show, faster-than-light information transfer. Pop-science articles that describe entanglement as "instant communication between particles" are wrong. The technical content of Bell's theorem is precisely what this chapter just said — no more, no less.
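The no-signalling claim can be checked directly for the CHSH settings used in this chapter. The sketch below computes the joint outcome probabilities on |\Phi^+\rangle from projectors and then sums over Bob's outcome, which Alice never sees. Helper names and the basis ordering are illustrative choices.

```python
import math

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]

def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def expectation(state, M):
    Mv = [sum(m * s for m, s in zip(row, state)) for row in M]
    return sum(s * mv for s, mv in zip(state, Mv))

def axis(theta):
    """cos(theta) Z + sin(theta) X: spin along an axis in the x-z plane."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * Z[i][j] + s * X[i][j] for j in range(2)] for i in range(2)]

def projector(obs, outcome):
    """Projector (I + outcome * obs) / 2 onto the +1 or -1 eigenspace."""
    return [[(I2[i][j] + outcome * obs[i][j]) / 2 for j in range(2)] for i in range(2)]

r = 1 / math.sqrt(2)
phi_plus = [r, 0.0, 0.0, r]                    # basis order 00, 01, 10, 11
A = [axis(0.0), axis(math.pi / 2)]             # Alice: Z, X
B = [axis(math.pi / 4), axis(-math.pi / 4)]    # Bob: (Z+X)/sqrt2, (Z-X)/sqrt2

def joint(a, b, x, y):
    """P(a, b | x, y) on the Bell state."""
    return expectation(phi_plus, kron(projector(A[x], a), projector(B[y], b)))

def alice_marginal(a, x, y):
    """P(a | x, y): sum over Bob's outcomes."""
    return sum(joint(a, b, x, y) for b in (-1, +1))

for x in (0, 1):
    for a in (-1, +1):
        print(x, a, [round(alice_marginal(a, x, y), 12) for y in (0, 1)])
```

Every marginal comes out as 0.5 whichever setting Bob chooses, even though the joint probabilities are strongly correlated. Alice's local statistics carry no trace of Bob's choice.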

Why Bell violation matters — device independence

Beyond the philosophical stakes, Bell violation has a deeply practical consequence. It powers a class of quantum cryptographic protocols called device-independent: protocols whose security depends only on the observed CHSH value, not on any trust in the quantum devices Alice and Bob use.

In ordinary quantum key distribution (QKD), Alice and Bob assume their devices correctly prepare and measure the quantum states they claim to. If a manufacturer slipped in a flaw, security is lost. Device-independent QKD, anticipated by Ekert's Bell-test-based protocol (1991) and formalised starting with Mayers and Yao (1998), sidesteps this. If Alice and Bob repeatedly measure CHSH on pairs from a shared source and consistently get S close to 2\sqrt 2, they can certify, without trusting the manufacturer at all, that their source is producing near-Bell-state pairs and their measurements are near-optimal. Any eavesdropper's attempt to tamper would reduce their observed S.

The first demonstrations of device-independent QKD appeared in 2022 (Nadlinger et al., using trapped ions; Zhang et al., using trapped atoms), and the field has grown rapidly since. The underlying guarantee, that a quantum state's entanglement can be certified by a number (the CHSH violation) without any further modelling, is sometimes called the self-testing property of Bell states. It is the strongest form of quantum security argument known.

ISRO's 2022 quantum key distribution demonstration from Bengaluru to Hyderabad, part of the National Quantum Mission, uses BB84 rather than device-independent QKD; but the longer-term roadmap, articulated in the mission's programme documents, anticipates moving toward Bell-inequality-based security once the photon-detection efficiencies are high enough. The Raman Research Institute in Bangalore — home to some of India's leading quantum-optics experimentalists — has published CHSH-based entanglement-certification results since the 2010s, and is one of the few Indian labs capable of closing Bell-test loopholes on photonic setups.

Common confusions

"Bell's theorem is one specific experiment." No. Bell's theorem is a mathematical theorem about the structure of probability distributions compatible with local hidden variables. It applies to infinitely many experimental configurations. CHSH is the simplest and cleanest realisation; the original 1964 Bell inequality is a different form; there are also multi-party generalisations (Mermin, Svetlichny) and inequalities for higher-dimensional systems. The theorem is the underlying mathematical impossibility; any specific inequality is an experimentally convenient consequence.
"Non-local means faster than light." As the hype-check above said: in the literature, "non-local" is a term of art. Correlations that violate a Bell inequality, and therefore cannot be reproduced by any local hidden-variable theory, are labelled "non-local." That does not mean FTL signalling: the no-communication theorem forbids signalling. The two usages have been a source of confusion for sixty years and remain so in popular articles.
"If there are loopholes, then experiments haven't confirmed quantum mechanics." The loopholes are specific, known imperfections that could in principle allow a local hidden-variable explanation. They have been closed. The 2015 loophole-free experiments — and subsequent, even more stringent tests — have confirmed the quantum prediction by many standard deviations after all three canonical loopholes were addressed simultaneously. The question of whether local hidden variables can explain nature is experimentally closed.
"The 2\sqrt 2 bound is magical." Not magical — it is the algebraic consequence of A_x^2 = I, B_y^2 = I, and [A_x, B_y] = 0. Tsirelson's proof is a one-line operator inequality, and the \sqrt 2 factor comes from the specific way Pauli operators anticommute. The real surprise is not that the quantum bound is 2\sqrt 2 but that the algebraic maximum 4 is not reached — the non-signalling constraint alone would allow S up to 4 (via hypothetical PR-boxes), and understanding why nature stops at 2\sqrt 2 is an open foundational question.
"Bell proved EPR were wrong." Partly. EPR's specific claim was that quantum mechanics is incomplete and must be completable by local hidden variables. Bell proved that no such completion is possible: local hidden variables plus quantum predictions are mathematically incompatible. But EPR also posed a reasonable methodological question — when is a theory "complete"? — that has not been definitively answered. Bell's theorem settled the mathematical possibility, not the philosophical question about what physics should aspire to.
"Quantum entanglement lets you send messages instantly." Hype. Repeat the hype-check: no operational use of Bell-pair measurements alone can transmit information. Every quantum-communication protocol that achieves a useful task — teleportation, dense coding, QKD — uses Bell-state correlations plus classical communication at sub-light speed. Nothing useful happens "instantly." The entanglement is a resource that enables clever protocols; it does not provide a faster-than-light telegraph. Every Bell test ever done has been consistent with this, and the theoretical reasons are airtight.

Going deeper

The CHSH game, the classical bound, the quantum bound, and the Tsirelson ceiling are the core. What follows is the machinery — a full derivation of the Tsirelson bound via the SDP (semidefinite programme) hierarchy, a pointer into device-independent quantum cryptography and its modern practice, the structure of non-signalling theories beyond quantum, the PR-box, and some historical notes on John Bell himself.

The Tsirelson bound via SDP hierarchy

Tsirelson's 2\sqrt 2 bound was proved in 1980 via a direct operator-algebra argument. A more systematic treatment, the Navascués-Pironio-Acín (NPA) hierarchy (2007), turns the problem "what are the quantum values achievable in a Bell scenario?" into a sequence of semidefinite programmes. The k-th level of the hierarchy is a relaxation whose optimal value upper-bounds the true quantum value, and the sequence converges to the exact quantum value as k \to \infty.

For the CHSH scenario, the first level of the NPA hierarchy already saturates at 2\sqrt 2 — the quantum maximum is exactly the level-1 SDP bound. For more complicated scenarios, higher levels are needed. The NPA hierarchy is the standard computational tool in Bell-inequality research today; it is how a working physicist asks "what is the quantum value of this new Bell expression" and gets a tight number.

Finite-level NPA bounds can be computed in standard SDP solvers (Mosek, SCS, CVXOPT). For the interested student: Qiskit does not currently include NPA tooling out of the box; the ncpol2sdpa Python package is the common entry point.

Device-independent QKD and randomness expansion

Device-independent quantum cryptography is the practical pay-off of Bell violation. Two paradigmatic tasks:

Device-independent quantum key distribution (DIQKD). Alice and Bob share pairs from a source. They run CHSH tests on a subset and use the rest to extract a secret key. The security proof (Pironio et al., 2009; Vazirani and Vidick, 2014) shows that if the observed CHSH violation is above a threshold, the extracted key is secret against any eavesdropper — even one with arbitrary computational power, and even if the devices were built by the eavesdropper. Nadlinger et al. (2022) reported the first full experimental demonstration using trapped-ion pairs.
Device-independent randomness expansion / amplification. If you have a small amount of initial randomness and a source violating a Bell inequality, you can produce a longer stream of certified random bits. This is the cryptographic analogue of "randomness extractors" but based on physical device behaviour rather than computational hardness assumptions.

Both are active research areas with growing practical relevance as hardware improves. The common thread is: Bell violation certifies quantum behaviour, and quantum behaviour certifies randomness and security.
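To make "above a threshold" concrete, here is a minimal sketch of how a key rate can depend on the observed CHSH value. It assumes the collective-attack bound associated with Pironio et al. (2009), r \geq 1 - h(Q) - h\bigl(\tfrac{1}{2}(1 + \sqrt{(S/2)^2 - 1})\bigr), where h is the binary entropy. Q, the error rate of the raw key, is a parameter this chapter has not introduced, and the function names are illustrative. Finite-size effects and stronger attack models are ignored.

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def diqkd_rate_lower_bound(S, Q):
    """Asymptotic key-rate lower bound under the assumed collective-attack formula.

    S is the observed CHSH value, Q the error rate of the raw key (assumed input).
    """
    if S <= 2.0:
        return 0.0                        # no violation, nothing certified
    S = min(S, 2 * math.sqrt(2))          # values past the Tsirelson bound are clamped
    eve_info = h((1 + math.sqrt((S / 2) ** 2 - 1)) / 2)
    return max(0.0, 1 - h(Q) - eve_info)

for S in (2.0, 2.4, 2.6, 2 * math.sqrt(2)):
    print(round(S, 3), round(diqkd_rate_lower_bound(S, 0.02), 3))
```

In this sketch the rate is zero at S = 2 and rises monotonically with S, reaching one bit per pair only at S = 2\sqrt 2 with no errors.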

Non-signalling theories and the PR-box

The non-signalling principle says Alice's marginal statistics cannot depend on Bob's input — the natural causal constraint on any physical theory compatible with relativity. Any theory satisfying non-signalling obeys |S| \leq 4 (the trivial algebraic maximum). Quantum mechanics satisfies non-signalling and obeys the stronger bound |S| \leq 2\sqrt 2. So something stronger than non-signalling is constraining quantum correlations.

Popescu and Rohrlich (1994) asked: is there a physically reasonable theory saturating |S| = 4, stronger than quantum but still non-signalling? They constructed the PR-box: a hypothetical device producing output pairs (a, b), written here as bits in \{0, 1\} rather than \pm 1, that satisfy a \oplus b = x \cdot y (XOR equals product of inputs) with uniform marginal statistics. A PR-box is non-signalling but achieves S = 4. No such device is realised by any known physical theory, but the PR-box is a useful thought-experiment for understanding what constrains quantum correlations beyond non-signalling alone.
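The PR-box is simple enough to write down as a probability table and check. The sketch below builds its joint distribution, maps bits to signs via (-1)^{\text{bit}} to compute the correlators, and confirms that Alice's marginal does not depend on Bob's input. Function names are illustrative.

```python
from itertools import product

def pr_box(x, y):
    """Joint distribution {(a, b): probability} for inputs x, y in {0, 1}.

    Outputs are bits; the two outcomes with a XOR b == x AND y get 1/2 each.
    """
    return {(a, b): 0.5 for a, b in product((0, 1), repeat=2) if (a ^ b) == (x & y)}

def correlator(x, y):
    """<A_x B_y> after mapping bits to signs via (-1)**bit."""
    return sum(p * (-1) ** (a + b) for (a, b), p in pr_box(x, y).items())

def alice_marginal(a, x, y):
    """P(a | x, y): sum over Bob's output."""
    return sum(p for (a_out, _), p in pr_box(x, y).items() if a_out == a)

S = correlator(0, 0) + correlator(0, 1) + correlator(1, 0) - correlator(1, 1)
no_signalling = all(
    alice_marginal(a, x, 0) == alice_marginal(a, x, 1) == 0.5
    for a in (0, 1) for x in (0, 1)
)
print(S, no_signalling)  # 4.0 True
```

The box is perfectly correlated on three input pairs and perfectly anti-correlated on the fourth, which is exactly what the CHSH sign pattern rewards. The same marginal check holds for Bob by symmetry.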

Candidate constraints investigated: information causality (Pawłowski et al., 2009), macroscopic locality, and others. None has yet been shown to single out quantum mechanics exactly — so the question "what principle picks 2\sqrt 2 over 4?" remains genuinely open, and it is one of the cleanest open problems in quantum foundations.

Multipartite Bell inequalities and GHZ proofs

Bell's 1964 argument and CHSH concern two-party scenarios. For three or more parties, stronger inequalities exist. The Mermin inequality (1990) for three-party correlations has a classical bound of 2 and a quantum maximum of 4 — a factor-2 gap rather than \sqrt 2. The GHZ argument (Greenberger-Horne-Zeilinger, 1989) goes further: on the GHZ state |\mathrm{GHZ}\rangle = \tfrac{1}{\sqrt 2}(|000\rangle + |111\rangle), certain measurements give outcomes that are perfectly deterministic under quantum mechanics and perfectly impossible under local hidden variables — a single run of a clean GHZ experiment rules out local hidden variables, without any inequality or statistical averaging.

The GHZ argument is pedagogically famous because it gives the cleanest possible demonstration that Bell-type reasoning does not require statistical distributions — it can be done as a simple counterfactual contradiction. It is the three-qubit analogue of CHSH and it is often the example a lecturer reaches for after showing CHSH.
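Both claims can be checked by enumeration. The sketch below assumes each party measures X or Y and uses the combination \langle XXX\rangle - \langle XYY\rangle - \langle YXY\rangle - \langle YYX\rangle, one common way of writing the Mermin expression. It lists every pre-assigned set of \pm 1 answers, then computes the quantum values on the GHZ state by applying the Pauli strings to basis states. Function names are illustrative.

```python
from itertools import product
import math

def mermin(x, y):
    """x1x2x3 - x1y2y3 - y1x2y3 - y1y2x3 for pre-assigned +/-1 values."""
    return (x[0] * x[1] * x[2] - x[0] * y[1] * y[2]
            - y[0] * x[1] * y[2] - y[0] * y[1] * x[2])

# Every local deterministic assignment: an X answer and a Y answer per party.
assignments = [(v[:3], v[3:]) for v in product((-1, +1), repeat=6)]
classical_max = max(mermin(x, y) for x, y in assignments)

# GHZ predictions a local assignment would have to reproduce all at once.
ghz_consistent = [
    (x, y) for x, y in assignments
    if x[0] * x[1] * x[2] == +1 and x[0] * y[1] * y[2] == -1
    and y[0] * x[1] * y[2] == -1 and y[0] * y[1] * x[2] == -1
]

def pauli_on_bit(label, bit):
    """Action of X or Y on a basis bit: returns (flipped bit, phase)."""
    if label == "X":
        return 1 - bit, 1
    return 1 - bit, (1j if bit == 0 else -1j)   # Y|0> = i|1>, Y|1> = -i|0>

def expectation(labels, state):
    """<state| P1 x P2 x P3 |state> for a state with real amplitudes."""
    total = 0
    for bits, amp in state.items():
        phase, out = 1, []
        for label, bit in zip(labels, bits):
            new_bit, ph = pauli_on_bit(label, bit)
            out.append(new_bit)
            phase *= ph
        total += state.get(tuple(out), 0.0) * phase * amp
    return total.real

r = 1 / math.sqrt(2)
ghz = {(0, 0, 0): r, (1, 1, 1): r}
quantum = {s: expectation(s, ghz) for s in ("XXX", "XYY", "YXY", "YYX")}
quantum_mermin = quantum["XXX"] - quantum["XYY"] - quantum["YXY"] - quantum["YYX"]
print(classical_max, len(ghz_consistent), round(quantum_mermin, 12))  # 2 0 4.0
```

No assignment out of the 64 satisfies all four GHZ predictions at once, which is the contradiction without inequalities. The same four numbers give the factor-2 Mermin gap.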

John Bell — a portrait

Bell, born in Belfast in 1928, worked at CERN on particle physics for most of his career. His 1964 paper on the hidden-variable problem was a side project, published in a low-profile journal (Physics Physique Fizika) that folded a few years later. For decades it was a minority interest; the dominant Copenhagen-interpretation view held that questioning quantum mechanics' foundations was unprofessional. Bell, characteristically, did not care. He pursued the foundational questions with a combination of philosophical clarity and technical rigour that is now textbook.

Bell died in 1990 of a cerebral haemorrhage at 62 — before the loophole-free experiments, before the Nobel Committee awarded Aspect, Clauser, and Zeilinger the 2022 Physics Prize for experimental tests of Bell inequalities. His collected essays, Speakable and Unspeakable in Quantum Mechanics, are among the clearest writings on the conceptual foundations of quantum mechanics ever produced, and remain essential reading for anyone who wants to internalise what Bell actually proved.

Where this leads next

The no-cloning theorem — the structural cousin of Bell's theorem: quantum mechanics' linearity forbids duplication of unknown states.
Quantum teleportation — the protocol that uses Bell pairs (and classical communication) to transfer quantum information.
BB84 quantum key distribution — the original quantum cryptography protocol; E91 is its entanglement-based cousin and uses Bell inequality violation as the security argument.
Device-independent QKD — modern cryptography based on CHSH violation rather than device trust.
The Schmidt decomposition — the canonical form for bipartite pure states; the technical foundation for quantitative entanglement.

References

[1] B. Hensen et al., Loophole-free Bell inequality violation using electron spins separated by 1.3 kilometres (Nature, 2015). arXiv:1508.05949.
[2] Wikipedia, Bell's theorem: history, statement, and context.
[3] Wikipedia, CHSH inequality: the specific inequality used in most modern Bell tests, including its derivation and bounds.
[4] Wikipedia, Loophole-free Bell test: a survey of the 2015 experiments and the closed loopholes.
[5] John Preskill, Lecture Notes on Quantum Computation, Ch. 4 (Bell inequalities and CHSH). theory.caltech.edu/~preskill/ph229.
[6] Nielsen and Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2010), §2.6 on the EPR paradox and Bell's inequality.