Entanglement, Defined

Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

In short

A joint state |\psi\rangle_{AB} of two quantum systems is entangled if it cannot be written as a tensor product |\psi_A\rangle \otimes |\psi_B\rangle of single-system states. Otherwise it is a product state. The four Bell states, like |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle), are the canonical entangled two-qubit states — provably non-factorable by a two-line contradiction. Three equivalent tests detect entanglement: non-factorability, Schmidt rank \geq 2, and the reduced density matrix being mixed (purity <1). Entanglement is the resource that powers teleportation, dense coding, Bell-inequality violations, and the exponential state-space that distinguishes quantum from classical computing. It does not enable faster-than-light communication — that is the single most persistent myth, and the no-communication theorem kills it dead.

Two qubits sit on the same chip. You prepare them in some joint state. Run up to the whiteboard, write down the four amplitudes, read them off. Now answer one question: is this joint state "just two separate qubits" — one state on A, one on B, no correlation between them — or is it something richer, something the tensor-product language cannot factor?

The answer to that question is the difference between quantum computing and classical computing.

If the joint state factors, then qubit A has its own life and qubit B has its own life, and the joint description is bookkeeping: "A is in |\psi_A\rangle, B is in |\psi_B\rangle, end of story." You could ship qubit A to Delhi and qubit B to Chennai, do experiments in both cities, and every prediction would depend only on the local state you shipped. Classical information works this way — and so does uncorrelated quantum information.

If the joint state does not factor, something strange is happening. There is no "state of A" alone. There is no "state of B" alone. The only complete description of the world is the joint state, and measurements on A are correlated with measurements on B in a way no classical statistics can match. The joint state is entangled, and entanglement is the thing that Einstein called "spooky," that Schrödinger called "the characteristic trait of quantum mechanics," and that every working quantum algorithm either creates, manipulates, or consumes.

This chapter gives you the precise definition of entanglement, shows you how to test for it on any two-qubit state, proves rigorously that the Bell state |\Phi^+\rangle is entangled, and sketches why entanglement is the resource that makes quantum different. It also does an honest hype-check: entanglement does not allow faster-than-light communication, and the no-communication theorem is the clean argument for why. Get the definition right and the rest follows.

The formal definition

Two qubits, each with their own 2-dimensional Hilbert space, combine into a 4-dimensional joint Hilbert space \mathcal{H}_A \otimes \mathcal{H}_B with basis \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}. A general joint state is some unit vector in that space:

|\psi\rangle_{AB} = c_{00}|00\rangle + c_{01}|01\rangle + c_{10}|10\rangle + c_{11}|11\rangle,

with |c_{00}|^2 + |c_{01}|^2 + |c_{10}|^2 + |c_{11}|^2 = 1.

Product state vs entangled state

A joint state |\psi\rangle_{AB} \in \mathcal{H}_A \otimes \mathcal{H}_B is a product state if there exist single-qubit states |\psi_A\rangle \in \mathcal{H}_A and |\psi_B\rangle \in \mathcal{H}_B such that

|\psi\rangle_{AB} = |\psi_A\rangle \otimes |\psi_B\rangle.

Otherwise, |\psi\rangle_{AB} is entangled.

Reading the definition. "Product state" means "can be factored." Entangled means "cannot be factored, not for any choice of |\psi_A\rangle and |\psi_B\rangle, no matter how you try." The definition is algebraic: entanglement is the failure of a specific factorisation to exist. This is not a vague property like "surprising" or "non-local" or "spooky" — it is a concrete mathematical statement about the form of a vector.

There is nothing inherently "mysterious" in the definition. What is mysterious is that such non-factorable states exist, are common, and have experimentally verifiable consequences that no classical probability distribution can mimic. But the definition itself is boring algebra.

Left: a product state. Each qubit has its own Bloch-sphere state; the joint description is the pair. Right: an entangled state. There are no single-qubit states to draw on the two spheres; the only complete description is the joint cloud.

The four Bell states

Of all entangled two-qubit states, four are standard:

|\Phi^+\rangle = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr),\qquad |\Phi^-\rangle = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle - |11\rangle\bigr),

|\Psi^+\rangle = \tfrac{1}{\sqrt{2}}\bigl(|01\rangle + |10\rangle\bigr),\qquad |\Psi^-\rangle = \tfrac{1}{\sqrt{2}}\bigl(|01\rangle - |10\rangle\bigr).

These are the Bell states, named after John Bell (whose 1964 theorem is what made them important). They form an orthonormal basis of the 4-dimensional two-qubit Hilbert space — the Bell basis — every bit as natural as the computational basis \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}, just different. Every two-qubit state can be written as a complex linear combination of these four, and every Bell state is maximally entangled in a precise sense you will see by computing a partial trace below.

The four Bell states in Dirac and column-vector notation. Each is an orthonormal basis vector in the 4-dimensional two-qubit space, and each is maximally entangled.

A single CNOT and one Hadamard, applied to |\Phi^+\rangle's starting point |00\rangle, prepare |\Phi^+\rangle: first H on qubit A gives \tfrac{1}{\sqrt{2}}(|00\rangle + |10\rangle), then CNOT (with A as control, B as target) flips B whenever A reads 1, giving \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle). That two-gate circuit is the workhorse Bell-pair preparer on gate-model devices. The other three Bell states come from the same circuit with X applied first to one or both input qubits (a Z on an input |0\rangle would change nothing), or equivalently from |\Phi^+\rangle followed by X and/or Z on one qubit. This is how Bell pairs are manufactured on Compustar Quantum, Querion's Willow, and every other public gate-model platform: the 1964 theoretical object, prepared on demand with two gates.
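The two-gate circuit can be checked with plain matrices. The following is a minimal NumPy sketch, not tied to any particular device or SDK; the helper name `bell_circuit` and the basis ordering |00>, |01>, |10>, |11> (qubit A as the left label) are assumptions made for illustration. Here the other three Bell states are obtained by flipping input qubits with X before running the same circuit.

```python
import numpy as np

# Single-qubit gates and CNOT (control = qubit A, target = qubit B).
# Basis ordering: |00>, |01>, |10>, |11>, with qubit A as the left label.
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
CNOT = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0, 0.0]])

def bell_circuit(input_state):
    """Apply H on qubit A, then CNOT(A -> B), to a 4-component state vector."""
    return CNOT @ np.kron(H, I2) @ input_state

ket00 = np.array([1.0, 0.0, 0.0, 0.0])

phi_plus = bell_circuit(ket00)                      # (|00> + |11>)/sqrt(2)
phi_minus = bell_circuit(np.kron(X, I2) @ ket00)    # X on A first: (|00> - |11>)/sqrt(2)
psi_plus = bell_circuit(np.kron(I2, X) @ ket00)     # X on B first: (|01> + |10>)/sqrt(2)
psi_minus = bell_circuit(np.kron(X, X) @ ket00)     # X on both:    (|01> - |10>)/sqrt(2)

# The four outputs form an orthonormal basis: their Gram matrix is the identity.
bell_basis = np.stack([phi_plus, phi_minus, psi_plus, psi_minus])
gram = bell_basis @ bell_basis.T
```

On real hardware the same circuit would be expressed in the vendor's SDK; the matrix version only confirms the algebra.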

Proving that |\Phi^+\rangle is entangled

This is the two-line contradiction every student of quantum information meets, usually in the first week. Do it carefully, because it is the template for every later entanglement proof.

Example 1 — |\Phi^+\rangle has no product factorisation

Show that |\Phi^+\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) cannot be written as |\psi_A\rangle \otimes |\psi_B\rangle for any single-qubit states |\psi_A\rangle, |\psi_B\rangle.

Step 1 — Assume it can be factored. Suppose, for contradiction, that |\Phi^+\rangle = (\alpha|0\rangle + \beta|1\rangle) \otimes (\gamma|0\rangle + \delta|1\rangle) for some complex numbers \alpha, \beta, \gamma, \delta satisfying the normalisation conditions.

Step 2 — Expand the tensor product. Using the bilinearity of \otimes (the same expansion you saw in the tensor-products chapter):

(\alpha|0\rangle + \beta|1\rangle) \otimes (\gamma|0\rangle + \delta|1\rangle) = \alpha\gamma|00\rangle + \alpha\delta|01\rangle + \beta\gamma|10\rangle + \beta\delta|11\rangle.

Why four terms with these coefficients: distributing an outer parenthesis over an inner parenthesis under \otimes works exactly like multiplying two binomials (a+b)(c+d) = ac+ad+bc+bd. The coefficient on |ij\rangle is "amplitude of |i\rangle on A, times amplitude of |j\rangle on B." That product structure is the fingerprint of a product state.

Step 3 — Match coefficients with |\Phi^+\rangle. The target state \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) has coefficients \tfrac{1}{\sqrt 2} on |00\rangle and |11\rangle, and 0 on |01\rangle and |10\rangle. Matching:

\alpha\gamma = \tfrac{1}{\sqrt{2}}, \quad \alpha\delta = 0, \quad \beta\gamma = 0, \quad \beta\delta = \tfrac{1}{\sqrt{2}}.

Step 4 — Derive the contradiction. From \alpha\gamma = \tfrac{1}{\sqrt{2}} \neq 0, both \alpha \neq 0 and \gamma \neq 0. Plug \alpha \neq 0 into \alpha\delta = 0: this forces \delta = 0. Plug \gamma \neq 0 into \beta\gamma = 0: this forces \beta = 0. But then \beta\delta = 0 \cdot 0 = 0, which contradicts the requirement \beta\delta = \tfrac{1}{\sqrt{2}} \neq 0. Why the contradiction is the whole proof: for any product state the coefficients satisfy (\alpha\gamma)(\beta\delta) = (\alpha\delta)(\beta\gamma), because both sides equal \alpha\beta\gamma\delta. For |\Phi^+\rangle the left side is \tfrac{1}{2} and the right side is 0, so the four equations are inconsistent. You can satisfy the two zero equations together with either one of the non-zero equations, but never both non-zero equations at once. That inconsistency is what "non-factorability" means algebraically.

Step 5 — Conclude. No choice of (\alpha, \beta, \gamma, \delta) satisfies all four equations simultaneously. Therefore no factorisation |\Phi^+\rangle = |\psi_A\rangle \otimes |\psi_B\rangle exists. By the definition above, |\Phi^+\rangle is entangled.

Result. |\Phi^+\rangle is entangled — provably, with a two-line contradiction.

What this shows. The definition of entanglement has real algebraic teeth. It is not a matter of "some states happen to be surprising" — it is that the tensor-product map from pairs of single-qubit states into the 4-dimensional joint space misses a huge region, and a random state in the joint space is almost surely in the missed region. Entanglement is the default, not the exception. The same contradiction-by-coefficients argument works for any two-qubit state you write down: try to factor, see if it fails, and if it does, the state is entangled.

The four-equation contradiction that proves |\Phi^+\rangle is entangled. Once \alpha\gamma \neq 0, the two zero equations force \beta = \delta = 0, and the fourth equation \beta\delta = \tfrac{1}{\sqrt{2}} then fails.
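The coefficient-matching argument generalises to a one-line numerical check. For a product state the amplitudes satisfy c_{00}c_{11} = c_{01}c_{10}, so the 2x2 amplitude matrix has zero determinant. The sketch below assumes NumPy; the function name `is_product_state` and the tolerance are illustrative choices, not part of the original text.

```python
import numpy as np

def is_product_state(state, tol=1e-12):
    """Return True if a two-qubit pure state factors as |psi_A> (x) |psi_B>.

    `state` holds the amplitudes (c00, c01, c10, c11). A product state has
    c_ij = a_i * b_j, which forces c00*c11 - c01*c10 = 0. Conversely, a
    vanishing determinant means the 2x2 amplitude matrix has rank 1.
    """
    c = np.asarray(state, dtype=complex).reshape(2, 2)
    return bool(abs(np.linalg.det(c)) < tol)

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)    # Bell state
plus_plus = np.array([1, 1, 1, 1]) / 2            # |+> (x) |+>
zero_plus = np.array([1, 1, 0, 0]) / np.sqrt(2)   # |0> (x) |+>

print(is_product_state(phi_plus))    # False
print(is_product_state(plus_plus))   # True
print(is_product_state(zero_plus))   # True
```

The determinant test is specific to two qubits; for larger subsystems the rank of the amplitude matrix plays the same role.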

Three equivalent tests for entanglement

Non-factorability is one way to detect entanglement, but it is rarely the easiest. Two other tests are algebraically cleaner and, for larger systems, often more practical.

Test 1: Non-factorability (the definition)

As above: try to write |\psi\rangle_{AB} = |\psi_A\rangle \otimes |\psi_B\rangle. If no choice works, the state is entangled. Clean for 2-qubit systems; painful for larger ones.

Test 2: Schmidt rank

Every bipartite pure state admits a Schmidt decomposition (you will meet the full theorem in a later chapter): there exist orthonormal bases \{|u_i\rangle_A\} and \{|v_i\rangle_B\} and non-negative real numbers \lambda_i such that

|\psi\rangle_{AB} = \sum_i \lambda_i\,|u_i\rangle_A \otimes |v_i\rangle_B, \qquad \sum_i \lambda_i^2 = 1.

The number of non-zero \lambda_i is the Schmidt rank of the state.

Schmidt rank = 1: one term in the sum, i.e., |\psi\rangle_{AB} = |u_1\rangle \otimes |v_1\rangle — a product state.
Schmidt rank \geq 2: cannot be collapsed to one term — entangled.

For |\Phi^+\rangle the Schmidt decomposition is already staring at you: |\Phi^+\rangle = \tfrac{1}{\sqrt 2}|0\rangle|0\rangle + \tfrac{1}{\sqrt 2}|1\rangle|1\rangle, so the Schmidt rank is 2. Entangled.
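Counting the Schmidt rank numerically amounts to counting non-zero singular values of the amplitude matrix. This is a small sketch assuming NumPy; `schmidt_rank`, its `dim_a`/`dim_b` parameters, and the tolerance are hypothetical names chosen here.

```python
import numpy as np

def schmidt_rank(state, dim_a=2, dim_b=2, tol=1e-12):
    """Number of non-zero Schmidt coefficients of a bipartite pure state.

    The amplitudes are reshaped into a dim_a x dim_b matrix c[i, j]; its
    singular values are the Schmidt coefficients.
    """
    c = np.asarray(state, dtype=complex).reshape(dim_a, dim_b)
    singular_values = np.linalg.svd(c, compute_uv=False)
    return int(np.sum(singular_values > tol))

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)    # Bell state
zero_plus = np.array([1, 1, 0, 0]) / np.sqrt(2)   # |0> (x) |+>

print(schmidt_rank(phi_plus))    # 2
print(schmidt_rank(zero_plus))   # 1
```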

Test 3: Partial-trace purity

The most operationally meaningful test. Compute the reduced density matrix of qubit A by tracing out qubit B: \rho_A = \text{tr}_B(|\psi\rangle\langle\psi|_{AB}). Then:

\rho_A is pure (i.e., \rho_A^2 = \rho_A, or equivalently \text{tr}(\rho_A^2) = 1): |\psi\rangle_{AB} is a product state.
\rho_A is mixed (i.e., \text{tr}(\rho_A^2) < 1): |\psi\rangle_{AB} is entangled.

The quantity \text{tr}(\rho_A^2) is called the purity of \rho_A. Purity 1 means "pure state"; purity < 1 means "mixed." This test is beautiful because the computation of \rho_A is mechanical (the partial trace), and the purity is a single number that you can compare against 1.

It also has a clean operational interpretation: \rho_A describes what Alice sees when she ignores Bob. If \rho_A is mixed, Alice's qubit "alone" is in a state of genuine uncertainty — not because Alice lacks information, but because the only complete description of the world is the joint state, and there is no pure single-qubit state to which Alice's half reduces. This is the operational meaning of entanglement, and the partial-trace article (ch.9) works it out in detail.

Example 2 — product vs entangled via partial trace

Compare the reduced density matrices of two joint states: the product state |0\rangle \otimes |+\rangle and the entangled state |\Phi^+\rangle.

Step 1 — The product state |0\rangle|+\rangle. Expand:

|0\rangle \otimes |+\rangle = |0\rangle \otimes \tfrac{1}{\sqrt 2}(|0\rangle + |1\rangle) = \tfrac{1}{\sqrt 2}(|00\rangle + |01\rangle).

Form the joint density matrix and trace out qubit B. The partial trace acts on outer-product blocks by \text{tr}_B(|a\rangle\langle b|_A \otimes |c\rangle\langle d|_B) = \langle d|c\rangle \cdot |a\rangle\langle b|_A. Applying this to |0\rangle|+\rangle\langle 0|\langle +|:

\rho_A = \text{tr}_B\bigl(|0\rangle\langle 0|_A \otimes |+\rangle\langle +|_B\bigr) = \langle +|+\rangle \cdot |0\rangle\langle 0|_A = 1 \cdot |0\rangle\langle 0| = |0\rangle\langle 0|.

Why the B-factor drops to 1: \langle +|+\rangle = 1 because |+\rangle is a unit vector. The B-side carries no information about A when the state factors — the partial trace sees that directly and deletes B's contribution to a scalar.

So \rho_A = |0\rangle\langle 0| = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}. Purity \text{tr}(\rho_A^2) = \text{tr}(|0\rangle\langle 0|\cdot|0\rangle\langle 0|) = \text{tr}(|0\rangle\langle 0|) = 1. Pure. The joint state is a product state, as expected.

Step 2 — The entangled state |\Phi^+\rangle. Expand the joint density matrix:

|\Phi^+\rangle\langle \Phi^+| = \tfrac{1}{2}\bigl(|00\rangle + |11\rangle\bigr)\bigl(\langle 00| + \langle 11|\bigr) = \tfrac{1}{2}\bigl(|00\rangle\langle 00| + |00\rangle\langle 11| + |11\rangle\langle 00| + |11\rangle\langle 11|\bigr).

Apply the partial trace over B to each term. On |00\rangle\langle 00| = |0\rangle\langle 0|_A \otimes |0\rangle\langle 0|_B: \text{tr}_B = \langle 0|0\rangle \cdot |0\rangle\langle 0|_A = |0\rangle\langle 0|. On |11\rangle\langle 11| = |1\rangle\langle 1|_A \otimes |1\rangle\langle 1|_B: \text{tr}_B = |1\rangle\langle 1|. On the cross term |00\rangle\langle 11| = |0\rangle\langle 1|_A \otimes |0\rangle\langle 1|_B: \text{tr}_B = \langle 1|0\rangle \cdot |0\rangle\langle 1|_A = 0 (orthogonality of |0\rangle and |1\rangle). Similarly the other cross term vanishes.

\rho_A = \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| = \tfrac{1}{2}I = \tfrac{1}{2}\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}.

Step 3 — Compare purities. Purity of the maximally mixed state:

\text{tr}\bigl((I/2)^2\bigr) = \text{tr}\bigl(I/4\bigr) = \tfrac{1}{2}.

Half. Mixed, not pure. The joint state is entangled.

Result. Product state \Rightarrow \rho_A pure (purity 1); entangled state \Rightarrow \rho_A mixed (purity < 1). For the Bell state in particular, \rho_A = I/2 — maximally mixed, the "classical coin" state — with purity 1/2, the smallest it can be on a qubit. A Bell state is maximally entangled: tracing out either half leaves the other in the most-mixed possible single-qubit state. Why "maximally entangled" means "maximally mixed reduction": the reduced density matrix captures everything local. If the reduction is as mixed as possible, all the information lives in the joint state only — none of it is accessible by looking at one side alone. That is the operational meaning of maximal entanglement.

What this shows. Entanglement is the failure of "local information" to suffice. For a product state, local information (the reduced density matrix) is as complete as it can be — pure. For a maximally entangled state, local information is as empty as it can be — maximally mixed. The slogan "the whole contains more than the sum of its parts" has this exact mathematical content.

Tracing out one qubit of a product state leaves the other in a pure state (purity 1). Tracing out one half of a Bell state leaves the other maximally mixed (purity 1/2). The purity of the reduction is a direct measure of entanglement.
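The partial-trace computation in Example 2 is mechanical enough to script. The sketch below assumes NumPy and uses the identity \rho_A = c\,c^\dagger for a pure state with amplitude matrix c_{ij}; the helper names are illustrative.

```python
import numpy as np

def reduced_density_matrix_a(state):
    """rho_A = tr_B |psi><psi| for a two-qubit pure state.

    With amplitudes c[i, j] (i on qubit A, j on qubit B),
    rho_A[i, k] = sum_j c[i, j] * conj(c[k, j]).
    """
    c = np.asarray(state, dtype=complex).reshape(2, 2)
    return c @ c.conj().T

def purity(rho):
    """tr(rho^2): 1 for a pure state, below 1 for a mixed state."""
    return float(np.real(np.trace(rho @ rho)))

zero_plus = np.array([1, 1, 0, 0]) / np.sqrt(2)   # |0> (x) |+>
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)    # Bell state

rho_product = reduced_density_matrix_a(zero_plus)   # expected |0><0|
rho_bell = reduced_density_matrix_a(phi_plus)       # expected I/2

purity_product = purity(rho_product)   # close to 1
purity_bell = purity(rho_bell)         # close to 0.5
```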

A preview: entanglement entropy

The purity \text{tr}(\rho_A^2) is one measure of how entangled a state is. A more refined measure is the entanglement entropy — the von Neumann entropy of \rho_A:

S(\rho_A) = -\text{tr}(\rho_A \log \rho_A).

For a product state, \rho_A is pure and S(\rho_A) = 0. For a Bell state, \rho_A = I/2 and S(\rho_A) = \log 2, which is 1 bit when the logarithm is taken base 2. In between, 0 < S(\rho_A) \leq \log 2 measures how entangled the state is, in units of "bits of entanglement." It is the standard measure of entanglement for bipartite pure states. The full story is in later chapters on entanglement entropy and the Schmidt decomposition.
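As a rough illustration of the entropy formula, the eigenvalues of \rho_A can be read off as squared singular values of the amplitude matrix. The sketch assumes NumPy and a base-2 logarithm; the function name and the example angle \pi/8 are arbitrary choices for illustration.

```python
import numpy as np

def entanglement_entropy(state, tol=1e-12):
    """Von Neumann entropy of rho_A in bits for a two-qubit pure state."""
    c = np.asarray(state, dtype=complex).reshape(2, 2)
    # Eigenvalues of rho_A are the squared singular values of c.
    p = np.linalg.svd(c, compute_uv=False) ** 2
    p = p[p > tol]                       # drop zeros: 0 * log(0) is taken as 0
    return max(0.0, float(-np.sum(p * np.log2(p))))

product = np.array([1, 1, 0, 0]) / np.sqrt(2)    # |0> (x) |+>
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)       # |Phi+>
theta = np.pi / 8
partial = np.array([np.cos(theta), 0, 0, np.sin(theta)])   # weakly entangled

s_product = entanglement_entropy(product)   # close to 0 bits
s_bell = entanglement_entropy(bell)         # close to 1 bit
s_partial = entanglement_entropy(partial)   # strictly between 0 and 1
```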

Classical correlation versus entanglement — a worked comparison

Entanglement is often described as "correlated outcomes," but classical random variables can be correlated too. The distinction between classical correlation and genuine entanglement is worth making precise, because it is where most intuition-building breaks down.

Consider two scenarios side by side.

Scenario A: classical correlated coins. A friend prepares two boxes. Each box contains either a 2-rupee coin showing heads or a 2-rupee coin showing tails. Your friend tosses a single fair coin behind a curtain, then — depending on the outcome — places a "heads coin" in both boxes (if the hidden toss came up heads) or a "tails coin" in both boxes (if it came up tails). The boxes are sealed and shipped, one to Delhi, one to Chennai. When Alice and Bob open their boxes, they always agree: both heads or both tails, with 50/50 probability on each.

Scenario B: a Bell pair. Alice and Bob share |\Phi^+\rangle = \tfrac{1}{\sqrt 2}(|00\rangle + |11\rangle). They each measure their qubit in the computational basis. They always agree: both see 0 or both see 1, with 50/50 probability on each.

In both scenarios, the local marginal statistics are identical — each party sees a random fair bit. And in both scenarios, when they compare, they always agree. So how is the Bell pair different from the prepared-coin pair?

The difference appears when they measure in a different basis.

In Scenario A, suppose Alice and Bob each also learn to ask a second question about their coin, one unrelated to heads/tails (say: "is the coin shinier or duller than average?"). The prepared coins have some classical shininess, independently assigned at the mint. Alice's coin's shininess and Bob's coin's shininess have no correlation, because the friend only matched them on heads/tails.

In Scenario B, if Alice and Bob instead each apply a Hadamard to their qubit before measuring (the qubit analogue of "changing basis to ask a different question"), the joint state transforms according to (H \otimes H)|\Phi^+\rangle. Work it out: H|0\rangle = |+\rangle, H|1\rangle = |-\rangle, so

(H \otimes H)|\Phi^+\rangle = \tfrac{1}{\sqrt 2}\bigl(|++\rangle + |--\rangle\bigr).

Why this still has perfect correlations: after the Hadamards, the state is a Bell-like state in the \{|+\rangle, |-\rangle\} basis — the + and - outcomes still always agree. The Hadamard-rotated version of |\Phi^+\rangle retains the maximal correlations, just expressed in a new basis.

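Measuring now gives perfectly correlated outcomes in the X basis: both + or both -, always. The Bell pair is therefore perfectly correlated in more than one basis at once, not just the one the preparer chose: in Z, in X, and in every real rotation between them. (In the Y basis the two outcomes of |\Phi^+\rangle are perfectly anti-correlated instead, which is just as strong a correlation.)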

The coin scheme above cannot mimic this. Its correlations were fixed at preparation on one specific property, and the new question produces uncorrelated answers. A cleverer classical scheme could pre-agree answers to any fixed list of questions, but Bell's theorem (1964) shows that no local scheme of this kind reproduces the quantum correlations across all measurement angles. Quantum entanglement's correlations are defined in every basis you could eventually choose to measure in, with perfect agreement across many of them. That robustness under basis changes is what separates entanglement from classical correlation, and Bell's theorem turns it into an inequality that every local classical theory must satisfy and that quantum mechanics violates.

Classical correlated coins agree in the basis they were prepared in (Z) but are uncorrelated under the new question. A Bell pair stays perfectly correlated after both parties apply a Hadamard, and more generally whenever Alice applies a local unitary U and Bob applies its complex conjugate U^*.
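The basis dependence of the correlations can be tabulated directly. The sketch below, assuming NumPy, computes the probability that Alice and Bob obtain the same outcome when both measure |\Phi^+\rangle in the same basis. The Y basis is included as an extra check that the worked comparison above does not cover; the function and variable names are illustrative.

```python
import numpy as np

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Each basis is a pair of orthonormal single-qubit vectors (outcome 0, outcome 1).
bases = {
    "Z": (np.array([1, 0], dtype=complex),
          np.array([0, 1], dtype=complex)),
    "X": (np.array([1, 1], dtype=complex) / np.sqrt(2),
          np.array([1, -1], dtype=complex) / np.sqrt(2)),
    "Y": (np.array([1, 1j]) / np.sqrt(2),
          np.array([1, -1j]) / np.sqrt(2)),
}

def agreement_probability(state, basis):
    """Probability that both parties get the same outcome in `basis`."""
    total = 0.0
    for v in basis:
        # np.vdot conjugates its first argument, giving <v, v | state>.
        amplitude = np.vdot(np.kron(v, v), state)
        total += abs(amplitude) ** 2
    return total

agreement = {name: agreement_probability(phi_plus, b) for name, b in bases.items()}
# Z and X: agreement close to 1. Y: agreement close to 0 (perfect anti-correlation).
```

In every case the outcomes are perfectly predictable from each other; only the sign of the correlation changes in the Y basis.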

Entanglement as a resource — what it enables

Entanglement is not a curiosity. It is the resource that powers most of the non-trivial phenomena in quantum information. Each of these deserves its own chapter (some already have one, others are coming), but it is worth naming them here so you know what the rest of the curriculum is building toward.

Quantum teleportation (Bennett et al., 1993). Using one pre-shared Bell pair plus two classical bits, Alice can send an unknown qubit state to Bob without ever moving the physical qubit. Zero Bell pairs, you cannot do this; one Bell pair, you can. Entanglement is literally the fuel of teleportation.
Superdense coding (Bennett and Wiesner, 1992). The dual: one pre-shared Bell pair plus one qubit Alice sends to Bob conveys two classical bits. Entanglement doubles the classical information rate of a quantum channel.
Bell inequality violation. Measurements on the two halves of a Bell pair, in suitably chosen directions, produce correlations that no classical theory — no matter how contrived, no matter how many hidden variables — can reproduce. This is Bell's 1964 theorem, experimentally verified in Aspect's 1982 photon experiment and definitively in the 2015 loophole-free experiments (Hensen et al., Shalm et al., Giustina et al.). Entanglement is the reason quantum predictions differ empirically from classical ones.
Quantum key distribution (BB84 and E91). BB84 is a prepare-and-measure protocol; the entanglement-based protocols (E91; Ekert, 1991) use the monogamy of entanglement, the fact that two parties cannot both be maximally entangled with a third, to detect eavesdroppers. Any attempt to intercept information destroys entanglement, which the legitimate parties can measure.
Quantum computing speedups. Most quantum algorithms that achieve a provable advantage over classical computing create extensive entanglement as part of their execution. Whether entanglement is the source of the speedup, or merely a necessary feature alongside interference, is an active research question — but it is certainly present in every known advantage.

A common pedagogical picture, due to Aaronson and used by Preskill and others: imagine a string of Christmas lights in a dark room. A classical correlation is two bulbs that are dim together or bright together — both in the same state, same colour, same flicker. An entangled correlation is two bulbs wired so that they synchronise in every basis you choose to observe them in — colour agreement, flicker pattern, even basis rotations you hadn't specified in advance. Classical correlation is fragile; entanglement persists under basis changes. That persistence is the resource.

Entanglement as a central resource feeding four paradigmatic applications. Like energy, entanglement is consumable: each Bell pair enables one run of a protocol and is typically destroyed in the process.
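To make the Bell-inequality item concrete, here is a sketch using the CHSH form of the inequality (Clauser, Horne, Shimony, Holt), which the text above does not spell out and is brought in here as an assumption. Any local hidden-variable model gives |S| \leq 2 for the combination S below; the measurement settings are one standard choice, and the code assumes NumPy.

```python
import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def correlation(state, obs_a, obs_b):
    """Expectation value <state| obs_a (x) obs_b |state>."""
    return float(np.real(np.vdot(state, np.kron(obs_a, obs_b) @ state)))

def chsh_value(state):
    """S = E(a0,b0) + E(a0,b1) + E(a1,b0) - E(a1,b1) for fixed settings."""
    a0, a1 = Z, X                       # Alice measures Z or X
    b0 = (Z + X) / np.sqrt(2)           # Bob measures along the two diagonals
    b1 = (Z - X) / np.sqrt(2)
    return (correlation(state, a0, b0) + correlation(state, a0, b1)
            + correlation(state, a1, b0) - correlation(state, a1, b1))

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)   # Bell state
ket00 = np.array([1.0, 0.0, 0.0, 0.0])           # product state |00>

s_bell = chsh_value(phi_plus)    # close to 2*sqrt(2), above the classical bound 2
s_product = chsh_value(ket00)    # close to sqrt(2), within the classical bound
```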

The non-signalling property — a hard hype-check

You have likely heard the phrase "spooky action at a distance." Einstein used it (in German, "spukhafte Fernwirkung") in a 1947 letter to Max Born, as a criticism in the spirit of the 1935 EPR paper, and the pop-science versions have run with it ever since: entanglement, they claim, "lets two particles communicate faster than light."

It does not. This is the single most persistent and damaging misconception in quantum information, and the correction is clean enough that you should be able to give it yourself after reading this section.

The precise statement is the no-communication theorem (Ghirardi, Rimini, Weber, 1980; extended and refined by many since). It says: if Alice and Bob share any entangled state, Alice cannot use any local operation on her qubit — measurement, gate, anything — to transmit information to Bob. Bob's measurement statistics, on his side alone, do not depend on anything Alice does.

Here is why, informally. Take Alice and Bob sharing |\Phi^+\rangle. Alice's reduced density matrix is \rho_A = I/2. If Alice applies some local unitary U_A or even measures her qubit, the joint state changes, but Bob's reduced density matrix — Bob's local description — stays at I/2. The reduced density matrix is unaffected by any local operation on the other side. Since Bob's measurement statistics depend only on his reduced density matrix, those statistics are identical whether Alice acted or not. Bob cannot tell the difference.

To see this concretely: suppose Alice chooses to measure her qubit in the Z basis. With probability 1/2 she gets 0 and the joint state collapses to |00\rangle; with probability 1/2 she gets 1 and it collapses to |11\rangle. From Bob's perspective, before Alice tells him her outcome, his qubit is in a statistical mixture: half the time |0\rangle, half the time |1\rangle. This mixture is described by \rho_B = \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| = I/2 — the same as before Alice measured. Every measurement Bob does on his qubit has the same outcome statistics as if Alice had done nothing at all. The correlations only appear once Alice sends Bob her outcome via a classical channel, at which point Bob updates his knowledge from "random coin" to "matches Alice." That classical channel is limited by the speed of light, and so the whole protocol is non-signalling.

Why the reduced state doesn't change under Alice's actions: any local operation Alice does on her qubit can be written as U_A \otimes I_B (for a unitary) or as a measurement acting only on the A-factor. Tracing out Alice's side afterwards always gives back I/2 on Bob's side, because the partial trace over A absorbs any operation that only touches A — it sums over Alice's outcomes with the probability weightings, and the sum reproduces the identity. This is a direct calculation and is the algebraic heart of the no-communication theorem.

Entanglement produces correlations between measurement outcomes, which only become visible when Alice and Bob compare their results afterwards — a comparison that requires them to communicate classically at sub-light speed. Without that classical channel, each party sees only random 50/50 outcomes on their local qubit. The correlation is real and quantum, but it is not a telegraph.
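The invariance of Bob's reduced state can be verified numerically for the two cases discussed above: a local unitary on Alice's side, and a Z measurement whose outcome Alice keeps to herself. This is a sketch assuming NumPy; the Hadamard is an arbitrary choice of local unitary, and the helper name `reduced_b` is illustrative.

```python
import numpy as np

def reduced_b(rho_ab):
    """rho_B = tr_A(rho_AB) for a two-qubit density matrix."""
    r = rho_ab.reshape(2, 2, 2, 2)        # indices: a, b, a', b'
    return np.einsum("abac->bc", r)       # sum over a = a'

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(phi_plus, phi_plus.conj())
I2 = np.eye(2)

# Case 1: Alice applies a local unitary U_A (x) I_B.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
U = np.kron(H, I2)
rho_after_unitary = U @ rho @ U.conj().T

# Case 2: Alice measures in the Z basis and does not announce the outcome.
# The joint state becomes the outcome-weighted mixture of the collapsed states.
P0 = np.kron(np.diag([1.0, 0.0]), I2)
P1 = np.kron(np.diag([0.0, 1.0]), I2)
rho_after_measurement = P0 @ rho @ P0 + P1 @ rho @ P1

bob_before = reduced_b(rho)
bob_after_unitary = reduced_b(rho_after_unitary)
bob_after_measurement = reduced_b(rho_after_measurement)
# All three equal I/2: Bob's local statistics carry no trace of Alice's action.
```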

Hype check. Entanglement does not enable faster-than-light communication. A measurement on one half of an entangled pair does not "instantly affect" the other party's measurement in any operationally meaningful way. Both outcomes are random; the correlation shows up only in post-hoc comparison, which requires a classical channel. The no-communication theorem (1980) is the rigorous statement; relativistic quantum information theory, BB84-style protocols, and every real-world quantum-cryptography application depend on it.

Common confusions

"Entanglement means the two qubits are in contact / connected somehow." No. Entanglement is a property of a state, not of a physical link. An entangled pair can be separated by 1200 km (this has been done — Micius satellite experiments, 2017) and remain entangled. The "connection" is not a physical wire or a field; it is a correlation structure in the joint state description. If the state was prepared entangled, measurements on the two halves exhibit the correlations no matter where the qubits go. The physics is non-local only in the specific sense that Bell's theorem requires — not in the sense of "invisible string."
"Entanglement is just strong correlation." Only partly. Classical random variables can be arbitrarily correlated — in fact, perfectly correlated: a pair of classical coins set to always match gives correlation 1. What entangled states can do and classical correlated states cannot is produce correlations that violate a Bell inequality — correlations that persist and predict agreements in measurement bases you could pick after the state was created. Classical correlation is fixed once the variables are set; quantum entanglement is flexible across bases. That flexibility is the distinguishing feature.
"Entangled states are rare or exotic." They are the opposite of rare. The set of product states is a 3-complex-dimensional surface embedded inside the 4-complex-dimensional two-qubit Hilbert space. Pick a joint state uniformly at random — you land off the product-state surface with probability 1. Almost every two-qubit state is entangled. The special ones are the product states.
"Einstein proved entanglement was wrong." Einstein, Podolsky, and Rosen (1935) argued entanglement was a signature of quantum mechanics being incomplete, not wrong. They proposed that some deeper "hidden-variable" theory must underlie quantum mechanics, with the entanglement correlations secretly produced by local variables established at the source. Bell's 1964 theorem later showed, mathematically, that no local hidden-variable theory can reproduce all quantum predictions. The subsequent experiments (Aspect 1982, Hensen 2015, and dozens of others) have verified the quantum predictions against Bell's inequality repeatedly, loophole-free by 2015. Einstein was a great physicist who was wrong about this — and being wrong about something is how science moves forward.
"You can tell if a state is entangled just by looking at the amplitudes." Not always obvious from the components alone. The state \tfrac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle) looks maximally symmetric, but it factors: it equals |+\rangle \otimes |+\rangle, a pure product state. Meanwhile \tfrac{1}{\sqrt 2}(|00\rangle + |11\rangle) does not factor. You need one of the three algebraic tests (non-factorability, Schmidt rank, partial-trace purity) to be sure.
"Monogamy / no-cloning / no-communication are separate results." They are interconnected. The no-cloning theorem, the no-communication theorem, and monogamy of entanglement all follow from the linearity of quantum mechanics plus the tensor-product structure of composite systems. They are three faces of the same underlying algebra. Losing any one of them would break the others. Quantum mechanics is structurally coherent about "you cannot signal faster than light" in a way that took decades to fully appreciate.

Going deeper

The algebraic definition of entanglement, the proof-by-contradiction for Bell states, and the partial-trace test are the take-home essentials. What follows is the sharper theory: the Schmidt decomposition as a canonical form, quantitative measures of entanglement (concurrence, entanglement of formation, negativity), multipartite entanglement (where the two-qubit story splinters into multiple inequivalent classes), the connection to Bose's symmetrised tensor products for identical particles, and a pointer to Bell's theorem as the experimental face of the definition here.

Schmidt decomposition — a canonical form

The Schmidt decomposition, mentioned above as Test 2, is stronger than just "it exists." For any bipartite pure state |\psi\rangle_{AB} there are orthonormal bases \{|u_i\rangle_A\} and \{|v_i\rangle_B\} and unique real numbers \lambda_1 \geq \lambda_2 \geq \cdots \geq 0 (the Schmidt coefficients) such that

|\psi\rangle_{AB} = \sum_i \lambda_i\,|u_i\rangle_A \otimes |v_i\rangle_B.

The coefficients are the non-negative square roots of the eigenvalues of \rho_A (equivalently of \rho_B — they share the same non-zero spectrum, a lovely non-obvious theorem). The Schmidt rank (number of non-zero \lambda_i) measures how entangled the state is on a discrete scale; the distribution of the \lambda_i measures it on a continuous scale.

This is the diagonalisation that underlies every quantitative entanglement measure. Its computation is just singular-value decomposition (SVD) on the 2 \times 2 matrix of amplitudes c_{ij} of |\psi\rangle_{AB} — straightforward linear algebra, routinely done in Qiskit and similar tools.
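The SVD route can be sketched with plain NumPy rather than a quantum SDK. The random state and seed below are arbitrary; the check confirms that the squared Schmidt coefficients match the eigenvalues of both reduced density matrices and that the decomposition rebuilds the state.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
amplitudes = rng.normal(size=4) + 1j * rng.normal(size=4)
state = amplitudes / np.linalg.norm(amplitudes)   # random two-qubit pure state

c = state.reshape(2, 2)                 # c[i, j]: amplitude of |i>_A |j>_B
u, schmidt, vh = np.linalg.svd(c)       # c = u @ diag(schmidt) @ vh

# Schmidt vectors: |u_i> is column i of u, |v_i> is row i of vh.
rebuilt = sum(schmidt[i] * np.kron(u[:, i], vh[i, :]) for i in range(2))

rho_a = c @ c.conj().T                  # tr_B |psi><psi|
rho_b = c.T @ c.conj()                  # tr_A |psi><psi|

# eigvalsh returns ascending eigenvalues; reverse to match descending Schmidt order.
eig_a = np.linalg.eigvalsh(rho_a)[::-1]
eig_b = np.linalg.eigvalsh(rho_b)[::-1]
```

Comparing `eig_a`, `eig_b`, and `schmidt ** 2` shows the shared spectrum directly.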

Entanglement measures: concurrence, entanglement of formation

For two-qubit pure states, the concurrence C(|\psi\rangle) is defined as C = |\langle \psi^* | \sigma_y \otimes \sigma_y | \psi\rangle|, where |\psi^*\rangle is the complex conjugate of the amplitudes in the computational basis. For a Bell state, C = 1 (maximal). For a product state, C = 0. Concurrence has the useful property of extending cleanly to mixed states (Wootters, 1998), where it becomes the building block for the entanglement of formation — the minimum number of Bell pairs needed, on average, to prepare a given state. These are the workhorse quantitative entanglement measures for qubits.
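For pure states the concurrence formula reduces to a few lines. The sketch below assumes NumPy and handles only the pure-state case; the mixed-state extension due to Wootters is not implemented here. For a pure state the result equals 2|c_{00}c_{11} - c_{01}c_{10}|.

```python
import numpy as np

def concurrence(state):
    """Concurrence C = |<psi*| sigma_y (x) sigma_y |psi>| of a two-qubit pure state."""
    psi = np.asarray(state, dtype=complex)
    sigma_y = np.array([[0, -1j], [1j, 0]])
    yy = np.kron(sigma_y, sigma_y)
    # The bra <psi*| has components psi (conjugating twice), so no conj() appears.
    return float(abs(psi @ yy @ psi))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)      # |Phi+>
product = np.array([1, 1, 1, 1]) / 2            # |+> (x) |+>
theta = np.pi / 8
partial = np.array([np.cos(theta), 0, 0, np.sin(theta)])

c_bell = concurrence(bell)         # close to 1
c_product = concurrence(product)   # close to 0
c_partial = concurrence(partial)   # sin(2*theta), close to 0.707
```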

For larger systems (more qubits, or qudits of higher dimension), no single measure is universally best. Negativity, relative entropy of entanglement, squashed entanglement, logarithmic negativity — each captures a different aspect, and the landscape of entanglement measures is itself an active research area.

Multipartite entanglement — GHZ vs W

Three qubits have their own entanglement classes. The GHZ state

|\text{GHZ}\rangle = \tfrac{1}{\sqrt 2}\bigl(|000\rangle + |111\rangle\bigr)

and the W state

|W\rangle = \tfrac{1}{\sqrt 3}\bigl(|001\rangle + |010\rangle + |100\rangle\bigr)

are both genuinely entangled across all three qubits, but they are inequivalent: no protocol of local operations and classical communication can convert one into the other, even probabilistically. This was proven by Dür, Vidal, and Cirac (2000). The two-qubit story has one maximally-entangled class (the Bell states); the three-qubit story already has two inequivalent classes of genuine three-qubit entanglement (GHZ and W). For more qubits, the zoo grows, and classifying multipartite entanglement is an open problem in the general case.

The GHZ and W states behave differently under loss of a single qubit. Lose any one qubit of a GHZ state (trace it out), and the other two are left in an equal classical mixture of |00\rangle and |11\rangle, which is separable; measuring the qubit in the Z basis instead projects them onto |00\rangle or |11\rangle, a product state either way. No entanglement survives. Lose any one qubit of a W state, and the other two are left in the mixed state \tfrac{2}{3}|\Psi^+\rangle\langle\Psi^+| + \tfrac{1}{3}|00\rangle\langle 00|, with |\Psi^+\rangle = \tfrac{1}{\sqrt 2}(|01\rangle + |10\rangle), which is still entangled (concurrence 2/3); a Z-basis measurement returns 0 with probability 2/3 and then leaves the pair exactly in the Bell state |\Psi^+\rangle. GHZ is brittle under qubit loss; W is robust. This is the first concrete clue that "how much" and "how structured" entanglement a state carries are different questions. There is no single scalar that captures both, which is why the quantum-information community has built a whole zoo of measures (concurrence, entanglement of formation, negativity, relative entropy of entanglement, squashed entanglement, tangles of various orders): no single one dominates the others across all tasks.
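The difference under qubit loss can be checked numerically. The sketch below traces out the last qubit of each state (both states are symmetric under permutations, so any qubit gives the same answer) and applies the partial-transpose test, which for two qubits detects entanglement exactly: a negative eigenvalue of the partially transposed state means entangled. The helper names are hypothetical.

```python
import numpy as np

def trace_out_last_qubit(psi):
    """Reduced state of the first two qubits of a 3-qubit pure state."""
    m = np.asarray(psi, dtype=complex).reshape(4, 2)  # rows: qubits AB, cols: qubit C
    return m @ m.conj().T                             # sum over the lost qubit

def min_eig_partial_transpose(rho):
    """Smallest eigenvalue of the partial transpose (on B) of a two-qubit state."""
    r = rho.reshape(2, 2, 2, 2)       # indices a, b, a', b'
    r_pt = r.transpose(0, 3, 2, 1)    # swap b <-> b'
    return float(np.linalg.eigvalsh(r_pt.reshape(4, 4)).min())

ghz = np.zeros(8)
ghz[[0b000, 0b111]] = 1 / np.sqrt(2)
w = np.zeros(8)
w[[0b001, 0b010, 0b100]] = 1 / np.sqrt(3)

ghz_min = min_eig_partial_transpose(trace_out_last_qubit(ghz))
w_min = min_eig_partial_transpose(trace_out_last_qubit(w))

print(ghz_min)  # zero up to rounding: no entanglement left
print(w_min)    # negative: the remaining pair is still entangled
```

For the W state the negative eigenvalue works out to (1 - \sqrt 5)/6, roughly -0.206, while the GHZ remainder has no negative eigenvalue at all.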

Monogamy of entanglement

A striking structural property: entanglement cannot be shared promiscuously. If qubit A is maximally entangled with qubit B (say, in a Bell state), then qubit A cannot simultaneously be maximally entangled with any third qubit C. This is called monogamy of entanglement, and it was proven quantitatively by Coffman, Kundu, and Wootters (2000) in the CKW inequality for three qubits.
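
As a worked check, written with the concurrence C introduced earlier, the CKW inequality for a pure state of three qubits A, B, C reads

C_{AB}^2 + C_{AC}^2 \leq C_{A(BC)}^2, \qquad C_{A(BC)}^2 = 4\det\rho_A,

where C_{AB} and C_{AC} are the concurrences of the two-qubit reduced states and C_{A(BC)} quantifies the entanglement between A and the pair BC. If A and B share a Bell state, then C_{AB} = 1 while C_{A(BC)}^2 \leq 1, which forces C_{AC} = 0. For the GHZ state, \rho_A = I/2 and both pairwise concurrences vanish:

0 + 0 \leq 4 \cdot \tfrac{1}{4} = 1.

For the W state, \rho_A = \mathrm{diag}(\tfrac{2}{3}, \tfrac{1}{3}) and C_{AB} = C_{AC} = \tfrac{2}{3}:

\tfrac{4}{9} + \tfrac{4}{9} = \tfrac{8}{9} = 4 \cdot \tfrac{2}{9}.

The W state saturates the inequality: all of A's entanglement is pairwise. The GHZ state sits at the other extreme, with none of it pairwise.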

Monogamy is not a metaphor; it is the algebraic reason the E91 quantum key distribution protocol is secure. If an eavesdropper Eve tried to entangle herself with Alice's qubit in a Bell pair Alice shares with Bob, Eve's entanglement would have to come at the cost of the Alice-Bob entanglement, which Alice and Bob can measure directly. Eve's presence degrades the Alice-Bob correlations in a detectable way — there is no quantum-mechanical way to "listen in" without weakening the primary entanglement. This is structurally different from classical cryptography, where eavesdropping can in principle be undetectable.

Monogamy also shapes how entanglement is distributed in many-body systems. Because a qubit's capacity for entanglement is bounded, whatever it shares with one neighbour is unavailable to the others; the total is shared out. This is part of the intuition behind the "area law," a deep result in quantum many-body theory: in ground states of gapped local Hamiltonians, the entanglement entropy of a region scales with its boundary area rather than its volume (rigorously proven in one dimension, widely expected in higher dimensions). The statement concerns low-energy states: generic highly excited states typically carry volume-law entanglement instead.

Bose, identical particles, and symmetrised tensor products

Entanglement as you have just met it applies to distinguishable subsystems — qubit A on one part of a chip, qubit B on another. When the two subsystems are identical quantum particles (two electrons in a helium atom, two photons in a beam), the joint state must lie in a symmetrised (for bosons) or antisymmetrised (for fermions) subspace of the full tensor product. This extra structure is not entanglement in the definition-above sense, but a symmetry constraint — yet the two ideas interact in subtle ways, and for identical-particle systems the "entanglement" question has to be refined to "entanglement beyond that forced by statistics."

The statistical classification itself is due to Satyendra Nath Bose, whose 1924 paper "Planck's Law and the Light Quantum Hypothesis" (initially rejected, then translated into German and championed by Einstein) introduced what we now call Bose-Einstein statistics. Bose's paper is the load-bearing reference for symmetrised tensor products of identical bosons; bosons are literally named after him. His work sits beside C.V. Raman's 1928 inelastic-scattering experiments and Meghnad Saha's 1920 ionisation equation as the foundational Indian contributions to the quantum theory that underlies everything in this chapter. In a curriculum about entanglement and composite quantum systems, Bose is a primary source, not decoration.

Bell's theorem — the experimental face of the definition

Everything in this chapter has been algebra. The actual experimental content of entanglement — why it is not just a theorist's definition but a measurable fact — comes through Bell's theorem.

Bell (1964) considered a specific setup: Alice and Bob share a pair, each chooses one of two measurement directions, and they compare the average correlations of their outcomes across all four combinations of choices. In the cleanest form, due to Clauser, Horne, Shimony, and Holt (1969), the resulting quantity (the CHSH expression) is bounded in magnitude by 2 in every local hidden-variable theory, but quantum mechanics predicts it can reach 2\sqrt{2} \approx 2.828 on a Bell state. Aspect's 1982 photon experiment measured a value around 2.7, decisively above 2. The 2015 loophole-free experiments (Hensen et al. at Delft, Shalm et al. at NIST, Giustina et al. in Vienna) closed the locality and detection loopholes simultaneously, confirming the violation.
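The quantum prediction of 2\sqrt{2} can be reproduced with a short calculation. The sketch below assumes one standard choice of settings (Alice measures Z or X; Bob measures along the two diagonals of the X-Z plane) and the sign convention S = E(a_0,b_0) + E(a_0,b_1) + E(a_1,b_0) - E(a_1,b_1); the function names are illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def correlation(state, obs_a, obs_b):
    """E(a, b) = <state| obs_a tensor obs_b |state> for +/-1-valued observables."""
    return float(np.real(state.conj() @ np.kron(obs_a, obs_b) @ state))

def chsh(state, a0, a1, b0, b1):
    """CHSH expression for the given state and measurement settings."""
    return (correlation(state, a0, b0) + correlation(state, a0, b1)
            + correlation(state, a1, b0) - correlation(state, a1, b1))

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
ket_00 = np.kron([1, 0], [1, 0]).astype(complex)  # product state |00>

# Assumed settings: Alice uses Z and X, Bob uses (Z + X)/sqrt(2) and (Z - X)/sqrt(2).
a0, a1 = Z, X
b0, b1 = (Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)

s_bell = chsh(phi_plus, a0, a1, b0, b1)
s_product = chsh(ket_00, a0, a1, b0, b1)

print(s_bell)     # exceeds the local bound of 2
print(s_product)  # stays within the local bound
```

With these settings each of the four correlations on |\Phi^+\rangle has magnitude 1/\sqrt 2, and the signs line up so that all four terms add, giving 4/\sqrt 2 = 2\sqrt 2. The product state |00\rangle, run through the same settings, stays below 2.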

The upshot: entanglement is not a definition looking for empirical confirmation. It has been confirmed, and confirmed, and confirmed again, and the quantum correlations exceed what any local classical theory can produce. The definition this chapter starts with is the mathematical shadow of a measurable physical phenomenon that is now textbook-solid.

Bell's theorem gets its own chapter. For now, know that the algebra of "cannot be factored" and the experiments of "cannot be reproduced by hidden variables" are two sides of the same coin. You have met the first side here.

Where this leads next

Bell states — a full tour of all four Bell states, their preparation circuits, their measurements, and the Bell-state basis.
Bell's theorem and CHSH — the inequality, its derivation, and the experiments that ruled out local hidden variables.
The no-cloning theorem — why you cannot duplicate an unknown quantum state, and why this is structurally tied to entanglement and no-signalling.
Schmidt decomposition — the canonical diagonal form for bipartite states, and the technical heart of entanglement theory.
Quantum teleportation — the first quantum protocol that uses entanglement as fuel.
The partial trace — the operation behind Test 3; the bridge between joint and reduced descriptions.

References

Nielsen and Chuang, Quantum Computation and Quantum Information (2010), §2.4 on the density operator and reduced density operator and §2.5 on the Schmidt decomposition — Cambridge University Press.
John Preskill, Lecture Notes on Quantum Computation, Ch. 4 (entanglement, Schmidt decomposition, Bell inequality) — theory.caltech.edu/~preskill/ph229.
Wikipedia, Quantum entanglement — the definition, history, and experimental status.
Wikipedia, Bell state — the four Bell states, their circuits, and their role as the canonical examples of maximal entanglement.
B. Hensen et al., Loophole-free Bell inequality violation using electron spins separated by 1.3 kilometres (2015) — arXiv:1508.05949. The experimental end of the EPR debate.
Wikipedia, Satyendra Nath Bose — the 1924 paper behind symmetrised tensor products and Bose-Einstein statistics.