In short
The classical repetition code encodes one bit by sending it n times; the receiver decodes by majority vote. For n=3, a single bit-flip is correctable, and the failure probability drops from p (unprotected) to 3p^2 - 2p^3 — for p = 0.01, from 10^{-2} down to about 3 \times 10^{-4}. As n grows, the failure probability goes to zero for any p < 1/2. This is the simplest classical error-correcting code — elegant, trivial, practical. Claude Shannon's 1948 channel-capacity theorem says you can protect information at any rate up to 1 - H(p) bits per channel use (about 0.919 bits when p = 0.01); the repetition code achieves only 1/n, so more efficient codes — Hamming, Reed-Solomon, LDPC, turbo, polar — exist and are used everywhere from your phone's 4G modem to deep-space probes. But the repetition code does not survive the jump to quantum: you cannot copy an unknown qubit (no-cloning), measurement for majority vote would collapse the superposition, and quantum errors are a continuous family, not just flips. The next chapter builds the quantum analogue — the 3-qubit bit-flip code — which solves all three problems by replacing copies with entanglement and classical reads with parity measurements.
You are sending the Hindi word "namaste" to your friend over a noisy channel. Somewhere in the wires, one in a hundred bits flips. You could send the raw bytes and hope. Or you could send each byte three times — "nnnaaammmaaasssttteee" — and let the receiver vote: if two of the three copies of each character agree, output that character. A single flip in any triple gets outvoted by its two siblings. The message arrives intact, at the cost of using three times as much bandwidth.
This is the 3-bit repetition code. It is the simplest error-correcting code in information theory, the one that gets taught on the first day of every coding-theory course, and the one that fails the moment you try to quantum-ise it. Understanding why it works classically — and why every step of it breaks under quantum mechanics — is the fastest route into quantum error correction.
In why QEC is hard you saw three walls: no-cloning, continuous errors, measurement collapse. This chapter is the classical warm-up. You will see exactly what the classical strategy looks like, compute its failure probability from scratch, meet Shannon's capacity theorem (which says you can do much better than repetition), and end with three clean reasons repetition fails quantumly. The 3-qubit bit-flip code (ch. 116) is how Peter Shor got around those failures — but to appreciate his cleverness, you need to see the classical code in full first.
The 3-bit repetition code
Here is the whole code in three sentences. Encode: to send bit b, transmit three copies b, b, b. Transmit: the channel flips each bit independently with probability p (this is the "binary symmetric channel", or BSC, the simplest noisy-channel model). Decode: the receiver reads three bits r_1, r_2, r_3 and outputs the majority.
That is the entire code. No fancy algebra, no clever polynomial. Three copies, one vote. The rest of this article is about asking the right questions of it: how often does it fail, how does the failure rate change with n, and how far is this from the best you could possibly do?
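If you want to run the encode-transmit-decode loop yourself, here is a minimal simulation sketch. It uses only the Python standard library; the function names (bsc, encode, decode, logical_error_rate) are illustrative choices, not from any particular package.

```python
import random

def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

def encode(bit):
    """Repetition encoding: one logical bit becomes three identical physical bits."""
    return [bit, bit, bit]

def decode(received):
    """Majority vote over the three received bits."""
    return 1 if sum(received) >= 2 else 0

def logical_error_rate(p, trials=200_000, seed=1):
    """Monte Carlo estimate of the decoded failure probability."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        sent = rng.randint(0, 1)
        if decode(bsc(encode(sent), p, rng)) != sent:
            errors += 1
    return errors / trials

if __name__ == "__main__":
    for p in (0.01, 0.1, 0.5):
        est = logical_error_rate(p)
        exact = 3 * p**2 - 2 * p**3   # derived in the next section
        print(f"p={p}: simulated {est:.5f}, exact 3p^2 - 2p^3 = {exact:.5f}")
```

The simulated rates fluctuate (Monte Carlo noise), but they track the closed-form failure probability derived below.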
Failure probability from first principles
Suppose you sent b = 0, and the channel independently flips each bit with probability p. The received triple (r_1, r_2, r_3) is then one of eight possibilities, each with a probability determined by how many flips occurred.
Let k be the number of flips — the number of received bits that disagree with what was sent. k follows a binomial distribution: P(k \text{ flips}) = \binom{3}{k} p^k (1-p)^{3-k}. Why binomial: each of the three bits flips independently with probability p; \binom{3}{k} counts the ways to choose which k of the three bits flip, and each such pattern has probability p^k (1-p)^{3-k}. This is the same formula as "how many heads in 3 flips of a biased coin."
Spelled out:
- P(k=0) = (1-p)^3 — no bit flipped.
- P(k=1) = 3p(1-p)^2 — exactly one bit flipped.
- P(k=2) = 3p^2(1-p) — exactly two bits flipped.
- P(k=3) = p^3 — all three flipped.
The majority-vote decoder outputs the correct bit whenever fewer than half the bits flipped — that is, k \in \{0, 1\}. It outputs the wrong bit when k \in \{2, 3\}. So

P(\text{fail}) = P(k=2) + P(k=3) = 3p^2(1-p) + p^3.

Expand and simplify:

P(\text{fail}) = 3p^2 - 3p^3 + p^3 = 3p^2 - 2p^3.
For small p, 3p^2 dominates. The code has turned a per-bit failure probability of p into a per-logical-bit failure probability of approximately 3p^2. That is a quadratic suppression — a feature any good error-correcting code must deliver.
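For a quick numerical check of the four probabilities and the failure formula, here is a short throwaway sketch (standard-library Python):

```python
from math import comb

p = 0.01
# Binomial pmf for the number of flips k among 3 independently noisy bits.
pmf = {k: comb(3, k) * p**k * (1 - p)**(3 - k) for k in range(4)}
print(pmf)                        # P(k=0), ..., P(k=3)
print(sum(pmf.values()))          # sanity check: the four probabilities sum to 1
p_fail = pmf[2] + pmf[3]          # decoder fails when k >= 2
print(p_fail, 3*p**2 - 2*p**3)    # both give ~2.98e-4
```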
Plug in numbers. For p = 0.01:

P(\text{fail}) = 3(0.01)^2 - 2(0.01)^3 = 3 \times 10^{-4} - 2 \times 10^{-6} \approx 3 \times 10^{-4}.
You spent three channel uses to turn a 1\% error rate into a 0.03\% error rate. The rate of the code is 1/3 (one logical bit per three physical bits), so the "bandwidth cost" is a factor of 3.
For p = 0.1 — a much noisier channel — P(\text{fail}) \approx 3(0.01)(0.9) + 0.001 = 0.028. Still much better than 0.1, but less dramatic, because p is now larger.
For p = 0.5 — a completely random channel — P(\text{fail}) = 3(0.25)(0.5) + 0.125 = 0.5. The code does nothing. If the channel is a coin flip, no amount of repetition helps, because the three received bits are independently random.
This last case hints at the threshold at p = 1/2. Below threshold, repetition reduces error. At and above threshold, repetition is useless.
Larger n — beyond quadratic suppression
Repetition with n = 3 gives O(p^2) failure. What about n = 5 or n = 7?
For a length-n repetition code with n odd, the majority decoder gets it wrong when more than half the bits flip, i.e., k \geq \lceil n/2 \rceil. The failure probability is

P(\text{fail}) = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} p^k (1-p)^{n-k}.
This is the binomial tail probability. For small p, the leading term is the one with the smallest k in the sum, namely k = \lceil n/2 \rceil = (n+1)/2. So

P(\text{fail}) \approx \binom{n}{(n+1)/2} p^{(n+1)/2}

for small p. Why this approximation: the binomial tail is a sum of positive terms in increasing powers of p; the first term is O(p^{(n+1)/2}) and every subsequent term is at least roughly p times smaller, so the first term dominates when p \ll 1.
Rough failure rates for p = 0.01:
- n = 3: P(\text{fail}) \approx 3 \times 10^{-4}.
- n = 5: P(\text{fail}) \approx 10^{-5}.
- n = 7: P(\text{fail}) \approx 3.5 \times 10^{-7}.
- n = 9: P(\text{fail}) \approx 10^{-8}.
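These rough values follow from the tail formula above; the sketch below recomputes them exactly and compares against the leading-order approximation (assuming odd n and independent flips, as throughout this section):

```python
from math import comb, ceil

def exact_fail(n, p):
    """Exact majority-vote failure: probability that more than half the n bits flip."""
    t = ceil(n / 2)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def leading_term(n, p):
    """Leading-order approximation: the smallest-k term of the tail, k = (n+1)/2."""
    k = (n + 1) // 2
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.01
for n in (3, 5, 7, 9):
    print(f"n={n}: exact {exact_fail(n, p):.2e}, leading term {leading_term(n, p):.2e}")
```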
Each additional pair of bits multiplies the failure probability by roughly a constant times p: a factor of about 30, or one and a half orders of magnitude, at p = 0.01.
As n \to \infty with p fixed and p < 1/2, the law of large numbers kicks in: the average number of flips is np, which is below n/2, so the majority-vote decoder is correct with probability approaching 1. For any p < 1/2, you can drive the error rate to zero by making n large enough.
There is a catch. The rate of the code is 1/n — to send one bit, you use n channel uses. As n \to \infty, the rate goes to zero. You are spending infinite bandwidth to get zero error. That sounds like a Faustian bargain, and the right question is: can you get arbitrarily low error rate at a nonzero rate?
Claude Shannon answered that question in 1948.
Shannon's channel capacity — how good can you get?
The binary symmetric channel with flip probability p — the BSC(p) — has a number associated with it called its capacity. Shannon's noisy-channel coding theorem (1948) says:
- For any rate R below the capacity, there exists a code of rate R whose error probability can be made arbitrarily small by choosing long enough block length.
- For any rate R above the capacity, no code can achieve arbitrarily small error probability. The code is forced into errors by the channel's information-theoretic limit.
For the BSC(p), the capacity is

C = 1 - H(p),
where H(p) = -p \log_2 p - (1-p) \log_2 (1-p) is the binary entropy. Why 1 - H(p): each channel use transmits one noisy bit; H(p) is the entropy of the noise (the information the channel throws away at random); 1 - H(p) is what is left over — the reliable information that can be extracted. At p = 0, H(0) = 0 and C = 1 (perfect channel). At p = 1/2, H(1/2) = 1 and C = 0 (the channel carries no information). The derivation is in any information-theory textbook; Shannon's original paper sketched the proof.
Numerical values:
- p = 0.01: H \approx 0.081, so C \approx 0.919 bits per channel use.
- p = 0.1: H \approx 0.469, so C \approx 0.531.
- p = 0.25: H \approx 0.811, so C \approx 0.189.
- p = 0.5: H = 1, so C = 0.
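The same table drops out of a few lines of standard-library Python (a sketch; binary_entropy is just a local helper, not a library function):

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), with the convention H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.01, 0.1, 0.25, 0.5):
    H = binary_entropy(p)
    print(f"p={p}: H={H:.3f}, capacity C = 1 - H = {1 - H:.3f}")
```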
For p = 0.01, Shannon says you can send up to about 0.919 reliable bits per channel use. The 3-bit repetition code achieves rate 1/3 \approx 0.333 — a factor of roughly 2.8 worse than optimal. Bigger repetition codes achieve rate 1/n, which goes to zero. Repetition is not even close to optimal.
This is why real systems do not use repetition. Your phone uses capacity-approaching codes: turbo codes in 4G LTE, LDPC and polar codes in 5G. Wi-Fi uses LDPC. Deep-space probes (Voyager, New Horizons) use convolutional codes and Reed-Solomon. CDs use Reed-Solomon. Each of these encodes more information per channel use than repetition, at the same target error rate. Repetition is the textbook starting point, not the production technology.
For a flavour: the Hamming [7, 4] code encodes 4 bits in 7 — rate 4/7 \approx 0.571, much better than 1/3 — and still corrects any single-bit error. The cost: the encoding and decoding are algebraic rather than majority-vote, but the computation is still trivial by modern standards; your phone decodes the Reed-Solomon code inside every QR code it scans in a fraction of a second.
Why none of this works quantumly
Classical error correction is an elegant, mature theory. Every step of the repetition code is so concrete that a 15-year-old can compute it with pen and paper. Why does none of it survive the jump to qubits?
Obstacle 1: no-cloning. To "encode |\psi\rangle as three copies |\psi\rangle|\psi\rangle|\psi\rangle", you would need a machine that takes an unknown qubit and outputs three identical copies of it. The no-cloning theorem (Wootters and Zurek, 1982) proves no such machine can exist — the only operations allowed by quantum mechanics are unitary (linear), and a cloner would be nonlinear. You cannot make three copies of \alpha|0\rangle + \beta|1\rangle without already knowing \alpha, \beta.
Obstacle 2: measurement collapses the state. The decoder's second step is to read the three received bits and majority-vote. Reading a classical bit is passive and free; reading a qubit is a measurement, which collapses the superposition. If the encoded state were |\psi\rangle|\psi\rangle|\psi\rangle (which, per obstacle 1, you could not prepare anyway), measuring the first qubit in the computational basis would collapse it to |0\rangle or |1\rangle — destroying the amplitudes you were trying to protect. The cure erases the patient.
Obstacle 3: quantum errors are continuous. Classical errors are discrete — a bit flips or it does not. Quantum errors are any unitary close to identity, which forms a three-dimensional real continuum on a single qubit (rotations on the Bloch sphere by any small angle about any axis). And that is before you count non-unitary noise like amplitude damping and depolarising. Even if you had a copy-and-majority-vote scheme, there would be infinitely many "small errors" to design against, and any finite syndrome would miss most of them.
Each obstacle on its own would sink the strategy. The three together rule out repetition-style quantum error correction completely.
And yet quantum error correction exists. Peter Shor (1995) and Andrew Steane (1996) independently found codes that sidestep all three obstacles — by replacing copying with entanglement, by making the error set discrete through measurement (the discretisation theorem — see why QEC is hard), and by measuring parities rather than individual bits. The simplest of these, the 3-qubit bit-flip code, is ch. 116. You will recognise the skeleton from this chapter — encode, error, decode — but every step is reinvented for quantum mechanics.
Worked examples
Example 1: failure probability of 3-bit repetition at p = 0.1
Compute the decoded failure probability of the 3-bit repetition code for a binary symmetric channel with per-bit flip probability p = 0.1. Compare to the uncoded failure probability.
Step 1. Write the formula.

P(\text{fail}) = P(k=2) + P(k=3) = 3p^2(1-p) + p^3 = 3p^2 - 2p^3.
Step 2. Substitute p = 0.1.

P(\text{fail}) = 3(0.1)^2(0.9) + (0.1)^3 = 0.027 + 0.001 = 0.028.
Step 3. Compare. Without the code, the per-bit failure probability is p = 0.1 = 10\%. With the code, the per-logical-bit failure probability is 0.028 = 2.8\%. The code has reduced the error by a factor of roughly 3.6 — more modest than the factor of about 30 you get at p = 0.01, because the suppression factor, roughly 1/(3p), shrinks as p grows.
Step 4. Check the rate cost. You sent 3 physical bits per logical bit, so the rate is 1/3. The Shannon capacity at p = 0.1 is C = 1 - H(0.1) \approx 1 - 0.469 = 0.531. The code achieves 0.333 logical bits per channel use — about 63\% of what Shannon says is achievable. A Hamming or LDPC code at the same rate would give dramatically lower error, and at higher rate would match the error rate with less overhead.
What this shows. The repetition code works in the quadratic-suppression regime p \ll 1; as p grows, the suppression factor shrinks. At p = 0.1 you still get a useful improvement; at p = 0.4 the improvement is marginal; at p = 0.5 it disappears. Any real channel must be far below the p = 0.5 threshold for repetition to help — the same qualitative story that will reappear for quantum codes in later chapters.
Example 2: Hamming [7,4] vs repetition
Compare the Hamming [7, 4] code to the 3-bit repetition code on a BSC with p = 0.01. Both correct a single bit-flip error. Which is more efficient?
Step 1. State the two codes. The 3-bit repetition code encodes 1 logical bit as 3 physical bits, correcting any single-bit error. The Hamming [7, 4] code encodes 4 logical bits as 7 physical bits, correcting any single-bit error in the block.
Step 2. Compute the rates.
- Repetition: rate R = 1/3 \approx 0.333.
- Hamming [7,4]: rate R = 4/7 \approx 0.571.
The Hamming code is roughly 1.7\times more efficient: for the same number of physical bits sent, Hamming carries 1.7\times as many logical bits.
Step 3. Compute the failure probabilities. The repetition code fails when \geq 2 of 3 bits flip:

P_{\text{rep}} = 3p^2(1-p) + p^3 = 3p^2 - 2p^3 \approx 3 \times 10^{-4} \quad \text{at } p = 0.01.
The Hamming [7,4] code corrects any single error in the block of 7 — so it fails when \geq 2 of 7 bits flip. That probability is

P_{\text{Ham}} = 1 - (1-p)^7 - 7p(1-p)^6 \approx \binom{7}{2} p^2 = 21p^2 \approx 2 \times 10^{-3} \quad \text{at } p = 0.01.
(This is the per-block failure; since a block carries 4 logical bits, the per-logical-bit failure is about 21 p^2 / 4 \approx 5 p^2 — still several times larger than repetition's 3p^2.)
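A small sketch that reproduces both block-failure numbers, assuming the BSC model and that each code corrects exactly one flip per block:

```python
from math import comb

def block_fail(n, p, t=1):
    """Probability of more than t flips in a block of n bits (the code corrects up to t)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t + 1, n + 1))

p = 0.01
p_rep = block_fail(3, p)   # 3-bit repetition: fails on >= 2 flips, ~3.0e-4
p_ham = block_fail(7, p)   # Hamming [7,4]: fails on >= 2 flips in the block, ~2.0e-3
print(f"repetition per-logical-bit failure: {p_rep:.2e}")
print(f"Hamming [7,4] per-block failure:    {p_ham:.2e}  (per logical bit ~ {p_ham / 4:.2e})")
```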
Step 4. Read off the trade-off. Repetition has lower failure probability, but also lower rate. Hamming has higher rate but higher failure probability. Which is better depends on what you need: if bandwidth is expensive, Hamming wins; if latency and reliability dominate and bandwidth is cheap, repetition is simpler to implement.
Step 5. The more important point: neither is close to Shannon capacity. C(0.01) = 0.919. Both codes leave a large capacity gap. Modern LDPC and polar codes close this gap — LDPC with rate 0.9 can achieve failure probability \lesssim 10^{-5} at p = 0.01, outperforming both schemes at roughly the rate of Hamming.
What this shows. Even in the classical world, repetition is far from optimal. It is the simplest code to understand, which is why it is the first one you meet and the first one we quantum-ise. Real classical systems use something like LDPC at rates close to capacity. The quantum analogues you will meet next — the bit-flip code, Shor's 9-qubit code, the surface code — likewise start simple and get progressively cleverer. But the first step, in both worlds, is repetition.
Why this matters. Every argument in the next three chapters uses the vocabulary of this one. "Encoding" means spreading a logical bit across n physical bits (or qubits). "Error" is a per-component fault. "Syndrome" is the classical read-out that tells you what error happened. "Decoding" is the rule that maps syndrome to correction. The classical repetition code puts every one of these in the simplest possible form. When you see the quantum bit-flip code replace copies with entanglement and direct reads with parity measurements, you will see clearly which pieces changed and which stayed the same.
Common confusions
- "Classical repetition is the best code." No. Repetition has rate 1/n, and Shannon's capacity theorem says you can achieve any rate below 1 - H(p) \approx 1 for small p. Hamming codes, Reed-Solomon, LDPC, polar codes all achieve rates far closer to 1 at similar error rates. Repetition is the textbook introduction, not the production technology.
- "n \to \infty repetition gives zero error on any channel." Only if p < 1/2. At p = 1/2 the channel is completely random and no code helps. Above p = 1/2, you can always relabel the output (0 \leftrightarrow 1) to make p' = 1 - p < 1/2, so the p > 1/2 regime is a labelling artefact. But at exactly p = 1/2, repetition (and every other code) fails.
- "Classical repetition can be used on qubits too, just in the computational basis." Only if the qubit is already known to be in a computational-basis state. For a qubit in superposition \alpha|0\rangle + \beta|1\rangle, you cannot copy it (no-cloning), and you cannot measure it without collapsing the amplitudes. The literal classical strategy fails at both steps. The quantum bit-flip code is not "repetition on qubits"; it is a different, quantum-native code that uses entanglement and parity measurement.
- "Majority vote is decoding." Majority vote is the decoder only for the repetition code with a symmetric channel. For other codes, decoding involves syndrome lookup and linear algebra over \mathbb F_2 (Hamming), algebraic root-finding (Reed-Solomon), iterative belief propagation (LDPC), or Viterbi-style dynamic programming (convolutional codes). The general decoder is whatever map from received word to codeword maximises the posterior probability — a rule fixed by the code's design, not something the receiver improvises.
- "The BSC is the right model for real channels." Usually an approximation. Real channels have burst errors (errors cluster in time), non-binary alphabets (symbols, not bits), memory (this error depends on the previous one), and non-symmetric noise (P(0 \to 1) \neq P(1 \to 0)). Shannon's framework extends to all of these, but the BSC is the pedagogical starting point. Real codes are designed against realistic channel models measured from hardware.
Going deeper
If you came for the 3-bit repetition code, the 3p^2 - 2p^3 failure formula, the Shannon capacity curve 1 - H(p), and the three quantum obstacles, you have the core of this chapter. The rest covers Shannon's noisy coding theorem in a little more depth, the Hamming distance and the design of block codes, LDPC and polar codes as modern production systems, and the CSS construction that carries classical codes into the quantum domain.
Shannon's noisy coding theorem — statement and sketch
Shannon (1948) proved two bounds, together called the noisy channel coding theorem. For a memoryless channel with capacity C bits per channel use:
- Achievability. For any rate R < C and any \epsilon > 0, there exists a block code of rate R and some large enough length n whose decoded error probability is below \epsilon.
- Converse. For any rate R > C, no code of any length can achieve arbitrarily small decoded error. The error probability is lower-bounded by a quantity depending on R - C.
The proof of achievability is a random coding argument: consider all codes of rate R and length n, show that the average error probability over random codes is small, and conclude that at least one good code exists. This proves existence but does not construct the code — finding good explicit codes has been a seventy-year industry.
The converse uses Fano's inequality, which bounds how much information a noisy observation can convey about a discrete variable. The proof is short, elegant, and widely covered in any information-theory textbook (Cover and Thomas is the standard).
Hamming distance and block codes
A block code of length n and dimension k (written [n, k]) maps k-bit messages into n-bit codewords. The Hamming distance between two codewords is the number of bit positions where they differ. A code of minimum Hamming distance d can detect up to d - 1 errors and correct up to \lfloor (d-1)/2 \rfloor errors.
The 3-bit repetition code is [3, 1, 3] — length 3, dimension 1, minimum distance 3. The Hamming [7, 4, 3] code has length 7, dimension 4, minimum distance 3. Both correct \lfloor 2/2 \rfloor = 1 error. The Hamming code uses its block length more efficiently because the codewords are algebraically constructed (as the kernel of a specific 3 \times 7 parity-check matrix).
The Singleton bound says k \leq n - d + 1 for any code. Codes meeting this bound with equality are called MDS (maximum-distance separable); Reed-Solomon codes are the most famous examples.
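To make the distance definitions concrete, here is a sketch that enumerates the codewords of both codes from a generator matrix and computes the minimum distance. The Hamming [7,4] generator matrix shown is one standard systematic choice (other equivalent forms exist); the helper names are local to this sketch.

```python
from itertools import product

def hamming_distance(a, b):
    """Number of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def codewords(generator):
    """All codewords of a binary linear code, given its k x n generator matrix (mod 2)."""
    k, n = len(generator), len(generator[0])
    return [tuple(sum(msg[i] * generator[i][j] for i in range(k)) % 2 for j in range(n))
            for msg in product((0, 1), repeat=k)]

def min_distance(words):
    """Minimum pairwise Hamming distance (equals minimum nonzero weight for linear codes)."""
    return min(hamming_distance(a, b) for a in words for b in words if a != b)

rep_G = [[1, 1, 1]]                          # [3, 1, 3] repetition code
ham_G = [[1, 0, 0, 0, 1, 1, 0],              # a systematic [7, 4, 3] Hamming generator
         [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]

for name, G in (("repetition [3,1]", rep_G), ("Hamming [7,4]", ham_G)):
    d = min_distance(codewords(G))
    print(f"{name}: minimum distance d = {d}, corrects {(d - 1) // 2} error(s)")
```

Both codes come out with d = 3, hence single-error correction, exactly as the [n, k, d] labels above claim.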
LDPC, polar, and the path to capacity
Modern codes — the ones in your LTE modem and Wi-Fi chip — approach Shannon capacity within a fraction of a decibel. Two families dominate:
Low-density parity-check (LDPC) codes, invented by Gallager (1963) and rediscovered in the 1990s, are block codes defined by a sparse parity-check matrix. Iterative belief-propagation decoding approaches capacity at long block lengths; 5G NR uses LDPC for its data channels.
Polar codes, invented by Arikan (2009), achieve capacity exactly in the limit of long blocks using a recursive "channel polarisation" transform. 5G NR uses polar codes for control channels. They are newer, simpler to analyse, and have a cleaner theoretical story than LDPC.
Turbo codes, invented by Berrou, Glavieux, and Thitimajshima (1993), interleave two convolutional codes and decode iteratively. They were the first codes to come within 0.5 dB of capacity and dominated 3G/4G until LDPC and polar overtook them.
CSS construction — classical codes become quantum codes
The Calderbank–Shor–Steane (CSS) construction, introduced in 1996 by the eponymous trio, takes a pair of compatible classical linear codes and produces a quantum code that corrects both X and Z errors. The rough idea: one classical code corrects bit-flip errors in the computational basis; its dual, a different classical code, corrects phase-flip errors in the Hadamard-rotated basis. Glue them together with a specific compatibility condition (the CSS condition) and you get a genuine quantum code.
The Steane [[7, 1, 3]] code — seven physical qubits encoding one logical qubit, minimum distance 3 — is built from the Hamming [7, 4] code via CSS. It corrects any single-qubit error. The surface code, the leading fault-tolerant candidate today, is also a CSS construction — built from two classical "toric" codes.
This is the deep connection: every good quantum error-correcting code has classical codes inside it. Understanding classical codes is not optional for the quantum engineer — it is foundational. The chapters that follow build the simplest quantum codes directly; the CSS framework comes back in ch. 120 and beyond.
Indian context — information theory in India
Information theory has a respectable Indian tradition, if a smaller footprint than in the US and Europe. Raj Chandra Bose (1901–1987), who built his early research career at the Indian Statistical Institute (ISI) before moving to the University of North Carolina, co-invented the Bose-Chaudhuri-Hocquenghem (BCH) codes with his student D. K. Ray-Chaudhuri in 1960 (Hocquenghem found them independently in 1959) — a foundational family of algebraic error-correcting codes still widely taught. BCH codes generalise Hamming codes, and Reed-Solomon codes are a special (non-binary) case of BCH. The ISI in Kolkata remains one of India's leading centres for information theory and coding.
Among contemporary institutions, IIT Bombay, IIT Kanpur, IIT Madras, and IISc Bangalore run active information-theory and coding-theory groups; research areas include LDPC code design, polar-code constructions, network coding, and the cross-over into quantum codes via the CSS construction. The Information Technology Research Academy (ITRA), funded in part by the Tata Trust, has sponsored information-theory fellowships and workshops at Indian universities since 2012.
For classroom exposure, the NPTEL video course Information Theory and Coding (run from IIT Bombay) is freely available online — a fine place to deepen the material in this chapter.
Where this leads next
- Why QEC is hard — the three quantum obstacles that make repetition fail.
- No-cloning theorem — the theorem forbidding quantum copies.
- 3-qubit bit-flip code — the quantum analogue of classical repetition, using entanglement and parity.
- 3-qubit phase-flip code — the Hadamard-rotated version, handling Z errors.
- Shor 9-qubit code — the first code to correct any single-qubit error.
- Stabilizer formalism intro — the unifying framework for most quantum codes, and how it generalises parity-check matrices.
References
- Claude E. Shannon, A Mathematical Theory of Communication (1948), Bell System Technical Journal — Wikipedia entry with links.
- Wikipedia, Repetition code — overview, failure analysis, and variants.
- Wikipedia, Hamming code — the [7, 4] code and its generalisations.
- Wikipedia, Noisy-channel coding theorem — Shannon's theorem with references to Cover and Thomas.
- John Preskill, Lecture Notes on Quantum Computation, Ch. 7 (quantum error correction, with classical repetition as prelude) — theory.caltech.edu/~preskill/ph229.
- Wikipedia, Low-density parity-check code — modern codes that approach Shannon capacity.