In short
The addition theorem says P(A \cup B) = P(A) + P(B) - P(A \cap B). The subtraction corrects for the overlap being counted twice. For three events, a second correction layer is needed. For n events, the pattern generalises into the inclusion-exclusion principle — add singles, subtract pairs, add triples, subtract quadruples, and so on.
A standard deck has 52 cards. You draw one at random. What is the probability that the card is a heart or a king?
Your first instinct might be: there are 13 hearts and 4 kings, so P = 17/52. But that is wrong. The king of hearts is both a heart and a king, and you just counted it twice — once among the 13 hearts and once among the 4 kings.
The correct count: 13 + 4 - 1 = 16 favourable cards (subtracting the one card that was double-counted), so P = 16/52 = 4/13.
That subtraction — removing the overlap to fix the double-count — is the entire content of the addition theorem. The formula works for every probability problem where you ask "what is the chance that this or that happens?" and the two events might overlap.
The formula and its proof
Here is the theorem, stated precisely.
Addition Theorem (two events)
For any two events A and B,
The proof works by splitting the union into disjoint pieces, then reassembling.
Proof. The union A \cup B can be split into three mutually exclusive parts:
These three sets are pairwise disjoint — no outcome belongs to more than one of them. By Axiom 3 (countable additivity):
Now, A itself is the disjoint union A = (A \cap B') \cup (A \cap B), so:
Similarly, B = (A' \cap B) \cup (A \cap B), so:
Substitute (2) and (3) into (1):
That's the proof — clean and complete. The subtraction -P(A \cap B) at the end is doing exactly one job: it removes the double-count of the overlap.
The special case: mutually exclusive events
When A and B are mutually exclusive (A \cap B = \varnothing), the overlap is empty, so P(A \cap B) = 0. The formula simplifies to:
This is just Axiom 3 for two events. The addition theorem generalises the axiom to events that are not mutually exclusive.
Three events: a second layer of correction
What if you have three events — A, B, and C — and you want P(A \cup B \cup C)?
The same double-counting problem occurs, but now it has more layers.
Addition Theorem (three events)
For any three events A, B, and C,
Proof. Write A \cup B \cup C = (A \cup B) \cup C and apply the two-event formula with the events D = A \cup B and C:
Now expand each piece.
For P(D) = P(A \cup B), use the two-event formula:
For P(D \cap C) = P((A \cup B) \cap C), distribute the intersection over the union:
Apply the two-event formula again:
Substitute (4) and (5) back:
Notice the pattern: add the singles, subtract the pairs, add back the triple. Each layer corrects for the overcounting introduced by the previous layer.
Why the triple overlap gets added
Here is a careful count of how many times the central region A \cap B \cap C appears in each step:
| Step | Contribution | Count of A \cap B \cap C |
|---|---|---|
| +P(A) | included in A | +1 |
| +P(B) | included in B | +1 |
| +P(C) | included in C | +1 |
| -P(A \cap B) | included in A \cap B | -1 |
| -P(A \cap C) | included in A \cap C | -1 |
| -P(B \cap C) | included in B \cap C | -1 |
| Running total | 0 |
After adding singles and subtracting pairs, the centre has been counted 3 - 3 = 0 times. It has been subtracted away entirely. The final +P(A \cap B \cap C) puts it back exactly once.
The general principle: inclusion-exclusion
The pattern continues. For n events A_1, A_2, \ldots, A_n:
Inclusion-Exclusion Principle
The signs alternate: add singles, subtract pairs, add triples, subtract quadruples, and so on. At each level, you are correcting for the overcounting (or undercounting) introduced by the previous level.
For n = 2: one positive term, one negative term. For n = 3: three positive, three negative, one positive. For n = 4: four positive, six negative, four positive, one negative — the coefficients follow the binomial pattern \binom{n}{1}, \binom{n}{2}, \binom{n}{3}, \ldots
The proof is by induction on n. The base case n = 2 is the two-event addition theorem, already proved. The inductive step uses the same "group and apply the two-event formula" strategy used in the three-event proof — writing A_1 \cup \cdots \cup A_n = (A_1 \cup \cdots \cup A_{n-1}) \cup A_n, expanding, and distributing intersections over unions.
Seeing it with numbers
Example 1: Hearts or kings from a deck
A standard 52-card deck. One card is drawn. Let A = "the card is a heart" and B = "the card is a king." Find P(A \cup B).
Step 1. Find P(A). There are 13 hearts in a 52-card deck:
Why: one of the four suits is hearts, so exactly a quarter of the deck.
Step 2. Find P(B). There are 4 kings:
Why: one king per suit, four suits.
Step 3. Find P(A \cap B). The overlap is the king of hearts — one card:
Why: there is exactly one card that is both a heart and a king.
Step 4. Apply the addition theorem:
Why: without the subtraction, you would get 17/52, which overcounts by one — the king of hearts.
Result: P(\text{heart or king}) = \dfrac{4}{13} \approx 0.308.
The diagram makes the double-counting visible: the single card sitting in the overlap gets pulled into both circles. The subtraction in the formula is just the diagram talking.
Example 2: Three overlapping events with a student survey
In a class of 60 students: 35 play cricket (A), 25 play football (B), 20 play basketball (C). Also, 12 play both cricket and football, 8 play both cricket and basketball, 10 play both football and basketball, and 5 play all three. Find the probability that a randomly chosen student plays at least one sport.
Step 1. Write down the given information as probabilities (each divided by 60):
Why: each count divided by 60 (the total) gives the probability under equally likely selection.
Step 2. Write the pairwise and triple overlaps:
Why: the problem gives these directly. These are the correction terms in the three-event formula.
Step 3. Apply the three-event addition theorem:
Why: add singles (80), subtract pairs (30), add triple (5). Each layer corrects the over- or under-counting of the previous layer.
Step 4. Interpret. Out of 60 students, 55 play at least one sport. Only 5 play none.
Result: P(\text{at least one sport}) = \dfrac{11}{12} \approx 0.917.
Notice that you did not need to compute the individual region counts (the numbers inside each zone of the Venn diagram) to get the final answer. The inclusion-exclusion formula gave you 55 directly. The region counts are useful for more detailed questions ("how many play only cricket?"), but for "at least one sport" the formula does all the work.
Common confusions
-
"P(A \cup B) = P(A) + P(B) always." Only when A \cap B = \varnothing. If the events can both happen at once, you need the full formula with the subtraction. This is the single most common probability error in exams.
-
"Inclusion-exclusion is only for probability." The principle works for counting (number of elements in a union of sets), for probability, and even for sums. The algebraic structure is identical: the signs alternate and the terms involve intersections of increasing size.
-
"For three events, you subtract the triple overlap." You add it. After adding singles and subtracting pairs, the triple overlap has been counted 3 - 3 = 0 times — removed entirely. The +P(A \cap B \cap C) puts it back once.
-
"If P(A \cup B) = 1, then A and B are exhaustive." True — exhaustive means A \cup B = S, which forces P(A \cup B) = P(S) = 1. But be careful with the converse: P(A \cup B) = 1 does not mean A and B are mutually exclusive. They could overlap and still cover all of S.
Going deeper
If you came here to learn the addition theorem for two and three events, you have it — you can stop here. The rest is for readers who want to see inclusion-exclusion proved in general and applied to a classic counting problem.
The Bonferroni inequalities
Inclusion-exclusion gives the exact value of P(A_1 \cup \cdots \cup A_n). But the formula has 2^n - 1 terms, which grows fast. The Bonferroni inequalities say that if you stop the alternating sum at any stage, you get a bound:
- Stop after adding singles: you get an upper bound.
- Stop after subtracting pairs: you get a lower bound.
- Stop after adding triples: you get an upper bound again.
In general, stopping after an odd-indexed layer gives an upper bound, and stopping after an even-indexed layer gives a lower bound. These bounds are useful when computing all 2^n - 1 terms is impractical but you still want an estimate.
A classic application: the matching problem (derangements)
How many permutations of \{1, 2, \ldots, n\} have no fixed point — that is, \sigma(i) \neq i for every i? These are called derangements, and inclusion-exclusion gives the answer elegantly.
Let A_i be the event that position i is a fixed point. Then the number of permutations with at least one fixed point is |A_1 \cup \cdots \cup A_n|, computed by inclusion-exclusion:
Each term \binom{n}{k}(n-k)! counts the permutations that fix a specific set of k positions (the remaining n - k elements can go anywhere). The number of derangements is:
As n grows, D_n/n! \to 1/e \approx 0.3679. About 36.8\% of all permutations are derangements — a fact that surprises most people and is a direct consequence of inclusion-exclusion.
Where this leads next
The addition theorem is the first major result built on the axioms. The next articles extend the framework in new directions.
- Conditional Probability — what happens when you restrict attention to a smaller sample space because you learned that some event has occurred.
- Independent Events — when the addition theorem simplifies because knowing A tells you nothing about B.
- Inclusion-Exclusion Principle — the full combinatorial version, with applications to counting problems.
- Bayes' Theorem — how to reverse a conditional probability, building on both the addition and multiplication theorems.
- Random Variables - Discrete — attaching numbers to outcomes, the bridge from probability to statistics.