There is a survey question every textbook uses: 60 students like cricket, 40 like football, 20 like both — how many like at least one sport? A student's first instinct is 60 + 40 = 100. That answer is wrong, even though almost every instinct in arithmetic supports it. The correct answer is 80, and understanding why settles the most common misconception in the whole sets chapter.

The rule is |A \cup B| = |A| + |B| - |A \cap B|, not just |A| + |B|. The subtraction is there to fix a double-count that the addition introduces. Drop it and you overstate the union by the size of the overlap, every time.

Where the double-count comes from

When you write |A|, you count every element in A once — including the elements that also live in B. When you then add |B|, you count every element in B once — including those same shared elements. The overlap A \cap B gets counted in both totals.

So the sum |A| + |B| is not counting |A \cup B|. It is counting |A \cup B| plus an extra copy of |A \cap B|. Subtracting |A \cap B| cancels the extra copy.

|A \cup B| = |A| + |B| - |A \cap B|

This is the inclusion-exclusion principle for two sets, and the minus sign is the whole story.
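The two-set rule fits in a few lines of Python. The helper name `union_size` is hypothetical and used only for illustration. The sketch assumes finite sets whose three counts are already known. The range check rejects impossible inputs, because an overlap can never be larger than either set.

```python
def union_size(size_a: int, size_b: int, size_both: int) -> int:
    """Two-set inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    # The overlap is a subset of both A and B, so it cannot exceed either count.
    if not 0 <= size_both <= min(size_a, size_b):
        raise ValueError("overlap must be between 0 and min(|A|, |B|)")
    return size_a + size_b - size_both  # the minus sign undoes the double count

print(union_size(60, 40, 20))  # 80
print(union_size(60, 40, 0))   # 100, correct only when the sets are disjoint
```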

Why the overlap gets counted twice: any element in the overlap is, by definition, a member of both A and B. The total |A| has a "+1" for each such element, and |B| has another "+1." Without correction, every overlap element contributes 2 to |A| + |B|, even though it should contribute 1 to |A \cup B|.

The cricket-football example, done correctly

|A| = 60 (cricket), |B| = 40 (football), |A \cap B| = 20 (both).

|A \cup B| = 60 + 40 - 20 = 80.

The 20 students who like both sports were counted once among the 60 cricket fans and again among the 40 football fans. That is why the raw sum 100 overshoots by exactly 20 — the size of the double-counted overlap.

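Region-by-region sanity check: only cricket = 60 - 20 = 40, only football = 40 - 20 = 20, both = 20. These three regions do not overlap, so they simply add: 40 + 20 + 20 = 80, which matches the formula.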
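The region split can also be computed from the three survey counts. The sketch below assumes the cricket/football numbers from this section, and the function name `regions` is a hypothetical label. Each region is counted exactly once, so the regions add up to the union with nothing left to subtract.

```python
def regions(size_a: int, size_b: int, size_both: int) -> dict:
    """Split A ∪ B into its three non-overlapping Venn regions."""
    return {
        "only A": size_a - size_both,  # in A but not in B
        "only B": size_b - size_both,  # in B but not in A
        "both": size_both,             # the lens, counted once here
    }

r = regions(60, 40, 20)  # cricket, football, both
print(r)                 # {'only A': 40, 'only B': 20, 'both': 20}
print(sum(r.values()))   # 80
```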

When the wrong formula happens to work

The equation |A \cup B| = |A| + |B| is not wrong everywhere. It holds exactly when A \cap B = \varnothing — that is, when the two sets are disjoint. In that case the overlap is empty, |A \cap B| = 0, and the correction term vanishes.

A \cap B = \varnothing \implies |A \cup B| = |A| + |B|.

Two sets with no common element are called disjoint. For disjoint sets, the misconception is the correct formula. For every other pair, it overstates the union.
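A quick Python check shows the naïve sum is exact for a disjoint pair and too large otherwise. The three example sets are chosen only for illustration. `set.isdisjoint` is the standard Python set method.

```python
def naive_sum_is_exact(a: set, b: set) -> bool:
    """True when |A| + |B| equals |A ∪ B|, i.e. when nothing is double-counted."""
    return len(a) + len(b) == len(a | b)

odds = {1, 3, 5, 7}
evens = {2, 4, 6, 8}
primes = {2, 3, 5, 7}

# The naive sum is exact precisely for the disjoint pair.
print(odds.isdisjoint(evens), naive_sum_is_exact(odds, evens))    # True True
print(odds.isdisjoint(primes), naive_sum_is_exact(odds, primes))  # False False
```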

The trap is that students sometimes do not realise their two sets overlap. The overlap might be implicit in the problem: students who play cricket may also play football, numbers can be both even and divisible by 3, and days can be both weekends and public holidays. Whenever the sets are defined by separate conditions that can hold at the same time, there is usually an overlap to subtract.
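Here is the "even and divisible by 3" case. A number in both sets is a multiple of 6, and that hidden overlap is easy to miss. The universe 1 to 100 is an assumption made only for illustration.

```python
U = range(1, 101)                    # illustrative universe: 1..100
even = {n for n in U if n % 2 == 0}
div3 = {n for n in U if n % 3 == 0}

print(len(even), len(div3))          # 50 33
print(len(even & div3))              # 16, the multiples of 6
print(len(even) + len(div3))         # 83, naive sum, too big
print(len(even | div3))              # 67 = 50 + 33 - 16
```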

A Venn picture of the double count

[Figure: Venn diagram of two overlapping circles A and B inside the universe U. Each crescent is counted once (in |A| or in |B|), the central overlap is counted twice (in |A| and |B|), and subtracting |A ∩ B| fixes the double count.]
The central lens is counted *twice* when you compute $|A| + |B|$ — once by $|A|$ and once by $|B|$. Every element of the overlap contributes $2$ to the sum but should contribute $1$ to the union. Subtracting $|A \cap B|$ removes the extra copy exactly, restoring the correct count.

The crescents are each counted once; the lens is counted twice. That is the entire diagnosis. Subtract one copy of the lens and you are left with the true union.
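The crescent/lens diagnosis can be checked element by element. The sketch records how many times `|A| + |B|` counts each element, using the standard `collections.Counter`. The small sets A and B are illustrative assumptions.

```python
from collections import Counter

def multiplicities(a: set, b: set) -> Counter:
    """How many times each element of A ∪ B is counted by |A| + |B|."""
    counts = Counter()
    counts.update(a)  # one "+1" per element of A
    counts.update(b)  # one "+1" per element of B
    return counts

A = {1, 2, 3}
B = {3, 4}
m = multiplicities(A, B)
crescents = sorted(x for x, k in m.items() if k == 1)
lens = sorted(x for x, k in m.items() if k == 2)
print(crescents, lens)               # [1, 2, 4] [3]
print(sum(m.values()) - len(lens))   # 4, which equals len(A | B)
```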

A smaller numerical walk-through

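Let A = \{1, 2, 3, 4, 5\} and B = \{4, 5, 6, 7, 8\}. Then |A| = 5 and |B| = 5. The overlap is A \cap B = \{4, 5\}, so |A \cap B| = 2, and the union is A \cup B = \{1, 2, 3, 4, 5, 6, 7, 8\}, so |A \cup B| = 8. The naïve sum is 5 + 5 = 10, while inclusion-exclusion gives 5 + 5 - 2 = 8, matching the direct count.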

The gap between the naïve sum and the true union is always equal to the overlap count. It is not a coincidence; it is the formula at work.

Why: each element in the overlap contributes an "extra" +1 to the sum |A|+|B|. That extra contribution accumulates to exactly |A \cap B|, which is the gap between the raw sum and the true union size.
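The walk-through can be reproduced in Python, and the "always" claim can be tested exhaustively on a small universe. The universe {0, ..., 5} is an arbitrary choice for illustration. The sketch builds all 64 of its subsets with `itertools.combinations` and checks the gap for every pair.

```python
from itertools import combinations

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
gap = (len(A) + len(B)) - len(A | B)
print(gap, sorted(A & B))   # 2 [4, 5]

# Exhaustive check: gap == overlap for every pair of subsets of a small universe.
universe = range(6)
subsets = [set(c) for r in range(len(universe) + 1)
           for c in combinations(universe, r)]
for x in subsets:
    for y in subsets:
        assert (len(x) + len(y)) - len(x | y) == len(x & y)
print(len(subsets) ** 2, "pairs checked")   # 4096 pairs checked
```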

A JEE-style survey problem

In a class of $100$ students, $70$ like maths, $60$ like physics, and $30$ like both. How many like at least one of the two subjects? We try the wrong formula first, then the right one.

Wrong first attempt. |M \cup P| = 70 + 60 = 130. But there are only 100 students — the answer cannot exceed the class size. Something is off.

Where the error shows up. The 30 students who like both maths and physics are counted once in the 70 and once in the 60. Total double-counted: 30. The naïve sum 130 is exactly 30 too high.

Right formula. Apply inclusion-exclusion.

|M \cup P| = 70 + 60 - 30 = 100

Why: add the two single counts, then subtract the overlap once to undo the double-count. 70 + 60 = 130 includes the overlap twice; minus 30 corrects it to 100.

Consequence. Every student in the class likes at least one of maths or physics — no one likes neither. The number liking neither is 100 - 100 = 0.

Result. |M \cup P| = 100. The naïve sum was 130; subtracting the overlap of 30 brings it down to the correct figure.
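Both attempts can be set side by side. This is a minimal sketch: the function name `survey` and its dictionary keys are hypothetical. It computes the "neither" count from the naïve union and from the correct one.

```python
def survey(total: int, maths: int, physics: int, both: int) -> dict:
    naive = maths + physics          # counts every "both" student twice
    union = maths + physics - both   # inclusion-exclusion
    return {
        "naive union": naive,
        "union": union,
        "neither (naive)": total - naive,
        "neither": total - union,
    }

print(survey(100, 70, 60, 30))
# {'naive union': 130, 'union': 100, 'neither (naive)': -30, 'neither': 0}
```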

Three symptoms that you are applying the wrong formula

Watch for any of these in your working:

  1. Your union size exceeds the universe. If you compute |A \cup B| > |U|, you have almost certainly forgotten to subtract the overlap. The union is a subset of U and cannot contain more elements than U does.
  2. The "neither" count comes out negative. |(A \cup B)'| = |U| - |A \cup B|. If the naïve union size exceeds |U|, the "neither" number drops below 0, an impossibility that flags the double-count.
  3. The four Venn regions don't add to |U|. After filling in "only A," "only B," "both," and "neither," the four counts should sum to the universe size. If they overshoot, the overlap has probably been counted in "only A" or "only B" as well as in "both" (for example, writing |A| instead of |A| - |A \cap B| for "only A").

Any of these symptoms is the same bug wearing a different shirt.
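The three symptoms can be bundled into one checker. This is a hypothetical sketch: the function `symptoms` and its parameters are illustrative. It assumes the student has written down a union size and all four region counts, and it uses the maths/physics class from the JEE-style problem.

```python
def symptoms(universe: int, union: int, only_a: int, only_b: int,
             both: int, neither: int) -> list:
    """Return which of the three warning signs a piece of working shows."""
    found = []
    if union > universe:
        found.append("union exceeds the universe")
    if neither < 0:
        found.append("'neither' count is negative")
    if only_a + only_b + both + neither != universe:
        found.append("the four regions do not add to |U|")
    return found

# Wrong working: overlap never subtracted, so "only" regions use the full totals.
print(symptoms(100, 130, 70, 60, 30, -30))   # all three warnings
# Correct working: only maths 40, only physics 30, both 30, neither 0.
print(symptoms(100, 100, 40, 30, 30, 0))     # []
```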

The generalisation

For three sets, the formula becomes

|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|.

The pattern alternates: add the singles, subtract the pairs, add the triple. The reason is the same as before. Each pairwise overlap was counted twice when summing the singles, so one copy is subtracted. An element in the triple overlap was counted three times among the singles and then subtracted three times among the pairs, leaving it counted zero times, so it is added back once. Inclusion-exclusion is the principled way to keep track of these compensations.
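A brute-force comparison against Python's own three-way union is one way to check the three-set formula. The function name `union3` and the random test data are assumptions made for illustration.

```python
import random

def union3(a: set, b: set, c: set) -> int:
    """Three-set inclusion-exclusion: singles - pairs + triple."""
    singles = len(a) + len(b) + len(c)
    pairs = len(a & b) + len(a & c) + len(b & c)
    triple = len(a & b & c)
    return singles - pairs + triple

random.seed(1)
for trial in range(1000):
    a, b, c = (set(random.sample(range(30), random.randint(0, 20)))
               for _ in range(3))
    assert union3(a, b, c) == len(a | b | c)
print("three-set formula agrees with len(a | b | c)")
```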

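For n sets, the formula has 2^n - 1 terms, one for each non-empty sub-collection of the sets: intersections of an odd number of sets are added and intersections of an even number are subtracted. The two-set version is just the simplest case.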
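The general rule can be written as a loop over every non-empty sub-collection of the sets. The sketch counts the terms as it goes, which makes the 2^n - 1 term count visible. The function name `union_size_ie` and the four example sets are hypothetical. `itertools.combinations` and `set.intersection` are standard Python.

```python
from itertools import combinations

def union_size_ie(sets: list) -> tuple:
    """General inclusion-exclusion; returns (union size, number of terms).

    A sub-collection of k sets contributes +|intersection| when k is odd
    and -|intersection| when k is even.
    """
    total, terms = 0, 0
    for k in range(1, len(sets) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
            terms += 1
    return total, terms

S = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {5, 6}]
print(union_size_ie(S))       # (6, 15): 15 = 2**4 - 1 terms
print(len(set().union(*S)))   # 6
```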

The one-line mnemonic

Say it out loud once, and the formula lodges:

Add the singles, subtract the double-count.

That is |A \cup B| = |A| + |B| - |A \cap B| in six words. You will not forget the minus sign again.
