In short
Two events A and B are independent if knowing that one has happened does not change the probability of the other. Equivalently, they satisfy the product rule P(A \cap B) = P(A) \cdot P(B). Independence is not the same as mutual exclusivity — two mutually exclusive events with positive probability are in fact as dependent as possible: knowing one happened tells you the other definitely did not.
Toss a fair coin, then toss it again. Does the result of the first toss affect the result of the second? Physically speaking: no. The coin has no memory, the second toss is entirely determined by the flick of your thumb and the air it passes through, and what happened the previous time is invisible to the physics. If the first toss was heads, the probability of the second being heads is still \dfrac{1}{2}. If the first toss was tails, same answer: still \dfrac{1}{2}. The first result tells you absolutely nothing about the second.
Now draw a card from a shuffled deck and, without replacing it, draw a second card. Does the first draw affect the second? Yes, obviously. If the first card was the ace of spades, the second draw is now from a deck of 51 cards that no longer contains the ace of spades — the probability of the second card being that same ace is zero, not \dfrac{1}{52}. The first draw changed what you know about the second draw.
These two scenarios illustrate the core distinction of this article. In the coin toss case, the two events are independent — one tells you nothing about the other. In the card case, they are dependent — one changes the probability of the other. Independence is the mathematical formalisation of "knowing this doesn't help me predict that," and it is one of the most powerful simplifying assumptions in probability. Every coin toss, every lottery draw, every newly observed error in a machine learning dataset is assumed to be independent of the others — and the moment you accept independence, the algebra of probabilities collapses from an intimidating web of conditionals into clean multiplication.
Building up from conditional probability
You already know conditional probability — the probability of A given that B has occurred, written P(A \mid B). The most natural way to define independence is to ask: "for which pairs (A, B) does the conditional probability equal the unconditional probability?"
That is: for which pairs does

P(A \mid B) = P(A)?
When this holds, knowing B has occurred leaves P(A) unchanged. Your probability estimate for A is exactly the same whether or not you have heard anything about B. This is the precise sense in which B tells you nothing about A.
Using the definition P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}, the condition P(A \mid B) = P(A) rearranges to

P(A \cap B) = P(A) \cdot P(B).
So "knowing B doesn't affect A" is equivalent to the algebraic condition P(A \cap B) = P(A) \cdot P(B). This second form is called the product rule, and it is the formal definition of independence — because it is symmetric in A and B, and because it handles the edge case P(B) = 0 (where the conditional form is undefined) without complication.
The definition
Independent events
Two events A and B in a sample space S are independent if and only if

P(A \cap B) = P(A) \cdot P(B).
When both events have positive probability, this is equivalent to P(A \mid B) = P(A) and also to P(B \mid A) = P(B) — "neither event affects the probability of the other."
Three things to notice.
First, the definition is symmetric. If A is independent of B, then B is independent of A — there is no "causal" direction here, just an algebraic condition on the joint and marginal probabilities.
Second, independence is a property of the probability function, not of the events themselves. Two events can be independent under one probability model and dependent under another. "Tossing heads" and "rain tomorrow" are independent if the coin is in Mumbai and the weather has nothing to do with it, but if the coin is somehow rigged to the barometer, they are not. Whether the product rule holds depends entirely on the numbers you feed in.
Third, the definition is the conclusion, not the method. In problems, you are rarely given the joint probability and asked to check independence. Instead, you are told (or can reasonably assume) that two events are independent, and you use the product rule to compute P(A \cap B). The direction of use is usually: "given that these two events are physically unrelated, compute the probability that both happen."
A first example, to anchor the formula
Roll a fair die twice. Let A = "the first roll is a 5" and B = "the second roll is an even number." Are these independent? What is P(A \cap B)?
The sample space is the set of 36 ordered pairs (i, j) with i, j \in \{1, 2, 3, 4, 5, 6\}, all equally likely. Count:
- P(A): the first die is 5 — six outcomes (5, 1), (5, 2), \ldots, (5, 6) out of 36, so P(A) = \dfrac{6}{36} = \dfrac{1}{6}.
- P(B): the second die is even — 18 outcomes out of 36, so P(B) = \dfrac{18}{36} = \dfrac{1}{2}.
- P(A \cap B): first die is 5 and second die is even — three outcomes (5, 2), (5, 4), (5, 6), so P(A \cap B) = \dfrac{3}{36} = \dfrac{1}{12}.
Check the product rule: P(A) \cdot P(B) = \dfrac{1}{6} \cdot \dfrac{1}{2} = \dfrac{1}{12}. This equals P(A \cap B), so the two events are independent. The product rule didn't just check out — it was built into the structure of rolling two dice separately, because each die is rolled in physical isolation from the other.
Picture the 36 outcomes as a 6 \times 6 grid, with the first roll indexing rows and the second indexing columns, and you can see what independence looks like: event A is a row, event B is a union of columns, and their intersection is precisely the rectangle where the row meets the columns. The fraction of the whole grid that the intersection covers is the product of the fractions covered by the row and by the columns separately. Independence is rectangular structure.
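The counting argument above can be checked exhaustively. This is a minimal sketch in Python (the `prob` helper is ours, not a standard function): it enumerates all 36 outcomes with exact fractions and confirms the product rule.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two die rolls.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 5       # first roll is a 5
B = lambda o: o[1] % 2 == 0   # second roll is even

p_A = prob(A)                          # 1/6
p_B = prob(B)                          # 1/2
p_AB = prob(lambda o: A(o) and B(o))   # 1/12

# The product rule holds exactly, so A and B are independent.
print(p_A, p_B, p_AB, p_AB == p_A * p_B)  # → 1/6 1/2 1/12 True
```

Using exact `Fraction` arithmetic matters here: checking the product rule with floating-point numbers can fail on equality even when independence holds.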
Multiple independent events
What about three or more events? The natural guess is that independence should mean "all of them hold simultaneously with probability equal to the product of the individual probabilities." And that is part of it, but the full definition is slightly stronger:
Mutual independence
The events A_1, A_2, \ldots, A_n are mutually independent (or simply "independent") if, for every subset \{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\} of these events,

P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdot P(A_{i_2}) \cdots P(A_{i_k}).
The product rule must hold for every sub-collection, not just the full collection.
For three events A, B, C, this means all of the following must hold:
- P(A \cap B) = P(A) P(B)
- P(A \cap C) = P(A) P(C)
- P(B \cap C) = P(B) P(C)
- P(A \cap B \cap C) = P(A) P(B) P(C)
Any three of these together do not imply the fourth. This is sometimes surprising: you can construct examples where every pair of events is independent (pairwise independence) but the triple is not mutually independent. Such examples are somewhat contrived — in most practical problems, the physical situation makes mutual independence a natural assumption — but the formal definition has to rule them out.
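A classic construction of this kind (often attributed to Bernstein) uses two fair coin tosses. The sketch below, with our own `prob` helper, verifies that all three pairwise product rules hold while the triple product rule fails.

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses: four equally likely outcomes.
outcomes = list(product("HT", repeat=2))

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == "H"    # first toss is heads
B = lambda o: o[1] == "H"    # second toss is heads
C = lambda o: o[0] == o[1]   # the two tosses match

# Every pair satisfies the product rule ...
assert prob(lambda o: A(o) and B(o)) == prob(A) * prob(B)
assert prob(lambda o: A(o) and C(o)) == prob(A) * prob(C)
assert prob(lambda o: B(o) and C(o)) == prob(B) * prob(C)

# ... but the triple does not: P(A and B and C) = 1/4, not 1/8.
p_ABC = prob(lambda o: A(o) and B(o) and C(o))
assert p_ABC == Fraction(1, 4)
assert p_ABC != prob(A) * prob(B) * prob(C)
```

The intuition: any one of A, B, C tells you nothing about any other, but A and B together determine C completely.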
For independent tosses of a coin, independent rolls of a die, independent draws with replacement from a deck, mutual independence is automatic. So in most problems you can just use

P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2) \cdots P(A_n).
This is the workhorse formula. It turns long sequences of independent trials into one multiplication.
Example 1: five heads in a row
A fair coin is tossed five times. Find the probability that every single toss lands heads.
Step 1. Name the events. Let H_i = "the i-th toss is heads," for i = 1, 2, 3, 4, 5. Each has probability P(H_i) = \dfrac{1}{2}.
Why: write down one event per toss so you can treat the tosses one at a time.
Step 2. Recognise the independence. The five tosses are physically separate — the outcome of any one toss cannot affect any other. So the events H_1, H_2, H_3, H_4, H_5 are mutually independent.
Why: independence is a modelling assumption. You are assuming that the coin has no memory, that the tosses do not interact. This is the standard model for independent Bernoulli trials.
Step 3. Apply the product rule:

P(H_1 \cap H_2 \cap H_3 \cap H_4 \cap H_5) = P(H_1) \cdot P(H_2) \cdot P(H_3) \cdot P(H_4) \cdot P(H_5).

Each factor is \dfrac{1}{2}, so

P(\text{all heads}) = \left(\dfrac{1}{2}\right)^5 = \dfrac{1}{32}.
Why: five independent events with equal probability turn a complicated joint into one power of the single-event probability. This is the whole reason independence is so useful — it collapses products of many different numbers into one simple expression.
Step 4. Sanity check by counting. There are 2^5 = 32 equally likely outcomes of five tosses. Exactly one of them is HHHHH, so by classical probability P(\text{all heads}) = \dfrac{1}{32}. Matches.
Result: The probability of five heads in a row is \dfrac{1}{32} \approx 0.031 — about 3\%.
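A simulation makes the same point empirically. This sketch (plain Python, seed and trial count chosen by us) tosses five fair coins many times and compares the observed frequency of all-heads runs to the exact value \dfrac{1}{32}.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

trials = 200_000
hits = 0
for _ in range(trials):
    # Five independent fair tosses; True stands for heads.
    tosses = [random.random() < 0.5 for _ in range(5)]
    if all(tosses):
        hits += 1

estimate = hits / trials
print(f"estimated P(five heads) = {estimate:.4f}, exact = {1/32:.4f}")
```

With 200,000 trials the estimate typically lands within a few thousandths of 0.03125.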
Example 2: a two-component system
A machine has two components, A and B, that operate independently. Component A works with probability 0.9, and component B works with probability 0.8. The machine works if both components work. Find:
(a) the probability that the machine works, (b) the probability that the machine fails, and (c) the probability that at least one component works.
Step 1. Name the events. Let A = "component A works," and B = "component B works." You are told P(A) = 0.9, P(B) = 0.8, and that A and B are independent.
Step 2. Probability the machine works = P(A \cap B). By the product rule (independence):

P(A \cap B) = P(A) \cdot P(B) = 0.9 \cdot 0.8 = 0.72.
Why: the machine works only if both components work — that is an intersection. Independence lets you multiply.
Step 3. Probability the machine fails = 1 - P(A \cap B) = 1 - 0.72 = 0.28.
Why: "machine fails" is the complement of "machine works," and the complement rule always applies: probabilities of an event and its complement sum to one.
Step 4. Probability at least one component works = P(A \cup B). Use the addition theorem:

P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.9 + 0.8 - 0.72 = 0.98.
Alternatively, compute via the complement: "at least one works" is the opposite of "both fail," and "both fail" has probability P(A^c) \cdot P(B^c) = 0.1 \cdot 0.2 = 0.02 (because the failure events are also independent — if A and B are independent, so are A^c and B^c). So P(A \cup B) = 1 - 0.02 = 0.98. Same answer, different route.
Result: (a) 0.72, (b) 0.28, (c) 0.98.
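The three computations take only a few lines; this sketch (variable names ours) also checks that the addition-theorem route and the complement route give the same answer, using `math.isclose` because floating-point products are not exact.

```python
import math

p_a, p_b = 0.9, 0.8   # given reliabilities of the two components

works = p_a * p_b                            # both work (product rule)
fails = 1 - works                            # complement of "machine works"
at_least_one = p_a + p_b - works             # addition theorem
via_complement = 1 - (1 - p_a) * (1 - p_b)   # 1 - P(both fail)

assert math.isclose(works, 0.72)
assert math.isclose(fails, 0.28)
assert math.isclose(at_least_one, 0.98)
assert math.isclose(at_least_one, via_complement)
```

Note that the complement route relies on A^c and B^c being independent whenever A and B are — a fact proved in the Common confusions section below.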
Independent versus mutually exclusive — the single most common confusion
Students regularly confuse these two. They are opposite kinds of relationships, and keeping them straight is essential.
- Mutually exclusive means A \cap B = \emptyset, so P(A \cap B) = 0. The two events cannot happen at the same time.
- Independent means P(A \cap B) = P(A) \cdot P(B). The two events do not influence each other's probabilities.
If two events are mutually exclusive and both have positive probability, they are automatically dependent. Why? Because P(A \cap B) = 0, but P(A) \cdot P(B) > 0, so the product rule fails. Worse, if you learn that A has occurred, you know that B cannot have occurred — that is a huge amount of information, the exact opposite of independence.
An example that pins this down: rolling a die, let A = "result is 6" and B = "result is 1." Then A \cap B = \emptyset (the die cannot show both 6 and 1 on the same roll), so the events are mutually exclusive. But they are not independent: P(A) \cdot P(B) = \dfrac{1}{6} \cdot \dfrac{1}{6} = \dfrac{1}{36}, while P(A \cap B) = 0. These are different numbers, so the product rule fails. Alternatively: if you learn that the die showed 6, then you know the event "die showed 1" definitely did not occur. The conditional probability dropped from \dfrac{1}{6} to 0. That is a big change, and a hallmark of dependence.
So mutually exclusive means information-revealing: learning one tells you a lot about the other. Independent means information-irrelevant: learning one tells you nothing about the other. They are near-opposites.
The only case where mutually exclusive events are independent is the trivial case where at least one of them has probability zero. This is a degenerate case and basically never arises in practice.
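The die example can be checked mechanically. This sketch (our own `prob` helper again) shows the product rule failing for the mutually exclusive pair A = "result is 6" and B = "result is 1".

```python
from fractions import Fraction

# One roll of a fair die: outcomes 1..6, each with probability 1/6.
outcomes = range(1, 7)

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), 6)

A = lambda o: o == 6
B = lambda o: o == 1

p_AB = prob(lambda o: A(o) and B(o))   # 0: the events are mutually exclusive
product = prob(A) * prob(B)            # 1/36

# The product rule fails, so mutually exclusive events with
# positive probability are dependent.
assert p_AB == 0
assert product == Fraction(1, 36)
assert p_AB != product
```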
Common confusions
A few more things students reliably get wrong about independence.
- "If A and B are independent, then A and B^c are not." Wrong. If A is independent of B, then A is automatically independent of B^c, of A^c and B, and of A^c and B^c. The proof is a line: P(A \cap B^c) = P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)(1 - P(B)) = P(A) \cdot P(B^c).
- "Independence is transitive." That is, "if A and B are independent and B and C are independent, then A and C are independent." This is false in general. Independence does not chain the way you might expect. Building a chain of independent events requires mutual independence, not just pairwise.
- "Independence is a property you can see by looking." It is not. Two events can look obviously unrelated and be dependent, or they can look related and be (coincidentally) independent. The only way to verify independence is to check the product rule on the numbers.
- "Because the coin has no memory, the next toss is 'due' to be heads after a streak of tails." This is the gambler's fallacy, and it is exactly the opposite of what independence says. The coin has no memory in the mathematical sense — the next toss is \dfrac{1}{2} heads regardless of the past. There is no "due" — the past does not nudge the future toward a correction. A string of tails does not make heads more likely next time, because the coin has no way of tracking the string.
Going deeper
If you only need to use the product rule for independent trials, you have it. The rest of this section is about how independence interacts with conditional probability in less obvious ways.
Conditional independence
Two events can be dependent in the original sample space but become independent once you condition on a third event. This is called conditional independence and it is at the heart of every modern probabilistic model — Bayesian networks, Markov chains, hidden Markov models, and so on.
The formal definition: events A and B are conditionally independent given C if

P(A \cap B \mid C) = P(A \mid C) \cdot P(B \mid C).
A concrete example: take the two-dice experiment. Let A = "first die is 6," B = "second die is 6," and C = "sum of the dice is 7." Unconditionally, A and B are independent. But conditionally on C, they are not: given that the sum is 7, learning that the first die is 6 forces the second to be 1, so B becomes impossible — P(A \cap B \mid C) = 0, while P(A \mid C) \cdot P(B \mid C) = \dfrac{1}{6} \cdot \dfrac{1}{6} = \dfrac{1}{36}. Conditional dependence is real, and it cuts against your intuition about independence.
The reverse also happens. Two events can be dependent unconditionally but conditionally independent. "A student knows calculus" and "a student knows probability" are correlated (smart students know both), but once you condition on IQ, the two might become independent — IQ explains the correlation, and after controlling for it, no residual dependence remains. This is how causal reasoning in statistics works: you look for a third variable that breaks the dependence.
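Conditioning can be checked by direct enumeration. This sketch (our own `prob` and `cond_prob` helpers) uses A = "first die is 6," B = "second die is 6," and C = "the dice sum to 7" — a conditioning event under which the product rule genuinely breaks.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    # P(event | given) = P(event and given) / P(given)
    return prob(lambda o: event(o) and given(o)) / prob(given)

A = lambda o: o[0] == 6      # first die is 6
B = lambda o: o[1] == 6      # second die is 6
C = lambda o: sum(o) == 7    # the dice sum to 7

# Unconditionally independent ...
assert prob(lambda o: A(o) and B(o)) == prob(A) * prob(B)

# ... but conditionally dependent given C: the product rule fails.
lhs = cond_prob(lambda o: A(o) and B(o), C)   # 0: sum 7 rules out (6, 6)
rhs = cond_prob(A, C) * cond_prob(B, C)       # 1/6 * 1/6 = 1/36
assert lhs == 0 and rhs == Fraction(1, 36)
```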
When to trust the independence assumption
In real problems, independence is almost always an assumption, not a derived fact. You are asked to compute the probability of all five coin tosses being heads, and you are told (or silently assumed) that the tosses are independent. But is this really true? With a real physical coin, tossed by a real human, the answer is "almost, but not quite." There is some microscopic correlation — the way you grip the coin, the temperature of your thumb, the air currents. For any practical purpose it doesn't matter, and the independence assumption is a useful idealisation.
The art of applied probability is knowing when the independence assumption is safe and when it is dangerous. Insurance pricing assumes the claims on different policyholders are independent — until a flood hits an entire city at once. Financial models assume daily stock returns are independent — until a crash reveals massive correlation. Medical trials assume patients respond independently to treatment — until the drug has a side effect that affects everyone similarly. In each case, the model works beautifully until the independence assumption breaks, and then it fails catastrophically. The lesson is that "independent" is a modelling choice, and you are on the hook for checking whether it is reasonable in the real situation.
Where this leads next
You now have the formal definition of independence, the product rule that follows, and an understanding of how independence differs from mutual exclusion. The next articles use this.
- Bayes' Theorem — the companion to conditional probability, which uses independence and product rules on sequences of evidence.
- Conditional Probability — where independence came from (the condition P(A \mid B) = P(A)).
- Addition Theorem — the formula for P(A \cup B), which pairs with the product rule to handle any finite event calculation.
- Binomial Distribution — the probability distribution for the number of successes in n independent trials, built directly on top of the product rule for independent events.