In short
When a random experiment has n equally likely outcomes and an event A contains m of them, the classical probability of A is P(A) = \dfrac{m}{n} = \dfrac{n(A)}{n(S)}. Odds in favour of A are m : (n - m), and odds against are (n - m) : m. The formula is deceptively simple — the real work is always in counting n(A) and n(S) correctly.
You are playing a game. Someone rolls a fair six-sided die and you win if the result is even. What fraction of the time do you win, over many games? The die has six possible outcomes, three of them are even (the numbers 2, 4, 6), and every outcome is equally likely. So you win three times out of six — one half of the time. That fraction, \dfrac{3}{6} = \dfrac{1}{2}, is the probability of winning.
Roll a single die and ask for the probability of getting at least a 5. Two favourable outcomes (5 and 6) out of six equally likely outcomes, so the answer is \dfrac{2}{6} = \dfrac{1}{3}. Draw one card from a shuffled deck and ask for the probability of drawing a heart. Thirteen hearts out of fifty-two cards, so \dfrac{13}{52} = \dfrac{1}{4}.
Every one of those answers came from the same formula. Count the outcomes that make your event true, count all the outcomes, divide. That is classical probability. It is the version of probability that gamblers in the 1600s worked out before anyone had written down the axioms, and it is still the cleanest way to get your hands on probability problems — as long as the outcomes really are equally likely.
Equally likely outcomes — the secret ingredient
Everything in this article rests on one condition: the outcomes of your sample space must be equally likely. That phrase means that there is no reason to expect any outcome to occur more often than any other. A fair coin: H and T are equally likely. A fair die: the six faces are equally likely. A well-shuffled deck: each of the 52 cards is equally likely to be the top card.
What breaks it? Take a loaded die, weighted so that 6 comes up half the time. The six outcomes are no longer equally likely, and the formula P(\text{six}) = \dfrac{1}{6} is wrong. The classical formula doesn't apply. You need a different framework, which we will meet in Axiomatic Approach.
More subtly: equally likely is a property of the sample space, not of the events. Rolling two dice and recording the sum gives outcomes \{2, 3, 4, \ldots, 12\}, eleven possibilities. Are these outcomes equally likely? No. There is exactly one way to get a 2 (roll a pair of ones) but six ways to get a 7. If you forget this and compute P(\text{sum is }7) = \dfrac{1}{11}, you will be badly wrong. The correct answer uses the sample space of ordered pairs (i, j), where all 36 outcomes are equally likely:

P(\text{sum is }7) = \dfrac{6}{36} = \dfrac{1}{6}
Six is not eleven. Choosing a sample space that is actually equally likely is the first real skill of classical probability. The rule of thumb: pick the sample space at the finest level of detail, the one where each outcome is genuinely uniform. Then build events on top of that.
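If you want to check a count like this mechanically, a few lines of Python (an illustrative sketch, not part of the derivation) enumerate the 36 ordered pairs directly and confirm the count of sums equal to 7:

```python
from fractions import Fraction
from itertools import product

# The finest-level sample space: ordered pairs (i, j), 36 equally likely outcomes.
pairs = list(product(range(1, 7), repeat=2))

# The event "sum is 7", built on top of that sample space.
favourable = [p for p in pairs if sum(p) == 7]

prob_seven = Fraction(len(favourable), len(pairs))
print(prob_seven)  # 1/6
```

Using `Fraction` keeps the answer exact, which matches how the hand calculation works.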
The formula
Classical definition of probability
Consider a random experiment whose sample space S has n equally likely outcomes. Let A be an event containing m of those outcomes. The probability of A is

P(A) = \dfrac{m}{n} = \dfrac{n(A)}{n(S)}
This is called the classical or a priori definition of probability, and it assumes the outcomes of S are finite in number and equally likely.
Some immediate consequences of the formula:
- P(A) is a ratio of two counts, so it is always a rational number.
- Since 0 \le m \le n, you get 0 \le P(A) \le 1. A probability is never negative, and never greater than one.
- If A = \emptyset (the impossible event), then m = 0 and P(A) = 0.
- If A = S (the certain event), then m = n and P(A) = 1.
- If A^c is the complement of A, then A has m outcomes and A^c has n - m. So P(A) + P(A^c) = \dfrac{m}{n} + \dfrac{n-m}{n} = 1, giving the very useful rule P(A^c) = 1 - P(A).
The last rule — the complement rule — is the single most useful shortcut in probability. Sometimes counting A is a nightmare but counting A^c is easy. In that case, compute P(A^c) and subtract from 1.
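The formula and the complement rule are small enough to sketch in code. The helper below, `classical_probability`, is a name chosen here for illustration (it is not from the text); it just counts favourable outcomes and divides:

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = n(A) / n(S) for a finite sample space of equally likely outcomes."""
    favourable = [x for x in sample_space if x in event]
    return Fraction(len(favourable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
evens = {2, 4, 6}

p = classical_probability(evens, die)
p_complement = classical_probability(die - evens, die)
print(p)                     # 1/2
print(p + p_complement)      # 1 -- the complement rule in action
```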
Example 1: the three-coin toss
Example 1: at least two heads in three coin tosses
Three fair coins are tossed simultaneously. Find the probability that at least two of them land heads.
Step 1. List the sample space. Each coin has two outcomes, so three coins give 2^3 = 8 equally likely outcomes:

S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}
n(S) = 8.
Why: "equally likely" is only true here if you list outcomes at the level of individual coins. If you tried to list outcomes as "zero heads, one head, two heads, three heads", those four outcomes would not be equally likely and the formula would give the wrong answer.
Step 2. Let A = "at least two heads". List the outcomes in A:

A = \{HHH, HHT, HTH, THH\}
n(A) = 4.
Why: "at least two" means two or three, so you keep any string with two or three Hs. There is one string with three Hs and three strings with exactly two.
Step 3. Apply the formula.

P(A) = \dfrac{n(A)}{n(S)} = \dfrac{4}{8} = \dfrac{1}{2}
Why: direct substitution. The probability is just the count of favourable strings over the count of all strings.
Step 4. Sanity check via the complement. The complement A^c is "fewer than two heads" = \{HTT, THT, TTH, TTT\}, also four outcomes. So P(A^c) = \dfrac{4}{8} = \dfrac{1}{2}, and P(A) + P(A^c) = 1. The two numbers sum to one, as they must.
Result: The probability of getting at least two heads in three tosses is \dfrac{1}{2}.
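As a final sanity check, the whole example can be brute-forced in a few lines of Python (an illustrative sketch, separate from the hand solution):

```python
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely outcomes, listed at the level of individual coins.
outcomes = list(product("HT", repeat=3))

# Event A: strings with two or three Hs.
at_least_two_heads = [o for o in outcomes if o.count("H") >= 2]

p = Fraction(len(at_least_two_heads), len(outcomes))
print(p)  # 1/2
```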
Example 2: drawing from a deck
Example 2: probability of drawing a king or a diamond
A single card is drawn at random from a well-shuffled standard deck of 52 cards. Find the probability that the card is either a king or a diamond.
Step 1. Identify n(S). The deck has 52 cards, each equally likely to be the one drawn.
Why: "well-shuffled" is the phrase that enforces equal likelihood. If the deck were stacked, the formula wouldn't apply.
Step 2. Let A = "card is a king" and B = "card is a diamond". Count each.
n(A) = 4 (four kings, one per suit). n(B) = 13 (thirteen diamonds, one of each rank).
Why: each of these counts is read directly off the structure of the deck. Make sure you have not double-counted the king of diamonds yet — that comes next.
Step 3. Count A \cup B using inclusion-exclusion. The event "king or diamond" is A \cup B, and

n(A \cup B) = n(A) + n(B) - n(A \cap B)
The intersection A \cap B is the single card king of diamonds, so n(A \cap B) = 1. Therefore

n(A \cup B) = 4 + 13 - 1 = 16
Why: if you naively added 4 + 13 = 17, you would count the king of diamonds once in A and again in B. Subtracting n(A \cap B) fixes the double count.
Step 4. Apply the formula.

P(A \cup B) = \dfrac{n(A \cup B)}{n(S)} = \dfrac{16}{52} = \dfrac{4}{13}
Result: The probability that the drawn card is a king or a diamond is \dfrac{4}{13} \approx 0.308.
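The count can be verified by building the deck explicitly. This Python sketch is illustrative only; note how the union condition counts the king of diamonds exactly once, which is what inclusion-exclusion achieves by hand:

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

# Event: king OR diamond. Membership in a union never double-counts,
# so the king of diamonds appears once.
king_or_diamond = [c for c in deck if c[0] == "K" or c[1] == "diamonds"]

p = Fraction(len(king_or_diamond), len(deck))
print(p)  # 4/13
```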
Odds in favour and odds against
The word odds is everywhere in everyday language — "the odds are three to one," "I'd give him two-to-one odds" — and it is a slightly different way of expressing the same ratio that probability does.
If an event A has m favourable outcomes out of n equally likely total outcomes, then n - m outcomes are unfavourable. The odds in favour of A are the ratio

m : (n - m)
The odds against A are the reverse:

(n - m) : m
Odds in favour of rolling a six on a single die: 1 : 5 (one favourable outcome, five unfavourable). Odds against: 5 : 1. Spoken: "five to one against."
The conversions between odds and probabilities are direct. Given odds in favour a : b:

P(A) = \dfrac{a}{a + b}, \qquad P(A^c) = \dfrac{b}{a + b}
And given a probability P(A) = p, the odds in favour are p : (1 - p); clear the fractions by multiplying both terms by a common denominator. For p = \dfrac{2}{5}, odds in favour are \dfrac{2}{5} : \dfrac{3}{5} = 2 : 3; odds against are 3 : 2.
Worth one line on why odds exist at all: for expressing betting ratios, odds are the right language because they express the ratio of stake to payout. If the odds against an event are 5 : 1, a fair bet pays out five units of profit for every one unit staked (plus your stake back). Gamblers knew this language for centuries before the word probability was standard.
A small example of odds in action
A bag contains 4 red balls, 3 blue balls, and 2 green balls. One ball is drawn at random. Find the probability that the drawn ball is red, and state the odds in favour and odds against drawing a red ball.
Total outcomes: n(S) = 4 + 3 + 2 = 9. Favourable outcomes: n(\text{red}) = 4.
Probability of red: P(R) = \dfrac{4}{9}.
Unfavourable outcomes: 9 - 4 = 5. Odds in favour of red: 4 : 5. Odds against red: 5 : 4.
Stop and verify: \dfrac{4}{4 + 5} = \dfrac{4}{9}, matching the probability. The two descriptions agree, as they must.
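The odds-probability conversions are mechanical enough to script. The two helpers below are names chosen here for illustration (`odds_in_favour`, `probability_from_odds`); exact fractions make the round trip lossless:

```python
from fractions import Fraction

def odds_in_favour(p):
    """Convert a probability p to odds in favour, as a reduced pair (a, b)."""
    ratio = Fraction(p) / (1 - Fraction(p))
    return ratio.numerator, ratio.denominator

def probability_from_odds(a, b):
    """Convert odds in favour a : b back to a probability a / (a + b)."""
    return Fraction(a, a + b)

print(odds_in_favour(Fraction(4, 9)))   # (4, 5) -- the red-ball example
print(probability_from_odds(4, 5))      # 4/9
```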
Common confusions
A few things students reliably get wrong about classical probability the first time they meet it.
- "I can pick any sample space I want." You can — but the classical formula only works if the outcomes of that sample space are equally likely. Picking the wrong sample space is the single most common source of wrong answers in probability. When in doubt, list outcomes at the level of individual coins, individual cards, individual balls. The finer the level, the more likely the outcomes are to be equally likely.
- "Odds and probability are the same thing." They are related but not equal. Probability p corresponds to odds p : (1 - p), not p : 1. A probability of \dfrac{1}{2} is odds of 1 : 1 ("fifty-fifty," "even odds"), not \dfrac{1}{2} : 1.
- "Probability cannot be zero for something that might happen." In classical probability with finite sample spaces, if an event has zero favourable outcomes, it cannot happen. But in continuous probability, events of zero probability can still occur — picking a random real number in [0, 1] and landing exactly on 0.5 has probability zero, but is not impossible. That subtlety does not bite in the classical finite case.
- "If I double the number of outcomes, I double the probability." No. You divide by the new (larger) total. Doubling the sample space while keeping the favourable count fixed halves the probability. This mistake shows up when people confuse count of outcomes with probability.
Going deeper
If you just need the formula for finite problems, you have it. This section is about the limitations of classical probability and why a more general framework is needed.
When classical probability fails
Classical probability works when:
- The sample space is finite.
- All outcomes in the sample space are equally likely.
Both conditions fail constantly in the real world. Rolling a thumbtack and asking if it lands point-up has two outcomes, but they are not equally likely — point-up and point-down occur at different rates, and the rates depend on the tack. The classical formula would say P(\text{up}) = 1/2, which is just wrong. The frequentist (or empirical) definition handles this: perform the experiment many times, count the fraction of trials on which point-up occurred, and call that the probability — the fraction stabilises to a limit as the number of trials grows.
And for sample spaces that are infinite or continuous — throwing a dart at a square board and measuring where it lands — the classical formula is not even defined, because there is no such thing as "the number of outcomes in S." Continuous probability introduces probability density and integrates it over regions.
The axiomatic approach, which you will meet in the next article, unifies all three (classical, empirical, and continuous) under one framework — a set of three rules (axioms) that any valid probability function must satisfy. Classical probability turns out to be a special case: the axioms, plus the assumption of equally likely outcomes, give back the P(A) = n(A)/n(S) formula.
The classical formula as counting in disguise
When the sample space is equally likely, every probability calculation is secretly a counting problem. "What is P(A)?" becomes "how many outcomes are in A, and how many are in S?" Counting large sets reliably is not a skill that comes for free. This is why counting tools — permutations, combinations, the multiplication principle, inclusion-exclusion — are the workhorses of classical probability. Any probability problem that feels hard is almost always a counting problem in disguise, and the fix is usually to find a cleaner way to count.
A standard trick: if direct counting of A is painful, count A^c instead and use P(A) = 1 - P(A^c). "Find the probability that at least one of four dice shows a six" is awkward directly (you would have to juggle overlaps between "first die is six," "second die is six," and so on). But "none of the four dice shows a six" is easy: \left(\dfrac{5}{6}\right)^4, which uses the independence of the rolls and multiplies. So

P(\text{at least one six}) = 1 - \left(\dfrac{5}{6}\right)^4 = 1 - \dfrac{625}{1296} = \dfrac{671}{1296} \approx 0.518
The complement rule turned a nightmare into a one-liner. Reach for it whenever "at least one" appears in a question.
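Both routes, brute force over the full sample space and the complement shortcut, can be checked against each other in a few lines of Python (an illustrative sketch):

```python
from fractions import Fraction
from itertools import product

# Brute force: all 6^4 = 1296 equally likely outcomes for four dice.
rolls = list(product(range(1, 7), repeat=4))
at_least_one_six = [r for r in rolls if 6 in r]
p_direct = Fraction(len(at_least_one_six), len(rolls))

# Complement rule: 1 - (5/6)^4, no overlap juggling required.
p_complement = 1 - Fraction(5, 6) ** 4

print(p_direct)                    # 671/1296
print(p_direct == p_complement)    # True
```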
Where this leads next
You now have the simplest formula in probability and a sense of when it applies. The next articles extend it.
- Axiomatic Approach — the general definition that contains classical probability as a special case.
- Addition Theorem — the formal derivation of P(A \cup B) = P(A) + P(B) - P(A \cap B), which powered Example 2 above.
- Conditional Probability — how the probability of an event changes when you know that another event has occurred.
- Independent Events — the rule P(A \cap B) = P(A) \cdot P(B) and what it really means.
- Permutations and Combinations — the counting tools that every hard classical probability problem secretly relies on.