In short
A random experiment is any action with an uncertain outcome — tossing a coin, rolling a die, drawing a card. The sample space S is the set of every possible outcome. An event is any subset of the sample space. You can combine events using union, intersection, and complement — exactly the operations of set theory, applied to uncertainty.
Toss a coin. Before it lands, you do not know whether you will see heads or tails. While it is in the air, there is nothing you can do to change the answer — physics already determined it when your thumb left the coin — but you still cannot predict it. That gap between what is determined and what you can predict is where probability lives.
Now toss a coin a million times. You cannot predict any single toss, but you can predict, with enormous confidence, that close to half of them will come up heads. One flip: total mystery. A million flips: total regularity. The same gap, seen at two scales.
Probability is the branch of mathematics that takes that regularity seriously. It gives you a language to describe events whose individual outcomes are unpredictable but whose long-run behaviour is not. It was invented in the 17th century to solve gambling problems — "how should we split the pot if a game of chance is interrupted halfway through?" — and it has since turned into the engine of statistics, information theory, quantum mechanics, cryptography, insurance, weather forecasting, and every machine learning model you have ever heard of. All of it rests on three little ideas: a random experiment, its sample space, and events inside that sample space.
A random experiment
Probability starts with an action. Not a formula, not a number — an action with an outcome you cannot predict in advance. A random experiment is any such action. Some examples to pin this down:
- You toss a coin. The outcome is "heads" or "tails."
- You roll a standard six-sided die. The outcome is one of 1, 2, 3, 4, 5, 6.
- You draw one card from a shuffled deck of 52. The outcome is any one of the 52 cards.
- You roll two dice and record the pair of numbers. The outcome is something like (3, 5).
- You measure tomorrow's rainfall in Mumbai in millimetres. The outcome is any non-negative real number.
What makes these random experiments, as opposed to just experiments, is that all of the following hold:
- The experiment can, in principle, be repeated any number of times under the same conditions.
- The set of possible outcomes is known in advance.
- You cannot predict, before the experiment is performed, which particular outcome will occur.
A cooking experiment where you sauté onions until they are brown is not a random experiment in this sense — you know exactly what will happen if you leave them on heat. Dropping a stone from a cliff is not a random experiment — the stone reliably falls. But tossing a stone and seeing whether it lands face-up or face-down is. Random experiments are the raw material of probability, and the first thing you do with one is list all the outcomes it can produce.
The sample space
The list of every possible outcome of a random experiment — every possible answer to the question "what happened?" — is called the sample space, written S (some books use \Omega). It is a set, and every individual outcome is an element of it.
For the coin toss, S = \{H, T\} — two elements.
For the single die, S = \{1, 2, 3, 4, 5, 6\} — six elements.
For a single draw from a deck, S is the set of all 52 cards — a set with 52 elements.
For the roll of two dice where order matters, S has 36 elements: (1,1), (1,2), \ldots, (6,6). Notice that (3, 5) and (5, 3) count as different outcomes if you distinguish the two dice, because the first die showed something different in each case.
For tomorrow's rainfall, S is the set [0, \infty) — every non-negative real number. In this case S is infinite, and in fact uncountable. Probability can handle this, but the tools are heavier, and for most of this introduction you should think of S as a finite set.
A sample space should be exhaustive (every possible outcome must be in it), and its outcomes should be mutually exclusive (no two outcomes can occur on the same trial). Keep both conditions in mind. "Getting more than 3 on a die" is not an outcome — it is a collection of outcomes. "Getting 4" is an outcome. The distinction between outcomes and collections of outcomes is exactly the distinction between elements of S and subsets of S — and subsets of S are what the next section is about.
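Because a sample space is just a set, you can build the ones above directly. Here is a minimal sketch in Python (the language choice is mine, purely for illustration), using `itertools.product` for the two-dice space:

```python
from itertools import product

# Sample spaces from the examples above, written as Python sets.
coin = {"H", "T"}
die = set(range(1, 7))

# Two distinguishable dice: ordered pairs, so (3, 5) and (5, 3) differ.
two_dice = set(product(die, repeat=2))

print(len(coin))      # 2
print(len(die))       # 6
print(len(two_dice))  # 36
print((3, 5) in two_dice and (5, 3) in two_dice)  # True
```

The rainfall example has no such finite enumeration — its sample space [0, \infty) is uncountable, which is exactly why the "Going deeper" section exists.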
Events
An event is any subset of the sample space. That is the whole definition, but it packs a lot in.
Rolling a die, let A be the event "the result is even." Then A = \{2, 4, 6\}, a subset of S = \{1, 2, 3, 4, 5, 6\}. If you roll a 4, the actual outcome is the element 4; and because 4 \in A, you say the event A has occurred. If you roll a 3, then 3 \notin A, and the event has not occurred.
Another event: B = "the result is at least 5" = \{5, 6\}. And another: C = "the result is 7" = \{\}. The last one is empty, because it is impossible. The empty set is a perfectly legitimate event — the impossible event — and so is the full set S, the certain event, which occurs no matter what.
Now the vocabulary. An event consisting of exactly one outcome is called a simple event or elementary event. An event consisting of more than one outcome is a compound event. On the die, \{4\} is a simple event — "the result is 4." And \{2, 4, 6\} is a compound event — "the result is even" — built from three simple events.
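The definition "an event occurs when the observed outcome is an element of it" is a one-line membership test. A sketch (again in Python, with names of my own choosing):

```python
S = {1, 2, 3, 4, 5, 6}             # sample space of one die roll
A = {x for x in S if x % 2 == 0}   # event "the result is even"

def occurred(event, outcome):
    """An event occurs when the observed outcome is an element of it."""
    return outcome in event

print(sorted(A))            # [2, 4, 6]
print(occurred(A, 4))       # True: rolling a 4 makes A occur
print(occurred(A, 3))       # False
print(occurred(set(), 3))   # False: the impossible event never occurs
print(occurred(S, 3))       # True: the certain event always occurs
```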
Types of events
A handful of vocabulary that you will see everywhere in the rest of probability:
- Sure (certain) event: the event S itself. "Something happens."
- Impossible event: the empty set \emptyset. "Nothing on the list happens."
- Simple (elementary) event: a singleton subset of S — a single outcome wrapped in braces.
- Compound event: a subset of S with more than one element.
- Complementary event: the complement A^c = S \setminus A — the event that A does not happen.
- Mutually exclusive events: two events A and B are mutually exclusive (or disjoint) if they cannot both happen — A \cap B = \emptyset. On a die, "even" and "odd" are mutually exclusive; "even" and "at least 5" are not, because the outcome 6 is in both.
- Exhaustive events: a collection of events A_1, A_2, \ldots, A_n is exhaustive if their union is the whole sample space — A_1 \cup A_2 \cup \cdots \cup A_n = S. At least one of them must occur.
- Mutually exclusive and exhaustive: both conditions at once. This is what partitions do: a partition of S is a collection of events where exactly one of them occurs on every trial. On a die, "even" and "odd" are mutually exclusive and exhaustive — a partition of the sample space.
The word equally likely also appears everywhere, though it is not a property of a single event but of a collection: the outcomes of S are equally likely if there is no reason to expect any one of them to occur more often than any other. A fair die produces six equally likely outcomes. A biased die does not. This matters because the simplest formula in probability — P(A) = n(A)/n(S), coming up in the next article — assumes the outcomes of S are equally likely.
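The "mutually exclusive" and "exhaustive" definitions translate directly into set operations, which makes them easy to check mechanically. A small sketch (helper names are my own):

```python
S = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
odd = {1, 3, 5}
at_least_5 = {5, 6}

def mutually_exclusive(a, b):
    # Disjoint: the intersection is the empty set.
    return a & b == set()

def exhaustive(events, space):
    # The union of all the events covers the whole sample space.
    return set().union(*events) == space

print(mutually_exclusive(even, odd))         # True
print(mutually_exclusive(even, at_least_5))  # False: 6 is in both
print(exhaustive([even, odd], S))            # True: a partition of S
```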
Algebra of events
Because events are subsets of S, you can combine them using set operations. Every set operation has a meaning in probability — and "set operation" translates exactly into "and/or/not" in English.
Event operations
Let A, B \subseteq S be events.
- Union: A \cup B is the event "A happens or B happens (or both)." The set of outcomes in A, in B, or in both.
- Intersection: A \cap B is the event "A happens and B happens." The set of outcomes in both.
- Complement: A^c (also written A' or \overline{A}) is the event "A does not happen." The outcomes in S that are not in A.
- Difference: A \setminus B (also A - B) is the event "A happens but B does not." Equivalent to A \cap B^c.
Every single one of these is a subset of S, so every single one of these is itself an event. The operations take events in and give events back — that is what makes them an algebra.
The rules of set theory apply without modification. The most useful ones are:
- Commutative: A \cup B = B \cup A and A \cap B = B \cap A.
- Associative: (A \cup B) \cup C = A \cup (B \cup C) and similarly for intersection.
- Distributive: A \cap (B \cup C) = (A \cap B) \cup (A \cap C) and A \cup (B \cap C) = (A \cup B) \cap (A \cup C).
- De Morgan's laws: (A \cup B)^c = A^c \cap B^c and (A \cap B)^c = A^c \cup B^c.
De Morgan's laws deserve a closer look. In English: "the event 'neither A nor B' is the same as 'not A and not B.'" And "the event 'not (both A and B)' is the same as 'not A or not B.'" You will use them constantly when translating English-language probability questions into set operations.
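For a finite sample space you can verify all four laws by brute force, checking every possible pair and triple of events. A sketch over a four-element sample space (small enough that the 16 subsets give only a few thousand combinations):

```python
from itertools import chain, combinations

S = frozenset({1, 2, 3, 4})

def powerset(s):
    """All subsets of s — i.e. every possible event."""
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def comp(a):
    """Complement relative to the sample space S."""
    return S - a

events = powerset(S)
for A in events:
    for B in events:
        assert comp(A | B) == comp(A) & comp(B)   # (A u B)^c = A^c n B^c
        assert comp(A & B) == comp(A) | comp(B)   # (A n B)^c = A^c u B^c
        for C in events:
            assert A & (B | C) == (A & B) | (A & C)   # distributive
            assert A | (B & C) == (A | B) & (A | C)   # distributive
print("all identities hold")
```

This is verification, not proof — but for laws about finite sets, exhausting a representative small case is a good way to convince yourself before reading the general argument.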
Two concrete worked examples
Example 1: sample space of tossing a coin three times
An experiment consists of tossing a fair coin three times and recording the sequence of outcomes.
Step 1. List the sample space. Each toss is H or T, so a complete outcome is a string of three letters: S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.
Why: there are 2 \times 2 \times 2 = 8 outcomes because each of three independent tosses has two results. Write them all out so you can pick events off visually.
Step 2. Define the event A = "exactly two heads" as a subset of S: A = \{HHT, HTH, THH\}.
Why: you scan through S and keep the strings with exactly two Hs. There are three ways to place the one T among three positions.
Step 3. Define the event B = "first toss is heads" as a subset of S: B = \{HHH, HHT, HTH, HTT\}.
Why: you keep every outcome that starts with H. The remaining two tosses are unconstrained, giving four such strings.
Step 4. Compute A \cap B, A \cup B, and A^c: A \cap B = \{HHT, HTH\}, A \cup B = \{HHH, HHT, HTH, HTT, THH\}, and A^c = \{HHH, HTT, THT, TTH, TTT\}.
Why: intersection keeps the strings in both. Union keeps the strings in either. Complement keeps the strings in S that are not in A.
Result: The event "exactly two heads and first toss is heads" contains the outcomes HHT and HTH — two out of the eight total outcomes.
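All four steps can be replayed in a few lines of Python (offered as a sketch, not part of the problem statement), with set comprehensions playing the role of "scan through S and keep":

```python
from itertools import product

# Step 1: every outcome is a string of three letters, e.g. "HHT".
S = {"".join(p) for p in product("HT", repeat=3)}

# Steps 2 and 3: events as subsets picked out by a condition.
A = {s for s in S if s.count("H") == 2}   # exactly two heads
B = {s for s in S if s[0] == "H"}         # first toss is heads

# Step 4: set operations.
print(len(S))         # 8
print(sorted(A))      # ['HHT', 'HTH', 'THH']
print(sorted(B))      # ['HHH', 'HHT', 'HTH', 'HTT']
print(sorted(A & B))  # ['HHT', 'HTH']
print(sorted(A | B))  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH']
print(len(S - A))     # 5 outcomes in the complement of A
```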
Example 2: drawing a card
A single card is drawn from a well-shuffled standard deck of 52. Let A = "the card is a heart" and B = "the card is a face card (J, Q, or K)."
Step 1. Identify the sample space and the two events as subsets.
S has 52 elements. The event A contains all 13 hearts: A = \{A\heartsuit, 2\heartsuit, \ldots, K\heartsuit\}. The event B contains the 12 face cards: \{J, Q, K\} in each of the four suits.
Why: sample space first, events as subsets second. Writing the events explicitly keeps you from counting wrong.
Step 2. Compute A \cap B.
A \cap B = "heart and face card" = \{J\heartsuit, Q\heartsuit, K\heartsuit\}.
Why: the intersection keeps only the cards that are in both sets — a card has to be a heart and a face card.
Step 3. Compute A \cup B using inclusion-exclusion on the sizes.
|A| = 13, |B| = 12, |A \cap B| = 3. So |A \cup B| = |A| + |B| - |A \cap B| = 13 + 12 - 3 = 22.
Why: if you naively added 13 + 12 = 25, you would count the three cards in the overlap twice. Subtract them once to get the right total.
Step 4. Compute A^c (the event "not a heart").
A^c contains the 39 non-heart cards. In particular |A^c| = 52 - 13 = 39.
Result: The four quantities the problem asks about are |A| = 13, |B| = 12, |A \cap B| = 3, |A \cup B| = 22. Events and their sizes are what you will plug into the probability formulas in the next article.
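The same counts fall out of a direct enumeration. Here is a sketch in Python (the card encoding as (rank, suit) pairs is an arbitrary choice of mine):

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# The sample space: one element per card.
deck = set(product(ranks, suits))

A = {card for card in deck if card[1] == "hearts"}          # hearts
B = {card for card in deck if card[0] in {"J", "Q", "K"}}   # face cards

print(len(deck))      # 52
print(len(A))         # 13
print(len(B))         # 12
print(len(A & B))     # 3: the J, Q, K of hearts
print(len(A | B))     # 22 = 13 + 12 - 3 (inclusion-exclusion)
print(len(deck - A))  # 39 non-hearts
```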
Common confusions
A few things students reliably get wrong about sample spaces and events.
- "The sample space is whatever I find convenient." Not quite. The sample space has to be exhaustive and the outcomes have to be mutually exclusive. If you decide to describe rolling two dice by their sum (2, 3, 4, \ldots, 12), you get 11 outcomes — but those outcomes are not equally likely (there is only one way to get a 2 and six ways to get a 7), which will break the next article's formulas.
- "Events and outcomes are the same thing." An outcome is a single element of S. An event is a subset of S — possibly with just one element (a simple event), possibly with many. When you roll a die, 4 is an outcome and \{4\} is an event, even though they feel identical.
- "Mutually exclusive and independent mean the same thing." They are completely different, and you will meet independence later in this chapter. For now, hold on to this: mutually exclusive means A \cap B = \emptyset, meaning the two events can never both happen at once. Independent means the occurrence of one does not affect the probability of the other. Two mutually exclusive events with positive probability are the opposite of independent — knowing one happened tells you the other definitely did not.
- "The empty set is not an event." It is. The impossible event \emptyset is a subset of every set, including the sample space. It has probability zero, but it is a legitimate event.
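The first confusion — describing two dice by their sum — is worth checking by hand. Counting how many of the 36 equally likely ordered pairs produce each sum shows immediately that the 11 sums are not equally likely (a quick sketch in Python):

```python
from collections import Counter
from itertools import product

# Count how many of the 36 equally likely ordered pairs give each sum.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

print(sums[2])             # 1: only (1, 1)
print(sums[7])             # 6: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
print(sum(sums.values()))  # 36 underlying outcomes in total
```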
Going deeper
If you only need probability at the level of basic coin-and-dice problems, you have the full setup now and can move on to Classical Probability. The rest of this section is about how probability handles infinite sample spaces, and why the set-of-all-subsets approach needs refinement in those cases.
Countable and uncountable sample spaces
The sample spaces in this article are all finite — at most a few dozen outcomes. Probability can handle more:
- Countably infinite sample spaces show up whenever you count the number of trials until something happens. "Toss a coin until you see heads; X is the number of tosses needed." Here S = \{1, 2, 3, \ldots\}. Still manageable: any subset of S is a valid event.
- Uncountable sample spaces show up with continuous measurements. "Pick a random point on a dartboard." Here S is the interior of a disk — a two-dimensional continuum. Now a subtlety appears: not every subset of S can be assigned a sensible probability. The subsets that can be assigned one form a smaller collection called a \sigma-algebra, and this is the starting point of measure-theoretic probability, which you will meet in advanced courses.
For finite and countable sample spaces, you can ignore this distinction entirely: every subset of S is an event, full stop. The distinction only bites in the continuous case, and even there, you can treat every reasonable subset (any interval, any circle, any region with a sensible area) as an event.
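The countable case is easy to simulate, which helps make the abstract sample space \{1, 2, 3, \ldots\} concrete. A sketch of "toss a coin until you see heads" (the function name and seed are my own choices):

```python
import random

random.seed(0)  # fixed seed, so the sketch is reproducible

def tosses_until_heads():
    """One trial of 'toss a fair coin until the first heads appears'.

    The outcome is a positive integer, so the sample space is the
    countably infinite set {1, 2, 3, ...}.
    """
    n = 1
    while random.random() >= 0.5:  # treat values >= 0.5 as tails
        n += 1
    return n

trials = [tosses_until_heads() for _ in range(10_000)]
print(all(t >= 1 for t in trials))  # True: every outcome lies in {1, 2, 3, ...}
print(min(trials))                  # 1: heads on the first toss happens often
```

No single run ever produces "infinity" — every trial ends at some finite n — yet no finite list of outcomes is exhaustive, which is exactly what makes this sample space countably infinite.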
Why events form an algebra
The union, intersection, and complement operations on events are closed: taking them gives you back an event. That closure is what makes "events under \cup, \cap, {}^c" an algebra in the formal sense. Combined with the fact that \emptyset and S are always events, this gives you the structure called a Boolean algebra — the same structure logic runs on, which is why every English phrase with and, or, not can be translated directly into operations on events.
The name "probability theory" is a bit misleading, because the theory itself is built on top of set theory: pure set operations, with a probability function assigning each set a number in [0, 1]. That function is what the next articles introduce — classical probability first, then its axiomatic formulation.
Where this leads next
You now have the vocabulary of probability: random experiments, sample spaces, events, and how to combine events using set operations. The next articles put numbers on events.
- Classical Probability — the simplest formula, P(A) = n(A)/n(S), applicable whenever the outcomes of S are equally likely.
- Axiomatic Approach — the general definition of a probability function and the three axioms every such function must satisfy.
- Addition Theorem — how to compute P(A \cup B) from P(A), P(B), and P(A \cap B).
- Conditional Probability — how the probability of one event changes when you learn that another has occurred.
- Independent Events — the precise notion of two events not affecting each other's probability.