In short
A binomial distribution models the number of successes in n independent trials, each with the same probability p of success. The probability of getting exactly k successes is \binom{n}{k} p^k (1-p)^{n-k}. Its mean is np and its variance is np(1-p).
A factory produces LED bulbs. Quality control tests show that 90% of bulbs work perfectly and 10% are defective. You pick 5 bulbs at random from a large production batch. How many defective bulbs should you expect to find?
You could enumerate every possibility. Zero defective, one defective, two defective, all the way up to five defective. For each case, you could compute the probability. Then you could find the expected value.
But this particular setup — repeating the same yes/no experiment multiple times and counting the "yes" outcomes — appears everywhere. Coin flips, medical trials, election polls, quality inspections, free-throw shooting. It appears so often that it has its own name and its own formula, and once you see where the formula comes from, you'll never need to enumerate cases by hand again.
One trial: the Bernoulli experiment
Start with the smallest possible version of the problem. You pick one bulb. It is either defective (probability p = 0.1) or not defective (probability 1 - p = 0.9). That's it — two outcomes, one trial.
This single yes/no experiment is called a Bernoulli trial. You label one outcome "success" (which doesn't have to be a good thing — here, "success" means "defective," because that's what you're counting) and the other "failure." The random variable X takes value 1 for success and 0 for failure, with

P(X = 1) = p \quad \text{and} \quad P(X = 0) = 1 - p.
The expected value is E(X) = 0 \cdot (1 - p) + 1 \cdot p = p. The variance is E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p).
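Both results are easy to check numerically; here is a minimal sketch in plain Python, using the bulb value p = 0.1 from above:

```python
# Mean and variance of one Bernoulli trial, straight from the definitions.
p = 0.1  # probability of "success" (here: a defective bulb)

mean = 0 * (1 - p) + 1 * p   # E(X) = p
variance = p - p ** 2        # E(X^2) - [E(X)]^2 = p - p^2

print(mean)                # → 0.1
print(round(variance, 6))  # → 0.09
```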
A single Bernoulli trial is simple. The interesting question is what happens when you repeat it.
From one trial to n trials
Go back to the 5 bulbs. Each bulb is an independent Bernoulli trial with p = 0.1. You want the probability of getting exactly k defective bulbs out of 5.
Take a specific case first: exactly 2 defective bulbs. One way this could happen is if the first two bulbs are defective and the last three are good:

D, D, G, G, G

where D marks a defective bulb and G a good one.
The probability of this specific sequence is 0.1 \times 0.1 \times 0.9 \times 0.9 \times 0.9 = (0.1)^2 (0.9)^3.
But this is not the only arrangement that gives exactly 2 defectives. The defective bulbs could be in positions 1 and 3, or positions 2 and 5, or any other pair of positions. Each such arrangement has exactly the same probability (0.1)^2 (0.9)^3 — because the trials are independent, the probability depends only on how many are defective, not on which ones.
How many such arrangements are there? You are choosing 2 positions (for the defective bulbs) out of 5. That is \binom{5}{2} = 10.
So the total probability of exactly 2 defectives is

P(X = 2) = \binom{5}{2} (0.1)^2 (0.9)^3 = 10 \times 0.01 \times 0.729 = 0.0729.
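The count-times-probability logic can be verified by brute force. A sketch in plain Python that enumerates all 2^5 good/defective sequences and adds up the ones with exactly 2 defectives:

```python
from itertools import product

p = 0.1  # probability a bulb is defective

# Enumerate all 2^5 sequences of outcomes (1 = defective, 0 = good).
total = 0.0
for outcome in product([0, 1], repeat=5):
    if sum(outcome) == 2:  # keep only sequences with exactly 2 defectives
        # Independence: multiply p for each defective, (1 - p) for each good bulb.
        prob = 1.0
        for bulb in outcome:
            prob *= p if bulb == 1 else 1 - p
        total += prob

print(round(total, 4))  # → 0.0729
```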
The logic generalises immediately.
The binomial probability formula
Binomial Distribution
Let X be the number of successes in n independent Bernoulli trials, each with success probability p. Then X follows a binomial distribution B(n, p), and

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n.
Derivation. Each trial is independent. A specific sequence with k successes and n - k failures has probability p^k (1 - p)^{n-k}, because you multiply the probabilities of each trial and get p for each success and (1-p) for each failure. The number of distinct sequences with exactly k successes is \binom{n}{k} — you choose which k of the n positions are successes. Since these sequences are mutually exclusive (they represent different outcomes), you add their probabilities, giving \binom{n}{k} p^k (1-p)^{n-k}. \square
Three assumptions must hold for the binomial model to apply:
- Fixed number of trials n.
- Each trial has exactly two outcomes (success or failure).
- Trials are independent, and the probability p is the same for every trial.
If any of these fail — if p changes from trial to trial, or the outcomes are dependent, or the number of trials is not fixed — then you do not have a binomial distribution.
Notice how the probabilities sum to 1: for the bulb example (n = 5, p = 0.1), the six values are approximately 0.590 + 0.328 + 0.073 + 0.008 + 0.0004 + 0.00001 = 1, up to rounding. This must happen, by the binomial theorem: \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = (p + (1-p))^n = 1^n = 1. The binomial probability formula and the binomial theorem from algebra are the same identity, just viewed through different lenses.
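A sketch that computes all six probabilities for B(5, 0.1) with Python's standard library and checks the sum:

```python
from math import comb

n, p = 5, 0.1

# P(X = k) for every k from 0 to n.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(k, round(prob, 5))

print(round(sum(pmf), 10))  # → 1.0
```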
Mean of the binomial distribution
You could compute the mean directly from the definition: E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}. This sum is not obviously easy to evaluate. There is a cleaner path.
Proof using linearity. Think of X as a sum. Each trial i produces a Bernoulli random variable X_i that is 1 if trial i is a success and 0 otherwise. Then

X = X_1 + X_2 + \cdots + X_n.

By linearity of expectation (which holds whether or not the variables are independent):

E(X) = E(X_1) + E(X_2) + \cdots + E(X_n) = p + p + \cdots + p = np.
That's it. No complicated sums, no binomial coefficient manipulations. The entire proof rests on writing X as a sum of indicator variables and using the fact that expectation distributes over addition.
For the bulb example, E(X) = 5 \times 0.1 = 0.5. On average, half a defective bulb per batch.
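You can also sanity-check E(X) = np by simulation. A sketch using only the standard library (the batch count and seed are arbitrary choices):

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility
n, p = 5, 0.1
batches = 100_000  # arbitrary simulation size

# Count defectives in each simulated batch of 5 bulbs.
counts = [sum(1 for _ in range(n) if random.random() < p)
          for _ in range(batches)]
average = sum(counts) / batches

print(round(average, 2))  # close to np = 0.5
```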
Variance of the binomial distribution
Proof using independence. The X_i are independent Bernoulli variables, each with variance p(1 - p). For independent variables, variances add:

\text{Var}(X) = \text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n) = np(1 - p).
The standard deviation is \sigma = \sqrt{np(1-p)}.
For the bulb example, \text{Var}(X) = 5 \times 0.1 \times 0.9 = 0.45 and \sigma = \sqrt{0.45} \approx 0.671.
Alternative proof via the direct sum. For completeness, you can also compute E(X^2) directly. Since X = \sum X_i:

E(X^2) = E\left[\left(\sum_i X_i\right)^2\right] = \sum_i E(X_i^2) + \sum_{i \neq j} E(X_i X_j).

Since X_i is 0 or 1, X_i^2 = X_i, so E(X_i^2) = p. Since X_i and X_j are independent for i \neq j, E(X_i X_j) = E(X_i) E(X_j) = p^2. There are \binom{n}{2} unordered pairs, each counted twice in the sum over i \neq j, so:

E(X^2) = np + 2\binom{n}{2} p^2 = np + n(n-1)p^2,

and therefore

\text{Var}(X) = E(X^2) - [E(X)]^2 = np + n(n-1)p^2 - n^2p^2 = np(1-p).
Both proofs arrive at the same answer. The first is elegant; the second shows you the algebra that's hiding behind the elegance.
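As a numerical cross-check, a sketch that computes E(X), E(X^2), and the variance directly from the probability mass function for the bulb example:

```python
from math import comb

n, p = 5, 0.1
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * prob for k, prob in enumerate(pmf))
second_moment = sum(k**2 * prob for k, prob in enumerate(pmf))
variance = second_moment - mean**2

print(round(mean, 6))           # → 0.5   (np)
print(round(second_moment, 6))  # → 0.7   (np + n(n-1)p^2)
print(round(variance, 6))       # → 0.45  (np(1-p))
```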
There is also a symmetry between successes and failures, and it is not a coincidence. If "success" has probability p, then "failure" has probability 1 - p, and the number of failures in n trials follows B(n, 1-p). The two distributions are mirror images of each other.
Two worked examples
Example 1: Free throws in basketball
A basketball player makes 80% of her free throws. In a game, she takes 6 free throws. Find the probability that she makes exactly 4, and find the expected number of successful throws.
Step 1. Identify the parameters. Each free throw is a Bernoulli trial with p = 0.8. There are n = 6 trials. You want P(X = 4).
Why: the trials are independent (each throw doesn't affect the next), and p is the same for each throw. The binomial model applies.
Step 2. Apply the formula.

P(X = 4) = \binom{6}{4} (0.8)^4 (0.2)^2 = 15 \times (0.8)^4 \times (0.2)^2.

Why: there are 15 ways to choose which 4 of the 6 throws are successful.

Step 3. Compute the powers.

(0.8)^4 = 0.4096 \quad \text{and} \quad (0.2)^2 = 0.04.

Step 4. Multiply.

P(X = 4) = 15 \times 0.4096 \times 0.04 = 0.24576 \approx 0.246.
Why: about a 24.6% chance of making exactly 4 out of 6.
Step 5. Expected successes: E(X) = np = 6 \times 0.8 = 4.8.
Result: P(X = 4) \approx 0.246. Expected makes: 4.8 out of 6.
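The whole calculation fits in a few lines of Python (math.comb is in the standard library since Python 3.8):

```python
from math import comb

n, p, k = 6, 0.8, 4

# P(X = 4) = C(6, 4) * 0.8^4 * 0.2^2
prob = comb(n, k) * p**k * (1 - p)**(n - k)
expected = n * p  # E(X) = np

print(round(prob, 3))      # → 0.246
print(round(expected, 1))  # → 4.8
```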
The graph matches the intuition: an 80% shooter is most likely to make 5 out of 6, with making all 6 the second most likely outcome. Exactly 4 is the third most likely — a reasonable outcome, but not the most probable one.
Example 2: Quality control on a production line
A phone factory has a defect rate of 5%. An inspector tests a batch of 20 phones. What is the probability that exactly 2 are defective? What is the standard deviation of the number of defectives?
Step 1. Parameters: n = 20, p = 0.05, want P(X = 2).
Why: each phone is an independent trial with two outcomes (defective or not), same probability each time. Binomial model applies.
Step 2. Compute \binom{20}{2}.

\binom{20}{2} = \frac{20 \times 19}{2} = 190.

Step 3. Compute the probability.

P(X = 2) = 190 \times (0.05)^2 \times (0.95)^{18}.

(0.05)^2 = 0.0025. For (0.95)^{18}: \ln(0.95) \approx -0.05129, so 18 \times (-0.05129) = -0.9233, and e^{-0.9233} \approx 0.3972. Therefore

P(X = 2) \approx 190 \times 0.0025 \times 0.3972 \approx 0.189.
Why: about an 18.9% chance of finding exactly 2 defectives in a batch of 20.
Step 4. Standard deviation: \sigma = \sqrt{np(1-p)} = \sqrt{20 \times 0.05 \times 0.95} = \sqrt{0.95} \approx 0.975.
Result: P(X = 2) \approx 0.189. Standard deviation \sigma \approx 0.975.
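The same calculation in Python, without the logarithm detour:

```python
from math import comb, sqrt

n, p, k = 20, 0.05, 2

# P(X = 2) = C(20, 2) * 0.05^2 * 0.95^18
prob = comb(n, k) * p**k * (1 - p)**(n - k)
sigma = sqrt(n * p * (1 - p))  # standard deviation of the defect count

print(round(prob, 3))   # → 0.189
print(round(sigma, 3))  # → 0.975
```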
A standard deviation below 1 tells you that the defect count barely fluctuates. Most batches have 0 or 1 defective phone, with 2 being the upper edge of the typical range. The graph confirms: everything beyond k = 3 is nearly invisible.
Common confusions
- "The binomial formula works for sampling without replacement." Only approximately, and only when the population is much larger than the sample. If you pick 5 bulbs from a box of 20 without replacing them, the probability changes after each draw. The exact model is the hypergeometric distribution, not the binomial. But if you pick 5 from a box of 10,000, the change in probability after each draw is negligible, and the binomial is an excellent approximation.
- "\binom{n}{k} counts the probability." No — \binom{n}{k} counts the number of arrangements, not the probability. Each arrangement has its own probability p^k(1-p)^{n-k}, and you multiply the count by this probability to get the final answer. The two factors play very different roles.
- "If p = 0.5, the distribution is always symmetric." Correct, and the converse holds as well: a binomial distribution with p \neq 0.5 is never symmetric. It is skewed, and the skewness increases as p moves further from 0.5.
- "More trials means more variance." True in absolute terms (\text{Var} = np(1-p) grows with n), but the variance of the proportion of successes, X/n, is p(1-p)/n, which shrinks. The relative spread decreases as you do more trials — this is why polling becomes more reliable with larger samples.
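The first confusion above can be quantified. A sketch comparing the exact hypergeometric probability with the binomial value for a small and a large population (the population sizes are illustrative):

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    # Exact P(k defectives) when drawing n items without replacement
    # from a population of N items containing K defectives.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Small population: 20 bulbs, 2 defective (10%), so binomial is noticeably off.
print(round(hypergeom_pmf(20, 2, 5, 2), 4))          # → 0.0526
# Large population: 10,000 bulbs, 1,000 defective, nearly identical to binomial.
print(round(hypergeom_pmf(10_000, 1_000, 5, 2), 4))
print(round(binomial_pmf(5, 0.1, 2), 4))             # → 0.0729
```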
Going deeper
If you understand the formula, the mean, and the variance, you have the complete toolkit for solving binomial problems. The rest is for readers who want to see how the binomial connects to other distributions and to the algebra they already know.
Connection to the binomial theorem
The name "binomial distribution" is not a coincidence. The binomial theorem says

(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.
Set a = p and b = 1 - p. The right side becomes \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k}, which is the sum of all binomial probabilities. The left side becomes (p + 1 - p)^n = 1. So the binomial theorem is the algebraic statement that probabilities sum to 1 — the two ideas are literally the same identity.
Mode of the binomial
The most probable value of k — the mode — is the integer part of (n+1)p. More precisely, P(X = k) increases as k goes up from 0, peaks at or near k = (n+1)p - 1, and then decreases. You can show this by examining the ratio P(X = k+1) / P(X = k):

\frac{P(X = k+1)}{P(X = k)} = \frac{n - k}{k + 1} \cdot \frac{p}{1 - p}.
This ratio exceeds 1 (meaning probabilities are still increasing) when k < (n+1)p - 1, and drops below 1 after that.
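A sketch that checks the mode claim by comparing a direct argmax of the probabilities with \lfloor (n+1)p \rfloor for a few parameter choices (chosen to avoid the tie case where (n+1)p is an integer):

```python
from math import comb, floor

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

for n, p in [(5, 0.1), (6, 0.8), (20, 0.05), (10, 0.5)]:
    pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
    # Direct argmax over all probabilities.
    mode = max(range(n + 1), key=lambda k: pmf[k])
    print(n, p, mode, floor((n + 1) * p))  # last two columns agree
```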
The normal approximation
When n is large and p is not too close to 0 or 1, the binomial distribution looks increasingly like a bell curve. Specifically, B(n, p) is well approximated by a normal distribution with the same mean and variance:

X \approx N\big(np,\; np(1-p)\big).
A common rule of thumb is that the approximation is good when np \geq 5 and n(1-p) \geq 5. This is the bridge between discrete and continuous probability, and it is the reason the normal distribution is so central to statistics.
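A sketch comparing exact binomial probabilities with the matching normal density for n = 100, p = 0.3 (illustrative parameters):

```python
from math import comb, exp, pi, sqrt

n, p = 100, 0.3
mu = n * p                     # mean of the approximating normal, 30.0
sigma = sqrt(n * p * (1 - p))  # its standard deviation, about 4.58

def binomial_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_density(x):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Exact binomial probability vs. normal density at a few points.
for k in [25, 30, 35]:
    print(k, round(binomial_pmf(k), 4), round(normal_density(k), 4))
```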
Where this leads next
- Other Discrete Distributions — the geometric distribution (how many trials until the first success) and the Poisson distribution (rare events in a large population), with derivations.
- Normal Distribution — the bell curve that the binomial approaches for large n, and the most important distribution in statistics.
- Binomial Theorem for Positive Integer — the algebraic identity that powers the binomial distribution formula.
- Combinations — Basics — a closer look at \binom{n}{k} and why it counts what it counts.
- Conditional Probability — Advanced — when the probability of success changes based on what you know. Extends the fixed-p world of the binomial.