In short
A binomial distribution models the number of successes in n independent trials, each with the same probability p of success. The probability of getting exactly k successes is \binom{n}{k} p^k (1-p)^{n-k}. Its mean is np and its variance is np(1-p).
A factory produces LED bulbs. Quality control tests show that 90% of bulbs work perfectly and 10% are defective. You pick 5 bulbs at random from a large production batch. How many defective bulbs should you expect to find?
You could enumerate every possibility. Zero defective, one defective, two defective, all the way up to five defective. For each case, you could compute the probability. Then you could find the expected value.
But this particular setup — repeating the same yes/no experiment multiple times and counting the "yes" outcomes — appears everywhere. Coin flips, medical trials, election polls, quality inspections, free-throw shooting. It appears so often that it has its own name and its own formula, and once you see where the formula comes from, you'll never need to enumerate cases by hand again.
One trial: the Bernoulli experiment
Start with the smallest possible version of the problem. You pick one bulb. It is either defective (probability p = 0.1) or not defective (probability 1 - p = 0.9). That's it — two outcomes, one trial.
This single yes/no experiment is called a Bernoulli trial. You label one outcome "success" (which doesn't have to be a good thing — here, "success" means "defective," because that's what you're counting) and the other "failure." The random variable X takes value 1 for success and 0 for failure, with

P(X = 1) = p \quad \text{and} \quad P(X = 0) = 1 - p.
The expected value is E(X) = 0 \cdot (1 - p) + 1 \cdot p = p. The variance is E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p).
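Both results are easy to check numerically; here is a minimal sketch in plain Python, using the bulb value p = 0.1 from above:

```python
# Mean and variance of one Bernoulli trial, straight from the definitions.
p = 0.1  # probability of "success" (here: a defective bulb)

mean = 0 * (1 - p) + 1 * p   # E(X) = p
variance = p - p ** 2        # E(X^2) - [E(X)]^2 = p - p^2

print(mean)                # → 0.1
print(round(variance, 6))  # → 0.09
```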
A single Bernoulli trial is simple. The interesting question is what happens when you repeat it.
From one trial to n trials
Go back to the 5 bulbs. Each bulb is an independent Bernoulli trial with p = 0.1. You want the probability of getting exactly k defective bulbs out of 5.
Take a specific case first: exactly 2 defective bulbs. One way this could happen is if the first two bulbs are defective and the last three are good:

D, D, G, G, G

where D marks a defective bulb and G a good one.
The probability of this specific sequence is 0.1 \times 0.1 \times 0.9 \times 0.9 \times 0.9 = (0.1)^2 (0.9)^3.
But this is not the only arrangement that gives exactly 2 defectives. The defective bulbs could be in positions 1 and 3, or positions 2 and 5, or any other pair of positions. Each such arrangement has exactly the same probability (0.1)^2 (0.9)^3 — because the trials are independent, the probability depends only on how many are defective, not on which ones.
How many such arrangements are there? You are choosing 2 positions (for the defective bulbs) out of 5. That is \binom{5}{2} = 10.
So the total probability of exactly 2 defectives is

P(X = 2) = \binom{5}{2} (0.1)^2 (0.9)^3 = 10 \times 0.01 \times 0.729 = 0.0729.
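The count-times-probability logic can be verified by brute force. A sketch in plain Python that enumerates all 2^5 good/defective sequences and adds up the ones with exactly 2 defectives:

```python
from itertools import product

p = 0.1  # probability a bulb is defective

# Enumerate all 2^5 sequences of outcomes (1 = defective, 0 = good).
total = 0.0
for outcome in product([0, 1], repeat=5):
    if sum(outcome) == 2:  # keep only sequences with exactly 2 defectives
        # Independence: multiply p for each defective, (1 - p) for each good bulb.
        prob = 1.0
        for bulb in outcome:
            prob *= p if bulb == 1 else 1 - p
        total += prob

print(round(total, 4))  # → 0.0729
```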
The logic generalises immediately.
The binomial probability formula
Binomial Distribution
Let X be the number of successes in n independent Bernoulli trials, each with success probability p. Then X follows a binomial distribution B(n, p), and

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n.
Derivation. Each trial is independent. A specific sequence with k successes and n - k failures has probability p^k (1 - p)^{n-k}, because you multiply the probabilities of each trial and get p for each success and (1-p) for each failure. The number of distinct sequences with exactly k successes is \binom{n}{k} — you choose which k of the n positions are successes. Since these sequences are mutually exclusive (they represent different outcomes), you add their probabilities, giving \binom{n}{k} p^k (1-p)^{n-k}. \square
Three assumptions must hold for the binomial model to apply:
- Fixed number of trials n.
- Each trial has exactly two outcomes (success or failure).
- Trials are independent, and the probability p is the same for every trial.
If any of these fail — if p changes from trial to trial, or the outcomes are dependent, or the number of trials is not fixed — then you do not have a binomial distribution.
Notice how the probabilities sum to 1: for the bulb example (n = 5, p = 0.1), the six values are approximately 0.590 + 0.328 + 0.073 + 0.008 + 0.0004 + 0.00001 = 1, up to rounding. This must happen, by the binomial theorem: \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = (p + (1-p))^n = 1^n = 1. The binomial probability formula and the binomial theorem from algebra are the same identity, just viewed through different lenses.
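A sketch that computes all six probabilities for B(5, 0.1) with Python's standard library and checks the sum:

```python
from math import comb

n, p = 5, 0.1

# P(X = k) for every k from 0 to n.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(k, round(prob, 5))

print(round(sum(pmf), 10))  # → 1.0
```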
Mean of the binomial distribution
You could compute the mean directly from the definition: E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}. This sum is not obviously easy to evaluate. There is a cleaner path.
Proof using linearity. Think of X as a sum. Each trial i produces a Bernoulli random variable X_i that is 1 if trial i is a success and 0 otherwise. Then

X = X_1 + X_2 + \cdots + X_n.

By linearity of expectation (which holds whether or not the variables are independent):

E(X) = E(X_1) + E(X_2) + \cdots + E(X_n) = p + p + \cdots + p = np.
That's it. No complicated sums, no binomial coefficient manipulations. The entire proof rests on writing X as a sum of indicator variables and using the fact that expectation distributes over addition.
For the bulb example, E(X) = 5 \times 0.1 = 0.5. On average, half a defective bulb per batch.
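You can also sanity-check E(X) = np by simulation. A sketch using only the standard library (the batch count and seed are arbitrary choices):

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility
n, p = 5, 0.1
batches = 100_000  # arbitrary simulation size

# Count defectives in each simulated batch of 5 bulbs.
counts = [sum(1 for _ in range(n) if random.random() < p)
          for _ in range(batches)]
average = sum(counts) / batches

print(round(average, 2))  # close to np = 0.5
```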
Variance of the binomial distribution
Proof using independence. The X_i are independent Bernoulli variables, each with variance p(1 - p). For independent variables, variances add:

\text{Var}(X) = \text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n) = np(1 - p).
The standard deviation is \sigma = \sqrt{np(1-p)}.
For the bulb example, \text{Var}(X) = 5 \times 0.1 \times 0.9 = 0.45 and \sigma = \sqrt{0.45} \approx 0.671.
Alternative proof via the direct sum. For completeness, you can also compute E(X^2) directly. Since X = \sum X_i:

E(X^2) = E\left[\left(\sum_i X_i\right)^2\right] = \sum_i E(X_i^2) + \sum_{i \neq j} E(X_i X_j).

Since X_i is 0 or 1, X_i^2 = X_i, so E(X_i^2) = p. Since X_i and X_j are independent for i \neq j, E(X_i X_j) = E(X_i) E(X_j) = p^2. There are \binom{n}{2} unordered pairs, each counted twice in the sum over i \neq j, so:

E(X^2) = np + 2\binom{n}{2} p^2 = np + n(n-1)p^2,

and therefore

\text{Var}(X) = E(X^2) - [E(X)]^2 = np + n(n-1)p^2 - n^2p^2 = np(1-p).
Both proofs arrive at the same answer. The first is elegant; the second shows you the algebra that's hiding behind the elegance.
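As a numerical cross-check, a sketch that computes E(X), E(X^2), and the variance directly from the probability mass function for the bulb example:

```python
from math import comb

n, p = 5, 0.1
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * prob for k, prob in enumerate(pmf))
second_moment = sum(k**2 * prob for k, prob in enumerate(pmf))
variance = second_moment - mean**2

print(round(mean, 6))           # → 0.5   (np)
print(round(second_moment, 6))  # → 0.7   (np + n(n-1)p^2)
print(round(variance, 6))       # → 0.45  (np(1-p))
```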
There is also a symmetry between successes and failures, and it is not a coincidence. If "success" has probability p, then "failure" has probability 1 - p, and the number of failures in n trials follows B(n, 1-p). The two distributions are mirror images of each other.
Two worked examples
Example 1: Free throws in basketball
A basketball player makes 80% of her free throws. In a game, she takes 6 free throws. Find the probability that she makes exactly 4, and find the expected number of successful throws.
Step 1. Identify the parameters. Each free throw is a Bernoulli trial with p = 0.8. There are n = 6 trials. You want P(X = 4).
Why: the trials are independent (each throw doesn't affect the next), and p is the same for each throw. The binomial model applies.
Step 2. Apply the formula.

P(X = 4) = \binom{6}{4} (0.8)^4 (0.2)^2 = 15 \times (0.8)^4 \times (0.2)^2.

Why: there are 15 ways to choose which 4 of the 6 throws are successful.

Step 3. Compute the powers.

(0.8)^4 = 0.4096 \quad \text{and} \quad (0.2)^2 = 0.04.

Step 4. Multiply.

P(X = 4) = 15 \times 0.4096 \times 0.04 = 0.24576 \approx 0.246.
Why: about a 24.6% chance of making exactly 4 out of 6.
Step 5. Expected successes: E(X) = np = 6 \times 0.8 = 4.8.
Result: P(X = 4) \approx 0.246. Expected makes: 4.8 out of 6.
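The whole calculation fits in a few lines of Python (math.comb is in the standard library since Python 3.8):

```python
from math import comb

n, p, k = 6, 0.8, 4

# P(X = 4) = C(6, 4) * 0.8^4 * 0.2^2
prob = comb(n, k) * p**k * (1 - p)**(n - k)
expected = n * p  # E(X) = np

print(round(prob, 3))      # → 0.246
print(round(expected, 1))  # → 4.8
```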
The graph matches the intuition: an 80% shooter is most likely to make 5 out of 6, with making all 6 the second most likely outcome. Exactly 4 is the third most likely — a reasonable outcome, but not the most probable one.
Example 2: Quality control on a production line
A phone factory has a defect rate of 5%. An inspector tests a batch of 20 phones. What is the probability that exactly 2 are defective? What is the standard deviation of the number of defectives?
Step 1. Parameters: n = 20, p = 0.05, want P(X = 2).
Why: each phone is an independent trial with two outcomes (defective or not), same probability each time. Binomial model applies.
Step 2. Compute \binom{20}{2}.

\binom{20}{2} = \frac{20 \times 19}{2} = 190.

Step 3. Compute the probability.

P(X = 2) = 190 \times (0.05)^2 \times (0.95)^{18}.

(0.05)^2 = 0.0025. For (0.95)^{18}: \ln(0.95) \approx -0.05129, so 18 \times (-0.05129) = -0.9233, and e^{-0.9233} \approx 0.3972. Therefore

P(X = 2) \approx 190 \times 0.0025 \times 0.3972 \approx 0.189.
Why: about an 18.9% chance of finding exactly 2 defectives in a batch of 20.
Step 4. Standard deviation: \sigma = \sqrt{np(1-p)} = \sqrt{20 \times 0.05 \times 0.95} = \sqrt{0.95} \approx 0.975.
Result: P(X = 2) \approx 0.189. Standard deviation \sigma \approx 0.975.
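The same calculation in Python, without the logarithm detour:

```python
from math import comb, sqrt

n, p, k = 20, 0.05, 2

# P(X = 2) = C(20, 2) * 0.05^2 * 0.95^18
prob = comb(n, k) * p**k * (1 - p)**(n - k)
sigma = sqrt(n * p * (1 - p))  # standard deviation of the defect count

print(round(prob, 3))   # → 0.189
print(round(sigma, 3))  # → 0.975
```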
A standard deviation below 1 tells you that the defect count barely fluctuates. Most batches have 0 or 1 defective phone, with 2 being the upper edge of the typical range. The graph confirms: everything beyond k = 3 is nearly invisible.
Common confusions
- "The binomial formula works for sampling without replacement." Only approximately, and only when the population is much larger than the sample. If you pick 5 bulbs from a box of 20 without replacing them, the probability changes after each draw. The exact model is the hypergeometric distribution, not the binomial. But if you pick 5 from a box of 10,000, the change in probability after each draw is negligible, and the binomial is an excellent approximation.
- "\binom{n}{k} counts the probability." No — \binom{n}{k} counts the number of arrangements, not the probability. Each arrangement has its own probability p^k(1-p)^{n-k}, and you multiply the count by this probability to get the final answer. The two factors play very different roles.
- "If p = 0.5, the distribution is always symmetric." Correct, and the converse holds as well: a binomial distribution with p \neq 0.5 is never symmetric. It is skewed, and the skewness increases as p moves further from 0.5.
- "More trials means more variance." True in absolute terms (\text{Var} = np(1-p) grows with n), but the variance of the proportion of successes, X/n, is p(1-p)/n, which shrinks. The relative spread decreases as you do more trials — this is why polling becomes more reliable with larger samples.
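The first confusion above can be quantified. A sketch comparing the exact hypergeometric probability with the binomial value for a small and a large population (the population sizes are illustrative):

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    # Exact P(k defectives) when drawing n items without replacement
    # from a population of N items containing K defectives.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Small population: 20 bulbs, 2 defective (10%), so binomial is noticeably off.
print(round(hypergeom_pmf(20, 2, 5, 2), 4))          # → 0.0526
# Large population: 10,000 bulbs, 1,000 defective, nearly identical to binomial.
print(round(hypergeom_pmf(10_000, 1_000, 5, 2), 4))
print(round(binomial_pmf(5, 0.1, 2), 4))             # → 0.0729
```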
Going deeper
If you understand the formula, the mean, and the variance, you have the complete toolkit for solving binomial problems. The rest is for readers who want to see how the binomial connects to other distributions and to the algebra they already know.
Connection to the binomial theorem
The name "binomial distribution" is not a coincidence. The binomial theorem says

(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.
Set a = p and b = 1 - p. The right side becomes \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k}, which is the sum of all binomial probabilities. The left side becomes (p + 1 - p)^n = 1. So the binomial theorem is the algebraic statement that probabilities sum to 1 — the two ideas are literally the same identity.
Mode of the binomial
The most probable value of k — the mode — is the integer part of (n+1)p. More precisely, P(X = k) increases as k goes up from 0, peaks at or near k = (n+1)p - 1, and then decreases. You can show this by examining the ratio P(X = k+1) / P(X = k):

\frac{P(X = k+1)}{P(X = k)} = \frac{n - k}{k + 1} \cdot \frac{p}{1 - p}.
This ratio exceeds 1 (meaning probabilities are still increasing) when k < (n+1)p - 1, and drops below 1 after that.
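A sketch that checks the mode claim by comparing a direct argmax of the probabilities with \lfloor (n+1)p \rfloor for a few parameter choices (chosen to avoid the tie case where (n+1)p is an integer):

```python
from math import comb, floor

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

for n, p in [(5, 0.1), (6, 0.8), (20, 0.05), (10, 0.5)]:
    pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
    # Direct argmax over all probabilities.
    mode = max(range(n + 1), key=lambda k: pmf[k])
    print(n, p, mode, floor((n + 1) * p))  # last two columns agree
```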
The normal approximation
When n is large and p is not too close to 0 or 1, the binomial distribution looks increasingly like a bell curve. Specifically, B(n, p) is well approximated by a normal distribution with the same mean and variance:

X \approx N\big(np,\; np(1-p)\big).
A common rule of thumb is that the approximation is good when np \geq 5 and n(1-p) \geq 5. This is the bridge between discrete and continuous probability, and it is the reason the normal distribution is so central to statistics.
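A sketch comparing exact binomial probabilities with the matching normal density for n = 100, p = 0.3 (illustrative parameters):

```python
from math import comb, exp, pi, sqrt

n, p = 100, 0.3
mu = n * p                     # mean of the approximating normal, 30.0
sigma = sqrt(n * p * (1 - p))  # its standard deviation, about 4.58

def binomial_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_density(x):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Exact binomial probability vs. normal density at a few points.
for k in [25, 30, 35]:
    print(k, round(binomial_pmf(k), 4), round(normal_density(k), 4))
```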
Where this leads next
- Other Discrete Distributions — the geometric distribution (how many trials until the first success) and the Poisson distribution (rare events in a large population), with derivations.
- Normal Distribution — the bell curve that the binomial approaches for large n, and the most important distribution in statistics.
- Binomial Theorem for Positive Integer — the algebraic identity that powers the binomial distribution formula.
- Combinations — Basics — a closer look at \binom{n}{k} and why it counts what it counts.
- Conditional Probability — Advanced — when the probability of success changes based on what you know. Extends the fixed-p world of the binomial.