In short

A population is the entire group you want to study. A sample is a smaller subset you actually measure. If you choose the sample randomly, the sample average is a reliable stand-in for the population average — and the larger the sample, the closer it gets. The pattern of how sample averages spread out is called the sampling distribution.

India has roughly 25 crore households. Suppose the government wants to know the average monthly electricity bill across the entire country. Going door to door and collecting 25 crore electricity bills is not feasible — it would take years and cost a fortune. So instead, a surveyor picks 10,000 households, records their bills, and computes the average of those 10,000 numbers. That average comes out to, say, ₹1,840.

Here is the question: can you trust that number? Those 10,000 households are not the country. They are a tiny sliver of the country — 0.004% of all households. Why should their average tell you anything about the average for everyone?

This is the central problem of sampling. You want to know something about a huge group. You can only measure a small piece. Under what conditions does the small piece faithfully represent the whole?

The answer is surprisingly precise, and it rests on one idea: randomness. If you choose the 10,000 households at random — genuinely at random, not "convenient" households, not "nearby" households — then the mathematics guarantees that their average will be close to the true average. Not exactly equal, but close, and you can even quantify how close.

Population and sample

These two words are the foundation of everything in statistics, so they need careful definitions.

The population is the complete set of individuals or objects you want to study. The sample is a subset of the population that you actually observe and measure.

A parameter is a fixed number describing the population (like the true average electricity bill of all 25 crore households). A statistic is a number computed from the sample (like the average electricity bill of your 10,000 sampled households).

The population is not always "people." If you are testing whether a batch of 50,000 light bulbs meets a quality standard, the population is all 50,000 bulbs. The sample might be 200 bulbs pulled from the production line and tested. If you are studying the heights of teak trees in a forest reserve, the population is every teak tree in that reserve.

The key distinction: a parameter is a fact about the world. It has a single, fixed, true value — you just do not know it. A statistic is something you compute from data you actually collected. It changes every time you draw a new sample. The entire point of sampling is to use the statistic (which you can compute) to estimate the parameter (which you cannot directly observe).

Here is a concrete example. Suppose a school has 1,200 students, and the average height of all 1,200 is exactly 162.3 cm. That is the parameter — you just do not know it yet. You pick 50 students at random and measure their heights. Their average comes out to 163.1 cm. That is the statistic. Pick a different 50 students and you might get 161.8 cm, or 162.7 cm. Each sample gives a slightly different statistic, but they all hover around the true parameter.
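This back-and-forth between parameter and statistic is easy to simulate. A minimal sketch in Python, using a made-up population of 1,200 heights (the 7 cm spread is an illustrative assumption, not a figure from the text):

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 1,200 student heights in cm (SD 7 is illustrative).
population = [random.gauss(162.3, 7.0) for _ in range(1200)]
true_mean = statistics.mean(population)  # the parameter — fixed but "unknown"

# Three different random samples of 50 give three different statistics.
for _ in range(3):
    sample = random.sample(population, 50)
    x_bar = statistics.mean(sample)  # the statistic — varies sample to sample
    print(f"sample mean = {x_bar:.1f} cm   (true mean = {true_mean:.1f} cm)")
```

Each run of the loop prints a slightly different statistic, and all of them hover near the fixed parameter.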

[Figure: population and sample relationship. A large rectangle represents the population (N = 1,200 students; parameter: true mean height unknown). Inside it, a smaller shaded region represents the sample (n = 50; statistic: x̄ = 163.1 cm). An arrow indicates that the sample statistic estimates the population parameter.]
The population is everything you want to study. The sample is the piece you actually measure. The goal: use the sample statistic to estimate the unknown population parameter.

Why the sample has to be random

Suppose you want to estimate the average monthly income of families in a city. You stand outside a shopping mall on a Saturday afternoon and survey 200 people. What is wrong with this?

Everything. The people at the mall are not representative of the city. They are disproportionately wealthier (they are shopping), disproportionately urban (they are at a mall), and disproportionately free on Saturday (ruling out many workers). Your sample is biased — it systematically leans in one direction. The average income you compute from this sample will almost certainly be higher than the city's true average, and no amount of increasing the sample size will fix the problem. You could survey 10,000 mall-goers and the bias would still be there.

The cure for bias is random sampling — a procedure where every individual in the population has a known, nonzero chance of being selected.

Simple random sampling

A simple random sample (SRS) of size n from a population of size N is a sample chosen so that every possible subset of n individuals is equally likely to be selected. In particular, every individual has the same probability n/N of appearing in the sample.

The word "random" here is doing real work. It does not mean "haphazard" or "whatever feels right." It means you have a specific mechanism — a lottery, a random number generator, a table of random digits — that gives every member of the population an equal shot.

Why does randomness help? Because it removes your biases from the selection. You might unconsciously prefer taller students, richer families, trees near the path. Randomness does not have preferences. Over many possible random samples, the biases cancel out: sometimes you oversample tall students, sometimes short ones, and on average, you get the right answer.
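The "equal shot" property can be checked empirically. A small sketch, assuming Python's `random.sample` as the selection mechanism: each of N = 20 individuals should appear in a size-5 sample with probability n/N = 0.25.

```python
import random
from collections import Counter

random.seed(0)
N, n, trials = 20, 5, 40000

# Count how often each individual 0..19 lands in a simple random sample of 5.
counts = Counter()
for _ in range(trials):
    counts.update(random.sample(range(N), n))

# Every individual should appear with probability n/N = 0.25.
for i in range(N):
    print(i, round(counts[i] / trials, 3))
```

Every observed frequency comes out close to 0.25 — the mechanism has no favourites.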

There are other sampling methods beyond simple random sampling — stratified sampling divides the population into subgroups (strata) and samples from each, systematic sampling picks every k-th individual from a list, cluster sampling selects entire groups at once. Each has its uses. But simple random sampling is the baseline: the method against which all others are compared, and the one whose mathematics is cleanest.

What happens when you sample repeatedly

Here is an experiment you can run in your head. Take the school with 1,200 students whose true average height is 162.3 cm. Draw a random sample of 50 students and compute their average. Write it down. Put those students back, draw another random sample of 50, compute the average. Do this 1,000 times. You now have 1,000 sample averages.

What does the collection of those 1,000 averages look like?

It forms a pattern — a distribution — centred on the true population mean. Most of the sample averages cluster near 162.3 cm. A few land at 160 or 164. Almost none land below 158 or above 167. The shape is a bell curve, symmetric and concentrated.

[Figure: sampling distribution of the sample mean. A histogram of 1,000 sample means, each from a sample of size 50, forms a bell-shaped curve centred at the population mean of 162.3 cm; most sample means fall between 161 and 164.]
If you draw 1,000 random samples of size 50 from the same population and compute each sample's average, the averages form a bell-shaped distribution centred on the true population mean. This distribution of sample averages is the sampling distribution.

This distribution of sample averages is called the sampling distribution of the sample mean. It is not a distribution of individual heights — it is a distribution of averages, each computed from a different random sample.
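The thought experiment can be run literally. A sketch, assuming a synthetic population of 1,200 heights centred at 162.3 cm with an illustrative SD of 7 cm:

```python
import random
import statistics

random.seed(1)

# Synthetic population of 1,200 heights (SD 7 cm is an illustrative choice).
population = [random.gauss(162.3, 7.0) for _ in range(1200)]

# Draw 1,000 random samples of size 50; record each sample mean.
sample_means = [statistics.mean(random.sample(population, 50))
                for _ in range(1000)]

# Centre ≈ population mean; spread ≈ σ/√50 ≈ 1 cm, far tighter than σ = 7.
print(f"centre: {statistics.mean(sample_means):.2f} cm")
print(f"spread: {statistics.stdev(sample_means):.2f} cm")
```

The 1,000 means cluster tightly around the population mean, exactly as the histogram above suggests.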

Three facts about the sampling distribution make the entire field of statistics possible.

Fact 1: The centre. The mean of the sampling distribution equals the population mean \mu. If you averaged all 1,000 sample means, you would get (very close to) 162.3 cm. In symbols:

E(\bar{x}) = \mu

where \bar{x} is the sample mean. This says that random sampling is unbiased — on average, the sample mean hits the right target.

Fact 2: The spread. The standard deviation of the sampling distribution — called the standard error — is

\text{SE} = \frac{\sigma}{\sqrt{n}}

where \sigma is the population standard deviation and n is the sample size. The \sqrt{n} in the denominator is the key: as the sample size grows, the spread of sample means shrinks. With n = 50, the standard error is \sigma/\sqrt{50}. With n = 200, it is \sigma/\sqrt{200} — half as wide. Larger samples give more precise estimates.
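The \sqrt{n} effect is easy to tabulate (the value \sigma = 7 here is purely illustrative):

```python
import math

sigma = 7.0  # illustrative population SD

# Quadrupling the sample size halves the standard error.
for n in (50, 200, 800):
    se = sigma / math.sqrt(n)
    print(f"n = {n:3d}  SE = {se:.3f}")
```

Each fourfold increase in n cuts the standard error in half, never by a factor of four.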

Fact 3: The shape. Even if the population distribution is skewed or irregular, the sampling distribution of the sample mean is approximately normal (bell-shaped) when n is large enough. This is the Central Limit Theorem — one of the most remarkable results in mathematics. It says that averaging washes out individual quirks: the average of many independent random quantities is approximately normal.

How large is "large enough"? For most populations, n \geq 30 is a reasonable threshold. If the population is strongly skewed, you might need n \geq 50 or more. If the population is already symmetric, even n = 10 can be enough.

The standard error formula, derived

The formula \text{SE} = \sigma / \sqrt{n} deserves a derivation, not just a statement.

Suppose you draw a sample of n values x_1, x_2, \ldots, x_n from a population with mean \mu and standard deviation \sigma. Assume the values are drawn independently (each draw does not affect the next). The sample mean is

\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}

Now compute the variance of \bar{x}. Each x_i has variance \sigma^2. Since the draws are independent, the variance of a sum is the sum of the variances:

\text{Var}(x_1 + x_2 + \cdots + x_n) = \sigma^2 + \sigma^2 + \cdots + \sigma^2 = n\sigma^2

The sample mean divides this sum by the constant n. When you divide a random variable by a constant c, the variance gets divided by c^2:

\text{Var}(\bar{x}) = \text{Var}\!\left(\frac{x_1 + \cdots + x_n}{n}\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}

Take the square root to get the standard deviation:

\text{SD}(\bar{x}) = \frac{\sigma}{\sqrt{n}}

That is the standard error. The \sqrt{n} does not come from a rule of thumb — it comes from the algebra of variances. Every time you quadruple the sample size, the standard error halves.
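The algebra can be checked by simulation: the empirical variance of many sample means should match \sigma^2/n. A sketch with illustrative values \mu = 0, \sigma = 3, n = 25:

```python
import random
import statistics

random.seed(7)
mu, sigma, n = 0.0, 3.0, 25

# Empirical variance of the sample mean over many independent samples.
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]

empirical = statistics.variance(means)
theoretical = sigma ** 2 / n  # σ²/n = 9/25 = 0.36
print(f"empirical Var(x̄) = {empirical:.4f},  σ²/n = {theoretical:.4f}")
```

The simulated variance lands within a few percent of \sigma^2/n, confirming the derivation numerically.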

Worked examples

Example 1: Estimating mean marks from a sample

A coaching centre has 2,000 students. The true mean score on a mock test is \mu = 68 marks with a standard deviation of \sigma = 12 marks. You randomly sample 36 students and compute the sample mean. What is the standard error, and within what range would you expect most sample means to fall?

Step 1. Identify the known quantities.

\mu = 68, \quad \sigma = 12, \quad n = 36

Why: \mu and \sigma describe the population. n is the sample size you chose.

Step 2. Compute the standard error.

\text{SE} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = 2

Why: the standard error measures how much sample means typically deviate from the population mean.

Step 3. Find the range where most sample means fall. By the empirical rule for normal distributions, about 95% of sample means lie within \pm 2 standard errors of \mu.

68 - 2(2) = 64 \quad \text{to} \quad 68 + 2(2) = 72

Why: the sampling distribution is approximately normal (Central Limit Theorem, n = 36 \geq 30), so the 95% rule applies.

Step 4. Interpret: if you sampled 36 students many times, about 95% of those sample means would land between 64 and 72.

Result: The standard error is 2 marks. About 95% of sample means fall in the interval [64, 72].

[Figure: sampling distribution for n = 36 with mean 68. A bell curve centred at μ = 68 marks, with the region between 64 and 72 shaded, representing the 95% interval; the standard error is 2 marks.]
The sampling distribution of $\bar{x}$ when $n = 36$. The bell curve is centred at $\mu = 68$, with standard error $\text{SE} = 2$. The shaded region from 64 to 72 captures about 95% of all possible sample means.

Notice what the picture says: even though individual students' marks vary widely (standard deviation 12), the average of 36 students barely moves — its standard deviation is only 2. Averaging compresses randomness.
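The steps above can be reproduced in a few lines:

```python
import math

mu, sigma, n = 68, 12, 36

# Step 2: standard error of the sample mean.
se = sigma / math.sqrt(n)  # 12 / 6 = 2.0

# Step 3: about 95% of sample means lie within ±2 SE of μ.
low, high = mu - 2 * se, mu + 2 * se
print(f"SE = {se},  95% range: [{low}, {high}]")  # SE = 2.0,  95% range: [64.0, 72.0]
```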

Example 2: How sample size controls precision

A factory produces resistors whose resistance has a population mean of \mu = 100\,\Omega and a standard deviation of \sigma = 5\,\Omega. The quality inspector wants the standard error of the sample mean to be at most 0.5\,\Omega. How many resistors must be sampled?

Step 1. Write the standard error formula and set it equal to the target.

\text{SE} = \frac{\sigma}{\sqrt{n}} \leq 0.5

Why: the inspector's requirement translates directly into a bound on the standard error.

Step 2. Substitute \sigma = 5 and solve for n.

\frac{5}{\sqrt{n}} \leq 0.5
\sqrt{n} \geq \frac{5}{0.5} = 10
n \geq 100

Why: algebraically, multiplying both sides by \sqrt{n} and dividing by 0.5 isolates \sqrt{n}. Squaring both sides gives n.

Step 3. Verify: with n = 100, \text{SE} = 5/\sqrt{100} = 5/10 = 0.5\,\Omega. Exactly meets the requirement.

Step 4. Compare with a smaller sample: with n = 25, \text{SE} = 5/\sqrt{25} = 1.0\,\Omega — twice the target. With n = 400, \text{SE} = 5/\sqrt{400} = 0.25\,\Omega — half the target but four times the sample size.

Result: The inspector must sample at least 100 resistors.

[Figure: standard error decreasing as sample size increases. The curve SE = 5/√n drops steeply at first and then flattens, with points marked at n = 25 (SE = 1.0), n = 100 (SE = 0.5), and n = 400 (SE = 0.25).]
The standard error $\text{SE} = 5/\sqrt{n}$ plotted against $n$. The curve drops steeply at first — going from $n = 1$ to $n = 25$ cuts the error by 80% — but then flattens. To halve the error from 0.5 to 0.25, you need to quadruple the sample from 100 to 400. Precision comes cheap at first, then gets expensive.

The picture reveals a law of diminishing returns. The first 100 resistors buy you a lot of precision. The next 300 buy you only a little more. This is the \sqrt{n} at work: precision improves as the square root of the sample size, not linearly.
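The same calculation in code, solving \text{SE} = \sigma/\sqrt{n} \leq 0.5 for the minimum n:

```python
import math

sigma, target_se = 5.0, 0.5

# SE = sigma / sqrt(n) <= target  =>  n >= (sigma / target)^2
n_required = math.ceil((sigma / target_se) ** 2)
print(n_required)  # 100

# Diminishing returns: quadrupling n only halves the SE.
for n in (25, 100, 400):
    print(n, sigma / math.sqrt(n))
```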

Common confusions

Going deeper

If you came here to understand what a sample is, why it needs to be random, and how sample means behave — you have it. You can stop here. The rest of this section is for readers who want the mathematical statement of the Central Limit Theorem and a closer look at what "approximately normal" really means.

The Central Limit Theorem, stated precisely

The Central Limit Theorem is one of those results where the statement itself is the surprise.

Let x_1, x_2, \ldots, x_n be independent draws from any distribution with mean \mu and finite variance \sigma^2. Define the standardised sample mean:

Z_n = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}

Then as n \to \infty:

P(Z_n \leq z) \to \Phi(z)

where \Phi(z) is the cumulative distribution function of the standard normal distribution.

Read what this says. You start with any distribution — it could be skewed, bimodal, discrete, continuous, anything — as long as it has a finite mean and variance. You take the average of n independent draws. The distribution of that average, properly standardised, converges to the standard normal. The original distribution does not matter. Only the mean and variance survive in the limit.

This is why the bell curve appears everywhere in science. Whenever a measurement is the average (or sum) of many small, independent contributions — measurement errors, molecular velocities, exam scores summed over many questions — the Central Limit Theorem predicts a normal distribution. The bell curve is not an assumption; it is a consequence of averaging.
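The theorem can be tested on a deliberately skewed population. A sketch using the exponential distribution (mean 1, SD 1, strongly right-skewed): if the CLT holds, roughly 95% of standardised sample means should land within \pm 1.96.

```python
import math
import random
import statistics

random.seed(3)
n, reps = 40, 30000
mu, sigma = 1.0, 1.0  # exponential(1): mean 1, SD 1, strongly right-skewed

# Standardised sample means Z_n = (x̄ − μ) / (σ/√n) from the skewed population.
z = []
for _ in range(reps):
    xbar = statistics.mean(random.expovariate(1.0) for _ in range(n))
    z.append((xbar - mu) / (sigma / math.sqrt(n)))

# If Z_n is close to standard normal, about 95% should fall in [-1.96, 1.96].
inside = sum(-1.96 <= v <= 1.96 for v in z) / reps
print(f"fraction within ±1.96: {inside:.3f}")
```

Even with a population nothing like a bell curve, the standardised means behave almost exactly like a standard normal at n = 40.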

Finite population correction

The standard error formula \text{SE} = \sigma / \sqrt{n} assumes sampling with replacement (or, equivalently, that the population is infinite relative to the sample). When the sample is a significant fraction of the population, you can apply a finite population correction:

\text{SE}_{\text{corrected}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}

The factor \sqrt{(N - n)/(N - 1)} is less than 1, so the corrected standard error is smaller than \sigma/\sqrt{n}. This makes intuitive sense: if your sample covers 90% of the population, you know a lot more than the basic formula suggests. In practice, if n/N < 0.05 (the sample is less than 5% of the population), the correction is negligible and you can ignore it.
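The size of the correction is easy to tabulate. A sketch with illustrative values N = 2{,}000 and \sigma = 12:

```python
import math

sigma, N = 12.0, 2000  # illustrative population SD and size

for n in (36, 400, 1800):
    se = sigma / math.sqrt(n)
    fpc = math.sqrt((N - n) / (N - 1))
    print(f"n = {n:4d}  SE = {se:.3f}  corrected = {se * fpc:.3f}  factor = {fpc:.3f}")
```

At n = 36 (under 2% of the population) the factor is essentially 1 and can be ignored; at n = 1,800 (90% of the population) it shrinks the standard error by more than two-thirds.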

Stratified sampling and why it can beat SRS

Simple random sampling is the baseline, but it is not always the best. Suppose you want to estimate the average income of a state that has both large cities and small villages. In a simple random sample, you might — by chance — oversample cities or oversample villages. Stratified sampling avoids this by dividing the population into strata (urban vs. rural, or by district), sampling from each stratum separately, and then combining the results. The combined estimate is guaranteed to represent each stratum in the right proportion, and its standard error is often smaller than that of a simple random sample of the same total size.

The mathematics of stratified sampling uses the same variance-of-sums machinery you saw in the standard error derivation, applied stratum by stratum. It is a natural extension, not a different theory.
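A minimal sketch of proportional stratified estimation, with made-up urban/rural income strata (all numbers illustrative): sample each stratum in proportion to its population share, then weight the stratum means by those shares.

```python
import random
import statistics

random.seed(5)

# Hypothetical strata (all numbers illustrative): urban incomes are higher
# and more variable than rural ones.
urban = [random.gauss(60000, 15000) for _ in range(3000)]  # 30% of population
rural = [random.gauss(25000, 6000) for _ in range(7000)]   # 70% of population

def stratified_mean(n_total):
    """Proportional allocation: sample each stratum in proportion to its size,
    then weight the stratum means by the stratum shares."""
    n_urban = round(n_total * 0.3)
    m_urban = statistics.mean(random.sample(urban, n_urban))
    m_rural = statistics.mean(random.sample(rural, n_total - n_urban))
    return 0.3 * m_urban + 0.7 * m_rural

true_mean = statistics.mean(urban + rural)
print(f"stratified estimate: {stratified_mean(200):.0f}  (true mean: {true_mean:.0f})")
```

Because each stratum is represented in exactly the right proportion, the estimate cannot over- or under-sample cities by chance — the main failure mode of a simple random sample in this setting.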

Where this leads next

You now know what a sample is, what makes a sample reliable, and how sample means distribute themselves. The next set of ideas uses the sampling distribution as a foundation: