In short

Range is the gap between the largest and smallest data values — quick but easily distorted by one outlier. Mean deviation is the average absolute distance from the mean, a more stable measure. Variance is the average squared distance from the mean, with a clean algebraic identity \sigma^2 = \overline{x^2} - \bar{x}^2 that makes it computable. Standard deviation is the square root of the variance and lives on the same scale as the original data, which is why it is the dispersion measure you see most often in practice.

Two students, Anaya and Rohan, each take five mathematics quizzes. Anaya scores 78, 79, 80, 81, 82. Rohan scores 60, 70, 80, 90, 100. Both students have the same mean score: 80. So by the mean alone, they are tied.

But they are not the same student in any meaningful sense. Anaya is remarkably consistent — her quizzes hug her average closely and never stray far. Rohan is all over the place — he sometimes crushes the quiz and sometimes flops. If you were choosing someone to rely on for a steady performance, you would pick Anaya. If you were looking for the student capable of an exceptional score on a critical day, Rohan might be your answer.

The mean on its own cannot distinguish these two students. What you need is a second number — a single number summarising how spread out the data is around the mean. That number is called a measure of dispersion, and there are several of them. This article walks through the four classical ones, explains when to use each, and derives the formula for variance — the one that eventually becomes the workhorse of all of statistics.

Range: the simplest thing that works

The crudest way to describe how spread out a data set is: take the largest value, subtract the smallest, and report the gap.

Range

For a data set x_1, x_2, \ldots, x_n, the range is

R = x_{\max} - x_{\min}.

For Anaya, the range is 82 - 78 = 4. For Rohan, the range is 100 - 60 = 40. The range gives you the first glimpse of how different the two students are: Rohan's scores span ten times as much territory as Anaya's.

The range is easy to compute and easy to explain, but it has one big flaw: it only looks at two points — the extremes — and ignores everything in between. A data set with one wildly unusual value and a hundred normal ones will have a huge range, even though "most" of the data is tightly clustered. The range is sensitive to outliers in a way that badly misrepresents the typical spread.

That is the fundamental problem with the range. What you really want is a measure that takes every data point into account.
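The definition translates directly into a few lines of code. A minimal sketch in Python (the helper name `data_range` is my own), applied to both students' scores:

```python
def data_range(xs):
    """Range: the largest value minus the smallest."""
    return max(xs) - min(xs)

anaya = [78, 79, 80, 81, 82]
rohan = [60, 70, 80, 90, 100]

print(data_range(anaya))  # 4
print(data_range(rohan))  # 40
```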

[Figure: The range is just the distance between extremes. A number line shows Rohan's five scores from 60 to 100 as dots, with an arrow spanning from the smallest to the largest labelled "range = 40".]
The range only sees the two extreme values. The three middle scores — $70$, $80$, $90$ — contribute nothing to the range, even though they make up most of the data. This is why the range is a weak measure: it ignores everything except the outermost points.

Mean deviation: averaging the distances

Here is the next idea. Compute each data point's distance from the mean. Add up all those distances. Divide by the number of points. The result is the average distance from the mean — the mean deviation.

Mean deviation

For a data set x_1, x_2, \ldots, x_n with mean \bar{x}, the mean deviation about the mean is

\text{MD} \;=\; \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|.

The absolute value is crucial. Without it, the deviations would cancel each other out exactly, giving zero every time. This is not a hand-wave — it is a theorem.

Claim: For any data set, \sum_{i=1}^{n} (x_i - \bar{x}) = 0.

Proof. Expanding,

\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = n\bar{x} - n\bar{x} = 0,

because \sum x_i = n\bar{x} by the definition of the mean, and \bar{x} is constant so \sum \bar{x} = n\bar{x}. Done.

So the signed deviations always sum to zero. To get a useful "average distance," you must either take absolute values or squares. Taking absolute values gives the mean deviation; squaring gives the variance (coming up next).

For Anaya: the deviations from the mean 80 are -2, -1, 0, 1, 2. Their absolute values are 2, 1, 0, 1, 2. The mean deviation is \tfrac{2+1+0+1+2}{5} = \tfrac{6}{5} = 1.2.

For Rohan: the deviations are -20, -10, 0, 10, 20. Their absolute values are 20, 10, 0, 10, 20. The mean deviation is \tfrac{60}{5} = 12.

Rohan's mean deviation is ten times Anaya's — a much sharper separation than you would guess from the range alone, and one that reflects every quiz rather than just the extremes.
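As a quick numerical check, here is the same computation in Python (a minimal sketch; the function name `mean_deviation` is mine). It also verifies that the signed deviations sum to zero, as proved above:

```python
def mean_deviation(xs):
    """Mean deviation: average absolute distance from the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

anaya = [78, 79, 80, 81, 82]
rohan = [60, 70, 80, 90, 100]

# The signed deviations cancel exactly, which is why the absolute value is needed:
print(sum(x - 80 for x in anaya))  # 0
print(mean_deviation(anaya))       # 1.2
print(mean_deviation(rohan))       # 12.0
```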

[Figure: Deviations from the mean for Anaya and Rohan. Two number lines show each student's five quiz scores with the mean marked at 80. Anaya's ticks cluster tightly from 78 to 82; Rohan's spread from 60 to 100.]
Both students have the same mean ($80$) but very different spreads. Anaya's scores cluster tightly between $78$ and $82$; Rohan's scores range from $60$ to $100$. The mean deviation quantifies this spread: $1.2$ for Anaya and $12$ for Rohan.

Mean deviation is honest and interpretable: "on average, the data points are this far from the mean." So why isn't it the dispersion measure everyone uses?

The answer is that absolute values are mathematically awkward. |x| is not differentiable at x = 0, which means calculus techniques don't apply cleanly to formulas built from |x - \bar{x}|. When you try to derive properties of dispersion algebraically — how it behaves when you combine data sets, or when you take a linear transformation of the data — the absolute value gets in the way. Squaring, by contrast, is smooth and algebraic and plays well with calculus. That is why the next measure replaces absolute values with squares.

[Figure: Squared deviations for Rohan's five scores. A bar chart of (x − μ)² for each score: 400 for 60 and 100, 100 for 70 and 90, and 0 for 80.]
Rohan's squared deviations from the mean of $80$. The outer points contribute $400$ each, while the intermediate points contribute only $100$ each. Squaring strongly emphasises the outliers — a data point twice as far from the mean contributes four times as much to the variance.

Variance: averaging the squared distances

Variance

For a data set x_1, x_2, \ldots, x_n with mean \bar{x}, the variance is

\sigma^2 \;=\; \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.

The variance is the average squared distance from the mean. It solves the algebraic awkwardness of the mean deviation by replacing |x_i - \bar{x}| with (x_i - \bar{x})^2, which is always non-negative without needing an absolute value.

For Anaya: the squared deviations are 4, 1, 0, 1, 4. Sum is 10. Divide by 5 to get \sigma^2 = 2.

For Rohan: the squared deviations are 400, 100, 0, 100, 400. Sum is 1000. Divide by 5 to get \sigma^2 = 200.
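The definitional formula is a direct transcription into code. A minimal Python sketch (dividing by n, as the text does; the name `variance` is mine):

```python
def variance(xs):
    """Variance: average squared distance from the mean (dividing by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(variance([78, 79, 80, 81, 82]))   # 2.0
print(variance([60, 70, 80, 90, 100]))  # 200.0
```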

There is one thing that feels strange about the variance: its units are the square of the data's units. If the data is measured in marks, the variance is in marks-squared. That is conceptually weird — marks-squared is not a quantity anyone has intuition about. This is where the fourth measure — the standard deviation — comes in: it is just the square root of the variance, which brings the units back to the original scale.

But before taking the square root, it is worth deriving a cleaner formula for the variance itself.

Deriving the computational formula

The variance formula above is the definition, but it has a practical drawback: to use it you must first compute \bar{x}, then make a second pass over the data to subtract \bar{x} from each point, square each difference, and sum. There is a cleaner form that avoids the second pass, and deriving it is a good algebra exercise.

Start with the definition:

\sigma^2 \;=\; \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.

Expand the squared binomial:

(x_i - \bar{x})^2 \;=\; x_i^2 - 2 x_i \bar{x} + \bar{x}^2.

Plug back in and split the sum:

\sigma^2 \;=\; \frac{1}{n} \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) \;=\; \frac{1}{n} \sum_{i=1}^{n} x_i^2 \;-\; \frac{2\bar{x}}{n} \sum_{i=1}^{n} x_i \;+\; \frac{\bar{x}^2}{n} \sum_{i=1}^{n} 1.

Now simplify each piece. The first term is \frac{1}{n} \sum x_i^2 = \overline{x^2}, the mean of the squares. In the second, \sum x_i = n\bar{x}, so the term becomes \frac{2\bar{x}}{n} \cdot n\bar{x} = 2\bar{x}^2. In the third, \sum_{i=1}^{n} 1 = n, so the term is \frac{\bar{x}^2}{n} \cdot n = \bar{x}^2.

Putting it together:

\sigma^2 \;=\; \overline{x^2} - 2\bar{x}^2 + \bar{x}^2 \;=\; \overline{x^2} - \bar{x}^2.

A clean identity.

Computational formula for variance

\sigma^2 \;=\; \overline{x^2} - (\bar{x})^2 \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2.

In words: the variance is the mean of the squares minus the square of the mean.

This one-liner is the identity you use in practice. To compute the variance, you only need two sums: \sum x_i and \sum x_i^2. There is no need for a second pass to subtract \bar{x} from each data point — both sums can be accumulated in a single pass through the data. It is also the algebraic bridge that makes many variance proofs in probability theory easy (you will see this again in the expectation-and-variance article for random variables, where the analogous identity is \text{Var}(X) = E[X^2] - (E[X])^2).

Try it on Anaya's data. \sum x_i = 78 + 79 + 80 + 81 + 82 = 400, so \bar{x} = 80. \sum x_i^2 = 6084 + 6241 + 6400 + 6561 + 6724 = 32010, so \overline{x^2} = \tfrac{32010}{5} = 6402. Then \sigma^2 = 6402 - 6400 = 2. Matches the direct computation.
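The two-sums recipe can be written as a genuine one-pass loop. A Python sketch (the name `variance_one_pass` is mine):

```python
def variance_one_pass(xs):
    """Variance via sigma^2 = (mean of squares) - (square of the mean).
    Only two running sums are needed, so a single pass suffices."""
    n = 0
    s = 0.0   # running sum of x
    s2 = 0.0  # running sum of x^2
    for x in xs:
        n += 1
        s += x
        s2 += x * x
    return s2 / n - (s / n) ** 2

print(variance_one_pass([78, 79, 80, 81, 82]))  # 2.0
```

One practical caveat worth knowing: for data whose mean is large relative to its spread, this form can lose precision to floating-point cancellation (the two quantities being subtracted are nearly equal); Welford's online algorithm is the usual numerically stable alternative.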

[Figure: The variance identity visualised. A schematic of two rectangles: the mean of the squares, minus a smaller rectangle for the square of the mean, leaves an area equal to the variance σ².]
The computational identity $\sigma^2 = \overline{x^2} - \bar{x}^2$ in words: subtract the square of the mean from the mean of the squares to get the variance. You never need to compute deviations explicitly.

Standard deviation

Variance has the wrong units. The remedy is the simplest possible one: take a square root.

Standard deviation

The standard deviation of a data set is

\sigma \;=\; \sqrt{\sigma^2} \;=\; \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}.

Standard deviation lives on the same scale as the data itself. If the data is in marks, so is \sigma. If the data is in metres, so is \sigma. This is the property that makes standard deviation the dispersion measure of choice when reporting results — you can say things like "the average exam score was 62, with a standard deviation of 10," and both numbers are directly comparable.

For Anaya: \sigma = \sqrt{2} \approx 1.41. For Rohan: \sigma = \sqrt{200} \approx 14.14.

Rohan's standard deviation is ten times Anaya's. Because standard deviation scales linearly with the spread of the data, these two numbers are directly comparable in a way that \sigma^2 = 2 versus \sigma^2 = 200 is not.

One nice property of standard deviation that does not hold for variance: if you multiply every data point by a constant c, the standard deviation also multiplies by |c|. (The variance multiplies by c^2.) And if you add a constant c to every data point, the standard deviation is unchanged — shifting all data doesn't change how spread out it is. These two facts, together, say that standard deviation is a proper measure of spread, not a measure of location.
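Both properties are easy to check numerically. A short Python sketch using Rohan's scores (the helper name `std_dev` is mine):

```python
import math

def std_dev(xs):
    """Standard deviation: square root of the average squared deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

rohan = [60, 70, 80, 90, 100]

# Multiplying every point by c multiplies sigma by |c| ...
scaled = [-3 * x for x in rohan]
# ... while adding a constant leaves sigma unchanged.
shifted = [x + 5 for x in rohan]

print(round(std_dev(rohan), 2))    # 14.14
print(round(std_dev(scaled), 2))   # 42.43
print(round(std_dev(shifted), 2))  # 14.14
```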

Two worked examples

Example 1: weekly temperatures

The maximum daily temperatures (in degrees Celsius) recorded in a city for one week are 28, 30, 32, 29, 31, 27, 33. Compute the range, the mean deviation, the variance, and the standard deviation.

Step 1. Compute the mean.

\bar{x} = \frac{28 + 30 + 32 + 29 + 31 + 27 + 33}{7} = \frac{210}{7} = 30.

Why: every dispersion measure except the range is defined relative to the mean, so you need \bar{x} first.

Step 2. Compute the range.

R = x_{\max} - x_{\min} = 33 - 27 = 6.

Why: the range needs only the two extremes — the quickest measure to compute, and the one carrying the least information.

Step 3. Compute the mean deviation. The signed deviations from the mean are -2, 0, 2, -1, 1, -3, 3. Their absolute values are 2, 0, 2, 1, 1, 3, 3.

\text{MD} = \frac{2 + 0 + 2 + 1 + 1 + 3 + 3}{7} = \frac{12}{7} \approx 1.71.

Step 4. Compute the variance using the computational formula. First, \sum x_i^2:

28^2 + 30^2 + 32^2 + 29^2 + 31^2 + 27^2 + 33^2 = 784 + 900 + 1024 + 841 + 961 + 729 + 1089 = 6328.

So \overline{x^2} = \tfrac{6328}{7} = 904. Then

\sigma^2 = \overline{x^2} - \bar{x}^2 = 904 - 900 = 4.

Why: the mean of the squares minus the square of the mean — exactly the one-liner formula.

Step 5. Take the square root to get the standard deviation.

\sigma = \sqrt{4} = 2 \text{ °C}.

Result: Range = 6 °C, mean deviation \approx 1.71 °C, variance = 4 °C^2, standard deviation = 2 °C.
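All four measures for the temperature data can be reproduced with one small function. A sketch in Python (the name `dispersion_summary` is illustrative, not from the text), using the computational identity for the variance:

```python
import math

def dispersion_summary(xs):
    """Return (range, mean deviation, variance, standard deviation)."""
    n = len(xs)
    m = sum(xs) / n
    rng = max(xs) - min(xs)
    md = sum(abs(x - m) for x in xs) / n
    var = sum(x * x for x in xs) / n - m * m  # mean of squares minus square of mean
    return rng, md, var, math.sqrt(var)

temps = [28, 30, 32, 29, 31, 27, 33]
rng, md, var, sd = dispersion_summary(temps)
print(rng, round(md, 2), var, sd)  # 6 1.71 4.0 2.0
```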

[Figure: Temperature data with mean and one-sigma band. The seven temperatures from 27 to 33 plotted as dots on a number line, with a vertical line at the mean of 30 and a shaded band from 28 to 32 (one standard deviation either side of the mean).]
The seven temperature values plotted on a number line. The shaded band from $\mu - \sigma = 28$ to $\mu + \sigma = 32$ contains most of the data points — exactly five of seven — showing how the standard deviation captures typical spread.

Example 2: comparing two bowlers

Two cricket bowlers each record the following five economy rates (runs given per over) across five matches.

Bowler P: 4, 5, 6, 7, 8. Bowler Q: 2, 3, 6, 9, 10.

Who is more consistent?

Step 1. Compute the means.

Bowler P: \bar{x}_P = \tfrac{4+5+6+7+8}{5} = \tfrac{30}{5} = 6. Bowler Q: \bar{x}_Q = \tfrac{2+3+6+9+10}{5} = \tfrac{30}{5} = 6.

Both have the same mean — the same typical economy rate.

Why: with identical means, consistency is the only thing that can distinguish them, so the dispersion measure is decisive.

Step 2. Compute the variance of P using the one-liner.

\sum x^2 = 16 + 25 + 36 + 49 + 64 = 190. So \overline{x^2}_P = \tfrac{190}{5} = 38, and \sigma_P^2 = 38 - 36 = 2. Then \sigma_P = \sqrt{2} \approx 1.41.

Step 3. Compute the variance of Q.

\sum x^2 = 4 + 9 + 36 + 81 + 100 = 230. So \overline{x^2}_Q = \tfrac{230}{5} = 46, and \sigma_Q^2 = 46 - 36 = 10. Then \sigma_Q = \sqrt{10} \approx 3.16.

Step 4. Compare.

Bowler P has \sigma = 1.41; bowler Q has \sigma = 3.16. Q's standard deviation is about 2.2 times P's. Both bowlers average six runs per over, but P's performance hugs that average much more tightly.

Why: a lower standard deviation at the same mean means the data is clustered closer to the centre, and for performance comparisons this is the operational definition of "more consistent."

Result: Bowler P is more consistent, with \sigma_P \approx 1.41 versus \sigma_Q \approx 3.16.
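The comparison in code, using the computational identity for the standard deviation (a minimal Python sketch; `std_dev` is my own helper name):

```python
import math

def std_dev(xs):
    """Standard deviation via sigma^2 = mean of squares - square of the mean."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - m * m)

p = [4, 5, 6, 7, 8]
q = [2, 3, 6, 9, 10]
print(round(std_dev(p), 2), round(std_dev(q), 2))  # 1.41 3.16
```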

[Figure: Comparing the spread of two bowlers' economy rates. Two number lines: bowler P's five rates cluster from 4 to 8 around the mean of 6; bowler Q's spread from 2 to 10 around the same mean.]
Two bowlers with the same mean economy rate but different spreads. $P$'s data hugs the mean at $\sigma = 1.41$; $Q$'s data spreads out to $\sigma = 3.16$. Standard deviation is what separates them.

Common confusions

Three points trip people up. First, variance is in squared units, so never compare a variance directly to a mean or to the raw data — convert to the standard deviation first. Second, the signed deviations x_i - \bar{x} always sum to zero, so an "average deviation" without absolute values or squares is zero for every data set and tells you nothing. Third, the range is not a substitute for the other measures: it sees only the two extremes, and a single outlier can make it wildly unrepresentative of the typical spread.

Going deeper

If you can compute the four measures and explain when to use each, you have the working toolkit. The rest is for readers who want to see dispersion for grouped data, the coefficient of variation, and the link between dispersion of data and variance of a random variable.

Variance for grouped data

When the data comes in a frequency table — value x_i occurring with frequency f_i, where \sum f_i = n — the variance formula generalises in the obvious way:

\sigma^2 \;=\; \frac{1}{n} \sum_{i} f_i (x_i - \bar{x})^2 \;=\; \frac{1}{n} \sum_i f_i x_i^2 - \bar{x}^2.

The same computational identity holds: mean of the squares minus square of the mean. For a grouped distribution with class midpoints, you use the midpoints as the x_i and the class frequencies as the f_i, and the formulas are identical.
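The frequency-weighted formula in code. A Python sketch (`grouped_variance` is my own name); passing all-ones frequencies recovers the ungrouped variance, which makes a convenient check:

```python
def grouped_variance(values, freqs):
    """Variance for a frequency table: frequency-weighted mean of the squares
    minus the square of the frequency-weighted mean."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_sq = sum(f * x * x for x, f in zip(values, freqs)) / n
    return mean_sq - mean ** 2

# Rohan's scores as a degenerate frequency table (each value once):
print(grouped_variance([60, 70, 80, 90, 100], [1, 1, 1, 1, 1]))  # 200.0
# A genuine grouped example: the value 2 occurs 3 times, the value 5 once.
print(grouped_variance([2, 5], [3, 1]))  # 1.6875
```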

The coefficient of variation

Standard deviation has units. That means you cannot compare the standard deviations of two data sets unless they are in the same units — and even within the same units, you cannot always tell whether a given spread is "large" or "small" without knowing the mean to compare it against. A standard deviation of 2 is enormous if the mean is 3 but negligible if the mean is 300.

The coefficient of variation (CV) addresses this by dividing the standard deviation by the mean:

\text{CV} \;=\; \frac{\sigma}{\bar{x}}.

The CV is dimensionless, often expressed as a percentage. It lets you compare dispersion across data sets with different units or different scales. For the two bowlers, both have \bar{x} = 6, so the CV is just \sigma / 6: about 0.236 for P and 0.527 for Q. For the temperature example, the CV is \tfrac{2}{30} \approx 0.067 — the temperature is remarkably stable relative to its mean.
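A minimal Python sketch of the CV (the function name `coefficient_of_variation` is mine), reproducing the three values just quoted, rounded to three decimals:

```python
import math

def coefficient_of_variation(xs):
    """CV = sigma / mean, a dimensionless measure of relative spread."""
    n = len(xs)
    m = sum(xs) / n
    sigma = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sigma / m

print(round(coefficient_of_variation([4, 5, 6, 7, 8]), 3))               # 0.236
print(round(coefficient_of_variation([2, 3, 6, 9, 10]), 3))              # 0.527
print(round(coefficient_of_variation([28, 30, 32, 29, 31, 27, 33]), 3))  # 0.067
```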

From data dispersion to random-variable variance

The measures in this article are defined on actual data — specific numbers you can list. Probability theory has exact analogues for random variables. For a discrete random variable X with mean \mu = E[X], the variance is

\text{Var}(X) \;=\; E[(X - \mu)^2] \;=\; E[X^2] - \mu^2.

This is the same identity you derived for data, reshaped from a sum \frac{1}{n}\sum to an expectation E[\cdot]. Every fact you proved for data dispersion has a probability counterpart — and conversely, the variance formulas you see in probability problems are just the random-variable versions of the algebra you did in this article.

The argmin characterisation of the mean

There is one more reason the mean and the variance are intimately linked. Ask: what constant c minimises the average squared distance from the data, \frac{1}{n} \sum (x_i - c)^2? Take the derivative with respect to c:

\frac{d}{dc} \frac{1}{n} \sum (x_i - c)^2 = \frac{1}{n} \sum -2(x_i - c) = -2\left(\frac{1}{n}\sum x_i - c\right) = -2(\bar{x} - c).

Setting this to zero gives c = \bar{x}. So the mean is the unique constant that minimises the average squared distance — and the minimum value is exactly the variance. That is the deep reason variance is built around the mean and not around some other centre: the pairing is optimal.

Interestingly, if you instead minimise the average absolute distance, the optimal c is the median, not the mean. That is the duality between variance-with-mean and mean-deviation-with-median, and it is why the median is sometimes a more natural centre when your dispersion measure uses absolute values.
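Both argmin claims can be verified numerically by brute force over a grid of candidate centres. A Python sketch (the data set is my own example, chosen so the mean and median differ sharply):

```python
def avg_sq_dist(xs, c):
    """Average squared distance of the data from the constant c."""
    return sum((x - c) ** 2 for x in xs) / len(xs)

def avg_abs_dist(xs, c):
    """Average absolute distance of the data from the constant c."""
    return sum(abs(x - c) for x in xs) / len(xs)

# A deliberately skewed data set: mean = 22, median = 3.
data = [1, 2, 3, 4, 100]

best_sq = min(range(0, 111), key=lambda c: avg_sq_dist(data, c))
best_abs = min(range(0, 111), key=lambda c: avg_abs_dist(data, c))
print(best_sq, best_abs)  # 22 3
```

Squared distance is minimised at the mean (22); absolute distance at the median (3) — the duality described above.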

Where this leads next

Measures of dispersion are foundational to almost every statistical tool that comes later. The next articles take the numbers you can now compute and build interpretation around them.