In short

Range is the gap between the largest and smallest data values — quick but easily distorted by one outlier. Mean deviation is the average absolute distance from the mean, a more stable measure. Variance is the average squared distance from the mean, with a clean algebraic identity \sigma^2 = \overline{x^2} - \bar{x}^2 that makes it computable. Standard deviation is the square root of the variance and lives on the same scale as the original data, which is why it is the dispersion measure you see most often in practice.

Two students, Anaya and Rohan, each take five mathematics quizzes. Anaya scores 78, 79, 80, 81, 82. Rohan scores 60, 70, 80, 90, 100. Both students have the same mean score: 80. So by the mean alone, they are tied.

But they are not the same student in any meaningful sense. Anaya is remarkably consistent — her quizzes hug her average closely and never stray far. Rohan is all over the place — he sometimes crushes the quiz and sometimes flops. If you were choosing someone to rely on for a steady performance, you would pick Anaya. If you were looking for the student capable of an exceptional score on a critical day, Rohan might be your answer.

The mean on its own cannot distinguish these two students. What you need is a second number — a single number summarising how spread out the data is around the mean. That number is called a measure of dispersion, and there are several of them. This article walks through the four classical ones, explains when to use each, and derives the formula for variance — the one that eventually becomes the workhorse of all of statistics.

Range: the simplest thing that works

The crudest way to describe how spread out a data set is: take the largest value, subtract the smallest, and report the gap.

Range

For a data set x_1, x_2, \ldots, x_n, the range is

R = x_{\max} - x_{\min}.

For Anaya, the range is 82 - 78 = 4. For Rohan, the range is 100 - 60 = 40. The range gives you the first glimpse of how different the two students are: Rohan's scores span ten times as much territory as Anaya's.

The range is easy to compute and easy to explain, but it has one big flaw: it only looks at two points — the extremes — and ignores everything in between. A data set with one wildly unusual value and a hundred normal ones will have a huge range, even though "most" of the data is tightly clustered. The range is sensitive to outliers in a way that badly misrepresents the typical spread.

That is the fundamental problem with the range. What you really want is a measure that takes every data point into account.
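The definition translates directly into a few lines of code. A minimal sketch in Python (the helper name `data_range` is my own), applied to both students' scores:

```python
def data_range(xs):
    """Range: the largest value minus the smallest."""
    return max(xs) - min(xs)

anaya = [78, 79, 80, 81, 82]
rohan = [60, 70, 80, 90, 100]

print(data_range(anaya))  # 4
print(data_range(rohan))  # 40
```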

[Figure: The range is just the distance between extremes. A number line shows Rohan's five scores from 60 to 100 as dots, with an arrow spanning from the smallest to the largest labelled "range = 40".]
The range only sees the two extreme values. The three middle scores — $70$, $80$, $90$ — contribute nothing to the range, even though they make up most of the data. This is why the range is a weak measure: it ignores everything except the outermost points.

Mean deviation: averaging the distances

Here is the next idea. Compute each data point's distance from the mean. Add up all those distances. Divide by the number of points. The result is the average distance from the mean — the mean deviation.

Mean deviation

For a data set x_1, x_2, \ldots, x_n with mean \bar{x}, the mean deviation about the mean is

\text{MD} \;=\; \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|.

The absolute value is crucial. Without it, the deviations would cancel each other out exactly, giving zero every time. This is not a hand-wave — it is a theorem.

Claim: For any data set, \sum_{i=1}^{n} (x_i - \bar{x}) = 0.

Proof. Expanding,

\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = n\bar{x} - n\bar{x} = 0,

because \sum x_i = n\bar{x} by the definition of the mean, and \bar{x} is constant so \sum \bar{x} = n\bar{x}. Done.

So the signed deviations always sum to zero. To get a useful "average distance," you must either take absolute values or squares. Taking absolute values gives the mean deviation; squaring gives the variance (coming up next).

For Anaya: the deviations from the mean 80 are -2, -1, 0, 1, 2. Their absolute values are 2, 1, 0, 1, 2. The mean deviation is \tfrac{2+1+0+1+2}{5} = \tfrac{6}{5} = 1.2.

For Rohan: the deviations are -20, -10, 0, 10, 20. Their absolute values are 20, 10, 0, 10, 20. The mean deviation is \tfrac{60}{5} = 12.

Rohan's mean deviation is ten times Anaya's — a much sharper separation than you would guess from the range alone, and one that reflects every quiz rather than just the extremes.
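As a quick numerical check, here is the same computation in Python (a minimal sketch; the function name `mean_deviation` is mine). It also verifies that the signed deviations sum to zero, as proved above:

```python
def mean_deviation(xs):
    """Mean deviation: average absolute distance from the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

anaya = [78, 79, 80, 81, 82]
rohan = [60, 70, 80, 90, 100]

# The signed deviations cancel exactly, which is why the absolute value is needed:
print(sum(x - 80 for x in anaya))  # 0
print(mean_deviation(anaya))       # 1.2
print(mean_deviation(rohan))       # 12.0
```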

[Figure: Deviations from the mean for Anaya and Rohan. Two number lines show each student's five quiz scores with the mean marked at 80. Anaya's ticks cluster tightly from 78 to 82; Rohan's spread from 60 to 100.]
Both students have the same mean ($80$) but very different spreads. Anaya's scores cluster tightly between $78$ and $82$; Rohan's scores range from $60$ to $100$. The mean deviation quantifies this spread: $1.2$ for Anaya and $12$ for Rohan.

Mean deviation is honest and interpretable: "on average, the data points are this far from the mean." So why isn't it the dispersion measure everyone uses?

The answer is that absolute values are mathematically awkward. |x| is not differentiable at x = 0, which means calculus techniques don't apply cleanly to formulas built from |x - \bar{x}|. When you try to derive properties of dispersion algebraically — how it behaves when you combine data sets, or when you take a linear transformation of the data — the absolute value gets in the way. Squaring, by contrast, is smooth and algebraic and plays well with calculus. That is why the next measure replaces absolute values with squares.

[Figure: Squared deviations for Rohan's five scores. A bar chart of (x − μ)² for each score: 400 for 60 and 100, 100 for 70 and 90, and 0 for 80.]
Rohan's squared deviations from the mean of $80$. The outer points contribute $400$ each, while the intermediate points contribute only $100$ each. Squaring strongly emphasises the outliers — a data point twice as far from the mean contributes four times as much to the variance.

Variance: averaging the squared distances

Variance

For a data set x_1, x_2, \ldots, x_n with mean \bar{x}, the variance is

\sigma^2 \;=\; \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.

The variance is the average squared distance from the mean. It solves the algebraic awkwardness of the mean deviation by replacing |x_i - \bar{x}| with (x_i - \bar{x})^2, which is always non-negative without needing an absolute value.

For Anaya: the squared deviations are 4, 1, 0, 1, 4. Sum is 10. Divide by 5 to get \sigma^2 = 2.

For Rohan: the squared deviations are 400, 100, 0, 100, 400. Sum is 1000. Divide by 5 to get \sigma^2 = 200.
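The definitional formula is a direct transcription into code. A minimal Python sketch (dividing by n, as the text does; the name `variance` is mine):

```python
def variance(xs):
    """Variance: average squared distance from the mean (dividing by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(variance([78, 79, 80, 81, 82]))   # 2.0
print(variance([60, 70, 80, 90, 100]))  # 200.0
```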

There is one thing that feels strange about the variance: its units are the square of the data's units. If the data is measured in marks, the variance is in marks-squared. That is conceptually weird — marks-squared is not a quantity anyone has intuition about. This is where the fourth measure — the standard deviation — comes in: it is just the square root of the variance, which brings the units back to the original scale.

But before taking the square root, it is worth deriving a cleaner formula for the variance itself.

Deriving the computational formula

The variance formula above is the definition, but it has a practical drawback: to use it you must first compute \bar{x}, then make a second pass over the data to subtract \bar{x} from each point, square each difference, and sum. There is a cleaner form that avoids the second pass, and deriving it is a good algebra exercise.

Start with the definition:

\sigma^2 \;=\; \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.

Expand the squared binomial:

(x_i - \bar{x})^2 \;=\; x_i^2 - 2 x_i \bar{x} + \bar{x}^2.

Plug back in and split the sum:

\sigma^2 \;=\; \frac{1}{n} \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) \;=\; \frac{1}{n} \sum_{i=1}^{n} x_i^2 \;-\; \frac{2\bar{x}}{n} \sum_{i=1}^{n} x_i \;+\; \frac{\bar{x}^2}{n} \sum_{i=1}^{n} 1.

Now simplify each piece. The first term is \frac{1}{n} \sum x_i^2 = \overline{x^2}, the mean of the squares. In the second, \sum x_i = n\bar{x}, so the term becomes \frac{2\bar{x}}{n} \cdot n\bar{x} = 2\bar{x}^2. In the third, \sum_{i=1}^{n} 1 = n, so the term is \frac{\bar{x}^2}{n} \cdot n = \bar{x}^2.

Putting it together:

\sigma^2 \;=\; \overline{x^2} - 2\bar{x}^2 + \bar{x}^2 \;=\; \overline{x^2} - \bar{x}^2.

A clean identity.

Computational formula for variance

\sigma^2 \;=\; \overline{x^2} - (\bar{x})^2 \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2.

In words: the variance is the mean of the squares minus the square of the mean.

This one-liner is the identity you use in practice. To compute the variance, you only need two sums: \sum x_i and \sum x_i^2. There is no need for a second pass to subtract \bar{x} from each data point — both sums can be accumulated in a single pass through the data. It is also the algebraic bridge that makes many variance proofs in probability theory easy (you will see this again in the expectation-and-variance article for random variables, where the analogous identity is \text{Var}(X) = E[X^2] - (E[X])^2).

Try it on Anaya's data. \sum x_i = 78 + 79 + 80 + 81 + 82 = 400, so \bar{x} = 80. \sum x_i^2 = 6084 + 6241 + 6400 + 6561 + 6724 = 32010, so \overline{x^2} = \tfrac{32010}{5} = 6402. Then \sigma^2 = 6402 - 6400 = 2. Matches the direct computation.
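The two-sums recipe can be written as a genuine one-pass loop. A Python sketch (the name `variance_one_pass` is mine):

```python
def variance_one_pass(xs):
    """Variance via sigma^2 = (mean of squares) - (square of the mean).
    Only two running sums are needed, so a single pass suffices."""
    n = 0
    s = 0.0   # running sum of x
    s2 = 0.0  # running sum of x^2
    for x in xs:
        n += 1
        s += x
        s2 += x * x
    return s2 / n - (s / n) ** 2

print(variance_one_pass([78, 79, 80, 81, 82]))  # 2.0
```

One practical caveat worth knowing: for data whose mean is large relative to its spread, this form can lose precision to floating-point cancellation (the two quantities being subtracted are nearly equal); Welford's online algorithm is the usual numerically stable alternative.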

[Figure: The variance identity visualised. A schematic of two rectangles: the mean of the squares, minus a smaller rectangle for the square of the mean, leaves an area equal to the variance σ².]
The computational identity $\sigma^2 = \overline{x^2} - \bar{x}^2$ in words: subtract the square of the mean from the mean of the squares to get the variance. You never need to compute deviations explicitly.

Standard deviation

Variance has the wrong units. The remedy is the simplest possible one: take a square root.

Standard deviation

The standard deviation of a data set is

\sigma \;=\; \sqrt{\sigma^2} \;=\; \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}.

Standard deviation lives on the same scale as the data itself. If the data is in marks, so is \sigma. If the data is in metres, so is \sigma. This is the property that makes standard deviation the dispersion measure of choice when reporting results — you can say things like "the average exam score was 62, with a standard deviation of 10," and both numbers are directly comparable.

For Anaya: \sigma = \sqrt{2} \approx 1.41. For Rohan: \sigma = \sqrt{200} \approx 14.14.

Rohan's standard deviation is ten times Anaya's. Because standard deviation scales linearly with the spread of the data, these two numbers are directly comparable in a way that \sigma^2 = 2 versus \sigma^2 = 200 is not.

One nice property of standard deviation that does not hold for variance: if you multiply every data point by a constant c, the standard deviation also multiplies by |c|. (The variance multiplies by c^2.) And if you add a constant c to every data point, the standard deviation is unchanged — shifting all data doesn't change how spread out it is. These two facts, together, say that standard deviation is a proper measure of spread, not a measure of location.
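Both properties are easy to check numerically. A short Python sketch using Rohan's scores (the helper name `std_dev` is mine):

```python
import math

def std_dev(xs):
    """Standard deviation: square root of the average squared deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

rohan = [60, 70, 80, 90, 100]

# Multiplying every point by c multiplies sigma by |c| ...
scaled = [-3 * x for x in rohan]
# ... while adding a constant leaves sigma unchanged.
shifted = [x + 5 for x in rohan]

print(round(std_dev(rohan), 2))    # 14.14
print(round(std_dev(scaled), 2))   # 42.43
print(round(std_dev(shifted), 2))  # 14.14
```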

Two worked examples

Example 1: weekly temperatures

The maximum daily temperatures (in degrees Celsius) recorded in a city for one week are 28, 30, 32, 29, 31, 27, 33. Compute the range, the mean deviation, the variance, and the standard deviation.

Step 1. Compute the mean.

\bar{x} = \frac{28 + 30 + 32 + 29 + 31 + 27 + 33}{7} = \frac{210}{7} = 30.

Why: every dispersion measure except the range is defined relative to the mean, so you need \bar{x} first.

Step 2. Compute the range.

R = x_{\max} - x_{\min} = 33 - 27 = 6.

Why: the range needs only the two extremes — the quickest measure to compute, and the one carrying the least information.

Step 3. Compute the mean deviation. The signed deviations from the mean are -2, 0, 2, -1, 1, -3, 3. Their absolute values are 2, 0, 2, 1, 1, 3, 3.

\text{MD} = \frac{2 + 0 + 2 + 1 + 1 + 3 + 3}{7} = \frac{12}{7} \approx 1.71.

Step 4. Compute the variance using the computational formula. First, \sum x_i^2:

28^2 + 30^2 + 32^2 + 29^2 + 31^2 + 27^2 + 33^2 = 784 + 900 + 1024 + 841 + 961 + 729 + 1089 = 6328.

So \overline{x^2} = \tfrac{6328}{7} = 904. Then

\sigma^2 = \overline{x^2} - \bar{x}^2 = 904 - 900 = 4.

Why: the mean of the squares minus the square of the mean — exactly the one-liner formula.

Step 5. Take the square root to get the standard deviation.

\sigma = \sqrt{4} = 2 \text{ °C}.

Result: Range = 6 °C, mean deviation \approx 1.71 °C, variance = 4 °C^2, standard deviation = 2 °C.
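All four measures for the temperature data can be reproduced with one small function. A sketch in Python (the name `dispersion_summary` is illustrative, not from the text), using the computational identity for the variance:

```python
import math

def dispersion_summary(xs):
    """Return (range, mean deviation, variance, standard deviation)."""
    n = len(xs)
    m = sum(xs) / n
    rng = max(xs) - min(xs)
    md = sum(abs(x - m) for x in xs) / n
    var = sum(x * x for x in xs) / n - m * m  # mean of squares minus square of mean
    return rng, md, var, math.sqrt(var)

temps = [28, 30, 32, 29, 31, 27, 33]
rng, md, var, sd = dispersion_summary(temps)
print(rng, round(md, 2), var, sd)  # 6 1.71 4.0 2.0
```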

[Figure: Temperature data with mean and one-sigma band. The seven temperatures from 27 to 33 plotted as dots on a number line, with a vertical line at the mean of 30 and a shaded band from 28 to 32 (one standard deviation either side of the mean).]
The seven temperature values plotted on a number line. The shaded band from $\mu - \sigma = 28$ to $\mu + \sigma = 32$ contains most of the data points — exactly five of seven — showing how the standard deviation captures typical spread.

Example 2: comparing two bowlers

Two cricket bowlers each record the following five economy rates (runs given per over) across five matches.

Bowler P: 4, 5, 6, 7, 8. Bowler Q: 2, 3, 6, 9, 10.

Who is more consistent?

Step 1. Compute the means.

Bowler P: \bar{x}_P = \tfrac{4+5+6+7+8}{5} = \tfrac{30}{5} = 6. Bowler Q: \bar{x}_Q = \tfrac{2+3+6+9+10}{5} = \tfrac{30}{5} = 6.

Both have the same mean — the same typical economy rate.

Why: with identical means, consistency is the only thing that can distinguish them, so the dispersion measure is decisive.

Step 2. Compute the variance of P using the one-liner.

\sum x^2 = 16 + 25 + 36 + 49 + 64 = 190. So \overline{x^2}_P = \tfrac{190}{5} = 38, and \sigma_P^2 = 38 - 36 = 2. Then \sigma_P = \sqrt{2} \approx 1.41.

Step 3. Compute the variance of Q.

\sum x^2 = 4 + 9 + 36 + 81 + 100 = 230. So \overline{x^2}_Q = \tfrac{230}{5} = 46, and \sigma_Q^2 = 46 - 36 = 10. Then \sigma_Q = \sqrt{10} \approx 3.16.

Step 4. Compare.

Bowler P has \sigma = 1.41; bowler Q has \sigma = 3.16. Q's standard deviation is about 2.2 times P's. Both bowlers average six runs per over, but P's performance hugs that average much more tightly.

Why: a lower standard deviation at the same mean means the data is clustered closer to the centre, and for performance comparisons this is the operational definition of "more consistent."

Result: Bowler P is more consistent, with \sigma_P \approx 1.41 versus \sigma_Q \approx 3.16.
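The comparison in code, using the computational identity for the standard deviation (a minimal Python sketch; `std_dev` is my own helper name):

```python
import math

def std_dev(xs):
    """Standard deviation via sigma^2 = mean of squares - square of the mean."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - m * m)

p = [4, 5, 6, 7, 8]
q = [2, 3, 6, 9, 10]
print(round(std_dev(p), 2), round(std_dev(q), 2))  # 1.41 3.16
```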

[Figure: Comparing the spread of two bowlers' economy rates. Two number lines: bowler P's five rates cluster from 4 to 8 around the mean of 6; bowler Q's spread from 2 to 10 around the same mean.]
Two bowlers with the same mean economy rate but different spreads. $P$'s data hugs the mean at $\sigma = 1.41$; $Q$'s data spreads out to $\sigma = 3.16$. Standard deviation is what separates them.

Common confusions

Three points trip people up. First, variance is in squared units, so never compare a variance directly to a mean or to the raw data — convert to the standard deviation first. Second, the signed deviations x_i - \bar{x} always sum to zero, so an "average deviation" without absolute values or squares is zero for every data set and tells you nothing. Third, the range is not a substitute for the other measures: it sees only the two extremes, and a single outlier can make it wildly unrepresentative of the typical spread.

Going deeper

If you can compute the four measures and explain when to use each, you have the working toolkit. The rest is for readers who want to see dispersion for grouped data, the coefficient of variation, and the link between dispersion of data and variance of a random variable.

Variance for grouped data

When the data comes in a frequency table — value x_i occurring with frequency f_i, where \sum f_i = n — the variance formula generalises in the obvious way:

\sigma^2 \;=\; \frac{1}{n} \sum_{i} f_i (x_i - \bar{x})^2 \;=\; \frac{1}{n} \sum_i f_i x_i^2 - \bar{x}^2.

The same computational identity holds: mean of the squares minus square of the mean. For a grouped distribution with class midpoints, you use the midpoints as the x_i and the class frequencies as the f_i, and the formulas are identical.
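The frequency-weighted formula in code. A Python sketch (`grouped_variance` is my own name); passing all-ones frequencies recovers the ungrouped variance, which makes a convenient check:

```python
def grouped_variance(values, freqs):
    """Variance for a frequency table: frequency-weighted mean of the squares
    minus the square of the frequency-weighted mean."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_sq = sum(f * x * x for x, f in zip(values, freqs)) / n
    return mean_sq - mean ** 2

# Rohan's scores as a degenerate frequency table (each value once):
print(grouped_variance([60, 70, 80, 90, 100], [1, 1, 1, 1, 1]))  # 200.0
# A genuine grouped example: the value 2 occurs 3 times, the value 5 once.
print(grouped_variance([2, 5], [3, 1]))  # 1.6875
```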

The coefficient of variation

Standard deviation has units. That means you cannot compare the standard deviations of two data sets unless they are in the same units — and even within the same units, you cannot always tell whether a given spread is "large" or "small" without knowing the mean to compare it against. A standard deviation of 2 is enormous if the mean is 3 but negligible if the mean is 300.

The coefficient of variation (CV) addresses this by dividing the standard deviation by the mean:

\text{CV} \;=\; \frac{\sigma}{\bar{x}}.

The CV is dimensionless, often expressed as a percentage. It lets you compare dispersion across data sets with different units or different scales. For the two bowlers, both have \bar{x} = 6, so the CV is just \sigma / 6: about 0.236 for P and 0.527 for Q. For the temperature example, the CV is \tfrac{2}{30} \approx 0.067 — the temperature is remarkably stable relative to its mean.
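A minimal Python sketch of the CV (the function name `coefficient_of_variation` is mine), reproducing the three values just quoted, rounded to three decimals:

```python
import math

def coefficient_of_variation(xs):
    """CV = sigma / mean, a dimensionless measure of relative spread."""
    n = len(xs)
    m = sum(xs) / n
    sigma = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sigma / m

print(round(coefficient_of_variation([4, 5, 6, 7, 8]), 3))               # 0.236
print(round(coefficient_of_variation([2, 3, 6, 9, 10]), 3))              # 0.527
print(round(coefficient_of_variation([28, 30, 32, 29, 31, 27, 33]), 3))  # 0.067
```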

From data dispersion to random-variable variance

The measures in this article are defined on actual data — specific numbers you can list. Probability theory has exact analogues for random variables. For a discrete random variable X with mean \mu = E[X], the variance is

\text{Var}(X) \;=\; E[(X - \mu)^2] \;=\; E[X^2] - \mu^2.

This is the same identity you derived for data, reshaped from a sum \frac{1}{n}\sum to an expectation E[\cdot]. Every fact you proved for data dispersion has a probability counterpart — and conversely, the variance formulas you see in probability problems are just the random-variable versions of the algebra you did in this article.

The argmin characterisation of the mean

There is one more reason the mean and the variance are intimately linked. Ask: what constant c minimises the average squared distance from the data, \frac{1}{n} \sum (x_i - c)^2? Take the derivative with respect to c:

\frac{d}{dc} \frac{1}{n} \sum (x_i - c)^2 = \frac{1}{n} \sum -2(x_i - c) = -2\left(\frac{1}{n}\sum x_i - c\right) = -2(\bar{x} - c).

Setting this to zero gives c = \bar{x}. So the mean is the unique constant that minimises the average squared distance — and the minimum value is exactly the variance. That is the deep reason variance is built around the mean and not around some other centre: the pairing is optimal.

Interestingly, if you instead minimise the average absolute distance, the optimal c is the median, not the mean. That is the duality between variance-with-mean and mean-deviation-with-median, and it is why the median is sometimes a more natural centre when your dispersion measure uses absolute values.
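Both argmin claims can be verified numerically by brute force over a grid of candidate centres. A Python sketch (the data set is my own example, chosen so the mean and median differ sharply):

```python
def avg_sq_dist(xs, c):
    """Average squared distance of the data from the constant c."""
    return sum((x - c) ** 2 for x in xs) / len(xs)

def avg_abs_dist(xs, c):
    """Average absolute distance of the data from the constant c."""
    return sum(abs(x - c) for x in xs) / len(xs)

# A deliberately skewed data set: mean = 22, median = 3.
data = [1, 2, 3, 4, 100]

best_sq = min(range(0, 111), key=lambda c: avg_sq_dist(data, c))
best_abs = min(range(0, 111), key=lambda c: avg_abs_dist(data, c))
print(best_sq, best_abs)  # 22 3
```

Squared distance is minimised at the mean (22); absolute distance at the median (3) — the duality described above.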

Where this leads next

Measures of dispersion are foundational to almost every statistical tool that comes later. The next articles take the numbers you can now compute and build interpretation around them.