Measures of Central Tendency

In short

A measure of central tendency is a single number that represents the "centre" of a dataset. The three standard measures are the mean (the arithmetic average), the median (the middle value when the data is sorted), and the mode (the most frequently occurring value). Each answers a slightly different question, and the right choice depends on the shape of the data.

Suppose a company has 5 employees with the following monthly salaries (in thousands of rupees):

20,\; 22,\; 25,\; 23,\; 210

What is the "typical" salary at this company?

If you average all five numbers, you get (20 + 22 + 25 + 23 + 210)/5 = 300/5 = 60 thousand. The mean salary is ₹60,000.

But look at the actual salaries. Four of the five employees earn between ₹20,000 and ₹25,000. Only one person — perhaps the owner — earns ₹2,10,000. Nobody earns anything close to ₹60,000. The "average" is a number that describes none of the employees.

If instead you sorted the data and picked the middle value — 23 — you would get a much more representative picture of what a typical employee earns.

This is a central tension in descriptive statistics: there is no single "best" way to summarize the centre of a dataset. The mean, the median, and the mode each tell you something different. Knowing when to use each one is not a mathematical skill; it is a judgment call, and it matters more than the computation itself.

The arithmetic mean

The most familiar measure. Add up all the values and divide by how many there are.

Arithmetic mean

For n observations x_1, x_2, \ldots, x_n, the arithmetic mean is

\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i

The symbol \bar{x} is read "x-bar."

What the mean does. The mean is the balance point of the data. If you placed the data values along a number line and put an equal physical weight at each value, the mean is where you would place the fulcrum so the line balances perfectly. Every value contributes to the mean, and each one pulls on it in proportion to its distance from the centre.

This is both a strength and a weakness. It is a strength because the mean uses all the information in the data — no observation is ignored. It is a weakness because extreme values (outliers) pull the mean toward them disproportionately. The salary example above is a textbook case: one outlier (₹2,10,000) yanks the mean far from where most of the data sits.

Properties of the mean.

The sum of deviations from the mean is always zero: \sum (x_i - \bar{x}) = 0. This is not a coincidence — it follows directly from the definition. Expand \sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0.
If every observation is increased by a constant k, the mean increases by k: \overline{x + k} = \bar{x} + k.
If every observation is multiplied by a constant k, the mean is multiplied by k: \overline{kx} = k\bar{x}.

These properties make the mean algebraically convenient — it behaves well under the standard arithmetic operations.
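The three properties can be checked numerically. The following is a minimal Python sketch using the salary figures from the opening example; the constants `k = 5` and `c = 3` are arbitrary values chosen for illustration, not numbers from the text.

```python
from statistics import mean

# Monthly salaries in thousands of rupees (from the opening example).
salaries = [20, 22, 25, 23, 210]
x_bar = mean(salaries)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - x_bar for x in salaries)

# Property 2: adding a constant k to every value adds k to the mean.
k = 5  # arbitrary illustrative constant
shifted_mean = mean([x + k for x in salaries])

# Property 3: multiplying every value by a constant c multiplies the mean by c.
c = 3  # arbitrary illustrative constant
scaled_mean = mean([c * x for x in salaries])

print(x_bar, deviation_sum, shifted_mean, scaled_mean)
```

With integer inputs like these the checks come out exact. With floating-point data the deviation sum may differ from zero by a tiny rounding error.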

Mean from a frequency table

When data comes in a frequency table — value x_i appears f_i times — the formula adjusts:

\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k}

This is the same idea: each value is weighted by how many times it appears.
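As a rough illustration, the frequency-table formula can be written in a few lines of Python. The helper name `frequency_mean` is hypothetical, and the data is the books-read list from Example 1 further down, collapsed into a table with `collections.Counter`.

```python
from collections import Counter

def frequency_mean(table):
    """Mean of a frequency table given as {value: frequency}."""
    total_count = sum(table.values())
    if total_count == 0:
        raise ValueError("frequency table is empty")
    # Each value x is weighted by its frequency f.
    return sum(x * f for x, f in table.items()) / total_count

# Books-read data from Example 1, collapsed into {value: frequency}.
books = [3, 5, 2, 7, 5, 4, 8, 5, 6, 3, 9, 1]
table = Counter(books)

print(frequency_mean(table))     # mean from the frequency table
print(sum(books) / len(books))   # mean from the raw list, for comparison
```

Both lines print the same number, which is the point: the frequency-table formula is only a regrouping of the ordinary sum.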

Mean from grouped data

When data is grouped into class intervals, you do not know the individual values — only that some number of observations fell in each interval. The standard approximation: replace each interval with its class mark (midpoint), then compute the weighted mean using the class marks as values.

If the class intervals are [l_i, u_i) with class marks m_i = (l_i + u_i)/2 and frequencies f_i:

\bar{x} \approx \frac{\sum f_i m_i}{\sum f_i}

This is an approximation because within each interval, the actual values might not be centred at the midpoint. The approximation is usually close when the class intervals are narrow and the values within each class are spread fairly evenly.

The weighted mean

Sometimes different observations carry different importance. A student's final grade might weight the exam at 60% and assignments at 40%. If the exam score is 72 and the assignment score is 88:

\text{weighted mean} = \frac{w_1 x_1 + w_2 x_2}{w_1 + w_2} = \frac{0.60 \times 72 + 0.40 \times 88}{0.60 + 0.40} = \frac{43.2 + 35.2}{1} = 78.4

Weighted mean

For observations x_1, x_2, \ldots, x_n with corresponding weights w_1, w_2, \ldots, w_n:

\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}

The arithmetic mean is the special case where all weights are equal.
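A small Python sketch of the definition, assuming the values and weights arrive as two equal-length lists. The function name `weighted_mean` is a hypothetical helper; the grade figures are the ones from the example above.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Exam score 72 at 60%, assignment score 88 at 40%.
final_grade = weighted_mean([72, 88], [0.60, 0.40])
print(round(final_grade, 2))

# Equal weights reduce to the ordinary arithmetic mean.
salaries = [20, 22, 25, 23, 210]
print(weighted_mean(salaries, [1] * len(salaries)))
```

Dividing by the total weight means the weights do not have to sum to 1; weights of 60 and 40, or 3 and 2, give the same grade.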

The median

Sort the data. Pick the middle value. That is the median.

Median

For n observations arranged in ascending order:

If n is odd, the median is the value at position (n+1)/2.
If n is even, the median is the average of the values at positions n/2 and n/2 + 1.

For the salary data 20, 22, 23, 25, 210 (already sorted), n = 5 (odd), so the median is at position (5+1)/2 = 3. The third value is 23. The median salary is ₹23,000 — a much better summary of the "typical" salary than the mean of ₹60,000.

What the median does. The median splits the sorted data exactly in half: at least 50% of the observations are at or below it, and at least 50% are at or above it. The median is insensitive to outliers. You could change the ₹2,10,000 salary to ₹2,00,00,000 and the median would still be 23 — because the median only cares about which value sits in the middle position, not how far the extreme values are.
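The outlier claim is easy to verify. The sketch below uses Python's `statistics` module on the salary data, then replaces the top salary with ₹2,00,00,000 (written as 20000, since the figures are in thousands of rupees).

```python
from statistics import mean, median

# Salaries in thousands of rupees.
salaries = [20, 22, 25, 23, 210]
# Same data with the top salary raised to 2,00,00,000 rupees (20000 thousand).
inflated = [20, 22, 25, 23, 20000]

print(mean(salaries), median(salaries))
print(mean(inflated), median(inflated))
```

The mean moves from 60 to 4018, while the median stays at 23 in both cases.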

Median from grouped data. When data is grouped, you cannot pick the middle value directly. You use the median class — the class interval where the cumulative frequency first reaches or exceeds n/2 — and then interpolate within it:

\text{Median} = l + \left(\frac{n/2 - F}{f}\right) \times h

where l is the lower boundary of the median class, F is the cumulative frequency of the class before the median class, f is the frequency of the median class, and h is the class width. This formula assumes the observations are uniformly distributed within the median class.
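One way to sketch the interpolation in Python is shown below. The function name `grouped_median` and the input layout (a list of `(lower, upper)` boundaries plus a list of frequencies) are assumptions made for this illustration. The sample table is the commute-time data from Example 2 further down.

```python
def grouped_median(classes, frequencies):
    """Interpolated median for grouped data.

    classes: (lower, upper) class boundaries in ascending order.
    frequencies: number of observations in each class.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= half:
            # First class whose cumulative frequency reaches n/2:
            # interpolate linearly inside it.
            h = upper - lower
            return lower + (half - cumulative) / f * h
        cumulative += f
    raise ValueError("classes and frequencies do not match")

# Commute-time table from Example 2.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [4, 8, 14, 10, 4]

commute_median = grouped_median(classes, frequencies)
print(round(commute_median, 2))
```

Here n/2 = 20, the cumulative frequencies are 4, 12, 26, so the median class is 30–40 with F = 12, f = 14, h = 10, and the formula gives 30 + (8/14) × 10, roughly 35.71 minutes.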

The mode

The mode is the value that appears most often.

Mode

The mode of a dataset is the observation with the highest frequency. A dataset can have one mode (unimodal), two modes (bimodal), or more. If all values appear equally often, the dataset has no mode.

For the data 2, 3, 3, 4, 5, 5, 5, 6, 7, the mode is 5 — it appears three times, more than any other value.

The mode has a natural interpretation that the mean and median lack: it is the value you are most likely to encounter. A shoe store deciding how many pairs of each size to stock cares about the mode — the most popular size — not the mean size (which might be 8.3, a size that doesn't exist).

Mode from grouped data. For grouped data, the modal class is the class with the highest frequency. If you want a single number instead of an interval, the standard formula is:

\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h

where l is the lower boundary of the modal class, f_1 is the frequency of the modal class, f_0 is the frequency of the class before it, f_2 is the frequency of the class after it, and h is the class width.
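The formula can be sketched in Python as follows. The name `grouped_mode` is hypothetical, and two choices here are assumptions rather than part of the text: a missing neighbour (when the modal class is first or last) is treated as having frequency 0, and ties for the highest frequency are resolved by taking the first such class. The sample table is the commute-time data from Example 2 further down.

```python
def grouped_mode(classes, frequencies):
    """Mode estimate for grouped data using the modal-class formula."""
    if not frequencies:
        raise ValueError("no classes")
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(frequencies)), key=lambda j: frequencies[j])
    f1 = frequencies[i]
    # Assumption: a missing neighbour counts as frequency 0.
    f0 = frequencies[i - 1] if i > 0 else 0
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbours equal the modal frequency")
    lower, upper = classes[i]
    return lower + (f1 - f0) / denominator * (upper - lower)

# Commute-time table from Example 2.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [4, 8, 14, 10, 4]

commute_mode = grouped_mode(classes, frequencies)
print(commute_mode)
```

For this table the modal class is 30–40 with f_1 = 14, f_0 = 8, f_2 = 10, so the estimate is 30 + (6/10) × 10 = 36 minutes.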

Choosing the right measure

This is the part that textbooks often skip, and it matters more than the formulas.

Use the mean when the data is roughly symmetric and has no extreme outliers. In this case, the mean, median, and mode are all close to each other, and the mean has the best mathematical properties (it uses all the data, it is unique, and it participates in further formulas like variance).

Use the median when the data is skewed or has outliers. Income data is the classic case: a small number of very high incomes pulls the mean up, making it unrepresentative. The median income is almost always a better summary of "what a typical person earns" than the mean income.

Use the mode when the data is categorical (favourite colour, most common shirt size) or when you want to know the most popular value. The mode is the only measure of central tendency that works for unordered qualitative data: you cannot compute the mean or the median of "red, blue, blue, green."

The empirical relationship. For moderately skewed distributions, there is a rough relationship:

\text{Mode} \approx 3 \times \text{Median} - 2 \times \text{Mean}

This is an approximation, not an exact formula, but it gives you a quick sanity check: if you know two of the three measures, you can estimate the third.
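As a quick sanity check of the kind described, the sketch below applies the relationship to the books-read data from Example 1 further down (mean 58/12, median 5, actual mode 5). The function name `estimate_mode` is a hypothetical helper.

```python
def estimate_mode(mean_value, median_value):
    """Empirical estimate: mode is roughly 3 * median - 2 * mean."""
    return 3 * median_value - 2 * mean_value

# Books-read data from Example 1: mean = 58/12, median = 5, actual mode = 5.
books_mean = 58 / 12
books_median = 5

estimate = estimate_mode(books_mean, books_median)
print(round(estimate, 2))
```

The estimate comes out near 5.33 against an actual mode of 5: close, but not exact, as expected from a rule of thumb.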

Worked examples

Example 1: Mean, median, and mode of ungrouped data

The number of books read by 12 students during a summer break is:

3,\; 5,\; 2,\; 7,\; 5,\; 4,\; 8,\; 5,\; 6,\; 3,\; 9,\; 1

Find the mean, median, and mode.

Step 1. Compute the mean.

\bar{x} = \frac{3 + 5 + 2 + 7 + 5 + 4 + 8 + 5 + 6 + 3 + 9 + 1}{12} = \frac{58}{12} \approx 4.83

Why: add all values and divide by the count. No value is weighted more than another.

Step 2. Find the median. Sort the data first:

1,\; 2,\; 3,\; 3,\; 4,\; 5,\; 5,\; 5,\; 6,\; 7,\; 8,\; 9

There are 12 values (even), so the median is the average of the 6th and 7th values: (5 + 5)/2 = 5.

Why: with an even number of observations, the "middle" falls between two values. Averaging them is the standard convention.

Step 3. Find the mode. The value 5 appears 3 times; all other values appear once or twice. The mode is 5.

Why: the mode is simply the most frequent value. Here it is unambiguous — 5 occurs more often than any other number.

Step 4. Compare the three measures.

Measure	Value
Mean	4.83
Median	5
Mode	5

Result: Mean = 4.83, Median = 5, Mode = 5. The three measures are close, which tells you the data is roughly symmetric — no extreme outliers are dragging the mean away from the middle.

A dot plot of the books-read data with the mean (4.83) and median/mode (5) marked. The three measures cluster tightly near the centre of the distribution, confirming that the data is roughly symmetric. Each dot above the number line represents one student.

The closeness of the three measures is what you would expect from a roughly symmetric distribution. When you see the mean and median diverge, as in the salary example, that is a signal of skewness, and the median becomes the more reliable summary.
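The same three results can be reproduced with Python's standard `statistics` module. This is one possible check, not the only way to do it; `multimode` (available from Python 3.8) is used because it returns every most-frequent value rather than just one.

```python
from statistics import mean, median, multimode

# Books read by 12 students (Example 1).
books = [3, 5, 2, 7, 5, 4, 8, 5, 6, 3, 9, 1]

books_mean = mean(books)
books_median = median(books)   # averages the 6th and 7th sorted values
books_modes = multimode(books) # list of all most-frequent values

print(round(books_mean, 2), books_median, books_modes)
```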

Example 2: Mean from grouped data (using class marks)

The daily commute times (in minutes) for 40 office workers are grouped below.

Commute time (min)	Frequency (f_i)
10–20	4
20–30	8
30–40	14
40–50	10
50–60	4

Find the mean commute time.

Step 1. Compute the class mark m_i for each interval.

Interval	Class mark m_i	Frequency f_i
10–20	15	4
20–30	25	8
30–40	35	14
40–50	45	10
50–60	55	4

Why: the class mark is the midpoint of the interval, which makes it the natural single-number stand-in for the values in that interval when the individual values are unknown.

Step 2. Compute f_i \times m_i for each row.

m_i	f_i	f_i \times m_i
15	4	60
25	8	200
35	14	490
45	10	450
55	4	220
Total	40	1420

Why: each product f_i \times m_i represents the total contribution of that class to the overall sum. Fourteen workers commuting 35 minutes each contribute 14 \times 35 = 490 total minutes.

Step 3. Divide to get the mean.

\bar{x} = \frac{\sum f_i m_i}{\sum f_i} = \frac{1420}{40} = 35.5 \text{ minutes}

Why: this is just the weighted-mean formula, with frequencies as weights.

Result: The mean commute time is 35.5 minutes.

The histogram of commute times with the mean (35.5 min) marked by a dashed red line. The mean falls inside the tallest bar (30–40 min), which makes sense — the mean is being pulled toward the densest part of the data. The slight asymmetry (the right tail is a bit heavier) means the mean is pulled slightly right of the 35-minute class mark.

The mean of 35.5 minutes sits in the modal class (30–40 minutes), exactly where you would expect it for a roughly symmetric distribution. The slight rightward pull comes from the 40–50 and 50–60 groups being slightly heavier (combined frequency 14) than the 10–20 and 20–30 groups (combined frequency 12).
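The three steps can be condensed into a short Python sketch. The function name `grouped_mean` and the `(lower, upper)` input layout are assumptions made for this illustration.

```python
def grouped_mean(classes, frequencies):
    """Approximate mean of grouped data using class marks (midpoints)."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("no observations")
    # Step 1: class marks m_i = (l_i + u_i) / 2.
    marks = [(lower + upper) / 2 for lower, upper in classes]
    # Steps 2 and 3: sum of f_i * m_i divided by sum of f_i.
    return sum(f * m for f, m in zip(frequencies, marks)) / total

# Commute-time table from this example.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [4, 8, 14, 10, 4]

commute_mean = grouped_mean(classes, frequencies)
print(commute_mean)
```

The intermediate sum is 1420 and the total frequency is 40, matching the table in Step 2.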

Common confusions

"The mean is always the best measure." It is not. The mean is the best measure when the data is symmetric and free of outliers. When data is skewed (as income data almost always is), the median is a better summary. When data is categorical, only the mode makes sense.
"The median is always one of the data values." Only when n is odd. When n is even, the median is the average of two middle values, which might not be a value that appears in the data. For the data 2, 4, 6, 8, the median is (4 + 6)/2 = 5, and 5 does not appear in the dataset.
"A dataset always has exactly one mode." Not true. The data 2, 3, 3, 5, 5, 7 is bimodal — it has two modes (3 and 5). The data 1, 2, 3, 4, 5 has no mode at all, because every value appears exactly once.
"Changing one extreme value doesn't affect the mean much." It can affect it enormously if the change is large. Replace 9 with 900 in the books data and the mean jumps from 4.83 to 79.25 — a sixteenfold increase — while the median stays at 5.
"Mean from grouped data is exact." It is an approximation. By using the class mark as a stand-in for all values in the interval, you are assuming the data is evenly spread within each class. The approximation is usually good, but it is not exact.

Going deeper

If you came here to understand the three measures of central tendency and when to use each one, you have it — you can stop here. The rest is for readers who want the mathematical foundations and the connections to more advanced ideas.

The mean minimises the sum of squared deviations

There is a deep reason the mean is so important. Among all possible summary values, the mean is the unique number that makes the sum of squared deviations as small as possible:

\sum_{i=1}^{n} (x_i - \bar{x})^2 \;\leq\; \sum_{i=1}^{n} (x_i - a)^2 \quad \text{for any real number } a

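To see this, write x_i - a = (x_i - \bar{x}) - (a - \bar{x}) and expand the square on the right side. The cross term -2(a - \bar{x})\sum (x_i - \bar{x}) vanishes because the deviations from the mean sum to zero, which leaves: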

\sum (x_i - a)^2 = \sum (x_i - \bar{x})^2 + n(a - \bar{x})^2

The extra term n(a - \bar{x})^2 is always non-negative and equals zero only when a = \bar{x}. So moving away from the mean in any direction increases the total squared deviation. This property is why the mean is the natural starting point for measures of dispersion: variance is the average of the squared deviations from the mean.
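The decomposition can be checked numerically. The sketch below uses the books-read data from Example 1 and a handful of trial values of a chosen arbitrarily for illustration; the helper name `sum_sq_dev` is hypothetical.

```python
def sum_sq_dev(data, a):
    """Sum of squared deviations of the data from the value a."""
    return sum((x - a) ** 2 for x in data)

books = [3, 5, 2, 7, 5, 4, 8, 5, 6, 3, 9, 1]
n = len(books)
x_bar = sum(books) / n
at_mean = sum_sq_dev(books, x_bar)

trial_values = [3, 4, 5, 6.5]  # arbitrary candidates for a
for a in trial_values:
    direct = sum_sq_dev(books, a)
    via_identity = at_mean + n * (a - x_bar) ** 2
    # The two columns agree up to floating-point rounding,
    # and both are at least as large as the value at the mean.
    print(a, round(direct, 6), round(via_identity, 6))
```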

The median minimises the sum of absolute deviations

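Similarly, the median minimises the sum of absolute deviations \sum |x_i - a|. When n is odd the median is the unique minimiser; when n is even, every value between the two middle observations gives the same minimum, and the median is the conventional choice among them. Squared deviations penalise large errors heavily; absolute deviations penalise each error only in proportion to its size. The mean and median are each "optimal", but for different definitions of what "close to the data" means.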
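A brute-force check on the salary data illustrates the claim. This sketch only searches whole-number candidates from 0 to 250 (an arbitrary grid chosen for illustration), so it is a demonstration, not a proof; the helper name `sum_abs_dev` is hypothetical.

```python
from statistics import mean, median

def sum_abs_dev(data, a):
    """Sum of absolute deviations of the data from the value a."""
    return sum(abs(x - a) for x in data)

# Salaries in thousands of rupees.
salaries = [20, 22, 25, 23, 210]

at_median = sum_abs_dev(salaries, median(salaries))
at_mean = sum_abs_dev(salaries, mean(salaries))

# Search an illustrative grid of whole-number candidates.
best = min(range(0, 251), key=lambda a: sum_abs_dev(salaries, a))

print(at_median, at_mean, best)
```

The total absolute deviation is 193 at the median (23) and 300 at the mean (60), and the grid search lands on 23.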

Relationship between mean, median, and mode in skewed distributions

For a unimodal distribution with moderate skew:

If the distribution is right-skewed (long tail to the right), then mode < median < mean.
If the distribution is left-skewed (long tail to the left), then mean < median < mode.
If the distribution is symmetric, all three coincide.

This ordering is not a theorem (it can fail for unusual distributions) but it holds remarkably often in practice. Income distributions are right-skewed: the mean income is pulled right by high earners, the mode is the most common income (usually lower), and the median sits between them.

The geometric and harmonic means

The arithmetic mean is not the only kind of mean. The geometric mean of positive numbers x_1, x_2, \ldots, x_n is

G = (x_1 \cdot x_2 \cdots x_n)^{1/n}

It is the right tool when quantities multiply together — for instance, compound growth rates. If an investment grows by 10% one year and 20% the next, the average annual growth rate is not (10 + 20)/2 = 15\% but rather \sqrt{1.10 \times 1.20} - 1 \approx 14.9\%.

The harmonic mean is

H = \frac{n}{\sum (1/x_i)}

It is the right tool when you are averaging rates over equal amounts of the underlying quantity, such as speeds over equal distances. If you drive 60 km at 40 km/h and then 60 km at 60 km/h, the average speed for the whole journey is not 50 km/h but the harmonic mean 2/(1/40 + 1/60) = 48 km/h.

For any set of positive numbers, H \leq G \leq A, where A is the arithmetic mean, with equality only when all the numbers are equal (the AM-GM-HM inequality).
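Python's `statistics` module provides `geometric_mean` (Python 3.8 and later) and `harmonic_mean`, so both examples above can be reproduced in a few lines. The sketch also compares the harmonic, geometric, and arithmetic means of the two speeds.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Growth of 10% then 20%, expressed as growth factors.
growth_factors = [1.10, 1.20]
avg_growth = geometric_mean(growth_factors) - 1
print(round(avg_growth * 100, 1))  # average annual growth in percent

# Equal distances driven at 40 km/h and 60 km/h.
speeds = [40, 60]
avg_speed = harmonic_mean(speeds)
print(avg_speed)

# Ordering of the three means for the same positive data.
h, g, a = harmonic_mean(speeds), geometric_mean(speeds), mean(speeds)
print(h <= g <= a)
```

Note that `harmonic_mean` gives the true average speed only because the two legs cover equal distances; for unequal distances, a weighted version would be needed.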

Where this leads next

Once you can summarise the centre of a dataset, the next question is: how spread out is the data around that centre? Two datasets can have the same mean but very different shapes.

Measures of Dispersion — range, variance, and standard deviation: quantifying how far the data spreads from its centre.
Quartiles and Percentiles — dividing the data into quarters and hundredths for a finer summary than just the median.
Correlation — when you have two variables and want to know whether they are related.
Data Organization — the prerequisite: how to turn raw data into frequency tables and graphs before computing any summary.