In short

A continuous random variable takes values in an interval of the real line, not a countable list. Probability is no longer concentrated at points — it is spread continuously, measured by a probability density function f_X. The probability of landing inside any interval is the area under f_X over that interval. The cumulative distribution function F_X(x) = P(X \leq x) is the running integral of f_X, and f_X = F_X'. The mean is \int x f_X(x)\,dx and the variance is \int (x - \mu)^2 f_X(x)\,dx.

You are waiting for a train that leaves at exactly 9{:}00. You will arrive at the platform at some random time between 8{:}50 and 9{:}00 — you have no idea when. Every moment in that ten-minute window is equally likely. Now the question: what is the probability that you arrive at exactly 8{:}54?

Not somewhere around 8{:}54. Exactly. To the nanosecond. To the attosecond.

The answer is 0. Not "a small number." Literally zero.

This sounds like it breaks probability. If every exact instant has probability 0, then adding them all up gives 0 + 0 + 0 + \cdots = 0 — but something has to happen, so the total should be 1. Where did the probability go?

The resolution is that when the possible values fill an entire continuous range, probability cannot live at points. It has to live on intervals. The probability of arriving between 8{:}54 and 8{:}55 is perfectly well-defined — it is \tfrac{1}{10}, because that minute is \tfrac{1}{10} of the ten-minute window. The probability of arriving at the exact instant 8{:}54 is the limit of "probability of arriving between 8{:}54 and 8{:}54 + \epsilon" as \epsilon \to 0, and that limit is 0.

This is the first and strangest fact about continuous random variables: single points have zero probability, but intervals do not. You have to stop thinking in terms of bar charts and start thinking in terms of areas.
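The zero-probability claim is easy to check empirically. Below is a minimal Python sketch (the sample size and seed are arbitrary choices): draw many uniform arrival times and count how often they land in a one-minute window versus on one exact instant.

```python
import random

random.seed(0)
n = 100_000
# Arrival time, in minutes after 8:50, uniform on the ten-minute window.
arrivals = [random.uniform(0, 10) for _ in range(n)]

# Fraction landing in the minute between 8:54 and 8:55.
in_interval = sum(1 for t in arrivals if 4 <= t < 5) / n

# Fraction landing on the exact instant 8:54.
exactly_854 = sum(1 for t in arrivals if t == 4.0) / n

print(in_interval)  # close to 1/10
print(exactly_854)  # 0.0: no sample ever hits a single point exactly
```

The interval estimate hovers near \tfrac{1}{10}, while the point count stays at zero no matter how large n gets.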

From mass to density

Go back to the discrete picture for a moment. A discrete random variable has a probability mass function p_X(k) — a number at each allowed value of k. The probability of any event is a sum of p_X values:

P(a \leq X \leq b) = \sum_{a \leq k \leq b} p_X(k).

For a continuous random variable, sums become integrals. There is no longer a mass at each point; there is a density — a function f_X(x) that tells you how thick the probability is per unit length near each point. The probability of an interval is

P(a \leq X \leq b) = \int_{a}^{b} f_X(x)\, dx.

Read that carefully. The integrand f_X(x) is not a probability. It has units of probability per unit of x. To turn density into probability you have to multiply by length — or, if the density varies along the interval, integrate.

The density function is constrained by two conditions that match the mass function's conditions exactly:

  1. f_X(x) \geq 0 for all x. (Density can't be negative, just as mass can't.)
  2. \displaystyle\int_{-\infty}^{\infty} f_X(x)\, dx = 1. (The total probability must be 1.)

One thing the density function is allowed to do that the mass function is not: f_X(x) can be larger than 1. That feels wrong — probabilities should be between 0 and 1! But remember, f_X(x) is not a probability. It is a density. If you have a very narrow interval where all the probability is concentrated, the density on that interval has to be very high, and that is fine.

[Figure: Probability as area under a density curve]
The shaded region under the density curve $f_X$ between $x = a$ and $x = b$ is exactly the probability $P(a \leq X \leq b)$. The total area under the curve is $1$.

Now the train-arrival story has a clean answer. The density of your arrival time is a rectangle: f_X(t) = \tfrac{1}{10} for t between 8{:}50 and 9{:}00, and 0 outside that window. (This is the uniform distribution on the ten-minute interval.) The probability of arriving between 8{:}54 and 8{:}55 is the area of the strip: \tfrac{1}{10} \cdot 1 = \tfrac{1}{10}. The probability of arriving at the exact instant 8{:}54 is the area of a strip of zero width: \tfrac{1}{10} \cdot 0 = 0.

[Figure: Discrete versus continuous CDF]
The CDF of a discrete random variable is a staircase — flat between allowed values with a jump at each one. The CDF of a continuous random variable is a smooth curve with no jumps. Probability lives at points in the discrete case and on intervals in the continuous case.

The formal definition

Definition

A random variable X is continuous if there exists a function f_X : \mathbb{R} \to [0, \infty) — the probability density function — such that for every a \leq b,

P(a \leq X \leq b) \;=\; \int_{a}^{b} f_X(x)\, dx,

with \displaystyle\int_{-\infty}^{\infty} f_X(x)\, dx = 1.

The cumulative distribution function of X is

F_X(x) \;=\; P(X \leq x) \;=\; \int_{-\infty}^{x} f_X(t)\, dt,

which is a continuous, non-decreasing function satisfying \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1.

Reading the definition. The density f_X is the tool; probability is always extracted from it by integrating. The CDF F_X(x) is the running integral of f_X from the far left up to x. Unlike the staircase CDF of a discrete random variable, the CDF of a continuous variable is a genuinely smooth (or at least continuous) curve that rises from 0 to 1 as x moves from -\infty to \infty.

One consequence of the definition you should burn into memory. Because an integral over a single point is 0,

P(X = a) \;=\; \int_{a}^{a} f_X(x)\, dx \;=\; 0

for every a. So for a continuous random variable, "strictly less than" and "less than or equal to" give the same probability: P(X < a) = P(X \leq a). This is the opposite of the discrete case, where the two can differ by p_X(a).

Density and CDF: two sides of one coin

The density and the CDF are not independent objects — they are linked by the fundamental theorem of calculus.

The CDF is the integral of the density:

F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt.

And conversely, the density is the derivative of the CDF:

f_X(x) = \frac{d}{dx} F_X(x).

If someone hands you the CDF, you differentiate to get the density. If someone hands you the density, you integrate to get the CDF. In most problems, one of them is natural to write down and the other follows from the calculus.
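Both directions of this link can be checked numerically. The sketch below uses a hypothetical density f(x) = 2x on [0, 1], chosen because its CDF F(x) = x^2 is easy to write down: a midpoint-rule integral of the density recovers the CDF, and a finite difference of the CDF recovers the density.

```python
# Hypothetical density f(x) = 2x on [0, 1], with known CDF F(x) = x^2.
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

def F(x):
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x * x

# Direction 1: integrate the density (midpoint rule) to recover the CDF.
def cdf_from_density(x, steps=10_000):
    h = x / steps
    return sum(f((i + 0.5) * h) * h for i in range(steps))

# Direction 2: differentiate the CDF (central difference) to recover the density.
def density_from_cdf(x, eps=1e-6):
    return (F(x + eps) - F(x - eps)) / (2 * eps)

print(cdf_from_density(0.5), F(0.5))   # both ~ 0.25
print(density_from_cdf(0.5), f(0.5))   # both ~ 1.0
```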

Probabilities can then be extracted from either:

P(a \leq X \leq b) = F_X(b) - F_X(a) = \int_{a}^{b} f_X(x)\, dx.

The first form is often easier to use in practice because a single CDF value is just a number lookup — no integration required at the point of use.

[Figure: Density and CDF side by side]
The density $f_X$ on the left and its CDF $F_X$ on the right. The shaded area under the density from $-\infty$ up to $x$ equals the height of the CDF at $x$ — that is the fundamental theorem of calculus saying the CDF is the running integral of the density.

Mean and variance, with integrals

For discrete random variables, the expected value is E[X] = \sum_k k \cdot p_X(k). For continuous random variables, the sum becomes an integral and the mass function becomes the density:

E[X] \;=\; \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx.

Reading this: each possible value of x is weighted by the density at x — how "thick" the probability is there — and the weights are added up over the whole real line. The result is the average value of X across the distribution, sometimes called the expectation or the mean, written \mu or \mu_X or E[X].

The variance is the average squared distance from the mean:

\text{Var}(X) \;=\; E[(X - \mu)^2] \;=\; \int_{-\infty}^{\infty} (x - \mu)^2 \, f_X(x)\, dx.

And the standard deviation \sigma_X = \sqrt{\text{Var}(X)} brings the spread back to the same units as X itself. Standard deviation is the number you want when you are comparing spread across distributions, because it lives on the same scale as the variable.

There is an identity that saves a lot of work: \text{Var}(X) = E[X^2] - (E[X])^2. It lets you compute the variance by finding E[X^2] = \int x^2 f_X(x)\, dx and subtracting the square of E[X], which is usually less painful than expanding (x - \mu)^2 inside the integral.
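Here is a numerical sanity check of the identity, using a hypothetical density f(x) = 3x^2 on [0, 1] (it integrates to 1, with E[X] = 3/4 and E[X^2] = 3/5, so the variance should be 3/5 - 9/16 = 3/80):

```python
# Hypothetical density f(x) = 3x^2 on [0, 1].
def f(x):
    return 3 * x * x

def integrate(g, a, b, steps=100_000):
    # Midpoint-rule approximation of the integral of g over [a, b].
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) * h for i in range(steps))

mean = integrate(lambda x: x * f(x), 0, 1)        # E[X]   = 3/4
second = integrate(lambda x: x * x * f(x), 0, 1)  # E[X^2] = 3/5
var_shortcut = second - mean ** 2                 # 3/5 - 9/16 = 3/80
var_direct = integrate(lambda x: (x - mean) ** 2 * f(x), 0, 1)

print(var_shortcut, var_direct)  # both ~ 0.0375
```

Both routes land on the same number; the shortcut just avoids expanding the square inside the integral.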

Two worked examples

Example 1: the uniform distribution on [0, 4]

Let X be uniformly distributed on the interval [0, 4]. This means the density is constant on the interval and zero outside:

f_X(x) = \begin{cases} c & 0 \leq x \leq 4 \\ 0 & \text{otherwise} \end{cases}

for some constant c. Find c, the CDF, the mean, and the variance.

Step 1. Find c so the total probability is 1.

\int_{0}^{4} c \, dx = 4c = 1 \;\Longrightarrow\; c = \frac{1}{4}.

Why: the density has to integrate to 1 — that is the normalization condition.

Step 2. Compute the CDF.

For x < 0: F_X(x) = 0. For 0 \leq x \leq 4:

F_X(x) = \int_{0}^{x} \frac{1}{4}\, dt = \frac{x}{4}.

For x > 4: F_X(x) = 1.

So F_X is 0 to the left of the interval, rises linearly from 0 to 1 across it, and stays at 1 afterward.

Why: the CDF is just the running integral of the density — for a constant density, the running integral is a straight line.

Step 3. Compute the mean.

E[X] = \int_{0}^{4} x \cdot \frac{1}{4}\, dx = \frac{1}{4} \cdot \frac{x^2}{2} \Big|_{0}^{4} = \frac{1}{4} \cdot 8 = 2.

The mean is the midpoint of the interval, as you would expect from symmetry.

Why: for a symmetric distribution, the mean must coincide with the centre of symmetry.

Step 4. Compute the variance via E[X^2] - \mu^2.

E[X^2] = \int_{0}^{4} x^2 \cdot \frac{1}{4}\, dx = \frac{1}{4} \cdot \frac{x^3}{3} \Big|_{0}^{4} = \frac{64}{12} = \frac{16}{3}.

So

\text{Var}(X) = E[X^2] - \mu^2 = \frac{16}{3} - 4 = \frac{16 - 12}{3} = \frac{4}{3}.

The standard deviation is \sigma_X = \tfrac{2}{\sqrt{3}} \approx 1.155.

Result: f_X(x) = \tfrac{1}{4} on [0, 4], F_X(x) = \tfrac{x}{4} on [0, 4], \mu = 2, \sigma^2 = \tfrac{4}{3}.
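These results can be double-checked by simulation; the sketch below samples from the uniform distribution on [0, 4] (sample size and seed are arbitrary choices):

```python
import random
from statistics import fmean, pvariance

random.seed(1)
samples = [random.uniform(0, 4) for _ in range(200_000)]

print(fmean(samples))      # ~ 2, the exact mean
print(pvariance(samples))  # ~ 1.333, the exact variance 4/3

# Empirical CDF at x = 1 should match F_X(1) = 1/4.
print(sum(1 for s in samples if s <= 1) / len(samples))  # ~ 0.25
```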

[Figure: Uniform density and CDF on the interval $[0, 4]$]
Left: the uniform density on $[0, 4]$ is a flat rectangle of height $\tfrac{1}{4}$. Right: the CDF rises linearly from $0$ at $x = 0$ to $1$ at $x = 4$. The area of the rectangle on the left matches the height gain on the right — that is the fundamental theorem of calculus at work.

Example 2: a triangular density on [0, 2]

Let X have density

f_X(x) = \begin{cases} c x & 0 \leq x \leq 2 \\ 0 & \text{otherwise} \end{cases}

Find c, then compute P(1 \leq X \leq 2) and E[X].

Step 1. Normalize the density.

\int_{0}^{2} c x \, dx = c \cdot \frac{x^2}{2} \Big|_{0}^{2} = c \cdot 2 = 1 \;\Longrightarrow\; c = \frac{1}{2}.

So f_X(x) = \tfrac{x}{2} on [0, 2].

Why: without normalization, the function is just a shape — the integral-to-one condition picks out the specific density that represents a probability distribution.

Step 2. Sketch the density. It is a straight line from (0, 0) to (2, 1). The triangle under the line has base 2 and height 1, so its area is \tfrac{1}{2} \cdot 2 \cdot 1 = 1. Consistent.

Step 3. Compute P(1 \leq X \leq 2).

P(1 \leq X \leq 2) = \int_{1}^{2} \frac{x}{2}\, dx = \frac{1}{2} \cdot \frac{x^2}{2} \Big|_{1}^{2} = \frac{1}{4}(4 - 1) = \frac{3}{4}.

Why: the density is tilted to the right, so most of the probability lives on the right half of the interval — three times as much on the right half as on the left.

Step 4. Compute the mean.

E[X] = \int_{0}^{2} x \cdot \frac{x}{2}\, dx = \int_{0}^{2} \frac{x^2}{2}\, dx = \frac{1}{2} \cdot \frac{x^3}{3} \Big|_{0}^{2} = \frac{1}{2} \cdot \frac{8}{3} = \frac{4}{3}.

The mean is \tfrac{4}{3} \approx 1.333, which sits in the right half of the interval — exactly where the density is largest. The mean is pulled toward the mass.

Result: P(1 \leq X \leq 2) = \tfrac{3}{4}, E[X] = \tfrac{4}{3}.
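These numbers can also be checked by simulation. The sketch below uses inverse-CDF sampling, a technique not developed in this article: here F_X(x) = \int_0^x \tfrac{t}{2}\, dt = \tfrac{x^2}{4} on [0, 2], which inverts to F_X^{-1}(u) = 2\sqrt{u}, so pushing uniform samples on [0, 1] through the inverse produces samples with density x/2.

```python
import math
import random
from statistics import fmean

random.seed(2)
# F_X(x) = x^2/4 on [0, 2] inverts to F_X^{-1}(u) = 2*sqrt(u): feeding
# uniform samples through the inverse CDF yields samples with density x/2.
samples = [2 * math.sqrt(random.random()) for _ in range(200_000)]

print(fmean(samples))  # hovers near 4/3
print(sum(1 for s in samples if s >= 1) / len(samples))  # near 3/4
```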

[Figure: Triangular density with shaded right half]
The density $f_X(x) = x/2$ is a rising line. The shaded trapezoid from $x = 1$ to $x = 2$ has area $\tfrac{3}{4}$ — three-quarters of the total probability lives in the right half of the interval.

The second example shows why the shape of the density matters so much: a flat density puts equal probability in every unit of length, while a tilted density concentrates probability wherever the density is highest.

[Figure: Triangular density and its CDF]
Left: the triangular density $f_X(x) = x/2$ rises linearly from $0$ to $1$ across the interval $[0, 2]$. Right: the corresponding CDF $F_X(x) = x^2/4$ is a concave-up curve rising quadratically from $0$ to $1$. The CDF accelerates where the density is large — which is why the CDF bends more steeply on the right side of the interval.

Going deeper

If you understand what a density is, how it relates to the CDF, and how to compute means and variances with integrals, you have the working toolkit. The rest is for readers who want to see the formal measure-theoretic picture, the law of the unconscious statistician, and a subtlety about mixed distributions.

Why P(X = a) = 0 isn't a paradox

The statement "every single point has probability zero, but the interval has positive probability" sounds like it violates additivity: if you take a union of many events, the probability of the union should be the sum of the probabilities. If you chop the interval [0, 1] into all its individual points and add up the zeros, you should get 0, not 1.

The catch is that additivity only works for countable unions. The interval [0, 1] contains uncountably many points — so many that you cannot list them as x_1, x_2, x_3, \ldots even with infinite patience. The additivity axiom of probability is countable additivity, and it does not extend to uncountable unions; it is exactly this gap that lets continuous distributions exist without contradiction.

This is the point where probability theory starts to need measure theory. You do not need that machinery to compute anything in this article — the integral calculus you already know is enough — but it is where the rigorous foundations sit.

The law of the unconscious statistician

You already know E[X] = \int x f_X(x)\, dx. What about E[X^2], or E[\sin X], or more generally E[g(X)] for some function g? You might think you have to first find the density of the new random variable Y = g(X) and then take \int y f_Y(y)\, dy — but there is a shortcut.

E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx.

You can compute the expectation of g(X) by integrating g(x) against the original density f_X, without ever finding the density of g(X) itself. This trick is traditionally called the law of the unconscious statistician (because you use it without thinking about why it works), and it makes variance calculations much cleaner: E[X^2] = \int x^2 f_X(x)\, dx, done.
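A short numerical sketch of the shortcut, reusing the triangular density f(x) = x/2 from Example 2 (the function names are arbitrary):

```python
def f(x):
    # Triangular density from Example 2: f(x) = x/2 on [0, 2].
    return x / 2

def expect(g, a=0.0, b=2.0, steps=100_000):
    # E[g(X)] = integral of g(x) * f(x) dx, midpoint rule.
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += g(x) * f(x) * h
    return total

print(expect(lambda x: x))      # E[X]   = 4/3, matching Example 2
print(expect(lambda x: x * x))  # E[X^2] = 2, so Var(X) = 2 - (4/3)^2 = 2/9
```

The same `expect` routine handles any g, with no need to derive the distribution of g(X) first.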

Mixed distributions

Not every random variable is purely discrete or purely continuous. Some have mass at a few points and a continuous density on an interval. A classic example is an insurance claim X that is 0 with probability 0.9 (no claim filed) and follows some continuous density when X > 0. The CDF of such a variable has a jump at x = 0 (size 0.9) and is smooth elsewhere.

For mixed distributions, neither a pure density nor a pure mass function will capture the whole picture. You can still write the CDF, though — and in many practical problems, writing the CDF directly and differentiating where it is smooth is the cleanest approach. The fully general framework is measure theory, which treats discrete, continuous, and mixed distributions uniformly.

Transforming a continuous random variable

Suppose X has density f_X and you want to know the distribution of Y = g(X) for some smooth, strictly increasing function g. The CDF route is the simplest:

F_Y(y) \;=\; P(Y \leq y) \;=\; P(g(X) \leq y) \;=\; P(X \leq g^{-1}(y)) \;=\; F_X(g^{-1}(y)).

Differentiating both sides with respect to y and using the chain rule,

f_Y(y) \;=\; f_X(g^{-1}(y)) \cdot \frac{d}{dy} g^{-1}(y) \;=\; \frac{f_X(g^{-1}(y))}{g'(g^{-1}(y))}.

This is called the change of variable formula for densities. The factor in the denominator accounts for how much g stretches or compresses the x-axis near each point. If g stretches the axis by a factor of 2, the density at the corresponding y-value has to be halved, because probability is conserved but now spread over twice the length. This formula is the continuous analogue of the way a discrete mass function transforms under a relabelling of outcomes, and it is how many of the common distributions you will meet — log-normal, chi-square, exponential — are derived from simpler ones.
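Here is a small sketch of the formula in action, with X uniform on [0, 1] and g(x) = e^x, so Y = e^X lives on [1, e]. Then g^{-1}(y) = \ln y and g'(x) = e^x, and the formula gives f_Y(y) = f_X(\ln y) / e^{\ln y} = 1/y.

```python
import math

# Change of variable: X uniform on [0, 1], Y = e^X.
# The formula f_Y(y) = f_X(ln y) / e^{ln y} yields f_Y(y) = 1/y on [1, e].
def f_Y(y):
    return 1 / y

def integrate(g, a, b, steps=100_000):
    # Midpoint-rule approximation of the integral of g over [a, b].
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) * h for i in range(steps))

# Total probability on [1, e] must be 1 ...
print(integrate(f_Y, 1, math.e))  # ~ 1.0
# ... and P(Y <= 2) must match P(X <= ln 2) = ln 2.
print(integrate(f_Y, 1, 2))       # ~ 0.6931
```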

Why densities can exceed one

Here is a quick sanity check that densities do not have to be bounded by 1. Take the uniform distribution on [0, 0.1]. The density has to be constant over an interval of length 0.1 and has to integrate to 1, so the density value is \frac{1}{0.1} = 10. The density is 10 everywhere on the interval — ten times what a probability is allowed to be — and yet the total probability is exactly 1. The apparent paradox dissolves once you remember that density times length gives probability, not density alone.
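The same check in a few lines of Python: a density of constant height 10 on an interval of length 0.1 still integrates to exactly 1.

```python
def f(x):
    # Uniform density on [0, 0.1]: constant height 10, far above 1.
    return 10.0 if 0 <= x <= 0.1 else 0.0

steps = 100_000
h = 0.1 / steps
total = sum(f((i + 0.5) * h) * h for i in range(steps))
print(total)  # ~ 1.0 despite the density being 10 everywhere on the interval
```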

Where this leads next

Continuous random variables are the setting for the most important distribution in all of statistics — the normal distribution — and for most of the continuous distributions you will meet in physics, engineering, and data science.