In short

The tangent line to y = f(x) at x = a gives the best linear estimate of f near a: f(a + \Delta x) \approx f(a) + f'(a)\,\Delta x. This is called linear approximation. The notation dy = f'(x)\,dx packages this into a single symbol called the differential, and it is the standard tool for error-estimation problems.

What is \sqrt{4.02}?

You know \sqrt{4} = 2. And 4.02 is very close to 4. So \sqrt{4.02} should be very close to 2 — but how close? Can you get a good estimate without a calculator?

Here is the idea. The function f(x) = \sqrt{x} has a tangent line at x = 4. Near x = 4, the curve and its tangent line are nearly identical — the tangent hugs the curve. So instead of evaluating the curve at x = 4.02 (which requires a calculator), evaluate the tangent line at x = 4.02 (which requires only arithmetic).

The tangent line at x = 4:

f'(x) = \frac{1}{2\sqrt{x}}, \quad f'(4) = \frac{1}{4}
\text{Tangent: } y = f(4) + f'(4)(x - 4) = 2 + \frac{1}{4}(x - 4)

At x = 4.02:

y = 2 + \frac{1}{4}(0.02) = 2 + 0.005 = 2.005

A calculator gives \sqrt{4.02} = 2.00499\ldots The tangent-line estimate is off by less than 0.00001 — five decimal places of accuracy from one line of arithmetic.
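As a quick numerical check, the tangent-line recipe can be compared against the library square root (a minimal Python sketch; the function name `sqrt_linear_approx` is our own):

```python
import math

def sqrt_linear_approx(x, a=4.0):
    """Tangent-line (linear) approximation of sqrt(x), anchored at x = a."""
    fa = math.sqrt(a)                 # known value f(a)
    slope = 1 / (2 * math.sqrt(a))    # f'(a) = 1 / (2 sqrt(a))
    return fa + slope * (x - a)

estimate = sqrt_linear_approx(4.02)   # 2 + (1/4)(0.02) = 2.005
exact = math.sqrt(4.02)               # 2.00499...
error = abs(estimate - exact)
print(estimate, exact, error)
```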

That is the power of linear approximation. The derivative, which you already know how to compute, hands you a recipe for estimating function values near known points. And the recipe comes with a built-in way to estimate how far off your estimate might be.

The linear approximation formula

The tangent line to y = f(x) at x = a is

L(x) = f(a) + f'(a)(x - a)

For x near a, f(x) \approx L(x). Writing \Delta x = x - a (the small step away from a):

Linear approximation

f(a + \Delta x) \approx f(a) + f'(a)\,\Delta x

This approximation is good when \Delta x is small. It says: the function value at a nearby point is approximately the known value plus the derivative times the step size.

Read the formula piece by piece: f(a) is the known value at the anchor; f'(a) is the rate of change there; \Delta x is the small step away from the anchor; and f'(a)\,\Delta x is the predicted change in f over that step.

The whole thing says: value at the new point \approx value at the known point + rate \times step. This is the same logic as "distance \approx speed \times time" when speed is approximately constant — the derivative is the speed, and linear approximation is the assumption that the speed does not change much over a small step.

Why it works: the tangent hugs the curve

The tangent line at a point is the unique straight line that matches both the function's value and its slope at that point. No other line does both. Near the point, the function barely curves — it is nearly straight — so the tangent line is nearly indistinguishable from the function.

The curve $y = \sqrt{x}$ (black) and its tangent line at $(4, 2)$ (red dashed). Near $x = 4$, the two are nearly indistinguishable. At $x = 4.02$, the tangent gives $2.005$, while the true value is $2.00499\ldots$ — the tangent-line estimate is excellent.

Farther from the anchor, the approximation degrades. At x = 5 (one full unit away), the tangent gives 2 + \frac{1}{4}(1) = 2.25, while the true value is \sqrt{5} = 2.236\ldots — an error of about 0.014, which is still reasonable but noticeably worse. At x = 9, the tangent gives 2 + \frac{1}{4}(5) = 3.25, while \sqrt{9} = 3 — an error of 0.25, which is too large to be useful. The tangent line is a local tool: it works near the anchor and fails far away.
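The three cases above can be tabulated in a few lines (a sketch using the anchor data f(4) = 2 and f'(4) = 1/4 from the text):

```python
import math

a, fa, slope = 4.0, 2.0, 0.25          # anchor x = 4, f(4) = 2, f'(4) = 1/4
errors = {}
for x in (4.02, 5.0, 9.0):
    tangent = fa + slope * (x - a)     # tangent-line estimate
    errors[x] = abs(tangent - math.sqrt(x))
    print(x, tangent, math.sqrt(x), errors[x])
```

The error grows rapidly with distance from the anchor, as the text describes.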

Standard approximations

Several commonly used approximations follow immediately from the linear approximation formula. Each one takes a specific function, anchors it at a convenient point, and writes the tangent-line estimate.

For |x| small:

| Function | Approximation | Anchor point |
| --- | --- | --- |
| (1 + x)^n | \approx 1 + nx | x = 0 |
| \sin x | \approx x | x = 0 |
| \cos x | \approx 1 - \frac{x^2}{2} | x = 0 |
| \tan x | \approx x | x = 0 |
| e^x | \approx 1 + x | x = 0 |
| \ln(1 + x) | \approx x | x = 0 |

The first one — (1 + x)^n \approx 1 + nx — is the most versatile. It works for any exponent n (integer, fraction, negative). For example, n = \tfrac{1}{2} gives \sqrt{1 + x} \approx 1 + \tfrac{x}{2}, and n = -1 gives \frac{1}{1 + x} \approx 1 - x.

Each one replaces a messy computation with a multiplication. The \cos x entry uses the second-order approximation (the first-order approximation \cos x \approx 1 is not very useful since it gives no dependence on x).

Where does (1 + x)^n \approx 1 + nx come from? Apply the linear approximation formula to f(t) = t^n at t = 1. The derivative is f'(t) = nt^{n-1}, so f'(1) = n. The formula gives f(1 + x) \approx f(1) + f'(1) \cdot x = 1 + nx. That's all — one line of the general formula, applied to a specific function.

How good are these? For \sqrt{1.02}, the exact value is 1.00995\ldots and the approximation gives 1.01 — off by about 0.00005. For (1.03)^{10}, the exact value is 1.3439\ldots and the approximation gives 1.3 — off by about 0.04. The second approximation is worse because 0.03 \times 10 = 0.3 is not that small; the formula works best when the product nx is small compared to 1.
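Both cases are easy to verify numerically (a short sketch; the numbers match the two examples in the text):

```python
cases = [(0.5, 0.02), (10, 0.03)]      # (n, x): sqrt(1.02) and (1.03)^10
for n, x in cases:
    exact = (1 + x) ** n
    approx = 1 + n * x                 # the approximation (1 + x)^n ~ 1 + nx
    print(n, exact, approx, abs(exact - approx))
```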

Combining approximations. You can chain these. For example, e^{\sin(0.01)} \approx e^{0.01} \approx 1 + 0.01 = 1.01, where the first step uses \sin x \approx x and the second uses e^x \approx 1 + x. The exact value is 1.01005\ldots — two approximations stacked, and the result is still accurate to four decimal places.
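A quick check of the chained estimate (sketch; the chained value 1.01 comes from the two-step approximation in the text):

```python
import math

exact = math.exp(math.sin(0.01))       # the true value, about 1.01005
chained = 1 + 0.01                     # sin x ~ x, then e^x ~ 1 + x
print(exact, chained, abs(exact - chained))
```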

Differentials

There is a notation that makes linear approximation more compact and more useful for error problems. Define:

Differentials

If y = f(x), then the differential dy is defined as

dy = f'(x)\,dx

where dx is a small increment in x (the same as \Delta x).

The differential dy is the change in y predicted by the tangent line — not the actual change \Delta y = f(x + dx) - f(x). The actual change \Delta y follows the curve; the differential dy follows the tangent. The approximation \Delta y \approx dy is exactly the linear approximation written in different notation.

The actual change $\Delta y$ follows the curve; the differential $dy$ follows the tangent line. For small $dx$, the two are nearly equal. The gap between them is the approximation error.

Why the notation is useful. In error problems, you know dx (the measurement uncertainty) and want dy (how much the computed answer is affected). The formula dy = f'(x)\,dx gives you that directly. You do not need to recompute f(x + dx) - f(x) — the derivative does the work.

Error estimation

Suppose you measure the radius of a circle as r = 5.0 cm, with a possible error of \pm 0.1 cm. What is the resulting error in the computed area?

The area is A = \pi r^2. The differential gives the error:

dA = \frac{dA}{dr}\,dr = 2\pi r \,dr

With r = 5.0 and dr = \pm 0.1:

dA = 2\pi(5.0)(0.1) = \pi \approx 3.14 \text{ cm}^2

The true area is \pi(25) = 78.54 cm^2. An error of \pm 0.1 cm in the radius produces an error of about \pm 3.14 cm^2 in the area — about a \pm 4% error.

This is the standard method. The differential dy = f'(x)\,dx propagates a known error dx in the input to an estimated error dy in the output. The derivative f'(x) acts as an error multiplier: it tells you how sensitive the output is to changes in the input.
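The circle-area example can be checked against the exact change in area (sketch; r and dr are the values from the text):

```python
import math

r, dr = 5.0, 0.1                       # measured radius and its uncertainty
dA = 2 * math.pi * r * dr              # differential estimate of the area error
exact_change = math.pi * (r + dr) ** 2 - math.pi * r ** 2
print(dA, exact_change)
```

The differential gives pi (about 3.14 cm^2); the exact change for +0.1 cm is only slightly larger, confirming the estimate.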

Another example: error in period of a pendulum. The period of a simple pendulum is T = 2\pi\sqrt{L/g}, where L is the length. If L is measured with 1\% error, what is the error in T?

Write T = 2\pi g^{-1/2} \cdot L^{1/2}. Since T \propto L^{1/2}, the relative error in T is \frac{1}{2} times the relative error in L: a 1\% error in length gives a 0.5\% error in period. The square root reduces the relative error — which is why pendulum clocks were historically good timekeepers even with imprecise length measurements.

Percentage error

The absolute error in a quantity y is |dy|. The relative error is \frac{|dy|}{|y|}, and the percentage error is \frac{|dy|}{|y|} \times 100\%.

These have clean forms when expressed using differentials. Starting from dy = f'(x)\,dx:

\frac{dy}{y} = \frac{f'(x)}{f(x)}\,dx

This expresses the relative error in y in terms of the input error dx; it reduces to a clean multiple of the relative error in x only when f has a particularly nice form.

Power rule for errors. If y = x^n, then \frac{dy}{y} = n \cdot \frac{dx}{x}. The relative error in y is n times the relative error in x.

This is why volume measurements are more sensitive to radius errors than area measurements — the exponent is larger.
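The power rule for errors can be verified directly (sketch; the base value x = 10 and the 2% error are illustrative choices, not from the text):

```python
x, rel = 10.0, 0.02                    # a 2% relative error in x (illustrative)
for n in (0.5, 2, 3):
    predicted = n * rel                # power rule: relative error scales by n
    actual = ((x * (1 + rel)) ** n - x ** n) / x ** n
    print(n, predicted, actual)
```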

The power rule for errors. A $2\%$ error in $x$ produces an $n \times 2\%$ error in $x^n$. Square roots halve the error; squares double it; cubes triple it.

Product rule for errors. If y = u \cdot v, where u and v are independent measurements, then

\frac{dy}{y} = \frac{du}{u} + \frac{dv}{v}

Relative errors add when quantities are multiplied. (More precisely, in the worst case they add; on average they partially cancel.)

Quotient rule for errors. If y = u/v, then \frac{dy}{y} = \frac{du}{u} - \frac{dv}{v}. In the worst case the two input errors have opposite signs and the magnitudes still add, so relative errors add for both multiplication and division.
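A numerical check of the product rule for errors (sketch; u, v, and the 1% errors are illustrative values, not from the text):

```python
u, v = 3.0, 7.0
du, dv = 0.03, 0.07                    # a 1% error in each factor (illustrative)
predicted = du / u + dv / v            # relative errors add: 1% + 1% = 2%
actual = ((u + du) * (v + dv) - u * v) / (u * v)
print(predicted, actual)
```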

Worked examples

Example 1: Approximate \sqrt[3]{26}

Compute \sqrt[3]{26} using linear approximation.

Step 1. Choose the anchor: the nearest perfect cube is 27 = 3^3. So a = 27, \Delta x = -1.

Why: you need a nearby point where the cube root is known exactly. 27 is the closest such point to 26.

Step 2. Set up the function and its derivative. f(x) = x^{1/3}, so f'(x) = \frac{1}{3}x^{-2/3}.

Step 3. Evaluate at the anchor.

f(27) = 3, \quad f'(27) = \frac{1}{3}(27)^{-2/3} = \frac{1}{3} \cdot \frac{1}{9} = \frac{1}{27}

Why: (27)^{2/3} = (27^{1/3})^2 = 3^2 = 9.

Step 4. Apply the formula.

f(26) \approx f(27) + f'(27) \cdot (-1) = 3 - \frac{1}{27} = 3 - 0.03704 = 2.9630

Why: \Delta x = 26 - 27 = -1, and the negative sign means the cube root decreases slightly as x drops from 27 to 26.

Result: \sqrt[3]{26} \approx 2.9630. A calculator gives 2.96250\ldots — the error is about 0.0005, less than 0.02\%.
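The four steps reduce to two lines of arithmetic (sketch; the anchor data f(27) = 3 and f'(27) = 1/27 are from the worked example):

```python
a = 27
fa, slope = 3.0, 1 / 27          # f(27) = 3, f'(27) = 1/27
approx = fa + slope * (26 - a)   # 3 - 1/27
exact = 26 ** (1 / 3)
print(approx, exact, abs(approx - exact))
```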

The curve $y = x^{1/3}$ (black) and its tangent line at $(27, 3)$ (red dashed). At $x = 26$, the tangent gives $2.963$, matching the true value to three decimal places.

The tangent line slightly overestimates the cube root at x = 26. This happens because f(x) = x^{1/3} is concave — it bends downward — so its tangent line always lies above the curve. Knowing the direction of the error (overestimate vs. underestimate) is useful for bounding the true value.

Example 2: Error in the volume of a sphere

The radius of a sphere is measured as r = 5.0 cm with a maximum error of \pm 0.05 cm. Find the percentage error in the computed volume.

Step 1. The volume is V = \frac{4}{3}\pi r^3. Differentiate:

dV = 4\pi r^2\,dr

Why: the differential gives the approximate change in volume caused by a small change dr in the radius.

Step 2. Compute the relative error.

\frac{dV}{V} = \frac{4\pi r^2\,dr}{\frac{4}{3}\pi r^3} = \frac{3\,dr}{r}

Why: the 4\pi cancels, and the power of r reduces by one. The result is 3 \cdot \frac{dr}{r} — the relative error in V is three times the relative error in r. This matches the power rule: V \propto r^3, so n = 3.

Step 3. Plug in.

\frac{dV}{V} = 3 \cdot \frac{0.05}{5.0} = 3 \times 0.01 = 0.03

Why: the relative error in r is 0.05/5.0 = 1\%. Tripled, this gives a 3\% relative error in V.

Step 4. Convert to percentage error and compute the absolute error.

\text{Percentage error in } V = 3\%
\text{Absolute error: } dV = 4\pi(25)(0.05) = 5\pi \approx 15.71 \text{ cm}^3

The volume itself is V = \frac{4}{3}\pi(125) = \frac{500\pi}{3} \approx 523.6 cm^3, so \frac{15.71}{523.6} \approx 0.03, confirming the 3\%.

Result: A 1\% error in the radius causes a 3\% error in the volume.
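The whole computation, as a check (sketch; r and dr are the measured values from the example):

```python
import math

r, dr = 5.0, 0.05
V = (4 / 3) * math.pi * r ** 3         # volume, about 523.6 cm^3
dV = 4 * math.pi * r ** 2 * dr         # differential, about 15.71 cm^3
print(dV, dV / V)                      # relative error: exactly 3 * dr / r
```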

The power rule for errors in action. Since $V \propto r^3$, the relative error in $V$ is three times the relative error in $r$. A $1\%$ measurement error in the radius becomes a $3\%$ uncertainty in the volume.

This is why precision matters more for higher-dimensional quantities. Measuring a length with 1\% accuracy gives 1\% accuracy for lengths, 2\% for areas, and 3\% for volumes. Each additional dimension multiplies the error.

The role of concavity

Linear approximation always introduces an error. The direction of the error — whether the tangent line overestimates or underestimates the function — depends on the concavity of the function.

A concave-up function: the tangent line at $(1, 1)$ lies below the curve. Linear approximation underestimates $x^2$ for every $x \neq 1$.

Knowing this lets you bound the true value. If you use linear approximation on \sqrt{x} (concave down) and get 2.005, you know the true value is at most 2.005 — the real answer is slightly below. This one-sided guarantee is surprisingly useful.

The error term

How large is the error? The exact error in linear approximation is

f(a + \Delta x) - L(a + \Delta x) = \frac{f''(c)}{2}(\Delta x)^2

for some c between a and a + \Delta x. This is a consequence of the mean value theorem applied twice — it is the beginning of Taylor's theorem, which you will meet later.

The key observation: the error is proportional to (\Delta x)^2. If you halve the step size, the error drops by a factor of four. This is why linear approximation is so accurate for small steps and so inaccurate for large ones.

For \sqrt{4.02}: \Delta x = 0.02, so (\Delta x)^2 = 0.0004. The second derivative of \sqrt{x} at x = 4 is f''(4) = -\frac{1}{4}(4)^{-3/2} = -\frac{1}{32}. The error is approximately \frac{1}{2} \cdot \frac{1}{32} \cdot 0.0004 = 0.00000625 — matching the actual error of about 0.000006.
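The (\Delta x)^2 scaling is easy to observe: halving the step should roughly quarter the error (sketch, using the anchor data for f(x) = \sqrt{x} at x = 4):

```python
import math

fa, slope = 2.0, 0.25                  # f(4) = 2, f'(4) = 1/4

def tangent_error(dx):
    """Absolute error of the tangent-line estimate of sqrt(4 + dx)."""
    return abs((fa + slope * dx) - math.sqrt(4.0 + dx))

ratio = tangent_error(0.02) / tangent_error(0.01)
print(ratio)                           # close to 4: halving dx quarters the error
```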


Going deeper

The core technique — f(a + \Delta x) \approx f(a) + f'(a)\Delta x — is above, and it covers all standard exam problems. What follows is the wider context.

From linear to quadratic approximation

If linear approximation uses the tangent line, what happens if you use a tangent parabola instead? The second-order approximation is

f(a + \Delta x) \approx f(a) + f'(a)\,\Delta x + \frac{f''(a)}{2}(\Delta x)^2

This matches not just the function's value and slope at a, but also its curvature. For \sqrt{4.02}:

f(4) + f'(4)(0.02) + \frac{f''(4)}{2}(0.02)^2 = 2 + 0.005 + \frac{-1/32}{2}(0.0004) = 2.005 - 0.00000625 = 2.00499375

The true value is 2.004993766\ldots — the second-order estimate is accurate to seven decimal places. This is the beginning of Taylor series, where you keep adding terms with higher derivatives for ever-greater accuracy.
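The second-order arithmetic, verified (sketch; the derivative values at x = 4 are from the text):

```python
import math

dx = 0.02
f0, f1, f2 = 2.0, 0.25, -1 / 32        # f(4), f'(4), f''(4) for f(x) = sqrt(x)
quad = f0 + f1 * dx + (f2 / 2) * dx ** 2
print(quad, math.sqrt(4.02), abs(quad - math.sqrt(4.02)))
```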

Differentials in several variables

If z = f(x, y) depends on two variables, the total differential is

dz = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy

This is the multivariable version of dy = f'(x)\,dx. Each partial derivative tells you how sensitive z is to one variable while the other is held fixed. The total differential combines both sensitivities. This is how engineers propagate measurement errors through formulas that depend on multiple measured quantities.
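A minimal sketch of the total differential for one concrete choice of function (z = xy, whose partials are \partial z/\partial x = y and \partial z/\partial y = x; the numeric values are illustrative assumptions):

```python
# z = x * y: partial derivatives are y and x respectively
x, y = 4.0, 3.0
dx, dy = 0.01, 0.02                    # small errors in each input
dz = y * dx + x * dy                   # total differential estimate
actual = (x + dx) * (y + dy) - x * y   # exact change in z
print(dz, actual)
```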

The mean value theorem connection

The linear approximation formula says f(a + h) \approx f(a) + f'(a)h. The mean value theorem says f(a + h) = f(a) + f'(c)h for some c between a and a + h, exactly rather than approximately. The approximation error comes entirely from the difference between f'(a) and f'(c). When f' is nearly constant (i.e., f'' is small), f'(c) \approx f'(a) and the approximation is excellent. The mean value theorem turns the approximation into an identity by shifting the evaluation point of the derivative.

Where this leads next