In short
The tangent line to y = f(x) at x = a gives the best linear estimate of f near a: f(a + \Delta x) \approx f(a) + f'(a)\,\Delta x. This is called linear approximation. The notation dy = f'(x)\,dx packages this into a single symbol called the differential, and it is the tool behind all error-estimation problems.
What is \sqrt{4.02}?
You know \sqrt{4} = 2. And 4.02 is very close to 4. So \sqrt{4.02} should be very close to 2 — but how close? Can you get a good estimate without a calculator?
Here is the idea. The function f(x) = \sqrt{x} has a tangent line at x = 4. Near x = 4, the curve and its tangent line are nearly identical — the tangent hugs the curve. So instead of evaluating the curve at x = 4.02 (which requires a calculator), evaluate the tangent line at x = 4.02 (which requires only arithmetic).
The tangent line at x = 4: with f(4) = 2 and f'(4) = \frac{1}{2\sqrt{4}} = \frac{1}{4},

L(x) = 2 + \frac{1}{4}(x - 4)

At x = 4.02:

\sqrt{4.02} \approx L(4.02) = 2 + \frac{1}{4}(0.02) = 2.005
A calculator gives \sqrt{4.02} = 2.00499\ldots The tangent-line estimate is off by less than 0.00001 — five decimal places of accuracy from one line of arithmetic.
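The arithmetic above can be checked in a few lines of Python (a quick numeric sketch, not part of the mathematics):

```python
import math

# Tangent-line estimate of sqrt(4.02), anchored at a = 4.
a = 4.0
fa = math.sqrt(a)                   # known value: f(4) = 2
fprime_a = 1 / (2 * math.sqrt(a))   # f'(4) = 1/4
dx = 0.02

estimate = fa + fprime_a * dx       # 2 + (1/4)(0.02) = 2.005
exact = math.sqrt(a + dx)           # 2.00499...

print(estimate, exact, abs(exact - estimate))
```

The printed error is about 6 \times 10^{-6}, comfortably below 0.00001.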
That is the power of linear approximation. The derivative, which you already know how to compute, hands you a recipe for estimating function values near known points. And the recipe comes with a built-in way to estimate how far off your estimate might be.
The linear approximation formula
The tangent line to y = f(x) at x = a is

L(x) = f(a) + f'(a)(x - a)

For x near a, f(x) \approx L(x). Writing \Delta x = x - a (the small step away from a):

Linear approximation

f(a + \Delta x) \approx f(a) + f'(a)\,\Delta x
This approximation is good when \Delta x is small. It says: the function value at a nearby point is approximately the known value plus the derivative times the step size.
Read the formula piece by piece:
- f(a) — the known value. This is the anchor.
- f'(a) — the rate of change at the anchor. This tells you how fast the function is changing.
- \Delta x — the step. This is how far you have moved from the anchor.
- f'(a)\,\Delta x — the correction. The rate times the step gives the estimated change.
The whole thing says: value at the new point \approx value at the known point + rate \times step. This is the same logic as "distance \approx speed \times time" when speed is approximately constant — the derivative is the speed, and linear approximation is the assumption that the speed does not change much over a small step.
Why it works: the tangent hugs the curve
The tangent line at a point is the unique straight line that matches both the function's value and its slope at that point. No other line does both. Near the point, the function barely curves — it is nearly straight — so the tangent line is nearly indistinguishable from the function.
Farther from the anchor, the approximation degrades. At x = 5 (one full unit away), the tangent gives 2 + \frac{1}{4}(1) = 2.25, while the true value is \sqrt{5} = 2.236\ldots — an error of about 0.014, which is still reasonable but noticeably worse. At x = 9, the tangent gives 2 + \frac{1}{4}(5) = 3.25, while \sqrt{9} = 3 — an error of 0.25, which is too large to be useful. The tangent line is a local tool: it works near the anchor and fails far away.
Standard approximations
Several commonly used approximations follow immediately from the linear approximation formula. Each one takes a specific function, anchors it at a convenient point, and writes the tangent-line estimate.
For |x| small:
| Function | Approximation | Anchor point |
|---|---|---|
| (1 + x)^n | \approx 1 + nx | x = 0 |
| \sin x | \approx x | x = 0 |
| \cos x | \approx 1 - \frac{x^2}{2} | x = 0 |
| \tan x | \approx x | x = 0 |
| e^x | \approx 1 + x | x = 0 |
| \ln(1 + x) | \approx x | x = 0 |
The first one — (1 + x)^n \approx 1 + nx — is the most versatile. It works for any exponent n (integer, fraction, negative). For example:
- \sqrt{1.02} = (1 + 0.02)^{1/2} \approx 1 + \frac{1}{2}(0.02) = 1.01
- (1.03)^{10} \approx 1 + 10(0.03) = 1.3
- \frac{1}{1.05} = (1 + 0.05)^{-1} \approx 1 - 0.05 = 0.95
Each one replaces a messy computation with a multiplication. The \cos x entry uses the second-order approximation (the first-order approximation \cos x \approx 1 is not very useful since it gives no dependence on x).
Where does (1 + x)^n \approx 1 + nx come from? Apply the linear approximation formula to f(t) = t^n at t = 1. The derivative is f'(t) = nt^{n-1}, so f'(1) = n. The formula gives f(1 + x) \approx f(1) + f'(1) \cdot x = 1 + nx. That's all — one line of the general formula, applied to a specific function.
How good are these? For \sqrt{1.02}, the exact value is 1.00995\ldots and the approximation gives 1.01 — off by about 0.00005. For (1.03)^{10}, the exact value is 1.3439\ldots and the approximation gives 1.3 — off by about 0.04. The second approximation is worse because 0.03 \times 10 = 0.3 is not that small; the formula works best when the product nx is small compared to 1.
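A short Python loop (illustrative only) compares each entry of the table against the exact value at x = 0.02:

```python
import math

x = 0.02  # a small step

# (name, exact value, tangent-line approximation) for each table entry
rows = [
    ("(1+x)^(1/2)", (1 + x) ** 0.5, 1 + 0.5 * x),
    ("sin x",       math.sin(x),    x),
    ("cos x",       math.cos(x),    1 - x ** 2 / 2),
    ("tan x",       math.tan(x),    x),
    ("e^x",         math.exp(x),    1 + x),
    ("ln(1+x)",     math.log1p(x),  x),
]

for name, exact, approx in rows:
    print(f"{name:12s} exact={exact:.8f} approx={approx:.8f} error={abs(exact - approx):.1e}")
```

Every error is below 10^{-3} at this step size; the largest come from e^x and \ln(1+x), whose second derivatives at 0 are not small.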
Combining approximations. You can chain these. For example, e^{\sin(0.01)} \approx e^{0.01} \approx 1 + 0.01 = 1.01, where the first step uses \sin x \approx x and the second uses e^x \approx 1 + x. The exact value is 1.01005\ldots — two approximations stacked, and the result is still accurate to four decimal places.
Differentials
There is a notation that makes linear approximation more compact and more useful for error problems. Define:
Differentials
If y = f(x), then the differential dy is defined as

dy = f'(x)\,dx

where dx is a small increment in x (the same as \Delta x).
The differential dy is the change in y predicted by the tangent line — not the actual change \Delta y = f(x + dx) - f(x). The actual change \Delta y follows the curve; the differential dy follows the tangent. The approximation \Delta y \approx dy is exactly the linear approximation written in different notation.
Why the notation is useful. In error problems, you know dx (the measurement uncertainty) and want dy (how much the computed answer is affected). The formula dy = f'(x)\,dx gives you that directly. You do not need to recompute f(x + dx) - f(x) — the derivative does the work.
Error estimation
Suppose you measure the radius of a circle as r = 5.0 cm, with a possible error of \pm 0.1 cm. What is the resulting error in the computed area?
The area is A = \pi r^2. The differential gives the error:

dA = 2\pi r\,dr

With r = 5.0 and dr = \pm 0.1:

dA = 2\pi(5.0)(\pm 0.1) = \pm\pi \approx \pm 3.14 cm^2
The true area is \pi(25) = 78.54 cm^2. An error of \pm 0.1 cm in the radius produces an error of about \pm 3.14 cm^2 in the area — about a \pm 4% error.
This is the standard method. The differential dy = f'(x)\,dx propagates a known error dx in the input to an estimated error dy in the output. The derivative f'(x) acts as an error multiplier: it tells you how sensitive the output is to changes in the input.
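The circle-area calculation above, as a quick Python check:

```python
import math

r, dr = 5.0, 0.1             # measured radius and its possible error (cm)

A = math.pi * r ** 2         # computed area
dA = 2 * math.pi * r * dr    # differential: dA = 2*pi*r*dr

print(A)         # about 78.54 cm^2
print(dA)        # about 3.14 cm^2, the estimated error in the area
print(dA / A)    # 0.04, i.e. a 4% relative error
```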
Another example: error in period of a pendulum. The period of a simple pendulum is T = 2\pi\sqrt{L/g}, where L is the length. If L is measured with 1\% error, what is the error in T?
Write T = 2\pi g^{-1/2} \cdot L^{1/2}. Since T \propto L^{1/2}, the relative error in T is \frac{1}{2} times the relative error in L: a 1\% error in length gives a 0.5\% error in period. The square root reduces the relative error — which is why pendulum clocks were historically good timekeepers even with imprecise length measurements.
Percentage error
The absolute error in a quantity y is |dy|. The relative error is \frac{|dy|}{|y|}, and the percentage error is \frac{|dy|}{|y|} \times 100\%.
These have clean forms when expressed using differentials. Starting from dy = f'(x)\,dx:

\frac{dy}{y} = \frac{f'(x)}{f(x)}\,dx

This expresses the relative error in y in terms of the error in x; it collapses to a simple multiple of the relative error \frac{dx}{x} only when f has a particularly nice form.
Power rule for errors. If y = x^n, then \frac{dy}{y} = n \cdot \frac{dx}{x}. The relative error in y is n times the relative error in x.
- If y = x^2 (area from radius), a 2% error in x causes a 4% error in y.
- If y = x^3 (volume from radius), a 2% error in x causes a 6% error in y.
- If y = \sqrt{x} = x^{1/2}, a 2% error in x causes a 1% error in y.
This is why volume measurements are more sensitive to radius errors than area measurements — the exponent is larger.
Product rule for errors. If y = u \cdot v, where u and v are independent measurements, then

\frac{dy}{y} = \frac{du}{u} + \frac{dv}{v}
Relative errors add when quantities are multiplied. (More precisely, in the worst case they add; on average they partially cancel.)
Quotient rule for errors. If y = u/v, the differential gives \frac{dy}{y} = \frac{du}{u} - \frac{dv}{v}; since du and dv can carry either sign, the worst-case relative errors still add, just as for products. Relative errors add for both multiplication and division.
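A numeric sketch of the product rule for errors, using the differential d(uv) = v\,du + u\,dv and hypothetical measurement values:

```python
# Hypothetical measurements: u with 2% error, v with 1% error.
u, du = 10.0, 0.2
v, dv = 50.0, 0.5

y = u * v
dy = v * du + u * dv     # differential of the product

# dy/y equals du/u + dv/v = 0.02 + 0.01 = 0.03: relative errors add.
print(dy / y)
print(du / u + dv / v)
```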
Worked examples
Example 1: Approximate \sqrt[3]{26}
Compute \sqrt[3]{26} using linear approximation.
Step 1. Choose the anchor: the nearest perfect cube is 27 = 3^3. So a = 27, \Delta x = -1.
Why: you need a nearby point where the cube root is known exactly. 27 is the closest such point to 26.
Step 2. Set up the function and its derivative. f(x) = x^{1/3}, so f'(x) = \frac{1}{3}x^{-2/3}.
Step 3. Evaluate at the anchor.

f(27) = 3, \qquad f'(27) = \frac{1}{3}(27)^{-2/3} = \frac{1}{3 \cdot 9} = \frac{1}{27}

Why: (27)^{2/3} = (27^{1/3})^2 = 3^2 = 9.
Step 4. Apply the formula.

\sqrt[3]{26} \approx f(27) + f'(27)\,\Delta x = 3 + \frac{1}{27}(-1) \approx 3 - 0.0370 = 2.9630

Why: \Delta x = 26 - 27 = -1, and the negative sign means the cube root decreases slightly as x drops from 27 to 26.
Result: \sqrt[3]{26} \approx 2.9630. A calculator gives 2.96250\ldots — the error is about 0.0005, less than 0.02\%.
The tangent line slightly overestimates the cube root at x = 26. This happens because f(x) = x^{1/3} is concave down (it bends downward) for x > 0, so its tangent line lies above the curve. Knowing the direction of the error (overestimate vs. underestimate) is useful for bounding the true value.
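The whole of Example 1 fits in a few lines of Python, including a check that the tangent line overestimates:

```python
# Linear approximation of 26^(1/3), anchored at the perfect cube a = 27.
a = 27.0
fa = 3.0                 # f(27) = 27^(1/3) = 3
fprime_a = 1 / 27        # f'(27) = (1/3) * 27^(-2/3) = 1/(3*9)
dx = -1.0                # step from 27 down to 26

estimate = fa + fprime_a * dx    # 3 - 1/27, about 2.9630
exact = 26 ** (1 / 3)            # about 2.9625

print(estimate, exact)
print(estimate > exact)   # True: concave down, so the tangent overestimates
```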
Example 2: Error in the volume of a sphere
The radius of a sphere is measured as r = 5.0 cm with a maximum error of \pm 0.05 cm. Find the percentage error in the computed volume.
Step 1. The volume is V = \frac{4}{3}\pi r^3. Differentiate:

dV = 4\pi r^2\,dr

Why: the differential gives the approximate change in volume caused by a small change dr in the radius.
Step 2. Compute the relative error.

\frac{dV}{V} = \frac{4\pi r^2\,dr}{\frac{4}{3}\pi r^3} = 3\,\frac{dr}{r}

Why: the \pi cancels and the power of r drops by one, leaving the factor \frac{4}{4/3} = 3. The relative error in V is three times the relative error in r. This matches the power rule: V \propto r^3, so n = 3.
Step 3. Plug in.

\frac{dV}{V} = 3 \cdot \frac{0.05}{5.0} = 3 \times 0.01 = 0.03

Why: the relative error in r is 0.05/5.0 = 1\%. Tripled, this gives a 3\% relative error in V.
Step 4. Convert to percentage error and compute the absolute error.

\frac{dV}{V} = 0.03 = 3\%, \qquad dV = 4\pi(5.0)^2(0.05) = 5\pi \approx 15.71 cm^3
The volume itself is V = \frac{4}{3}\pi(125) = \frac{500\pi}{3} \approx 523.6 cm^3, so \frac{15.71}{523.6} \approx 0.03, confirming the 3\%.
Result: A 1\% error in the radius causes a 3\% error in the volume.
This is why precision matters more for higher-dimensional quantities. Measuring a length with 1\% accuracy gives 1\% accuracy for lengths, 2\% for areas, and 3\% for volumes. Each additional dimension multiplies the error.
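Example 2 as a Python check:

```python
import math

r, dr = 5.0, 0.05                 # measured radius and its maximum error (cm)

V = (4 / 3) * math.pi * r ** 3    # computed volume
dV = 4 * math.pi * r ** 2 * dr    # differential: dV = 4*pi*r^2*dr

print(dV)        # about 15.71 cm^3
print(dV / V)    # 0.03: three times the 1% relative error in r
```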
The role of concavity
Linear approximation always introduces an error. The direction of the error — whether the tangent line overestimates or underestimates the function — depends on the concavity of the function.
- If f''(a) > 0 (concave up, like x^2), the tangent line lies below the curve. The linear approximation is an underestimate.
- If f''(a) < 0 (concave down, like \sqrt{x} or \ln x), the tangent line lies above the curve. The linear approximation is an overestimate.
Knowing this lets you bound the true value. If you use linear approximation on \sqrt{x} (concave down) and get 2.005, you know the true value is at most 2.005 — the real answer is slightly below. This one-sided guarantee is surprisingly useful.
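A quick numeric check of the one-sided guarantee for \sqrt{x} (concave down), using its tangent line at a = 4:

```python
import math

# The tangent to sqrt(x) at a = 4 is L(x) = 2 + (1/4)(x - 4).
for dx in (-0.5, -0.1, 0.1, 0.5):
    tangent = 2 + 0.25 * dx
    curve = math.sqrt(4 + dx)
    print(dx, tangent >= curve)   # True for every step: tangent lies above
```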
The error term
How large is the error? The exact error in linear approximation is

f(a + \Delta x) - \bigl[f(a) + f'(a)\,\Delta x\bigr] = \frac{f''(c)}{2}(\Delta x)^2

for some c between a and a + \Delta x. This is a consequence of the mean value theorem applied twice — it is the beginning of Taylor's theorem, which you will meet later.
The key observation: the error is proportional to (\Delta x)^2. If you halve the step size, the error drops by a factor of four. This is why linear approximation is so accurate for small steps and so inaccurate for large ones.
For \sqrt{4.02}: \Delta x = 0.02, so (\Delta x)^2 = 0.0004. The second derivative of \sqrt{x} at x = 4 is f''(4) = -\frac{1}{4}(4)^{-3/2} = -\frac{1}{32}. The error is approximately \frac{1}{2} \cdot \frac{1}{32} \cdot 0.0004 = 0.00000625, matching the actual error of about 0.000006.
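The quadratic scaling of the error can be seen numerically: halving the step should cut the error by roughly a factor of four.

```python
import math

def linear_error(dx):
    """Error of the tangent-line estimate 2 + dx/4 for sqrt(4 + dx)."""
    return abs(math.sqrt(4 + dx) - (2 + 0.25 * dx))

e_full = linear_error(0.02)
e_half = linear_error(0.01)

print(e_full / e_half)   # close to 4, confirming error ~ (dx)^2
```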
Common confusions
- "dy and \Delta y are the same thing." They are not. \Delta y = f(x + dx) - f(x) is the actual change in the function. dy = f'(x)\,dx is the approximate change predicted by the tangent line. They are nearly equal for small dx, but they are conceptually different objects.
- "Linear approximation works best when the function is steep." It works best when the function is nearly straight — i.e., when the second derivative is small. A steep function with constant slope (a straight line) is approximated perfectly. A gently curved function with large f'' is approximated poorly. Steepness is f'; straightness is f''.
- "The percentage error in x^n is n\%." It is n times the percentage error in x. If x has 2\% error and y = x^3, the error in y is 3 \times 2\% = 6\%, not 3\%.
- "For y = u + v, the relative errors add." For sums, the absolute errors add: dy = du + dv. The relative errors do not combine as simply. The relative-error-addition rule applies to products and quotients, not sums.
- "Differentials are infinitely small." In the modern treatment used in Indian textbooks, dx is simply a small finite increment. The notation dy = f'(x)\,dx is a shorthand for the linear approximation formula. The "infinitely small" interpretation belongs to a different (non-standard) framework.
Going deeper
The core technique — f(a + \Delta x) \approx f(a) + f'(a)\Delta x — is above, and it covers all standard exam problems. What follows is the wider context.
From linear to quadratic approximation
If linear approximation uses the tangent line, what happens if you use a tangent parabola instead? The second-order approximation is

f(a + \Delta x) \approx f(a) + f'(a)\,\Delta x + \frac{f''(a)}{2}(\Delta x)^2

This matches not just the function's value and slope at a, but also its curvature. For \sqrt{4.02}:

\sqrt{4.02} \approx 2 + \frac{1}{4}(0.02) - \frac{1}{64}(0.02)^2 = 2.005 - 0.00000625 = 2.00499375
The true value is 2.004993766\ldots — the second-order estimate is accurate to seven decimal places. This is the beginning of Taylor series, where you keep adding terms with higher derivatives for ever-greater accuracy.
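The second-order estimate for \sqrt{4.02}, checked in Python:

```python
import math

# Tangent-parabola (second-order) approximation of sqrt(4.02) at a = 4.
a, dx = 4.0, 0.02
f0 = math.sqrt(a)              # f(4)   = 2
f1 = 1 / (2 * math.sqrt(a))    # f'(4)  = 1/4
f2 = -1 / (4 * a ** 1.5)       # f''(4) = -1/32

quadratic = f0 + f1 * dx + (f2 / 2) * dx ** 2
exact = math.sqrt(a + dx)

print(quadratic)               # 2.00499375
print(abs(exact - quadratic))  # about 1.6e-8: seven decimal places
```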
Differentials in several variables
If z = f(x, y) depends on two variables, the total differential is

dz = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy
This is the multivariable version of dy = f'(x)\,dx. Each partial derivative tells you how sensitive z is to one variable while the other is held fixed. The total differential combines both sensitivities. This is how engineers propagate measurement errors through formulas that depend on multiple measured quantities.
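A minimal sketch for a hypothetical two-variable formula z = xy, whose total differential is dz = y\,dx + x\,dy (the measurement values below are made up for illustration):

```python
# Hypothetical measurements: x with 1% error, y with 2% error.
x, dx = 3.0, 0.03
y, dy = 4.0, 0.08

z = x * y
dz = y * dx + x * dy     # total differential of z = x*y

# dz = 0.12 + 0.24 = 0.36, and dz/z = 0.03: the 1% and 2% errors add.
print(dz)
print(dz / z)
```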
The mean value theorem connection
The linear approximation formula says f(a + h) \approx f(a) + f'(a)h. The mean value theorem says f(a + h) = f(a) + f'(c)h for some c between a and a + h — exactly, not approximately. The approximation error comes entirely from the difference between f'(a) and f'(c). When f' is nearly constant (i.e., f'' is small), f'(c) \approx f'(a) and the approximation is excellent. The mean value theorem turns the approximation into an identity by shifting the evaluation point of the derivative.
Where this leads next
- Rate of Change — the derivative as a physical rate, applied to problems where quantities change together.
- Maxima and Minima — where the derivative is zero and the linear approximation gives a horizontal tangent.
- Mean Value Theorems — the rigorous backbone that guarantees the tangent line cannot be too far from the curve.
- Taylor Series — the full generalisation: approximate a function not with a tangent line but with a polynomial of any degree.
- Monotonicity — using the sign of f' to determine where f is increasing or decreasing, which is also the sign of the linear correction term.