Chain Rule — padho-wiki

In short

The chain rule says: to differentiate a composite function f(g(x)), multiply the derivative of the outer function (evaluated at the inner function) by the derivative of the inner function. In Leibniz notation: \dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx}.

Take f(x) = (3x + 1)^5. You want its derivative.

You already know the power rule: the derivative of x^5 is 5x^4. But here the thing being raised to the fifth power is not x — it is 3x + 1. If you blindly write 5(3x + 1)^4 and stop, you get the wrong answer. The correct answer is 5(3x + 1)^4 \cdot 3 = 15(3x + 1)^4. That extra factor of 3 is not a minor detail — it is the entire point.

Where does the 3 come from? It comes from the rate at which the inside of the function is changing. The expression 3x + 1 changes three times as fast as x itself, and that scaling factor has to appear in the derivative. The rule that accounts for this is called the chain rule, and it is one of the most frequently used differentiation rules in calculus.

Without the chain rule, you can only differentiate simple building blocks: polynomials, basic trig functions, exponentials. With it, you can differentiate functions built by plugging one differentiable function inside another: \sin(x^2), e^{3x}, \sqrt{1 + x^4}, (2x^3 - 7)^{10}, and many more.

What is happening inside a composite function

Before stating the rule, make sure you can see the two-layer structure of a composite function.

Take y = (3x + 1)^5 again. There are two functions at work:

The inner function: u = 3x + 1. This takes x and produces a new number u.
The outer function: y = u^5. This takes u and raises it to the fifth power.

The composite function chains them: x \xrightarrow{\text{inner}} u \xrightarrow{\text{outer}} y. First x becomes u, then u becomes y.

Here is a second example to make the structure concrete. Take y = \sqrt{x^2 + 9}. The inner function is u = x^2 + 9 — it produces a number. The outer function is y = \sqrt{u} — it takes a square root. The composition: first compute x^2 + 9, then take the square root of whatever you got.

Being able to identify "what is the outer function?" and "what is the inner function?" is the only skill you need before applying the chain rule. The rule itself is mechanical once you see the layers.

The intuition: rates multiply through stages

Now think about rates of change. If x changes by a tiny amount \Delta x, the inner function amplifies (or shrinks) that change: u changes by approximately \frac{du}{dx} \cdot \Delta x. Then the outer function amplifies that change: y changes by approximately \frac{dy}{du} \cdot \Delta u.

Putting these two stages together:

\Delta y \;\approx\; \frac{dy}{du} \cdot \Delta u \;\approx\; \frac{dy}{du} \cdot \frac{du}{dx} \cdot \Delta x

Divide both sides by \Delta x:

\frac{\Delta y}{\Delta x} \;\approx\; \frac{dy}{du} \cdot \frac{du}{dx}

In the limit as \Delta x \to 0, this becomes exact. That is the chain rule.

An analogy helps. A gear train has two gears. If the first gear turns 3 times for every 1 turn of the crank (the inner function), and the second gear turns 5 times for every 1 turn of the first gear (the outer function), then the second gear turns 3 \times 5 = 15 times for every 1 turn of the crank. Rates multiply through stages. The chain rule says exactly the same thing about functions: the overall rate of change is the product of the rates at each stage.

For the original problem, u = 3x + 1 changes at rate 3 (three units of u per unit of x), and y = u^5 changes at rate 5u^4 (per unit of u). The overall rate is 5u^4 \times 3 = 15(3x+1)^4. Fifteen turns of the final gear per turn of the crank.
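The "rates multiply" picture can be checked numerically. The sketch below is only an illustration: the names inner, outer and stage_rates and the step size dx are assumptions made for this example, not part of any standard library. It nudges x by a small amount, measures how much each stage changes, and compares the overall rate with the product of the two stage rates.

```python
# Sketch: measure each stage's rate for y = (3x + 1)^5 with a small finite change.
# The function names and the step size dx are illustrative assumptions.

def inner(x):
    return 3 * x + 1               # u = g(x)

def outer(u):
    return u ** 5                  # y = f(u)

def stage_rates(x, dx=1e-6):
    """Return finite-change estimates of (du/dx, dy/du, dy/dx) at x."""
    u = inner(x)
    du = inner(x + dx) - u         # change handed on by the inner stage
    dy = outer(u + du) - outer(u)  # resulting change after the outer stage
    return du / dx, dy / du, dy / dx

du_dx, dy_du, dy_dx = stage_rates(0.0)
print(du_dx, dy_du, dy_dx)         # close to 3, 5 and 15
```

The estimates are approximate because dx is small but not zero; shrinking dx brings them closer to the exact rates 3, 5 and 15.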

The formal statement

Chain Rule

If y = f(u) and u = g(x), where g is differentiable at x and f is differentiable at u = g(x), then the composite function y = f(g(x)) is differentiable at x and

\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}

Equivalently, in prime notation:

\bigl[f(g(x))\bigr]' = f'(g(x)) \cdot g'(x)

Reading the formula. There are two factors:

f'(g(x)) — the derivative of the outer function, but evaluated at the inner function, not at x. You differentiate the outer shell as if the inner part were a single variable, then plug the inner part back in.
g'(x) — the derivative of the inner function. This is the factor that beginners forget. It accounts for how fast the input to the outer function is itself changing.

The Leibniz notation \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} makes the rule look like fraction cancellation — the du's "cancel." They do not literally cancel (these are not fractions), but the notation is designed to make the rule easy to remember and hard to misapply. That is Leibniz notation at its best: it encodes the correct computation into the notation itself.

The proof

Here is the proof from first principles. You want to show that \frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x).

Start from the definition of the derivative:

\frac{d}{dx}f(g(x)) = \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}

The trick is to introduce the change in g. Write \Delta u = g(x+h) - g(x). Then g(x+h) = g(x) + \Delta u, and the expression becomes:

\lim_{h \to 0} \frac{f(g(x) + \Delta u) - f(g(x))}{h}

Multiply and divide by \Delta u (when \Delta u \neq 0):

\lim_{h \to 0} \frac{f(g(x) + \Delta u) - f(g(x))}{\Delta u} \cdot \frac{\Delta u}{h}

Now observe what happens as h \to 0:

Since g is differentiable (and hence continuous), \Delta u = g(x+h) - g(x) \to 0 as h \to 0.
The first factor \frac{f(g(x) + \Delta u) - f(g(x))}{\Delta u} becomes f'(g(x)) as \Delta u \to 0 — this is the definition of the derivative of f at the point g(x).
The second factor \frac{\Delta u}{h} = \frac{g(x+h) - g(x)}{h} becomes g'(x) as h \to 0 — the definition of the derivative of g at x.

So the limit is f'(g(x)) \cdot g'(x), which is exactly the chain rule.

(A small technicality: when \Delta u = 0 for some values of h near 0, the division by \Delta u is not valid. The rigorous fix is given in the going-deeper section below.)

The chain rule as a two-stage pipeline. A change in $x$ flows through $g$ (producing $u$) then through $f$ (producing $y$). The overall rate of change is the product of the rates at each stage — just like gears in a gear train.

A quick-reference table

Before the worked examples, here are several common chain-rule derivatives that appear constantly in problems. Each one is the outer derivative times the inner derivative.

Function	Outer	Inner	Derivative
(ax + b)^n	u^n	ax + b	na(ax+b)^{n-1}
\sqrt{f(x)}	\sqrt{u}	f(x)	\frac{f'(x)}{2\sqrt{f(x)}}
\sin(ax)	\sin u	ax	a\cos(ax)
\cos(ax)	\cos u	ax	-a\sin(ax)
e^{f(x)}	e^u	f(x)	f'(x) \cdot e^{f(x)}
\ln(f(x))	\ln u	f(x)	\frac{f'(x)}{f(x)}

Every row follows the same pattern. Once you internalise the pattern, you will apply the chain rule without even consciously thinking about it.
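If you want to convince yourself that each row is right, a quick numerical spot check helps. The sketch below compares every formula in the table with a symmetric difference quotient. The sample values a = 2, b = 1, n = 4 and the choice f(x) = x^2 + 1 are assumptions picked only for this check, and central_diff is a hypothetical helper name.

```python
import math

def central_diff(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Sample choices for the check (assumptions, not from the table itself).
a, b, n = 2.0, 1.0, 4
f = lambda x: x * x + 1      # a positive function, so sqrt and ln are defined
fp = lambda x: 2 * x         # its derivative

# Each row: (label, the function, the derivative formula from the table).
rows = [
    ("(ax+b)^n", lambda x: (a * x + b) ** n, lambda x: n * a * (a * x + b) ** (n - 1)),
    ("sqrt(f)",  lambda x: math.sqrt(f(x)),  lambda x: fp(x) / (2 * math.sqrt(f(x)))),
    ("sin(ax)",  lambda x: math.sin(a * x),  lambda x: a * math.cos(a * x)),
    ("cos(ax)",  lambda x: math.cos(a * x),  lambda x: -a * math.sin(a * x)),
    ("exp(f)",   lambda x: math.exp(f(x)),   lambda x: fp(x) * math.exp(f(x))),
    ("ln(f)",    lambda x: math.log(f(x)),   lambda x: fp(x) / f(x)),
]

def max_table_error(x=0.7):
    """Largest gap between a table formula and the numerical estimate at x."""
    return max(abs(central_diff(func, x) - formula(x)) for _, func, formula in rows)

for label, func, formula in rows:
    print(label, formula(0.7), central_diff(func, 0.7))
```

A numerical match at one point is not a proof, but a mismatch would immediately flag a forgotten inner derivative.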

Worked examples

Time to use the chain rule on real problems.

Example 1: Differentiating a power of a linear function

Find \dfrac{d}{dx}(3x + 1)^5.

Step 1. Identify the outer and inner functions.

\text{Outer: } f(u) = u^5, \qquad \text{Inner: } u = g(x) = 3x + 1

Why: the expression has two layers — "raise to the fifth power" is the outer operation, "compute 3x+1" is the inner one.

Step 2. Differentiate the outer function.

f'(u) = 5u^4

Why: by the power rule, the derivative of u^5 is 5u^4.

Step 3. Evaluate the outer derivative at the inner function.

f'(g(x)) = 5(3x + 1)^4

Why: wherever the formula says u, replace it with g(x) = 3x + 1.

Step 4. Differentiate the inner function.

g'(x) = 3

Why: the derivative of 3x + 1 with respect to x is just 3.

Step 5. Multiply the two pieces together (the chain rule).

\frac{d}{dx}(3x + 1)^5 = 5(3x + 1)^4 \cdot 3 = 15(3x + 1)^4

Result: \dfrac{d}{dx}(3x + 1)^5 = 15(3x + 1)^4.

The curve $y = (3x+1)^5$. At $x = 0$, the function value is $1$ and the derivative is $15(1)^4 = 15$, so the red tangent line rises steeply. The tangent slope comes from multiplying the outer derivative ($5u^4 = 5$ at $u = 1$) by the inner derivative ($3$).

Notice what would have gone wrong without the chain rule. Writing 5(3x+1)^4 alone gives a slope of 5 at x = 0, not 15. The tangent line would be three times too shallow — a real error, not a rounding issue. The picture would look wrong, and that is the clearest sign that a factor is missing.

Example 2: Differentiating a square root with a non-trivial inside

Find \dfrac{d}{dx}\sqrt{x^2 + 9}.

Step 1. Rewrite the square root as a power and identify the layers.

\sqrt{x^2 + 9} = (x^2 + 9)^{1/2}

\text{Outer: } f(u) = u^{1/2}, \qquad \text{Inner: } u = g(x) = x^2 + 9

Why: writing \sqrt{u} as u^{1/2} lets you apply the power rule to the outer function, just as in Example 1.

Step 2. Differentiate the outer function.

f'(u) = \frac{1}{2}u^{-1/2} = \frac{1}{2\sqrt{u}}

Why: the power rule gives \frac{d}{du}u^{1/2} = \frac{1}{2}u^{-1/2}, which is \frac{1}{2\sqrt{u}}.

Step 3. Evaluate at the inner function.

f'(g(x)) = \frac{1}{2\sqrt{x^2 + 9}}

Why: replace u with x^2 + 9.

Step 4. Differentiate the inner function.

g'(x) = 2x

Why: the derivative of x^2 + 9 is 2x (the constant 9 vanishes).

Step 5. Multiply.

\frac{d}{dx}\sqrt{x^2 + 9} = \frac{1}{2\sqrt{x^2 + 9}} \cdot 2x = \frac{x}{\sqrt{x^2 + 9}}

Result: \dfrac{d}{dx}\sqrt{x^2 + 9} = \dfrac{x}{\sqrt{x^2 + 9}}.

The curve $y = \sqrt{x^2 + 9}$. At $x = 0$, the derivative is $0/\sqrt{9} = 0$ — the curve is flat at its lowest point. At $x = 4$, the derivative is $4/\sqrt{25} = 4/5 = 0.8$. As $x \to \infty$, the derivative approaches $1$ from below — the curve increasingly resembles the line $y = x$, but never quite reaches slope $1$.

Check the answer at x = 4: the function value is \sqrt{16+9} = \sqrt{25} = 5, and the derivative is 4/5. The tangent line through (4, 5) with slope 4/5 matches the graph. The derivative also makes physical sense: \sqrt{x^2 + 9} is the distance from the point (x, 0) to (0, 3), and for x > 0, moving x to the right increases that distance, but by less than 1 unit per unit of x, because the distance is a hypotenuse, not a leg.
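The same check can be done numerically. This is a minimal sketch with hypothetical function names distance and slope; it evaluates the formula x/\sqrt{x^2 + 9} at a few points and compares it with a symmetric difference quotient of the original function.

```python
import math

def distance(x):
    return math.sqrt(x * x + 9)      # the original function

def slope(x):
    return x / math.sqrt(x * x + 9)  # the chain rule result

def numeric_slope(x, h=1e-6):
    """Symmetric difference quotient of distance at x."""
    return (distance(x + h) - distance(x - h)) / (2 * h)

for x in (0.0, 4.0, 100.0):
    print(x, slope(x), numeric_slope(x))
```

At x = 100 the slope is already very close to 1 but still below it, in line with the remark that the curve approaches the line y = x.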

Multiple compositions: chaining the chain rule

What if there are three layers? Take y = \sin^2(3x), which means y = [\sin(3x)]^2.

There are three functions chained together:

Innermost: v = 3x
Middle: u = \sin v
Outermost: y = u^2

Apply the chain rule twice:

\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dv} \cdot \frac{dv}{dx}

Compute each factor:

\frac{dy}{du} = 2u = 2\sin(3x), \qquad \frac{du}{dv} = \cos v = \cos(3x), \qquad \frac{dv}{dx} = 3

Multiply them together:

\frac{dy}{dx} = 2\sin(3x) \cdot \cos(3x) \cdot 3 = 6\sin(3x)\cos(3x)

Using the double-angle identity 2\sin\theta\cos\theta = \sin 2\theta, this simplifies to 3\sin(6x).

The solid black curve is $y = \sin^2(3x)$ and the dashed red curve is its derivative $y = 3\sin(6x)$. The derivative is zero at every peak and trough of the original, consistent with the chain rule computation. Since $\sin^2(3x) = \frac{1 - \cos(6x)}{2}$, both curves have the same period $\pi/3$; the derivative has amplitude $3$, six times the original's amplitude of $\frac{1}{2}$.

Here is another three-layer example. Find \frac{d}{dx}e^{\cos(x^2)}.

Innermost: w = x^2
Middle: v = \cos w
Outermost: y = e^v

The chain rule gives:

\frac{dy}{dx} = e^v \cdot (-\sin w) \cdot 2x = e^{\cos(x^2)} \cdot (-\sin(x^2)) \cdot 2x = -2x\sin(x^2) \cdot e^{\cos(x^2)}

Each layer contributes one factor to the product. You peel the function from the outside in, differentiating each layer and multiplying all the factors together.

The pattern generalises. For a chain of n functions f_1(f_2(\cdots f_n(x)\cdots)), you differentiate each layer and multiply all the derivatives together, each evaluated at the appropriate input. In Leibniz notation:

\frac{dy}{dx} = \frac{dy}{du_1} \cdot \frac{du_1}{du_2} \cdot \frac{du_2}{du_3} \cdots \frac{du_{n-1}}{dx}

The notation makes the pattern visible: each intermediate variable appears once in a denominator and once in the next numerator, forming a chain, which is why the rule has its name.

The chain rule in Leibniz notation

Leibniz notation is particularly natural for the chain rule because it makes the "multiply the rates" idea visible.

If y depends on u and u depends on x, then

\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}

This looks exactly like fraction cancellation, and in practice you can use it that way — as long as you remember what each piece means.

Here is a concrete application. Suppose the radius r of a balloon is increasing at a rate of 2 cm/s, and you want to know how fast the volume V = \frac{4}{3}\pi r^3 is increasing when the radius is 5 cm.

You know \frac{dr}{dt} = 2 cm/s. You want \frac{dV}{dt}. The chain rule says:

\frac{dV}{dt} = \frac{dV}{dr} \cdot \frac{dr}{dt}

Compute \frac{dV}{dr}:

V = \frac{4}{3}\pi r^3 \quad \Longrightarrow \quad \frac{dV}{dr} = 4\pi r^2

So \frac{dV}{dt} = 4\pi r^2 \cdot 2 = 8\pi r^2.

At r = 5: \frac{dV}{dt} = 8\pi(25) = 200\pi \approx 628.3 cm^3/s.

The solid black curve is the volume $V = \frac{4}{3}\pi r^3$ as a function of radius. The dashed red curve is the rate of change of volume with respect to time, $\frac{dV}{dt} = 8\pi r^2$, given that $\frac{dr}{dt} = 2$ cm/s. At $r = 5$, the volume is about $524$ cm$^3$ and it is growing at $200\pi \approx 628$ cm$^3$/s.

The chain rule converted a rate of change in one variable (radius) into a rate of change in another (volume). This is the chain rule as a chain of cause and effect: a change in t causes a change in r, which causes a change in V, and the chain rule tells you how these rates multiply through. This idea — computing how fast one quantity changes when you know how fast a related quantity changes — is the basis of an entire class of problems called related rates, and every one of them is an application of the chain rule.
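Here is a small numerical check of the balloon calculation. The problem only states the rate dr/dt = 2 cm/s at the instant r = 5 cm, so the sketch assumes, purely for the check, that the radius follows r(t) = 5 + 2t near that instant. The function names are illustrative.

```python
import math

def volume(r):
    return 4.0 / 3.0 * math.pi * r ** 3

def dV_dr(r):
    return 4.0 * math.pi * r ** 2

def radius(t):
    # Assumption for this check only: r(t) = 5 + 2t, so r = 5 cm at t = 0.
    return 5.0 + 2.0 * t

dr_dt = 2.0
chain_rule_rate = dV_dr(5.0) * dr_dt  # dV/dt = dV/dr * dr/dt at r = 5

# Compare with a symmetric difference quotient of V(r(t)) at t = 0.
h = 1e-6
numeric_rate = (volume(radius(h)) - volume(radius(-h))) / (2 * h)
print(chain_rule_rate, numeric_rate)  # both close to 200*pi
```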

A preview: implicit differentiation

The chain rule is also the engine behind a technique called implicit differentiation, which you will study in detail in Implicit Differentiation.

Here is the idea. Suppose x and y satisfy the equation x^2 + y^2 = 25 — a circle of radius 5. You want \frac{dy}{dx}, but y is not given as an explicit function of x.

Differentiate both sides with respect to x. The left side has two terms:

\frac{d}{dx}(x^2) = 2x — straightforward.
\frac{d}{dx}(y^2) — here y is a function of x (even though you don't know which one), so y^2 is a composite: the square function applied to y(x). By the chain rule: \frac{d}{dx}(y^2) = 2y \cdot \frac{dy}{dx}.

The right side: \frac{d}{dx}(25) = 0.

So you get 2x + 2y\frac{dy}{dx} = 0. Solve for \frac{dy}{dx}:

2y\frac{dy}{dx} = -2x \quad \Longrightarrow \quad \frac{dy}{dx} = -\frac{x}{y}

At the point (3, 4) on the circle, the slope of the tangent is -\frac{3}{4}. At (0, 5), the slope is 0 — the tangent is horizontal at the top of the circle. At (5, 0), the formula gives -5/0, which is undefined — the tangent is vertical at the rightmost point. All three match the geometry perfectly.

The circle $x^2 + y^2 = 25$ with tangent lines at $(3,4)$ and $(0,5)$. The chain rule, applied to $y^2$ where $y$ is a function of $x$, produces the factor $\frac{dy}{dx}$ that makes the whole computation work. At $(5,0)$ the tangent is vertical — the derivative is undefined because $y = 0$ puts a zero in the denominator.

The key move was treating y^2 as (\text{something})^2 and applying the chain rule: derivative of the outer (2y) times derivative of the inner (\frac{dy}{dx}). Without the chain rule, implicit differentiation does not exist.
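On the upper half of the circle you can solve for y explicitly, which gives an independent way to test the formula. The sketch below assumes the explicit branch y = \sqrt{25 - x^2} (valid only for the upper semicircle) and compares its numerical slope at x = 3 with -x/y; the function names are hypothetical.

```python
import math

def upper_semicircle(x):
    return math.sqrt(25.0 - x * x)  # explicit branch with y >= 0

def implicit_slope(x, y):
    return -x / y                   # result of implicit differentiation

h = 1e-6
numeric_slope = (upper_semicircle(3.0 + h) - upper_semicircle(3.0 - h)) / (2 * h)
formula_slope = implicit_slope(3.0, 4.0)
print(numeric_slope, formula_slope) # both close to -0.75
```

The explicit branch is not needed to use the formula; it is used here only because it makes an independent check possible.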

Common confusions

"The chain rule just means multiply by the derivative of the inside." Correct — but the word "just" hides the most common error: forgetting to do it at all. The derivative of \sin(5x) is 5\cos(5x), not \cos(5x). The derivative of e^{x^2} is 2xe^{x^2}, not e^{x^2}. Every time there is a non-trivial inner function, the chain rule produces an extra factor. The mistake is so common that teachers sometimes call it the "chain rule tax" — a factor you must always pay.
"\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} is just cancelling fractions." It looks like fraction cancellation, and the notation was designed to suggest it, but \frac{dy}{dx} is not a fraction — it is a single symbol meaning "the derivative of y with respect to x." The chain rule is a theorem that has to be proved; it is not an algebraic identity. In single-variable calculus, treating it as fraction cancellation always gives the right answer, which is a feature of Leibniz's brilliant notation design — not evidence that the derivatives are actually fractions.
"The derivative of \sin^2(x) is \cos^2(x)." No. \sin^2(x) = [\sin(x)]^2, so the outer function is squaring and the inner function is \sin x. The chain rule gives 2\sin(x) \cdot \cos(x) = \sin(2x). The error comes from confusing \sin^2 x with \sin(x^2), which are completely different functions with completely different derivatives.
"I can always expand first and avoid the chain rule." Sometimes — (3x+1)^2 can be expanded to 9x^2 + 6x + 1 and differentiated term by term to get 18x + 6 = 6(3x+1), which matches the chain rule result 2(3x+1) \cdot 3. But try expanding (3x+1)^{50} or differentiating \sqrt{1 + x^4} by expansion. The chain rule handles both effortlessly. For functions like \sin(x^2) or e^{\cos x}, there is no algebraic expansion at all — the chain rule is the only tool available.
"The chain rule only applies when I can see two explicit functions." Not quite. Every time you see a function applied to something other than the bare variable x, the chain rule is at work — even in simple cases like \frac{d}{dx}(5x)^3 = 3(5x)^2 \cdot 5 = 375x^2. The question to ask is: "Is the argument of the outer function just x, or is it something more?" If it is something more, the chain rule applies.

Going deeper

If you came here to learn how to apply the chain rule, you have it — you can stop here. The rest of this section is for readers who want to see the rigorous proof details and a useful generalisation.

The rigorous proof via a linear approximation

The proof given earlier has a gap: when \Delta u = g(x+h) - g(x) = 0 for some nonzero values of h near 0, dividing by \Delta u is not valid. This happens, for example, when g is a constant function (the zero function included), since then \Delta u = 0 for every h. Here is how to fix the proof so it works in all cases.

Define a function \phi as follows:

\phi(k) = \begin{cases} \dfrac{f(g(x) + k) - f(g(x))}{k} & \text{if } k \neq 0 \\[6pt] f'(g(x)) & \text{if } k = 0 \end{cases}

Since f is differentiable at g(x), the limit of the top case as k \to 0 equals f'(g(x)), so \phi is continuous at 0. This continuity is the key property.

Now for any h \neq 0, set k = \Delta u = g(x+h) - g(x). Then:

f(g(x+h)) - f(g(x)) = f(g(x) + k) - f(g(x)) = \phi(k) \cdot k = \phi(\Delta u) \cdot \Delta u

This identity holds whether \Delta u is zero or not. If \Delta u = 0, both sides are zero — the left side because f(g(x) + 0) - f(g(x)) = 0, and the right side because \phi(\Delta u) \cdot 0 = 0. No division by zero occurs.

Divide by h:

\frac{f(g(x+h)) - f(g(x))}{h} = \phi(\Delta u) \cdot \frac{\Delta u}{h}

As h \to 0: \Delta u \to 0 (because g is continuous at x, which follows from differentiability), so \phi(\Delta u) \to \phi(0) = f'(g(x)). And \frac{\Delta u}{h} = \frac{g(x+h) - g(x)}{h} \to g'(x). The limit of the product is the product of the limits (since both limits exist):

\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)

This completes the rigorous proof with no division-by-zero issues. The essential insight is that the function \phi lets you rewrite the difference quotient as a product rather than a quotient, and products behave well under limits.
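The role of \phi is easy to see in code. The sketch below is an illustration under stated assumptions: make_phi and difference_quotient_via_phi are hypothetical helpers, f is taken to be the sine function, and two choices of g are tried, a constant (where \Delta u is always 0) and g(x) = x^2.

```python
import math

def make_phi(f, f_prime, u0):
    """Build phi for the point u0 = g(x), following the two-case definition."""
    def phi(k):
        if k == 0:
            return f_prime(u0)         # the value that makes phi continuous at 0
        return (f(u0 + k) - f(u0)) / k
    return phi

def difference_quotient_via_phi(f, f_prime, g, x, h):
    """Compute [f(g(x+h)) - f(g(x))] / h as phi(delta_u) * (delta_u / h)."""
    u0 = g(x)
    delta_u = g(x + h) - u0
    phi = make_phi(f, f_prime, u0)
    return phi(delta_u) * (delta_u / h)  # no division by delta_u is needed here

# Constant inner function: delta_u is exactly 0, and the product form still works.
const_case = difference_quotient_via_phi(math.sin, math.cos, lambda x: 7.0, 1.5, 1e-3)

# Non-constant inner function g(x) = x^2 at x = 1.5.
square_case = difference_quotient_via_phi(math.sin, math.cos, lambda x: x * x, 1.5, 1e-6)
expected_square = math.cos(1.5 ** 2) * (2 * 1.5)  # f'(g(x)) * g'(x)
print(const_case, square_case, expected_square)
```

In the constant case the naive quotient would require dividing by zero, while the product form simply returns 0, the derivative of a constant composite.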

The generalised chain rule for partial derivatives

When you move to functions of several variables — say z = f(x, y) where both x and y depend on a single parameter t — the chain rule generalises to:

\frac{dz}{dt} = \frac{\partial z}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial z}{\partial y} \cdot \frac{dy}{dt}

Each independent pathway from t to z contributes a term. This is the multivariable chain rule, and it appears throughout physics and engineering. The single-variable chain rule you learned here is the special case where there is only one pathway from x to y.

The multivariable version explains something the single-variable version does not: why the single-variable chain rule involves a product (one pathway, one factor per link in the chain), while the multivariable version involves a sum of products (multiple pathways, each contributing a product). The full picture is a directed graph of dependencies, and the chain rule says: follow every path, multiply along each path, then add.
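A concrete case makes the "multiply along each path, then add" rule tangible. The example below is an assumption chosen for illustration, not taken from the text: z = x^2 y with x = \cos t and y = \sin t. It adds the two pathway contributions and compares the total with a numerical derivative of z along the path.

```python
import math

def z_along_path(t):
    x, y = math.cos(t), math.sin(t)
    return x * x * y                          # z = x^2 * y evaluated on the path

def dz_dt(t):
    x, y = math.cos(t), math.sin(t)
    dz_dx, dz_dy = 2 * x * y, x * x           # partial derivatives of z = x^2 * y
    dx_dt, dy_dt = -math.sin(t), math.cos(t)  # rates along the path
    return dz_dx * dx_dt + dz_dy * dy_dt      # one term per pathway from t to z

t0 = 0.9
h = 1e-6
numeric = (z_along_path(t0 + h) - z_along_path(t0 - h)) / (2 * h)
analytic = dz_dt(t0)
print(numeric, analytic)
```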

A note on higher derivatives and the chain rule

The chain rule tells you the first derivative of a composite. What about the second derivative? If y = f(g(x)), then

y' = f'(g(x)) \cdot g'(x)

To find y'', differentiate y' using the product rule (since y' is a product of two functions of x):

y'' = f''(g(x)) \cdot [g'(x)]^2 + f'(g(x)) \cdot g''(x)

The first term comes from differentiating f'(g(x)), which needs the chain rule again and supplies the second factor of g'(x). The second term comes from differentiating g'(x) directly while f'(g(x)) stays as a factor, so no further chain rule is needed there. The formula for y'' is more complex than y', and the formulas for higher derivatives grow even more involved. This is one reason why Leibniz notation, while wonderful for first derivatives of composites, becomes less convenient for higher-order derivatives.
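The second-derivative formula can be spot-checked in the same way. The sketch below assumes the sample choice f = \sin and g(x) = x^2, so y = \sin(x^2), and compares the formula with a second-order difference quotient; the names and the step size are illustrative.

```python
import math

def y(x):
    return math.sin(x * x)  # y = f(g(x)) with f = sin, g(x) = x^2

def y_second(x):
    g, g1, g2 = x * x, 2 * x, 2.0  # g(x), g'(x), g''(x)
    # f''(g) * g'^2 + f'(g) * g'' with f' = cos and f'' = -sin
    return -math.sin(g) * g1 ** 2 + math.cos(g) * g2

x0 = 0.8
h = 1e-4
numeric_second = (y(x0 + h) - 2 * y(x0) + y(x0 - h)) / h ** 2
formula_second = y_second(x0)
print(numeric_second, formula_second)
```

Dropping either term of the formula makes the two numbers disagree noticeably, which is a quick way to see that both terms are needed.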

Where this leads next

The chain rule is a prerequisite for almost everything that follows in differentiation. The most immediate applications:

Derivatives of Trigonometric Functions — the chain rule combines with trig derivatives to handle \sin(3x), \cos(x^2), and every composite trig function.
Implicit Differentiation — finding \frac{dy}{dx} when y is defined implicitly by an equation, powered entirely by the chain rule.
Logarithmic Differentiation — using \ln to simplify products and powers before differentiating, with the chain rule applied to \ln(f(x)).
Parametric Differentiation — computing \frac{dy}{dx} when x and y are both functions of a parameter t, using \frac{dy}{dx} = \frac{dy/dt}{dx/dt}.
Derivatives of Inverse Trigonometric Functions — every inverse trig derivative is derived using the chain rule applied to an identity like \sin(\sin^{-1}x) = x.