In short
A derivative measures how quickly a function is changing at a single point. Geometrically, it is the slope of the tangent line to the function's graph at that point. Symbolically, it is what the average rate of change becomes when you shrink the interval down to nothing.
Drop a ball off a cliff. After exactly 2 seconds, how fast is it moving?
This is not the question "how fast on average over the first 2 seconds." That one is easy. From basic physics (ignoring air resistance), after 2 seconds the ball has fallen 4.9 \times 2^2 = 19.6 metres, so its average speed over those 2 seconds is 19.6 \div 2 = 9.8 m/s.
But that's not what we asked. We asked how fast the ball is moving at the single instant t = 2. Right now. At this exact moment. Not averaged over anything.
That question is much harder than it sounds. Speed is distance divided by time — but at a single instant, no time has passed and no distance has been covered. Zero divided by zero. There is nothing to divide.
This is the kind of question that broke mathematicians for two thousand years. The answer turned out to be so important that an entire branch of mathematics — calculus — was invented to handle it. The thing you compute to answer this question is called a derivative.
A trick that almost works
Here's an idea. Instead of asking "how fast at the instant t = 2," let's ask "how fast on average over a tiny window around t = 2" — and then make the window smaller and smaller, and see what the average is heading toward.
The position of the falling ball at time t is s(t) = 4.9t^2. Compute the average velocity over windows of shrinking size, all anchored at t = 2:
| Window of time | Distance fallen in window | Window length | Average velocity |
|---|---|---|---|
| 2.0 to 3.0 | 44.10 - 19.60 = 24.50 m | 1 s | 24.50 m/s |
| 2.0 to 2.5 | 30.625 - 19.60 = 11.025 m | 0.5 s | 22.05 m/s |
| 2.0 to 2.1 | 21.609 - 19.60 = 2.009 m | 0.1 s | 20.09 m/s |
| 2.0 to 2.01 | 19.79649 - 19.60 = 0.19649 m | 0.01 s | 19.649 m/s |
| 2.0 to 2.001 | 19.6196049 - 19.60 = 0.0196049 m | 0.001 s | 19.6049 m/s |
Look at the rightmost column. The numbers are clearly heading somewhere: they are approaching 19.6 m/s. They never reach 19.6 exactly, because there is always a small extra term (over a window of length h, the average works out to 19.6 + 4.9h), but they get arbitrarily close.
So the answer to "how fast at the instant t = 2" looks like 19.6 m/s. Not because we ever computed it directly. Because every time we shrank the window, the average drifted closer to that one number.
This is the central trick. The instantaneous speed is what the average speed approaches as the time window shrinks to zero.
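The table above can be reproduced in a few lines of Python. This is a minimal sketch; the names `s` and `average_velocity` are illustrative, not from any library. It uses exact `fractions.Fraction` arithmetic so the averages are not muddied by floating-point rounding, and it checks the "small extra term" directly: for a window of length h starting at t = 2, every average equals 19.6 + 4.9h.

```python
from fractions import Fraction

def s(t):
    """Distance fallen (metres) after t seconds: s(t) = 4.9 t^2."""
    return Fraction(49, 10) * t * t

def average_velocity(t0, h):
    """Average velocity over the window from t0 to t0 + h."""
    return (s(t0 + h) - s(t0)) / h

t0 = Fraction(2)
windows = [Fraction(1), Fraction(1, 2), Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]

for h in windows:
    v = average_velocity(t0, h)
    # Algebraically, 4.9((2 + h)^2 - 2^2) / h = 19.6 + 4.9h, so the extra term is exactly 4.9h.
    assert v == Fraction(196, 10) + Fraction(49, 10) * h
    print(f"{float(t0)} to {float(t0 + h)}: average velocity {float(v)} m/s")
```

Because the extra term is 4.9h, shrinking the window by a factor of 10 shrinks the gap to 19.6 by a factor of 10 as well.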
The same picture, drawn
There is another way to see exactly the same thing, and this one will generalise to any function — not just falling balls.
Plot s(t) = 4.9t^2 on a graph: time on the horizontal axis, position on the vertical axis. The graph is a parabola. Pick the point at t = 2. Now pick a second point a little to the right, say at t = 3. Draw the straight line connecting the two points. This line is called a secant.
What is the slope of this secant line? It is the change in s divided by the change in t:
\dfrac{s(3) - s(2)}{3 - 2} = \dfrac{44.1 - 19.6}{1} = 24.5
That is exactly the average velocity over the window — the same 24.5 m/s from the table.
Now slide the second point closer. Take it to t = 2.5, then t = 2.1, then t = 2.01. Each time, redraw the secant. Something specific happens: the secant lines pivot toward a single line — the line that just touches the curve at t = 2 without cutting through it. That special line is called the tangent line.
The slope of the tangent line is what you want. It is what the secant slopes are heading toward as the second point closes in. From the table, you already know that number: 19.6.
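The secant picture can be checked numerically too. This is a minimal sketch with a hypothetical helper `secant_slope(f, a, b)` that returns the slope of the line through two points on a graph. Besides the second points to the right of t = 2 used in the text, it also tries points to the left (an extra check, not part of the table); those slopes climb toward the same 19.6 from below.

```python
def s(t):
    """Distance fallen (metres) after t seconds: s(t) = 4.9 t^2."""
    return 4.9 * t ** 2

def secant_slope(f, a, b):
    """Slope of the straight line through (a, f(a)) and (b, f(b))."""
    return (f(b) - f(a)) / (b - a)

right_points = [3, 2.5, 2.1, 2.01]   # second point to the right of t = 2
left_points = [1, 1.5, 1.9, 1.99]    # second point to the left of t = 2

for b in right_points + left_points:
    print(f"secant from t = 2 to t = {b}: slope {secant_slope(s, 2, b):.4f}")
```

Up to floating-point rounding, the right-hand slopes are 24.5, 22.05, 20.09 and 19.649, and the left-hand ones are 14.7, 17.15, 19.11 and 19.551: both sequences close in on 19.6.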
You now have two ways to say the same thing.
- Numerically: the instantaneous velocity is the limit of the average velocity over a shrinking window.
- Geometrically: the instantaneous velocity is the slope of the tangent line at that point.
Both are saying: shrink something tiny, see what it is heading toward.
What you have just been computing has a name
The number — the slope of the tangent, the limit of the average rate of change — has a name. It is called the derivative of the function at that point.
When the function is s(t) = 4.9t^2, you have shown that its derivative at t = 2 is 19.6.
If you had done the same computation at t = 3, you would have got a different number. At t = 1, a different number again. Every point on the curve has its own tangent line, with its own slope, and so its own derivative value.
In other words: the derivative of a function at a point is just a number. But because that number changes from point to point, the derivative across all points is itself a function.
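To see that each point has its own derivative value, a rough numerical sketch can stand in for the limit by using one very small step. The helper `approx_derivative` and the step size `h = 1e-6` are assumptions for illustration: a fixed small h only approximates the limit, and making h far smaller eventually makes floating-point rounding worse, not better.

```python
def s(t):
    """Distance fallen (metres) after t seconds: s(t) = 4.9 t^2."""
    return 4.9 * t ** 2

def approx_derivative(f, x, h=1e-6):
    """Slope of a very short secant from x to x + h, a numerical stand-in for the limit."""
    return (f(x + h) - f(x)) / h

for t in [1, 2, 3]:
    print(f"t = {t}: approximate derivative {approx_derivative(s, t):.3f} m/s")
```

The three values come out very close to 9.8, 19.6 and 29.4: a different number at each point, which is why it pays to treat the derivative as a function of t.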
The formal definition
You can now write down the precise version of what you have been doing all along.
Definition
The derivative of a function f at a point x is
f'(x) = \lim_{h \to 0} \dfrac{f(x+h) - f(x)}{h}
when this limit exists.
Reading the definition. Look at what each piece is saying:
- f(x+h) is the value of the function a tiny step h to the right of x.
- f(x+h) - f(x) is how much the function changed over that tiny step.
- Dividing by h turns that change into a rate: the change per unit of x. This is the slope of the secant from x to x+h.
- The \lim_{h \to 0} at the front says: shrink that step toward zero, without ever letting it equal zero, and see what value the secant slopes approach. That value is the rate at the single point itself.
The notation f'(x) — read "f prime of x" — is the most common name for this. You will also see Leibniz's notation \dfrac{df}{dx}, which makes the "rate of change of f per unit of x" meaning more visible, and the equivalent \dfrac{d}{dx}f(x). Some Indian textbooks call this the differential coefficient of f with respect to x. It is the same number, just an older name.
Notice how this captures exactly the picture you built up. Each \dfrac{f(x+h) - f(x)}{h} is the slope of a secant line from x to x+h. The limit of those slopes as h shrinks is the slope of the tangent. And the slope of the tangent is exactly what the table of average velocities was approaching.
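Each piece of the definition maps onto one line of code. The sketch below uses a hypothetical `difference_quotient(f, x, h)` that computes the secant slope from x to x + h exactly with `fractions.Fraction`. A limit cannot literally be "run", so instead it lists the quotients for a shrinking sequence of h so you can watch where they head. The test function f(x) = x^2 at x = 3 is just an illustrative choice.

```python
from fractions import Fraction

def difference_quotient(f, x, h):
    step_value = f(x + h)          # f(x + h): the value a step h to the right of x
    change = step_value - f(x)     # f(x + h) - f(x): how much f changed over the step
    return change / h              # divided by h: slope of the secant from x to x + h

def f(x):
    return x * x

x = Fraction(3)
for k in range(5):
    h = Fraction(1, 10 ** k)       # h = 1, 1/10, 1/100, 1/1000, 1/10000
    print(f"h = {h}: quotient = {float(difference_quotient(f, x, h))}")
```

The printed quotients are 7.0, 6.1, 6.01, 6.001 and 6.0001, heading toward 6.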
Computing one from the definition
Time to compute a derivative yourself. Find the derivative of f(x) = x^2 — the simplest possible non-trivial function — using the definition directly.
Example 1: f(x) = x²
Step 1. Compute f(x+h).
f(x+h) = (x+h)^2 = x^2 + 2xh + h^2
Why: you need the value of the function a step h to the right of x, so you can subtract.
Step 2. Compute f(x+h) - f(x).
f(x+h) - f(x) = (x^2 + 2xh + h^2) - x^2 = 2xh + h^2
Why: the x^2 terms cancel; that is the whole point of subtracting. Notice that everything left over has at least one factor of h.
Step 3. Divide by h.
\dfrac{f(x+h) - f(x)}{h} = \dfrac{2xh + h^2}{h} = 2x + h
Why: you can divide because h \neq 0; you have not taken the limit yet. After the division, the result is a clean expression in x and h.
Step 4. Take the limit as h \to 0.
f'(x) = \lim_{h \to 0} (2x + h) = 2x
Why: the first term 2x does not involve h at all, so it stays. The second term h goes to 0.
Result: f'(x) = 2x.
Read what this is saying. The derivative of x^2 is 2x — everywhere. Not at one point. At every single point, all at once, in one formula.
So at x = 3, the slope of the tangent to the parabola is 2 \times 3 = 6. At x = 10, the slope is 20. At x = -4, the slope is -8: walking left to right past x = -4, the parabola is going downhill, which matches the picture, since the left arm falls toward the bottom of the curve at the origin. The function gets steeper as you move outward from the origin, and the formula tells you exactly how steep it is at any point you ask about.
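Dividing the change 2xh + h^2 by h leaves exactly 2x + h before any limit is taken. A quick exact check in Python confirms that identity on a grid of rational values (the grid is arbitrary), then evaluates the formula 2x at the three points discussed above. The names `difference_quotient` and `f_prime` are illustrative.

```python
from fractions import Fraction

def f(x):
    return x * x

def difference_quotient(f, x, h):
    """Slope of the secant from x to x + h (h must be nonzero)."""
    return (f(x + h) - f(x)) / h

# The algebra says the quotient is 2x + h exactly; check it on a grid of exact rationals.
xs = [Fraction(n, 4) for n in range(-20, 21)]
hs = [Fraction(1, 10 ** k) for k in range(1, 6)] + [Fraction(-1, 7), Fraction(3, 2)]
for x in xs:
    for h in hs:
        assert difference_quotient(f, x, h) == 2 * x + h

# Once the leftover h term is gone, the derivative formula 2x gives the tangent slopes.
def f_prime(x):
    return 2 * x

for x in [3, 10, -4]:
    print(f"x = {x}: tangent slope {f_prime(x)}")
```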
A second example, for the pattern
Let's do another one, with the same template. Take f(x) = 1/x. This is a different shape — a hyperbola — and you should expect the derivative to look different too.
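Example 2: f(x) = 1/x
Step 1. Compute f(x+h) - f(x).
f(x+h) - f(x) = \dfrac{1}{x+h} - \dfrac{1}{x} = \dfrac{x - (x+h)}{x(x+h)} = \dfrac{-h}{x(x+h)}
Why: to subtract two fractions you need a common denominator. The numerator collapses to -h.
Step 2. Divide by h.
\dfrac{f(x+h) - f(x)}{h} = \dfrac{-h}{h \cdot x(x+h)} = -\dfrac{1}{x(x+h)}
Why: the h in the numerator cancels with the h in the denominator. You can divide because h \neq 0.
Step 3. Take the limit as h \to 0.
f'(x) = \lim_{h \to 0} \left( -\dfrac{1}{x(x+h)} \right) = -\dfrac{1}{x^2}
Why: as h goes to 0, the term (x+h) becomes just x. This is well-defined as long as x \neq 0.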
Result: f'(x) = -\dfrac{1}{x^2}.
What does this say? The derivative is negative for every x \neq 0, meaning the tangent line to 1/x always slopes downward. That matches the picture: 1/x is decreasing on each side of zero. The magnitude \frac{1}{x^2} is large when x is close to 0 (the curve dives steeply near the origin) and tiny when x is large (the curve flattens out far from the origin). Both of those match what you would see if you looked at the graph.
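The same kind of exact check works for f(x) = 1/x. Before the limit, the quotient should equal -1/(x(x + h)), and the formula -1/x^2 should be large in size near the origin and tiny far from it. This is a sketch with arbitrary sample points, chosen to avoid x = 0 and any h with x + h = 0, where 1/x is undefined; the helper names are illustrative.

```python
from fractions import Fraction

def f(x):
    return 1 / x                   # with a Fraction argument, this stays an exact Fraction

def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

samples = [Fraction(-3), Fraction(-1, 2), Fraction(1, 5), Fraction(2), Fraction(10)]
for x in samples:
    for h in [Fraction(1, 100), Fraction(-1, 1000)]:
        # The algebra in the example: the quotient collapses to -1 / (x (x + h)).
        assert difference_quotient(f, x, h) == -1 / (x * (x + h))

def f_prime(x):
    return -1 / x ** 2

for x in [Fraction(1, 10), Fraction(1), Fraction(10)]:
    print(f"x = {x}: slope {f_prime(x)}")
```

The slopes printed are -100 at x = 1/10, -1 at x = 1 and -1/100 at x = 10: steep near the origin, nearly flat far away.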
Common confusions
A few things students reliably get wrong about derivatives the first time they meet them.
- "The derivative is the slope of the function." Almost: it is the slope of the tangent line at a single point. A function does not have one slope; in general it has a different slope at every point. The derivative records all of those slopes as a new function.
- "dy/dx is a fraction." It looks like one, Leibniz designed the notation to suggest a fraction, and you can sometimes get away with treating it like one. But formally, it is not. The d's are not numbers. The whole symbol dy/dx is one piece of notation for "the derivative of y with respect to x."
- "Continuous functions are always differentiable." False. Continuity means the graph has no jumps or holes. Differentiability at a point means the graph has a well-defined, non-vertical tangent line there. A graph with a sharp corner, like |x| at the origin, is continuous but has no tangent line at the corner. You will see exactly this case below.
- "Differentiable means smooth." It depends on what you mean by smooth. Differentiable once means the graph has tangent lines. For the slope itself to change smoothly, you need the derivative to be differentiable too. Higher-order derivatives are how you measure that.
Going deeper
If you're just here to understand what a derivative is, you have it — you can stop here. The rest of this article is for readers who want the rigorous version, the boundary cases, and the connection to the formal theory of limits.
When the definition fails
The definition says "when this limit exists." Sometimes the limit does not exist.
The classic example is f(x) = |x| at the point x = 0. The function is perfectly continuous there — no jumps, no holes — but it has a sharp corner.
Try to compute the derivative of |x| at x = 0 from the definition. You get
\dfrac{|0 + h| - |0|}{h} = \dfrac{|h|}{h}
If h is positive — the second point is to the right of zero — this expression is \frac{h}{h} = 1. If h is negative — the second point is to the left of zero — this expression is \frac{-h}{h} = -1. The two one-sided limits are different. There is no single number the expression approaches as h \to 0 from both sides. So the limit does not exist, and |x| has no derivative at x = 0.
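Numerically, the failure is easy to see. This sketch (the helper name is illustrative) evaluates the difference quotient of |x| at 0 for steps on each side; Python's built-in `abs` plays the role of |x|. Every right-hand quotient is exactly 1 and every left-hand one exactly -1, however small the step, so the two sides never agree.

```python
def difference_quotient(f, x, h):
    """Slope of the secant from x to x + h."""
    return (f(x + h) - f(x)) / h

steps = [10.0 ** -k for k in range(1, 7)]   # h = 0.1, 0.01, ..., 0.000001

right = [difference_quotient(abs, 0.0, h) for h in steps]    # second point right of 0
left = [difference_quotient(abs, 0.0, -h) for h in steps]    # second point left of 0

print("from the right:", right)
print("from the left: ", left)
```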
Geometrically: there is no single tangent line at the corner. You could draw a line of any slope between -1 and +1 through the corner without cutting the function, so the tangent isn't well-defined. The derivative captures something specific about the function, and at a corner, there is no specific thing to capture.
What the limit really means
The "limit" in the definition is not hand-waving. It has its own precise definition, the epsilon-delta definition, which turns "the secant slopes get arbitrarily close to one number" into a statement you can actually prove things from. You will meet it in the article on Limit. Everything in this article rests on that more rigorous foundation.
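For a taste of how the epsilon-delta definition gets used, here is the argument for f(x) = x^2, written out as a short derivation. The claimed limit of the difference quotient is 2x, and the statement to prove is: for every \varepsilon > 0 there is a \delta > 0 such that 0 < |h| < \delta forces the quotient to lie within \varepsilon of 2x.

```latex
% For f(x) = x^2 and any h \neq 0:
\left| \frac{(x+h)^2 - x^2}{h} - 2x \right|
  = \left| \frac{2xh + h^2}{h} - 2x \right|
  = \left| (2x + h) - 2x \right|
  = |h|.
% So, given any \varepsilon > 0, choose \delta = \varepsilon:
0 < |h| < \delta \implies \left| \frac{(x+h)^2 - x^2}{h} - 2x \right| = |h| < \varepsilon.
```

Here the choice \delta = \varepsilon works at every x at once, which is unusually convenient; for most functions \delta has to depend on both \varepsilon and x.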
Worth knowing: Newton and Leibniz worked out derivatives in the 1660s and 1670s and used them successfully for nearly two centuries before Weierstrass nailed down the formal definition of a limit in the mid-19th century. The intuition came first; the rigorous justification came later. That order is unusual in mathematics, and it tells you how stubbornly useful the idea of a derivative is.
Higher-order derivatives
Since f'(x) is itself a function, you can take its derivative. That second derivative, written f''(x), measures how f' is changing — how the slope itself changes. Geometrically, it tells you whether the curve is bending up or bending down. That is the subject of Concavity and Points of Inflection.
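A numerical sketch of the second derivative, assuming the standard central second-difference formula (f(x+h) - 2f(x) + f(x-h)) / h^2 as a stand-in for "differentiate twice" (a swapped-in technique, not something defined in this article). For the falling ball, s''(t) comes out as 9.8 at every t: the rate at which the velocity changes, which is the acceleration due to gravity. For x^3 it is positive to the right of 0 (curve bending up) and negative to the left (bending down). The helper names are illustrative.

```python
from fractions import Fraction

def second_difference(f, x, h):
    """Central second difference: approximates f''(x) for small h."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def s(t):
    return Fraction(49, 10) * t * t      # distance fallen, s(t) = 4.9 t^2

def cube(x):
    return x ** 3

h = Fraction(1, 1000)
for t in [Fraction(1), Fraction(2), Fraction(3)]:
    print(f"s''({t}) ~ {float(second_difference(s, t, h))}")

for x in [Fraction(-1), Fraction(1)]:
    print(f"(x^3)'' at x = {x} ~ {float(second_difference(cube, x, h))}")
```

For these two functions the formula happens to be exact (it returns 9.8 and 6x with no error), because its error terms involve the fourth and higher derivatives, which vanish for polynomials of degree at most 3.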
Where this leads next
You now know what a derivative is. The next set of articles shows you how to compute derivatives without going back to the definition every time, because expanding (x+h)^n for large n gets tedious quickly and there are shortcuts.
- Power Rule — the shortcut for differentiating x^n for any n, so you never have to expand (x+h)^n by hand again.
- Sum, Product, and Quotient Rules — how to differentiate combinations of functions built from simpler pieces.
- Chain Rule — how to differentiate a function inside another function.
- Tangent and Normal — using the derivative to find the equation of the tangent line at any point.
- Maxima and Minima — using the derivative to find the highest and lowest points of a curve. This is the application that makes derivatives indispensable in physics, economics, optimisation, and machine learning.