In short

The derivative of a function at a point is the instantaneous rate of change — the limit of the average rate of change over a shrinking interval. Geometrically, it is the slope of the tangent line to the graph. The definition is f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}, and the notation f'(x), \frac{dy}{dx}, and Df all refer to the same thing.

A train leaves Delhi at 8 AM and arrives in Agra at 10 AM, covering 200 km. Its average speed is 200 / 2 = 100 km/h. But if you looked at the speedometer at 9:15 AM, you would not see 100 — the train speeds up, slows down, stops at signals. The speedometer shows the speed at that instant, not averaged over the whole journey.

Average speed is simple: total distance divided by total time. Instantaneous speed is harder. At a single instant, no time passes and no distance is covered — you have 0/0, which is meaningless.

So how does the speedometer work? It does not compute 0/0. It computes the average speed over a tiny interval — say the last tenth of a second — and that average, over such a short window, is very close to the instantaneous speed. The shorter the window, the closer the average gets.

That idea — shrinking the averaging window until the average settles on a single number — is the entire idea behind the derivative.

Average rate of change

Start with any function f(x), not just distance-versus-time. Pick two points on its graph: the point at x = a and the point at x = b.

The average rate of change of f from a to b is

\frac{f(b) - f(a)}{b - a}

This is the ratio of how much the output changed to how much the input changed. Geometrically, it is the slope of the straight line connecting the two points (a, f(a)) and (b, f(b)) on the graph. That line is called a secant line.

The parabola $y = x^2$ with a secant line drawn from $(1, 1)$ to $(4, 16)$. The slope of this secant is $\frac{16 - 1}{4 - 1} = 5$, which is the average rate of change of $x^2$ from $x = 1$ to $x = 4$.

Take f(x) = x^2. The average rate of change from x = 1 to x = 4 is

\frac{f(4) - f(1)}{4 - 1} = \frac{16 - 1}{3} = 5

This tells you that, on average, the function x^2 increases by 5 units for every 1 unit increase in x, over the interval from 1 to 4. But the function is curved — it increases slowly near x = 1 and quickly near x = 4 — so the rate of change is not the same everywhere. The number 5 is only the average.
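The secant-slope formula translates directly into a few lines of Python. This is only a sketch: the names `average_rate_of_change` and `square` are chosen here for illustration and do not come from the article.

```python
def average_rate_of_change(f, a, b):
    """Slope of the secant line through (a, f(a)) and (b, f(b))."""
    if a == b:
        raise ValueError("a and b must differ; the interval has zero width")
    return (f(b) - f(a)) / (b - a)


def square(x):
    return x * x


# The interval from the text, then two shorter ones near each end of it.
print(average_rate_of_change(square, 1, 4))  # (16 - 1) / 3
print(average_rate_of_change(square, 1, 2))  # near x = 1
print(average_rate_of_change(square, 3, 4))  # near x = 4
```

The two shorter intervals give 3 and 7, which is one way to see that the average of 5 hides a slower start and a faster finish.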

Instantaneous rate of change

To get the rate of change at a single point — say at x = 1 — you need to shrink the interval. Instead of measuring from x = 1 to x = 4, measure from x = 1 to x = 1 + h, where h is a small positive number. The average rate of change over this tiny interval is

\frac{f(1 + h) - f(1)}{h} = \frac{(1 + h)^2 - 1}{h} = \frac{1 + 2h + h^2 - 1}{h} = \frac{2h + h^2}{h} = 2 + h

Now make h smaller:

| h | Average rate of change = 2 + h |
| --- | --- |
| 1 | 3 |
| 0.5 | 2.5 |
| 0.1 | 2.1 |
| 0.01 | 2.01 |
| 0.001 | 2.001 |

The averages are heading toward 2. They never reach 2 for any non-zero h, but they get as close as you like. The number they are approaching — 2 — is the instantaneous rate of change of f(x) = x^2 at x = 1.
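The table can be reproduced by evaluating the difference quotient directly. The sketch below uses Python's `fractions.Fraction` so the arithmetic is exact and the values match 2 + h with no rounding. The function name `difference_quotient_at_1` is an illustrative choice, not something from the article.

```python
from fractions import Fraction


def difference_quotient_at_1(h):
    """Average rate of change of x**2 over the interval from 1 to 1 + h."""
    return ((1 + h) ** 2 - 1) / h


# The same step sizes as in the table, held as exact fractions.
steps = [Fraction(s) for s in ("1", "0.5", "0.1", "0.01", "0.001")]

for h in steps:
    quotient = difference_quotient_at_1(h)
    assert quotient == 2 + h  # agrees with the algebra in the text
    print(float(h), float(quotient))
```

Calling the function with h = 0 raises `ZeroDivisionError`, the program's version of the meaningless 0/0. The value 2 is approached, never computed by substitution.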

Geometrically, each row of the table corresponds to a secant line from (1, 1) to (1+h, (1+h)^2). As h shrinks, the second point slides toward the first, and the secant pivots toward a line that just touches the curve at (1, 1) without cutting through it. That limiting line is the tangent line, and its slope — 2 — is the instantaneous rate of change.

Secant lines (dashed) from $(1, 1)$ to nearby points on the parabola $y = x^2$. As the second point approaches $(1, 1)$, the secant lines rotate toward the tangent line (solid red), whose slope is $2$. The tangent line is the limit of the secants.

From one point to any point

The computation you just did at x = 1 works at any point. At x = 3, the average rate of change of f(x) = x^2 over the interval from 3 to 3 + h is

\frac{(3 + h)^2 - 9}{h} = \frac{9 + 6h + h^2 - 9}{h} = 6 + h

As h \to 0, this approaches 6. So the instantaneous rate of change of x^2 at x = 3 is 6.

At a general point x:

\frac{(x + h)^2 - x^2}{h} = \frac{x^2 + 2xh + h^2 - x^2}{h} = 2x + h

As h \to 0, this approaches 2x. So the instantaneous rate of change of x^2 at any point x is 2x. At x = 1 you get 2; at x = 3 you get 6; at x = 0 you get 0 — the parabola is flat at its vertex.

This is the big leap. You started with a specific function (x^2) and a specific question (how fast is it changing at x = 1?), and you ended up with a new function (2x) that answers the question at every point simultaneously. That new function — the one that gives the instantaneous rate of change everywhere — has a name: the derivative.
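The step from "a slope at one point" to "a function of slopes" can be mimicked in code with a function that takes f and returns a new function. This is a rough numerical sketch, not the exact limit: it fixes a small step `h` (the default 1e-6 is an arbitrary assumption), and the names `approximate_derivative` and `slope_of_square` are hypothetical.

```python
def approximate_derivative(f, h=1e-6):
    """Return a new function that estimates the slope of f at each x.

    It uses the average rate of change over the interval from x to x + h
    with a small fixed h, so the result is an approximation of the limit.
    """
    def slope_at(x):
        return (f(x + h) - f(x)) / h
    return slope_at


def square(x):
    return x * x


slope_of_square = approximate_derivative(square)

# Compare the estimate with the exact answer 2x at a few points.
for x in (0.0, 1.0, 2.0, 3.0):
    print(x, round(slope_of_square(x), 4), 2 * x)
```

The estimates differ from 2x by about h, as the formula 2x + h predicts.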

The black curve is $f(x) = x^2$ and the dashed red line is its derivative $f'(x) = 2x$. At each marked point, the derivative value ($2x$) tells you the slope of the tangent to the parabola. At $x = 0$, the parabola is flat (slope $0$). At $x = 2$, the slope is $4$. The derivative function is a complete record of the original function's steepness at every point.

The definition of the derivative

You can now write down the precise version. For any function f and any point x, the average rate of change over the interval from x to x + h is

\frac{f(x + h) - f(x)}{h}

This expression is called the difference quotient. It depends on both x and h. Take the limit as h \to 0, and you get the instantaneous rate of change at x — if the limit exists.

Definition of the derivative

The derivative of a function f at a point x is

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

when this limit exists.

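Read each piece: the numerator f(x+h) - f(x) is the change in the output, the denominator h is the change in the input, their ratio is the average rate of change over the interval from x to x + h, and the limit as h \to 0 turns that average into the rate of change at the single point x.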

The derivative at x is just a number. But since each point x gives its own number, the derivative across all points is itself a function: f' is a new function built from f.

An equivalent form: the "two-point" version

Sometimes it is cleaner to write the definition using two points x and a instead of a point and a step:

f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}

This is the same definition — just set x = a + h and h \to 0 becomes x \to a. The fraction \frac{f(x) - f(a)}{x - a} is the slope of the secant from (a, f(a)) to (x, f(x)), and the limit as x \to a is the slope of the tangent at a. Both forms appear in textbooks; use whichever is more convenient.
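A quick way to convince yourself that the two forms agree is to evaluate both with the substitution x = a + h. The sketch below does this for f(x) = x^2 at a = 3 with exact fractions. The function names `step_form` and `two_point_form` are illustrative labels, not standard terminology.

```python
from fractions import Fraction


def square(x):
    return x * x


def step_form(f, a, h):
    """Difference quotient written with a point a and a step h."""
    return (f(a + h) - f(a)) / h


def two_point_form(f, a, x):
    """Difference quotient written with two points a and x."""
    return (f(x) - f(a)) / (x - a)


a = Fraction(3)
for h in (Fraction(1, 10), Fraction(1, 1000), Fraction(-1, 1000)):
    x = a + h  # the substitution that links the two forms
    assert step_form(square, a, h) == two_point_form(square, a, x)
    print(float(h), float(step_form(square, a, h)))
```

The negative step is included on purpose: the limit is taken from both sides, and both forms handle a point to the left of a without any change.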

Notation for the derivative

There are several notations for the same thing, and you will see all of them.

Lagrange's notation: f'(x). Read "f prime of x." Clean and compact. When you have a specific function named f, this is the most common choice. The second derivative is f''(x), the third is f'''(x).

Leibniz's notation: \dfrac{dy}{dx}. Read "dee y dee x." If y = f(x), then \dfrac{dy}{dx} is the derivative of y with respect to x. This notation suggests a fraction — output change over input change — and that suggestion is deliberate. Leibniz designed it to make the chain rule and substitution rules look like fraction algebra. But dy and dx are not separate numbers; \dfrac{dy}{dx} is a single symbol.

You will also see \dfrac{d}{dx}f(x), which means "differentiate f(x) with respect to x" — the \dfrac{d}{dx} acts as an operator applied to f(x).

Newton's dot notation: \dot{y}. Used primarily in physics, where the variable is usually time. \dot{y} means \dfrac{dy}{dt}. You will encounter it in mechanics and differential equations, rarely in pure mathematics.

Operator notation: Df. Treats differentiation as a function that takes in f and produces f'. The symbol D is the differentiation operator. Some Indian textbooks use this.

All four notations refer to the same mathematical object. The choice is a matter of context and convenience. In this article, f'(x) and \dfrac{dy}{dx} will appear most often.

Four notations, one concept. $f'(x)$ and $dy/dx$ are the most common in Indian textbooks. Newton's $\dot{y}$ appears in physics. The operator $Df$ is occasional in analysis.

Computing derivatives from the definition

The definition is not just a theoretical formula — it is a recipe. Feed in a function, carry out the algebra, take the limit, and out comes the derivative. Time to do this twice.

Example 1: Derivative of $f(x) = x^3$

Step 1. Compute f(x + h).

f(x + h) = (x + h)^3 = x^3 + 3x^2 h + 3xh^2 + h^3

Why: you need f evaluated a step h to the right. Expand using the binomial formula (or just multiply out).

Step 2. Compute f(x + h) - f(x).

f(x + h) - f(x) = (x^3 + 3x^2 h + 3xh^2 + h^3) - x^3 = 3x^2 h + 3xh^2 + h^3

Why: the x^3 terms cancel. Everything that remains has at least one factor of h — this is the key algebraic feature.

Step 3. Divide by h.

\frac{f(x+h) - f(x)}{h} = \frac{3x^2 h + 3xh^2 + h^3}{h} = 3x^2 + 3xh + h^2

Why: h \neq 0 at this stage (you haven't taken the limit yet), so dividing by h is valid. Each term in the numerator gives up one factor of h.

Step 4. Take the limit as h \to 0.

f'(x) = \lim_{h \to 0} (3x^2 + 3xh + h^2) = 3x^2

Why: the terms 3xh and h^2 both go to 0 as h \to 0. Only 3x^2 survives — it does not depend on h.

Result: f'(x) = 3x^2.

The black curve is $f(x) = x^3$ and the dashed red curve is its derivative $f'(x) = 3x^2$. At $x = 1$, the cubic has slope $3(1)^2 = 3$ — the derivative curve reads $3$ there. At $x = 0$, the cubic is flat (horizontal tangent), and the derivative reads $0$. The derivative curve records the slope of the original curve at every point.

Check a few values. At x = 1, the slope of x^3 should be 3(1)^2 = 3. At x = -1, it should be 3(-1)^2 = 3: the same steepness and the same sign, because the graph at x = -1 is rising from left to right, not falling. The slope is never negative because x^3 is increasing everywhere. At x = 0, the slope is 0: the curve passes through the origin with a flat tangent, which matches the picture of the cubic's inflection point at the origin.
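The algebra in Steps 1 to 3 can be double-checked by computing the difference quotient two ways and confirming they match. The sketch below uses exact fractions so that equality is a meaningful test. The function names are illustrative choices.

```python
from fractions import Fraction


def cube(x):
    return x ** 3


def quotient_from_definition(x, h):
    """Steps 1 to 3 done numerically: (f(x + h) - f(x)) / h."""
    return (cube(x + h) - cube(x)) / h


def quotient_simplified(x, h):
    """The Step 3 result: 3x^2 + 3xh + h^2."""
    return 3 * x**2 + 3 * x * h + h**2


# The two expressions agree for every non-zero h tried, positive or negative.
for x in (Fraction(-1), Fraction(0), Fraction(1), Fraction(5, 2)):
    for h in (Fraction(1, 10), Fraction(-1, 100)):
        assert quotient_from_definition(x, h) == quotient_simplified(x, h)

# With a tiny h, the simplified quotient is close to 3x^2 at x = 1 and x = -1.
h = Fraction(1, 10**9)
print(float(quotient_simplified(Fraction(1), h)))
print(float(quotient_simplified(Fraction(-1), h)))
```

Only the simplified form can be evaluated at h = 0, where it returns 3x^2. That is the practical meaning of Step 4.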

Example 2: Derivative of $f(x) = \sqrt{x}$

This example uses a different algebraic trick — rationalising — because the square root does not expand like a polynomial.

Step 1. Write the difference quotient.

\frac{f(x+h) - f(x)}{h} = \frac{\sqrt{x+h} - \sqrt{x}}{h}

Why: straight substitution into the definition. The difficulty is that the numerator has a difference of square roots, which does not simplify by expansion.

Step 2. Rationalise the numerator by multiplying top and bottom by \sqrt{x+h} + \sqrt{x}.

\frac{\sqrt{x+h} - \sqrt{x}}{h} \cdot \frac{\sqrt{x+h} + \sqrt{x}}{\sqrt{x+h} + \sqrt{x}} = \frac{(x+h) - x}{h(\sqrt{x+h} + \sqrt{x})} = \frac{h}{h(\sqrt{x+h} + \sqrt{x})}

Why: the numerator becomes (\sqrt{x+h})^2 - (\sqrt{x})^2 = (x+h) - x = h. The difference-of-squares identity (a-b)(a+b) = a^2 - b^2 is doing the work.

Step 3. Cancel h.

\frac{h}{h(\sqrt{x+h} + \sqrt{x})} = \frac{1}{\sqrt{x+h} + \sqrt{x}}

Why: again, h \neq 0 at this stage, so the cancellation is valid.

Step 4. Take the limit as h \to 0.

f'(x) = \lim_{h \to 0} \frac{1}{\sqrt{x+h} + \sqrt{x}} = \frac{1}{\sqrt{x} + \sqrt{x}} = \frac{1}{2\sqrt{x}}

Why: as h \to 0, \sqrt{x + h} \to \sqrt{x}. The denominator becomes 2\sqrt{x}. This is valid for x > 0.

Result: f'(x) = \dfrac{1}{2\sqrt{x}}, for x > 0.

The black curve is $f(x) = \sqrt{x}$ and the dashed red curve is its derivative $f'(x) = \frac{1}{2\sqrt{x}}$. At $x = 1$, the slope is $\frac{1}{2}$. At $x = 4$, the slope is $\frac{1}{4}$ — the curve is flattening out. As $x \to 0^+$, the derivative blows up: the square root curve is nearly vertical near the origin.

Two things to notice. First, the derivative \frac{1}{2\sqrt{x}} is not defined at x = 0. The function \sqrt{x} has a vertical tangent at the origin — the curve goes straight up — and vertical lines have undefined slope. Second, the derivative is always positive for x > 0, which matches the picture: \sqrt{x} is always increasing. And the derivative decreases as x grows — the curve gets flatter and flatter — which you can see from both the formula and the graph.
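As a numerical sanity check on the rationalising step, the raw quotient from Step 1 and the rationalised quotient from Step 3 can be compared in floating point. This is a sketch with illustrative function names. The agreement is only up to rounding, so it is tested with a tolerance, not exact equality.

```python
import math


def raw_quotient(x, h):
    """Difference quotient of sqrt exactly as written in Step 1."""
    return (math.sqrt(x + h) - math.sqrt(x)) / h


def rationalised_quotient(x, h):
    """The same quotient after Steps 2 and 3: 1 / (sqrt(x + h) + sqrt(x))."""
    return 1.0 / (math.sqrt(x + h) + math.sqrt(x))


def derivative_of_sqrt(x):
    """The limit from Step 4, valid for x > 0."""
    return 1.0 / (2.0 * math.sqrt(x))


for x in (1.0, 4.0):
    for h in (0.1, 0.001):
        raw = raw_quotient(x, h)
        rationalised = rationalised_quotient(x, h)
        assert abs(raw - rationalised) < 1e-9
        print(x, h, rationalised, derivative_of_sqrt(x))
```

A side benefit, noted here as an aside: the rationalised form avoids subtracting two nearly equal square roots, so it tends to behave better than the raw form when h is extremely small.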

The pattern emerging

Both examples followed the same four-step recipe:

  1. Compute f(x + h).
  2. Subtract f(x) to get the numerator.
  3. Divide by h and simplify (cancel the h that is making the expression 0/0).
  4. Take h \to 0.

Step 3 is where the algebra happens. In each of these examples the numerator f(x + h) - f(x) turns out to contain a factor of h, which is what you would hope for, since the numerator itself goes to 0 as h \to 0. Dividing by h removes that factor and leaves an expression whose limit can be read off by setting h = 0.

For x^2, the trick was expanding and cancelling. For x^3, the same. For \sqrt{x}, the trick was rationalising. For 1/x, you would use common denominators. Each function type has its own algebraic move, but the structure is always the same.
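The article only mentions that 1/x is handled with common denominators. As a sketch of how that would go, and it is a filling-in and not part of the original text, combining 1/(x+h) - 1/x over the denominator x(x+h) gives -h / (x(x+h)), and dividing by h leaves -1 / (x(x+h)). The code below checks that simplification with exact fractions. The function names are illustrative.

```python
from fractions import Fraction


def reciprocal(x):
    return 1 / x


def quotient_from_definition(x, h):
    """Steps 1 and 2, then division by h: (1/(x + h) - 1/x) / h."""
    return (reciprocal(x + h) - reciprocal(x)) / h


def quotient_simplified(x, h):
    """After combining over the common denominator x(x + h) and cancelling h."""
    return -1 / (x * (x + h))


# Both forms agree for every non-zero h tried, as long as x and x + h are non-zero.
for x in (Fraction(1), Fraction(2), Fraction(-3, 2)):
    for h in (Fraction(1, 10), Fraction(-1, 100)):
        assert quotient_from_definition(x, h) == quotient_simplified(x, h)

# Step 4: as h shrinks, the simplified quotient heads toward -1 / x**2.
x = Fraction(2)
for h in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    print(float(h), float(quotient_simplified(x, h)))
```

At x = 2 the printed values settle toward -0.25, which is -1/x^2 there.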

This process of computing derivatives from the limit definition is called differentiation from first principles (or "ab initio" in some textbooks). It works whenever the derivative exists, but it gets tedious for complicated functions. The whole point of the differentiation rules (power rule, product rule, chain rule) is to avoid repeating this process for every new function. Those shortcuts are the subject of the next several articles.

Common confusions

Going deeper

If you came here to understand what the derivative is and how to compute it from the definition, you have it — you can stop here. The rest of this article is for readers who want the rigorous details and a broader perspective.

Why the h-form and the two-point form are the same

Set x = a + h in the two-point form. Then x - a = h and x \to a becomes h \to 0:

\lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}

The two forms are related by a change of variable. In practice, the h-form is usually easier for computation (because you expand f(a + h) and simplify), while the two-point form is sometimes cleaner for proofs (because it is symmetric in x and a).

The derivative as a linear approximation

The derivative does more than give a slope — it gives the best linear approximation to f near x = a. The tangent line at a is

y = f(a) + f'(a)(x - a)

For x close to a, f(x) \approx f(a) + f'(a)(x - a). For smooth functions like the ones in this article, the error in this approximation is roughly proportional to (x - a)^2, so it vanishes faster than the linear term f'(a)(x - a). This is why the tangent line hugs the curve so closely near the point of tangency.

For example, \sqrt{x} near x = 4: the derivative is f'(4) = \frac{1}{2\sqrt{4}} = \frac{1}{4}, so

\sqrt{x} \approx 2 + \frac{1}{4}(x - 4) = 1 + \frac{x}{4}

Try x = 4.1: the approximation gives 2.025, while \sqrt{4.1} = 2.02485\ldots The error is about 0.00015 — remarkably small for such a simple formula.
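The same check can be run at a few more points to see how the error behaves as x moves away from 4. This is a minimal sketch: `tangent_line_approx` and `sqrt_prime` are hypothetical helper names, and the derivative formula is the one derived earlier in the article.

```python
import math


def tangent_line_approx(f, f_prime, a):
    """Return the function x -> f(a) + f'(a) * (x - a)."""
    value_at_a = f(a)
    slope_at_a = f_prime(a)

    def approx(x):
        return value_at_a + slope_at_a * (x - a)
    return approx


def sqrt_prime(x):
    """Derivative of sqrt from first principles, valid for x > 0."""
    return 1.0 / (2.0 * math.sqrt(x))


approx_sqrt = tangent_line_approx(math.sqrt, sqrt_prime, 4.0)

for x in (4.1, 4.5, 5.0):
    estimate = approx_sqrt(x)
    error = estimate - math.sqrt(x)
    print(x, estimate, math.sqrt(x), error)
```

The error grows as x moves from 4.1 to 5.0, and it grows roughly like the square of the distance from 4, not in proportion to the distance itself.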

Derivative of a constant function

If f(x) = c for all x, then

\frac{f(x+h) - f(x)}{h} = \frac{c - c}{h} = 0

for all h \neq 0. The limit is 0. So f'(x) = 0 everywhere: a constant function has zero rate of change, which is geometrically obvious since its graph is a horizontal line.

Derivative of f(x) = x

\frac{(x+h) - x}{h} = \frac{h}{h} = 1

for all h \neq 0. So f'(x) = 1 everywhere. The function f(x) = x is a straight line with slope 1, and its derivative confirms that: the slope of a straight line is the same at every point.

The emerging pattern

Lay out what you have:

| Function f(x) | Derivative f'(x) |
| --- | --- |
| c (constant) | 0 |
| x | 1 |
| x^2 | 2x |
| x^3 | 3x^2 |
| x^{1/2} | \frac{1}{2}x^{-1/2} |

Look at the pattern. Each derivative brings the exponent down as a coefficient and reduces the exponent by one. In every case, the derivative of x^n is nx^{n-1}. Check: for n = 3, you get 3x^2. For n = 1/2, you get \frac{1}{2}x^{-1/2} = \frac{1}{2\sqrt{x}}. For n = 1, you get 1 \cdot x^0 = 1. Even for n = 0 (the constant x^0 = 1), you get 0 \cdot x^{-1} = 0.

This pattern holds for every real number n (positive, negative, integer, fraction, even irrational) at every point where both x^n and nx^{n-1} are defined. Proving it in full generality is the content of the power rule, which is the subject of the next article in this sequence.
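Before seeing a proof, the pattern can be tested numerically. The sketch below compares a difference quotient with a small fixed step against nx^{n-1} for several exponents at x = 2. It is evidence, not a proof: the step size 1e-6, the tolerance, and the helper names are all arbitrary choices made here.

```python
import math


def forward_quotient(f, x, h=1e-6):
    """Difference quotient (f(x + h) - f(x)) / h with a small fixed h."""
    return (f(x + h) - f(x)) / h


def power_rule_gap(n, x, h=1e-6):
    """Distance between the numerical slope of t**n at x and n * x**(n - 1)."""
    estimate = forward_quotient(lambda t: t ** n, x, h)
    predicted = n * x ** (n - 1)
    return abs(estimate - predicted)


x = 2.0
for n in (0, 1, 2, 3, 0.5, -1, math.pi):
    gap = power_rule_gap(n, x)
    assert gap < 1e-4  # small, but not zero, because h is not zero
    print(n, n * x ** (n - 1), gap)
```

The exponents include a negative one and an irrational one (`math.pi`) to probe the claim beyond the cases worked out by hand.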

Historical note on notation

The f'(x) notation (the "prime" mark) was introduced by Joseph-Louis Lagrange in the 18th century. The \frac{dy}{dx} notation was introduced by Leibniz a century earlier. Both notations survived because each has advantages. Lagrange's is compact: f'(3) immediately means "the derivative of f at 3." Leibniz's is structural: \frac{dy}{dx} shows you which variable you are differentiating with respect to, which matters when a function depends on more than one variable.

Indian textbooks tend to use \frac{dy}{dx} most often, especially in applied problems (physics, rate-of-change word problems). In proofs and abstract statements, f'(x) is more common. Being comfortable with both is essential — exam questions can use either without warning.

Where this leads next

You now have the definition and the first-principles technique. The next set of articles gives you the tools to compute derivatives without going back to the definition every time.