In short
Most integrals you meet in the real world have no closed form. To compute them, you approximate the area by slicing the interval into n equal pieces and replacing the function on each piece with something easy to integrate: a straight line (the trapezoidal rule) or a parabola (Simpson's rule). Both approximations get better as n grows — the trapezoidal error shrinks like \frac{1}{n^2}, Simpson's like \frac{1}{n^4}. Simpson's is almost always the right choice.
Compute \int_0^1 e^{-x^2}\,dx.
This integral is famous for a reason: it has no elementary antiderivative. You cannot write a finite combination of polynomials, exponentials, logarithms, and trig functions whose derivative is e^{-x^2}. The integrand is smooth, the interval is short, the fundamental theorem applies — and still, there is no closed form to evaluate. You cannot plug the endpoints into any expression you can write down.
But the area under the curve still exists. It is a real number, just under 0.75, and you can compute it to as many decimal places as you want — you just cannot do it using the fundamental theorem. You have to do it numerically, by approximating the area geometrically.
This is where numerical integration lives. It is the machinery for computing definite integrals when antiderivatives are not available, and it is what every scientific calculator, every piece of physics simulation software, and every finance model actually does under the hood when you ask it for an integral. Newton's closed-form methods are beautiful, but they are the minority — most integrals that arise in practice are numerical.
The idea: approximate the curve, not the function
The fundamental theorem says: to compute an area under f, find an antiderivative of f and evaluate at the endpoints. Numerical integration says: forget the antiderivative. The area under f is still an area — and if you can approximate f by something simpler whose area you can compute directly, you get an approximate answer.
There are two obvious candidates for "simpler":
- Straight-line approximation. Over each little subinterval, replace the curve with the straight line joining its two endpoints. The area under a straight line is a trapezoid, and trapezoid areas are easy. This is the trapezoidal rule.
- Parabolic approximation. Over each pair of subintervals, replace the curve with the parabola passing through three consecutive points. The area under a parabola is slightly less easy, but still a closed formula. This is Simpson's rule.
The trapezoidal rule uses a degree-1 polynomial (a line). Simpson's rule uses a degree-2 polynomial (a parabola). You might guess that degree-3, degree-4, ... rules exist too, and they do — they are called Newton–Cotes formulas — but the returns start diminishing, and Simpson's rule is the sweet spot for most problems.
Here is the trapezoidal picture. Slice [a, b] into n equal subintervals of width h = \frac{b - a}{n}. The endpoints of the subintervals are x_0 = a, x_1 = a + h, x_2 = a + 2h, \ldots, x_n = b. On the i-th subinterval [x_{i-1}, x_i], approximate the area under the curve by the area of the trapezoid with vertices (x_{i-1}, 0), (x_i, 0), (x_i, f(x_i)), (x_{i-1}, f(x_{i-1})). That area is \frac{h}{2}[f(x_{i-1}) + f(x_i)].
Sum over all n subintervals:

\sum_{i=1}^{n} \frac{h}{2}\left[f(x_{i-1}) + f(x_i)\right] = \frac{h}{2}\left[f(x_0) + 2f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-1}) + f(x_n)\right].
Every interior point x_1, x_2, \ldots, x_{n-1} is shared between two trapezoids — once as the right vertex of one and once as the left vertex of the next — so it appears twice. The two boundary points x_0 and x_n appear only once.
The trapezoidal rule
For f continuous on [a, b], partition the interval into n equal subintervals of width h = \frac{b - a}{n}, and let x_i = a + ih. The trapezoidal approximation is

T_n = \frac{h}{2}\left[f(x_0) + 2f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-1}) + f(x_n)\right].

As n \to \infty, T_n \to \int_a^b f(x)\,dx. If f'' is continuous, the error satisfies

\left|T_n - \int_a^b f(x)\,dx\right| \;\leq\; \frac{(b-a)^3}{12\,n^2}\,\max_{[a,b]}|f''|.
The error bound is the useful part. It tells you two things: first, the approximation gets better as n grows, with the error shrinking like \frac{1}{n^2}. If you double n, the error drops by a factor of 4. Second, the error depends on |f''|. A function whose second derivative is large — meaning the curve bends sharply — is harder to approximate with straight lines, and the bound is bigger. A linear function has f'' = 0, and the trapezoidal rule is exact for it.
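The rule is a few lines of code. Here is a minimal sketch (the function name `trapezoid` is mine, not a library call) that also checks the \frac{1}{n^2} claim: doubling n on a smooth integrand should cut the error by about a factor of 4.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))      # boundary points count once
    for i in range(1, n):
        total += f(a + i * h)        # interior points count twice
    return h * total                 # equals (h/2) * [f0 + 2f1 + ... + 2f_{n-1} + fn]

# Doubling n should shrink the error by roughly a factor of 4.
exact = 1 - math.cos(1)              # integral of sin(x) on [0, 1]
err_16 = abs(trapezoid(math.sin, 0, 1, 16) - exact)
err_32 = abs(trapezoid(math.sin, 0, 1, 32) - exact)
print(err_16 / err_32)               # close to 4
```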
The first worked example
Let me run the trapezoidal rule on an integral you can check.
Example 1: $\int_0^1 x^2\,dx$ with $n = 4$
Step 1. Set up. With a = 0, b = 1, n = 4, the step size is h = \frac{1 - 0}{4} = 0.25. The nodes are x_0 = 0, x_1 = 0.25, x_2 = 0.5, x_3 = 0.75, x_4 = 1.
Why: you need the x-values where you will evaluate the function. With n = 4 subintervals, you have 5 nodes.
Step 2. Evaluate f(x) = x^2 at each node: f(x_0) = 0, f(x_1) = 0.0625, f(x_2) = 0.25, f(x_3) = 0.5625, f(x_4) = 1.
Why: each node gives you the height of the curve at that point — the top of each trapezoid.
Step 3. Apply the trapezoidal formula:

T_4 = \frac{0.25}{2}\left[0 + 2(0.0625) + 2(0.25) + 2(0.5625) + 1\right] = 0.125 \times 2.75 = 0.34375.
Why: the weights are 1, 2, 2, 2, 1 — boundary points once, interior points twice. Scale by h/2 at the end.
Step 4. Compare with the exact answer.
The approximation 0.34375 is a bit too large — the true value is \frac{1}{3}, and T_4 = \frac{11}{32}. The error is \frac{11}{32} - \frac{1}{3} = \frac{33 - 32}{96} = \frac{1}{96} \approx 0.0104.
Step 5. Compare with the error bound. For f(x) = x^2, f''(x) = 2, so \max|f''| = 2. The bound is

\frac{(1-0)^3}{12 \cdot 4^2} \cdot 2 = \frac{2}{192} = \frac{1}{96}.
The actual error matches the bound exactly (which is unusual — it happens because f'' is a constant). In most problems the actual error is smaller than the bound.
Result: T_4 = 0.34375, which approximates \int_0^1 x^2\,dx = \frac{1}{3} with error \frac{1}{96}.
Notice that the approximation overshoots: every chord lies above the curve between its two endpoints, because the curve is concave up. In general, for a concave-up function the trapezoidal rule is always an over-estimate, and for a concave-down function it is always an under-estimate. This explains the sign of the error: you can read off whether the approximation is high or low just from the sign of the second derivative.
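The whole example fits in a short script. A sketch reproducing Steps 1 through 4 (all names are illustrative):

```python
# Reproduce Example 1: trapezoidal rule for x^2 on [0, 1] with n = 4.
h = 0.25
nodes = [0, 0.25, 0.5, 0.75, 1]
values = [x**2 for x in nodes]                    # 0, 0.0625, 0.25, 0.5625, 1
weights = [1, 2, 2, 2, 1]                         # ends once, interior points twice
T4 = h / 2 * sum(w * y for w, y in zip(weights, values))
print(T4)                                         # 0.34375
print(T4 - 1/3)                                   # the error, 1/96 ≈ 0.0104
```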
Simpson's rule: parabolas beat lines
The trapezoidal rule uses straight lines. But straight lines are a pretty crude approximation to a general curve. What if you fit a parabola through three consecutive points instead?
A parabola is determined by three points, so you can do it over pairs of subintervals. That is Simpson's rule.
Start by partitioning [a, b] into n subintervals, where n must be even, of width h = \frac{b-a}{n}. Group the subintervals into pairs: [x_0, x_2], [x_2, x_4], \ldots, [x_{n-2}, x_n]. Over each pair [x_{2k}, x_{2k+2}], fit a parabola through the three points (x_{2k}, f(x_{2k})), (x_{2k+1}, f(x_{2k+1})), (x_{2k+2}, f(x_{2k+2})), and integrate that parabola exactly.
A little algebra (done in the going-deeper section) gives the area under the parabola as \frac{h}{3}[f(x_0) + 4f(x_1) + f(x_2)]. Summing over all the pairs and consolidating the shared boundary points gives the main formula:
Simpson's rule
For f continuous on [a, b], partition the interval into n (even) equal subintervals of width h = \frac{b - a}{n}, and let x_i = a + ih. Simpson's approximation is

S_n = \frac{h}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)\right],

where interior points alternate weights 4, 2, 4, 2, \ldots: odd-indexed points get 4, even-indexed interior points get 2, and the boundary points x_0 and x_n get 1.

If f^{(4)} is continuous, the error satisfies

\left|S_n - \int_a^b f(x)\,dx\right| \;\leq\; \frac{(b-a)^5}{180\,n^4}\,\max_{[a,b]}|f^{(4)}|.
The error shrinks like \frac{1}{n^4}, not \frac{1}{n^2} — so doubling n makes the error 16 times smaller, not 4 times smaller. This is why Simpson's rule is almost always preferred over trapezoidal: for the same amount of work, you get much more accuracy.
And there is a second bonus. The error bound depends on |f^{(4)}|, the fourth derivative of f. For polynomials of degree \leq 3, f^{(4)} = 0, so Simpson's rule is exact for cubics. Despite using parabolas, it reproduces cubics perfectly because the cubic-correction terms happen to average out over each pair of subintervals. This is a small mathematical miracle and one of the reasons Simpson's rule is so good.
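A minimal composite-Simpson sketch (the name `simpson` is mine, not a library call), with a check of the exact-for-cubics claim:

```python
# Composite Simpson's rule; n must be even.
def simpson(f, a, b, n):
    if n % 2:
        raise ValueError("Simpson's rule needs an even n")
    h = (b - a) / n
    total = f(a) + f(b)                               # boundary weight 1
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)   # 4 on odd nodes, 2 on even
    return h / 3 * total

# Exact for cubics: the integral of x^3 on [0, 2] is 4, and n = 2 already gives it.
print(simpson(lambda x: x**3, 0, 2, 2))               # 4.0 up to rounding
```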
The second worked example
Run Simpson's rule on the same problem.
Example 2: $\int_0^1 e^{-x^2}\,dx$ using Simpson's rule with $n = 4$
Step 1. Set up. a = 0, b = 1, n = 4, h = 0.25. Nodes: x_0 = 0, x_1 = 0.25, x_2 = 0.5, x_3 = 0.75, x_4 = 1.
Why: same setup as the trapezoidal example. Simpson's needs n to be even, and n = 4 qualifies.
Step 2. Evaluate f(x) = e^{-x^2} at each node: f(x_0) = 1, f(x_1) \approx 0.93941, f(x_2) \approx 0.77880, f(x_3) \approx 0.56978, f(x_4) \approx 0.36788.
Why: unlike the previous example, this function has no elementary antiderivative — you cannot check the answer by the fundamental theorem. The numerical evaluation at nodes is all you have.
Step 3. Apply Simpson's formula with weights 1, 4, 2, 4, 1:

S_4 = \frac{0.25}{3}\left[1 + 4(0.93941) + 2(0.77880) + 4(0.56978) + 0.36788\right] \approx \frac{0.25}{3}(8.96224) \approx 0.74685.
Why: the pattern of weights is easy to remember — ends get 1, and interior points alternate 4, 2, 4, 2, \ldots starting and ending on 4. The odd-indexed nodes (the "parabola midpoints") get the 4's.
Step 4. Compare with the "true" value. The best accepted value of this integral is \int_0^1 e^{-x^2}\,dx \approx 0.746824. So Simpson's rule with just n = 4 gives about four correct digits, which is remarkable for so little work.
Result: S_4 \approx 0.74685, approximating \int_0^1 e^{-x^2}\,dx \approx 0.74682 with error about 3 \times 10^{-5}.
Compare trapezoidal and Simpson's on the same integral. With n = 4, the trapezoidal rule gives T_4 \approx 0.74298 (off by about 0.0038), while Simpson's gives S_4 \approx 0.74685 (off by about 0.00003). Simpson's is roughly 120 times more accurate for the same number of function evaluations.
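This comparison is easy to reproduce. A sketch using the same five nodes (the reference value below is the accepted value of the integral):

```python
import math

f = lambda x: math.exp(-x * x)
h = 0.25
ys = [f(i * h) for i in range(5)]         # f at the nodes 0, 0.25, 0.5, 0.75, 1

T4 = h / 2 * (ys[0] + 2 * ys[1] + 2 * ys[2] + 2 * ys[3] + ys[4])
S4 = h / 3 * (ys[0] + 4 * ys[1] + 2 * ys[2] + 4 * ys[3] + ys[4])

ref = 0.7468241328                         # reference value of the integral
print(T4, abs(T4 - ref))                   # ≈ 0.74298, error ≈ 0.0038
print(S4, abs(S4 - ref))                   # ≈ 0.74686, error ≈ 3e-5
print(abs(T4 - ref) / abs(S4 - ref))       # roughly 120
```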
How to choose n
Given a target accuracy \varepsilon, how many subintervals do you need?
For the trapezoidal rule. Solve \frac{(b-a)^3}{12 n^2}M_2 \leq \varepsilon for n, where M_2 = \max|f''|:

n \;\geq\; \sqrt{\frac{(b-a)^3 M_2}{12\,\varepsilon}}.
For Simpson's rule. Solve \frac{(b-a)^5}{180 n^4}M_4 \leq \varepsilon for n, where M_4 = \max|f^{(4)}|:

n \;\geq\; \left(\frac{(b-a)^5 M_4}{180\,\varepsilon}\right)^{1/4}.
Example. For \int_0^1 e^{-x^2}\,dx with a target of \varepsilon = 10^{-6}, you need M_4 = \max|f^{(4)}|, which for this integrand is 12, attained at x = 0. Plug in:

n \;\geq\; \left(\frac{12}{180 \times 10^{-6}}\right)^{1/4} = (66667)^{1/4} \approx 16.1.

So n = 18 (rounded up to an even number) is more than enough. For the trapezoidal rule to match this accuracy, with M_2 = \max|f''| = 2 you would need n \geq \sqrt{2/(12 \times 10^{-6})} \approx 409, more than twenty times as many function evaluations. That is the practical difference between the two rules.
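The bound-solving formulas translate directly into code. A sketch (the function names are mine), using the derivative bounds \max|f''| = 2 and \max|f^{(4)}| = 12 for e^{-x^2} on [0, 1]:

```python
import math

# Smallest n meeting a target accuracy eps, read off from the worst-case bounds.
def n_trapezoid(a, b, M2, eps):
    return math.ceil(math.sqrt((b - a) ** 3 * M2 / (12 * eps)))

def n_simpson(a, b, M4, eps):
    n = math.ceil(((b - a) ** 5 * M4 / (180 * eps)) ** 0.25)
    return n + n % 2                     # Simpson's needs an even n

# For exp(-x^2) on [0, 1]: max|f''| = 2 and max|f''''| = 12, both at x = 0.
print(n_simpson(0, 1, 12, 1e-6))         # 18
print(n_trapezoid(0, 1, 2, 1e-6))        # 409
```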
Common confusions
- "Simpson's rule only works for n even." Correct: the rule groups subintervals into pairs, so you need an even number. If your n is odd, either increase it by 1 or handle the last subinterval separately (with a trapezoidal approximation or a cubic correction).
- "The error bound is the error." It is an upper bound. Actual errors are typically smaller, sometimes much smaller. For \int_0^1 e^{-x^2}\,dx the actual Simpson's error at n = 4 was about 3 \times 10^{-5}, while the bound predicts \frac{12}{180 \cdot 4^4} \approx 2.6 \times 10^{-4}, roughly an eightfold margin.
- "More points is always better." Eventually, no. As n gets very large, floating-point rounding errors from summing many tiny contributions start to dominate, and adding more points makes things worse. In practice, this only matters for n in the thousands or more.
- "Simpson's rule is exact for parabolas." True, and more: it is also exact for cubics. The cubic errors from the two subintervals in a pair cancel by symmetry, which is a minor miracle.
- "I should always use Simpson's rule." Almost always, yes. The one case where trapezoidal is better: when the integrand has discontinuous derivatives or sharp kinks inside the interval. A kink throws off Simpson's parabolic fit, and the error bound (which assumes a bounded fourth derivative) no longer applies. Trapezoidal is more forgiving.
Going deeper
If you can apply both rules, compute errors, and pick n for a target accuracy, you have the working skill. The rest of this section derives Simpson's formula from first principles, looks at the adaptive version, and shows where the rules fit into the broader family of Newton–Cotes formulas.
Deriving Simpson's formula
Fit a parabola through three points (x_0, y_0), (x_1, y_1), (x_2, y_2) where x_1 = x_0 + h and x_2 = x_0 + 2h. For convenience, shift coordinates so that x_1 = 0; then the three points become (-h, y_0), (0, y_1), (h, y_2), and the parabola y = Ax^2 + Bx + C has to satisfy

Ah^2 - Bh + C = y_0, \qquad C = y_1, \qquad Ah^2 + Bh + C = y_2.
So C = y_1. Adding the first and third equations: 2Ah^2 + 2C = y_0 + y_2, giving A = \frac{y_0 + y_2 - 2y_1}{2h^2}. The B coefficient will not matter for the area, because \int_{-h}^h Bx\,dx = 0 by symmetry.
The area is

\int_{-h}^{h} \left(Ax^2 + Bx + C\right)dx = \frac{2Ah^3}{3} + 2Ch = \frac{h}{3}\left(2Ah^2 + 6C\right) = \frac{h}{3}\left[(y_0 + y_2 - 2y_1) + 6y_1\right] = \frac{h}{3}\left[y_0 + 4y_1 + y_2\right].
That is the one-pair Simpson formula. Summing over all pairs, the interior nodes x_2, x_4, \ldots, x_{n-2} are shared between two consecutive pairs (each contributes 1 to the current pair's end and 1 to the next pair's start, totalling 2); the odd interior nodes x_1, x_3, \ldots, x_{n-1} are each "middle of a parabola" and get weight 4. The boundary nodes x_0, x_n appear only once. That gives the weight pattern 1, 4, 2, 4, 2, 4, 2, \ldots, 4, 1.
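A quick numerical sanity check of the one-pair formula: on an actual parabola it must be exact, since the fitted parabola is the function itself. A sketch with an arbitrary concrete parabola:

```python
# Check h/3 * (y0 + 4*y1 + y2) on the parabola f(x) = 3x^2 - 2x + 1 over [0, 2], h = 1.
f = lambda x: 3 * x**2 - 2 * x + 1
h = 1.0
approx = h * (f(0) + 4 * f(1) + f(2)) / 3   # = (1 + 8 + 9) / 3
exact = 2**3 - 2**2 + 2                     # antiderivative x^3 - x^2 + x at x = 2
print(approx, exact)                        # both 6
```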
Why the error is O(h^4), not O(h^3)
You might expect Simpson's rule, which uses a degree-2 polynomial fit, to have error O(h^3) — one power of h better than the trapezoidal rule's O(h^2). But the actual error is O(h^4) — two powers better. Why?
The reason is a symmetry accident. Center coordinates on the pair [x_0, x_2], so the nodes are -h, 0, h. For the cubic term x^3, the exact integral \int_{-h}^h x^3\,dx = 0 because x^3 is odd, and Simpson's rule applied to x^3 gives \frac{h}{3}\left[(-h)^3 + 4 \cdot 0 + h^3\right] = 0 as well. Both sides vanish, so Simpson's rule reproduces cubics exactly, and the error starts at the h^4 level.
This is why the error bound involves f^{(4)}, not f^{(3)}. The quartic term x^4 is the first one Simpson's rule fails to integrate exactly on a symmetric pair, so that is where the leading error actually lives.
Adaptive quadrature
In real-world applications, you rarely use a fixed n. Instead you use adaptive methods that refine the grid where the function is wiggly and keep it coarse where the function is flat. The simplest scheme: on each interval, compare the Simpson estimate for the whole interval with the sum of the Simpson estimates for its two halves; if they disagree by more than your tolerance, split the interval in half and apply the method recursively to each half.
The advantage is that you spend computational effort where the function needs it and save it where it does not. For a function that is smooth on most of its domain but has a sharp spike at one location, adaptive quadrature can be many orders of magnitude faster than a uniform grid.
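A common recursive implementation looks like the sketch below (names are mine; the factor-of-15 test and correction come from the O(h^4) error of Simpson's rule, a standard Richardson-extrapolation trick):

```python
import math

def simpson_pair(f, a, b):
    """One-pair Simpson estimate on [a, b]."""
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))

def adaptive_simpson(f, a, b, tol=1e-8):
    """Split [a, b] until the two-half estimate agrees with the
    whole-interval estimate, then apply the standard /15 correction."""
    m = (a + b) / 2
    whole = simpson_pair(f, a, b)
    halves = simpson_pair(f, a, m) + simpson_pair(f, m, b)
    if abs(halves - whole) < 15 * tol:
        return halves + (halves - whole) / 15   # Richardson extrapolation
    return (adaptive_simpson(f, a, m, tol / 2)
            + adaptive_simpson(f, m, b, tol / 2))

print(adaptive_simpson(lambda x: math.exp(-x * x), 0, 1))  # ≈ 0.7468241
```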
Every serious numerical integration library, such as MATLAB's integral, Python's scipy.integrate.quad, or Wolfram Alpha's integrator, uses some form of adaptive quadrature, typically built on Simpson's rule or on Gauss-type rules (scipy's quad, for instance, wraps QUADPACK's adaptive Gauss–Kronrod scheme).
The Newton–Cotes family
Trapezoidal (n = 1, linear fit) and Simpson's (n = 2, quadratic fit) are the first two members of the Newton–Cotes family of integration rules. The degree-3 rule (cubic fit, sometimes called "Simpson's 3/8 rule") uses four points per panel and has error O(h^4), just like Simpson's — except its constant is slightly worse, so Simpson's is preferred.
Going to higher degrees stops helping quickly. Degree 10 rules exist, but they have a pathology called the Runge phenomenon: high-degree polynomial fits through equally-spaced points can oscillate wildly near the ends, making the approximation worse than a low-degree one. The right way to go higher is to change the node spacing to Gauss–Legendre nodes (chosen as roots of Legendre polynomials), which are clustered near the endpoints. Gauss–Legendre quadrature with n nodes is exact for polynomials up to degree 2n - 1 — the most accurate family of all.
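The exactness claim is easy to verify in the smallest case. The two-point Gauss–Legendre rule on [-1, 1] has nodes \pm 1/\sqrt{3}, each with weight 1, and with n = 2 nodes it is exact through degree 2n - 1 = 3. A sketch:

```python
import math

# Two-point Gauss–Legendre on [-1, 1]: nodes ±1/sqrt(3), both with weight 1.
def gauss2(f):
    x = 1 / math.sqrt(3)
    return f(-x) + f(x)

poly = lambda x: 5 * x**3 + 3 * x**2 - x + 2   # a full cubic
exact = 3 * (2 / 3) + 2 * 2                    # odd powers integrate to 0 on [-1, 1]
print(gauss2(poly), exact)                     # both 6.0 up to rounding
```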
But for pen-and-paper problems, and for a first course, Simpson's rule is the right tool. It is simple to state, easy to apply, and accurate enough for anything you will meet before a specialised numerical-methods course.
Where this leads next
- Definite Integration - Introduction — the definition of the definite integral that numerical integration approximates.
- Fundamental Theorem of Calculus — the theorem that makes closed-form integration possible, and whose failure motivates numerical methods.
- Improper Integrals — numerical methods break down at singularities; you truncate the improper integral first, then apply Simpson's rule to the truncation.
- Sum of Series Using Integration — the opposite direction — replacing integrals with sums — which is what a Riemann sum already is.
- Taylor Series — the tool behind the error analysis for both rules, and the natural next step for understanding why polynomial approximations work.