Approximation: The Taylor Series

[!NOTE] This module develops the Taylor Series from first principles and connects it to the optimization methods (Gradient Descent, Newton’s Method) used in machine learning.

1. Introduction: Complexity from Simplicity

How does a calculator compute sin(37°)? It doesn’t draw a triangle. It uses Polynomials.

Calculus tells us that any smooth function (like e^x, sin(x), or a Neural Network Loss Function) can be approximated locally by a sum of simpler terms:

  1. A Constant (Start point).
  2. A Line (Slope).
  3. A Parabola (Curve).
  4. …and so on.

This is the basis of Gradient Descent (First-Order Approximation) and Newton’s Method (Second-Order Approximation).


2. The Formula

The Taylor Series of f(x) centered at a is:

f(x) ≈ f(a) + f'(a)(x-a) + f''(a)/2! (x-a)^2 + f'''(a)/3! (x-a)^3 + …

Why “Factorials”?

Imagine we want to approximate f(x) with a polynomial P(x) = c0 + c1 x + c2 x^2 + c3 x^3. We want the derivatives of P(x) to match those of f(x) at x=0. Matching the values themselves gives c0 = f(0); the derivatives give the rest:

  1. P'(x) = c1 + 2c2 x + 3c3 x^2. At x=0, P'(0) = c1. So c1 = f'(0).
  2. P''(x) = 2c2 + 3×2 c3 x. At x=0, P''(0) = 2c2. So c2 = f''(0)/2.
  3. P'''(x) = 3×2 c3 = 6c3. At x=0, P'''(0) = 6c3. So c3 = f'''(0)/6 = f'''(0)/3!.

The factorials (n!) appear because differentiating x^n repeatedly brings down n, n-1, n-2… as multipliers. We divide by them to normalize.
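The coefficient pattern c_n = f⁽ⁿ⁾(0)/n! can be checked numerically. Below is a minimal sketch (the function name `taylor_exp` is an illustrative choice, not from the original) using f(x) = e^x, whose derivatives at 0 are all 1, so every coefficient is simply 1/n!:

```python
import math

def taylor_exp(x, n_terms):
    """Maclaurin polynomial of e^x. Every derivative of e^x at 0 is 1,
    so the n-th coefficient is 1/n!."""
    return sum(x**n / math.factorial(n) for n in range(n_terms))

# Partial sums converge quickly near the center x = 0.
print(taylor_exp(1.0, 10))  # close to e ≈ 2.71828
print(math.exp(1.0))
```

With 10 terms the error at x=1 is already below 10⁻⁶, because the first omitted term is 1/10!.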


3. Example: Approximating sin(x) at a=0

At x=0, sin(0)=0, cos(0)=1.

  • f(x) = sin(x) → f(0) = 0
  • f'(x) = cos(x) → f'(0) = 1
  • f''(x) = -sin(x) → f''(0) = 0
  • f'''(x) = -cos(x) → f'''(0) = -1

Putting it together gives the Maclaurin Series (a Taylor Series centered at a=0):

sin(x) ≈ x - x^3/3! + x^5/5! - x^7/7! …
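This series can be evaluated directly, which is essentially how the calculator from the introduction computes sin(37°). A minimal sketch (the name `taylor_sin` is an assumption):

```python
import math

def taylor_sin(x, n_terms):
    """Maclaurin series of sin(x): only odd powers survive,
    with alternating signs: x - x^3/3! + x^5/5! - ..."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n_terms))

# Near the center the match is excellent with just four terms.
x = math.radians(37)
print(taylor_sin(x, 4), math.sin(x))
```

Note that the angle is converted to radians first; the series (like all of calculus) assumes radian inputs.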


4. Connection to Optimization

In Machine Learning, we want to find the minimum of a Loss Function L(w).

4.1 First-Order: Gradient Descent

We use the first two terms (Line Approximation):

L(w+h) ≈ L(w) + L'(w)h

We step downhill along this line.
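A one-dimensional sketch of this update rule (the toy loss L(w) = (w-3)^2 and the learning rate are illustrative choices, not from the original):

```python
def L_prime(w):
    # Derivative of the toy loss L(w) = (w - 3)^2, minimized at w = 3.
    return 2 * (w - 3)

w, lr = 0.0, 0.1
for _ in range(100):
    # Step downhill along the first-order (line) approximation.
    w = w - lr * L_prime(w)

print(w)  # approaches the minimum at w = 3
```

Each step only shrinks the error by a constant factor, which is why Gradient Descent needs many iterations.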

4.2 Second-Order: Newton’s Method

We use the first three terms (Parabola Approximation):

L(w+h) ≈ L(w) + L'(w)h + 0.5 L''(w)h^2

We jump directly to the bottom of this parabola.
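Minimizing the parabola over h gives the step h = -L'(w)/L''(w). For a quadratic loss the parabola approximation is exact, so a single Newton step lands on the minimum. A minimal sketch (the toy loss is the same illustrative choice as above):

```python
def newton_step(w, L_prime, L_double_prime):
    # Vertex of the local parabola L(w) + L'(w)h + 0.5 L''(w)h^2:
    # setting the derivative in h to zero gives h = -L'(w) / L''(w).
    return w - L_prime(w) / L_double_prime(w)

# Toy quadratic loss L(w) = (w - 3)^2: one step suffices.
w = newton_step(10.0, lambda w: 2 * (w - 3), lambda w: 2.0)
print(w)  # lands exactly on the minimum, w = 3.0
```

Compare this single jump with the 100 small steps Gradient Descent needed for the same loss.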

[!TIP] Trade-off: Newton’s Method converges in fewer steps, but each step is expensive: computing (and inverting) the matrix of second derivatives (the Hessian) for millions of parameters is computationally infeasible. That’s why deep learning sticks to Gradient Descent.


5. Interactive Visualizer: Polynomials & Newton’s Method

Modes:

  1. Taylor Series: See how adding terms (N) hugs the curve f(x) near x=0. Notice the “Explosion” far from the center.
  2. Newton’s Method: Visualize finding a root (f(x)=0) using tangent lines.
    • Start at x0.
    • Follow the tangent to the x-axis to find x1.
    • Repeat. This is how solvers find zeros efficiently.
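The tangent-line iteration described above is Newton’s root-finding method: x_{n+1} = x_n - f(x_n)/f'(x_n). A minimal sketch, using f(x) = x^2 - 2 (an illustrative choice) whose positive root is √2:

```python
def newton_root(f, f_prime, x0, steps=10):
    """Repeatedly follow the tangent line down to the x-axis:
    x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / f_prime(x)
    return x

root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)  # converges to sqrt(2) ≈ 1.41421
```

Convergence is quadratic near the root (the number of correct digits roughly doubles per step), so 10 steps reach machine precision.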
Legend: the target function f(x) and its approximation P_N(x).

6. Summary

  • Taylor Series: Converting hard functions to easy polynomials.
  • Gradient Descent: A Linear approximation (Taylor N=1). We step down the slope.
  • Newton’s Method: A Quadratic approximation (Taylor N=2). We jump to the minimum of the parabola.
  • Trust Region: We must restrict our steps to the region where the Taylor approximation is valid.
