Approximation: The Taylor Series

[!NOTE] This module develops the Taylor Series from first principles and connects it to the optimization methods (Gradient Descent, Newton’s Method) used in machine learning.

1. Introduction: Complexity from Simplicity

How does a calculator compute sin(37°)? It doesn’t draw a triangle. It uses Polynomials.

Calculus tells us that any smooth function (like e^x, sin(x), or a Neural Network Loss Function) can be approximated locally by a sum of simpler terms:

  1. A Constant (Start point).
  2. A Line (Slope).
  3. A Parabola (Curve).
  4. …and so on.

This is the basis of Gradient Descent (First-Order Approximation) and Newton’s Method (Second-Order Approximation).


2. The Formula

The Taylor Series of f(x) centered at a is:

f(x) ≈ f(a) + f'(a)(x-a) + f''(a)/2! (x-a)^2 + f'''(a)/3! (x-a)^3 + …

Why “Factorials”?

Imagine we want to approximate f(x) with a polynomial P(x) = c0 + c1 x + c2 x^2 + c3 x^3. We want the derivatives of P(x) to match those of f(x) at x=0. Matching the values themselves gives c0 = f(0); the derivatives give the rest:

  1. P'(x) = c1 + 2c2 x + 3c3 x^2. At x=0, P'(0) = c1. So c1 = f'(0).
  2. P''(x) = 2c2 + 3×2 c3 x. At x=0, P''(0) = 2c2. So c2 = f''(0)/2.
  3. P'''(x) = 3×2 c3 = 6c3. At x=0, P'''(0) = 6c3. So c3 = f'''(0)/6 = f'''(0)/3!.

The factorials (n!) appear because differentiating x^n repeatedly brings down n, n-1, n-2… as multipliers. We divide by them to normalize.
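The coefficient pattern c_n = f⁽ⁿ⁾(0)/n! can be checked numerically. Below is a minimal sketch (the function name `taylor_exp` is an illustrative choice, not from the original) using f(x) = e^x, whose derivatives at 0 are all 1, so every coefficient is simply 1/n!:

```python
import math

def taylor_exp(x, n_terms):
    """Maclaurin polynomial of e^x. Every derivative of e^x at 0 is 1,
    so the n-th coefficient is 1/n!."""
    return sum(x**n / math.factorial(n) for n in range(n_terms))

# Partial sums converge quickly near the center x = 0.
print(taylor_exp(1.0, 10))  # close to e ≈ 2.71828
print(math.exp(1.0))
```

With 10 terms the error at x=1 is already below 10⁻⁶, because the first omitted term is 1/10!.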


3. Example: Approximating sin(x) at a=0

At x=0, sin(0)=0, cos(0)=1.

  • f(x) = sin(x) → f(0) = 0
  • f'(x) = cos(x) → f'(0) = 1
  • f''(x) = -sin(x) → f''(0) = 0
  • f'''(x) = -cos(x) → f'''(0) = -1

Putting it together gives the Maclaurin Series (a Taylor Series centered at a=0):

sin(x) ≈ x - x^3/3! + x^5/5! - x^7/7! …
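This series can be evaluated directly, which is essentially how the calculator from the introduction computes sin(37°). A minimal sketch (the name `taylor_sin` is an assumption):

```python
import math

def taylor_sin(x, n_terms):
    """Maclaurin series of sin(x): only odd powers survive,
    with alternating signs: x - x^3/3! + x^5/5! - ..."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n_terms))

# Near the center the match is excellent with just four terms.
x = math.radians(37)
print(taylor_sin(x, 4), math.sin(x))
```

Note that the angle is converted to radians first; the series (like all of calculus) assumes radian inputs.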


4. Connection to Optimization

In Machine Learning, we want to find the minimum of a Loss Function L(w).

4.1 First-Order: Gradient Descent

We use the first two terms (Line Approximation):

L(w+h) ≈ L(w) + L'(w)h

We step downhill along this line.
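A one-dimensional sketch of this update rule (the toy loss L(w) = (w-3)^2 and the learning rate are illustrative choices, not from the original):

```python
def L_prime(w):
    # Derivative of the toy loss L(w) = (w - 3)^2, minimized at w = 3.
    return 2 * (w - 3)

w, lr = 0.0, 0.1
for _ in range(100):
    # Step downhill along the first-order (line) approximation.
    w = w - lr * L_prime(w)

print(w)  # approaches the minimum at w = 3
```

Each step only shrinks the error by a constant factor, which is why Gradient Descent needs many iterations.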

4.2 Second-Order: Newton’s Method

We use the first three terms (Parabola Approximation):

L(w+h) ≈ L(w) + L'(w)h + 0.5 L''(w)h^2

We jump directly to the bottom of this parabola.
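Minimizing the parabola over h gives the step h = -L'(w)/L''(w). For a quadratic loss the parabola approximation is exact, so a single Newton step lands on the minimum. A minimal sketch (the toy loss is the same illustrative choice as above):

```python
def newton_step(w, L_prime, L_double_prime):
    # Vertex of the local parabola L(w) + L'(w)h + 0.5 L''(w)h^2:
    # setting the derivative in h to zero gives h = -L'(w) / L''(w).
    return w - L_prime(w) / L_double_prime(w)

# Toy quadratic loss L(w) = (w - 3)^2: one step suffices.
w = newton_step(10.0, lambda w: 2 * (w - 3), lambda w: 2.0)
print(w)  # lands exactly on the minimum, w = 3.0
```

Compare this single jump with the 100 small steps Gradient Descent needed for the same loss.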

[!TIP] Trade-off: Newton’s Method converges in fewer steps, but each step is expensive: computing (and inverting) the matrix of second derivatives (the Hessian) for millions of parameters is computationally infeasible. That’s why deep learning sticks to Gradient Descent.


5. Interactive Visualizer: Polynomials & Newton’s Method

Modes:

  1. Taylor Series: See how adding terms (N) hugs the curve f(x) near x=0. Notice the “Explosion” far from the center.
  2. Newton’s Method: Visualize finding a root (f(x)=0) using tangent lines.
    • Start at x0.
    • Follow the tangent to the x-axis to find x1.
    • Repeat. This is how solvers find zeros efficiently.
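The tangent-line iteration described above is Newton’s root-finding method: x_{n+1} = x_n - f(x_n)/f'(x_n). A minimal sketch, using f(x) = x^2 - 2 (an illustrative choice) whose positive root is √2:

```python
def newton_root(f, f_prime, x0, steps=10):
    """Repeatedly follow the tangent line down to the x-axis:
    x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / f_prime(x)
    return x

root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)  # converges to sqrt(2) ≈ 1.41421
```

Convergence is quadratic near the root (the number of correct digits roughly doubles per step), so 10 steps reach machine precision.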
Legend: the target function f(x) and its approximation P_N(x).

6. Summary

  • Taylor Series: Converting hard functions to easy polynomials.
  • Gradient Descent: A Linear approximation (Taylor N=1). We step down the slope.
  • Newton’s Method: A Quadratic approximation (Taylor N=2). We jump to the minimum of the parabola.
  • Trust Region: We must restrict our steps to the region where the Taylor approximation is valid.
