Jacobian & Hessian Matrices
1. Introduction: The Landscape of Loss
Training a Neural Network is like hiking down a mountain in the dark. You want to reach the lowest point (the global minimum of the loss). To do this, you need to know:
- Which way is down? (Gradient).
- Is the ground curving? (Hessian).
2. The Gradient & Jacobian (First Derivative)
The Gradient (∇f) tells you the direction of steepest ascent. You go the opposite way to minimize loss.
If you have a function with multiple outputs (like a layer in a neural net), the first-order partial derivatives form a matrix called the Jacobian (J).
- Meaning: How much does Output i change when I wiggle Input j? In symbols, $J_{ij} = \partial f_i / \partial x_j$.
- Deep Learning: Used in Backpropagation to pass errors backward through each layer.
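To make the "wiggle Input j" picture concrete, here is a minimal sketch (assuming NumPy; `numerical_jacobian` and the toy `layer` function are illustrative, not part of any framework) that approximates the Jacobian with finite differences:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Illustrative helper: approximate J[i, j] = d f_i / d x_j with central differences."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps                      # wiggle Input j only
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
    return J

# Toy "layer": 2 inputs -> 2 outputs (made up for illustration).
def layer(x):
    return np.array([x[0] ** 2 + x[1], np.sin(x[0]) * x[1]])

print(numerical_jacobian(layer, [1.0, 2.0]))
# Analytic check: [[2*x0, 1], [cos(x0)*x1, sin(x0)]] evaluated at (1, 2)
```

Real frameworks never build J this way: reverse-mode autodiff (Backpropagation) applies each layer's Jacobian as a vector-Jacobian product, without ever materializing the full matrix.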
3. The Hessian (Second Derivative)
The Hessian (H) is a matrix of second derivatives. It describes the curvature of the landscape.
The Eigenvalues of the Hessian tell us the shape of the terrain:
- All Positive: Bowl (Convex). Local Minimum.
- All Negative: Hill (Concave). Local Maximum.
- Mixed Signs: Saddle Point (up in one direction, down in another).
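As a quick sanity check of the eigenvalue rules above, here is a small sketch (assuming NumPy; `numerical_hessian`, `classify`, and the bowl/saddle functions are toy illustrations) that estimates the Hessian with finite differences and reads off the terrain type:

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Illustrative helper: approximate H[i, j] = d^2 f / (dx_i dx_j) with central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = eps, eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

def classify(H):
    eig = np.linalg.eigvalsh(H)            # Hessian is symmetric, so eigvalsh is safe
    if np.all(eig > 0):
        return "local minimum (bowl)", eig
    if np.all(eig < 0):
        return "local maximum (hill)", eig
    return "saddle point", eig

bowl   = lambda p: p[0] ** 2 + p[1] ** 2   # eigenvalues +2, +2
saddle = lambda p: p[0] ** 2 - p[1] ** 2   # eigenvalues +2, -2

for name, f in [("bowl", bowl), ("saddle", saddle)]:
    kind, eig = classify(numerical_hessian(f, [0.0, 0.0]))
    print(f"{name}: eigenvalues ~ {np.round(eig, 3)} -> {kind}")
```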
Newton’s Method (The Smart Jump)
Gradient Descent takes many tiny steps: $x \leftarrow x - \eta \nabla f(x)$. Newton's Method rescales the step by the curvature, $x \leftarrow x - H^{-1} \nabla f(x)$, so on a bowl-shaped (roughly quadratic) patch it can leap straight to the bottom in a single jump.
Why don’t we always use it? For a billion-parameter network, even storing the full Hessian takes $O(N^2)$ memory, and computing the Inverse Hessian ($H^{-1}$) costs $O(N^3)$. That is prohibitively expensive.
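Here is a minimal sketch of the two update rules (assuming NumPy; the quadratic bowl $f(x) = \frac{1}{2} x^\top A x$ and the starting point are made up for illustration). On a quadratic, a single Newton step lands exactly at the minimum, while a gradient step only inches toward it:

```python
import numpy as np

# Illustrative quadratic bowl f(x) = 0.5 * x^T A x with A positive definite,
# so the minimum sits exactly at the origin.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])

grad    = lambda x: A @ x          # gradient of the quadratic
hessian = lambda x: A              # Hessian is constant here

x = np.array([4.0, -2.0])          # arbitrary starting point

# Gradient step: small, safe move downhill.
lr = 0.1
x_gd = x - lr * grad(x)

# Newton step: solve H d = grad instead of forming H^{-1} explicitly.
x_newton = x - np.linalg.solve(hessian(x), grad(x))

print("gradient step ->", x_gd)      # still far from the minimum
print("newton step   ->", x_newton)  # lands at the minimum (0, 0), up to round-off
```

Note the Newton step uses `np.linalg.solve` rather than inverting $H$ outright; that is the standard trick, but even it does not scale to networks with billions of parameters.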
4. Interactive Visualizer: The Landscape Explorer v3.0
Explore different optimization landscapes.
- Gradient Step: Takes a small step downhill. Safe but slow.
- Newton Step: Uses curvature to jump. Fast, but can fail if the Hessian is not positive definite (e.g., Saddle Points).
Task: Try to reach the center (0,0) from a random spot. Compare Gradient vs Newton steps on a Saddle Point. Notice how Newton’s Method might shoot you in the wrong direction if the curvature is negative (Hill/Saddle).
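The widget itself is interactive, but the Saddle Point behaviour can be reproduced numerically. A rough stand-in (assuming NumPy and the classic saddle $f(x, y) = x^2 - y^2$, which may not be the exact surface the visualizer uses):

```python
import numpy as np

# Classic saddle f(x, y) = x^2 - y^2: curves up along x, down along y (illustrative choice).
f      = lambda p: p[0] ** 2 - p[1] ** 2
grad_f = lambda p: np.array([2 * p[0], -2 * p[1]])
H      = np.array([[2.0, 0.0],
                   [0.0, -2.0]])           # eigenvalues +2 and -2: not positive definite

p = np.array([0.5, 0.5])                   # a "random spot" near the saddle at (0, 0)

p_gd     = p - 0.1 * grad_f(p)                   # gradient step (always a descent direction)
p_newton = p - np.linalg.solve(H, grad_f(p))     # Newton step

print("start   :", p,        "f =", f(p))
print("gradient:", p_gd,     "f =", f(p_gd))      # loss decreases
print("newton  :", p_newton, "f =", f(p_newton))  # jumps to the saddle; loss does NOT decrease
```

The negative eigenvalue flips the sign of the Newton step along $y$, so Newton heads for the saddle point instead of descending, while the plain gradient step keeps lowering the loss.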
5. Summary
- Gradient: Direction of steepest climb. (Use negative gradient to descend).
- Jacobian: Matrix of all first-order derivatives. Measures sensitivity.
- Hessian: Matrix of second-order derivatives. Measures curvature.
- Optimization: We want to find points where Gradient is zero and Hessian is Positive Definite (Bowl).