Automatic Differentiation: The Magic of PyTorch
1. Introduction: Who computes the gradients?
In calculus class, you computed derivatives by hand. In early AI (the 1980s), researchers derived gradients on paper and coded them manually. In modern frameworks (PyTorch, TensorFlow), you write only the Forward Pass, and the framework computes the Backward Pass (the gradients) automatically. This is AutoDiff (automatic differentiation).
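As a first taste, here is a minimal PyTorch sketch (the values are just for illustration): you write the forward pass, call .backward(), and read off the gradient that autograd computed for you.

```python
import torch

# Forward pass: the only part you write yourself.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2

# Backward pass: autograd computes dy/dx = 2x.
y.backward()
print(x.grad)  # tensor(6.)
```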
2. The Computational Graph
Every calculation in your code builds a graph. Nodes = Operations (+, -, *, sin). Edges = Data Flow (Tensors).
Example (traced in PyTorch after this list): y = (x + 2) * 3
- Input x.
- Add 2 → a.
- Multiply by 3 → y.
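In PyTorch, this graph is recorded as you run the forward pass: every intermediate tensor keeps a grad_fn pointing at the operation node that produced it. A small sketch (the exact grad_fn class names, like AddBackward0 and MulBackward0, can vary between PyTorch versions):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
a = x + 2          # node: Add
y = a * 3          # node: Mul

# Each intermediate tensor remembers the operation that created it.
print(a.grad_fn)                 # e.g. <AddBackward0 ...>
print(y.grad_fn)                 # e.g. <MulBackward0 ...>
print(y.grad_fn.next_functions)  # edge pointing back to the Add node

y.backward()
print(x.grad)  # tensor(3.), since dy/dx = 3
```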
3. Forward vs Backward Mode
- Forward Mode: Computes the value (y) and the derivative (dy/dx) in the same pass, propagating derivatives from inputs toward outputs. Efficient when there are few inputs and many outputs.
- Backward Mode (reverse mode, the basis of backprop): Runs the forward pass first, then traverses the graph in reverse to accumulate gradients. Efficient when there are many inputs and few outputs, which is exactly the neural-network case: millions of parameters feeding into one scalar loss (see the sketch after this list).
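To see why reverse mode fits neural networks, here is a sketch with many inputs and one scalar output: a single backward pass produces the gradient with respect to every input at once, whereas forward mode would need one pass per input.

```python
import torch

# Many inputs, one scalar output: the typical loss-function shape.
params = torch.randn(1_000_000, requires_grad=True)
loss = (params ** 2).sum()   # single scalar output

loss.backward()              # one reverse pass over the graph
print(params.grad.shape)     # torch.Size([1000000]): all gradients at once
```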
4. Interactive Visualizer: Graph Builder
Visualize the computational graph for y = (x + w) * b.
- Forward Pass (Blue): Values flow up.
- Backward Pass (Red): Gradients flow down.
Input: x=2, w=1, b=3.
- a = x+w = 3.
- y = a*b = 9.
- dy/dy = 1.
- dy/da = b = 3.
- dy/dx = dy/da · da/dx = 3 · 1 = 3.
- Similarly, dy/dw = dy/da · da/dw = 3 · 1 = 3, and dy/db = a = 3 (checked in the snippet below).
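You can verify every number in this walkthrough with a few lines of PyTorch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

a = x + w      # forward: a = 3
y = a * b      # forward: y = 9

y.backward()   # backward: chain rule applied node by node
print(x.grad, w.grad, b.grad)  # tensor(3.) tensor(3.) tensor(3.)
```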
5. Summary
- Computational Graph: Represents math as a graph of operations (a DAG, not necessarily a tree, since intermediate values can be reused).
- AutoDiff: Applies Chain Rule automatically on the graph.
- Backward Mode: Efficient for functions with many inputs and few outputs (like Loss functions).