Case Study: Backpropagation from Scratch
1. Introduction: The Algorithm That Runs the World
GPT-4, Stable Diffusion, AlphaGo—they all train using one algorithm: Backpropagation. It is simply the Chain Rule applied to the Computational Graph of a Neural Network.
2. The Network
Consider a tiny network with 1 input, 1 hidden neuron, and 1 output.
- Input x.
- Pre-activation z1 = w1x + b1.
- Hidden h = σ(z1).
- Output y = w2h + b2.
- Loss L = (y - t)².
We want to find ∂L/∂w1.
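To make these definitions concrete, here is a minimal forward-pass sketch in plain Python. The numeric values for x, t, and the parameters are made up for illustration; only the structure matches the definitions above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values chosen for illustration; any small numbers behave similarly.
x, t = 0.5, 1.0            # input and target
w1, b1 = 0.8, 0.1          # hidden-layer weight and bias
w2, b2 = -0.4, 0.2         # output-layer weight and bias

# Forward pass, following the definitions above.
z1 = w1 * x + b1           # pre-activation of the hidden neuron
h = sigmoid(z1)            # hidden activation h = σ(z1)
y = w2 * h + b2            # network output
L = (y - t) ** 2           # squared-error loss

print(f"z1={z1:.4f}  h={h:.4f}  y={y:.4f}  L={L:.4f}")
```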
3. The Derivation (Chain Rule)
∂L/∂w1 = ∂L/∂y · ∂y/∂h · ∂h/∂z1 · ∂z1/∂w1
- Loss Gradient: ∂L/∂y = 2(y - t).
- Output Weight: ∂y/∂h = w2.
- Activation: ∂h/∂z1 = σ′(z1).
- Input Weight: ∂z1/∂w1 = x.
Multiply them all together, and you have the gradient!
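Continuing the forward-pass sketch above, the backward pass just evaluates those four factors and multiplies them. The finite-difference comparison at the end is an extra sanity check, not part of the derivation in the text.

```python
# Backward pass: evaluate each factor of the chain rule, then multiply.
dL_dy = 2 * (y - t)          # ∂L/∂y
dy_dh = w2                   # ∂y/∂h
dh_dz1 = h * (1 - h)         # σ′(z1) = σ(z1)(1 - σ(z1)) = h(1 - h)
dz1_dw1 = x                  # ∂z1/∂w1

dL_dw1 = dL_dy * dy_dh * dh_dz1 * dz1_dw1
print(f"analytic dL/dw1 = {dL_dw1:.6f}")

# Sanity check with a central finite difference (verification only).
eps = 1e-6

def loss_at(w1_value):
    h_ = sigmoid(w1_value * x + b1)
    y_ = w2 * h_ + b2
    return (y_ - t) ** 2

numeric = (loss_at(w1 + eps) - loss_at(w1 - eps)) / (2 * eps)
print(f"numeric  dL/dw1 = {numeric:.6f}")
```

The two printed numbers should agree to several decimal places, which confirms the analytic gradient is correct.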
4. Interactive Visualizer: Neural Flow
A visualization of data flowing forward (blue) and gradients flowing backward (red).
- Green Edges: Positive Weights.
- Red Edges: Negative Weights.
- Thickness: Magnitude.
5. Summary
- Forward: Compute prediction.
- Backward: Compute gradients (Chain Rule).
- Update: Adjust weights.
- Repeat: Until loss is low.
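Putting the four steps together, here is a self-contained sketch of training the tiny network with plain gradient descent. The learning rate, step count, and starting values are arbitrary choices for illustration, and the gradients for w2, b1, and b2 follow the same chain-rule pattern even though the text only derives ∂L/∂w1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical starting values; the network and loss match the earlier sections.
x, t = 0.5, 1.0
w1, b1, w2, b2 = 0.8, 0.1, -0.4, 0.2
lr = 0.5                                  # learning rate (assumed)

for step in range(200):
    # Forward: compute prediction.
    z1 = w1 * x + b1
    h = sigmoid(z1)
    y = w2 * h + b2
    L = (y - t) ** 2

    # Backward: compute gradients via the chain rule.
    dL_dy = 2 * (y - t)
    dL_dw2 = dL_dy * h
    dL_db2 = dL_dy
    dL_dh = dL_dy * w2
    dL_dz1 = dL_dh * h * (1 - h)          # σ′(z1) = h(1 - h)
    dL_dw1 = dL_dz1 * x
    dL_db1 = dL_dz1

    # Update: adjust weights against the gradient.
    w1 -= lr * dL_dw1
    b1 -= lr * dL_db1
    w2 -= lr * dL_dw2
    b2 -= lr * dL_db2

# Repeat until the loss is low: after 200 steps the output is close to the target.
print(f"final output {y:.4f} (target {t}), loss {L:.6f}")
```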