Module Review: Neural Networks
[!NOTE] This module reviews the core building blocks of neural networks: the perceptron, activation functions, and forward propagation through a multilayer perceptron (MLP).
1. Key Takeaways
- Perceptron: The basic unit of a neural network. It performs a linear classification but fails on non-linear problems like XOR.
- Activation Functions: Essential for introducing non-linearity.
- ReLU: Standard for hidden layers.
- Sigmoid: Good for probability output, bad for hidden layers (vanishing gradient).
- Softmax: Used for multi-class classification output.
- Universal Approximation: An MLP with at least one hidden layer and a non-linear activation can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units.
- Forward Propagation: The flow of data from input to output through layers of neurons.
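The takeaways above can be tied together in a tiny forward pass: a 2-2-1 MLP with ReLU hidden units and a sigmoid output that computes XOR, the very function a single perceptron cannot learn. This is a minimal plain-Python sketch; the hand-picked weights are illustrative, not from the module.

```python
import math

def relu(x):
    # max(0, x): standard hidden-layer activation
    return max(0.0, x)

def sigmoid(x):
    # Squashes to (0, 1): suitable for a probability output
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum plus bias, then ReLU
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Output layer: weighted sum plus bias, then sigmoid
    z = sum(w * hi for w, hi in zip(W2, h)) + b2
    return sigmoid(z)

# Hand-picked weights (illustrative): h1 = relu(x1 + x2),
# h2 = relu(x1 + x2 - 1), so h1 - 2*h2 equals XOR exactly.
W1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [0.0, -1.0]
W2 = [10.0, -20.0]
b2 = -5.0

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x, W1, b1, W2, b2)))  # 0, 1, 1, 0
```

The hidden layer bends the decision boundary, which is exactly what the bare perceptron lacks.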
2. Flashcards
What is the "Vanishing Gradient" problem?
When gradients become extremely small during backpropagation (common with Sigmoid/Tanh), preventing early layers from learning effectively.
Why can't a Perceptron solve XOR?
Because XOR is not linearly separable. A single Perceptron can only draw a straight line decision boundary.
What is the purpose of an Activation Function?
To introduce non-linearity into the network, allowing it to learn complex patterns.
What is the output range of Tanh?
(-1, 1). It is zero-centered, unlike Sigmoid which is (0, 1).
What is "Dead ReLU"?
A state where a ReLU neuron only outputs 0 because its weights have updated such that the input is always negative. It stops learning.
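The activations discussed in the flashcards can all be written in a few lines. A minimal plain-Python sketch (the max-subtraction trick in softmax is a standard numerical-stability measure, not something specific to this module):

```python
import math

def sigmoid(x):
    # (0, 1); saturates for large |x|, which causes vanishing gradients
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # (-1, 1); zero-centered, unlike sigmoid
    return math.tanh(x)

def relu(x):
    # max(0, x); gradient is exactly 0 for x < 0 ("Dead ReLU" risk)
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps a gradient flowing for x < 0
    return x if x > 0 else alpha * x

def softmax(z):
    # Subtracting max(z) avoids overflow without changing the result
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))  # three probabilities summing to 1
```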
3. Cheat Sheet
| Concept | Formula / Definition | Key Usage |
|---|---|---|
| Perceptron Output | y = 1 if w·x + b > 0, else 0 | Simple binary classification |
| Sigmoid | σ(x) = 1 / (1 + e⁻ˣ) | Binary probability output |
| Tanh | tanh(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ) | Zero-centered hidden layers (legacy) |
| ReLU | max(0, x) | Default for hidden layers |
| Leaky ReLU | max(0.01x, x) | Fixes “Dead ReLU” |
| Softmax | e^(zᵢ) / Σⱼ e^(zⱼ) | Multi-class probability output |
| Update Rule | w ← w + α(y − ŷ)x | Perceptron learning |
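The perceptron update rule w ← w + α(y − ŷ)x from the cheat sheet can be exercised on a linearly separable problem such as AND, where the perceptron is guaranteed to converge. A minimal sketch; the learning rate and epoch count here are arbitrary choices, not values from the module.

```python
def predict(w, b, x):
    # Perceptron output: 1 if w·x + b > 0, else 0
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(data, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)                         # (y - ŷ)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]   # w ← w + α(y - ŷ)x
            b += lr * err                                      # bias updated the same way
    return w, b

# AND is linearly separable, so training converges; XOR would not.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

Running the same loop on XOR never settles, which is the point of the XOR flashcard above.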
4. Next Steps
Now that you understand the architecture, it’s time to learn how to train these deep networks using Gradient Descent and Backpropagation.