Module 6 Review & Cheat Sheet
1. Key Takeaways
- Information Theory: Information is “Surprisal”. Entropy ($H$) measures the average uncertainty, and Huffman Coding compresses data down toward this limit (both sketched in code after this list).
- Graphs: Structures for state spaces. BFS (Queue) expands like a wave and finds shortest paths in unweighted graphs; DFS (Stack) dives deep down one branch, like solving a maze (traversal sketch below).
- Signals: Any complex signal is a sum of sine waves (Fourier Transform), recovered by projecting the signal onto a basis of sinusoids (DFT sketch after the cheat sheet).
- Complex Numbers: Multiplication is Rotation. Quaternions (4D) avoid the Gimbal Lock that afflicts Euler-angle 3D rotations.
- Transformers: Self-Attention is a scaled Dot Product similarity search ($QK^T$); Multi-Head Attention learns multiple perspectives in parallel (sketch below).
- VAEs: The Reparameterization Trick ($z = \mu + \sigma \odot \epsilon$) moves the randomness into $\epsilon$ so backpropagation can flow through $\mu$ and $\sigma$ (sketch below).
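The first three cheat-sheet formulas fit in a few lines. A minimal sketch, assuming a toy biased coin as the true distribution $P$ and a fair-coin model $Q$ (both distributions are illustrative, not from the module):

```python
import math

def entropy(p):
    """H(P) = -sum P(x) log2 P(x): average surprise of distribution P."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum P(x) log2 Q(x): model Q's average surprise at truth P."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = H(P, Q) - H(P): extra bits paid for coding P with Q."""
    return cross_entropy(p, q) - entropy(p)

p = [0.9, 0.1]  # true distribution: a heavily biased coin
q = [0.5, 0.5]  # the model's guess: a fair coin
print(entropy(p))           # ~0.469 bits: predictable, so low surprise
print(cross_entropy(p, q))  # 1.0 bit: the fair-coin model is more surprised
print(kl_divergence(p, q))  # ~0.531 bits: the gap between the two
```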
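Huffman coding itself is short enough to sketch: greedily merge the two least probable nodes until one tree remains, so rare symbols end up with long codes. The frequency table is made up for illustration:

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix code by repeatedly merging the two lightest nodes."""
    # Heap entries: (weight, tiebreaker, {symbol: code-so-far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}: each code length equals
# -log2 P(x), so the average length hits the entropy for these probabilities.
```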
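BFS and DFS differ only in how the frontier is popped, so one function can show both. The toy graph, stored as an adjacency list, is hypothetical:

```python
from collections import deque

def traverse(graph, start, bfs=True):
    """Pop from the left for BFS (queue/wave), from the right for DFS (stack/maze)."""
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        node = frontier.popleft() if bfs else frontier.pop()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(traverse(graph, "A", bfs=True))   # ['A', 'B', 'C', 'D']: level by level
print(traverse(graph, "A", bfs=False))  # ['A', 'C', 'D', 'B']: deepest branch first
```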
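Self-attention as a literal dot-product lookup, sketched in NumPy. The random $Q$, $K$, $V$ matrices are placeholders for real learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each query scores every key,
    then takes a relevance-weighted average of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq, seq) similarity matrix
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # (seq, d_v) blended values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(self_attention(Q, K, V).shape)  # (4, 8)
```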
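And the reparameterization trick in isolation; the `mu` and `log_var` arrays below stand in for a real encoder's outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these came out of a VAE encoder: batch of 2, latent dimension 3.
mu = np.array([[0.0, 1.0, -1.0],
               [0.5, 0.0,  2.0]])
log_var = np.zeros_like(mu)  # log sigma^2; zeros mean sigma = 1

# z = mu + sigma * eps: the randomness lives entirely in eps, which has no
# learned parameters, so gradients can flow through mu and sigma in backprop.
eps = rng.standard_normal(mu.shape)
sigma = np.exp(0.5 * log_var)
z = mu + sigma * eps
print(z.shape)  # (2, 3)
```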
2. Cheat Sheet
| Concept | Formula / Definition | Intuition |
|---|---|---|
| Self-Information | $I(x) = -\log_2 P(x)$ | Rare events have high information. |
| Entropy | $H(P) = - \sum P(x) \log_2 P(x)$ | Average surprise. Max at uniform distribution. |
| Cross-Entropy | $H(P, Q) = - \sum P(x) \log Q(x)$ | Loss function. “How surprised is the model by the truth?” |
| KL Divergence | $D_{KL}(P \,\Vert\, Q) = H(P, Q) - H(P)$ | Extra average surprise from using $Q$ in place of $P$. Asymmetric, so not a true distance. |
| Euler’s Formula | $e^{ix} = \cos x + i \sin x$ | Multiplication = Rotation. |
| DFT | $X_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N}$ | Dot product with “Frequency Spinners”. |
| FFT Complexity | $O(N \log N)$ | Much faster than the naive $O(N^2)$ DFT. |
| Self-Attention | $\text{softmax}( \frac{QK^T}{\sqrt{d_k}} ) V$ | Database lookup based on relevance. |
| Adjacency Matrix | Space $O(V^2)$, Lookup $O(1)$ | Good for dense graphs. |
| Adjacency List | Space $O(V+E)$, Lookup $O(\text{degree})$ | Good for sparse graphs (Social Networks). |
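To connect the Euler, DFT, and FFT rows: a naive $O(N^2)$ DFT built from the “frequency spinner” dot products, checked against NumPy’s $O(N \log N)$ FFT. The test signal is arbitrary:

```python
import numpy as np

def naive_dft(x):
    """X_k = sum_n x_n e^{-i 2 pi k n / N}: dot the signal with each
    complex spinner. N spinners times N samples each gives O(N^2)."""
    N = len(x)
    n = np.arange(N)
    X = np.empty(N, dtype=complex)
    for k in range(N):
        spinner = np.exp(-2j * np.pi * k * n / N)  # Euler: cos - i sin
        X[k] = np.dot(x, spinner)
    return X

N = 64
t = np.arange(N) / N
x = np.sin(2 * np.pi * 3 * t)  # three full cycles across the window

assert np.allclose(naive_dft(x), np.fft.fft(x))   # FFT: same answer, O(N log N)
print(np.argmax(np.abs(naive_dft(x))[: N // 2]))  # 3: the dominant frequency bin
```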