Module 6 Review & Cheat Sheet

1. Key Takeaways

  • Information Theory: Information is “Surprisal”. Entropy ($H$) measures the average uncertainty of a distribution. Huffman Coding compresses data to within one bit per symbol of this limit (entropy sketch after this list).
  • Graphs: Structures for State Spaces. BFS (Queue) finds shortest paths in unweighted graphs (Wave); DFS (Stack) finds deep solutions (Maze). See the BFS sketch after this list.
  • Signals: Any complex signal is a sum of sine waves (Fourier Transform); the sinusoids act as Basis Vectors onto which signals are decomposed (DFT sketch after the cheat sheet).
  • Complex Numbers: Multiplication is Rotation (rotation sketch after the cheat sheet). Quaternions (4D) avoid Gimbal Lock in 3D rotations.
  • Transformers: Self-Attention is a Dot-Product similarity search ($QK^T$); Multi-Head Attention learns multiple perspectives (attention sketch after the cheat sheet).
  • VAEs: Use the Reparameterization Trick to backpropagate through randomness: $z = \mu + \sigma \odot \epsilon$ (sketch after this list).
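To make the Information Theory bullet concrete, here is a minimal sketch of self-information and entropy (NumPy assumed; the example distributions are made up for illustration):

```python
import numpy as np

def self_information(p: float) -> float:
    """Surprisal of an event with probability p, in bits."""
    return -np.log2(p)

def entropy(p: np.ndarray) -> float:
    """Average surprisal H(P) = -sum p(x) log2 p(x), in bits."""
    p = p[p > 0]  # 0 * log2(0) is taken to be 0
    return float(-np.sum(p * np.log2(p)))

# A fair coin maximizes entropy over two outcomes; a biased coin carries less.
print(entropy(np.array([0.5, 0.5])))  # 1.0 bit
print(entropy(np.array([0.9, 0.1])))  # ~0.469 bits
print(self_information(0.01))         # rare event: ~6.64 bits
```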
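A minimal BFS sketch (plain Python; the toy graph is a hypothetical state space): the FIFO queue expands outward like a wave, so the first path to reach the goal uses the fewest edges.

```python
from collections import deque

def bfs_shortest_path(graph: dict, start, goal):
    """Breadth-first search: the FIFO queue visits nodes in order of
    distance from start, so the first path reaching goal is shortest."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal unreachable

# Toy state space (made up for illustration).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Swapping `popleft()` for `pop()` turns the queue into a stack, giving DFS: it dives deep before backtracking.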
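And the Reparameterization Trick, sketched in NumPy (the mean and log-variance values are made-up stand-ins for an encoder’s output): sampling $z$ directly would block gradients, so the randomness is moved into a fixed noise source $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder outputs (illustrative values standing in for a network's output).
mu = np.array([0.2, -1.0])       # mean of q(z|x)
log_var = np.array([-0.5, 0.3])  # log-variance, a common parameterization
sigma = np.exp(0.5 * log_var)

# Reparameterization: the randomness lives in eps, not in the parameters,
# so gradients can flow through mu and sigma during backprop.
eps = rng.standard_normal(mu.shape)
z = mu + sigma * eps
print(z)
```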

2. Cheat Sheet

| Concept | Formula / Definition | Intuition |
| --- | --- | --- |
| Self-Information | $I(x) = -\log_2 P(x)$ | Rare events have high information. |
| Entropy | $H(P) = -\sum_x P(x) \log_2 P(x)$ | Average surprise; maximized by the uniform distribution. |
| Cross-Entropy | $H(P, Q) = -\sum_x P(x) \log_2 Q(x)$ | Loss function: “How surprised is the model by the truth?” |
| KL Divergence | $D_{KL}(P \parallel Q) = H(P, Q) - H(P)$ | Gap between two distributions (asymmetric, so not a true distance). |
| Euler’s Formula | $e^{ix} = \cos x + i \sin x$ | Multiplication = Rotation. |
| DFT | $X_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N}$ | Dot product with “Frequency Spinners”. |
| FFT Complexity | $O(N \log N)$ | Much faster than the naive $O(N^2)$ DFT. |
| Self-Attention | $\text{softmax}\left( \frac{QK^T}{\sqrt{d_k}} \right) V$ | Database lookup weighted by relevance. |
| Adjacency Matrix | Space $O(V^2)$, lookup $O(1)$ | Good for dense graphs. |
| Adjacency List | Space $O(V+E)$, lookup $O(\text{degree})$ | Good for sparse graphs (e.g., social networks). |
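The rotation rows can be checked numerically. A minimal sketch (NumPy; the angle and point are illustrative) of Euler’s formula turning multiplication into rotation:

```python
import numpy as np

theta = np.pi / 2           # rotate 90 degrees
rotor = np.exp(1j * theta)  # Euler: cos(theta) + i*sin(theta)

point = 1 + 0j              # the point (1, 0) in the plane
rotated = point * rotor     # multiplication = rotation
print(rotated)              # ~0+1j, i.e. the point (0, 1)
```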
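The DFT row is literally a batch of dot products with complex “frequency spinners”. A naive $O(N^2)$ sketch (NumPy; the test signal is made up), verified against `np.fft.fft`, which computes the same thing in $O(N \log N)$:

```python
import numpy as np

def naive_dft(x: np.ndarray) -> np.ndarray:
    """O(N^2) DFT: X_k = sum_n x_n * exp(-2i pi k n / N).
    Each output X_k is the dot product of x with one frequency spinner."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    spinners = np.exp(-2j * np.pi * k * n / N)  # N x N matrix of basis vectors
    return spinners @ x

# Test signal: a sine with 3 cycles per window (illustrative).
t = np.arange(64)
x = np.sin(2 * np.pi * 3 * t / 64)
assert np.allclose(naive_dft(x), np.fft.fft(x))  # FFT agrees, just faster
```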
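And the Self-Attention row as a minimal single-head sketch (NumPy; the input shape is illustrative and the learned projections are omitted): each query scores every key by dot product, and the softmaxed scores weight a mix of the values.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d_k)) V — a soft database lookup:
    the scores say how relevant each key is to each query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # 4 tokens, dimension 8 (made-up input)
out = self_attention(x, x, x)    # "self": Q, K, V all come from x
print(out.shape)                 # (4, 8)
```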

3. Interactive Flashcards

[Interactive flashcard deck: 8 question/answer cards.]