Module Review: Sequence Models
1. Key Takeaways
- Sequence Matters: Traditional feed-forward neural networks (FNNs) cannot handle variable-length sequences or capture temporal dependencies; RNNs are designed to do both.
- Memory: RNNs maintain a hidden state (h<sub>t</sub>) that acts as memory, updated at each time step.
- Training Challenges: Training RNNs with backpropagation through time (BPTT) often leads to vanishing or exploding gradients.
- LSTMs & GRUs: These advanced architectures mitigate the vanishing gradient problem using gating mechanisms (the LSTM's Forget, Input, and Output gates; the GRU's Update and Reset gates).
- Seq2Seq: The Encoder-Decoder architecture is standard for tasks like translation and summarization.
- Attention: The Attention Mechanism allows the decoder to access the entire input sequence, solving the information bottleneck problem of fixed-length context vectors.
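The hidden-state update in the first bullets can be made concrete with a minimal NumPy sketch. The dimensions (4-dim inputs, 3-dim hidden state, a 5-step sequence) are hypothetical choices for illustration, not from the module:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration.
input_size, hidden_size = 4, 3
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1

def rnn_step(x_t, h_prev):
    """One vanilla RNN update: h_t = tanh(W_xh x_t + W_hh h_{t-1})."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

# Unroll over a short sequence; the hidden state carries memory forward.
h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h = rnn_step(x_t, h)

print(h.shape)  # (3,)
```

Because each step feeds the previous hidden state through the same weights, gradients in BPTT are products of many Jacobians, which is exactly where vanishing and exploding gradients come from.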
2. Flashcards
Test your understanding of the core concepts.
3. Cheat Sheet
| Concept | Equation / Description |
|---|---|
| RNN Update | h<sub>t</sub> = tanh(W<sub>xh</sub> x<sub>t</sub> + W<sub>hh</sub> h<sub>t-1</sub>) |
| LSTM Forget | f<sub>t</sub> = σ(W<sub>f</sub> · [h<sub>t-1</sub>, x<sub>t</sub>]) |
| LSTM Input | i<sub>t</sub> = σ(W<sub>i</sub> · [h<sub>t-1</sub>, x<sub>t</sub>]) |
| LSTM Candidate | C̃<sub>t</sub> = tanh(W<sub>C</sub> · [h<sub>t-1</sub>, x<sub>t</sub>]) |
| LSTM Cell | C<sub>t</sub> = f<sub>t</sub> * C<sub>t-1</sub> + i<sub>t</sub> * C̃<sub>t</sub> |
| Attention Score | score<sub>ts</sub> = h<sub>decoder</sub><sup>T</sup> · h<sub>encoder</sub> (Dot Product) |
| Attention Weights | α<sub>ts</sub> = softmax(score<sub>ts</sub>) over source positions s |
| Context Vector | c<sub>t</sub> = Σ<sub>s</sub> α<sub>ts</sub> h<sub>s</sub> (Weighted sum) |
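The cheat-sheet equations can be sketched end to end in NumPy. This is a minimal illustration, not a full implementation: biases are omitted to match the table, and the dimensions (3-dim hidden state, 6 encoder steps) are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden = 3  # hypothetical hidden size

# --- LSTM cell update (biases omitted, as in the table) ---
concat_size = 2 * hidden  # [h_{t-1}, x_t] with a 3-dim input (assumed)
W_f = rng.standard_normal((hidden, concat_size)) * 0.1
W_i = rng.standard_normal((hidden, concat_size)) * 0.1
W_C = rng.standard_normal((hidden, concat_size)) * 0.1

h_prev, x_t, C_prev = (rng.standard_normal(hidden) for _ in range(3))
hx = np.concatenate([h_prev, x_t])

f_t = sigmoid(W_f @ hx)             # forget gate
i_t = sigmoid(W_i @ hx)             # input gate
C_tilde = np.tanh(W_C @ hx)         # candidate cell state
C_t = f_t * C_prev + i_t * C_tilde  # new cell state

# --- Dot-product attention over encoder states ---
h_enc = rng.standard_normal((6, hidden))  # 6 encoder time steps
h_dec = rng.standard_normal(hidden)       # current decoder state

scores = h_enc @ h_dec                          # dot-product scores
alphas = np.exp(scores) / np.exp(scores).sum()  # softmax weights
context = alphas @ h_enc                        # weighted sum of h_s

print(C_t.shape, context.shape)  # (3,) (3,)
```

Note that the context vector is recomputed at every decoder step, which is what lets the decoder look at the whole input sequence instead of a single fixed-length vector.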
4. Next Steps
You have mastered the fundamentals of sequence modeling with RNNs. However, RNNs are sequential by nature: each step depends on the previous one, which prevents parallelization across time steps and makes training slow on modern hardware (GPUs).
In the next module, we will explore Transformers, an architecture that relies entirely on Attention and discards recurrence, revolutionizing NLP and Deep Learning.