Module 05 Review: Advanced Linear Algebra
1. Cheat Sheet: The Big Picture
We’ve moved from basic matrix operations to the core machinery of Machine Learning.
| Concept | The “One Liner” | Key Equation | Application |
|---|---|---|---|
| Eigenvector | An axis that doesn’t rotate, only stretches. | $A\mathbf{v} = \lambda\mathbf{v}$ | PageRank, Stability Analysis. |
| Eigenvalue | The stretch factor along the eigenvector. | $\det(A - \lambda I) = 0$ | Variance in PCA, Curvature in Hessian. |
| SVD | Factoring any matrix into Rotation-Stretch-Rotation. | $A = U \Sigma V^T$ | Compression, Denoising, Recommenders. |
| PCA | Finding the best axes to project data onto. | Eig of $\Sigma = \frac{1}{n}X^TX$ (centered $X$) | Dimensionality Reduction. |
| Tensor | A multi-dimensional grid of numbers. | Rank $N$ | Deep Learning Data Structure. |
| Broadcasting | Stretching smaller tensors to match larger ones. | (4,1) + (4,4) = (4,4) | Efficient Coding. |
| Jacobian | First derivatives of a vector function. | $J_{ij} = \partial y_i / \partial x_j$ | Sensitivity, Backpropagation. |
| Hessian | Second derivatives (Curvature). | $H_{ij} = \partial^2 f / \partial x_i \partial x_j$ | Optimization Landscape (Bowl vs Saddle). |
| Newton’s Method | Jumping to the minimum using curvature. | $x_{new} = x_{old} - H^{-1} \nabla f$ | Fast Optimization (Second Order). |
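To make the last row concrete, here is a minimal NumPy sketch of a single Newton step on a quadratic bowl $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T A \mathbf{x} - \mathbf{b}^T \mathbf{x}$ (the matrix, vector, and starting point are illustrative, not from the module). For a quadratic, the gradient is $A\mathbf{x} - \mathbf{b}$, the Hessian is the constant matrix $A$, and one Newton step lands exactly on the minimum:

```python
import numpy as np

# Quadratic bowl: f(x) = 0.5 * x^T A x - b^T x  (illustrative values)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # symmetric positive definite -> a bowl
b = np.array([1.0, 1.0])

x_old = np.array([5.0, -4.0])         # arbitrary starting point
grad = A @ x_old - b                  # gradient of the quadratic
H = A                                 # Hessian is constant for a quadratic

# Newton step: x_new = x_old - H^{-1} grad (solve the system instead of inverting H)
x_new = x_old - np.linalg.solve(H, grad)

print(x_new)                          # [0.2 0.4]
print(np.linalg.solve(A, b))          # true minimiser A^{-1} b = [0.2 0.4]
```

Note the use of `np.linalg.solve` rather than forming $H^{-1}$ explicitly; solving the linear system is the standard, numerically safer way to apply $H^{-1}\nabla f$.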
2. Interactive Flashcards
Test your recall. Each question below is followed by its answer.
What is the geometric meaning of an Eigenvector?
A vector that does not change direction after a linear transformation is applied (it only scales).
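A quick NumPy check of that statement, using an arbitrary symmetric matrix (not from the module):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
v, lam = eigvecs[:, 0], eigvals[0]    # first eigenpair

print(A @ v)                          # same direction as v ...
print(lam * v)                        # ... only scaled by lambda
```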
What does a Zero Gradient and Mixed Hessian Eigenvalues imply?
A Saddle Point. It's a minimum in one direction and a maximum in another.
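For example, $f(x, y) = x^2 - y^2$ has zero gradient at the origin while its Hessian eigenvalues have mixed signs (illustrative function, not from the module):

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2 at the origin
H = np.array([[ 2.0,  0.0],
              [ 0.0, -2.0]])

print(np.linalg.eigvalsh(H))          # [-2.  2.] -> curves up one way, down the other: a saddle
```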
Why is PCA sensitive to Outliers?
Because it maximizes Variance (Squared Error). A distant point has a massive squared error, pulling the axis towards it.
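A small demonstration on synthetic data (the points, the outlier, and the helper `first_pc` are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Points scattered mostly along the x-axis, plus one distant outlier.
X = np.column_stack([rng.normal(0, 3, 50), rng.normal(0, 0.3, 50)])
X_out = np.vstack([X, [8.0, 8.0]])

def first_pc(data):
    centered = data - data.mean(axis=0)       # PCA works on centered data
    cov = centered.T @ centered / len(data)   # covariance matrix
    _, vecs = np.linalg.eigh(cov)             # eigenvectors, ascending eigenvalues
    return vecs[:, -1]                        # axis of maximum variance

print(first_pc(X))        # roughly +-[1, 0]: along the x-axis
print(first_pc(X_out))    # tilted towards the outlier
```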
What is Broadcasting?
The implicit rule that stretches a smaller tensor (e.g., a vector) to match the shape of a larger one during operations.
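The cheat sheet's (4,1) + (4,4) example in NumPy:

```python
import numpy as np

col = np.ones((4, 1))                  # shape (4, 1)
M = np.arange(16).reshape(4, 4)        # shape (4, 4)

out = col + M                          # the (4, 1) column is repeated across the 4 columns
print(out.shape)                       # (4, 4)
```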
Why do we use ReLU in Neural Networks?
To introduce Non-Linearity. It folds the space, allowing the network to learn complex boundaries.
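A tiny illustration of that "folding": two ReLU units combine into a hat-shaped function that no single linear map could produce (the coefficients are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-2, 2, 9)

# relu(x) - 2*relu(x - 1) rises, then folds back down: piecewise linear, but not linear.
y = relu(x) - 2.0 * relu(x - 1.0)
print(np.round(y, 2))                  # [0. 0. 0. 0. 0. 0.5 1. 0.5 0.]
```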
SVD decomposes a matrix into which 3 components?
$U$ (Left Singular Vectors), $\Sigma$ (Singular Values/Strength), $V^T$ (Right Singular Vectors).
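In NumPy (random matrix for illustration), the three factors reconstruct $A$ exactly, and truncating $\Sigma$ gives the low-rank compression mentioned in the cheat sheet:

```python
import numpy as np

A = np.random.default_rng(1).normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)      # (6, 4) (4,) (4, 4)

A_full = U @ np.diag(s) @ Vt                       # exact reconstruction
A_rank2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]    # keep only the top-2 singular values

print(np.allclose(A, A_full))          # True
print(np.linalg.norm(A - A_rank2))     # error comes only from the dropped singular values
```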
What is the Gradient vector?
A vector pointing in the direction of steepest ascent (greatest increase of the function).
What is the Rank of a Color Image Tensor?
Rank 3 (Height, Width, Channels). Here "rank" means the number of axes (the tensor's order), not matrix rank.
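For instance (illustrative sizes):

```python
import numpy as np

image = np.zeros((480, 640, 3))        # Height x Width x Channels
print(image.ndim)                      # 3 axes -> a rank-3 (order-3) tensor
```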
What is the Manifold Hypothesis?
The idea that high-dimensional real-world data lies on a lower-dimensional "surface" (manifold) embedded within that space.
3. What’s Next?
You have mastered the algebra of transformations. Next, we move to Discrete Math & Information Theory, where we learn about Graphs, Entropy, and how to measure information itself.