Review & Cheat Sheet

[!NOTE] This module is a review: it consolidates the key results from the preceding linear algebra and calculus modules into a single quick-reference sheet.

1. Cheat Sheet: The Big Picture

We’ve moved from basic matrix operations to the core machinery of Machine Learning.

| Concept | The "One-Liner" | Key Equation / Code | Application |
|---|---|---|---|
| Eigenvector | An axis that doesn't rotate, only stretches. | `Av = λv` | PageRank, stability analysis |
| Eigenvalue | The stretch factor along the eigenvector. | `np.linalg.eig(A)` | Variance in PCA, curvature in the Hessian |
| SVD | Factoring any matrix into rotation–stretch–rotation. | `A = UΣVᵀ` | Compression, denoising, recommenders |
| PCA | Finding the best axes to project data onto. | Eigendecomposition of `Σ = (1/n)XᵀX` | Dimensionality reduction |
| Tensor | A multi-dimensional grid of numbers. | `torch.rand(3, 256, 256)` | Deep learning data structure |
| Broadcasting | Stretching smaller tensors to match larger ones. | `(4,1) + (4,4) → (4,4)` | Efficient, loop-free code |
| Jacobian | First derivatives of a vector function. | `J_ij = ∂y_i/∂x_j` | Sensitivity, backpropagation |
| Hessian | Second derivatives (curvature). | `torch.autograd.functional.hessian` | Optimization landscape (bowl vs. saddle) |
| Newton's Method | Jumping toward the minimum using curvature. | `x_new = x_old − H⁻¹∇f` | Fast second-order optimization |
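The two factorizations in the table can be checked numerically in a few lines. This is a minimal sketch (the matrix `A` is an arbitrary symmetric example chosen for illustration): it verifies that an eigenvector only stretches under `A`, and that the SVD factors multiply back to the original matrix.

```python
import numpy as np

# A small symmetric matrix, so its eigenvalues are real
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenpairs: A v = lambda v  (the axis doesn't rotate, only stretches)
eigvals, eigvecs = np.linalg.eig(A)
v, lam = eigvecs[:, 0], eigvals[0]
assert np.allclose(A @ v, lam * v)

# SVD: A = U Sigma V^T  (rotation - stretch - rotation)
U, S, Vt = np.linalg.svd(A)
assert np.allclose(A, U @ np.diag(S) @ Vt)
```

Note that `np.linalg.svd` returns `Σ` as a 1-D array of singular values, so it must be re-expanded with `np.diag` before reconstructing `A`.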

2. Interactive Flashcards

Test your recall. Each question is followed by its answer.

[!TIP] How to use: Cover the answer, attempt the question, then check yourself.

**Q: What is the geometric meaning of an Eigenvector?**
A vector that does not change direction after a linear transformation is applied (it only scales).

**Q: What does a zero Gradient with mixed Hessian eigenvalues imply?**
A saddle point: the function is a minimum in one direction and a maximum in another.

**Q: Why is PCA sensitive to outliers?**
Because it maximizes variance (squared error). A distant point has a massive squared error, pulling the axis toward it.

**Q: What is Broadcasting?**
The implicit rule that stretches a smaller tensor (e.g., a vector) to match the shape of a larger one during operations.

**Q: Why do we use ReLU in neural networks?**
To introduce non-linearity. It folds the space, allowing the network to learn complex boundaries.

**Q: SVD decomposes a matrix into which 3 components?**
U (left singular vectors), Σ (singular values), Vᵀ (right singular vectors).

**Q: What is the Gradient vector?**
A vector pointing in the direction of steepest ascent (greatest increase of the function).

**Q: What is the rank of a color image tensor?**
Rank 3, meaning three axes (Height, Width, Channels). Note that tensor "rank" here means the number of axes, not matrix rank.

**Q: What is the Manifold Hypothesis?**
The idea that high-dimensional real-world data lies on a lower-dimensional "surface" (manifold) embedded within that space.
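Two of the flashcard claims are directly checkable in PyTorch. The sketch below (the saddle function `f(x, y) = x² − y²` is my own illustrative choice, not from the module) shows a `(4,1)` tensor broadcasting against a `(4,4)` one, and a point where the gradient is zero while the Hessian has mixed-sign eigenvalues, i.e. a saddle.

```python
import torch

# Broadcasting: a (4,1) column is stretched across a (4,4) matrix
col = torch.arange(4.0).reshape(4, 1)  # shape (4, 1)
M = torch.ones(4, 4)                   # shape (4, 4)
assert (col + M).shape == (4, 4)

# Saddle point: f(x, y) = x^2 - y^2 at the origin
def f(v):
    x, y = v
    return x**2 - y**2  # curves up along x, down along y

v0 = torch.zeros(2, requires_grad=True)
grad = torch.autograd.grad(f(v0), v0)[0]          # tensor([0., 0.])
H = torch.autograd.functional.hessian(f, torch.zeros(2))
eigs = torch.linalg.eigvalsh(H)                    # one negative, one positive
assert torch.allclose(grad, torch.zeros(2))
assert eigs[0] < 0 < eigs[1]                       # mixed signs -> saddle
```

A zero gradient alone cannot distinguish a minimum from a saddle; it is the mixed signs of the Hessian eigenvalues that reveal the saddle.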

3. What’s Next?

You have mastered the algebra of transformations. Next, we move to Discrete Math & Information Theory, where we learn about Graphs, Entropy, and how to measure information itself.