Review & Cheat Sheet

[!NOTE] This module is a review: it consolidates the key results from the preceding linear algebra and calculus modules into a single quick-reference sheet.

1. Cheat Sheet: The Big Picture

We’ve moved from basic matrix operations to the core machinery of Machine Learning.

| Concept | The "One-Liner" | Key Equation / Code | Application |
|---|---|---|---|
| Eigenvector | An axis that doesn't rotate, only stretches. | `Av = λv` | PageRank, stability analysis |
| Eigenvalue | The stretch factor along the eigenvector. | `np.linalg.eig(A)` | Variance in PCA, curvature in the Hessian |
| SVD | Factoring any matrix into rotation–stretch–rotation. | `A = UΣVᵀ` | Compression, denoising, recommenders |
| PCA | Finding the best axes to project data onto. | Eigendecomposition of `Σ = (1/n)XᵀX` | Dimensionality reduction |
| Tensor | A multi-dimensional grid of numbers. | `torch.rand(3, 256, 256)` | Deep learning data structure |
| Broadcasting | Stretching smaller tensors to match larger ones. | `(4,1) + (4,4) → (4,4)` | Efficient, loop-free code |
| Jacobian | First derivatives of a vector function. | `J_ij = ∂y_i/∂x_j` | Sensitivity, backpropagation |
| Hessian | Second derivatives (curvature). | `torch.autograd.functional.hessian` | Optimization landscape (bowl vs. saddle) |
| Newton's Method | Jumping toward the minimum using curvature. | `x_new = x_old − H⁻¹∇f` | Fast second-order optimization |
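The two factorizations in the table can be checked numerically in a few lines. This is a minimal sketch (the matrix `A` is an arbitrary symmetric example chosen for illustration): it verifies that an eigenvector only stretches under `A`, and that the SVD factors multiply back to the original matrix.

```python
import numpy as np

# A small symmetric matrix, so its eigenvalues are real
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenpairs: A v = lambda v  (the axis doesn't rotate, only stretches)
eigvals, eigvecs = np.linalg.eig(A)
v, lam = eigvecs[:, 0], eigvals[0]
assert np.allclose(A @ v, lam * v)

# SVD: A = U Sigma V^T  (rotation - stretch - rotation)
U, S, Vt = np.linalg.svd(A)
assert np.allclose(A, U @ np.diag(S) @ Vt)
```

Note that `np.linalg.svd` returns `Σ` as a 1-D array of singular values, so it must be re-expanded with `np.diag` before reconstructing `A`.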

2. Interactive Flashcards

Test your recall. Each question is followed by its answer.

[!TIP] How to use: Cover the answer, attempt the question, then check yourself.

**Q: What is the geometric meaning of an Eigenvector?**
A vector that does not change direction after a linear transformation is applied (it only scales).

**Q: What does a zero Gradient with mixed Hessian eigenvalues imply?**
A saddle point: the function is a minimum in one direction and a maximum in another.

**Q: Why is PCA sensitive to outliers?**
Because it maximizes variance (squared error). A distant point has a massive squared error, pulling the axis toward it.

**Q: What is Broadcasting?**
The implicit rule that stretches a smaller tensor (e.g., a vector) to match the shape of a larger one during operations.

**Q: Why do we use ReLU in neural networks?**
To introduce non-linearity. It folds the space, allowing the network to learn complex boundaries.

**Q: SVD decomposes a matrix into which 3 components?**
U (left singular vectors), Σ (singular values), Vᵀ (right singular vectors).

**Q: What is the Gradient vector?**
A vector pointing in the direction of steepest ascent (greatest increase of the function).

**Q: What is the rank of a color image tensor?**
Rank 3, meaning three axes (Height, Width, Channels). Note that tensor "rank" here means the number of axes, not matrix rank.

**Q: What is the Manifold Hypothesis?**
The idea that high-dimensional real-world data lies on a lower-dimensional "surface" (manifold) embedded within that space.
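Two of the flashcard claims are directly checkable in PyTorch. The sketch below (the saddle function `f(x, y) = x² − y²` is my own illustrative choice, not from the module) shows a `(4,1)` tensor broadcasting against a `(4,4)` one, and a point where the gradient is zero while the Hessian has mixed-sign eigenvalues, i.e. a saddle.

```python
import torch

# Broadcasting: a (4,1) column is stretched across a (4,4) matrix
col = torch.arange(4.0).reshape(4, 1)  # shape (4, 1)
M = torch.ones(4, 4)                   # shape (4, 4)
assert (col + M).shape == (4, 4)

# Saddle point: f(x, y) = x^2 - y^2 at the origin
def f(v):
    x, y = v
    return x**2 - y**2  # curves up along x, down along y

v0 = torch.zeros(2, requires_grad=True)
grad = torch.autograd.grad(f(v0), v0)[0]          # tensor([0., 0.])
H = torch.autograd.functional.hessian(f, torch.zeros(2))
eigs = torch.linalg.eigvalsh(H)                    # one negative, one positive
assert torch.allclose(grad, torch.zeros(2))
assert eigs[0] < 0 < eigs[1]                       # mixed signs -> saddle
```

A zero gradient alone cannot distinguish a minimum from a saddle; it is the mixed signs of the Hessian eigenvalues that reveal the saddle.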

3. What’s Next?

You have mastered the algebra of transformations. Next, we move to Discrete Math & Information Theory, where we learn about Graphs, Entropy, and how to measure information itself.