Breaking it Down: Matrix Decompositions (SVD)
1. Introduction: Atoms of Data
Just as prime numbers are the building blocks of integers (e.g., 12 = 2² × 3), Matrix Decomposition finds the building blocks of matrices.
The most powerful decomposition is the Singular Value Decomposition (SVD). It works on any matrix, not just square ones. It is often called the “Swiss Army Knife” of Linear Algebra because it solves so many problems (Compression, Denoising, Prediction).
Any matrix A can be decomposed into three specific matrices, A = UΣVᵀ:
- U: Left Singular Vectors (the “Final Rotation”); its columns are orthonormal.
- Σ (Sigma): Singular Values, the stretching factors on the diagonal (σ₁ ≥ σ₂ ≥ … ≥ 0).
- Vᵀ: Right Singular Vectors (the “Initial Rotation”); the columns of V are orthonormal.
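A quick sketch with NumPy makes the decomposition concrete; the matrix here is an arbitrary example chosen for illustration:

```python
import numpy as np

# A rectangular (non-square) matrix: SVD works on any shape.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])  # shape (3, 2)

# full_matrices=False gives the compact ("economy") SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)  # (3, 2) (2,) (2, 2)
print(s)                           # singular values, sorted descending

# Reconstruct A from the three factors: A = U @ diag(s) @ Vt.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))   # True
```

Note that NumPy returns the singular values as a 1-D array `s` rather than the full diagonal matrix Σ, and returns Vᵀ directly (as `Vt`).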
2. Geometric Interpretation: Rotate, Stretch, Rotate
SVD tells us that every linear transformation can be broken down into three simple steps:
- Rotate (Vᵀ): Rotate the input space.
- Stretch (Σ): Stretch the space along the axes (by factors σᵢ).
- Rotate (U): Rotate the space again.
This means that for any matrix, a circle is transformed into an ellipse (hyper-ellipse). The singular values are the lengths of the ellipse’s axes.
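We can verify the circle-to-ellipse picture numerically: push sample points on the unit circle through a matrix and check that the longest and shortest radii of the resulting ellipse match the singular values. The 2×2 matrix below is an arbitrary example:

```python
import numpy as np

# A 2x2 matrix acting on the plane.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

# Sample points on the unit circle and push them through A.
theta = np.linspace(0, 2 * np.pi, 1000)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # shape (2, 1000)
ellipse = A @ circle

# The longest / shortest distance from the origin to the ellipse
# should match sigma_1 and sigma_2.
radii = np.linalg.norm(ellipse, axis=0)
print(radii.max(), s[0])  # both ~3.0
print(radii.min(), s[1])  # both ~1.0
```

For this symmetric matrix the singular values are exactly 3 and 1, so the unit circle becomes an ellipse with semi-axes of length 3 and 1.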
3. Interactive Visualizer: The SVD Geometric Transformer
Below, we visualize how SVD transforms a Unit Circle (Blue) into an Ellipse (Red). Control the three components independently, or press Animate to see the sequence.
4. The Power of Sums (Data Compression)
We can rewrite SVD as a sum of rank-1 matrices:

A = σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ + … + σᵣuᵣvᵣᵀ

where uᵢ and vᵢ are the columns of U and V, and r is the rank of A.
Think of this as “Layers of Information”.
- Layer 1 (σ₁): The main structure (high-contrast shapes).
- Layer 2 (σ₂): Secondary details.
- Layer 100: Tiny details (or noise).
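The "layers" view can be checked directly in code: each term σᵢuᵢvᵢᵀ is a rank-1 matrix, and summing all of them recovers the original matrix exactly (a small random matrix serves as the example here):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each layer sigma_i * u_i * v_i^T is a rank-1 matrix.
layers = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]

print(np.linalg.matrix_rank(layers[0]))  # 1
print(np.allclose(sum(layers), A))       # True: the layers sum back to A
```

Truncating this sum after the first k layers gives the best rank-k approximation of A (the Eckart–Young theorem).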
Image Compression: A grayscale image is just a matrix of pixel intensities. If we keep only the top k singular values (setting the rest to 0), we only need to store k(m + n + 1) numbers instead of all m × n pixels. For a small k this can cut storage by 90% or more, and the reconstruction will still look recognizable! This low-rank approximation is the basis of SVD-based compression.
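Here is a small sketch of the idea, using a synthetic "image" (a smooth pattern plus noise) as a stand-in for real pixel data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a 100x80 grayscale image: a smooth pattern plus noise.
base = np.outer(np.sin(np.linspace(0, 3, 100)),
                np.cos(np.linspace(0, 3, 80)))
img = base + 0.01 * rng.standard_normal((100, 80))

U, s, Vt = np.linalg.svd(img, full_matrices=False)

k = 5  # keep only the top-5 singular values ("layers")
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Storage cost: k*(m + n + 1) numbers instead of m*n pixels.
m, n = img.shape
print(k * (m + n + 1), "vs", m * n)  # 905 vs 8000

rel_err = np.linalg.norm(img - img_k) / np.linalg.norm(img)
print(rel_err)  # small: most of the image survives in 5 layers
```

Here 905 stored numbers (about 11% of the original 8000) are enough to reproduce the pattern almost perfectly, because the underlying structure is low-rank.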
5. Application: Recommender Systems (Netflix)
Netflix uses SVD to predict if you will like a movie. Imagine a huge matrix M: Rows are Users, Columns are Movies.
- Mᵢⱼ = the rating (1–5) that user i gave movie j.
- Most of the matrix is empty (you haven’t watched every movie).
SVD factorizes this into:
- Users Matrix (U): Describes users’ preferences for “Latent Features” (e.g., likes Action, hates Romance).
- Movies Matrix (V): Describes movies’ content of those features (e.g., High Action, Low Romance).
- Strengths (Σ): How important each feature is.
By multiplying these low-rank matrices back together, we can fill in the blanks (predict ratings) for movies you haven’t seen!
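A heavily simplified sketch of this idea follows, on a hypothetical 4-user × 4-movie toy matrix. Note the simplification: real recommenders fit the factors only on the observed entries (e.g., via ALS or SGD), whereas this sketch naively imputes missing ratings with each user's mean before taking a truncated SVD:

```python
import numpy as np

# Hypothetical toy ratings: rows = users, cols = movies, 0 = unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Naive imputation: fill missing entries with each user's mean rating.
mask = R > 0
filled = R.copy()
for i in range(R.shape[0]):
    filled[i, ~mask[i]] = R[i, mask[i]].mean()

# Keep a rank-2 approximation: 2 "latent features"
# (e.g., roughly "likes Action" / "likes Romance").
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(R_hat, 1))  # the former 0 entries are now predictions
```

The first two users rate the first two movies highly and the last two poorly; the low-rank model picks up that pattern and fills the unseen entries accordingly.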
6. Summary
- SVD: Factorizes A = UΣVᵀ.
- Geometry: Rotate → Stretch → Rotate.
- Singular Values: The lengths of the semi-axes of the hyper-ellipse defined by A.
- Compression: Using the top-k singular values allows us to approximate the original matrix with minimal data.