Breaking it Down: Matrix Decompositions (SVD)

[!NOTE] This module explores matrix decompositions, with a focus on the Singular Value Decomposition (SVD): what it is, how to interpret it geometrically, and how it is used for compression and recommendation.

1. Introduction: Atoms of Data

Just as prime numbers are the building blocks of integers (e.g., 12 = 2² × 3), Matrix Decomposition finds the building blocks of matrices.

The most powerful decomposition is the Singular Value Decomposition (SVD). It works on any matrix, not just square ones. It is often called the “Swiss Army Knife” of Linear Algebra because it solves so many problems (Compression, Denoising, Prediction).

Any matrix A can be decomposed into three specific matrices:

A = UΣVᵀ
  • U: Left Singular Vectors (the “Final Rotation”).
  • Σ (Sigma): Singular Values, the stretching factors on the diagonal (σ₁ ≥ σ₂ ≥ … ≥ 0).
  • Vᵀ: Right Singular Vectors, transposed (the “Initial Rotation”).
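We can verify all three claims numerically with NumPy: the singular values come back sorted, U and V are orthogonal, and multiplying the three factors back together recovers A. (The matrix values here are arbitrary, chosen only for illustration.)

```python
import numpy as np

# A small example matrix (values chosen arbitrarily for illustration)
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# np.linalg.svd returns U, the singular values as a 1-D array, and V^T
U, S, Vt = np.linalg.svd(A)

# Singular values are sorted in descending order
print(S)  # [4. 2.] for this symmetric example

# U and V are orthogonal: U^T U = I and V V^T = I
print(np.allclose(U.T @ U, np.eye(2)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(2)))  # True

# Reconstructing A: U @ diag(S) @ V^T recovers the original matrix
print(np.allclose(U @ np.diag(S) @ Vt, A))  # True
```

Note that `np.linalg.svd` returns Σ as a flat array of singular values, not a matrix; use `np.diag(S)` when you need the matrix form.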

2. Geometric Interpretation: Rotate, Stretch, Rotate

The SVD says that every linear transformation can be broken down into three simple steps:

  1. Rotate (Vᵀ): Rotate (or reflect) the input space.
  2. Stretch (Σ): Stretch the space along the axes (by factors σᵢ).
  3. Rotate (U): Rotate (or reflect) the space again to the final orientation.

This means that for any matrix, the unit circle (or hypersphere, in higher dimensions) is transformed into an ellipse (hyper-ellipse). The singular values are the lengths of the ellipse’s semi-axes.
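The three-step pipeline is easy to check numerically: applying Vᵀ, then Σ, then U to points on the unit circle gives exactly the same result as applying A directly. (The matrix below is an arbitrary example.)

```python
import numpy as np

# An arbitrary 2x2 transformation for illustration
A = np.array([[2.0, 1.0],
              [0.0, 1.5]])
U, S, Vt = np.linalg.svd(A)

# Sample points on the unit circle, stored as columns
theta = np.linspace(0, 2 * np.pi, 100)
circle = np.vstack([np.cos(theta), np.sin(theta)])

step1 = Vt @ circle          # 1. rotate/reflect the input space
step2 = np.diag(S) @ step1   # 2. stretch along the axes by sigma_i
ellipse = U @ step2          # 3. rotate/reflect to the final orientation

# The composition matches applying A in one shot
print(np.allclose(ellipse, A @ circle))  # True

# The semi-axis lengths of the resulting ellipse are the singular values
print(np.round(S, 3))
```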


3. Interactive Visualizer: The SVD Geometric Transformer

Below, we visualize how SVD transforms a Unit Circle (Blue) into an Ellipse (Red).

[!TIP] Try it yourself: Adjust the sliders to see each step individually. Click “Animate Sequence” to watch the full transformation flow.


4. The Power of Sums (Data Compression)

We can rewrite SVD as a sum of rank-1 matrices:

A = σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ + … + σᵣuᵣvᵣᵀ

Think of this as “Layers of Information”.

  • Layer 1 (σ₁): The main structure (high-contrast shapes).
  • Layer 2 (σ₂): Secondary details.
  • Layer 100 (σ₁₀₀): Tiny details (or noise).
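The layer view can be verified directly: each layer is the rank-1 outer product σᵢuᵢvᵢᵀ, and summing all of them rebuilds the matrix exactly. (The matrix here is an arbitrary full-rank example.)

```python
import numpy as np

# An arbitrary small matrix for illustration
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0],
              [0.0, 1.0, 3.0]])
U, S, Vt = np.linalg.svd(A)

# Each "layer" is a rank-1 matrix: sigma_i * outer(u_i, v_i)
layers = [S[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(S))]

# Summing all layers reconstructs A exactly
print(np.allclose(sum(layers), A))  # True

# The reconstruction error after keeping only the first layer
print(np.linalg.norm(A - layers[0]))
```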

Image Compression: A grayscale image is just a matrix of pixel values. If we keep only the largest k singular values (setting the rest to 0), we only need to store the corresponding k columns of U, k rows of Vᵀ, and the k values of Σ, often a small fraction of the original data, and the reconstructed image still looks recognizable. This is the idea behind low-rank compression.

```python
import numpy as np

# 1. Create a matrix A (e.g., User-Movie Ratings or Image Pixels)
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2]
])

# 2. Perform SVD
U, S, Vt = np.linalg.svd(A)

print("Singular Values (Sigma):")
print(np.round(S, 2))
# Output: [12.48  9.51  1.35  0.    0.  ]

# Notice how quickly they drop off!
# The first two singular values capture almost all the information.

# 3. Reconstruct with Top-2 Singular Values (Low Rank Approximation)
k = 2
S_k = np.zeros((k, k))
np.fill_diagonal(S_k, S[:k])  # Diagonal matrix of top-k singular values

# A_approx = U_k @ S_k @ Vt_k
A_approx = U[:, :k] @ S_k @ Vt[:k, :]

print("\nOriginal Matrix (Top Left 3x3):")
print(A[:3, :3])

print("\nApproximated Matrix (Top Left 3x3):")
print(np.round(A_approx[:3, :3], 1))
# Output is very close to the original!
```
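To make the savings concrete: a rank-k approximation of an m×n matrix needs only k(m + n + 1) stored numbers (k columns of U, k rows of Vᵀ, and k singular values) instead of m·n. A quick sketch, using a hypothetical Full-HD grayscale image as the example size:

```python
# Storage cost of a rank-k approximation of an m x n matrix:
# k columns of U (m numbers each), k singular values,
# and k rows of V^T (n numbers each).
def svd_storage(m, n, k):
    return k * (m + n + 1)

m, n = 1080, 1920                   # hypothetical Full-HD grayscale image
full = m * n                        # 2,073,600 numbers
k = 50                              # keep only the top-50 layers
compressed = svd_storage(m, n, k)   # 150,050 numbers

print(full, compressed, round(compressed / full, 3))
```

With these numbers the rank-50 version stores roughly 7% of the original data; how small k can go while staying "recognizable" depends entirely on the image.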

5. Application: Recommender Systems (Netflix)

Netflix uses SVD to predict if you will like a movie. Imagine a huge matrix M: Rows are Users, Columns are Movies.

  • Mᵢⱼ = the rating (1–5) that user i gave movie j.
  • Most of the matrix is empty (you haven’t watched every movie).

SVD factorizes this into:

  1. Users Matrix (U): Describes users’ preferences for “Latent Features” (e.g., likes Action, hates Romance).
  2. Movies Matrix (V): Describes movies’ content of those features (e.g., High Action, Low Romance).
  3. Strengths (Σ): How important each feature is.

By multiplying these low-rank matrices back together, we can fill in the blanks (predict ratings) for movies you haven’t seen!
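A minimal sketch of this idea on a toy rating matrix, assuming unseen entries are stored as 0 and a rank-2 model (a real recommender handles missing entries more carefully, e.g. with regularized matrix factorization rather than a plain SVD):

```python
import numpy as np

# Toy user-movie ratings; 0 means "not rated yet"
M = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

U, S, Vt = np.linalg.svd(M)

# Keep the top-2 latent features ("taste dimensions")
k = 2
M_hat = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Reconstructed values at the zero positions act as predicted ratings:
# e.g. M_hat[0, 2] is the model's guess for user 0's rating of movie 2
print(np.round(M_hat, 2))
```

Here users 0–1 and users 2–3 form two taste clusters, so the rank-2 reconstruction predicts high ratings within a cluster and low ratings across clusters.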


6. Summary

  • SVD: Factorizes A = UΣVᵀ.
  • Geometry: Rotate → Stretch → Rotate.
  • Singular Values: The lengths of the semi-axes of the hyper-ellipse defined by A.
  • Compression: Using the top-k singular values allows us to approximate the original matrix with minimal data.