Covariance Matrices
When dealing with multiple variables, variance becomes a matrix. The Covariance Matrix Σ (Sigma) captures both the spread of individual variables and their relationships with each other.
It is the fundamental building block of Multivariate Calculus, Machine Learning (PCA, Gaussian Mixtures), and Portfolio Theory.
Analogy: Think of Variance as the “spread” of a single stock’s price. A Covariance Matrix is the “relationship map” of an entire stock portfolio. It tells you not just how volatile each stock is, but how they move together (e.g., if Tech goes down, does Energy go up?).
PEDALS Case Study: Real Estate Market
- Process Requirements: We want to model house prices based on
SquareFootageandDistanceToCenter. - Estimate: Both features vary wildly. They likely have a negative covariance (as distance increases, footage might increase slightly, but price decreases).
- Data Model: We represent these features as a 2D random vector X = [SquareFootage, DistanceToCenter]T.
- Architecture: The Covariance Matrix Σ stores Var(SquareFootage), Var(DistanceToCenter), and Cov(SquareFootage, DistanceToCenter).
- Localized Details: A negative off-diagonal element tells us that houses further out tend to be cheaper, adjusting our multidimensional Gaussian contours to be tilted.
- Scale: In high dimensions (e.g., thousands of features), covariance matrices become too large for memory (O(N2)). We use low-rank approximations or PCA.
1. Definition
For a random vector X = [X1, X2, …, Xn]T, the covariance matrix Σ is an n × n matrix where the element at (i, j) is the covariance between Xi and Xj.
For two variables X and Y:
Key Properties
- Symmetric: Σij = Σji (since Cov(X, Y) = Cov(Y, X)).
- Positive Semi-Definite (PSD): vT Σ v ≥ 0 for any vector v. This ensures variance is never negative along any projection.
- Eigen-Decomposition: The eigenvectors of Σ point in the directions of greatest variance (Principal Components). The eigenvalues represent the magnitude of variance in those directions. Imagine stretching a circular rubber sheet: the eigenvectors are the directions in which you stretch, and the eigenvalues are how far you pull in each direction.
2. Interactive: Covariance Playground
Visualize how changing variances and correlation affects the shape of a Bivariate Gaussian distribution.
- Sliders: Adjust σX, σY, and Correlation ρ.
- Blue Ellipse: Represents the 95% confidence region.
- Red Arrows: The eigenvectors (Principal Axes).
3. Hardware Reality: Memory Layout Matters
Mathematically, a matrix is just a grid of numbers. To a CPU, it is a linear strip of memory.
- Row-Major (C/Java/Python):
[[1, 2], [3, 4]]is stored as1, 2, 3, 4. - Column-Major (Fortran/MATLAB): Stored as
1, 3, 2, 4.
When computing the covariance matrix from raw data X (N samples x D dimensions), we essentially compute XT X. This involves dot products of columns.
If your data is stored Row-Major (N rows of D features), accessing a “column” (feature) requires skipping over D-1 elements for every read. This causes Cache Thrashing (fetching a whole cache line but only using one number). Think of reading a book column-by-column (skipping lines) instead of left-to-right: it’s incredibly slow and inefficient for your brain (the CPU).
Optimization: High-performance libraries (BLAS/LAPACK) often transpose the matrix first or use Block Matrix Multiplication to keep data in the CPU cache (L1/L2), speeding up covariance calculation by 10x-100x.
4. Coding Example
We can calculate the covariance matrix from scratch. The formula for sample covariance is: Cov(X, Y) = Σ (xi - x̄)(yi - ȳ) / (N - 1)
Java
import java.util.Arrays;
public class CovarianceMatrix {
public static void main(String[] args) {
// Data: 3 samples, 2 features (X, Y)
double[][] data = {
{1.0, 2.0},
{2.0, 3.0},
{3.0, 5.0}
};
double[][] cov = calculateCovariance(data);
System.out.println(Arrays.deepToString(cov));
}
public static double[][] calculateCovariance(double[][] data) {
int n = data.length;
int d = data[0].length;
// 1. Calculate Means
double[] means = new double[d];
for (double[] row : data) {
for (int j = 0; j < d; j++) {
means[j] += row[j];
}
}
for (int j = 0; j < d; j++) means[j] /= n;
// 2. Calculate Covariance Matrix (d x d)
double[][] cov = new double[d][d];
for (int j = 0; j < d; j++) {
for (int k = j; k < d; k++) { // Symmetric, so start k from j
double sum = 0;
for (int i = 0; i < n; i++) {
sum += (data[i][j] - means[j]) * (data[i][k] - means[k]);
}
double val = sum / (n - 1); // Sample covariance (unbiased)
cov[j][k] = val;
cov[k][j] = val;
}
}
return cov;
}
}
Go
package main
import (
"fmt"
)
func main() {
// Data: 3 samples, 2 features (X, Y)
data := [][]float64{
{1.0, 2.0},
{2.0, 3.0},
{3.0, 5.0},
}
cov := calculateCovariance(data)
fmt.Println(cov)
}
func calculateCovariance(data [][]float64) [][]float64 {
n := len(data)
d := len(data[0])
// 1. Calculate Means
means := make([]float64, d)
for _, row := range data {
for j, val := range row {
means[j] += val
}
}
for j := range means {
means[j] /= float64(n)
}
// 2. Calculate Covariance Matrix (d x d)
cov := make([][]float64, d)
for i := range cov {
cov[i] = make([]float64, d)
}
for j := 0; j < d; j++ {
for k := j; k < d; k++ {
sum := 0.0
for i := 0; i < n; i++ {
sum += (data[i][j] - means[j]) * (data[i][k] - means[k])
}
val := sum / float64(n-1)
cov[j][k] = val
cov[k][j] = val
}
}
return cov
}
Python
import numpy as np
# 1. Generate correlated data
# X ~ N(0, 1)
x = np.random.normal(0, 1, 1000)
# Y = 0.8*X + Noise
y = 0.8 * x + np.random.normal(0, 0.6, 1000)
# Stack into a (2, 1000) matrix
data = np.vstack([x, y])
# 2. Compute Covariance Matrix
# bias=True normalizes by N instead of N-1
sigma = np.cov(data, bias=True)
print("Covariance Matrix:")
print(sigma)
5. Summary
- Covariance Matrix: Summary of spread and linear relation between multiple variables.
- PSD Property: Ensures volumes (determinants) are non-negative and variance is real.
- Hardware: Matrix layout impacts performance significantly due to cache lines.