Review: Multivariate Probability

This module covered how to handle multiple random variables simultaneously, from their joint behavior to their individual marginals and the structures that connect them (conditional independence).

Key Takeaways

Curse of Dimensionality: A full joint probability table over N variables with K values each has K^N entries, so it grows exponentially in N. Factorizing the joint using conditional independence keeps storage feasible.
Marginalization: Recovering the distribution of a single variable P(X) by summing/integrating out others from the Joint P(X, Y).
Naive Bayes: A classifier that assumes the features are conditionally independent given the class, which reduces the number of parameters from exponential to linear in the number of features.
Covariance Matrix: A symmetric, positive semi-definite matrix that summarizes the spread and direction of linear relationships between variables.
Hardware Reality: Matrix memory layout (row-major vs. column-major) can strongly affect the performance of covariance calculations because of CPU cache locality.
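To make the first and third takeaways concrete, here is a minimal sketch that compares the size of a full joint table with the parameter count of a Naive Bayes factorization. The function names and the counting convention (free parameters, with the class prior included) are assumptions chosen for illustration.

```python
def full_joint_entries(k: int, n: int) -> int:
    """Entries in a full joint table over n variables with k values each."""
    return k ** n


def naive_bayes_parameters(k: int, n: int, num_classes: int) -> int:
    """Free parameters of a Naive Bayes model with n features of k values each.

    Each class needs (k - 1) free probabilities per feature, and the class
    prior adds (num_classes - 1) more.
    """
    return num_classes * n * (k - 1) + (num_classes - 1)


# Binary features, binary class: exponential vs. linear growth in n.
for n in (5, 10, 20):
    print(n, full_joint_entries(2, n), naive_bayes_parameters(2, n, num_classes=2))
# 5 32 11
# 10 1024 21
# 20 1048576 41
```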

1. Interactive Flashcards

Test your understanding of the core concepts.

Marginal Probability P(X)

How do you calculate P(X) from the Joint Distribution P(X, Y)?

Sum Rule

Sum (or integrate) the joint probability over all possible values of Y:

P(X) = Σ_y P(X, Y)
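As a small sketch of the sum rule, the snippet below marginalizes a made-up joint table. The weather/temperature values are hypothetical, and exact fractions are used only to avoid floating-point noise.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution P(X, Y); keys are (x, y) pairs.
joint = {
    ("sunny", "hot"): Fraction(4, 10),
    ("sunny", "cold"): Fraction(1, 10),
    ("rainy", "hot"): Fraction(1, 10),
    ("rainy", "cold"): Fraction(4, 10),
}


def marginal_x(joint_table):
    """Sum rule: P(X = x) is the sum over y of P(X = x, Y = y)."""
    p_x = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for (x, _y), p in joint_table.items():
        p_x[x] += p
    return dict(p_x)


p_x = marginal_x(joint)
print(p_x["sunny"], p_x["rainy"])  # 1/2 1/2
```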

Conditional Independence

What does it mean for X and Y to be conditionally independent given Z?

(X ⊥ Y | Z)

It means that once you know Z, knowing Y gives no additional information about X.

P(X | Y, Z) = P(X | Z)
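The identity can be checked numerically. The sketch below builds a hypothetical binary model as P(Z) P(X | Z) P(Y | Z), so conditional independence holds by construction, and then confirms that X and Y are still dependent once Z is summed out. All probabilities and helper names are illustrative assumptions.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical binary model, indexed as table[z][value].
p_z = {0: F(1, 2), 1: F(1, 2)}
p_x_given_z = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(2, 10), 1: F(8, 10)}}
p_y_given_z = {0: {0: F(7, 10), 1: F(3, 10)}, 1: {0: F(1, 10), 1: F(9, 10)}}

# Joint P(X, Y, Z) assembled from the factorization.
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Sum the joint over every entry consistent with the fixed values."""
    return sum(
        p
        for (x, y, z), p in joint.items()
        if all({"x": x, "y": y, "z": z}[name] == v for name, v in fixed.items())
    )


def cond_independent():
    """Check P(X | Y, Z) == P(X | Z) for every assignment."""
    return all(
        prob(x=x, y=y, z=z) / prob(y=y, z=z) == prob(x=x, z=z) / prob(z=z)
        for x, y, z in product((0, 1), repeat=3)
    )


def marg_independent():
    """Check P(X, Y) == P(X) P(Y) for every assignment."""
    return all(
        prob(x=x, y=y) == prob(x=x) * prob(y=y)
        for x, y in product((0, 1), repeat=2)
    )


print(cond_independent())  # True
print(marg_independent())  # False
```

Conditional independence given Z does not imply marginal independence: here Z acts as a common cause that couples X and Y.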

Covariance Matrix Σ

What do the diagonal and off-diagonal elements represent?

Variances & Covariances

Diagonal: Variances of individual variables (σ²). Off-Diagonal: Covariances between pairs (σ_XY).
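A minimal sketch of how the matrix is assembled from data, assuming each row is one observation and using the population convention (divide by n), which matches the expectation-based definition. The height/weight data are made up for illustration.

```python
def covariance_matrix(rows):
    """Population covariance matrix; rows are observations, columns are variables."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / n
            for j in range(d)
        ]
        for i in range(d)
    ]


# Hypothetical data: columns are (height_cm, weight_kg).
data = [[150.0, 50.0], [160.0, 60.0], [170.0, 65.0], [180.0, 85.0]]
sigma = covariance_matrix(data)

print(sigma[0][0], sigma[1][1])  # variances on the diagonal: 125.0 162.5
print(sigma[0][1], sigma[1][0])  # covariance, symmetric: 137.5 137.5
```

Many libraries divide by n - 1 by default (the sample covariance), so results can differ slightly from this sketch.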

Positive Semi-Definite (PSD)

Why must a Covariance Matrix be PSD?

Non-Negative Variance

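Because the variance of any linear combination of the variables cannot be negative: Var(aX + bY) ≥ 0 for all a and b. Equivalently, every eigenvalue of Σ is real and non-negative. Geometrically, the ellipsoid that describes the spread of the distribution has real, non-negative axis lengths (proportional to the square roots of the eigenvalues).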
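The link between variance and the PSD condition can be written out in one short derivation. The coefficient vector c (for two variables, c = (a, b)^T) is notation introduced here for illustration; X is the random vector with mean μ and covariance Σ.

```latex
\begin{aligned}
\operatorname{Var}(c^{T} X)
  &= E\!\left[\big(c^{T}(X - \mu)\big)^{2}\right] \\
  &= E\!\left[c^{T}(X - \mu)(X - \mu)^{T} c\right] \\
  &= c^{T}\, E\!\left[(X - \mu)(X - \mu)^{T}\right] c \\
  &= c^{T} \Sigma\, c \;\ge\; 0 .
\end{aligned}
```

Since this holds for every c, Σ satisfies the definition of a positive semi-definite matrix.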

Row-Major vs Column-Major

Why does matrix memory layout matter for Covariance calculation?

Cache Locality

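In a row-major layout (C arrays; each row of a Java 2D array), the elements of a row sit next to each other in memory, so walking along a row reads memory sequentially and uses the CPU cache efficiently. Walking down a column jumps a full row length between elements, which causes cache misses and can slow dot products significantly. In a column-major layout (e.g., Fortran), the roles are reversed.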
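The difference comes down to how a 2D index maps to a flat memory offset. The sketch below computes those offsets for both layouts; it illustrates the addressing only and does not measure cache behavior. The function names are assumptions.

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Flat offset of element (i, j) when rows are stored contiguously."""
    return i * n_cols + j


def col_major_offset(i, j, n_rows, n_cols):
    """Flat offset of element (i, j) when columns are stored contiguously."""
    return j * n_rows + i


n_rows, n_cols = 3, 4

# Walk along row 1 under each layout.
row_walk_rm = [row_major_offset(1, j, n_rows, n_cols) for j in range(n_cols)]
row_walk_cm = [col_major_offset(1, j, n_rows, n_cols) for j in range(n_cols)]

print(row_walk_rm)  # [4, 5, 6, 7]   consecutive offsets, stride 1
print(row_walk_cm)  # [1, 4, 7, 10]  stride of n_rows between elements
```

With stride 1, consecutive elements tend to share a cache line; with a large stride, each access may touch a different line.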

2. Cheat Sheet

Concept	Formula / Definition
Joint Probability	P(X, Y)
Marginal Probability	P(X) = Σ_y P(X, Y)
Conditional Probability	P(X | Y) = P(X, Y) / P(Y)
Independence	P(X, Y) = P(X)P(Y)
Cond. Independence	P(X, Y | Z) = P(X | Z)P(Y | Z)
Expectation (Linearity)	E[aX + bY] = aE[X] + bE[Y]
Covariance	Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]
Correlation	ρ_XY = Cov(X, Y) / (σ_X σ_Y)
Variance Sum Law	Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
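The covariance, correlation, and variance sum law rows can be checked on a tiny dataset. This is a sketch that treats four made-up paired samples as the whole population (divide by n) and uses exact fractions so the identity holds without rounding.

```python
import math
from fractions import Fraction


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Population covariance: E[(X - mu_X)(Y - mu_Y)]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def var(xs):
    """Variance is the covariance of a variable with itself."""
    return cov(xs, xs)


# Hypothetical paired samples.
xs = [Fraction(v) for v in (1, 2, 3, 4)]
ys = [Fraction(v) for v in (2, 1, 4, 5)]
sums = [x + y for x, y in zip(xs, ys)]

lhs = var(sums)
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(lhs, rhs)  # 27/4 27/4

# Correlation rescales the covariance into the range [-1, 1].
rho = float(cov(xs, ys)) / math.sqrt(var(xs) * var(ys))
print(-1.0 <= rho <= 1.0)  # True
```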

Matrix Operations

Operation	Result
Covariance Matrix	Σ = E[(X - μ)(X - μ)^T]
Transformation Y = AX	Σ_Y = A Σ_X A^T
Mahalanobis Distance	d² = (x - μ)^T Σ⁻¹ (x - μ)
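The last two rows can be sketched for the 2×2 case with plain lists. The matrices, the point x, and the helper names below are hypothetical; the closed-form inverse only applies to 2×2 matrices with a non-zero determinant.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse_2x2(m):
    """Closed-form inverse; assumes a 2x2 matrix with non-zero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_squared(x, mu, sigma):
    """d^2 = (x - mu)^T Sigma^{-1} (x - mu) for 2-dimensional x."""
    diff = [[xi - mi] for xi, mi in zip(x, mu)]  # column vector
    return matmul(matmul(transpose(diff), inverse_2x2(sigma)), diff)[0][0]


sigma_x = [[F(4), F(0)], [F(0), F(1)]]
A = [[F(1), F(1)], [F(0), F(2)]]

# Covariance after the linear map Y = AX.
sigma_y = matmul(matmul(A, sigma_x), transpose(A))
print([[int(v) for v in row] for row in sigma_y])  # [[5, 2], [2, 4]]

# The squared Euclidean distance of (2, 1) from the origin is 5, but the
# Mahalanobis distance discounts the high-variance first axis.
d2 = mahalanobis_squared([F(2), F(1)], [F(0), F(0)], sigma_x)
print(d2)  # 2
```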
