Review: Multivariate Probability

This module covered how to handle multiple random variables simultaneously, from their joint behavior to their individual marginals and the structures that connect them (conditional independence).

Key Takeaways

  • Curse of Dimensionality: A full joint probability table over N variables with K values each has K^N entries, so it grows exponentially with N. Factorization via conditional independence is what makes storage feasible.
  • Marginalization: Recovering the distribution of a single variable P(X) by summing/integrating out others from the Joint P(X, Y).
  • Naive Bayes: An algorithm that assumes the features are conditionally independent given the class, which reduces the number of model parameters from exponential to linear in the number of features.
  • Covariance Matrix: A symmetric, positive semi-definite matrix that summarizes the spread and direction of linear relationships between variables.
  • Hardware Reality: Matrix memory layout (Row-major vs Column-major) critically impacts the performance of covariance calculations due to CPU cache locality.
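To make the exponential-versus-linear contrast in the takeaways concrete, the sketch below counts free parameters for a full joint table and for a Naive Bayes factorization. The function names and the setup (N features with K values each, plus one class variable) are illustrative assumptions, not something defined in this module.

```python
def full_joint_params(k: int, n: int, num_classes: int) -> int:
    """Free parameters in a full joint table over the class and n features.

    The table has num_classes * k**n entries; they must sum to 1,
    so one entry is determined by the rest.
    """
    return num_classes * k ** n - 1


def naive_bayes_params(k: int, n: int, num_classes: int) -> int:
    """Free parameters when features are conditionally independent given the class."""
    prior = num_classes - 1                  # P(class)
    likelihoods = num_classes * n * (k - 1)  # P(feature_i | class) for each feature
    return prior + likelihoods


# Binary features, binary class: the joint table explodes, Naive Bayes does not.
for n in (5, 10, 20):
    print(n, full_joint_params(2, n, 2), naive_bayes_params(2, n, 2))
```

With 20 binary features the full table needs about two million parameters, while the factorized model needs 41.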

1. Interactive Flashcards

Test your understanding of the core concepts.

Marginal Probability P(X)

How do you calculate P(X) from the Joint Distribution P(X, Y)?

Sum Rule

Sum (or integrate) the joint probability over all possible values of Y:

P(X) = Σ_y P(X, Y)
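A minimal sketch of the sum rule on a small discrete table. The weather-style joint distribution and the helper name marginal_x are made up for illustration.

```python
from collections import defaultdict

# Hypothetical joint distribution P(X, Y), keyed by (x, y). Values sum to 1.
joint = {
    ("sunny", "hot"): 0.4,
    ("sunny", "cold"): 0.2,
    ("rainy", "hot"): 0.1,
    ("rainy", "cold"): 0.3,
}


def marginal_x(joint_table):
    """Sum the joint over every value of Y to recover P(X)."""
    p_x = defaultdict(float)
    for (x, _y), p in joint_table.items():
        p_x[x] += p
    return dict(p_x)


p_x = marginal_x(joint)
print(p_x)  # P(sunny) is about 0.6 and P(rainy) about 0.4, up to float rounding
```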

Conditional Independence

What does it mean for X and Y to be conditionally independent given Z?

(X ⊥ Y | Z)

It means that once you know Z, knowing Y gives no additional information about X.

P(X | Y, Z) = P(X | Z)
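The identity above can be checked numerically. The sketch below builds a joint P(X, Y, Z) that is conditionally independent by construction and confirms that conditioning on Y changes nothing once Z is known. All probabilities and function names are assumed for the example.

```python
from itertools import product

# Hypothetical building blocks: P(Z), P(X | Z), P(Y | Z) for binary variables.
p_z = {0: 0.3, 1: 0.7}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [z][x]
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}  # indexed [z][y]

# Joint built as P(Z) P(X | Z) P(Y | Z), so X and Y are independent given Z.
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}


def cond_x_given_yz(x, y, z):
    """P(X = x | Y = y, Z = z), computed from the joint table."""
    denom = sum(joint[(xx, y, z)] for xx in (0, 1))
    return joint[(x, y, z)] / denom


def cond_x_given_z(x, z):
    """P(X = x | Z = z), computed by summing Y out of the joint table."""
    num = sum(joint[(x, yy, z)] for yy in (0, 1))
    denom = sum(joint[(xx, yy, z)] for xx in (0, 1) for yy in (0, 1))
    return num / denom


def is_cond_independent(tol=1e-12):
    """True if P(X | Y, Z) matches P(X | Z) for every assignment."""
    return all(
        abs(cond_x_given_yz(x, y, z) - cond_x_given_z(x, z)) < tol
        for x, y, z in product((0, 1), repeat=3)
    )


print(is_cond_independent())
```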

Covariance Matrix Σ

What do the diagonal and off-diagonal elements represent?

Variances & Covariances

Diagonal: Variances of individual variables (σ²). Off-Diagonal: Covariances between pairs (σ_XY).
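As a small illustration of where the diagonal and off-diagonal entries come from, here is a sketch that computes a covariance matrix in plain Python. It assumes the sample estimator (dividing by n - 1) and a toy dataset. Neither comes from this module.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a dataset given as a list of observations.

    Each row is one observation; each column is one variable.
    Divides by n - 1 (sample estimator), which is an assumption of this sketch.
    """
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            cov[i][j] = sum(
                (r[i] - means[i]) * (r[j] - means[j]) for r in rows
            ) / (n - 1)
    return cov


# Toy data: the second variable is exactly twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
sigma = covariance_matrix(data)
print(sigma)  # [[1.0, 2.0], [2.0, 4.0]]
```

The diagonal holds the two variances (1 and 4), and the symmetric off-diagonal entry holds their covariance (2).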

Positive Semi-Definite (PSD)

Why must a Covariance Matrix be PSD?

Non-Negative Variance

It ensures that the variance of any linear combination of the variables is non-negative (Var(aX + bY) ≥ 0). Geometrically, the eigenvalues of Σ are real and non-negative, so the principal axes of the distribution have real, non-negative lengths.
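The link between PSD and non-negative variance can be written out in one line. In the sketch below, a is an arbitrary fixed vector of coefficients and X is the random vector with mean μ and covariance Σ; the notation is chosen to match the cheat sheet.

```latex
\operatorname{Var}(a^{\top} X)
  = \mathbb{E}\!\left[\big(a^{\top}(X - \mu)\big)^{2}\right]
  = a^{\top}\,\mathbb{E}\!\left[(X - \mu)(X - \mu)^{\top}\right] a
  = a^{\top} \Sigma\, a \;\ge\; 0
  \quad \text{for all } a .
```

The final inequality holds because the expectation of a squared quantity cannot be negative, and it is exactly the definition of Σ being positive semi-definite.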

Row-Major vs Column-Major

Why does matrix memory layout matter for Covariance calculation?

Cache Locality

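Accessing memory sequentially uses the CPU cache efficiently. In a row-major layout (as in C, and similarly for Java's arrays of row arrays), walking along a row touches adjacent addresses, while walking down a column jumps a full row length on every step. Those jumps cause cache misses and can slow down dot products significantly.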
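The sketch below shows only the address arithmetic behind this, not a benchmark. The helper names are hypothetical, and offsets are counted in elements rather than bytes.

```python
def row_major_offset(i, j, n_cols):
    """Flat offset of element (i, j) when rows are stored one after another."""
    return i * n_cols + j


def col_major_offset(i, j, n_rows):
    """Flat offset of element (i, j) when columns are stored one after another."""
    return j * n_rows + i


n_rows, n_cols = 3, 4

# Row-major storage: walking along row 1 touches adjacent offsets.
row_walk = [row_major_offset(1, j, n_cols) for j in range(n_cols)]
# Row-major storage: walking down column 1 jumps n_cols elements per step.
col_walk = [row_major_offset(i, 1, n_cols) for i in range(n_rows)]
# Column-major storage: the same column walk is contiguous again.
col_walk_cm = [col_major_offset(i, 1, n_rows) for i in range(n_rows)]

print(row_walk)     # [4, 5, 6, 7]
print(col_walk)     # [1, 5, 9]
print(col_walk_cm)  # [3, 4, 5]
```

The stride of 4 in the second walk is what turns into cache misses once a row is longer than a cache line.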


2. Cheat Sheet

Concept | Formula / Definition
Joint Probability | P(X, Y)
Marginal Probability | P(X) = Σ_y P(X, Y)
Conditional Probability | P(X | Y) = P(X, Y) / P(Y)
Independence | P(X, Y) = P(X) P(Y)
Conditional Independence | P(X, Y | Z) = P(X | Z) P(Y | Z)
Expectation (Linearity) | E[aX + bY] = a E[X] + b E[Y]
Covariance | Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]
Correlation | ρ_XY = Cov(X, Y) / (σ_X σ_Y)
Variance Sum Law | Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
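The covariance, correlation, and variance sum law entries can be sanity-checked on a few numbers. This sketch uses population formulas (dividing by n) to mirror the expectation-based definitions; the data and helper names are invented for the example.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Population covariance: the average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
s = [a + b for a, b in zip(x, y)]  # samples of X + Y

var_sum = cov(s, s)                                 # Var(X + Y)
rhs = cov(x, x) + cov(y, y) + 2 * cov(x, y)         # Var(X) + Var(Y) + 2 Cov(X, Y)
rho = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))  # correlation

print(var_sum, rhs, rho)
```

Both sides of the variance sum law come out to 4 here, and the correlation is 0.6.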

Matrix Operations

Operation | Result
Covariance Matrix | Σ = E[(X - μ)(X - μ)^T]
Linear Transformation Y = AX | Σ_Y = A Σ_X A^T
Mahalanobis Distance (squared) | d² = (x - μ)^T Σ^(-1) (x - μ)
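A short sketch of the last two operations for the 2×2 case, in plain Python. The matrices, the point x, and every helper name are assumptions made for illustration. The closed-form 2×2 inverse keeps the example free of external libraries.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu) in 2D."""
    diff = [[xi - mi] for xi, mi in zip(x, mu)]  # column vector
    return matmul(matmul(transpose(diff), inverse_2x2(sigma)), diff)[0][0]


sigma_x = [[2.0, 0.0], [0.0, 0.5]]  # hypothetical covariance of X
a = [[1.0, 1.0], [0.0, 1.0]]        # hypothetical linear map, Y = AX

sigma_y = matmul(matmul(a, sigma_x), transpose(a))
print(sigma_y)  # [[2.5, 0.5], [0.5, 0.5]]

d2 = mahalanobis_sq([2.0, 1.0], [0.0, 0.0], sigma_x)
print(d2)  # 4.0
```

The transformed covariance stays symmetric, and the distance weights each coordinate by the inverse of its variance: 2²/2 + 1²/0.5 = 4.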
