Review: Multivariate Probability
This module covered how to handle multiple random variables simultaneously, from their joint behavior to their individual marginals and the structures that connect them (conditional independence).
Key Takeaways
- Curse of Dimensionality: A full joint probability table over N variables, each taking K values, grows exponentially (K^N entries). We must use factorization (conditional independence) to make storage feasible.
- Marginalization: Recovering the distribution of a single variable P(X) by summing/integrating out others from the Joint P(X, Y).
- Naive Bayes: A classifier that assumes features are conditionally independent given the class, reducing the parameter count from exponential to linear in the number of features.
- Covariance Matrix: A symmetric, positive semi-definite matrix that summarizes the spread and direction of linear relationships between variables.
- Hardware Reality: Matrix memory layout (Row-major vs Column-major) critically impacts the performance of covariance calculations due to CPU cache locality.
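The storage argument in the first and third bullets can be sketched numerically. A minimal illustration (the function names and K = 2 binary features are my own choices, not from the module):

```python
# Sketch: storage cost of a full joint table vs. a Naive Bayes
# factorization, for N features with K values each and C classes.
def joint_table_size(n_features, n_values=2):
    # Full joint: one entry per configuration -> K^N growth
    return n_values ** n_features

def naive_bayes_size(n_features, n_values=2, n_classes=2):
    # Class prior + one conditional table per feature: linear in N
    return n_classes + n_classes * n_features * n_values

print(joint_table_size(30))   # 2**30 = 1073741824 entries
print(naive_bayes_size(30))   # 2 + 2*30*2 = 122 entries
```

Thirty binary features already push the full joint past a billion entries, while the factorized model stays in the hundreds.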
1. Interactive Flashcards
Test your understanding of the core concepts.
Marginal Probability P(X)
How do you calculate P(X) from the Joint Distribution P(X, Y)?
Sum Rule
Sum (or integrate) the joint probability over all possible values of Y:
P(X) = Σ_y P(X, Y = y)
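A minimal NumPy sketch of the sum rule on a toy 2×2 joint table (the numbers are illustrative and chosen to sum to 1):

```python
import numpy as np

# Joint table P(X, Y): rows index X, columns index Y.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)   # marginalize out Y -> P(X)
p_y = joint.sum(axis=0)   # marginalize out X -> P(Y)
print(p_x)  # [0.3 0.7]
print(p_y)  # [0.4 0.6]
```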
Conditional Independence
What does it mean for X and Y to be conditionally independent given Z?
(X ⊥ Y | Z)
It means that once you know Z, knowing Y gives no additional information about X.
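One way to see this concretely: build a joint that factorizes by construction and verify the defining identity numerically. All probabilities below are toy values of my own choosing:

```python
import numpy as np

# Toy check of X ⊥ Y | Z: construct P(x, y, z) = P(z) P(x|z) P(y|z),
# then confirm P(X, Y | Z=z) = P(X | Z=z) P(Y | Z=z) for each z.
p_z = np.array([0.4, 0.6])
p_x_given_z = np.array([[0.7, 0.3],   # rows: z, cols: x
                        [0.2, 0.8]])
p_y_given_z = np.array([[0.5, 0.5],   # rows: z, cols: y
                        [0.9, 0.1]])

# Full joint, indexed [z, x, y]
joint = p_z[:, None, None] * p_x_given_z[:, :, None] * p_y_given_z[:, None, :]

for z in range(2):
    cond = joint[z] / joint[z].sum()                    # P(X, Y | Z=z)
    outer = np.outer(p_x_given_z[z], p_y_given_z[z])    # P(X|z) P(Y|z)
    assert np.allclose(cond, outer)
print("conditionally independent given Z")
```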
Covariance Matrix Σ
What do the diagonal and off-diagonal elements represent?
Variances & Covariances
Diagonal: Variances of individual variables (σ²). Off-Diagonal: Covariances between pairs (σ_XY).
Positive Semi-Definite (PSD)
Why must a Covariance Matrix be PSD?
Non-Negative Variance
It ensures that the variance of any linear combination of the variables is non-negative (Var(aX + bY) ≥ 0). Geometrically, it means the distribution's ellipsoid has real, non-negative axis lengths.
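A quick empirical check of the last two cards — symmetry, non-negative eigenvalues, and non-negative variance of an arbitrary linear combination — on random toy data:

```python
import numpy as np

# Sample covariance matrix is symmetric PSD; verify on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # 500 samples, 3 variables
sigma = np.cov(X, rowvar=False)        # 3x3 covariance matrix

assert np.allclose(sigma, sigma.T)     # symmetric
eigvals = np.linalg.eigvalsh(sigma)
assert np.all(eigvals >= -1e-12)       # PSD: eigenvalues non-negative

a = rng.normal(size=3)                 # arbitrary linear combination a·X
assert a @ sigma @ a >= 0              # Var(a·X) = aᵀ Σ a ≥ 0
```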
Row-Major vs Column-Major
Why does matrix memory layout matter for Covariance calculation?
Cache Locality
Accessing memory sequentially (Row-major for C/Java) uses the CPU cache efficiently. Jumping across rows (Column access) causes cache misses, slowing down dot products significantly.
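The layout difference is directly visible in NumPy, which defaults to row-major (C order): a row is one contiguous run of memory, while a column is strided. A small sketch:

```python
import numpy as np

# Row-major layout: consecutive row elements are 8 bytes apart,
# consecutive column elements are a full row (8000 bytes) apart.
a = np.zeros((1000, 1000), dtype=np.float64)   # C order by default
print(a.strides)                      # (8000, 8)
print(a[0].flags['C_CONTIGUOUS'])     # True  -> row access is cache-friendly
print(a[:, 0].flags['C_CONTIGUOUS'])  # False -> column access strides memory

a_f = np.asfortranarray(a)            # column-major copy flips the strides
print(a_f.strides)                    # (8, 8000)
```

This is why a covariance loop should walk each sample's features contiguously rather than hopping down a column per iteration.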
2. Cheat Sheet
| Concept | Formula / Definition |
|---|---|
| Joint Probability | P(X, Y) |
| Marginal Probability | P(X) = Σ_y P(X, Y) |
| Conditional Probability | P(X \| Y) = P(X, Y) / P(Y) |
| Independence | P(X, Y) = P(X)P(Y) |
| Cond. Independence | P(X, Y \| Z) = P(X \| Z)P(Y \| Z) |
| Expectation (Linearity) | E[aX + bY] = aE[X] + bE[Y] |
| Covariance | Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] |
| Correlation | ρ_XY = Cov(X, Y) / (σ_X σ_Y) |
| Variance Sum Law | Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) |
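The variance sum law can be sanity-checked on sample data; with population formulas (ddof=0) the identity holds exactly, not just in expectation:

```python
import numpy as np

# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on correlated samples.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)   # deliberately correlated with x

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
assert np.isclose(lhs, rhs)
```

Note `ddof=0` on `np.cov`: `np.var` defaults to the population estimator while `np.cov` defaults to the sample estimator, and mixing the two breaks the exact identity.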
Matrix Operations
| Operation | Result |
|---|---|
| Covariance Matrix | Σ = E[(X − μ)(X − μ)ᵀ] |
| Transformation Y = AX | Σ_Y = A Σ_X Aᵀ |
| Mahalanobis Distance | d² = (x − μ)ᵀ Σ⁻¹ (x − μ) |
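Both matrix identities can be checked on toy data: the transformation rule holds exactly for sample covariances, and the Mahalanobis distance is non-negative whenever Σ is invertible (so Σ⁻¹ is positive definite):

```python
import numpy as np

# Verify Σ_Y = A Σ_X Aᵀ for Y = AX, then compute a Mahalanobis distance.
rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 3))          # 5000 samples, 3 variables
A = rng.normal(size=(2, 3))             # arbitrary linear map to 2D

sigma_x = np.cov(X, rowvar=False)
Y = X @ A.T                             # apply Y = AX to every sample
sigma_y = np.cov(Y, rowvar=False)
assert np.allclose(sigma_y, A @ sigma_x @ A.T)   # exact on sample covariances

mu = X.mean(axis=0)
d2 = (X[0] - mu) @ np.linalg.inv(sigma_x) @ (X[0] - mu)
assert d2 >= 0                          # d² is a non-negative quadratic form
```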