Module Review: Regression Analysis

[!NOTE] This module reviews the core principles of regression analysis: why least squares, its numerical pitfalls, residual diagnostics, and regularization.

1. Interactive Flashcards

Test your recall. Click a card to flip it.

Why do we minimize Squared Errors (SSE)?
1. Geometric intuition: minimizing SSE is an orthogonal projection (Pythagoras). 2. Statistics: equivalent to MLE under Normally distributed errors, and Gauss-Markov guarantees OLS is the best linear unbiased estimator. 3. Calculus: differentiable everywhere, unlike absolute error.
What is the "Floating Point Trap" in OLS?
Explicitly inverting X<sup>T</sup>X via the Normal Equation is numerically unstable when features are correlated (multicollinearity): the matrix is ill-conditioned, so tiny rounding errors are amplified into wildly wrong coefficients. Production solvers use QR or SVD instead.
What does a pattern in residuals imply?
It implies there is still **information** left in the data that the model hasn't extracted. Ideal residuals are white noise (random).
Why is Autocorrelation dangerous?
It makes OLS think you have more independent data points than you really do, leading to tiny Standard Errors and fake statistical significance.
How does Lasso (L1) save memory?
It forces coefficients to exactly zero, allowing us to use **Sparse Matrix** formats (CSR) to store only non-zero values, saving RAM and CPU.
What is the Bias-Variance Tradeoff?
The balance between Underfitting (High Bias, too simple) and Overfitting (High Variance, too complex). Regularization adds Bias to lower Variance.
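The "Floating Point Trap" card above can be demonstrated directly. A minimal sketch with NumPy (the near-duplicate column and noise scale are illustrative assumptions, not from the module):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)   # nearly collinear copy of x1
X = np.column_stack([np.ones(n), x1, x2])
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.1, size=n)

# Multicollinearity makes X^T X ill-conditioned (huge condition number)
print("cond(X^T X):", np.linalg.cond(X.T @ X))

# Normal Equation: explicit inverse -- numerically fragile here,
# the x1/x2 coefficients can come out wildly wrong
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# QR/SVD-based least-squares solver: avoids the explicit inverse
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equation:", beta_normal)
print("lstsq:", beta_lstsq)
```

Both fits predict y well, but only the factorization-based solver returns stable individual coefficients, which is why libraries solve least squares via QR or SVD rather than the textbook inverse.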

2. Cheat Sheet

Formulas

| Concept | Formula |
| --- | --- |
| Linear Model | y = &beta;<sub>0</sub> + &beta;<sub>1</sub>x + &epsilon; |
| SSE | &Sigma;(y<sub>i</sub> - ŷ<sub>i</sub>)<sup>2</sup> |
| Ridge Loss | SSE + &lambda;&Sigma;&beta;<sub>j</sub><sup>2</sup> |
| Lasso Loss | SSE + &lambda;&Sigma;\|&beta;<sub>j</sub>\| |
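The contrast between the Ridge and Lasso losses can be checked empirically: L1 drives coefficients to exactly zero while L2 only shrinks them. A sketch with scikit-learn and SciPy (the data shape and alpha value are illustrative assumptions):

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
# Only 5 of the 50 features actually matter
true_beta = np.zeros(50)
true_beta[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]
y = X @ true_beta + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# L1 zeroes out irrelevant coefficients; L2 keeps them all non-zero
print("Lasso non-zero coefs:", np.count_nonzero(lasso.coef_))
print("Ridge non-zero coefs:", np.count_nonzero(ridge.coef_))

# Sparse coefficients fit the CSR format: only non-zeros are stored
coef_csr = sparse.csr_matrix(lasso.coef_)
print("values stored in CSR:", coef_csr.nnz)
```

This is the memory-saving mechanism from the Lasso flashcard: once most coefficients are exactly zero, CSR storage keeps only the surviving values and their indices.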

Code Reference

Go (Gonum)

```go
import "gonum.org/v1/gonum/stat"

// alpha = intercept, beta = slope
alpha, beta := stat.LinearRegression(x, y, nil, false)
```

Java (Apache Commons)

```java
import org.apache.commons.math3.stat.regression.SimpleRegression;

SimpleRegression r = new SimpleRegression();
r.addData(x, y);
double slope = r.getSlope();
```

Python (Statsmodels)

```python
import statsmodels.api as sm

X = sm.add_constant(X)  # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.summary())
```

Statistics Glossary