Bias-Variance Tradeoff

The Bias-Variance Tradeoff is the central problem in supervised learning. Ideally, we want to choose a model that both accurately captures the regularities in its training data and generalizes well to unseen data. Unfortunately, it is typically impossible to do both simultaneously.

  • High Bias causes an algorithm to miss the relevant relations between features and target outputs (underfitting).
  • High Variance causes an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).

1. The Error Decomposition

The expected error of a learning algorithm can be decomposed into three components:

Error = Bias² + Variance + Irreducible Error
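This decomposition can be checked empirically: train the same model class on many independently drawn training sets and measure the spread of its predictions at a fixed test point. A minimal sketch (the sine target, noise level, and test point are illustrative assumptions; the irreducible term drops out because we compare against the noise-free target):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)  # true, noise-free target

x0 = 0.25                         # fixed test point, f(x0) = 1
sigma = 0.1                       # noise standard deviation
n_trials, n_samples = 500, 20

preds = []
for _ in range(n_trials):
    # Draw a fresh training set and fit a straight line (an underfit model)
    X = rng.uniform(0, 1, n_samples)
    y = f(X) + rng.normal(0, sigma, n_samples)
    coeffs = np.polyfit(X, y, deg=1)
    preds.append(np.polyval(coeffs, x0))

preds = np.array(preds)
bias_sq = (preds.mean() - f(x0)) ** 2        # Bias²: systematic offset
variance = preds.var()                       # Variance: spread across datasets
expected_err = ((preds - f(x0)) ** 2).mean() # should equal Bias² + Variance

print(bias_sq, variance, expected_err)
```

For this underfit linear model, the squared bias dominates the variance, and the two terms sum exactly to the mean squared prediction error against the noise-free target.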

1.1 Bias (Underfitting)

Bias is the error introduced by erroneous assumptions in the learning algorithm: a high-bias model systematically misses the relevant relations between features and target outputs (underfitting).

  • Concept: The model is “too simple” to capture the underlying structure of the data.
  • Symptoms: High training error AND high validation error.
  • Example: Linear regression on a dataset with a quadratic relationship.
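That example is easy to reproduce: fitting a straight line to quadratic data leaves both training and validation error high and close together. A minimal sketch (the data-generating function and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, 200)
y = X ** 2 + rng.normal(0, 0.05, 200)   # quadratic relationship + noise

X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

# Linear model: too simple for a quadratic relationship
coeffs = np.polyfit(X_train, y_train, deg=1)
train_mse = np.mean((np.polyval(coeffs, X_train) - y_train) ** 2)
val_mse = np.mean((np.polyval(coeffs, X_val) - y_val) ** 2)

# Both errors are high (well above the 0.0025 noise floor) and similar:
# the signature of high bias
print(train_mse, val_mse)
```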

1.2 Variance (Overfitting)

Variance is the error introduced by sensitivity to small fluctuations in the training set: a high-variance model fits the random noise in the training data rather than the intended outputs (overfitting).

  • Concept: The model is “too complex” and memorizes the noise in the training data.
  • Symptoms: Low training error BUT high validation error.
  • Example: High-degree polynomial regression.
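The mirror-image diagnosis: a degree-15 polynomial on a small noisy sample drives training error toward zero while validation error stays much larger. A minimal sketch (the target function, noise level, and sample sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
X_train = rng.uniform(-1, 1, 20)
y_train = np.sin(np.pi * X_train) + rng.normal(0, 0.1, 20)
X_val = rng.uniform(-1, 1, 50)
y_val = np.sin(np.pi * X_val) + rng.normal(0, 0.1, 50)

# Degree-15 polynomial: enough capacity to chase the noise in 20 points
p = np.polynomial.Polynomial.fit(X_train, y_train, deg=15)
train_mse = np.mean((p(X_train) - y_train) ** 2)
val_mse = np.mean((p(X_val) - y_val) ** 2)

# Near-zero training error but much larger validation error: high variance
print(train_mse, val_mse)
```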

1.3 Irreducible Error (Noise)

The noise term (ε) represents the fundamental limitation of the problem itself (e.g., measurement error, missing features). This error cannot be reduced by any model.

2. Interactive: Polynomial Fitter

Use the visualizer below to explore how model complexity (polynomial degree) affects Bias and Variance.

  • Degree 1 (Underfitting): The line is too simple to capture the curve. High Bias.
  • Degree 3 (Balanced): Captures the underlying sine wave pattern well. Low Bias, Low Variance.
  • Degree 15 (Overfitting): Wiggles wildly to hit every single noisy point. Low Bias, High Variance.

3. Detecting Bias and Variance with Code

We can diagnose these issues by plotting the training and validation errors as a function of the training set size (Learning Curves) or model complexity.

Python Implementation

Here is how you can visualize the Bias-Variance tradeoff using Python and Scikit-Learn.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# 1. Generate Synthetic Data
def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)
n_samples = 30
degrees = [1, 4, 15] # Linear, Balanced, Overfit

X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1

# 2. Fit Models and Plot
plt.figure(figsize=(14, 5))
for i in range(len(degrees)):
    ax = plt.subplot(1, len(degrees), i + 1)
    plt.setp(ax, xticks=(), yticks=())

    polynomial_features = PolynomialFeatures(degree=degrees[i], include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([
        ("polynomial_features", polynomial_features),
        ("linear_regression", linear_regression)
    ])
    pipeline.fit(X[:, np.newaxis], y)

    # Evaluate
    X_test = np.linspace(0, 1, 100)
    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    plt.plot(X_test, true_fun(X_test), label="True function")
    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")

    plt.xlabel("x")
    plt.ylabel("y")
    plt.xlim((0, 1))
    plt.ylim((-2, 2))
    plt.legend(loc="best")
    plt.title("Degree {}\nMSE = {:.2e}".format(
        degrees[i], mean_squared_error(y, pipeline.predict(X[:, np.newaxis]))))

plt.show()
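The script above sweeps model complexity; the other diagnostic mentioned earlier, learning curves, plots error against training-set size. A sketch using scikit-learn's `learning_curve` on a deliberately high-bias (degree-1) model, whose train and validation errors should converge to a similarly high value (the data generation mirrors the script above):

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
y = np.cos(1.5 * np.pi * X).ravel() + rng.normal(0, 0.1, 200)

# High-bias model: a straight line fit to a cosine
model = make_pipeline(PolynomialFeatures(degree=1), LinearRegression())

sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error",
)
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)

# High bias: both curves plateau at a high error with a small gap;
# adding more data does not help
print(train_mse[-1], val_mse[-1])
```

For a high-variance model the picture flips: a large gap opens between the two curves, and it narrows as the training set grows.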

4. Key Takeaways

| Metric | High Bias (Underfitting) | High Variance (Overfitting) |
| --- | --- | --- |
| Training Error | High | Low (≈ 0) |
| Validation Error | High | High |
| Train/Val Gap | Small | Large |
| Solution | Add features, increase complexity | Add data, regularization, decrease complexity |

> [!TIP]
> Regularization (L1/L2) is a technique to explicitly control variance by adding a penalty term to the loss function that discourages complex models (large coefficients).