The Zoo of Distributions

[!NOTE] This module surveys the most common probability distributions, building each one up from its generative story to its parameters and its role in machine learning.

1. Introduction: Describing Randomness

In the previous chapter, we learned how to calculate probabilities. Now, we learn what to calculate. Most real-world phenomena follow specific patterns. These patterns are called Probability Distributions.

A Random Variable X is a function that maps outcomes to numbers.

  • Discrete: X ∈ {0, 1, 2, ...} (e.g., Number of emails).
  • Continuous: X ∈ ℝ (e.g., Height, Temperature, Weights in a Neural Network).

PDF vs PMF

  • PMF (Probability Mass Function): For discrete variables. It gives the probability that a discrete random variable is exactly equal to some value. > P(X=k)
  • PDF (Probability Density Function): For continuous variables. The probability at any single point is technically 0; we measure probability as the area under the curve over an interval. > P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
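The "area under the curve" idea can be checked numerically. A minimal sketch, assuming only NumPy: write out the standard normal density by hand and approximate P(−1 ≤ X ≤ 1) with a Riemann sum.

```python
import numpy as np

# Standard normal PDF, written out from the formula
def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# P(-1 <= X <= 1): approximate the integral with a fine Riemann sum
x = np.linspace(-1, 1, 100_001)
dx = x[1] - x[0]
area = np.sum(normal_pdf(x)) * dx
print(f"P(-1 <= X <= 1) ~ {area:.4f}")  # ~0.6827, the familiar "68% within 1 sigma" rule
```

Note that `area` is a probability even though `normal_pdf` at a single point is a density, not a probability; the density can even exceed 1 for small σ.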

2. Common Discrete Distributions

2.1 Bernoulli (p)

The “atom” of probability. A single trial with two outcomes: Success (1) or Failure (0).

  • Generative Story: You flip a biased coin once.
  • Parameter: p (probability of success).
  • ML Application: Logistic Regression outputs a Bernoulli probability P(Y=1|X). It models binary classification tasks like “Spam vs Ham”.
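The logistic-regression connection can be made concrete: the sigmoid squashes a linear score into a valid Bernoulli parameter p. A sketch with hypothetical weights `w`, bias `b`, and a 2-feature input (none of these come from a real trained model):

```python
import numpy as np

# Hypothetical learned weights for a 2-feature "Spam vs Ham" classifier
w = np.array([1.2, -0.7])
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 2.0])        # one email's feature vector
p_spam = sigmoid(w @ x + b)     # Bernoulli parameter: p = P(Y=1 | X)
print(f"P(spam) = {p_spam:.3f}")

# Predicting a hard label is literally one biased coin flip
rng = np.random.default_rng(0)
label = rng.binomial(1, p_spam)
```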

2.2 Binomial (n, p)

The sum of n independent Bernoulli trials.

  • Generative Story: You flip the same coin n times. How many heads do you get?
  • Formula: > P(X=k) = C(n, k) · p^k · (1-p)^(n-k)
  • ML Application: Predicting the number of conversions from n ad impressions.
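The formula above can be evaluated directly. A minimal sketch using `math.comb` for C(n, k); a sanity check is that the PMF sums to 1 over k = 0..n:

```python
import numpy as np
from math import comb

# Binomial PMF straight from the formula: C(n,k) * p^k * (1-p)^(n-k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = np.array([binom_pmf(k, n, p) for k in range(n + 1)])
print(f"P(X=5) = {pmf[5]:.4f}")   # mode sits at k = n*p = 5
print(f"sum    = {pmf.sum():.4f}")  # a valid PMF must sum to 1
```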

2.3 Poisson (λ)

Models the number of events happening in a fixed interval of time or space.

  • Generative Story: Events happen independently at a constant average rate.
  • Parameter: λ (lambda, average rate).
  • Example: Number of API requests per second to your server.
  • ML Application: Modeling count data (e.g., predicting call center volume or server load). This is critical for Capacity Planning (see System Design Module 01).
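A capacity-planning flavored sketch of the Poisson PMF, P(X=k) = λ^k e^(−λ) / k!. The "overload" threshold of 10 requests is an illustrative assumption, not a real SLO:

```python
from math import exp, factorial

# Poisson PMF: P(X=k) = lambda^k * e^(-lambda) / k!
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

lam = 5  # average of 5 requests per second
# Capacity planning: chance of seeing MORE than 10 requests in one second
p_overload = 1 - sum(poisson_pmf(k, lam) for k in range(11))
print(f"P(X > 10) = {p_overload:.4f}")
```

Even though the *average* is 5 requests/second, provisioning for exactly 5 would overload regularly; the tail probability above is what capacity planning actually budgets for.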

3. Continuous Distributions

3.1 The Gaussian (Normal) Distribution (μ, σ²)

The “King of Distributions”. It is bell-shaped, symmetric, and defined by:

  1. Mean (μ): The center (Expectation).
  2. Variance (σ²): The spread (Uncertainty).

f(x) = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²))

  • Why is it everywhere?: The Central Limit Theorem says that if you add up enough independent random quantities (regardless of their original distribution), the sum tends toward a Gaussian.
  • ML Application:
      • Weight Initialization: We initialize Neural Network weights from N(0, 1) or Xavier/He Normal to ensure stable training.
      • Error Analysis: In Linear Regression we assume the noise is Gaussian: y = mx + b + ε, where ε ~ N(0, σ²).
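The weight-initialization point can be sketched directly. He-normal initialization draws each weight from N(0, 2/fan_in); the layer sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# He-normal init for one dense layer: W ~ N(0, 2 / fan_in)
fan_in, fan_out = 784, 256              # hypothetical layer sizes
std = np.sqrt(2.0 / fan_in)
W = rng.normal(0.0, std, size=(fan_in, fan_out))

print(f"empirical std = {W.std():.4f}, target = {std:.4f}")
```

Scaling the variance by fan_in is what keeps activation magnitudes roughly constant from layer to layer, which is the "stable training" claim above.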

3.2 Exponential Distribution (λ)

Models the time between events in a Poisson process.

  • Generative Story: How long do you have to wait for the next bus (if buses arrive randomly)?
  • Parameter: λ (rate parameter).
  • Memoryless Property: P(T > t+s | T > s) = P(T > t). Past waiting time doesn’t affect future waiting time.
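The memoryless property can be verified empirically: condition on having already waited s, and the remaining wait looks freshly exponential. A simulation sketch (note NumPy parameterizes by scale = 1/λ):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                  # rate: 2 events per unit time
t = rng.exponential(scale=1 / lam, size=1_000_000)

s, w = 0.5, 0.3
# Memorylessness: P(T > s + w | T > s) should equal P(T > w)
cond = np.mean(t[t > s] > s + w)
uncond = np.mean(t > w)
print(f"conditional   = {cond:.3f}")
print(f"unconditional = {uncond:.3f}")     # both ~ e^(-lam*w) ~ 0.549
```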

Implementation in Python

Generating samples from these distributions is trivial with NumPy:

import numpy as np

# 1. Bernoulli (1 Trial)
p = 0.5
bernoulli_trial = np.random.choice([0, 1], p=[1-p, p])
print(f"Bernoulli Result: {bernoulli_trial}")

# 2. Binomial (n Trials)
n, p = 10, 0.5
binomial_sample = np.random.binomial(n, p, size=1000)
# Most common value should be n*p = 5

# 3. Poisson (Rate lambda)
lam = 5
poisson_sample = np.random.poisson(lam, size=1000)

# 4. Normal (Gaussian)
mu, sigma = 0, 1
normal_sample = np.random.normal(mu, sigma, size=1000)

# 5. Exponential
scale = 1.0 # 1/lambda
exponential_sample = np.random.exponential(scale, size=1000)
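A quick sanity check on samples like the ones above: the empirical mean of each distribution should match its theoretical mean (n·p for Binomial, λ for Poisson, μ for Normal, 1/λ for Exponential). A sketch using the newer `Generator` API:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# (theoretical mean, empirical mean) for each distribution
checks = {
    "binomial (n*p = 5)":     (5.0, rng.binomial(10, 0.5, N).mean()),
    "poisson (lam = 5)":      (5.0, rng.poisson(5, N).mean()),
    "normal (mu = 0)":        (0.0, rng.normal(0, 1, N).mean()),
    "exponential (1/lam = 1)": (1.0, rng.exponential(1.0, N).mean()),
}
for name, (theory, empirical) in checks.items():
    print(f"{name}: theory={theory:.2f}, empirical={empirical:.3f}")
```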

4. Interactive Visualizer: The Distribution Explorer

Select a distribution and tweak its parameters to see how the shape changes. The Toggle CDF control switches between the density/mass view (PDF/PMF) and the Cumulative Distribution Function (CDF).

[!TIP] Try it yourself: Increase n in Binomial to 50. Notice how it starts to look like a Gaussian Bell Curve. This is the Central Limit Theorem in action!
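The same experiment works without the visualizer: for n = 50, Binomial(n, p) is well approximated by the Gaussian N(np, np(1−p)). A simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 50, 0.5
samples = rng.binomial(n, p, size=100_000)

# CLT prediction: Binomial(n, p) ~ N(np, np(1-p)) for large n
print(f"empirical mean {samples.mean():.2f} vs np      = {n * p}")
print(f"empirical var  {samples.var():.2f} vs np(1-p) = {n * p * (1 - p)}")
```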

5. Summary

  • Bernoulli: 1 coin flip.
  • Binomial: n coin flips.
  • Poisson: Counts per hour.
  • Gaussian: The Bell Curve (what sums of many random things converge to).
  • Exponential: Waiting time.