# The Zoo of Distributions
> [!NOTE]
> This module explores the core probability distributions, building from first principles toward the practical ML applications where each one appears.
## 1. Introduction: Describing Randomness
In the previous chapter, we learned how to calculate probabilities. Now, we learn what to calculate. Most real-world phenomena follow specific patterns. These patterns are called Probability Distributions.
A Random Variable X is a function that maps outcomes to numbers.

- Discrete: X ∈ {0, 1, 2, ...} (e.g., number of emails received).
- Continuous: X ∈ ℝ (e.g., height, temperature, weights in a neural network).
### PDF vs PMF
- PMF (Probability Mass Function): For discrete variables. It gives the probability that the variable is exactly equal to some value: P(X = k).
- PDF (Probability Density Function): For continuous variables. The probability at any single point is technically 0, so we measure probability as the area under the curve: P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx.
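To make the "area under the curve" idea concrete, here is a minimal sketch that approximates P(−1 ≤ X ≤ 1) for a standard normal by summing thin rectangles under the density (a midpoint Riemann sum); the function names are our own, not from any library:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_between(a, b, n=10_000):
    """Approximate P(a <= X <= b) as the area under the PDF (midpoint sum)."""
    dx = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * dx) * dx for i in range(n))

print(round(prob_between(-1, 1), 4))  # ≈ 0.6827: the classic "68% within 1σ" rule
```

Note that `prob_between(x, x)` is 0 for any single point x, which is exactly the "probability at a point is 0" statement above.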
## 2. Common Discrete Distributions
### 2.1 Bernoulli(p)
The “atom” of probability. A single trial with two outcomes: Success (1) or Failure (0).
- Generative Story: You flip a biased coin once.
- Parameter: p (probability of success).
- ML Application: Logistic Regression outputs a Bernoulli probability P(Y=1|X). It models binary classification tasks like "Spam vs Ham".
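A minimal sketch of that connection: a logistic model squashes a score through the sigmoid to produce the Bernoulli parameter p, and each predicted label is one Bernoulli(p) draw. The feature names and weights here are made up for illustration:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights for two spam features: [num_links, has_urgent_word]
w, b = [1.2, 2.0], -1.5
x = [3, 1]  # an email with 3 links that contains an "urgent" word

p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)  # P(Y=1|X): the Bernoulli parameter
label = 1 if random.random() < p else 0                # one Bernoulli(p) draw
print(f"P(spam) = {p:.3f}, sampled label = {label}")
```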
### 2.2 Binomial(n, p)
The sum of n independent Bernoulli trials.
- Generative Story: You flip the same coin n times. How many heads do you get?
- Formula: P(X = k) = C(n, k) · pᵏ · (1-p)ⁿ⁻ᵏ
- ML Application: Predicting the number of conversions from n ad impressions.
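The formula above can be coded directly with the standard library's `math.comb`; a quick sketch that also sanity-checks that the probabilities over all k sum to 1:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
print(binomial_pmf(5, n, p))  # P(exactly 5 heads in 10 fair flips) = 252/1024 ≈ 0.246
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))  # a valid PMF sums to 1
```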
### 2.3 Poisson(λ)
Models the number of events happening in a fixed interval of time or space.
- Generative Story: Events happen independently at a constant average rate.
- Parameter: λ (lambda, the average rate).
- Formula: P(X = k) = λᵏ e⁻�榮 / k! (with λᵏ e^(−λ) / k!, the chance of exactly k events in the interval).
- Example: Number of API requests per second to your server.
- ML Application: Modeling count data (e.g., predicting call center volume or server load). This is critical for Capacity Planning (see System Design Module 01).
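A capacity-planning sketch using the Poisson PMF, P(X = k) = λᵏe^(−λ)/k!: if requests arrive at an average rate λ and the server can handle a fixed number per second, the tail probability P(X > capacity) is the chance of an overload in any given second. The rate and capacity numbers below are illustrative:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Illustrative numbers: requests arrive at lam = 5/s; the server handles 10/s.
lam, capacity = 5, 10
p_overload = 1 - sum(poisson_pmf(k, lam) for k in range(capacity + 1))
print(f"P(load > {capacity} req/s) = {p_overload:.4f}")  # ≈ 0.0137
```

So with these numbers, roughly 1.4% of seconds would exceed capacity; provisioning decisions trade that tail probability against cost.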
## 3. Continuous Distributions
### 3.1 The Gaussian (Normal) Distribution (μ, σ²)
The “King of Distributions”. It is bell-shaped, symmetric, and defined by:
- Mean (μ): The center (Expectation).
- Variance (σ²): The spread (Uncertainty).
f(x) = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²))
- Why is it everywhere?: The Central Limit Theorem says that if you add up enough independent random variables (regardless of their original distribution), the sum tends toward a Gaussian.
- ML Application:
  - Weight Initialization: We initialize neural network weights from N(0, 1) or with Xavier/He Normal schemes to ensure stable training.
  - Error Analysis: Linear Regression assumes Gaussian noise: y = mx + b + ε, where ε ~ N(0, σ²).
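A quick way to see the CLT at work, using only the standard library: draw averages of 30 samples from a decidedly non-Gaussian source (the uniform distribution) and check that they cluster in a bell around the true mean, with the spread the theorem predicts (σ/√n = √(1/12)/√30 ≈ 0.0527):

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    """Average of n draws from Uniform(0, 1) -- a flat, non-Gaussian source."""
    return sum(random.random() for _ in range(n)) / n

means = [sample_mean(30) for _ in range(10_000)]
print(f"mean of sample means ≈ {statistics.mean(means):.3f} (theory: 0.5)")
print(f"stdev of sample means ≈ {statistics.stdev(means):.4f} (theory: ≈ 0.0527)")
```

Plotting a histogram of `means` would show the familiar bell shape, even though each underlying draw is flat.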
### 3.2 Exponential Distribution (λ)
Models the time between events in a Poisson process.
- Generative Story: How long do you have to wait for the next bus (if buses arrive randomly)?
- Parameter: λ (rate parameter); the mean waiting time is 1/λ.
- Memoryless Property: P(T > t+s | T > s) = P(T > t). Past waiting time doesn't affect future waiting time.
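The memoryless property is easy to check empirically: among draws that have already survived past time s, the fraction that survive an additional t should match the unconditional P(T > t). A sketch with the standard library's `random.expovariate` (which takes the rate λ directly):

```python
import random

random.seed(1)
lam = 2.0  # rate; mean wait = 1/lam = 0.5
waits = [random.expovariate(lam) for _ in range(200_000)]

s, t = 0.3, 0.4
p_uncond = sum(w > t for w in waits) / len(waits)            # P(T > t)
survivors = [w for w in waits if w > s]                      # condition on T > s
p_cond = sum(w > s + t for w in survivors) / len(survivors)  # P(T > t+s | T > s)
print(f"P(T > {t}) = {p_uncond:.3f},  P(T > {s}+{t} | T > {s}) = {p_cond:.3f}")
```

Both estimates land near e^(−λt) = e^(−0.8) ≈ 0.449: having already waited tells you nothing about how much longer you will wait.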
## Implementation in Python
Generating samples from these distributions is trivial with NumPy:
```python
import numpy as np

# 1. Bernoulli (1 trial)
p = 0.5
bernoulli_trial = np.random.choice([0, 1], p=[1 - p, p])
print(f"Bernoulli Result: {bernoulli_trial}")

# 2. Binomial (n trials)
n, p = 10, 0.5
binomial_sample = np.random.binomial(n, p, size=1000)
# The most common value should be n * p = 5

# 3. Poisson (rate lambda)
lam = 5
poisson_sample = np.random.poisson(lam, size=1000)

# 4. Normal (Gaussian)
mu, sigma = 0, 1
normal_sample = np.random.normal(mu, sigma, size=1000)

# 5. Exponential (NumPy parameterizes by scale = 1/lambda)
scale = 1.0
exponential_sample = np.random.exponential(scale, size=1000)
```
## 4. Interactive Visualizer: The Distribution Explorer
Select a distribution and tweak the parameters to see how the shape changes. Toggle CDF to switch between the Density/Mass view (PDF/PMF) and the Cumulative Distribution Function (CDF).
> [!TIP]
> Try it yourself: Increase n in the Binomial to 50. Notice how it starts to look like a Gaussian bell curve. This is the Central Limit Theorem in action!
## 5. Summary
- Bernoulli: 1 coin flip.
- Binomial: n coin flips.
- Poisson: Counts per hour.
- Gaussian: The Bell Curve (The sum of everything).
- Exponential: Waiting time.