The Zoo of Distributions

[!NOTE] This module surveys the most common probability distributions, building each one up from its generative story to its parameters and its role in machine learning.

1. Introduction: Describing Randomness

In the previous chapter, we learned how to calculate probabilities. Now, we learn what to calculate. Most real-world phenomena follow specific patterns. These patterns are called Probability Distributions.

A Random Variable X is a function that maps outcomes to numbers.

  • Discrete: X ∈ {0, 1, 2, ...} (e.g., Number of emails).
  • Continuous: X ∈ ℝ (e.g., Height, Temperature, Weights in a Neural Network).

PDF vs PMF

  • PMF (Probability Mass Function): For discrete variables. It gives the probability that a discrete random variable is exactly equal to some value. > P(X=k)
  • PDF (Probability Density Function): For continuous variables. The probability at any single point is technically 0; we measure probability as the area under the curve over an interval. > P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
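The "area under the curve" idea can be checked numerically. A minimal sketch, assuming only NumPy: write out the standard normal density by hand and approximate P(−1 ≤ X ≤ 1) with a Riemann sum.

```python
import numpy as np

# Standard normal PDF, written out from the formula
def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# P(-1 <= X <= 1): approximate the integral with a fine Riemann sum
x = np.linspace(-1, 1, 100_001)
dx = x[1] - x[0]
area = np.sum(normal_pdf(x)) * dx
print(f"P(-1 <= X <= 1) ~ {area:.4f}")  # ~0.6827, the familiar "68% within 1 sigma" rule
```

Note that `area` is a probability even though `normal_pdf` at a single point is a density, not a probability; the density can even exceed 1 for small σ.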

2. Common Discrete Distributions

2.1 Bernoulli (p)

The “atom” of probability. A single trial with two outcomes: Success (1) or Failure (0).

  • Generative Story: You flip a biased coin once.
  • Parameter: p (probability of success).
  • ML Application: Logistic Regression outputs a Bernoulli probability P(Y=1|X). It models binary classification tasks like “Spam vs Ham”.
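The logistic-regression connection can be made concrete: the sigmoid squashes a linear score into a valid Bernoulli parameter p. A sketch with hypothetical weights `w`, bias `b`, and a 2-feature input (none of these come from a real trained model):

```python
import numpy as np

# Hypothetical learned weights for a 2-feature "Spam vs Ham" classifier
w = np.array([1.2, -0.7])
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 2.0])        # one email's feature vector
p_spam = sigmoid(w @ x + b)     # Bernoulli parameter: p = P(Y=1 | X)
print(f"P(spam) = {p_spam:.3f}")

# Predicting a hard label is literally one biased coin flip
rng = np.random.default_rng(0)
label = rng.binomial(1, p_spam)
```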

2.2 Binomial (n, p)

The sum of n independent Bernoulli trials.

  • Generative Story: You flip the same coin n times. How many heads do you get?
  • Formula: > P(X=k) = C(n, k) · p^k · (1-p)^(n-k)
  • ML Application: Predicting the number of conversions from n ad impressions.
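The formula above can be evaluated directly. A minimal sketch using `math.comb` for C(n, k); a sanity check is that the PMF sums to 1 over k = 0..n:

```python
import numpy as np
from math import comb

# Binomial PMF straight from the formula: C(n,k) * p^k * (1-p)^(n-k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = np.array([binom_pmf(k, n, p) for k in range(n + 1)])
print(f"P(X=5) = {pmf[5]:.4f}")   # mode sits at k = n*p = 5
print(f"sum    = {pmf.sum():.4f}")  # a valid PMF must sum to 1
```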

2.3 Poisson (λ)

Models the number of events happening in a fixed interval of time or space.

  • Generative Story: Events happen independently at a constant average rate.
  • Parameter: λ (lambda, average rate).
  • Example: Number of API requests per second to your server.
  • ML Application: Modeling count data (e.g., predicting call center volume or server load). This is critical for Capacity Planning (see System Design Module 01).
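A capacity-planning flavored sketch of the Poisson PMF, P(X=k) = λ^k e^(−λ) / k!. The "overload" threshold of 10 requests is an illustrative assumption, not a real SLO:

```python
from math import exp, factorial

# Poisson PMF: P(X=k) = lambda^k * e^(-lambda) / k!
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

lam = 5  # average of 5 requests per second
# Capacity planning: chance of seeing MORE than 10 requests in one second
p_overload = 1 - sum(poisson_pmf(k, lam) for k in range(11))
print(f"P(X > 10) = {p_overload:.4f}")
```

Even though the *average* is 5 requests/second, provisioning for exactly 5 would overload regularly; the tail probability above is what capacity planning actually budgets for.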

3. Continuous Distributions

3.1 The Gaussian (Normal) Distribution (μ, σ²)

The “King of Distributions”. It is bell-shaped, symmetric, and defined by:

  1. Mean (μ): The center (Expectation).
  2. Variance (σ²): The spread (Uncertainty).

f(x) = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²))

  • Why is it everywhere?: The Central Limit Theorem says that if you add up enough independent random quantities (regardless of their original distribution), the sum tends toward a Gaussian.
  • ML Application:
      • Weight Initialization: We initialize Neural Network weights from N(0, 1) or Xavier/He Normal to ensure stable training.
      • Error Analysis: In Linear Regression we assume the noise is Gaussian: y = mx + b + ε, where ε ~ N(0, σ²).
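The weight-initialization point can be sketched directly. He-normal initialization draws each weight from N(0, 2/fan_in); the layer sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# He-normal init for one dense layer: W ~ N(0, 2 / fan_in)
fan_in, fan_out = 784, 256              # hypothetical layer sizes
std = np.sqrt(2.0 / fan_in)
W = rng.normal(0.0, std, size=(fan_in, fan_out))

print(f"empirical std = {W.std():.4f}, target = {std:.4f}")
```

Scaling the variance by fan_in is what keeps activation magnitudes roughly constant from layer to layer, which is the "stable training" claim above.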

3.2 Exponential Distribution (λ)

Models the time between events in a Poisson process.

  • Generative Story: How long do you have to wait for the next bus (if buses arrive randomly)?
  • Parameter: λ (rate parameter).
  • Memoryless Property: P(T > t+s | T > s) = P(T > t). Past waiting time doesn’t affect future waiting time.
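The memoryless property can be verified empirically: condition on having already waited s, and the remaining wait looks freshly exponential. A simulation sketch (note NumPy parameterizes by scale = 1/λ):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                  # rate: 2 events per unit time
t = rng.exponential(scale=1 / lam, size=1_000_000)

s, w = 0.5, 0.3
# Memorylessness: P(T > s + w | T > s) should equal P(T > w)
cond = np.mean(t[t > s] > s + w)
uncond = np.mean(t > w)
print(f"conditional   = {cond:.3f}")
print(f"unconditional = {uncond:.3f}")     # both ~ e^(-lam*w) ~ 0.549
```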

Implementation in Python

Generating samples from these distributions is trivial with NumPy:

import numpy as np

# 1. Bernoulli (1 Trial)
p = 0.5
bernoulli_trial = np.random.choice([0, 1], p=[1-p, p])
print(f"Bernoulli Result: {bernoulli_trial}")

# 2. Binomial (n Trials)
n, p = 10, 0.5
binomial_sample = np.random.binomial(n, p, size=1000)
# Most common value should be n*p = 5

# 3. Poisson (Rate lambda)
lam = 5
poisson_sample = np.random.poisson(lam, size=1000)

# 4. Normal (Gaussian)
mu, sigma = 0, 1
normal_sample = np.random.normal(mu, sigma, size=1000)

# 5. Exponential
scale = 1.0 # 1/lambda
exponential_sample = np.random.exponential(scale, size=1000)
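A quick sanity check on samples like the ones above: the empirical mean of each distribution should match its theoretical mean (n·p for Binomial, λ for Poisson, μ for Normal, 1/λ for Exponential). A sketch using the newer `Generator` API:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# (theoretical mean, empirical mean) for each distribution
checks = {
    "binomial (n*p = 5)":     (5.0, rng.binomial(10, 0.5, N).mean()),
    "poisson (lam = 5)":      (5.0, rng.poisson(5, N).mean()),
    "normal (mu = 0)":        (0.0, rng.normal(0, 1, N).mean()),
    "exponential (1/lam = 1)": (1.0, rng.exponential(1.0, N).mean()),
}
for name, (theory, empirical) in checks.items():
    print(f"{name}: theory={theory:.2f}, empirical={empirical:.3f}")
```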

4. Interactive Visualizer: The Distribution Explorer

Select a distribution and tweak its parameters to see how the shape changes. The Toggle CDF control switches between the density/mass view (PDF/PMF) and the Cumulative Distribution Function (CDF).

[!TIP] Try it yourself: Increase n in Binomial to 50. Notice how it starts to look like a Gaussian Bell Curve. This is the Central Limit Theorem in action!
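The same experiment works without the visualizer: for n = 50, Binomial(n, p) is well approximated by the Gaussian N(np, np(1−p)). A simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 50, 0.5
samples = rng.binomial(n, p, size=100_000)

# CLT prediction: Binomial(n, p) ~ N(np, np(1-p)) for large n
print(f"empirical mean {samples.mean():.2f} vs np      = {n * p}")
print(f"empirical var  {samples.var():.2f} vs np(1-p) = {n * p * (1 - p)}")
```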

5. Summary

  • Bernoulli: 1 coin flip.
  • Binomial: n coin flips.
  • Poisson: Counts per hour.
  • Gaussian: The Bell Curve (what sums of many random things converge to).
  • Exponential: Waiting time.