The Zoo of Distributions
1. Introduction: Describing Randomness
In the previous chapter, we learned how to calculate probabilities. Now we learn what to calculate. Most real-world random quantities follow recurring patterns, and these patterns are formalized as Probability Distributions.
A Random Variable X is a function that maps outcomes to numbers.
- Discrete: X ∈ {0, 1, 2, ...} (e.g., the number of emails you receive).
- Continuous: X ∈ ℝ (e.g., height, temperature, or the weights in a neural network).
PDF vs PMF
- PMF (Probability Mass Function): For discrete variables. It gives the probability that a discrete random variable is exactly equal to some value.
P(X=k)
- PDF (Probability Density Function): For continuous variables. The probability at a single point is technically 0. We measure probability as the Area under the curve.
$P(a \le X \le b) = \int_a^b f(x)\,dx$
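To make the PMF/PDF distinction concrete, here is a minimal Python sketch using SciPy; the particular distributions and parameters are illustrative choices, not anything fixed by the text above:

```python
from scipy import stats

# PMF: the exact probability of a discrete value.
# P(X = 3) for X ~ Binomial(n=10, p=0.5)
print(stats.binom.pmf(3, n=10, p=0.5))          # ≈ 0.1172

# PDF: a density, NOT a probability -- it can even exceed 1.
# For X ~ Normal(0, 1), the density at 0 is f(0) ≈ 0.3989.
print(stats.norm.pdf(0.0, loc=0.0, scale=1.0))

# Probabilities for continuous variables come from areas (CDF differences):
# P(-1 ≤ X ≤ 1) = F(1) - F(-1) ≈ 0.6827
print(stats.norm.cdf(1.0) - stats.norm.cdf(-1.0))
```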
2. Common Discrete Distributions
2.1 Bernoulli (p)
The “atom” of probability. A single trial with two outcomes: Success (1) or Failure (0).
- Generative Story: You flip a biased coin once.
- Parameter: p (the probability of success).
- ML Application: Logistic Regression outputs a Bernoulli probability P(Y=1|X). It models binary classification tasks like “Spam vs Ham”, as in the sketch below.
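A minimal sketch of that pipeline in plain Python; the raw score z = 1.2 is a made-up number standing in for a model's output:

```python
import math
import random

# Logistic regression turns a linear score z into a Bernoulli
# parameter p = P(Y=1|X) via the sigmoid (logistic) function.
def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

p = sigmoid(1.2)                      # hypothetical score -> p ≈ 0.77
print(f"P(spam) = {p:.2f}")

# Sampling one Bernoulli(p) trial: 1 with probability p, else 0.
label = 1 if random.random() < p else 0
print("sampled label:", label)
```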
2.2 Binomial (n, p)
The sum of n independent Bernoulli trials.
- Generative Story: You flip the same coin n times. How many heads do you get?
- Formula: $P(X=k) = \binom{n}{k} \, p^k (1-p)^{n-k}$
- ML Application: Predicting the number of conversions from n ad impressions (see the sketch below).
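The formula translates directly into code. A short sketch using made-up numbers: n = 100 impressions with a 3% conversion rate:

```python
from math import comb

# Direct implementation of the binomial PMF given above.
def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly k = 5 conversions out of n = 100 impressions
# when each impression converts independently with p = 0.03.
print(binom_pmf(5, 100, 0.03))   # ≈ 0.101
```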
2.3 Poisson (λ)
Models the number of events happening in a fixed interval of time or space.
- Generative Story: Events happen independently at a constant average rate.
- Parameter:
λ(lambda, average rate). - Example: Number of API requests per second to your server.
- ML Application: Modeling count data (e.g., predicting call center volume or server load).
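A minimal sketch of the Poisson PMF, P(X=k) = λᵏ e^(−λ) / k!, with a hypothetical server averaging λ = 4 requests per second:

```python
from math import exp, factorial

# Poisson PMF: P(X = k) = λ^k · e^(-λ) / k!
def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * exp(-lam) / factorial(k)

# Probability of seeing exactly 0, 4, and 10 requests in one second.
for k in (0, 4, 10):
    print(k, round(poisson_pmf(k, 4.0), 4))
# 0 -> 0.0183, 4 -> 0.1954, 10 -> 0.0053
```

Note that the most likely count sits near λ itself, while counts far from the average become rare quickly.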
3. Continuous Distributions
3.1 The Gaussian (Normal) Distribution (μ, σ²)
The “King of Distributions”. It is bell-shaped, symmetric, and defined by:
- Mean (μ): the center (Expectation).
- Variance (σ²): the spread (Uncertainty).

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
- Why is it everywhere?: The Central Limit Theorem says that if you add up enough independent random things (regardless of their original distribution), the sum becomes approximately Gaussian.
- ML Applications:
  - Weight Initialization: We initialize Neural Network weights from N(0, 1) or with Xavier/He Normal schemes to ensure stable training (see the sketch after this list).
  - Error Analysis: In Linear Regression, we assume the noise is Gaussian: $y = mx + b + ε$, where ε ~ N(0, σ²).
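A short NumPy sketch illustrating both points; the sample counts and layer sizes (fan_in = 256, fan_out = 128) are made-up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Central Limit Theorem, empirically ---
# Sum 50 Uniform(0, 1) draws, many times: the sums look bell-shaped even
# though each term is flat, with mean 50·0.5 and variance 50·(1/12).
sums = rng.uniform(0, 1, size=(100_000, 50)).sum(axis=1)
print(sums.mean(), sums.std())   # ≈ 25.0 and ≈ sqrt(50/12) ≈ 2.04

# --- Weight initialization (hypothetical layer sizes) ---
fan_in, fan_out = 256, 128

# Naive initialization: standard normal N(0, 1).
w_naive = rng.normal(0.0, 1.0, size=(fan_in, fan_out))

# He Normal initialization: N(0, 2/fan_in), i.e. std = sqrt(2/fan_in),
# commonly paired with ReLU to keep activation variance stable per layer.
w_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

print(w_naive.std(), w_he.std())   # ≈ 1.0 and ≈ 0.088
```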
3.2 Exponential Distribution (λ)
Models the time between events in a Poisson process.
- Generative Story: How long do you have to wait for the next bus (if buses arrive randomly)?
- Parameter: λ (the rate parameter).
- Memoryless Property: P(T > t+s | T > s) = P(T > t). Past waiting time doesn’t affect future waiting time, as demonstrated in the sketch below.
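The memoryless property is easy to check by simulation. A NumPy sketch with an assumed rate of λ = 0.5 (say, buses per minute):

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 0.5                                   # rate: 0.5 arrivals per minute
waits = rng.exponential(scale=1.0 / lam, size=1_000_000)

# Memorylessness: P(T > t+s | T > s) should equal P(T > t).
t, s = 2.0, 3.0
p_uncond = (waits > t).mean()
p_cond = (waits[waits > s] > t + s).mean()
print(p_uncond, p_cond)   # both ≈ e^(-λt) = e^(-1) ≈ 0.368
```

Having already waited 3 minutes (the conditional case) leaves the chance of waiting 2 more minutes unchanged, which is exactly what the two nearly identical estimates show.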
4. Interactive Visualizer: The Distribution Explorer
Select a distribution and tweak the parameters to see how the shape changes. Use the Toggle CDF control to switch between the density/mass view (PDF/PMF) and the Cumulative Distribution Function (CDF).
5. Summary
- Bernoulli: 1 coin flip.
- Binomial: n coin flips.
- Poisson: Counts per hour.
- Gaussian: The Bell Curve (The sum of everything).
- Exponential: Waiting time.