Gaussian Properties
[!IMPORTANT] Why it matters: The Gaussian Distribution is the limit of everything. It’s not just a curve; it’s a law of nature.
While discrete variables jump from value to value, Continuous Random Variables flow smoothly across a range. For these, the probability of any exact value is zero. Instead, we measure probabilities over intervals.
1. Intuition: The Central Limit Theorem
Why does the “Bell Curve” appear everywhere? From heights to test scores to errors in measurement?
The answer is the Central Limit Theorem (CLT).
When you add up a large number of independent random variables (regardless of their original distribution), their sum tends toward a Gaussian distribution.
Genesis: Imagine you are measuring the position of a star.
- Wind pushes your telescope slightly left.
- Heat expansion pushes it slightly right.
- Atmospheric refraction bends light up.
- Vibration shakes it down.
Each error is random and small. When you sum thousands of these tiny, independent errors, the result is a Gaussian distribution. Gaussian = The Sum of Many Small Things.
2. Probability Density Function (PDF)
For continuous variables, we replace the PMF with the Probability Density Function (PDF), denoted f(x).
Key Concept: The area under the PDF curve represents probability.
- P(a ≤ X ≤ b) = ∫ab f(x) dx
- Total Area = ∫-∞∞ f(x) dx = 1
The Gaussian PDF Formula: f(x) = (1 / (σ√(2π))) × e-0.5((x-μ)/σ)2
Where:
- μ (Mu): The Mean (center of the bell).
- σ (Sigma): The Standard Deviation (width of the bell).
3. Properties of the Bell Curve
- Symmetric: The left side mirrors the right side.
- Unimodal: One single peak at the mean μ.
- Mean = Median = Mode: All align at the center.
- Asymptotic: The tails get closer and closer to the x-axis but never touch it.
The 68-95-99.7 Rule (Empirical Rule)
- 68% of data falls within μ ± 1σ
- 95% of data falls within μ ± 2σ
- 99.7% of data falls within μ ± 3σ
4. Hardware Reality: Precision & Generation
The IEEE 754 Problem (The Tails)
In the extreme tails of the Gaussian (e.g., 6σ away), probabilities become vanishingly small (10-9).
- Float (32-bit): Only has ~7 decimal digits of precision.
- Double (64-bit): Has ~15 decimal digits.
If you are calculating the probability of a “6-sigma event” (like a major market crash or particle physics discovery) using 32-bit floats, underflow will occur, and your result might snap to 0 or lose critical precision. Always use doubles for high-precision statistical tails.
Algorithm Genesis: The Box-Muller Transform
Computers can easily generate Uniform random numbers (0 to 1). But how do we turn a flat Uniform distribution into a bell-shaped Gaussian?
We use the Box-Muller Transform. It takes two independent uniform variables u1 and u2 and transforms them into two independent standard normal variables z0 and z1.
- z0 = √(-2 ln u1) &cos;(2πu2)
- z1 = √(-2 ln u1) &sin;(2πu2)
Why it works: It maps the polar coordinates of a 2D uniform distribution to Cartesian coordinates, naturally creating the Gaussian fall-off.
5. Interactive: Gaussian Curve Explorer
Adjust the Mean (μ) and Standard Deviation (σ) to see how the bell curve shifts and stretches. Notice how the area stays constant (Total Probability = 1).
6. Code Implementation
Generating Gaussian numbers from scratch using Box-Muller.
Java Example
import java.util.Random;
public class BoxMuller {
private static final Random rng = new Random();
// Box-Muller Transform
// Returns a standard normal variable Z ~ N(0, 1)
public static double generateStandardNormal() {
double u1 = rng.nextDouble();
double u2 = rng.nextDouble();
double z0 = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
// double z1 = Math.sqrt(-2.0 * Math.log(u1)) * Math.sin(2.0 * Math.PI * u2);
return z0;
}
public static void main(String[] args) {
// Generate 5 random heights (Mean=170cm, StdDev=10cm)
double mean = 170.0;
double stdDev = 10.0;
System.out.println("Generated Heights:");
for (int i = 0; i < 5; i++) {
double z = generateStandardNormal();
double height = mean + (z * stdDev);
System.out.printf("%.2f cm%n", height);
}
}
}
Go Example
package main
import (
"fmt"
"math"
"math/rand"
"time"
)
// Box-Muller Transform
// Returns a standard normal variable Z ~ N(0, 1)
func generateStandardNormal() float64 {
u1 := rand.Float64()
u2 := rand.Float64()
// Handle the rare case where u1 is 0 to avoid Log(0)
if u1 < 1e-100 {
u1 = 1e-100
}
z0 := math.Sqrt(-2.0*math.Log(u1)) * math.Cos(2.0*math.Pi*u2)
return z0
}
func main() {
rand.Seed(time.Now().UnixNano())
mean := 170.0
stdDev := 10.0
fmt.Println("Generated Heights:")
for i := 0; i < 5; i++ {
z := generateStandardNormal()
height := mean + (z * stdDev)
fmt.Printf("%.2f cm\n", height)
}
}
[!TIP] Production Tip: In production, use
java.util.Random.nextGaussian()or Go’srand.NormFloat64(). They often use the Ziggurat Algorithm, which is even faster than Box-Muller because it avoids expensivelogandsqrtcalls most of the time!
Summary
- Central Limit Theorem: Sum of many small errors = Gaussian.
- IEEE 754: Use
doublefor tails to avoid underflow. - Box-Muller: The math hack to turn Uniform random numbers into Normal ones.