Probability Basics: The Language of Uncertainty
1. Introduction: Modeling the Unknown
In classical mechanics, if we drop a ball, we can calculate exactly where it lands. The universe is deterministic. In Machine Learning, we deal with Uncertainty. The real world is noisy, chaotic, and often hidden from us.
- Classification: “Is this image a Cat or a Dog?” We are 90% sure it’s a Cat.
- Prediction: “Will this user click the ad?” There is a 2% chance.
- NLP: “What is the next word in this sentence?”
Probability Theory provides the rigorous rules for manipulating these uncertainties. It allows us to build systems that can reason even when they don’t have all the facts.
2. The Two Schools of Thought
Before diving into formulas, it’s crucial to understand what probability actually means. There are two dominant interpretations:
2.1 The Frequentist View
Probability is the long-run frequency of an event.
- Definition: P(Heads) is the proportion of Heads if we flip the coin infinitely many times.
- Perspective: Probabilities are objective properties of the physical world.
- Limitation: It cannot assign a probability to a single, non-repeatable event (e.g., “the probability that candidate X wins the election”).
2.2 The Bayesian View
Probability is a degree of belief.
- Definition: P(Rain) quantifies our uncertainty based on current knowledge.
- Perspective: Probabilities are subjective and change as we get new information.
- Power: This allows us to update our beliefs (Bayesian Inference), which is the foundation of modern AI. “I am 70% sure it will rain” is a valid Bayesian statement.
3. Axioms of Probability
To build a consistent math system, we start with three rules (Kolmogorov’s Axioms). Let Ω be the Sample Space (all possible outcomes). An Event A is a subset of Ω.
- Non-Negativity: P(A) ≥ 0. Meaning: You cannot have a negative chance of something happening.
- Normalization: P(Ω) = 1. Meaning: Something in the sample space must happen. The probabilities of all possible outcomes sum to 100%.
- Additivity: If events A and B are disjoint (mutually exclusive), then P(A ∪ B) = P(A) + P(B).
Why do these matter?
Everything else in probability is derived from these three simple rules. For example, the rule for the complement of an event:
P(Aᶜ) = 1 - P(A)
This ensures our reasoning is consistent. If an ML model predicts P(Cat) = 0.8 and P(Not Cat) = 0.3, the total is 1.1, which violates the axioms: the outputs are not a valid probability distribution.
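To see this numerically, here is a minimal sketch in Python (the scores and the helper name `is_valid_distribution` are purely illustrative) that checks a model's output against the axioms and renormalizes it when they are violated:

```python
def is_valid_distribution(probs, tol=1e-9):
    """Kolmogorov check for a discrete distribution:
    every probability is non-negative and they sum to 1."""
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) < tol

# A broken model output: P(Cat) = 0.8 and P(Not Cat) = 0.3 sum to 1.1
scores = {"Cat": 0.8, "Not Cat": 0.3}
print(is_valid_distribution(scores.values()))  # False -> axioms violated

# One common fix: renormalize so the values sum to 1
total = sum(scores.values())
normalized = {k: v / total for k, v in scores.items()}
print(is_valid_distribution(normalized.values()))  # True
print(normalized)  # {'Cat': 0.727..., 'Not Cat': 0.272...}
```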
4. Conditional Probability & Bayes’ Theorem
This is the cornerstone of Machine Learning. How does the probability of an event change when we get new information?
P(A | B) = P(A ∩ B) / P(B)
- Read as: “The probability of A given B occurred.”
- Intuition: We restrict our universe from Ω to just B. Now, we look at what fraction of B is also A.
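As a quick sanity check, here is a minimal sketch (the toy weather table is made up for illustration) that estimates a conditional probability directly from counts, exactly as the definition suggests:

```python
# Hypothetical daily observations: was the morning cloudy, and did it rain?
days = [
    {"cloudy": True,  "rain": True},
    {"cloudy": True,  "rain": False},
    {"cloudy": True,  "rain": True},
    {"cloudy": False, "rain": False},
    {"cloudy": False, "rain": False},
    {"cloudy": True,  "rain": False},
]

# Restrict the universe to B = "cloudy", then ask what fraction is also A = "rain"
cloudy_days = [d for d in days if d["cloudy"]]
rain_and_cloudy = [d for d in cloudy_days if d["rain"]]

p_rain_given_cloudy = len(rain_and_cloudy) / len(cloudy_days)
print(p_rain_given_cloudy)  # 2 / 4 = 0.5
```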
Bayes’ Theorem
This formula allows us to invert conditional probabilities. It tells us how to update our belief (A) after seeing evidence (B).
P(A | B) = [ P(B | A) · P(A) ] / P(B)
- **P(A) (Prior)**: Our initial belief before seeing evidence.
- **P(B | A) (Likelihood)**: How likely is the evidence if our belief is true?
- **P(B) (Evidence)**: The total probability of the evidence occurring.
- **P(A | B) (Posterior)**: Our updated belief after seeing evidence.
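These four pieces map directly onto a tiny helper function. Here is a minimal sketch (the function name and the spam-filter numbers are purely illustrative, not a real dataset):

```python
def bayes_posterior(prior, likelihood, evidence):
    """P(A | B) = P(B | A) * P(A) / P(B)"""
    return likelihood * prior / evidence

# Illustrative numbers: A = "email is spam", B = "email contains the word 'free'"
p_spam = 0.2                     # prior P(A)
p_free_given_spam = 0.6          # likelihood P(B | A)
p_free_given_ham = 0.05          # P(B | not A)

# Evidence P(B) by total probability over spam and non-spam
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

print(bayes_posterior(p_spam, p_free_given_spam, p_free))  # 0.75
```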
5. Case Study: Medical Diagnosis
Let’s apply Bayes’ Theorem to a real-world scenario.
Scenario:
- A rare disease affects 1% of the population (P(Disease) = 0.01).
- A test is 99% accurate (P(+ | Disease) = 0.99, P(- | No Disease) = 0.99).
- You test positive (+). What is the probability you actually have the disease?
Intuition: Most people say 99%.
Math:
We need P(Disease | +).
- Prior: P(D) = 0.01
- Likelihood: P(+ | D) = 0.99
- Evidence P(+): We can test positive in two ways:
  - True Positive: P(+ | D) · P(D) = 0.99 × 0.01 = 0.0099
  - False Positive: P(+ | No D) · P(No D) = 0.01 × 0.99 = 0.0099
  - Total: P(+) = 0.0099 + 0.0099 = 0.0198

P(D | +) = (0.99 × 0.01) / 0.0198 = 0.5 (50%)
Conclusion: Despite the 99% accuracy, a positive result only means a 50% chance of disease because the disease is so rare (Low Prior). This is why doctors re-test!
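To double-check the arithmetic above, here is a minimal sketch that recomputes the posterior step by step (variable names are just for readability):

```python
p_disease = 0.01                                  # prior P(D)
p_pos_given_disease = 0.99                        # sensitivity P(+ | D)
p_neg_given_healthy = 0.99                        # specificity P(- | No D)
p_pos_given_healthy = 1 - p_neg_given_healthy     # false positive rate P(+ | No D)

# Evidence: total probability of testing positive
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))  # 0.0099 + 0.0099 = 0.0198

# Bayes' Theorem: P(D | +) = P(+ | D) * P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # 0.5
```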
6. Interactive Visualizer: The Monty Hall Simulator
The famous counter-intuitive problem that stumps PhDs.
- 3 Doors: Behind 1 is a Car (Win), behind 2 are Goats (Lose).
- You pick a door (e.g., Door 1).
- The Host (Monty), who knows the truth, opens another door (e.g., Door 3) to reveal a Goat.
- Choice: Do you Stay with Door 1 or Switch to Door 2?
Intuition: “It’s 50/50. Two doors left, one car.”
Reality: Switching doubles your win rate from 1/3 to 2/3 (about 67%).
Let’s prove it with a simulation.
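Below is a minimal simulation sketch in plain Python (100,000 trials per strategy; the function name `play` is just illustrative):

```python
import random

def play(switch, n_trials=100_000):
    """Simulate the Monty Hall game n_trials times and return the win rate."""
    wins = 0
    for _ in range(n_trials):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # Monty opens a door that is neither the player's pick nor the car
        monty = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            # Switch to the one remaining unopened door
            pick = next(d for d in doors if d != pick and d != monty)
        wins += (pick == car)
    return wins / n_trials

print("Stay:  ", play(switch=False))  # ≈ 0.333
print("Switch:", play(switch=True))   # ≈ 0.667
```

Running it shows the stay strategy winning about a third of the time and the switch strategy about two thirds, matching the 2/3 figure above.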
7. Summary
- Axioms are the unbreakable rules: P(Ω) = 1, P(A) ≥ 0, Additivity.
- Bayes’ Theorem is the engine of learning. It turns Prior Beliefs into Posterior Knowledge using Evidence.
- Intuition is Flawed: The human brain struggles with conditional probability (as proven by the Monty Hall problem). Always trust the math or run a simulation.