Module Review: Inferential Statistics

We’ve covered the journey from sample to population. Here’s what you need to remember.

1. Key Takeaways

  • Central Limit Theorem (CLT): The sampling distribution of the mean approaches a normal distribution as n increases, regardless of the population's shape. This is what justifies Z-scores and t-tests for a wide range of data.
  • Hypothesis Testing: A “proof by contradiction” framework. We assume the Null Hypothesis (H0) is true, and reject it only if the observed data would be sufficiently unlikely under that assumption (p < α).
  • P-Value: The probability of observing your data (or something more extreme) if the Null Hypothesis were true. It is not the probability that the Null Hypothesis is true.
  • Confidence Intervals: A range of plausible values for the population parameter. “95% Confidence” means 95% of such intervals will capture the true parameter.
  • Standard Error: The standard deviation of the sampling distribution (σ / √n). It shrinks as sample size increases.
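The first and last takeaways can be checked empirically. The sketch below (sample sizes and distribution parameters are illustrative) draws repeated samples from a deliberately skewed population and shows that the sample means cluster normally around the population mean, with spread close to σ / √n:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: heavily right-skewed (exponential), nothing like a bell curve.
population = rng.exponential(scale=2.0, size=100_000)

# Draw many samples of size n and record each sample mean.
n = 50
sample_means = np.array([
    rng.choice(population, size=n).mean() for _ in range(5_000)
])

# CLT: the mean of the sample means ≈ the population mean,
# and their spread ≈ the standard error σ / √n.
print(population.mean(), sample_means.mean())
print(population.std() / np.sqrt(n), sample_means.std())
```

Plotting a histogram of `sample_means` makes the effect visible: the population is skewed, but the sampling distribution of the mean is approximately normal.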


2. Flashcards

  • Central Limit Theorem: The distribution of sample means approximates a normal distribution as the sample size grows, regardless of the population's distribution.
  • P-Value: The probability of obtaining results at least as extreme as those actually observed, assuming the null hypothesis is correct.
  • Type I Error (α): False positive. Rejecting the Null Hypothesis when it is actually true (convicting an innocent person).
  • Type II Error (β): False negative. Failing to reject the Null Hypothesis when it is actually false (letting a guilty person go free).
  • Confidence Interval: A range of values constructed so that, across repeated samples, a specified proportion (e.g. 95%) of such intervals contain the true parameter.
  • Standard Error: The standard deviation of the sampling distribution of a statistic, most commonly the mean. Formula: σ / √n
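The definitions of Type I error and the p-value fit together: if H0 is true and we reject whenever p < α, the long-run rate of false positives should be close to α. A minimal simulation (group sizes and the α = 0.05 threshold are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Both groups come from the SAME population, so H0 is true by construction.
# Every rejection is therefore a Type I error; the rejection rate should be ≈ α.
alpha = 0.05
trials = 2_000
rejections = 0
for _ in range(trials):
    a = rng.normal(loc=0, scale=1, size=30)
    b = rng.normal(loc=0, scale=1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1

print(rejections / trials)  # ≈ 0.05
```

This also illustrates why the p-value is not "the probability H0 is true": here H0 is true in every trial, yet about 5% of p-values still fall below 0.05.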

3. Python Cheat Sheet

Snippets assume `import numpy as np` and `from scipy import stats`.

| Task | Code Snippet |
| --- | --- |
| Normal Dist Data | `data = np.random.normal(loc=mu, scale=sigma, size=n)` |
| T-Test (Indep) | `t_stat, p_val = stats.ttest_ind(group_a, group_b)` |
| Standard Error | `sem = stats.sem(data)` |
| Percentiles | `np.percentile(data, [2.5, 97.5])` |
| Bootstrapping | `sample = np.random.choice(data, size=len(data), replace=True)` |
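The bootstrapping and percentile snippets combine into a confidence interval without any normality formula. A sketch (the dataset here is synthetic, generated only for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=200)  # illustrative sample

# Resample with replacement many times and record each resample's mean.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(10_000)
])

# The 2.5th and 97.5th percentiles of the bootstrap means bound a 95% CI.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")

# Compare with the analytic interval x̄ ± 1.96 × SE.
sem = stats.sem(data)
print(f"Analytic 95% CI: ({data.mean() - 1.96 * sem:.2f}, "
      f"{data.mean() + 1.96 * sem:.2f})")
```

With a well-behaved sample like this one, the bootstrap and analytic intervals should nearly coincide; the bootstrap earns its keep when the sampling distribution is harder to derive.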

Essential Formulas

| Concept | Formula | Notes |
| --- | --- | --- |
| Standard Error | σ / √n | Precision increases with √n |
| Z-Score | (X - μ) / σ | Distance from mean in std devs |
| Confidence Interval | X̄ ± Z × SE | Range of likely values |
| Margin of Error | Z × (σ / √n) | Half the width of the CI |
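A worked pass through the formulas above, with made-up numbers chosen so the arithmetic comes out clean:

```python
import numpy as np

# Illustrative inputs (not from any real dataset).
sigma, n = 12.0, 64          # population std dev, sample size
x_bar, mu = 103.0, 100.0     # sample mean, hypothesized population mean

se = sigma / np.sqrt(n)          # Standard Error: 12 / 8 = 1.5
z = (x_bar - mu) / se            # Z-score of the sample mean: 3 / 1.5 = 2.0
moe = 1.96 * se                  # Margin of Error at 95%: 2.94
ci = (x_bar - moe, x_bar + moe)  # Confidence Interval: (100.06, 105.94)

print(se, z, moe, ci)
```

Note that the Z-score row in the table standardizes a single observation X by σ; when standardizing a sample mean, as here, divide by the standard error instead.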

4. Next Steps

Now that you can test hypotheses and estimate intervals, it’s time to dive deeper into Estimation Theory. How do we calculate the “best” guess for a parameter in the first place?

For definitions of all terms, see the Statistics Glossary.