Statistics Glossary

A comprehensive guide to key terms and concepts in statistics.

A

Anscombe’s Quartet

A set of four datasets that have nearly identical simple descriptive statistics (mean, variance, correlation) but appear very different when graphed. It demonstrates the importance of visualizing data before analyzing it.

Example: Two of the datasets both have a $y$ mean of about 7.5 and a $y$ variance of about 4.12, but when plotted, one is a straight line with noise and the other is a smooth parabola.
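As a rough illustration, the sketch below computes the summary statistics for the first two of the four datasets, using the commonly published values and Python's standard `statistics` module. The variable names and the helper `pearson_r` are chosen here for illustration only.

```python
import statistics

# Shared x values and the y values of Anscombe's datasets I and II.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson_r(a, b):
    """Pearson correlation computed from deviations about the means."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cross / (ss_a * ss_b) ** 0.5

# Print mean, sample variance, and correlation with x for each dataset.
for name, y in (("I", y1), ("II", y2)):
    print(name,
          round(statistics.mean(y), 2),
          round(statistics.variance(y), 2),
          round(pearson_r(x, y), 2))
```

The two rows of output agree to two decimal places even though a scatter plot of each dataset looks completely different.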

Asymptotic Normality

The property of an estimator whose sampling distribution, once suitably centered and scaled (typically by $\sqrt{n}$), approaches a normal distribution as the sample size increases.

Analogy: Like zooming out of a pixelated image. Up close (small sample), it looks blocky and irregular, but as you zoom out (increase sample size), it smooths out into a perfect bell curve.

B

Bias

The difference between the expected value of an estimator and the true value of the parameter being estimated. Bias is a measure of systematic error.

Example: If a scale is calibrated incorrectly and consistently reads 2 kg heavier than the actual weight, it has a positive bias of 2 kg.

Bias-Variance Tradeoff

The tension in trying to simultaneously minimize bias and variance, two sources of error that prevent supervised learning algorithms from generalizing beyond their training set. Reducing one typically increases the other.

Analogy: Throwing darts at a dartboard. High bias is consistently hitting the top left corner (systematically off-target). High variance is hitting all over the board (widely scattered). You want low bias and low variance (all darts tightly clustered in the bullseye).

Box Plot

A standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile ($Q_1$), median, third quartile ($Q_3$), and maximum. It is useful for identifying outliers and skewness.

C

Central Tendency

A central or typical value for a probability distribution. The most common measures of central tendency are the arithmetic mean, the median, and the mode.

Confidence Interval

A range of values, derived from sample statistics, built by a procedure that captures the unknown population parameter in a stated proportion of repeated samples (the confidence level).

Example: A 95% confidence interval of [45, 55] for the average age of a population means that if we were to take 100 different samples and compute the interval for each, about 95 of those intervals would contain the true average age.
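The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only illustrative: the population mean, standard deviation, sample size, and number of trials are hypothetical values, and the standard deviation is treated as known so that a simple normal-based interval applies.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and simulation settings.
TRUE_MEAN, SIGMA, N, TRIALS = 50.0, 10.0, 40, 1000
Z_95 = 1.96  # normal critical value for a two-sided 95% interval

def interval_covers_truth():
    """Draw one sample and report whether its interval contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    centre = statistics.mean(sample)
    half_width = Z_95 * SIGMA / N ** 0.5  # sigma treated as known
    return centre - half_width <= TRUE_MEAN <= centre + half_width

coverage = sum(interval_covers_truth() for _ in range(TRIALS)) / TRIALS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95, which is what the confidence level describes: a property of the procedure across many samples rather than of any single interval.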

Consistency

A property of an estimator where it converges in probability to the true value of the parameter as the sample size tends to infinity.

Correlation

A statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). The correlation coefficient $r$ ranges from $-1$ to $1$.

Example: Height and shoe size typically have a strong positive correlation ($r$ close to $1$); as height increases, shoe size generally increases.
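Assuming $r$ here denotes the usual Pearson coefficient, it can be written for paired observations $(x_i, y_i)$, $i = 1, \dots, n$, with sample means $\bar{x}$ and $\bar{y}$ as:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is positive when the two variables tend to sit on the same side of their means, and the denominator rescales the result so that it lies between $-1$ and $1$.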

D

Descriptive Statistics

Summary measures that describe the main features of a data set, which may represent either an entire population or a sample from it.

E

Efficiency

A measure of the quality of an estimator. An efficient estimator is an unbiased estimator whose variance attains the Cramér-Rao lower bound, the smallest variance any unbiased estimator can have under regularity conditions.

Estimator

A rule or formula that tells us how to calculate an estimate of a population parameter based on sample data.

Example: The sample mean ($\bar{x}$) is an estimator for the population mean ($\mu$).

H

Histogram

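A graphical representation of the distribution of numerical data, drawn as bars whose heights give the count (or proportion) of observations falling in each interval (bin). It is a rough estimate of the probability distribution of a continuous variable.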

I

Interquartile Range (IQR)

A measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles: $IQR = Q_3 - Q_1$.

Example: If $Q_1$ (25th percentile) is 40 and $Q_3$ (75th percentile) is 60, the $IQR$ is 20, meaning the middle 50% of the data spans a range of 20 units.
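A minimal sketch of the same calculation in Python, using the standard `statistics` module on a hypothetical sample. Quartile conventions differ between tools, so the `method="inclusive"` choice below is an assumption and other conventions can give slightly different values on small samples.

```python
import statistics

data = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # hypothetical sample

# quantiles(n=4) returns the three cut points Q1, median, Q3.
# method="inclusive" treats the data as including the population
# minimum and maximum.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)
```

Here $Q_1 = 30$ and $Q_3 = 70$, so the middle half of the sample spans 40 units.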

Irreducible Error

The error that cannot be reduced by creating a better model. It is caused by the noise in the data itself.

K

Kurtosis

A measure of the “tailedness” of the probability distribution of a real-valued random variable. High kurtosis indicates heavy tails (more outliers), while low kurtosis indicates light tails.

L

Likelihood Function

A function of the parameters of a statistical model, given specific observed data. Likelihood differs from probability in that the data is fixed and the parameters vary.

Log-Likelihood

The natural logarithm of the likelihood function. It is often easier to maximize the log-likelihood than the likelihood itself because sums are easier to work with than products.
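To see where the sums come from, suppose the observations $x_1, \dots, x_n$ are independent with density (or mass function) $f(x; \theta)$. The symbols $L$, $\ell$, $f$, and $\theta$ are notation assumed here for illustration.

$$
\ell(\theta) = \log L(\theta) = \log \prod_{i=1}^{n} f(x_i; \theta) = \sum_{i=1}^{n} \log f(x_i; \theta)
$$

Because the logarithm is strictly increasing, the value of $\theta$ that maximizes $\ell(\theta)$ also maximizes $L(\theta)$.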

M

Maximum A Posteriori (MAP)

An estimate of an unknown quantity that equals the mode of the posterior distribution. Unlike maximum likelihood estimation, it incorporates a prior distribution over the parameter.
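As a hedged illustration, consider estimating a coin's probability of heads after seeing 7 heads in 10 flips. If the prior is assumed to be a Beta($\alpha$, $\beta$) distribution (an assumption made here, not part of the definition), the posterior is Beta($\alpha$ + heads, $\beta$ + tails) and its mode has a closed form. The function name `map_estimate` is hypothetical.

```python
def map_estimate(heads, flips, alpha, beta):
    """Mode of the Beta(alpha + heads, beta + flips - heads) posterior.

    The closed form below is valid when both posterior parameters exceed 1.
    """
    return (alpha + heads - 1) / (alpha + beta + flips - 2)

# A Beta(2, 2) prior mildly favours a fair coin and pulls the estimate
# toward 0.5.
p_map = map_estimate(heads=7, flips=10, alpha=2, beta=2)

# A flat Beta(1, 1) prior adds no information, so the posterior mode
# equals the observed proportion of heads.
p_flat = map_estimate(heads=7, flips=10, alpha=1, beta=1)

print(round(p_map, 3), p_flat)
```

With the Beta(2, 2) prior the estimate is $8/12 \approx 0.667$, slightly closer to 0.5 than the observed proportion of 0.7.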

Maximum Likelihood Estimation (MLE)

A method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model, the observed data is most probable.

Example: If you flip a coin 10 times and get 7 heads, the MLE for the probability of heads is $0.7$, because this value makes the observed data (7 heads in 10 flips) the most probable outcome.
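The coin example can be checked numerically. The sketch below evaluates the binomial log-likelihood on a grid of candidate probabilities and picks the largest; the grid resolution is an arbitrary choice made for illustration, and in practice a closed form or an optimizer would normally be used.

```python
import math

HEADS, FLIPS = 7, 10

def log_likelihood(p):
    """Binomial log-likelihood of HEADS successes in FLIPS trials."""
    return (math.log(math.comb(FLIPS, HEADS))
            + HEADS * math.log(p)
            + (FLIPS - HEADS) * math.log(1 - p))

# Candidate values 0.01, 0.02, ..., 0.99 (endpoints excluded to avoid log(0)).
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)
```

The maximizer on this grid is 0.7, matching the observed proportion of heads.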

Mean

The arithmetic average of a set of numbers, calculated by dividing the sum of the values by the number of values. It is sensitive to outliers.

Mean Squared Error (MSE)

A measure of the quality of an estimator. It is the expected squared difference between the estimator and the true value of the parameter. MSE incorporates both bias and variance.
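The way MSE combines the two can be written out explicitly. Writing $\hat{\theta}$ for the estimator and $\theta$ for the true parameter (notation assumed here):

$$
\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \mathrm{Var}(\hat{\theta}) + \big(\mathrm{Bias}(\hat{\theta})\big)^2,
\qquad \mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta
$$

For an unbiased estimator the second term vanishes, so its MSE equals its variance.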

Median

The middle value separating the greater and lesser halves of a data set. It is robust to outliers.

Example: In the dataset [1, 2, 3, 4, 100], the mean is 22, but the median is 3. The median better represents the “typical” value when there’s an extreme outlier like 100.
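The effect of the outlier is easy to reproduce with Python's standard `statistics` module. This is a small sketch using the same five values, plus the four values that remain once the outlier is dropped.

```python
import statistics

with_outlier = [1, 2, 3, 4, 100]
without_outlier = [1, 2, 3, 4]

mean_with = statistics.mean(with_outlier)
median_with = statistics.median(with_outlier)
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)

print(mean_with, median_with)        # mean and median with the outlier
print(mean_without, median_without)  # mean and median without it
```

Adding the single value 100 moves the mean from 2.5 to 22, while the median only moves from 2.5 to 3.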

Method of Moments (MoM)

A method of estimation that equates sample moments (e.g., sample mean, sample variance) to population moments (expected values) to solve for unknown parameters.
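As one possible illustration, suppose the data are modelled with a Gamma distribution with shape $k$ and scale $\theta$, whose mean is $k\theta$ and variance is $k\theta^2$. The model choice, the sample values, and the function name below are all assumptions made for this sketch. Equating those expressions to the sample mean and sample variance and solving gives the two estimates.

```python
import statistics

def gamma_method_of_moments(sample):
    """Estimate Gamma shape and scale by matching mean and variance."""
    m = statistics.fmean(sample)      # first sample moment
    v = statistics.pvariance(sample)  # second central sample moment
    shape = m * m / v                 # from mean = k*theta, var = k*theta^2
    scale = v / m
    return shape, scale

sample = [1.2, 0.7, 3.4, 2.1, 1.9, 0.5, 2.8]  # hypothetical observations
shape, scale = gamma_method_of_moments(sample)
print(round(shape, 3), round(scale, 3))
```

By construction, a Gamma distribution with the fitted parameters has the same mean and variance as the sample.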

Mode

The value that appears most often in a set of data values.

N

Normal Distribution

A continuous probability distribution that is symmetric about the mean, with values near the mean more likely than values far from it. It is fully determined by its mean $\mu$ and standard deviation $\sigma$, and is also known as the bell curve or Gaussian distribution.

O

Outlier

A data point that differs significantly from other observations. Outliers may be due to variability in the measurement or may indicate experimental error.

P

P-Value

The probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.

Example: If you assume a coin is fair (the null hypothesis) and flip it 10 times, getting 10 heads, the P-Value is the very low probability ($0.5^{10} \approx 0.001$, counting only outcomes with at least this many heads) of that happening by chance. A low P-Value makes you doubt the assumption that the coin is fair.
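The figure in the coin illustration can be reproduced with a short calculation. The sketch below computes a one-sided tail probability under the fair-coin assumption; the helper name `binomial_upper_tail` is hypothetical, and a two-sided test would also count results that are equally extreme in the tails direction.

```python
import math

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

p_value_10 = binomial_upper_tail(10, 10)  # 10 heads out of 10
p_value_9 = binomial_upper_tail(9, 10)    # at least 9 heads out of 10
print(p_value_10, p_value_9)
```

Ten heads in ten flips has probability $1/1024$, while nine or more has probability $11/1024$, roughly ten times larger.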

Percentile

A score below which a given percentage of scores in its frequency distribution falls (exclusive definition) or a score at or below which a given percentage falls (inclusive definition).

Population Moment

The expected value of a power of a random variable, $E[X^k]$ for the $k$-th moment. The first moment is the mean; the second central moment (the expected squared deviation from the mean) is the variance.

Posterior Distribution

The probability distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey.

Prior Distribution

The probability distribution that expresses one's beliefs about an unknown quantity before any evidence is taken into account.

R

Range

The difference between the largest and smallest values in a set of values.

S

Sample Moment

The average of a power of the observed values in a sample. The first sample moment is the sample mean.

Skewness

A measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. Positive skew indicates a tail on the right; negative skew indicates a tail on the left.

Example: Income distribution is typically positively skewed (tail on the right): most people earn a modest amount, but a few billionaires pull the tail far to the right, making the mean higher than the median.
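A small sketch of this effect, using hypothetical incomes (in thousands) and one common moment-based definition of skewness, $m_3 / m_2^{3/2}$. Other definitions apply small-sample corrections and give slightly different numbers.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5, using divide-by-n moments."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

incomes = [30, 35, 40, 45, 50, 60, 1000]  # hypothetical, one very high earner
symmetric = [1, 2, 3, 4, 5]               # evenly spread around its mean

skew_incomes = sample_skewness(incomes)
skew_symmetric = sample_skewness(symmetric)
print(statistics.fmean(incomes), statistics.median(incomes), skew_incomes)
print(skew_symmetric)
```

The income sample has a positive skewness and a mean (180) far above its median (45), while the symmetric sample has a skewness of zero.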

Standard Deviation

A measure of the amount of variation or dispersion of a set of values, equal to the square root of the variance. A low standard deviation indicates that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range.

Sufficiency

A statistic is sufficient for a parameter if no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter.

V

Variance

The expectation of the squared deviation of a random variable from its mean. It measures how far a set of numbers is spread out from their average value.
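For a finite set of numbers, the calculation can be sketched with Python's standard `statistics` module. The data are hypothetical. Note the two conventions: dividing by $n$ (treating the data as the whole population) and dividing by $n - 1$ (the usual estimate from a sample).

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

pop_var = statistics.pvariance(values)    # squared deviations divided by n
pop_sd = statistics.pstdev(values)        # square root of pop_var
sample_var = statistics.variance(values)  # squared deviations divided by n - 1
print(pop_var, pop_sd, round(sample_var, 3))
```

The squared deviations from the mean sum to 32, giving a population variance of $32/8 = 4$, a standard deviation of 2, and a sample variance of $32/7 \approx 4.571$.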

Variance (of an Estimator)

The expectation of the squared deviation of an estimator from its mean. It measures the spread or precision of the estimator across repeated samples. In predictive models, high variance is often a sign of overfitting.