Measures of Central Tendency

When we analyze a dataset, the first question we usually ask is: “What is the typical value?” or “Where is the center?”

Measures of Central Tendency help us answer this question. The arithmetic mean is the most common, but it can be misleading when the data is skewed or contains outliers. In this chapter, we’ll explore the three pillars of central tendency: Mean, Median, and Mode, and learn when to use each.

1. The Arithmetic Mean

The Mean is what we commonly call the “average”. It is the sum of all values divided by the number of values.

Formula: x̄ = (1/n) Σ x_i

Pros: Uses every data point; easy to compute and to manipulate algebraically.
Cons: Highly sensitive to Outliers.

Visual Intuition: The Balance Point

Think of the Mean as the center of gravity. If you placed weights on a seesaw at the positions of your data points, the fulcrum would need to be at the Mean to balance perfectly.
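The balance-point idea can be checked numerically: the signed distances from each point to the Mean always sum to zero. The snippet below is a minimal sketch; the sample values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical sample values, chosen only for illustration.
data = [2.0, 4.0, 4.0, 10.0]

# Arithmetic mean: sum of all values divided by the number of values.
mean = sum(data) / len(data)

# Signed distance of each point from the mean (its "pull" on the seesaw).
deviations = [x - mean for x in data]

# Replace the largest value with an outlier to see how far the mean moves.
with_outlier = data[:-1] + [100.0]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(f"mean = {mean}")                            # mean = 5.0
print(f"sum of deviations = {sum(deviations)}")    # sum of deviations = 0.0
print(f"mean with outlier = {mean_with_outlier}")  # mean with outlier = 27.5
```

The left-side deviations (-3, -1, -1) cancel the right-side deviation (+5) exactly, which is the seesaw balancing. Moving one point far to the right forces the fulcrum to follow it.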

2. The Median

The Median is the middle value when the data is ordered from smallest to largest. If there is an even number of observations, it is the average of the two middle values.

Pros: Robust to outliers. One extreme value won’t pull the median away.
Cons: Harder to manipulate mathematically than the mean; for example, the median of a combined dataset cannot be computed from the medians of its parts.

Visual Intuition: The Middle Ground

Think of the Median as the person standing exactly in the middle of a line sorted by height. Even if the tallest person grows another 2 feet, the middle person remains the same.
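The line-of-people picture can be sketched in a few lines of Python. The heights below are hypothetical, and the `median` helper is written out by hand only to make the odd/even rule visible.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)  # sort a copy so the caller's list is untouched
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical heights in centimetres.
heights = [160, 165, 170, 175, 180]
taller = [160, 165, 170, 175, 240]  # the tallest person grows by 60 cm (roughly 2 feet)

print(median(heights), sum(heights) / len(heights))  # 170 170.0
print(median(taller), sum(taller) / len(taller))     # 170 182.0
print(median([1, 2, 3, 4]))                          # 2.5
```

The median stays at 170 while the mean jumps from 170.0 to 182.0, even though only one person changed.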

3. Interactive Demo: Mean vs. Median

This interactive visualization demonstrates the robustness of the Median compared to the Mean.

The Balance Point Visualizer

Click on the track below to add data points. Observe how the Mean (Red) and Median (Blue) react.

[Interactive widget: a track with an axis from 0 to 100 and live readouts for Mean and Median, both initially 0.0]

[!TIP] Try adding a point at 95 (the far right). Notice how the Mean (Red) gets pulled heavily towards it, while the Median (Blue) barely moves. This is why Median is preferred for income or house price data.

4. The Mode

The Mode is simply the value that appears most frequently.

Pros: The only one of the three measures that works for unordered categorical data (e.g., “Most popular t-shirt color”), where values cannot be summed or sorted.
Cons: There can be no mode, one mode, or multiple modes (multimodal).
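Because the mode is not always unique, a helper that returns every tied value is often safer than one that returns a single answer. The following is a minimal sketch using `collections.Counter`; the function name `modes` and the sample data are illustrative assumptions.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


sizes = ["M", "L", "M", "S", "L", "M"]  # hypothetical t-shirt orders
print(modes(sizes))            # ['M']       one mode
print(modes([1, 1, 2, 2, 3]))  # [1, 2]      two modes (bimodal)
print(modes([1, 2, 3]))        # [1, 2, 3]   every value ties, often described as "no mode"
```

Returning a list leaves it to the caller to decide how to treat ties, rather than silently picking one of them.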

5. Real-World Case Study: Tail Latency (P99)

In distributed systems (like those at Netflix or Uber), the Mean is often a misleading measure of latency.

Imagine you are monitoring a microservice. You check the response times for 7 requests: [20ms, 22ms, 21ms, 19ms, 23ms, 20ms, 2500ms]

The 2500ms spike is due to a “Stop-the-World” Garbage Collection pause.

Mean: (20+22+21+19+23+20+2500) / 7 = 2625 / 7 = 375 ms
Median (P50): Sorted [19, 20, 20, 21, 22, 23, 2500] → Middle is 21 ms

If you alert when “Average Latency > 100ms”, you will wake up the on-call engineer for a single blip. The Mean suggests the system is broken (375ms), while the Median correctly tells you the system is healthy for the typical user (21ms).

[!IMPORTANT] P99 (99th Percentile): This is the value below which 99% of requests fall. With only 7 samples, P99 coincides with the maximum (2500ms), which represents the experience of the “unluckiest” user. High-performance systems optimize for P99 so that nearly every user, not just the typical one, has a fast experience.
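Percentiles can be computed in several slightly different ways. The sketch below assumes the simple nearest-rank definition (no interpolation); monitoring tools may use other definitions, so treat the helper name and method as illustrative rather than as what any particular system does.

```python
import math


def percentile_nearest_rank(values, p):
    """Smallest observed value such that at least p percent of the data is <= it."""
    if not values:
        raise ValueError("percentile of empty data is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted data
    return ordered[rank - 1]


latencies_ms = [20, 22, 21, 19, 23, 20, 2500]

p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)

print(f"P50: {p50} ms")  # P50: 21 ms
print(f"P99: {p99} ms")  # P99: 2500 ms
```

With 7 samples, the P99 rank is ceil(0.99 * 7) = 7, so P99 is simply the largest value. On real traffic with thousands of requests, P99 and the maximum usually differ.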

6. Implementation

We provide implementations in Python (for Data Science), Java, and Go (for Systems Engineering).
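A Python version is sketched here on the same latency data. It assumes the standard-library `statistics` module is acceptable; a data science codebase might reach for NumPy or pandas instead.

```python
import statistics

data = [20, 22, 21, 19, 23, 20, 2500]

# 1. Mean: sum of values divided by count.
mean = statistics.mean(data)

# 2. Median: middle value of the sorted data.
median = statistics.median(data)

# 3. Mode: most frequent value (the first one encountered if several tie, Python 3.8+).
mode = statistics.mode(data)

print(f"Mean:   {mean:.2f} ms")    # Mean:   375.00 ms
print(f"Median: {median:.2f} ms")  # Median: 21.00 ms
print(f"Mode:   {mode} ms")        # Mode:   20 ms
```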

// Java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CentralTendency {
    public static void main(String[] args) {
        int[] data = {20, 22, 21, 19, 23, 20, 2500};

        // 1. Mean
        double sum = 0;
        for (int x : data) sum += x;
        double mean = sum / data.length;
        System.out.printf("Mean:   %.2f ms%n", mean);

        // 2. Median
        Arrays.sort(data);
        double median;
        if (data.length % 2 == 0) {
            median = (data[data.length/2] + data[data.length/2 - 1]) / 2.0;
        } else {
            median = data[data.length/2];
        }
        System.out.printf("Median: %.2f ms%n", median);

        // 3. Mode (data is sorted by now, so a tie resolves to the smallest tied value)
        Map<Integer, Integer> frequencyMap = new HashMap<>();
        int maxCount = 0;
        int mode = data[0];

        for (int x : data) {
            int count = frequencyMap.getOrDefault(x, 0) + 1;
            frequencyMap.put(x, count);
            if (count > maxCount) {
                maxCount = count;
                mode = x;
            }
        }
        System.out.println("Mode:   " + mode + " ms");
    }
}

// Go
package main

import (
	"fmt"
	"sort"
)

func main() {
	data := []float64{20, 22, 21, 19, 23, 20, 2500}

	// 1. Mean
	sum := 0.0
	for _, x := range data {
		sum += x
	}
	mean := sum / float64(len(data))
	fmt.Printf("Mean:   %.2f ms\n", mean)

	// 2. Median
	sort.Float64s(data)
	var median float64
	mid := len(data) / 2
	if len(data)%2 == 0 {
		median = (data[mid-1] + data[mid]) / 2.0
	} else {
		median = data[mid]
	}
	fmt.Printf("Median: %.2f ms\n", median)

	// 3. Mode (data is sorted by now, so a tie resolves to the smallest tied value)
	counts := make(map[float64]int)
	maxCount := 0
	var mode float64
	for _, x := range data {
		counts[x]++
		if counts[x] > maxCount {
			maxCount = counts[x]
			mode = x
		}
	}
	fmt.Printf("Mode:   %.0f ms\n", mode)
}

7. Deep Dive: The Geometric Mean

The Arithmetic Mean works for additive data (e.g., height, weight). But what about growth rates?

If your investment grows by 10% in Year 1 and drops by 10% in Year 2:

Year 1 Multiplier: 1.10
Year 2 Multiplier: 0.90
Arithmetic Mean: (1.10 + 0.90) / 2 = 1.00 (0% change).

But in reality: $100 * 1.10 * 0.90 = $99. You lost money!

For multiplicative processes, we use the Geometric Mean.

Formula: μ_geo = (x_1 * x_2 * … * x_n)^(1/n)

Geometric Mean: sqrt(1.10 * 0.90) = sqrt(0.99) ≈ 0.995 (about -0.5% per year), which correctly reflects reality: $100 * 0.995 * 0.995 ≈ $99.
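The investment example can be reproduced with a short Python sketch. The helper below computes the geometric mean through logarithms, which is one common way to avoid overflow on long series of multipliers; the function and variable names are illustrative.

```python
import math


def geometric_mean(multipliers):
    """n-th root of the product of the values, computed via logarithms."""
    if not multipliers:
        raise ValueError("geometric mean of empty data is undefined")
    if any(m <= 0 for m in multipliers):
        raise ValueError("all multipliers must be positive")
    log_sum = sum(math.log(m) for m in multipliers)
    return math.exp(log_sum / len(multipliers))


yearly = [1.10, 0.90]  # +10% in Year 1, -10% in Year 2

arithmetic = sum(yearly) / len(yearly)
geometric = geometric_mean(yearly)

# Compound $100 at each "average" rate for the same number of years.
final_actual = 100 * yearly[0] * yearly[1]
final_arithmetic = 100 * arithmetic ** len(yearly)
final_geometric = 100 * geometric ** len(yearly)

print(f"arithmetic mean: {arithmetic:.4f} -> ${final_arithmetic:.2f}")
print(f"geometric mean:  {geometric:.4f} -> ${final_geometric:.2f}")
print(f"actual:          ${final_actual:.2f}")
```

Only the geometric mean, compounded over two years, lands on the actual $99 balance; the arithmetic mean predicts $100. Python 3.8+ also ships `statistics.geometric_mean`, which can replace the hand-written helper.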