Rank-Based Statistics

Parametric tests like the t-test and ANOVA are powerful, but they come with strings attached: the normality assumption.

If your data is skewed, has outliers, or is ordinal (like a 1-5 star rating), a t-test can give misleading p-values, especially with small samples.

Rank-based (non-parametric) tests are the usual alternative. Instead of analyzing the raw values, they analyze the values' ranks.

1. The Transformation: Values → Ranks

The core idea is simple:

  1. Combine all data from all groups.
  2. Sort the data from smallest to largest.
  3. Assign a rank (1, 2, 3…) to each value.
  4. If values are tied, assign the average rank (e.g., if 5th and 6th values are equal, both get rank 5.5).
  5. Perform the test on the ranks, not the values.
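The five steps above can be sketched in plain Python. This is a minimal illustration, not SciPy's implementation: the helper name `average_ranks` is hypothetical (SciPy's `scipy.stats.rankdata`, whose default `method='average'` handles ties the same way, is the library equivalent).

```python
def average_ranks(values):
    """Rank values 1..N; tied values share the average of their positions."""
    # Step 2: indices of the values in ascending order
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Steps 3-4: sorted positions i..j (0-based) hold ranks i+1..j+1; use their mean
        avg = (i + j + 2) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Step 1: pool the groups (group A first, so its ranks are the first len(group_a))
group_a = [12, 15, 14, 11, 45]
group_b = [22, 24, 25, 28, 26]
ranks = average_ranks(group_a + group_b)

# Step 5: the test then works on ranks, e.g. the rank sum of group A
rank_sum_a = sum(ranks[: len(group_a)])
print(ranks)                        # [2.0, 4.0, 3.0, 1.0, 10.0, 5.0, 6.0, 7.0, 9.0, 8.0]
print(rank_sum_a)                   # 20.0
print(average_ranks([3, 7, 7, 1]))  # [2.0, 3.5, 3.5, 1.0]
```

The tie rule shows up in the last line: the two 7s would occupy ranks 3 and 4, so both receive 3.5.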

This transformation makes the test robust to outliers. With N values in total, the largest always gets rank N: a value of 1,000,000 receives the same rank as it would if it were 100 (provided it is still the largest).
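A quick way to check the robustness claim is to replace the largest value in group A with 1,000,000 and compare a raw-value summary (the mean) with a rank-based one (the rank sum). The sketch below reuses the toy groups from the Python example in the implementation section; it assumes no tied values, so each rank is simply one plus the number of smaller values. The function names are illustrative.

```python
def simple_ranks(values):
    """Rank = 1 + number of smaller values (valid only when there are no ties)."""
    return [1 + sum(v < x for v in values) for x in values]


def summarize(outlier):
    """Mean and rank sum of group A when its largest value is `outlier`."""
    group_a = [12, 15, 14, 11, outlier]
    group_b = [22, 24, 25, 28, 26]
    ranks = simple_ranks(group_a + group_b)
    mean_a = sum(group_a) / len(group_a)
    rank_sum_a = sum(ranks[: len(group_a)])
    return mean_a, rank_sum_a


for outlier in (45, 1_000_000):
    mean_a, rank_sum_a = summarize(outlier)
    # The mean jumps from 19.4 to 200010.4; the rank sum stays at 20
    print(f"outlier={outlier}: mean_a={mean_a:.1f}, rank_sum_a={rank_sum_a}")
```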


2. The Rank-Sum Logic of the Mann-Whitney U Test

The Mann-Whitney U test applies this transformation to two groups, Group A and Group B:

  • Pool and rank all values from both groups.
  • Add up the ranks within each group to get the rank sums R_A and R_B.
  • Convert each rank sum into a U statistic: U_A = R_A - n_A(n_A + 1)/2 and U_B = R_B - n_B(n_B + 1)/2, where n_A and n_B are the group sizes. The two always satisfy U_A + U_B = n_A * n_B.
  • Report U = min(U_A, U_B), which measures how much the groups overlap.

Lower U = more separation: U = 0 means every value in one group is smaller than every value in the other.
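The rank-sum formula U_A = R_A - n_A(n_A + 1)/2 has a concrete meaning: U_A counts the (a, b) pairs in which the value from Group A is larger than the value from Group B, with ties counting one half. The Python sketch below checks that the two calculations agree; the rank-sum version assumes no tied values, and both function names are illustrative.

```python
def u_by_pairs(a, b):
    """U_A as a pair count: +1 when a > b, +0.5 when a == b."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def u_by_rank_sum(a, b):
    """U_A = R_A - n_A(n_A + 1)/2, assuming no tied values."""
    pooled = a + b
    ranks = [1 + sum(v < x for v in pooled) for x in pooled]
    r_a = sum(ranks[: len(a)])
    return r_a - len(a) * (len(a) + 1) / 2


group_a = [12, 15, 14, 11, 45]
group_b = [22, 24, 25, 28, 26]
u_a = u_by_rank_sum(group_a, group_b)
u_b = len(group_a) * len(group_b) - u_a   # uses U_A + U_B = n_A * n_B
print(u_by_pairs(group_a, group_b), u_a)  # 5.0 5.0: only 45 beats any Group B value
print(min(u_a, u_b))                      # 5.0
```

The pair-count view also explains why U = 0 means complete separation: no value from one group exceeds any value from the other.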

3. The “Big Three” Non-Parametric Tests

Here is your cheat sheet for choosing the right test.

| Scenario | Parametric test (assumes normality) | Non-parametric test (rank-based) |
| --- | --- | --- |
| 2 independent groups | Independent t-test | Mann-Whitney U test |
| 2 paired groups | Paired t-test | Wilcoxon signed-rank test |
| 3+ independent groups | One-way ANOVA | Kruskal-Wallis H test |

1. Mann-Whitney U Test

Used to test if two independent populations have the same distribution.

  • Null Hypothesis (H0): The distributions of both populations are identical.
  • Alternative (H1): One population tends to have larger values than the other.
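Under H0, every way of splitting the pooled ranks between the two groups is equally likely, so for small samples the p-value can be found by brute-force enumeration. The following is a minimal sketch of that exact test, assuming no ties; the function name is hypothetical, and while SciPy's `mannwhitneyu` also offers an exact method, this is not its implementation. It enumerates all C(n_A + n_B, n_A) splits, so it is only practical for small groups.

```python
from itertools import combinations


def exact_mann_whitney_p(a, b):
    """Two-sided exact p-value for U = min(U_A, U_B). Assumes no tied values."""
    pooled = a + b
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle ties")
    n_a, n_b, n = len(a), len(b), len(pooled)

    def u_min(rank_sum_a):
        u_a = rank_sum_a - n_a * (n_a + 1) / 2
        return min(u_a, n_a * n_b - u_a)

    # Observed statistic: rank = 1 + number of smaller pooled values
    observed = u_min(sum(1 + sum(v < x for v in pooled) for x in a))

    # Under H0, each choice of n_a ranks out of 1..n is equally likely for group A
    extreme = total = 0
    for chosen in combinations(range(1, n + 1), n_a):
        total += 1
        if u_min(sum(chosen)) <= observed:
            extreme += 1
    return extreme / total


p = exact_mann_whitney_p([12, 15, 14, 11, 45], [22, 24, 25, 28, 26])
print(f"{p:.4f}")  # 0.1508 (38 of the 252 possible splits are at least as extreme)
```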

2. Wilcoxon Signed-Rank Test

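Used for paired data (e.g., before vs. after). It looks at the difference within each pair:

  • It drops pairs whose difference is zero and ranks the absolute values of the remaining differences.
  • It sums the ranks of the positive differences (W+) and of the negative differences (W-).
  • It tests whether the differences are symmetric around zero; under that symmetry assumption, this is a test of whether the median difference is zero.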
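The signed-rank steps can be traced by hand. This sketch uses made-up before/after measurements for illustration, drops zero differences (the classic Wilcoxon convention; SciPy's `wilcoxon` exposes other conventions through its `zero_method` parameter), and gives tied absolute differences their average rank. It computes only W+ and W-, not a p-value; the function names are illustrative.

```python
def average_ranks(values):
    """Ranks 1..N; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def signed_rank_sums(before, after):
    """Return (W+, W-) for paired samples, dropping zero differences."""
    diffs = [y - x for x, y in zip(before, after) if y != x]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical before/after scores for six subjects
before = [10, 12, 9, 15, 11, 14]
after = [12, 15, 9, 14, 16, 19]
w_plus, w_minus = signed_rank_sums(before, after)
print(w_plus, w_minus)  # 14.0 1.0
```

Here one pair has a zero difference, leaving m = 5 nonzero differences. W+ + W- always equals m(m + 1)/2 (here 15), and under H0 W+ is expected to be near m(m + 1)/4 = 7.5, so a lopsided split like 14 vs. 1 points toward a nonzero median difference.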

3. Kruskal-Wallis Test

An extension of Mann-Whitney for more than two groups.

  • It ranks all data together.
  • If the null is true, the average rank for each group should be roughly the same.
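The comparison of average ranks is summarized by the H statistic, H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where N is the total sample size and R_i and n_i are the rank sum and size of group i. Equivalently, H is a weighted sum of squared deviations of each group's mean rank from the overall mean rank (N + 1)/2, so H = 0 when all mean ranks are equal. Under H0, H approximately follows a chi-square distribution with k - 1 degrees of freedom for k groups. A minimal sketch, assuming all values are distinct (SciPy's `kruskal` also applies a tie correction, which this omits); the group data are toy values.

```python
def kruskal_wallis_h(*groups):
    """H statistic without tie correction (assumes all values are distinct)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    # Rank = 1 + number of smaller values in the pooled data
    rank_of = {x: 1 + sum(v < x for v in pooled) for x in pooled}
    total = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 * total / (n_total * (n_total + 1)) - 3 * (n_total + 1)


# Mean ranks 2, 5, 8: clearly separated groups give a large H (about 7.2)
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
# Mean ranks all 3.5: no separation
print(kruskal_wallis_h([1, 6], [2, 5], [3, 4]))  # 0.0
```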

4. Implementation Examples

Python (SciPy)

The scipy.stats module provides all three tests (mannwhitneyu, wilcoxon, and kruskal). The example below runs the Mann-Whitney U test.

from scipy import stats

# Example data (small samples, non-normal); 45 is an outlier in group A
group_a = [12, 15, 14, 11, 45]
group_b = [22, 24, 25, 28, 26]

# 1. Mann-Whitney U test, two-sided.
# SciPy 1.7+ returns U for the first sample (here U_A = 5), not min(U_A, U_B);
# for these data the two coincide.
u_stat, p_val = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')

print(f"Mann-Whitney U statistic: {u_stat}")
print(f"P-value: {p_val:.4f}")

if p_val < 0.05:
    print("Result: Significant difference between groups.")
else:
    print("Result: No significant difference.")

Java

Java's standard library has no rank-based tests, so here the U statistic is computed by hand: pool and sort the values, assign ranks, then apply the rank-sum formula.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RankData implements Comparable<RankData> {
    double value;
    String group;
    double rank;

    public RankData(double value, String group) {
        this.value = value;
        this.group = group;
    }

    @Override
    public int compareTo(RankData o) {
        return Double.compare(this.value, o.value);
    }
}

public class MannWhitney {
    public static void main(String[] args) {
        double[] groupA = {12, 15, 14, 11, 45};
        double[] groupB = {22, 24, 25, 28, 26};

        List<RankData> combined = new ArrayList<>();
        for (double v : groupA) combined.add(new RankData(v, "A"));
        for (double v : groupB) combined.add(new RankData(v, "B"));

        Collections.sort(combined);

        // Assign ranks; tied values share the average of their positions
        double sumRankA = 0;
        int i = 0;
        while (i < combined.size()) {
            int j = i;
            while (j + 1 < combined.size()
                    && combined.get(j + 1).value == combined.get(i).value) {
                j++;
            }
            // 0-based positions i..j hold ranks i+1..j+1; use their average
            double avgRank = (i + j + 2) / 2.0;
            for (int k = i; k <= j; k++) {
                combined.get(k).rank = avgRank;
                if (combined.get(k).group.equals("A")) {
                    sumRankA += avgRank;
                }
            }
            i = j + 1;
        }

        int nA = groupA.length;
        int nB = groupB.length;
        // U_A = R_A - nA(nA + 1)/2
        double uA = sumRankA - (nA * (nA + 1)) / 2.0;
        // U_A + U_B = nA * nB, so U_B follows directly
        double uB = (nA * nB) - uA;
        // Two-sided tests report the smaller of the two
        double u = Math.min(uA, uB);

        System.out.println("Mann-Whitney U statistic: " + u);
    }
}

Go

package main

import (
	"fmt"
	"sort"
)

type RankData struct {
	Value float64
	Group string
	Rank  float64
}

func main() {
	groupA := []float64{12, 15, 14, 11, 45}
	groupB := []float64{22, 24, 25, 28, 26}

	var combined []RankData
	for _, v := range groupA {
		combined = append(combined, RankData{Value: v, Group: "A"})
	}
	for _, v := range groupB {
		combined = append(combined, RankData{Value: v, Group: "B"})
	}

	// Sort
	sort.Slice(combined, func(i, j int) bool {
		return combined[i].Value < combined[j].Value
	})

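	// Assign ranks (tied values share the average rank) and sum group A's ranks
	sumRankA := 0.0
	for i := 0; i < len(combined); {
		j := i
		for j+1 < len(combined) && combined[j+1].Value == combined[i].Value {
			j++
		}
		// 0-based positions i..j hold ranks i+1..j+1; use their average
		avgRank := float64(i+j+2) / 2.0
		for k := i; k <= j; k++ {
			combined[k].Rank = avgRank
			if combined[k].Group == "A" {
				sumRankA += avgRank
			}
		}
		i = j + 1
	}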

	nA := float64(len(groupA))
	nB := float64(len(groupB))

	uA := sumRankA - (nA * (nA + 1)) / 2.0
	uB := (nA * nB) - uA

	u := uA
	if uB < uA {
		u = uB
	}

	fmt.Printf("Mann-Whitney U statistic: %.1f\n", u)
}

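[!IMPORTANT] Power Trade-off: When the data really are normal, non-parametric tests are somewhat less powerful than their parametric counterparts, so they are slightly less likely to detect a real effect. The loss is often small: for the Mann-Whitney U test, the asymptotic efficiency relative to the t-test under normality is 3/π (about 95%). With skewed or heavy-tailed data, rank-based tests can be the more powerful choice. Prefer them when the assumptions of the parametric test are doubtful.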