Kernel Density Estimation (KDE)
Histograms are the bread and butter of data visualization. But they have a fatal flaw: Binning Bias.
Depending on where you start your bins and how wide they are, the same dataset can look completely different.
Kernel Density Estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. Think of it as a “smooth histogram” that doesn’t depend on arbitrary bin edges.
1. The Mechanics: Stacking Kernels
Imagine you have 5 data points: [2, 5, 7, 8, 12].
- Place a Kernel: Instead of dropping each point into a bin, place a smooth curve (a “kernel”, usually a Gaussian bell curve) centered directly on top of each data point.
- Sum Them Up: Add the height of all these individual curves together at every point along the x-axis.
- Normalize: Divide by the number of points (and the bandwidth) so the total area under the curve equals 1.
The resulting curve is your Density Estimate.
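The three steps above can be sketched directly in NumPy (a minimal sketch, assuming a Gaussian kernel and a bandwidth of h = 1.0 for illustration). The final line checks that the normalization step really does make the area under the curve come out to 1:

```python
import numpy as np

data = np.array([2, 5, 7, 8, 12], dtype=float)
h = 1.0                           # bandwidth (assumed for illustration)
grid = np.linspace(-3, 17, 2001)  # x-axis to evaluate the estimate on

# Step 1 - Place a Kernel: one Gaussian curve centered on each data point
kernels = np.exp(-0.5 * ((grid[:, None] - data) / h) ** 2) / np.sqrt(2 * np.pi)

# Step 2 - Sum Them Up: add the curves' heights at every x
# Step 3 - Normalize: divide by n * h so the total area equals 1
density = kernels.sum(axis=1) / (len(data) * h)

area = density.sum() * (grid[1] - grid[0])     # numerical integral (Riemann sum)
print(f"Area under the estimate: {area:.4f}")  # ≈ 1.0
```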
2. Interactive: The KDE Smoother
Adjust the Bandwidth slider below to see how it affects the estimated density.
- Low Bandwidth: The curve is “spiky” and fits every data point (Overfitting).
- High Bandwidth: The curve is “flat” and washes out the details (Underfitting).
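The same experiment can be run numerically. A sketch using SciPy's `gaussian_kde` (the three bandwidth values are illustrative): the smaller the bandwidth, the taller and spikier the highest peak of the estimate becomes.

```python
import numpy as np
from scipy.stats import gaussian_kde

data = np.array([2, 5, 7, 8, 12], dtype=float)
grid = np.linspace(-2, 16, 361)

# bw_method is a multiplier on the sample standard deviation, so dividing
# by the std turns h into an absolute bandwidth.
peaks = []
for h in (0.3, 1.0, 4.0):
    kde = gaussian_kde(data, bw_method=h / data.std(ddof=1))
    peaks.append(kde(grid).max())
    print(f"h={h}: tallest peak = {peaks[-1]:.4f}")
```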
3. The Mathematics
The standard formula for KDE is:

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

Where:
- n: Number of data points.
- h: Bandwidth (smoothing parameter).
- K: Kernel function (must integrate to 1).
- x_i: The data points.
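The formula translates almost symbol-for-symbol into code. A minimal NumPy sketch, assuming a Gaussian K and a fixed h = 1.0 for illustration:

```python
import numpy as np

def kde(x, data, h):
    """Estimate f(x) = 1/(n*h) * sum_i K((x - x_i) / h) with a Gaussian K."""
    u = (x - data) / h                              # (x - x_i) / h for each point
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    return K.sum() / (len(data) * h)                # divide by n * h

data = np.array([2, 5, 7, 8, 12], dtype=float)
print(f"Density at x=5: {kde(5.0, data, h=1.0):.4f}")  # ≈ 0.0924
```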
Bandwidth Selection
Choosing h is the “secret sauce” of KDE.
- Scott’s Rule or Silverman’s Rule: Common heuristics used by default in software like SciPy and Seaborn. They try to minimize the mean integrated squared error (MISE).
- If h is too small, the noise in the data is modeled as structure (high variance).
- If h is too large, the structure in the data is washed out (high bias).
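Both rules are cheap to compute by hand. Here is a sketch of the textbook one-dimensional forms (library defaults can differ slightly; SciPy's Silverman factor, for instance, omits the IQR term):

```python
import numpy as np

data = np.array([2, 5, 7, 8, 12], dtype=float)
n, sigma = len(data), data.std(ddof=1)  # sample standard deviation

# Scott's rule (1-D): h = sigma * n^(-1/5)
h_scott = sigma * n ** (-1 / 5)

# Silverman's rule of thumb (1-D): h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)
iqr = np.subtract(*np.percentile(data, [75, 25]))
h_silverman = 0.9 * min(sigma, iqr / 1.34) * n ** (-1 / 5)

print(f"Scott: h = {h_scott:.3f}, Silverman: h = {h_silverman:.3f}")
```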
4. Implementation Examples
Python (SciPy)
If you need the actual probability density values:
```python
from scipy.stats import gaussian_kde
import numpy as np

data = np.array([2, 5, 7, 8, 12])

# Bandwidth is selected automatically using Scott's Rule
kde = gaussian_kde(data)

# Evaluate the density at a specific point
print(f"Density at x=5: {kde(5)[0]:.4f}")
```
Java
We can implement the Gaussian kernel summation manually.
```java
public class KDE {
    // Gaussian kernel scaled by the bandwidth (a normal PDF),
    // so the final sum only needs dividing by n
    public static double gaussian(double x, double mean, double bandwidth) {
        return (1.0 / (bandwidth * Math.sqrt(2 * Math.PI))) *
               Math.exp(-0.5 * Math.pow((x - mean) / bandwidth, 2));
    }

    public static void main(String[] args) {
        double[] data = {2, 5, 7, 8, 12};
        double bandwidth = 1.0; // Fixed bandwidth for simplicity
        double x = 5.0;         // Point to evaluate

        double density = 0;
        for (double val : data) {
            density += gaussian(x, val, bandwidth);
        }
        density /= data.length;

        System.out.printf("Density at x=%.1f: %.4f%n", x, density);
    }
}
```
Go
The same logic ports directly to Go:
```go
package main

import (
	"fmt"
	"math"
)

// gaussian is the Gaussian kernel scaled by the bandwidth (a normal PDF).
func gaussian(x, mean, bandwidth float64) float64 {
	return (1.0 / (bandwidth * math.Sqrt(2*math.Pi))) *
		math.Exp(-0.5*math.Pow((x-mean)/bandwidth, 2))
}

func main() {
	data := []float64{2, 5, 7, 8, 12}
	bandwidth := 1.0 // Fixed bandwidth for simplicity
	x := 5.0         // Point to evaluate

	density := 0.0
	for _, val := range data {
		density += gaussian(x, val, bandwidth)
	}
	density /= float64(len(data))

	fmt.Printf("Density at x=%.1f: %.4f\n", x, density)
}
```
5. Summary
| Feature | Histogram | KDE |
|---|---|---|
| Type | Discrete (Bars) | Continuous (Curve) |
| Parameter | Bin Count / Width | Bandwidth |
| Shape | Rough / Blocky | Smooth |
| Sensitivity | High (Bin Edges) | Moderate (Bandwidth) |
> [!TIP]
> When to use KDE? Use it when you want to compare the shapes of multiple distributions on the same plot. Overlapping histograms are messy; overlapping KDE lines are elegant.