Kernel Density Estimation (KDE)

Histograms are the bread and butter of data visualization. But they have a fatal flaw: Binning Bias.

Depending on where you start your bins and how wide they are, the same dataset can look completely different.
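A quick sketch makes this concrete (a minimal NumPy illustration with an arbitrary five-point sample): the same data, binned with the same bin width but shifted edges, tells two different stories.

```python
import numpy as np

data = np.array([2, 5, 7, 8, 12])

# Same bin width (5), two different starting edges
counts_a, _ = np.histogram(data, bins=[0, 5, 10, 15])
counts_b, _ = np.histogram(data, bins=[1, 6, 11, 16])

print(counts_a)  # [1 3 1] -- looks like a central peak
print(counts_b)  # [2 2 1] -- looks like a plateau
```

Nothing about the data changed; only the bin edges moved.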

Kernel Density Estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. Think of it as a “smooth histogram” that doesn’t depend on arbitrary bin edges.

Real-World Analogy: Heatmaps vs. Pins

Imagine you are a city planner mapping out coffee shops.

  • A Histogram is like dividing the city into rigid ZIP code grids and counting coffee shops per ZIP code. If a coffee shop is exactly on the border, it randomly gets thrown into one bin. The map’s “hot spots” change completely if you shift the grid slightly.
  • A KDE is like placing a glowing heat lamp over every single coffee shop. The heat lamps overlap. Where there are many coffee shops, the heat adds up to form a bright, continuous “hot zone.” There are no artificial borders.

1. The Mechanic: Stacking Kernels

Let’s trace this visually with an example. Suppose we measure the waiting times (in minutes) at a coffee shop during rush hour: [2, 5, 7, 8, 12].

  1. Place a Kernel: Instead of dropping each point into a bin, place a smooth curve (a “kernel”, usually a Gaussian bell curve) centered directly on top of each data point. For example, place a curve centered at x = 2, another at x = 5, etc.
  2. Sum Them Up: Add the height of all these individual curves together at every point along the x-axis. Since 7 and 8 are close together, their individual curves will overlap significantly, creating a high peak in the combined density estimate around x = 7.5. Conversely, the gap between 2 and 5 will result in a dip.
  3. Normalize: Divide by the number of points (and the bandwidth) so the total area under the curve equals 1.

The resulting curve is your Density Estimate.
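The three steps can be sketched in a few lines (a minimal NumPy illustration; the fixed bandwidth of 1.0 is an arbitrary choice):

```python
import numpy as np

data = np.array([2, 5, 7, 8, 12])
h = 1.0  # bandwidth, chosen arbitrarily for illustration

def kde(x):
    # Step 1: a Gaussian kernel centered on each data point, evaluated at x
    heights = np.exp(-0.5 * ((x - data) / h) ** 2) / np.sqrt(2 * np.pi)
    # Steps 2 & 3: sum the kernel heights and normalize by n*h
    return heights.sum() / (len(data) * h)

# The overlapping kernels at 7 and 8 produce a peak near 7.5...
print(f"{kde(7.5):.4f}")
# ...while the gap between 2 and 5 produces a dip
print(f"{kde(3.5):.4f}")
```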


2. Interactive: The KDE Smoother

Adjusting the bandwidth changes how closely the estimated density follows the data.

  • Low Bandwidth: The curve is “spiky” and fits every data point (Overfitting).
  • High Bandwidth: The curve is “flat” and washes out the details (Underfitting).
The dashed lines are individual kernels. The blue line is their sum (the KDE).
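The same effect can be reproduced without the interactive plot. Here is a sketch using SciPy's gaussian_kde, where a scalar bw_method is used directly as the bandwidth factor (the values 0.1 and 2.0 are arbitrary choices for illustration):

```python
from scipy.stats import gaussian_kde
import numpy as np

data = np.array([2, 5, 7, 8, 12])

# A scalar bw_method is used directly as the bandwidth factor
spiky = gaussian_kde(data, bw_method=0.1)   # low bandwidth
smooth = gaussian_kde(data, bw_method=2.0)  # high bandwidth

# Low bandwidth: near-zero density in the 2-5 gap, sharp peak at the 7-8 cluster
print(spiky(3.5)[0], spiky(7.5)[0])
# High bandwidth: the gap and the cluster look almost identical
print(smooth(3.5)[0], smooth(7.5)[0])
```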

3. The Mathematics

The standard formula for KDE is:

f̂(x) = (1 / nh) Σᵢ₌₁ⁿ K( (x − xᵢ) / h )

Where:

  • n: Number of data points.
  • h: Bandwidth (smoothing parameter).
  • K: Kernel function (must integrate to 1).
  • xᵢ: The data points.

Bandwidth Selection

Choosing h is the “secret sauce” of KDE.

  • Scott’s Rule or Silverman’s Rule: Common heuristics used by default in software like SciPy and Seaborn. They try to minimize the mean integrated squared error (MISE).
  • If h is too small, the noise in the data is modeled as structure (high variance).
  • If h is too large, the structure in the data is washed out (high bias).
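Both rules are simple enough to compute by hand. A sketch of their common one-dimensional forms (note that SciPy's built-in "silverman" option uses a slightly different constant and omits the IQR term):

```python
import numpy as np

data = np.array([2, 5, 7, 8, 12])
n = data.size
sigma = data.std(ddof=1)  # sample standard deviation

# Scott's rule (1-D): h = sigma * n^(-1/5)
h_scott = sigma * n ** (-1 / 5)

# Silverman's rule of thumb: h = 0.9 * min(sigma, IQR/1.34) * n^(-1/5)
iqr = np.subtract(*np.percentile(data, [75, 25]))
h_silverman = 0.9 * min(sigma, iqr / 1.34) * n ** (-1 / 5)

print(f"Scott: {h_scott:.3f}, Silverman: {h_silverman:.3f}")
```

Silverman's IQR term makes the rule more robust when outliers inflate the standard deviation.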

4. Implementation Examples

Python (SciPy)

If you need the actual probability density values:

from scipy.stats import gaussian_kde
import numpy as np

data = np.array([2, 5, 7, 8, 12])
kde = gaussian_kde(data)  # bandwidth selected automatically via Scott's Rule

# Evaluate the density at a specific point
print(f"Density at x=5: {kde(5)[0]:.4f}")

Java

We can implement the Gaussian kernel summation manually.

public class KDE {
  // Gaussian Kernel Function
  public static double gaussian(double x, double mean, double bandwidth) {
    return (1.0 / (bandwidth * Math.sqrt(2 * Math.PI))) *
       Math.exp(-0.5 * Math.pow((x - mean) / bandwidth, 2));
  }

  public static void main(String[] args) {
    double[] data = {2, 5, 7, 8, 12};
    double bandwidth = 1.0; // Fixed bandwidth for simplicity
    double x = 5.0; // Point to evaluate

    double density = 0;
    for (double val : data) {
      density += gaussian(x, val, bandwidth);
    }
    density /= data.length;

    System.out.printf("Density at x=%.1f: %.4f%n", x, density);
  }
}

Go

package main

import (
  "fmt"
  "math"
)

func gaussian(x, mean, bandwidth float64) float64 {
  return (1.0 / (bandwidth * math.Sqrt(2*math.Pi))) *
    math.Exp(-0.5*math.Pow((x-mean)/bandwidth, 2))
}

func main() {
  data := []float64{2, 5, 7, 8, 12}
  bandwidth := 1.0
  x := 5.0

  density := 0.0
  for _, val := range data {
    density += gaussian(x, val, bandwidth)
  }
  density /= float64(len(data))

  fmt.Printf("Density at x=%.1f: %.4f\n", x, density)
}

5. Summary

Feature       Histogram           KDE
Type          Discrete (Bars)     Continuous (Curve)
Parameter     Bin Count / Width   Bandwidth
Shape         Rough / Blocky      Smooth
Sensitivity   High (Bin Edges)    Moderate (Bandwidth)

[!TIP] When to use KDE? Use it when you want to compare the shapes of multiple distributions on the same plot. Overlapping histograms are messy; overlapping KDE lines are elegant.