# Binomial Theorem
The Binomial Theorem provides a quick way to expand expressions raised to a power, like (x+y)^n. While simple algebra works for small n, it becomes tedious quickly.
- (x+y)^1 = x + y
- (x+y)^2 = x^2 + 2xy + y^2
- (x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3
Notice a pattern? The coefficients (1, 2, 1) and (1, 3, 3, 1) look familiar. They come directly from Combinations.
## 1. The Formula
For any positive integer n:
(x + y)^n = Σ_{k=0}^{n} C(n, k) x^(n-k) y^k
Where C(n, k) (often written as nCk) is the binomial coefficient:
C(n, k) = n! / (k!(n-k)!)
This means the coefficient of the term x^(n-k) y^k is simply the number of ways to choose which k of the n factors of (x+y) contribute a y.
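As a quick numerical sanity check, summing the theorem's terms should match direct exponentiation; a minimal Python sketch (the helper name is ours):

```python
import math

def binomial_expand(x, y, n):
    """Sum the terms C(n, k) * x^(n-k) * y^k from the Binomial Theorem."""
    return sum(math.comb(n, k) * x**(n - k) * y**k for k in range(n + 1))

# The term-by-term sum agrees with computing (x+y)^n directly
assert binomial_expand(2, 3, 5) == (2 + 3) ** 5  # both are 3125
```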
## 2. Pascal's Triangle
Pascal’s Triangle is a geometric arrangement of binomial coefficients.
- Each number is the sum of the two numbers directly above it.
- Row n contains the coefficients for (x+y)^n.
- Property: C(n, k) = C(n-1, k-1) + C(n-1, k)
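The recurrence above lets us build the triangle row by row without computing any factorials; a minimal Python sketch (the helper name is ours):

```python
def pascal_rows(n):
    """Generate rows 0..n of Pascal's Triangle via C(n,k) = C(n-1,k-1) + C(n-1,k)."""
    row = [1]
    rows = [row]
    for _ in range(n):
        # Each interior entry is the sum of the two entries directly above it
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
        rows.append(row)
    return rows

for r in pascal_rows(4):
    print(r)
# The last row, [1, 4, 6, 4, 1], gives the coefficients of (x+y)^4
```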
## 3. Implementation Examples
### Java
Calculating Binomial Coefficients efficiently avoids computing large factorials directly when possible.
```java
import java.math.BigInteger;

public class Binomial {
    // Calculate C(n, k) with the multiplicative formula;
    // each intermediate division is exact, so no large factorials are built.
    public static BigInteger nCr(int n, int k) {
        if (k > n || k < 0) return BigInteger.ZERO;
        if (k == 0 || k == n) return BigInteger.ONE;
        if (k > n / 2) k = n - k; // Symmetry optimization: C(n, k) = C(n, n-k)
        BigInteger res = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            res = res.multiply(BigInteger.valueOf(n - i + 1))
                     .divide(BigInteger.valueOf(i));
        }
        return res;
    }

    public static void main(String[] args) {
        int n = 10;
        int k = 2;
        System.out.println("C(" + n + ", " + k + ") = " + nCr(n, k)); // 45
    }
}
```
### Go
```go
package main

import (
	"fmt"
	"math/big"
)

// nCr calculates the binomial coefficient C(n, k) = n! / (k! * (n-k)!).
// math/big provides a built-in Binomial method on big.Int.
func nCr(n, k int64) *big.Int {
	if k > n || k < 0 {
		return big.NewInt(0)
	}
	return new(big.Int).Binomial(n, k)
}

func main() {
	var n, k int64 = 10, 2
	fmt.Printf("C(%d, %d) = %s\n", n, k, nCr(n, k)) // 45
}
```
### Python (Scientific Computing)
We can use math.comb (available since Python 3.8) or scipy.special.comb; both avoid computing the large factorials manually.
```python
import math

n = 5
print(f"Expanding (x+y)^{n}:")

# Generate coefficients for row n of Pascal's Triangle
coefficients = [math.comb(n, k) for k in range(n + 1)]
print(f"Coefficients: {coefficients}")
# Output: [1, 5, 10, 10, 5, 1]
```
## 4. Real-World ML Application: Polynomial Kernels
In Machine Learning, we often need to transform data into higher dimensions to make it linearly separable. This is common in Support Vector Machines (SVMs) and Polynomial Regression.
If you have a dataset with 2 features (x1, x2) and apply a polynomial expansion of degree 2, you generate new features based on the terms of (x1 + x2 + 1)2.
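To make this concrete, we can enumerate those terms with a small standard-library sketch (the helper name is ours; in practice a library routine such as scikit-learn's PolynomialFeatures produces these columns):

```python
from itertools import combinations_with_replacement

def poly_terms(names, degree):
    """List the monomials of total degree <= degree; the empty product is the bias."""
    terms = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(names, d):
            terms.append("*".join(combo) if combo else "1")
    return terms

print(poly_terms(["x1", "x2"], 2))
# ['1', 'x1', 'x2', 'x1*x1', 'x1*x2', 'x2*x2']  -> 6 = C(2+2, 2) terms
```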
### Hardware Reality: The Curse of Dimensionality
When you perform a polynomial expansion, the number of features explodes combinatorially. For d input features and degree p, the number of output features N grows rapidly.
The formula for the number of terms in a polynomial expansion of degree p with d variables (including bias) is:
N = C(d + p, p)
Let’s look at the numbers:
| Original Features (d) | Degree (p) | Expanded Features (N) | Memory (Float32) |
|---|---|---|---|
| 10 (Small dataset) | 2 | C(12, 2) = 66 | ~264 bytes |
| 10 | 3 | C(13, 3) = 286 | ~1 KB |
| 100 (Image 10x10) | 2 | C(102, 2) = 5,151 | ~20 KB |
| 100 | 3 | C(103, 3) = 176,851 | ~700 KB |
| 1000 (Image 32x32) | 3 | C(1003, 3) ≈ 167 Million | ~668 MB |
> [!CAUTION]
> **Explosion Hazard:** Increasing the polynomial degree just slightly (e.g., from 2 to 3) can turn a manageable dataset into one that exhausts your RAM. This combinatorial growth in dimensions is known as the Curse of Dimensionality.
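The table entries follow directly from N = C(d + p, p); a quick sketch to reproduce them with Python's math.comb:

```python
import math

# Reproduce the feature counts from the table: N = C(d + p, p),
# with each feature stored as a 4-byte float32
for d, p in [(10, 2), (10, 3), (100, 2), (100, 3), (1000, 3)]:
    n = math.comb(d + p, p)
    print(f"d={d:>4}, p={p}: N={n:>11,} features, ~{4 * n:,} bytes")
```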
### The Kernel Trick
So how do we use high-dimensional spaces without running out of RAM? We use the Kernel Trick.
The core idea is that many algorithms (like SVMs) only depend on the dot product between data points, not the points themselves.
Instead of expanding vector x into a massive vector φ(x) and then computing φ(x) · φ(y), we compute a kernel function K(x, y) that equals that dot product but is evaluated in the original low-dimensional space.
For a polynomial kernel: K(x, y) = (x · y + c)^p
This computation takes O(d) time (linear in the original number of features), whereas explicit expansion produces O(d^p) features.
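We can verify this equivalence numerically for a small case. The sketch below (helper names are ours) assumes p = 2, c = 1, and d = 2, where the explicit feature map φ has six components, with √2 scaling on the cross and linear terms so the dot products match:

```python
import math

def poly_kernel(x, y, c=1.0, p=2):
    """K(x, y) = (x . y + c)^p, computed in the original space -- O(d) work."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** p

def phi(x):
    """Explicit degree-2 feature map for d=2, c=1 (6 = C(4, 2) dimensions)."""
    x1, x2 = x
    s = math.sqrt(2)
    return [x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0]

x, y = [1.0, 2.0], [3.0, 0.5]
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
# Same number either way, but the kernel never builds the expanded vectors
assert abs(poly_kernel(x, y) - explicit) < 1e-9
print(poly_kernel(x, y))  # 25.0
```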
> [!TIP]
> **Intuition:** We are calculating the geometric relationship (angle/distance) in the high-dimensional space without ever actually visiting that space. It’s like measuring the distance between two stars using a telescope instead of flying there.