# Binomial Theorem
The Binomial Theorem provides a quick way to expand expressions raised to a power, like (x+y)^n. While simple algebra works for small n, it becomes tedious quickly.
- (x+y)^1 = x + y
- (x+y)^2 = x^2 + 2xy + y^2
- (x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3
Notice a pattern? The coefficients (1, 2, 1) and (1, 3, 3, 1) look familiar. They come directly from Combinations.
## 1. The Formula
For any positive integer n:
(x + y)^n = Σ_{k=0}^{n} C(n, k) x^(n-k) y^k
Where C(n, k) (often written as nCk) is the binomial coefficient:
C(n, k) = n! / (k!(n-k)!)
This means the coefficient of the term x^(n-k) y^k is simply the number of ways to choose which k of the n factors of (x+y) contribute a y.
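As a quick numerical sanity check, summing the theorem's terms should match direct exponentiation; a minimal Python sketch (the helper name is ours):

```python
import math

def binomial_expand(x, y, n):
    """Sum the terms C(n, k) * x^(n-k) * y^k from the Binomial Theorem."""
    return sum(math.comb(n, k) * x**(n - k) * y**k for k in range(n + 1))

# The term-by-term sum agrees with computing (x+y)^n directly
assert binomial_expand(2, 3, 5) == (2 + 3) ** 5  # both are 3125
```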
## 2. Pascal's Triangle
Pascal’s Triangle is a geometric arrangement of binomial coefficients.
- Each number is the sum of the two numbers directly above it.
- Row n contains the coefficients for (x+y)^n.
- Property: C(n, k) = C(n-1, k-1) + C(n-1, k)
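The recurrence above lets us build the triangle row by row without computing any factorials; a minimal Python sketch (the helper name is ours):

```python
def pascal_rows(n):
    """Generate rows 0..n of Pascal's Triangle via C(n,k) = C(n-1,k-1) + C(n-1,k)."""
    row = [1]
    rows = [row]
    for _ in range(n):
        # Each interior entry is the sum of the two entries directly above it
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
        rows.append(row)
    return rows

for r in pascal_rows(4):
    print(r)
# The last row, [1, 4, 6, 4, 1], gives the coefficients of (x+y)^4
```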
## 3. Implementation Examples
### Java
Calculating Binomial Coefficients efficiently avoids computing large factorials directly when possible.
```java
import java.math.BigInteger;

public class Binomial {
    // Calculate C(n, k) with the multiplicative formula;
    // each intermediate division is exact, so no large factorials are built.
    public static BigInteger nCr(int n, int k) {
        if (k > n || k < 0) return BigInteger.ZERO;
        if (k == 0 || k == n) return BigInteger.ONE;
        if (k > n / 2) k = n - k; // Symmetry optimization: C(n, k) = C(n, n-k)
        BigInteger res = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            res = res.multiply(BigInteger.valueOf(n - i + 1))
                     .divide(BigInteger.valueOf(i));
        }
        return res;
    }

    public static void main(String[] args) {
        int n = 10;
        int k = 2;
        System.out.println("C(" + n + ", " + k + ") = " + nCr(n, k)); // 45
    }
}
```
### Go
```go
package main

import (
	"fmt"
	"math/big"
)

// nCr calculates the binomial coefficient C(n, k) = n! / (k! * (n-k)!).
// math/big provides a built-in Binomial method on big.Int.
func nCr(n, k int64) *big.Int {
	if k > n || k < 0 {
		return big.NewInt(0)
	}
	return new(big.Int).Binomial(n, k)
}

func main() {
	var n, k int64 = 10, 2
	fmt.Printf("C(%d, %d) = %s\n", n, k, nCr(n, k)) // 45
}
```
### Python (Scientific Computing)
We can use math.comb (available since Python 3.8) or scipy.special.comb; both avoid computing the large factorials manually.
```python
import math

n = 5
print(f"Expanding (x+y)^{n}:")

# Generate coefficients for row n of Pascal's Triangle
coefficients = [math.comb(n, k) for k in range(n + 1)]
print(f"Coefficients: {coefficients}")
# Output: [1, 5, 10, 10, 5, 1]
```
## 4. Real-World ML Application: Polynomial Kernels
In Machine Learning, we often need to transform data into higher dimensions to make it linearly separable. This is common in Support Vector Machines (SVMs) and Polynomial Regression.
If you have a dataset with 2 features (x1, x2) and apply a polynomial expansion of degree 2, you generate new features based on the terms of (x1 + x2 + 1)2.
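To make this concrete, we can enumerate those terms with a small standard-library sketch (the helper name is ours; in practice a library routine such as scikit-learn's PolynomialFeatures produces these columns):

```python
from itertools import combinations_with_replacement

def poly_terms(names, degree):
    """List the monomials of total degree <= degree; the empty product is the bias."""
    terms = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(names, d):
            terms.append("*".join(combo) if combo else "1")
    return terms

print(poly_terms(["x1", "x2"], 2))
# ['1', 'x1', 'x2', 'x1*x1', 'x1*x2', 'x2*x2']  -> 6 = C(2+2, 2) terms
```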
### Hardware Reality: The Curse of Dimensionality
When you perform a polynomial expansion, the number of features explodes combinatorially. For d input features and degree p, the number of output features N grows rapidly.
The formula for the number of terms in a polynomial expansion of degree p with d variables (including bias) is:
N = C(d + p, p)
Let’s look at the numbers:
| Original Features (d) | Degree (p) | Expanded Features (N) | Memory (Float32) |
|---|---|---|---|
| 10 (Small dataset) | 2 | C(12, 2) = 66 | ~264 bytes |
| 10 | 3 | C(13, 3) = 286 | ~1 KB |
| 100 (Image 10x10) | 2 | C(102, 2) = 5,151 | ~20 KB |
| 100 | 3 | C(103, 3) = 176,851 | ~700 KB |
| 1000 (Image 32x32) | 3 | C(1003, 3) ≈ 167 Million | ~668 MB |
> [!CAUTION]
> **Explosion Hazard:** Increasing the polynomial degree just slightly (e.g., from 2 to 3) can turn a manageable dataset into one that exhausts your RAM. This combinatorial growth in dimensions is known as the Curse of Dimensionality.
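The table entries follow directly from N = C(d + p, p); a quick sketch to reproduce them with Python's math.comb:

```python
import math

# Reproduce the feature counts from the table: N = C(d + p, p),
# with each feature stored as a 4-byte float32
for d, p in [(10, 2), (10, 3), (100, 2), (100, 3), (1000, 3)]:
    n = math.comb(d + p, p)
    print(f"d={d:>4}, p={p}: N={n:>11,} features, ~{4 * n:,} bytes")
```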
### The Kernel Trick
So how do we use high-dimensional spaces without running out of RAM? We use the Kernel Trick.
The core idea is that many algorithms (like SVMs) only depend on the dot product between data points, not the points themselves.
Instead of expanding vector x into a massive vector φ(x) and then computing φ(x) · φ(y), we compute a kernel function K(x, y) that equals that dot product but is evaluated in the original low-dimensional space.
For a polynomial kernel: K(x, y) = (x · y + c)^p
This computation takes O(d) time (linear in the original number of features), whereas explicit expansion produces O(d^p) features.
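We can verify this equivalence numerically for a small case. The sketch below (helper names are ours) assumes p = 2, c = 1, and d = 2, where the explicit feature map φ has six components, with √2 scaling on the cross and linear terms so the dot products match:

```python
import math

def poly_kernel(x, y, c=1.0, p=2):
    """K(x, y) = (x . y + c)^p, computed in the original space -- O(d) work."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** p

def phi(x):
    """Explicit degree-2 feature map for d=2, c=1 (6 = C(4, 2) dimensions)."""
    x1, x2 = x
    s = math.sqrt(2)
    return [x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0]

x, y = [1.0, 2.0], [3.0, 0.5]
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
# Same number either way, but the kernel never builds the expanded vectors
assert abs(poly_kernel(x, y) - explicit) < 1e-9
print(poly_kernel(x, y))  # 25.0
```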
> [!TIP]
> **Intuition:** We are calculating the geometric relationship (angle/distance) in the high-dimensional space without ever actually visiting that space. It’s like measuring the distance between two stars using a telescope instead of flying there.