Playing by Rules: Lagrange Multipliers
1. Introduction: The Fence
In unconstrained optimization, we just want to find the lowest point x where \nabla f(x) = 0. But what if we are not allowed to go everywhere? What if we must stay on a path, or inside a specific region?
- Example: Minimize cost (f), but protein intake must be \ge 50g (g).
- Example: Maximize Entropy (f), but probabilities must sum to 1 (g).
- Example: Train a Neural Network, but weights must be sparse (L1 regularization).
This is Constrained Optimization.
2. The Intuition: Tangency
Imagine you are hiking up a hill f(x,y). You want to reach the highest point possible, but you are forced to walk along a specific trail defined by g(x,y)=0.
- You walk along the trail.
- As long as the trail cuts across the contour lines of the hill, you can keep moving higher.
- Crossing a contour line means your elevation is still changing, so you have not yet found the best point on the trail.
- You stop when the trail runs parallel to the contour lines.
- If you move forward or backward along the trail now, you don’t go up or down. You are at a local maximum on the path.
At this optimal point, the gradient of the hill \nabla f and the gradient of the constraint \nabla g point in the same (or opposite) direction: they are parallel. Each gradient is perpendicular to its own level curve, and at the stopping point the trail and the contour share a tangent line.
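The hiking picture can be checked numerically: walk along a trail, keep the highest point found, and compare the two gradients there. The hill f(x, y) = x + 2y and the unit-circle trail below are illustrative choices, not from the text:

```python
import numpy as np

# Hill: f(x, y) = x + 2y.  Trail: g(x, y) = x^2 + y^2 - 1 = 0 (unit circle).
f = lambda x, y: x + 2 * y

# Walk along the trail: parametrize the circle and take the highest point.
theta = np.linspace(0, 2 * np.pi, 100_000)
x, y = np.cos(theta), np.sin(theta)
i = np.argmax(f(x, y))
px, py = x[i], y[i]

# Gradients at that point: grad f = (1, 2), grad g = (2x, 2y).
grad_f = np.array([1.0, 2.0])
grad_g = np.array([2 * px, 2 * py])

# The 2D cross product measures how far the gradients are from parallel.
cross = grad_f[0] * grad_g[1] - grad_f[1] * grad_g[0]
print(f"optimum ≈ ({px:.4f}, {py:.4f}), cross ≈ {cross:.6f}")
```

The near-zero cross product confirms that at the best point on the trail, \nabla f and \nabla g are parallel.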
3. The Lagrange Multiplier (\lambda)
Since the gradients are parallel, one must be a scalar multiple of the other:
\nabla f(x) = \lambda \, \nabla g(x)
Here, \lambda is the Lagrange Multiplier.
- It represents the “shadow price” of the constraint.
- If \lambda is large, the constraint is pulling you hard away from the true unconstrained maximum.
- If \lambda = 0, the constraint is irrelevant (the peak is already on the path).
The Lagrangian Function
We package this into a single function whose stationary points encode both requirements:
\mathcal{L}(x, \lambda) = f(x) - \lambda \, g(x)
Finding where \nabla \mathcal{L} = 0 recovers both the tangency condition \nabla f = \lambda \nabla g (from the derivatives in x) and the constraint g(x) = 0 (from the derivative in \lambda).
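As a small worked example (the same toy problem as the SciPy demo in section 5): for f = x^2 + y^2 with constraint x + y = 1, the condition \nabla \mathcal{L} = 0 is a linear system we can solve directly:

```python
import numpy as np

# Minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0.
# L(x, y, lam) = x^2 + y^2 - lam * (x + y - 1); grad L = 0 gives:
#   2x     - lam = 0
#   2y     - lam = 0
#   x + y        = 1
A = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
b = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(A, b)
print(x, y, lam)  # -> x = y = 0.5, lam = 1.0
```

The multiplier \lambda = 1 comes out of the solve for free, alongside the optimal point.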
4. KKT Conditions & Inequality Constraints
For inequality constraints (g(x) \le 0), things get slightly more complex. We use the Karush-Kuhn-Tucker (KKT) conditions: stationarity (\nabla f + \lambda \nabla g = 0), primal feasibility (g(x) \le 0), dual feasibility (\lambda \ge 0), and complementary slackness (\lambda \, g(x) = 0).
Complementary slackness captures the intuition: if the unconstrained minimum lies inside the allowed region, the constraint doesn't matter (\lambda = 0, g(x) < 0). If it lies outside, the solution sits on the boundary (g(x) = 0, \lambda > 0).
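A quick way to see both regimes is to move the boundary of a one-dimensional problem; the objective and the two bounds below are illustrative choices:

```python
from scipy.optimize import minimize

# Minimize f(x) = (x - 2)^2 under two different inequality constraints.
# SciPy's 'ineq' convention is fun(x) >= 0, so "x <= c" becomes "c - x >= 0".
fun = lambda x: (x[0] - 2) ** 2

# Case 1: x <= 1. The unconstrained minimum (x = 2) is infeasible,
# so the solution sits on the boundary x = 1 (active constraint, lambda > 0).
r1 = minimize(fun, [0.0], constraints=[{'type': 'ineq', 'fun': lambda x: 1 - x[0]}])

# Case 2: x <= 3. The unconstrained minimum is feasible,
# so the constraint is inactive and irrelevant (lambda = 0).
r2 = minimize(fun, [0.0], constraints=[{'type': 'ineq', 'fun': lambda x: 3 - x[0]}])

print(r1.x, r2.x)
```

In case 1 the solver lands on the boundary (x ≈ 1); in case 2 it ignores the constraint entirely (x ≈ 2).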
Projected Gradient Descent
In Deep Learning (e.g., Adversarial Attacks), we often enforce constraints by:
- Taking a gradient step.
- Projecting the result back into the valid set (e.g., clipping values to [0, 1]).
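The two-step loop above can be sketched in a few lines. The quadratic objective and the target point are assumptions for illustration (the target sits outside the box so the projection actually binds):

```python
import numpy as np

# Minimize ||x - target||^2 subject to x staying in the box [0, 1]^2.
target = np.array([1.5, -0.5])       # deliberately outside the box
grad = lambda x: 2 * (x - target)

x = np.array([0.5, 0.5])
lr = 0.1
for _ in range(200):
    x = x - lr * grad(x)             # 1. gradient step
    x = np.clip(x, 0.0, 1.0)         # 2. project back into the valid set
print(x)  # converges to [1. 0.], the box corner closest to the target
```

This is exactly the clipping trick used in adversarial attacks such as PGD: the projection is cheap because the valid set is a box.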
5. Python: Constrained Optimization with SciPy
We don’t usually write Lagrangian solvers by hand. We use scipy.optimize.
```python
from scipy.optimize import minimize

def optimization_example():
    # Objective: minimize x^2 + y^2 (a paraboloid centred at the origin)
    fun = lambda x: x[0]**2 + x[1]**2

    # Constraint: x + y = 1, rewritten as x + y - 1 = 0
    cons = [{'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1}]

    # Initial guess
    x0 = [2.0, 2.0]

    res = minimize(fun, x0, constraints=cons)
    print(f"Optimal Solution: {res.x}")
    # Expected: [0.5, 0.5] (closest point to the origin on the line x + y = 1)

if __name__ == "__main__":
    optimization_example()
```
6. Interactive Visualizer: The Constrained Climber
Find the optimal point on the Blue Constraint Circle.
- Contour Lines: Represent the hill height (Objective Function f(x,y) = x+y). Brighter is higher.
- Red Arrow: Gradient of Objective (\nabla f). Always points uphill (North-East).
- Green Arrow: Gradient of Constraint (\nabla g). Always points perpendicular to the circle (radially), since a gradient is normal to its level set.
- Goal: Drag the yellow climber until the Red and Green arrows align perfectly.
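The alignment point can also be found symbolically. Assuming the blue constraint circle is the unit circle (an assumption for this sketch), solving \nabla f = \lambda \nabla g together with the constraint, here via sympy, yields both the constrained maximum and minimum:

```python
import sympy as sp

# Where do the Red (grad f) and Green (grad g) arrows align?
# f = x + y, g = x^2 + y^2 - 1 (unit circle assumed).
x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 1

# Tangency system: grad f - lam * grad g = 0, plus the constraint itself.
eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
       sp.diff(f, y) - lam * sp.diff(g, y),
       g]
solutions = sp.solve(eqs, [x, y, lam])
print(solutions)  # two solutions: the constrained minimum and maximum
```

The positive-\lambda solution, x = y = 1/\sqrt{2}, is where the climber should stop: the arrows point the same way there.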
7. Summary
- Constraint: A rule that restricts the valid search space.
- Tangency Condition: The optimal point occurs where the constraint boundary runs parallel to the objective’s contour lines.
- Lagrange Multipliers: \nabla f = \lambda \nabla g. This equation mathematically captures the tangency condition.
- Shadow Price: \lambda tells us how much the objective f would improve if we relaxed the constraint g by a tiny bit.
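The shadow-price claim can be sanity-checked numerically: perturb the constraint level and watch the optimal value move at rate \lambda. For the toy problem from section 5, \lambda = 1 (from 2x = \lambda with x = 0.5):

```python
from scipy.optimize import minimize

# Minimize x^2 + y^2 subject to x + y = c, and measure how the
# optimal value changes as the constraint level c is relaxed.
def solve(c):
    fun = lambda x: x[0]**2 + x[1]**2
    cons = [{'type': 'eq', 'fun': lambda x: x[0] + x[1] - c}]
    return minimize(fun, [2.0, 2.0], constraints=cons).fun

eps = 1e-3
dV = (solve(1 + eps) - solve(1)) / eps
print(dV)  # ≈ 1.0, matching the multiplier lambda = 1 for this problem
```

The finite difference recovers \lambda: relaxing the constraint by \epsilon changes the optimal objective by roughly \lambda \epsilon.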