Playing by Rules: Lagrange Multipliers

[!NOTE] This module covers constrained optimization with Lagrange multipliers, deriving the key conditions from first principles and connecting them to how constraints are handled in practice.

1. Introduction: The Fence

In unconstrained optimization, we just want to find the lowest point x where \nabla f(x) = 0. But what if we are not allowed to go everywhere? What if we must stay on a path, or inside a specific region?

  • Example: Minimize cost (f), but protein intake must be \ge 50g (g).
  • Example: Maximize Entropy (f), but probabilities must sum to 1 (g).
  • Example: Train a Neural Network, but weights must be sparse (L1 regularization).

This is Constrained Optimization.


2. The Intuition: Tangency

Imagine you are hiking up a hill f(x,y). You want to reach the highest point possible, but you are forced to walk along a specific trail defined by g(x,y)=0.

  1. You walk along the trail.
  2. As long as the trail cuts across the contour lines of the hill, you can keep moving higher.
    • If you cross a contour line, it means you are moving from a lower elevation to a higher elevation.
  3. You stop when the trail runs parallel to the contour lines.
    • If you move forward or backward along the trail now, you don’t go up or down. You are at a local maximum on the path.

At this optimal point, the gradient of the hill \nabla f and the gradient of the constraint function \nabla g (which is perpendicular to the trail) point in the same (or opposite) direction! They are parallel.


3. The Lagrange Multiplier (\lambda)

Since the gradients are parallel, one must be a scalar multiple of the other:

∇f = λ ∇g

Here, \lambda is the Lagrange Multiplier.

  • It represents the “shadow price” of the constraint.
  • If \lambda is large, the constraint is pulling you hard away from the true unconstrained maximum.
  • If \lambda = 0, the constraint is irrelevant (the peak is already on the path).

The Lagrangian Function

We package this into a single function, the Lagrangian:

L(x, y, λ) = f(x, y) - λ g(x, y)

(writing the constraint as g(x, y) = 0, matching the trail above). Setting \nabla L = 0 recovers both the tangency condition \nabla f = \lambda \nabla g and the constraint itself, so its stationary points are the candidate constrained optima.
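To see the machinery in action, here is the system \nabla L = 0 worked by hand for a small problem (the same one solved numerically with SciPy in Section 5): minimize f(x, y) = x² + y² subject to g(x, y) = x + y - 1 = 0.

L(x, y, λ) = x² + y² - λ(x + y - 1)

∂L/∂x = 2x - λ = 0
∂L/∂y = 2y - λ = 0
∂L/∂λ = -(x + y - 1) = 0

The first two equations give x = y = λ/2; substituting into the third yields x = y = 1/2 and λ = 1.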


4. KKT Conditions & Inequality Constraints

For inequality constraints (g(x) \le 0), things get slightly more complex. We use the Karush-Kuhn-Tucker (KKT) conditions.

Basically, if the unconstrained minimum already lies inside the allowed region, the constraint is inactive and \lambda = 0. If it lies outside, the solution sits on the boundary g(x) = 0 with \lambda > 0. These two cases are summarized by complementary slackness: \lambda \cdot g(x) = 0 at the optimum.
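A minimal one-dimensional sketch of complementary slackness, using a toy objective and bound chosen purely for illustration (not from any library):

```python
# Minimize f(x) = (x - 2)^2 subject to g(x) = x - b <= 0, for a bound b.

def solve_with_kkt(b):
    """Closed-form constrained minimum and its multiplier."""
    x = min(2.0, b)                   # unconstrained minimum x = 2, capped at b
    lam = max(0.0, 2.0 * (2.0 - b))   # stationarity on boundary: f'(x) + lam * g'(x) = 0
    return x, lam

# Case 1: b = 3 -> x = 2 is already feasible, so the constraint is inactive.
print(solve_with_kkt(3.0))   # (2.0, 0.0)

# Case 2: b = 1 -> the solution is pushed onto the boundary, lam > 0.
print(solve_with_kkt(1.0))   # (1.0, 2.0)

# Complementary slackness lam * g(x) = 0 holds in both cases.
for b in (3.0, 1.0):
    x, lam = solve_with_kkt(b)
    assert abs(lam * (x - b)) < 1e-12
```

Exactly one of \lambda and g(x) is nonzero in each case, which is the "inside vs. boundary" dichotomy above.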

Projected Gradient Descent

In Deep Learning (e.g., Adversarial Attacks), we often enforce constraints by:

  1. Taking a gradient step.
  2. Projecting the result back into the valid set (e.g., clipping values to [0, 1]).
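The two steps above can be sketched in a few lines of NumPy; the quadratic objective, step size, and box constraint here are made-up illustrations:

```python
import numpy as np

# Projected gradient descent on f(x) = ||x - target||^2 with the box constraint
# x in [0, 1]^2. The target deliberately lies outside the box.
def projected_gradient_descent(target, lr=0.1, steps=100):
    x = np.zeros_like(target)
    for _ in range(steps):
        grad = 2.0 * (x - target)        # 1. gradient step
        x = x - lr * grad
        x = np.clip(x, 0.0, 1.0)         # 2. project back into the valid set
    return x

target = np.array([0.5, 1.5])            # second coordinate is infeasible
x_opt = projected_gradient_descent(target)
print(x_opt)   # close to [0.5, 1.0]: the projection of target onto the box
```

The projection step (here just `np.clip`) is what keeps every iterate feasible; adversarial-attack pipelines use the same pattern with a projection onto an epsilon-ball around the input.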

5. Python: Constrained Optimization with Scipy

We don’t usually write Lagrangian solvers by hand. We use scipy.optimize.

```python
from scipy.optimize import minimize

def optimization_example():
    # Objective: minimize f(x, y) = x^2 + y^2 (a paraboloid)
    fun = lambda x: x[0]**2 + x[1]**2

    # Constraint: x + y = 1, rewritten in SciPy's form as x + y - 1 = 0
    cons = {'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1}

    # Initial guess
    x0 = [2.0, 2.0]

    res = minimize(fun, x0, constraints=cons)

    print(f"Optimal Solution: {res.x}")
    # Expected: [0.5, 0.5] (closest point to the origin on the line x + y = 1)

if __name__ == "__main__":
    optimization_example()
```
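The same call also handles the inequality case from Section 4; a short variant (parameters chosen to mirror the equality example) showing SciPy's `'ineq'` convention, which expects `fun(x) >= 0`:

```python
import numpy as np
from scipy.optimize import minimize

# Same objective, but with the inequality constraint x + y >= 1.
fun = lambda x: x[0]**2 + x[1]**2
cons = {'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1}

res = minimize(fun, x0=[2.0, 2.0], constraints=cons, method='SLSQP')
print(res.x)  # ~[0.5, 0.5]: the unconstrained minimum (0, 0) is infeasible,
              # so the solution lands on the boundary, exactly the KKT picture
```

Because the unconstrained minimum violates the constraint, the multiplier is active and the answer matches the equality-constrained one.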

6. Interactive Visualizer: The Constrained Climber

Find the optimal point on the Blue Constraint Circle.

  • Contour Lines: Represent the hill height (Objective Function f(x,y) = x+y). Brighter is higher.
  • Red Arrow: Gradient of Objective (\nabla f). Always points uphill (North-East).
  • Green Arrow: Gradient of Constraint (\nabla g). Always points perpendicular to the circle, radially outward.
  • Goal: Drag the yellow climber until the Red and Green arrows align perfectly.
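The visualizer's alignment condition can be checked numerically. A small sketch assuming a unit circle centered at the origin (the text only specifies "the constraint circle"), with f(x, y) = x + y:

```python
import numpy as np

grad_f = lambda p: np.array([1.0, 1.0])   # gradient of the objective (red arrow)
grad_g = lambda p: 2.0 * p                # gradient of g(x, y) = x^2 + y^2 - 1 (green arrow)

p_star = np.array([1.0, 1.0]) / np.sqrt(2.0)   # the "north-east" point of the circle

# In 2D, parallel vectors have a zero cross product.
gf, gg = grad_f(p_star), grad_g(p_star)
cross = gf[0] * gg[1] - gf[1] * gg[0]
lam = gf[0] / gg[0]                        # the multiplier in grad f = lam * grad g

print(cross)  # 0.0 -> the red and green arrows align at p_star
print(lam)    # ~0.707 (1 / sqrt(2))
```

Dragging the climber anywhere else on the circle would give a nonzero cross product, i.e. the trail still cuts across the contour lines.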

7. Summary

  • Constraint: A rule that restricts the valid search space.
  • Tangency Condition: The optimal point occurs where the constraint boundary runs parallel to the objective’s contour lines.
  • Lagrange Multipliers: \nabla f = \lambda \nabla g. This equation mathematically captures the tangency condition.
  • Shadow Price: \lambda tells us how much the objective f would improve if we relaxed the constraint g by a tiny bit.