Playing by Rules: Lagrange Multipliers

[!NOTE] This module covers constrained optimization with Lagrange multipliers, deriving the key conditions from first principles and connecting them to how constraints are handled in practice.

1. Introduction: The Fence

In unconstrained optimization, we just want to find the lowest point x where \nabla f(x) = 0. But what if we are not allowed to go everywhere? What if we must stay on a path, or inside a specific region?

  • Example: Minimize cost (f), but protein intake must be \ge 50g (g).
  • Example: Maximize Entropy (f), but probabilities must sum to 1 (g).
  • Example: Train a Neural Network, but weights must be sparse (L1 regularization).

This is Constrained Optimization.


2. The Intuition: Tangency

Imagine you are hiking up a hill f(x,y). You want to reach the highest point possible, but you are forced to walk along a specific trail defined by g(x,y)=0.

  1. You walk along the trail.
  2. As long as the trail cuts across the contour lines of the hill, you can keep moving higher.
    • If you cross a contour line, it means you are moving from a lower elevation to a higher elevation.
  3. You stop when the trail runs parallel to the contour lines.
    • If you move forward or backward along the trail now, you don’t go up or down. You are at a local maximum on the path.

At this optimal point, the gradient of the hill \nabla f and the gradient of the constraint function \nabla g (which is perpendicular to the trail) point in the same (or opposite) direction! They are parallel.


3. The Lagrange Multiplier (\lambda)

Since the gradients are parallel, one must be a scalar multiple of the other:

∇f = λ ∇g

Here, \lambda is the Lagrange Multiplier.

  • It represents the “shadow price” of the constraint.
  • If \lambda is large, the constraint is pulling you hard away from the true unconstrained maximum.
  • If \lambda = 0, the constraint is irrelevant (the peak is already on the path).

The Lagrangian Function

We package this into a single function, the Lagrangian:

L(x, y, λ) = f(x, y) - λ g(x, y)

(writing the constraint as g(x, y) = 0, matching the trail above). Setting \nabla L = 0 recovers both the tangency condition \nabla f = \lambda \nabla g and the constraint itself, so its stationary points are the candidate constrained optima.
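To see the machinery in action, here is the system \nabla L = 0 worked by hand for a small problem (the same one solved numerically with SciPy in Section 5): minimize f(x, y) = x² + y² subject to g(x, y) = x + y - 1 = 0.

L(x, y, λ) = x² + y² - λ(x + y - 1)

∂L/∂x = 2x - λ = 0
∂L/∂y = 2y - λ = 0
∂L/∂λ = -(x + y - 1) = 0

The first two equations give x = y = λ/2; substituting into the third yields x = y = 1/2 and λ = 1.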


4. KKT Conditions & Inequality Constraints

For inequality constraints (g(x) \le 0), things get slightly more complex. We use the Karush-Kuhn-Tucker (KKT) conditions.

Basically, if the unconstrained minimum already lies inside the allowed region, the constraint is inactive and \lambda = 0. If it lies outside, the solution sits on the boundary g(x) = 0 with \lambda > 0. These two cases are summarized by complementary slackness: \lambda \cdot g(x) = 0 at the optimum.
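A minimal one-dimensional sketch of complementary slackness, using a toy objective and bound chosen purely for illustration (not from any library):

```python
# Minimize f(x) = (x - 2)^2 subject to g(x) = x - b <= 0, for a bound b.

def solve_with_kkt(b):
    """Closed-form constrained minimum and its multiplier."""
    x = min(2.0, b)                   # unconstrained minimum x = 2, capped at b
    lam = max(0.0, 2.0 * (2.0 - b))   # stationarity on boundary: f'(x) + lam * g'(x) = 0
    return x, lam

# Case 1: b = 3 -> x = 2 is already feasible, so the constraint is inactive.
print(solve_with_kkt(3.0))   # (2.0, 0.0)

# Case 2: b = 1 -> the solution is pushed onto the boundary, lam > 0.
print(solve_with_kkt(1.0))   # (1.0, 2.0)

# Complementary slackness lam * g(x) = 0 holds in both cases.
for b in (3.0, 1.0):
    x, lam = solve_with_kkt(b)
    assert abs(lam * (x - b)) < 1e-12
```

Exactly one of \lambda and g(x) is nonzero in each case, which is the "inside vs. boundary" dichotomy above.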

Projected Gradient Descent

In Deep Learning (e.g., Adversarial Attacks), we often enforce constraints by:

  1. Taking a gradient step.
  2. Projecting the result back into the valid set (e.g., clipping values to [0, 1]).
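The two steps above can be sketched in a few lines of NumPy; the quadratic objective, step size, and box constraint here are made-up illustrations:

```python
import numpy as np

# Projected gradient descent on f(x) = ||x - target||^2 with the box constraint
# x in [0, 1]^2. The target deliberately lies outside the box.
def projected_gradient_descent(target, lr=0.1, steps=100):
    x = np.zeros_like(target)
    for _ in range(steps):
        grad = 2.0 * (x - target)        # 1. gradient step
        x = x - lr * grad
        x = np.clip(x, 0.0, 1.0)         # 2. project back into the valid set
    return x

target = np.array([0.5, 1.5])            # second coordinate is infeasible
x_opt = projected_gradient_descent(target)
print(x_opt)   # close to [0.5, 1.0]: the projection of target onto the box
```

The projection step (here just `np.clip`) is what keeps every iterate feasible; adversarial-attack pipelines use the same pattern with a projection onto an epsilon-ball around the input.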

5. Python: Constrained Optimization with Scipy

We don’t usually write Lagrangian solvers by hand. We use scipy.optimize.

```python
from scipy.optimize import minimize

def optimization_example():
    # Objective: minimize f(x, y) = x^2 + y^2 (a paraboloid)
    fun = lambda x: x[0]**2 + x[1]**2

    # Constraint: x + y = 1, rewritten in SciPy's form as x + y - 1 = 0
    cons = {'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1}

    # Initial guess
    x0 = [2.0, 2.0]

    res = minimize(fun, x0, constraints=cons)

    print(f"Optimal Solution: {res.x}")
    # Expected: [0.5, 0.5] (closest point to the origin on the line x + y = 1)

if __name__ == "__main__":
    optimization_example()
```
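The same call also handles the inequality case from Section 4; a short variant (parameters chosen to mirror the equality example) showing SciPy's `'ineq'` convention, which expects `fun(x) >= 0`:

```python
import numpy as np
from scipy.optimize import minimize

# Same objective, but with the inequality constraint x + y >= 1.
fun = lambda x: x[0]**2 + x[1]**2
cons = {'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1}

res = minimize(fun, x0=[2.0, 2.0], constraints=cons, method='SLSQP')
print(res.x)  # ~[0.5, 0.5]: the unconstrained minimum (0, 0) is infeasible,
              # so the solution lands on the boundary, exactly the KKT picture
```

Because the unconstrained minimum violates the constraint, the multiplier is active and the answer matches the equality-constrained one.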

6. Interactive Visualizer: The Constrained Climber

Find the optimal point on the Blue Constraint Circle.

  • Contour Lines: Represent the hill height (Objective Function f(x,y) = x+y). Brighter is higher.
  • Red Arrow: Gradient of Objective (\nabla f). Always points uphill (North-East).
  • Green Arrow: Gradient of Constraint (\nabla g). Always points perpendicular to the circle, radially outward.
  • Goal: Drag the yellow climber until the Red and Green arrows align perfectly.
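The visualizer's alignment condition can be checked numerically. A small sketch assuming a unit circle centered at the origin (the text only specifies "the constraint circle"), with f(x, y) = x + y:

```python
import numpy as np

grad_f = lambda p: np.array([1.0, 1.0])   # gradient of the objective (red arrow)
grad_g = lambda p: 2.0 * p                # gradient of g(x, y) = x^2 + y^2 - 1 (green arrow)

p_star = np.array([1.0, 1.0]) / np.sqrt(2.0)   # the "north-east" point of the circle

# In 2D, parallel vectors have a zero cross product.
gf, gg = grad_f(p_star), grad_g(p_star)
cross = gf[0] * gg[1] - gf[1] * gg[0]
lam = gf[0] / gg[0]                        # the multiplier in grad f = lam * grad g

print(cross)  # 0.0 -> the red and green arrows align at p_star
print(lam)    # ~0.707 (1 / sqrt(2))
```

Dragging the climber anywhere else on the circle would give a nonzero cross product, i.e. the trail still cuts across the contour lines.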

7. Summary

  • Constraint: A rule that restricts the valid search space.
  • Tangency Condition: The optimal point occurs where the constraint boundary runs parallel to the objective’s contour lines.
  • Lagrange Multipliers: \nabla f = \lambda \nabla g. This equation mathematically captures the tangency condition.
  • Shadow Price: \lambda tells us how much the objective f would improve if we relaxed the constraint g by a tiny bit.