29 Oct 17

The Kuhn-Tucker theorem: ins and outs

We capitalize on previous posts on constrained optimization to explain the format of the Kuhn-Tucker theorem. So far all proofs for one constraint have been done in full. Here we indicate how they can be generalized to the case of many constraints, without going into technicalities.

Problem statement

Consider the maximization problem with several inequality constraints

(1) maximize $f(x)$ subject to $h_1(x)\geq 0,...,h_l(x)\geq 0$.

Suppose $x$ is a maximizing point and $k$ out of $l$ constraints are binding. Renumbering the constraints, if necessary, we can assume that it is the first $k$ constraints that are binding:

(2) $h_1(x)=0,...,h_k(x)=0,\ h_{k+1}(x)>0,...,h_l(x)>0.$

The full Lagrangian is defined by $L(x,\lambda)=f(x)+\lambda H(x)$ where

$H(x)=\left(\begin{array}{c}h_1(x) \\...\\h_l(x)\end{array}\right) ,~\lambda =(\lambda _1,...,\lambda_l).$

The purpose is to formulate conditions which the Lagrangian, Lagrangian multipliers and constraints should satisfy.
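Before formulating those conditions, here is a quick numerical sketch of problem (1) on a toy instance of my own choosing (not from the post): two inequality constraints, of which only one turns out to be binding at the maximum.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance of problem (1) (my example): maximize f(x) = x1 + x2
# subject to h1(x) = 1 - x1^2 - x2^2 >= 0 and h2(x) = x1 + 2 >= 0.
# scipy minimizes, so we negate f.
f = lambda x: -(x[0] + x[1])
cons = [
    {"type": "ineq", "fun": lambda x: 1 - x[0]**2 - x[1]**2},  # h1
    {"type": "ineq", "fun": lambda x: x[0] + 2},               # h2
]
res = minimize(f, x0=np.zeros(2), constraints=cons)
x = res.x  # approximately (1/sqrt(2), 1/sqrt(2))
# At the maximizer h1 is binding (value 0) while h2 is slack (value > 0),
# so k = 1 out of l = 2 constraints are binding, as in (2).
print(np.round(x, 4), 1 - x @ x, x[0] + 2)
```

Here h1 plays the role of a binding constraint and h2 of a nonbinding one, matching the renumbering convention in (2).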

The implicit function existence condition

Let us separate the binding constraints by collecting them in

$H_k(x)=\left(\begin{array}{c}h_1(x)\\...\\h_k(x)\end{array}\right).$

Equality constraints are supposed to determine some of the variables as implicit functions of the others. The number of the implicit functions should be equal to the number of constraints. This results in the condition that the matrix

$\frac{\partial H_k}{\partial x}=\left(\begin{array}{ccc}\frac{\partial h_1}{\partial x_1}&...& \frac{\partial h_1}{\partial x_n}\\...&...&...\\ \frac{\partial h_k}{\partial x_1}&...&\frac{\partial h_k}{\partial x_n}\end{array}\right)$

should have rank $k.$ The number of "the other variables" should be positive, which leads to the condition $k<n.$
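The rank condition is easy to check numerically. A minimal sketch, on a made-up pair of binding constraints in $R^3$ (my example, not the post's):

```python
import numpy as np

# Rank check for the implicit function condition (my toy example):
# two binding constraints in R^3, h1(x) = x1 + x2 - 1 and h2(x) = x2 - x3.
# The constraints are linear, so the Jacobian dH_k/dx is constant:
J = np.array([[1.0, 1.0,  0.0],   # Dh1
              [0.0, 1.0, -1.0]])  # Dh2
k, n = J.shape
# rank(J) == k and k < n, so k of the variables can be expressed
# as implicit functions of the remaining n - k variables
print(np.linalg.matrix_rank(J), k, n)
```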

Reduction of inequality-constrained problem to equality-constrained one

If you compare my posts with the book by Sundaram, R.K. (A First Course in Optimisation Theory, Cambridge: Cambridge University Press, 1996) or the MT3042 Optimisation Theory Guide by M. Balcovic, you will notice that they assume the domain of $f$ to be an open set, while I everywhere implicitly assumed it to be the whole $R^n.$ This difference doesn't matter, as long as the extreme point is an internal point of the domain, which I also implicitly assumed. Another assumption that I have been skipping is that the functions have continuous derivatives of appropriate orders (one or two, depending on the context). Now I have to explicitly make and use such assumptions.

Smoothness conditions. The objective and constraint functions have continuous first derivatives.

Differentiable functions are known to be continuous. Therefore the set $U=\{x\in R^n:h_{k+1}(x)>0,...,h_l(x)>0\}$ determined by the nonbinding constraints is open. Since by assumption the maximizing point satisfies (2), it belongs to $U$, and the maximum will not change if we replace the original domain of $f$, whatever it was, by $U$. What we have achieved by this trick is that the nonbinding constraints have been subsumed in the domain and can be dropped from the optimization problem. The resulting problem

(3) maximize $f(x)$ subject to $h_1(x)=0,...,h_k(x)=0$

has only equality constraints and admits application of the result we already have.
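To illustrate the reduction on the same kind of toy instance as before (my example, under the assumption that the first constraint is the binding one): dropping the nonbinding constraint and turning the binding one into an equality, as in (3), should recover the same maximizer.

```python
import numpy as np
from scipy.optimize import minimize

# Reduction (1) -> (3) on a toy problem (my example):
# original problem: maximize x1 + x2 s.t. 1 - x1^2 - x2^2 >= 0, x1 + 2 >= 0.
# The second constraint is nonbinding at the optimum, so problem (3)
# keeps only the first constraint, now as an equality.
f = lambda x: -(x[0] + x[1])  # scipy minimizes
res_eq = minimize(
    f, x0=np.array([1.0, 0.0]),  # feasible starting point on the circle
    constraints=[{"type": "eq", "fun": lambda x: 1 - x[0]**2 - x[1]**2}],
)
print(np.round(res_eq.x, 4))  # same maximizer as the inequality problem
```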

The Kuhn-Tucker theorem derived from constrained optimization

For (3) we form a partial Lagrangian $L_k(x,\lambda^{(k)})=f(x)+\lambda^{(k)}H_k(x)$ with $\lambda^{(k)}=(\lambda_1,...,\lambda_k).$ The first-order conditions are

(4) $\frac{\partial L_k}{\partial x}=0$, $\frac{\partial L_k}{\partial\lambda^{(k)}}=H_k(x)=0$.

Let us set the remaining components of $\lambda$ to zero:

(5) $\lambda_{k+1}=...=\lambda _l=0.$

With this choice the partial Lagrangian is the same as the full one, $L_k(x,\lambda^{(k)})=f(x)+\lambda^{(k)}H_{k}(x)=f(x)+\lambda H(x),$ and the first equation from (4) is the same as $\frac{\partial L}{\partial x}=0$ or, in terms of gradients,

(A) $Df(x)+\lambda_1Dh_1(x)+...+\lambda_lDh_l(x)=0.$

From (1), obviously,

(B) $h_1(x)\geq 0,...,h_l(x)\geq 0$.

Next we need to establish

(C) $\lambda _1h_1(x)=0,...,\lambda_lh_l(x)=0$.

The first $k$ equations follow from (2) and the last $l-k$ follow from (5). Finally, the Kuhn-Tucker theorem states that

(D) $\lambda_1\geq 0,...,\lambda _l\geq 0.$

The last $l-k$ inequalities follow from (5). The first $k$ inequalities actually express the property: if a constraint is binding, then the Lagrange multiplier is nonnegative. This property was established by perturbing one equality constraint. Since in the present case each equality constraint can be perturbed independently of the others, the same proof goes through.

Kuhn-Tucker theorem. If the implicit function existence and smoothness conditions are satisfied, then an internal maximum $x$ must satisfy (A)-(D).
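Conditions (A)-(D) can be verified by hand on a toy problem (my example, not the post's): maximize $f(x)=x_1+x_2$ subject to the single constraint $h_1(x)=1-x_1^2-x_2^2\geq 0$, whose maximizer is $x=(1/\sqrt{2},1/\sqrt{2})$.

```python
import numpy as np

# Checking (A)-(D) at the known maximizer of a toy problem (my example):
# maximize f(x) = x1 + x2 subject to h1(x) = 1 - x1^2 - x2^2 >= 0.
x = np.array([2**-0.5, 2**-0.5])   # the maximizer
Df  = np.array([1.0, 1.0])         # gradient of f
Dh1 = -2 * x                       # gradient of h1 at x
lam1 = 2**-0.5                     # solves Df + lam1*Dh1 = 0

assert np.allclose(Df + lam1 * Dh1, 0)    # (A) stationarity
assert 1 - x @ x >= -1e-12                # (B) feasibility
assert abs(lam1 * (1 - x @ x)) < 1e-12    # (C) complementary slackness
assert lam1 >= 0                          # (D) nonnegativity
print("KKT conditions (A)-(D) hold")
```

The constraint is binding, so (C) holds with $h_1(x)=0$, and the multiplier comes out strictly positive, consistent with (D).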

Remark. Karush, in his master's thesis, discovered this theorem 12 years before Kuhn and Tucker published it.