
Unconstrained optimization on the plane: necessary condition

Start with the very simple geometric discussion of the one-dimensional case: it reveals the Taylor decomposition as the main research tool. Here we give the Taylor decomposition in the 2D case. Assuming that the reader is familiar with the one-dimensional discussion, we go directly to the decomposition

(1) f(x+h)\approx f(x)+Df(x)h+\frac{1}{2}h^TD^2f(x)h.

Here f is a twice-differentiable function, x is an interior point of the domain D(f), h is a small vector such that x+h also belongs to the domain,

Df(x)=\left(\frac{\partial f(x)}{\partial x_1},\frac{\partial f(x)}{\partial x_2}\right)

is a row vector of first derivatives, and

D^2f(x)=\left(\begin{array}{cc}\frac{\partial^2f(x)}{\partial x_1^2} &\frac{\partial^2f(x)}{\partial x_1\partial x_2}\\\frac{\partial^2f(x)}{\partial x_1\partial x_2}&\frac{\partial^2f(x)}{\partial x_2^2}\end{array}\right)

is the Hessian (a matrix of second-order derivatives). T stands for transposition.
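
To make formula (1) concrete, here is a minimal numerical sketch in Python. The test function f(x)=x_1^3+2x_1x_2+x_2^2 is our own choice (it does not appear in the text); its gradient and Hessian are computed by hand, and the quadratic approximation is compared with the true value of f(x+h).

```python
import numpy as np

# Hypothetical test function (our choice, not from the text):
# f(x) = x1^3 + 2*x1*x2 + x2^2
def f(x):
    return x[0]**3 + 2*x[0]*x[1] + x[1]**2

def Df(x):
    # row vector of first derivatives, computed by hand
    return np.array([3*x[0]**2 + 2*x[1], 2*x[0] + 2*x[1]])

def D2f(x):
    # Hessian: symmetric 2x2 matrix of second-order derivatives
    return np.array([[6*x[0], 2.0],
                     [2.0,    2.0]])

x = np.array([1.0, -0.5])      # the point of expansion
h = np.array([1e-2, -2e-2])    # a small displacement

quadratic = f(x) + Df(x) @ h + 0.5 * h @ D2f(x) @ h
print(f(x + h), quadratic)     # agree up to an error of order |h|^3
```

The difference between the two printed numbers is exactly the error hidden in the \approx sign of (1).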

When is there no local minimum or maximum?

We have seen how reduction to the 1D case can be used to study the 2D case. A similar trick is applied here. Let us represent the vector h as h=tg, where g is a fixed vector (to be chosen later) and t is a small real parameter. Then x+h=x+tg will be close to x. From (1) we get

(2) f(x+tg)\approx f(x)+t[Df(x)g]+t^2[\frac{1}{2}g^TD^2f(x)g].

We think of g as fixed, so the two expressions in square brackets are fixed numbers. Denote a=Df(x). An important observation is that

When t tends to zero, t^2 tends to zero even faster.

Therefore, for small t, the last term in (2) is negligible compared to the second, and from (2) we obtain

(3) f(x+tg)\approx f(x)+tag.
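
The passage from (2) to (3) can be checked numerically. The sketch below (Python, same hypothetical f as above, with an arbitrary direction g of our choosing) shows that the gap between f(x+tg)-f(x) and the linear term tag shrinks like t^2 as t tends to zero.

```python
import numpy as np

# Same hypothetical f as in the previous sketch
def f(x):
    return x[0]**3 + 2*x[0]*x[1] + x[1]**2

x = np.array([1.0, -0.5])
a = np.array([3*x[0]**2 + 2*x[1], 2*x[0] + 2*x[1]])  # a = Df(x)
g = np.array([0.3, 0.7])                             # an arbitrary fixed direction

for t in [1e-1, 1e-2, 1e-3]:
    exact = f(x + t*g) - f(x)   # the true increment
    linear = t * (a @ g)        # the approximation from (3)
    print(t, exact - linear)    # the gap shrinks like t^2
```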

The no-extremes case. Suppose the vector of first derivatives is not zero: a\ne 0, which means that

(4) at least one of the numbers a_1,\ a_2 is different from zero.

Select g_1=a_1,\ g_2=a_2. Then ag=a_1g_1+a_2g_2=a_1^2+a_2^2, and (3) implies

(5) f(x+tg)\approx f(x)+t(a_1^2+a_2^2).

From (4) it follows that a_1^2+a_2^2>0. Then (5) shows that x cannot be an extreme point. Indeed, for small positive t we have f(x+tg)\approx f(x)+t(a_1^2+a_2^2)>f(x), and for small negative t we have f(x+tg)\approx f(x)+t(a_1^2+a_2^2)<f(x). In any neighborhood of x the values of f can be both higher and lower than f(x).
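
The same sign argument in code: with g=a (the gradient direction), moving with t>0 raises f and moving with t<0 lowers it. Again this is a sketch with our own hypothetical f, not a function from the text.

```python
import numpy as np

# Same hypothetical f; here the direction is g = a = Df(x)
def f(x):
    return x[0]**3 + 2*x[0]*x[1] + x[1]**2

x = np.array([1.0, -0.5])
a = np.array([3*x[0]**2 + 2*x[1], 2*x[0] + 2*x[1]])  # a != 0 at this point

t = 1e-3
print(f(x + t*a) - f(x))  # positive: f is higher than f(x) for small t > 0
print(f(x - t*a) - f(x))  # negative: f is lower than f(x) for small t < 0
```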

Conclusion. In case (4), x cannot be a local minimum or maximum. In other words, we should look for local extrema among critical points, that is, points satisfying the first-order condition (FOC)

\frac{\partial f(x)}{\partial x_1}=0,\ \frac{\partial f(x)}{\partial x_2}=0.

The FOC is necessary for a function to have a local minimum or maximum. All of the above easily generalizes to dimensions higher than 2.
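
In practice the FOC is a system of two equations in two unknowns, which can often be solved symbolically. A sketch using sympy, with the same hypothetical f as in the earlier examples:

```python
import sympy as sp

# Critical points of the same hypothetical f via the FOC
x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**3 + 2*x1*x2 + x2**2

foc = [sp.diff(f, x1), sp.diff(f, x2)]       # both partials set to zero
points = sp.solve(foc, [x1, x2], dict=True)
print(points)  # [{x1: 0, x2: 0}, {x1: 2/3, x2: -2/3}]
```

Note that the FOC only selects the candidates; being a necessary condition, it does not by itself say which of the two points, if either, is a local extremum.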