11
Oct 17

## Unconstrained optimization on the plane: necessary condition

See a very simple geometric discussion of the one-dimensional case. It reveals the Taylor decomposition as the main research tool. Therefore we give the Taylor decomposition in a 2D case. Assuming that the reader has familiarized him/herself with that information, we go directly to the decomposition

(1) $f(x+h)\approx f(x)+Df(x)h+\frac{1}{2}h^TD^2f(x)h.$

Here $f$ is a twice-differentiable function, $x$ is an internal point of the domain $D(f)$ $h$ is a small vector such that $x+h$ also belongs to the domain, $Df(x)=\left(\frac{\partial f(x)}{\partial x_1},\frac{\partial f(x)}{\partial x_2}\right)$ is a row vector of first derivatives, and $D^2f(x)=\left(\begin{array}{cc}\frac{\partial^2f(x)}{\partial x_1^2} &\frac{\partial^2f(x)}{\partial x_1\partial x_2}\\\frac{\partial^2f(x)}{\partial x_1\partial x_2}&\frac{\partial^2f(x)}{\partial x_2^2}\end{array}\right)$
is the Hessian (a matrix of second-order derivatives). $T$ stands for transposition.

## When there is no local minimum or maximum?

We have seen how reduction to a 1D case can be used to study a 2D case. A similar trick is applied here. Let us represent the vector $h$ as $h=tg$ where $g$ is another vector (to be defined later) and $t$ is a small real parameter. Then $x+h=x+tg$ will be close to $x$. From (1) we get

(2) $f(x+tg)\approx f(x)+t[Df(x)g]+t^2[\frac{1}{2}g^TD^2f(x)g].$

We think of $g$ as fixed, so the two expressions in square brackets are fixed numbers. Denote $a=Df(x)$. An important observation is that

When $t$ tends to zero, $t^2$ tends to zero even faster.

Therefore the last term in (2) is smaller than the second, and from (2) we obtain

(3) $f(x+tg)\approx f(x)+tag.$

The no-extremes case. Suppose the vector of first derivatives is not zero: $a\ne 0$, which means that

(4) at least one of the numbers $a_1,\ a_2$ is different from zero.

Select $g_1=a_1,\ g_2=a_2$. Then (3) implies

(5) $f(x+tg)\approx f(x)+t(a_1^2+a_2^2).$

From (4) it follows that $a_1^2+a_2^2>0$. Then (5) shows that $x$ cannot be an extreme point. Indeed, for small positive $t$ we have $f(x+tg)\approx f(x)+t(a_1^2+a_2^2)>f(x)$ and for small negative $t$ we have $f(x+tg)\approx f(x)+t(a_1^2+a_2^2). In any neighborhood of $x$  the values of $f$ can be both higher and lower than $f(x).$

Conclusion. In case (4) $x$ cannot be a local minimum or maximum. In other words, we should look for local extrema among critical points which satisfy the first order condition $\frac{\partial f(x)}{\partial x_1}=0,\ \frac{\partial f(x)}{\partial x_2}=0.$

The FOC is necessary for a function to have a local minimum or maximum. All of the above easily generalizes to dimensions higher than 2.