Taylor decomposition for unconstrained optimization
As is clear from my earlier post, the Taylor decomposition is the principal tool for deriving the rules that locate function extrema. In several dimensions we need this decomposition even more. We derive it for a function of two arguments by reducing the 2D case to the one-dimensional case.
In the one-dimensional case, the Taylor approximation with three terms is

$$f(x+h)\approx f(x)+f'(x)h+\frac{1}{2}f''(x)h^2.\qquad(1)$$
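The three-term approximation (1) is easy to check numerically. The sketch below is plain Python; the helper `taylor2` and the choice $f(x)=e^x$ are purely illustrative, not part of the derivation.

```python
import math

# Three-term Taylor approximation of f(x + h), following formula (1);
# f, df, d2f are the function and its first two derivatives.
def taylor2(f, df, d2f, x, h):
    return f(x) + df(x) * h + 0.5 * d2f(x) * h ** 2

# For f(x) = exp(x), every derivative is exp itself.
x, h = 0.0, 0.1
approx = taylor2(math.exp, math.exp, math.exp, x, h)
exact = math.exp(x + h)
error = abs(exact - approx)  # the neglected terms are of order h**3
```

For $h=0.1$ the error is of order $10^{-4}$, consistent with the neglected cubic term.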
We want to generalize this to the case when $x=(x_1,x_2)$ is a 2D argument. For this, we take any real number $t$ and consider points of type $x+th$, where $h=(h_1,h_2)$. As $t$ changes, the points $th$ run over a straight line drawn through the vector $h$. At the same time, the points $x+th$ run over a straight line drawn through the point $x$ parallel to $h$. Consider an auxiliary function $\phi$ defined by $\phi(t)=f(x+th)$. This is a function of a real argument $t$. If we obtain a Taylor approximation for it, letting $t=1$ will give us the desired generalization of (1).
Struggling with the chain rule. The chain rule for functions of a real argument says that $\frac{d}{dt}f(g(t))=f'(g(t))\,g'(t)$. Here we need its generalization for the 2D case:

$$\frac{d}{dt}f(x_1(t),x_2(t))=\frac{\partial f}{\partial x_1}\frac{dx_1}{dt}+\frac{\partial f}{\partial x_2}\frac{dx_2}{dt}.\qquad(2)$$
In our case $x_1(t)=x_1+th_1$ and $x_2(t)=x_2+th_2$, so $\frac{dx_1}{dt}=h_1$ and $\frac{dx_2}{dt}=h_2$. Hence, (2) gives

$$\phi'(t)=\frac{\partial f}{\partial x_1}(x+th)\,h_1+\frac{\partial f}{\partial x_2}(x+th)\,h_2.\qquad(3)$$
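Formula (3) can be compared against a finite-difference derivative of $\phi$. Here is a minimal sketch in plain Python; the test function $f$ and the sample values of $x$, $h$, $t$ are arbitrarily chosen for illustration.

```python
import math

# An arbitrary smooth test function and its partial derivatives.
def f(x1, x2):
    return x1 ** 2 * x2 + math.sin(x2)

def df_dx1(x1, x2):
    return 2 * x1 * x2

def df_dx2(x1, x2):
    return x1 ** 2 + math.cos(x2)

x = (1.0, 2.0)
h = (0.5, -0.3)
t = 0.7

def phi(t):
    # the auxiliary function phi(t) = f(x + t h)
    return f(x[0] + t * h[0], x[1] + t * h[1])

p = (x[0] + t * h[0], x[1] + t * h[1])
# right-hand side of (3)
analytic = df_dx1(*p) * h[0] + df_dx2(*p) * h[1]
# central finite difference of phi, accurate to O(eps**2)
eps = 1e-5
numeric = (phi(t + eps) - phi(t - eps)) / (2 * eps)
```

The two values agree to roughly ten decimal places, as expected from the $O(\varepsilon^2)$ accuracy of the central difference.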
Differentiating (3), and applying the chain rule (2) first to the term with $\frac{\partial f}{\partial x_1}$ and then to the term with $\frac{\partial f}{\partial x_2}$, we get

$$\phi''(t)=\left(\frac{\partial^2 f}{\partial x_1^2}h_1+\frac{\partial^2 f}{\partial x_2\partial x_1}h_2\right)h_1+\left(\frac{\partial^2 f}{\partial x_1\partial x_2}h_1+\frac{\partial^2 f}{\partial x_2^2}h_2\right)h_2,\qquad(4)$$

where all second derivatives are evaluated at $x+th$. (For a twice continuously differentiable $f$ the mixed derivatives coincide.)
Struggling with matrix algebra. Expressions (3) and (4) are unwieldy. We apply matrix algebra to make them shorter. Denote the gradient $\nabla f=\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2}\right)$ (this is a row vector) and $h=\begin{pmatrix}h_1\\ h_2\end{pmatrix}$ (a column vector). The multiplication rule for matrices (rows from the first matrix are multiplied by the columns from the second) allows us to rewrite (3) as

$$\phi'(t)=\nabla f(x+th)\,h.\qquad(5)$$

Now collect the second derivatives in the matrix

$$H=\begin{pmatrix}\dfrac{\partial^2 f}{\partial x_1^2}&\dfrac{\partial^2 f}{\partial x_1\partial x_2}\\[4pt]\dfrac{\partial^2 f}{\partial x_2\partial x_1}&\dfrac{\partial^2 f}{\partial x_2^2}\end{pmatrix}.$$

This matrix is called a Hessian. The expression $h^THh$ is called a quadratic form of the Hessian ($^T$ stands for transposition, so $h^T$ is a row vector). Using the multiplication rule for matrices we see that

$$h^THh=\frac{\partial^2 f}{\partial x_1^2}h_1^2+\frac{\partial^2 f}{\partial x_1\partial x_2}h_1h_2+\frac{\partial^2 f}{\partial x_2\partial x_1}h_2h_1+\frac{\partial^2 f}{\partial x_2^2}h_2^2.\qquad(6)$$
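The identity (6) can be verified directly for a concrete matrix. The symmetric Hessian and the vector below are arbitrary sample values, and the product $h^THh$ is spelled out with the row-by-column rule.

```python
# A sample symmetric Hessian and increment vector (arbitrary numbers).
H = [[4.0, 1.5],
     [1.5, -2.0]]
h = [0.7, -1.2]

# h^T H h computed by the matrix multiplication rule:
# first the column H h, then the row h^T times that column.
Hh = [H[0][0] * h[0] + H[0][1] * h[1],
      H[1][0] * h[0] + H[1][1] * h[1]]
quad_matrix = h[0] * Hh[0] + h[1] * Hh[1]

# The expanded right-hand side of (6).
quad_expanded = (H[0][0] * h[0] ** 2 + H[0][1] * h[0] * h[1]
                 + H[1][0] * h[1] * h[0] + H[1][1] * h[1] ** 2)
```

Both computations produce the same scalar, up to floating-point rounding.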
Combining (4) and (6) we have

$$\phi''(t)=h^TH(x+th)\,h.\qquad(7)$$
Finally, here is the result. The approximation (1) applied to $\phi$ gives

$$\phi(t)\approx\phi(0)+\phi'(0)t+\frac{1}{2}\phi''(0)t^2.$$

Plugging in (5) and (7) and letting $t=1$, we get a 2D analog of (1):

$$f(x+h)\approx f(x)+\nabla f(x)\,h+\frac{1}{2}h^TH(x)\,h.$$
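Putting everything together, the 2D approximation can be tested numerically. The sketch below uses $f(x_1,x_2)=e^{x_1}\cos x_2$, chosen only because its gradient and Hessian are easy to write down by hand; the sample point and increment are likewise arbitrary.

```python
import math

# Test function f(x1, x2) = exp(x1) * cos(x2) with its gradient and Hessian.
def f(x1, x2):
    return math.exp(x1) * math.cos(x2)

def grad(x1, x2):
    return [math.exp(x1) * math.cos(x2), -math.exp(x1) * math.sin(x2)]

def hess(x1, x2):
    return [[math.exp(x1) * math.cos(x2), -math.exp(x1) * math.sin(x2)],
            [-math.exp(x1) * math.sin(x2), -math.exp(x1) * math.cos(x2)]]

x = [0.3, 0.5]
h = [0.05, -0.04]

g = grad(*x)
H = hess(*x)
linear = g[0] * h[0] + g[1] * h[1]                    # grad f(x) h
quadratic = sum(h[i] * H[i][j] * h[j]                 # h^T H(x) h
                for i in range(2) for j in range(2))
approx = f(*x) + linear + 0.5 * quadratic
exact = f(x[0] + h[0], x[1] + h[1])
error = abs(exact - approx)  # should shrink like |h|**3
```

For this increment the approximation matches the exact value to several decimal places, and halving $h$ shrinks the error by roughly a factor of eight.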