Taylor decomposition for unconstrained optimization
As is clear from my earlier post, the Taylor decomposition is the principal tool for deriving the rules for finding function extrema. In higher dimensions we need this decomposition even more. We derive it for a function of two arguments by reducing the 2D case to the one-dimensional one.
In the one-dimensional case, the Taylor approximation with three terms is

(1) $f(x+t)\approx f(x)+f'(x)t+\frac{1}{2}f''(x)t^2.$
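A quick numerical illustration of (1) is easy to run; this is just a sketch, with the test function $h(x)=e^x\sin x$ and the expansion point chosen by me:

```python
import math

# illustrative test function and its hand-computed derivatives
def h(x):  return math.exp(x) * math.sin(x)
def h1(x): return math.exp(x) * (math.sin(x) + math.cos(x))  # h'
def h2(x): return 2.0 * math.exp(x) * math.cos(x)            # h''

x = 0.5
for t in (0.1, 0.01, 0.001):
    taylor = h(x) + h1(x) * t + 0.5 * h2(x) * t**2  # three terms of (1)
    print(t, h(x + t) - taylor)  # error shrinks roughly like t**3
```

Each tenfold decrease in $t$ cuts the error by roughly a factor of a thousand, as expected from the omitted third-order term.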
We want to generalize this to the case when $x=(x_1,x_2)$ is a 2D argument. For this, we take any real number $t$ and consider points of type $x+te,$ where $e=(e_1,e_2)$ is a fixed nonzero vector. As $t$ changes, points $te$ run over the straight line drawn through the vector $e.$ At the same time, points $x+te$ run over the straight line drawn through the point $x$ parallel to $e.$ Consider an auxiliary function $g$ defined by

$g(t)=f(x+te).$

This is a function of a real argument. If we obtain a Taylor approximation for it, letting $t=1$ will give us the desired generalization of (1).
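In code the construction is one line; the test function $f$ and the concrete $x$ and $e$ below are my own choices for illustration and will be reused in the sketches that follow:

```python
import numpy as np

def f(p):
    x1, x2 = p
    return x1**3 + x1 * x2**2  # illustrative test function

x = np.array([1.0, 2.0])   # base point
e = np.array([0.5, -1.0])  # direction vector

def g(t):
    # auxiliary function: f restricted to the line through x parallel to e
    return f(x + t * e)

print(g(0.0), g(1.0))  # g(0) = f(x), g(1) = f(x + e)
```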
Technical tools
Struggling with the chain rule. The chain rule for functions of a real argument says that $\frac{d}{dt}f(x(t))=f'(x(t))x'(t).$ Here we need its generalization for the 2D case:

(2) $\frac{d}{dt}f(x_1(t),x_2(t))=\frac{\partial f}{\partial x_1}\frac{dx_1}{dt}+\frac{\partial f}{\partial x_2}\frac{dx_2}{dt}.$

In our case $x_1(t)=x_1+te_1,$ $x_2(t)=x_2+te_2,$ so $\frac{dx_1}{dt}=e_1,$ $\frac{dx_2}{dt}=e_2.$ Hence, (2) gives:

(3) $g'(t)=\frac{\partial f}{\partial x_1}(x+te)e_1+\frac{\partial f}{\partial x_2}(x+te)e_2.$
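Equation (3) can be verified numerically; this sketch continues with the same illustrative $f,$ $x,$ $e$ as above, with the partial derivatives of $f$ computed by hand:

```python
import numpy as np

def f(p):
    x1, x2 = p
    return x1**3 + x1 * x2**2

x = np.array([1.0, 2.0])
e = np.array([0.5, -1.0])
g = lambda t: f(x + t * e)

t, h = 0.3, 1e-6
p = x + t * e
# right-hand side of (3): f_1(x+te)*e_1 + f_2(x+te)*e_2
rhs = (3 * p[0]**2 + p[1]**2) * e[0] + (2 * p[0] * p[1]) * e[1]
lhs = (g(t + h) - g(t - h)) / (2 * h)  # numerical g'(t)
print(lhs, rhs)  # agreement up to discretization error
```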
Differentiating (3) we get

(4) $g''(t)=\frac{\partial^2 f}{\partial x_1^2}e_1^2+\frac{\partial^2 f}{\partial x_2\partial x_1}e_2e_1$ (applying the chain rule to the first term)

$+\frac{\partial^2 f}{\partial x_1\partial x_2}e_1e_2+\frac{\partial^2 f}{\partial x_2^2}e_2^2$ (applying the chain rule to the second term),

where all second derivatives are evaluated at $x+te.$
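Again this is easy to check numerically; the second partials of the illustrative $f$ from above are $f_{11}=6x_1,$ $f_{12}=f_{21}=2x_2,$ $f_{22}=2x_1$:

```python
import numpy as np

def f(p):
    x1, x2 = p
    return x1**3 + x1 * x2**2

x = np.array([1.0, 2.0])
e = np.array([0.5, -1.0])
g = lambda t: f(x + t * e)

t, h = 0.3, 1e-4
p = x + t * e
f11, f12, f21, f22 = 6 * p[0], 2 * p[1], 2 * p[1], 2 * p[0]
# right-hand side of (4): the four-term sum
rhs = f11 * e[0]**2 + f21 * e[1] * e[0] + f12 * e[0] * e[1] + f22 * e[1]**2
lhs = (g(t + h) - 2 * g(t) + g(t - h)) / h**2  # numerical g''(t)
print(lhs, rhs)
```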
Struggling with matrix algebra. Expressions (3) and (4) are unwieldy. We apply matrix algebra to make them shorter. Denote the gradient $\nabla f=\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2}\right)$ (this is a row vector) and $e=\left(\begin{array}{c}e_1\\e_2\end{array}\right)$ (a column vector). The multiplication rule for matrices (rows from the first matrix are multiplied by the columns from the second) allows us to rewrite (3) as

(5) $g'(t)=\nabla f(x+te)e.$
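In NumPy the row-times-column product in (5) is a single `@`; same illustrative $f,$ $x,$ $e$ as before:

```python
import numpy as np

def grad_f(p):
    # gradient of the illustrative f(x1, x2) = x1**3 + x1*x2**2
    x1, x2 = p
    return np.array([3 * x1**2 + x2**2, 2 * x1 * x2])

x = np.array([1.0, 2.0])
e = np.array([0.5, -1.0])
t = 0.3

print(grad_f(x + t * e) @ e)  # g'(t) by formula (5)
```

NumPy does not distinguish row and column vectors for 1D arrays, so `@` here is simply the dot product.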
Further, denote

$H=\left(\begin{array}{cc}\frac{\partial^2 f}{\partial x_1^2}&\frac{\partial^2 f}{\partial x_1\partial x_2}\\\frac{\partial^2 f}{\partial x_2\partial x_1}&\frac{\partial^2 f}{\partial x_2^2}\end{array}\right).$

This matrix is called a Hessian. The expression $e^THe$ is called a quadratic form of the Hessian ($T$ stands for transposition, so $e^T$ is a row vector). Using the multiplication rule for matrices we see that

(6) $e^THe=\frac{\partial^2 f}{\partial x_1^2}e_1^2+\frac{\partial^2 f}{\partial x_2\partial x_1}e_2e_1+\frac{\partial^2 f}{\partial x_1\partial x_2}e_1e_2+\frac{\partial^2 f}{\partial x_2^2}e_2^2.$
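A quick check that `e @ H @ e` reproduces the four-term sum in (6), with the hand-computed Hessian of the same illustrative $f$:

```python
import numpy as np

def hessian_f(p):
    # Hessian of the illustrative f(x1, x2) = x1**3 + x1*x2**2
    x1, x2 = p
    return np.array([[6 * x1, 2 * x2],
                     [2 * x2, 2 * x1]])

p = np.array([1.0, 2.0])
e = np.array([0.5, -1.0])
H = hessian_f(p)

quad_form = e @ H @ e  # e^T H e
four_terms = (H[0, 0] * e[0]**2 + H[1, 0] * e[1] * e[0]
              + H[0, 1] * e[0] * e[1] + H[1, 1] * e[1]**2)
print(quad_form, four_terms)  # identical, as (6) asserts
```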
Combining (4) and (6) we have

(7) $g''(t)=e^TH(x+te)e.$
Finally, here is the result
(1) for $g$ gives $g(t)\approx g(0)+g'(0)t+\frac{1}{2}g''(0)t^2.$ Plugging here (5) and (7) and letting $t=1$ we get a 2D analog of (1):

$f(x+e)\approx f(x)+\nabla f(x)e+\frac{1}{2}e^TH(x)e.$
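Putting everything together, here is a sketch checking the 2D analog of (1) on the same illustrative $f$; as in the 1D case, the approximation error should shrink like the cube of the step size:

```python
import numpy as np

def f(p):
    x1, x2 = p
    return x1**3 + x1 * x2**2

def grad_f(p):
    x1, x2 = p
    return np.array([3 * x1**2 + x2**2, 2 * x1 * x2])

def hessian_f(p):
    x1, x2 = p
    return np.array([[6 * x1, 2 * x2],
                     [2 * x2, 2 * x1]])

x = np.array([1.0, 2.0])
for scale in (0.1, 0.01, 0.001):
    e = scale * np.array([0.5, -1.0])  # shrink the step
    taylor = f(x) + grad_f(x) @ e + 0.5 * e @ hessian_f(x) @ e
    print(scale, f(x + e) - taylor)    # error ~ |e|**3
```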