## Taylor decomposition for unconstrained optimization

As is clear from my earlier post, the Taylor decomposition is the principal tool to derive the rule for finding function extremes. In many dimensions, we need this decomposition even more. We derive it for a function of two arguments by reducing the 2D case to a one-dimensional case.

In the one-dimensional case, the Taylor approximation with three terms is

(1)

We want to generalize this to the case when is a 2D argument. For this, we take any real number and consider points of type where As changes, points run over a straight line drawn through the vector At the same time, points run over a straight line drawn through point parallel to Consider an auxiliary function defined by This is a function of a real argument. If we obtain a Taylor approximation for it, letting will give us the desired generalization of (1).

## Technical tools

**Struggling with the chain rule**. The chain rule for functions of a real argument says that Here we need its generalization for the 2D case:

(2)

In our case Hence, (2) gives:

(3)

Differentiating (3) we get

(4)

(applying chain rule to first term)

(applying chain rule to second term)

**Struggling with matrix algebra**. Expressions (3) and (4) are unwieldy. We apply matrix algebra to make them shorter. Denote the **gradient**

(this is a row vector) and

(a column vector). The multiplication rule for matrices (rows from the first matrix are multiplied by the columns from the second) allows us to rewrite (3) as

(5)

Further, denote

This matrix is called a **Hessian**. The expression is called a **quadratic form** of the Hessian ( stands for transposition, so is a row vector). Using the multiplication rule for matrices we see that

(6)

Combining (4) and (6) we have

(7)

## Finally, here is the result

(1) for gives

Plugging here (5) and (7) and letting we get a 2D analog of (1):

## Leave a Reply

You must be logged in to post a comment.