27 Dec 17

How to study mt3042 Optimisation: a guide to a guide

Section and example numbering follows that of the mt3042 Optimization Guide by M. Baltovic.

Main idea: look at geometry in the two-dimensional case

Here is an example. The norm of a vector x\in R^n is defined by \left\Vert x\right\Vert =\sqrt{\sum_{i=1}^nx_i^2}. The combination of squaring and extracting a square root often makes it difficult to understand how this construction works. Here is a simple inequality that allows one to do without this norm (or, put differently, to replace it with another norm). Take n=2.

\max \{|x_1|,|x_2|\}=\max \{\sqrt{x_1^2},\sqrt{x_2^2}\}\leq\max\{\sqrt{x_1^2+x_2^2},\sqrt{x_1^2+x_2^2}\}=\left\Vert  x\right\Vert =\sqrt{x_1^2+x_2^2}\leq\sqrt{\max\{|x_1|,|x_2|\}^2+\max \{|x_1|,|x_2|\}^2}=\sqrt{2}\max\{|x_1|,|x_2|\}.

We have proved that \max \{|x_1|,|x_2|\}\leq\left\Vert x\right\Vert\leq\sqrt{2}\max\{|x_1|,|x_2|\}. This easily generalizes to R^{n}:

(1) \max \{|x_1|,...,|x_n|\}\leq\left\Vert x\right\Vert\leq\sqrt{n}\max\{|x_1|,...,|x_n|\}.

Application. The set A\subset R^n is called bounded if there is a constant C such that \left\Vert x\right\Vert \leq C for all x\in A. Inequality (1) yields an equivalent definition: A is bounded if there is a constant C such that \max\{|x_1|,...,|x_n|\}\leq C for all x\in A. See p.35 of Baltovic's guide, where the inequality y_{i}\leq \frac{f(\hat{x})}{p_{i}} is enough to prove boundedness of the set Y.
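As a quick sanity check (not part of the guide), inequality (1) can be verified numerically for random vectors; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)

# Check max|x_i| <= ||x|| <= sqrt(n) * max|x_i| for random vectors of several dimensions.
for n in (2, 3, 10, 100):
    for _ in range(1000):
        x = rng.normal(size=n)
        max_abs = np.max(np.abs(x))      # the max-norm of x
        euclid = np.linalg.norm(x)       # the Euclidean norm ||x||
        assert max_abs <= euclid + 1e-12
        assert euclid <= np.sqrt(n) * max_abs + 1e-12

print("Inequality (1) holds for all sampled vectors.")
```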

Theorem 2.2 (The Cauchy-Schwarz Inequality). This inequality does not have serious applications in the guide. For a nontrivial application of the Cauchy-Schwarz inequality see my post.

2.1.8. Avoid using the definition of continuity in terms of \varepsilon-\delta (Definition 2.18). Use Definition 2.19 in terms of sequences instead.

2.6.2. For many students, Definition 2.21 is indigestible. Just say that the matrix A consists of the partial derivatives of the components of f=(f_1,...,f_m):

A=\left(\begin{array}{ccc}\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial x_n} & \cdots & \frac{\partial f_m}{\partial x_n}\end{array}\right).
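For intuition, this matrix of partial derivatives can be approximated by finite differences. The sketch below uses a hypothetical map f:R^3\rightarrow R^2 (it is not an example from the guide) and follows the convention of the display above: rows are indexed by the x-variables, columns by the components of f.

```python
import numpy as np

def f(x):
    # A hypothetical map f = (f_1, f_2) from R^3 to R^2, used only for illustration.
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

def partial_derivative_matrix(f, x, h=1e-6):
    """Approximate A with entries A[i, j] = d f_j / d x_i by central differences."""
    x = np.asarray(x, dtype=float)
    m = f(x).size
    n = x.size
    A = np.zeros((n, m))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        A[i, :] = (f(x + e) - f(x - e)) / (2 * h)
    return A

print(partial_derivative_matrix(f, [1.0, 2.0, 0.5]))
```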

Theorem 2.11. The proof is really simple in the one-dimensional case. By the definition of the derivative, \frac{f(x_n)-f(x)}{x_n-x}\rightarrow f^{\prime }(x) for any sequence x_n\rightarrow x with x_n\neq x. Multiplying the difference quotient by x_n-x\rightarrow 0 we get f(x_{n})-f(x)=(x_n-x)\frac{f(x_n)-f(x)}{x_n-x}\rightarrow 0\cdot f^{\prime }(x)=0, which proves continuity of f at x.

3.3.1. There is math that happens on paper (formulas) and math that happens in the head (logic). Many students see the formulas and miss the logic. Carefully read this section and check whether the logic happens in your head.

3.4. The solution to Example 3.2 is overblown. A professional mathematician never thinks like that. A pro would explain the idea as follows: because of Condition 2, the function is close to zero in some neighborhood of infinity \{x:|x|>N\}. Therefore, the maximum should be sought in the set \{x:|x|\leq N\}. Since this set is compact, the Weierstrass theorem applies. With a proper graphical illustration, the students don't need anything else.
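The same idea can be seen numerically. The sketch below uses a hypothetical function vanishing at infinity (it is not the function from Example 3.2): outside a large interval the values are negligible, so it is enough to search the compact set \{x:|x|\leq N\}.

```python
import numpy as np

# A hypothetical function with f(x) -> 0 as |x| -> infinity (not the one from Example 3.2).
def f(x):
    return x * np.exp(-x ** 2)

N = 10.0                             # outside [-N, N] the function is close to zero
grid = np.linspace(-N, N, 100001)    # the compact set {x: |x| <= N}, discretized
values = f(grid)
print("approximate maximizer:", grid[np.argmax(values)])   # about 1/sqrt(2)
print("approximate maximum:  ", values.max())              # about 0.4289
```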

4.2 First-order conditions for optima. See the proof.

4.4 Second-order conditions for optima. See explanation using the Taylor decomposition.

5.3 The Theorem of Lagrange. For the Lagrange method see necessary conditions, sufficient conditions, and the case of many constraints.
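To see the mechanics of the method, here is a minimal sympy sketch that solves the first-order (necessary) conditions of the Lagrangian for a hypothetical problem, maximizing f(x,y)=xy subject to x+y=1; the problem is not from the guide and is used only for illustration.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

f = x * y                 # hypothetical objective
g = x + y - 1             # hypothetical equality constraint g = 0
L = f - lam * g           # the Lagrangian

# Necessary conditions: stationarity of L in (x, y) plus the constraint itself.
foc = [sp.diff(L, x), sp.diff(L, y), g]
print(sp.solve(foc, [x, y, lam], dict=True))
# [{x: 1/2, y: 1/2, lambda: 1/2}]
```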

5.4 Proof of Lagrange's Theorem. See a simple explanation of the constraint qualification condition. The explanation on pp.58-60 is hard to understand because of dimensionality.

5.6 The Lagrangian multipliers. See a simpler derivation.

6.4 Proof of the Kuhn-Tucker Theorem. In the case of the Kuhn-Tucker theorem, the most important point is that, once the binding constraints have been determined, the nonbinding ones can be omitted from the analysis. The proof of nonnegativity of the Lagrange multipliers for binding constraints takes less than one page.

Example 6.4. In solutions that rely on the Kuhn-Tucker theorem, the author suggests checking the constraint qualification condition for all possible combinations of constraints. Not only is this time-consuming, it is also misleading: often it is possible to determine the binding constraints and use the Lagrange method instead of the Kuhn-Tucker theorem or, alternatively, to use the Kuhn-Tucker theorem to eliminate simple cases. The same problem can be solved using convexity theory.
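To illustrate the point about binding constraints, here is a minimal numerical sketch (not the solution of Example 6.4) for a hypothetical problem: maximize f(x,y)=x+y subject to x^2+y^2\leq 1, x\geq 0, y\geq 0. Guessing that the first constraint binds and the sign constraints do not, the Lagrange method applied to the binding constraint alone gives x=y=1/\sqrt{2}, and a general-purpose solver confirms it.

```python
import numpy as np
from scipy.optimize import minimize

# Maximize x + y subject to x^2 + y^2 <= 1, x >= 0, y >= 0
# (minimize the negative of the objective).
objective = lambda z: -(z[0] + z[1])

constraints = [
    {"type": "ineq", "fun": lambda z: 1 - z[0] ** 2 - z[1] ** 2},  # x^2 + y^2 <= 1
]
bounds = [(0, None), (0, None)]                                    # x >= 0, y >= 0

res = minimize(objective, x0=np.array([0.1, 0.1]),
               bounds=bounds, constraints=constraints, method="SLSQP")
print(res.x)   # approximately [0.7071, 0.7071], i.e. x = y = 1/sqrt(2)
```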

Example 6.5. Here Baltovic makes a controversial experiment: he shows what happens if we go the wrong way (expectedly, bad things happen) without providing the correct solution.

Solution to Exercise 6.1. In this exercise, the revenue is homogeneous of degree 2 and the cost is homogeneous of degree 1: scaling the input by t multiplies the revenue by t^2 and the cost only by t, so the profit is unbounded. No need for a three-page analysis!
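The homogeneity argument is easy to see numerically. A tiny sketch with hypothetical revenue and cost functions (not those of Exercise 6.1):

```python
# Hypothetical example: revenue R(x) = x^2 (degree 2), cost C(x) = 3x (degree 1).
revenue = lambda x: x ** 2
cost = lambda x: 3 * x

for t in (1, 10, 100, 1000):
    x = t * 1.0                      # scale the input by t
    print(t, revenue(x) - cost(x))   # profit t^2 - 3t grows without bound
```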

7.6 The Bellman equations. There are many optimization methods not covered in Sundaram's book. One of them, Pontryagin's maximum principle, is more general than the Bellman approach.

p. 172. The bound \sum \delta ^{t}|r(s_{t},a_{t})|\leq K\sum \delta ^{t}=\frac{K}{1-\delta} is obvious and does not require the Cauchy-Schwarz inequality.

Example 8.1. See the solution of this example using the Cauchy-Schwarz inequality.