Nov 17

Cauchy-Schwarz inequality and optimization 2

Cauchy-Schwarz inequality and optimization 2

Here we consider a problem from mt3042 Optimization Guide by M. Baltovic:

Example 8.1. Given the initial state s_{1}=W for some W>0 and a discount factor \delta\in(0,1), we need to maximize the discounted stream of future rewards

(1) \sum_{i=1}^{\infty }\delta^t\sqrt{a_t}.

The maximization is over actions (a_1,a_2,...) (you can think of them as consumptions). The initial state s_1=W is the initial wealth; at each step, the consumption is chosen between zero and the wealth left from the previous period, 0\leq a_t\leq s_t. This reduces the amount available for consumption in the next period:

(2) s_{t+1}=s_t-a_t, t\geq 1.

Step 1. Here we reformulate the problem to remove the states from the constraints. From (2) we have a_t=s_t-s_{t+1}. Fixing an integer T>0 then we obtain

(3) a_1+a_2+...+a_{T-1}+a_T=s_1-s_2+s_2-s_3+...+s_{T-1}-s_T+s_T-s_{T+1}=s_1-s_{T+1}.

Since in each state s_t consumption a_t is nonnegative, the states can only decrease over time. On the other hand, they cannot be negative. Hence, the sequence \{s_T\} is nonincreasing and bounded from below by zero. Therefore the limit S=\lim_{T\rightarrow\infty}s_T exists and is nonnegative. Letting T\rightarrow\infty in (3) we get \sum_{t=1}^{\infty}a_t=W-S\leq W.

The problem becomes: maximize (1) subject to

(4) \sum_{t=1}^{\infty}a_t\leq W.

Step 2. Getting rid of square roots. Denoting x_t=\sqrt{a_t}, we have to maximize \sum_{i=1}^{\infty }\delta^tx_t subject to \sum_{t=1}^{\infty }x_t^2\leq W.

Step 3. Applying the result on the Cauchy-Schwarz inequality. The result from the previous post does not depend on the number of variables. For our present needs it is formulated as follows: Let a_1,a_2,... \ge 0 be fixed numbers and let x_1,x_2,... \ge 0 be variables satisfying the restriction x_1^2+x_2^2+... \le W. Then the infinite weighted sum a_1x_1+a_2x_2+... is maximized at the point

(x_1,x_2,...)=\frac{W^{1/2}}{\left( a_1^2+a_2^2+... \right)^{1/2}}(a_1,a_2,...)

and the maximized value is a_1x_1+a_2x_2+...=W^{1/2} \left(a_1^2+a_2^2+... \right)^{1/2}.

Letting a_{t}=\delta^t we get a_1^2+a_2^2+...=1/(1-\delta^2) and

(x_1,x_2,...)=[(1-\delta^2)W]^{1/2}(1,\delta,\delta^2,...) with the maximized value \sqrt{W/(1-\delta^2)}.

And no clumsy terminology and notation from the Infinite Horizon Dynamic Programming method!

Leave a Reply

You must be logged in to post a comment.