5
Nov 17

## Finite Horizon Dynamic Programming ## Finite Horizon Dynamic Programming

This is the title of the theory we start studying here. We use the Traveller's problem to explain the main definitions:

1. Set of states, for each time
2. Set of actions, for each time and state
3. Reward, for each action
4. Strategy = set of actions that takes us from A to B (geometrically, it is a path from A to B).
5. Definition of a Markovian strategy: at each time, the action depends only on the state at that time and not on how we got there.
6. Value function of a strategy = sum of all rewards

## The method of backwards induction

By definition, an optimal strategy maximizes the value function.

Statement. Piece of optimal whole PLUS optimal remainder = optimal whole. That is, if we have an optimal strategy leading from A to B, we can take any piece of it, leading from A to an intermediate point, say, C. Then if we find a partial optimal strategy leading from C to B, then the combination of the piece A to C of the initial optimal strategy and the partial optimal strategy C to B will be optimal from A to B.

Idea: starting from the end, find optimal remainders for all states.