## Finite Horizon Dynamic Programming

This is the title of the theory we start studying here. We use the Traveller's problem to explain the main definitions:

- Set of states, for each time
- Set of actions, for each time and state
- Reward, for each action
- Strategy = set of actions that takes us from A to B (geometrically, it is a path from A to B).
- Definition of a
**Markovian strategy**: at each time, the action depends only on the state at that time and not on how we got there. - Value function of a strategy = sum of all rewards

## The method of backwards induction

By definition, an **optimal strategy** maximizes the value function.

**Statement**. Piece of optimal whole PLUS optimal remainder = optimal whole. That is, if we have an optimal strategy leading from A to B, we can take any piece of it, leading from A to an intermediate point, say, C. Then if we find a partial optimal strategy leading from C to B, then the combination of the piece A to C of the initial optimal strategy and the partial optimal strategy C to B will be optimal from A to B.

Idea: starting from the end, find optimal remainders for all states.

## Leave a Reply

You must be logged in to post a comment.