Finite Horizon Dynamic Programming
This lecture begins the theory of finite horizon dynamic programming. We use the Traveller's problem to explain the main definitions:
- Set of states, for each time
- Set of actions, for each time and state
- Reward, for each action
- Strategy = sequence of actions that takes us from A to B (geometrically, it is a path from A to B).
- Definition of a Markovian strategy: at each time, the action depends only on the state at that time and not on how we got there.
- Value function of a strategy = sum of the rewards collected along its path.
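The definitions above can be sketched in code on a tiny layered graph. Everything here is illustrative, assuming states are grouped by time stage, an action is a move along an edge, and each edge carries a reward; the particular states and numbers are made up, not from the lecture.

```python
# states[t] = set of states available at time t
states = [["A"], ["C1", "C2"], ["B"]]

# reward[(s, s2)] = reward of the action that moves from state s to state s2
reward = {
    ("A", "C1"): 3, ("A", "C2"): 5,
    ("C1", "B"): 4, ("C2", "B"): 1,
}

def value(strategy):
    """Value of a strategy = sum of the rewards along its path."""
    return sum(reward[(s, s2)] for s, s2 in zip(strategy, strategy[1:]))

path = ["A", "C1", "B"]   # one strategy from A to B
print(value(path))        # 3 + 4 = 7
```

A Markovian strategy in this picture would be a rule mapping each state to a next move, rather than a fixed path.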
The method of backwards induction
By definition, an optimal strategy maximizes the value function.
Statement (principle of optimality). Piece of an optimal whole PLUS an optimal remainder = an optimal whole. That is, suppose we have an optimal strategy from A to B and take any initial piece of it, leading from A to some intermediate state, say C. If we then find an optimal partial strategy from C to B, the combination of the piece from A to C with the optimal partial strategy from C to B is again optimal from A to B.
Idea: starting from the end, find optimal remainders for all states.
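This idea can be sketched as follows, assuming the same kind of layered graph as in the Traveller's problem; the graph and numbers are an illustrative example, not from the lecture. Starting from the final stage, we compute the optimal remainder (value-to-go) for every state, then read off an optimal strategy.

```python
# states[t] = set of states available at time t
states = [["A"], ["C1", "C2"], ["B"]]

# reward[(s, s2)] = reward of the action that moves from s to s2
reward = {
    ("A", "C1"): 3, ("A", "C2"): 5,
    ("C1", "B"): 4, ("C2", "B"): 1,
}

def backward_induction(states, reward):
    V = {s: 0 for s in states[-1]}   # terminal states: nothing left to collect
    best = {}                        # best[s] = optimal next state from s
    # walk the stages from the end toward the start
    for t in range(len(states) - 2, -1, -1):
        for s in states[t]:
            successors = [s2 for s2 in states[t + 1] if (s, s2) in reward]
            best[s] = max(successors, key=lambda s2: reward[(s, s2)] + V[s2])
            V[s] = reward[(s, best[s])] + V[best[s]]
    return V, best

V, best = backward_induction(states, reward)
print(V["A"])   # optimal total reward from A: max(3 + 4, 5 + 1) = 7

# recover an optimal strategy by following `best` from A to the end
path = ["A"]
while path[-1] in best:
    path.append(best[path[-1]])
print(path)     # ['A', 'C1', 'B']
```

Note that `best` is a Markovian strategy: at each state it prescribes the next action regardless of how that state was reached, which is exactly why the principle of optimality lets us build the solution backwards.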