Nov 17

Finite Horizon Dynamic Programming

Finite Horizon Dynamic Programming

This is the title of the theory we start studying here. We use the Traveller's problem to explain the main definitions:

  1. Set of states, for each time
  2. Set of actions, for each time and state
  3. Reward, for each action
  4. Strategy = set of actions that takes us from A to B (geometrically, it is a path from A to B).
  5. Definition of a Markovian strategy: at each time, the action depends only on the state at that time and not on how we got there.
  6. Value function of a strategy = sum of all rewards

Video 1. Traveller's problem-Definitions

The method of backwards induction

By definition, an optimal strategy maximizes the value function.

Statement. Piece of optimal whole PLUS optimal remainder = optimal whole. That is, if we have an optimal strategy leading from A to B, we can take any piece of it, leading from A to an intermediate point, say, C. Then if we find a partial optimal strategy leading from C to B, then the combination of the piece A to C of the initial optimal strategy and the partial optimal strategy C to B will be optimal from A to B.

Idea: starting from the end, find optimal remainders for all states.


Video 2. Traveller's problem-Solution


Leave a Reply

You must be logged in to post a comment.