#DynamicProgramming
[[Dynamic Programming]]
The goal of Value Iteration is to find a policy that maximizes the [[Value Function]].
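One standard way to write the quantity being computed is the Bellman optimality equation (using $\gamma$ for the discount rate; this form assumes a finite MDP with transition probabilities $P$ and rewards $R$):

$$V^*(s) = \max_a \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^*(s')\right]$$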
In value iteration, you start with an arbitrarily initialised value function, for example set V(s) = 0 for all s in S. Then iterate to find a new and improved value function, assuming the agent takes the best possible action in each state under the current estimate of the value function, until you converge on an optimal value function. This optimal value function is the one that yields the maximum expected return.
Each sweep of value iteration visits every state in the state space:
- Initialise values of all states V(i) to some number, say 0
- Set discount rate to some value, say 0.9
- For every state perform the Bellman update rule
- Calculate delta, the change in the value function for that state
- Repeat until the largest delta is below some threshold
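The steps above can be sketched in Python. The two-state MDP, its action names, and the transition format `(prob, next_state, reward)` are all hypothetical, chosen just to make the loop runnable:

```python
# Value iteration on a tiny hypothetical 2-state MDP.
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma = 0.9   # discount rate
theta = 1e-6  # convergence threshold for delta

V = {s: 0.0 for s in P}  # initialise all state values to 0
while True:
    delta = 0.0
    for s in P:
        # Bellman update: value of the best action under the current V
        best = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:  # stop once no state's value changed much
        break
```

With these (made-up) transitions, the values converge to roughly V(0) = 19 and V(1) = 20, since looping "stay" in state 1 earns reward 2 per step discounted by 0.9.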
Once you have found the optimal value function, you can derive the optimal policy by acting greedily with respect to it.
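A minimal sketch of that greedy extraction step, reusing the same hypothetical `(probability, next_state, reward)` transition format as above (the MDP and values here are illustrative, not from the note):

```python
# Extract the greedy policy from a converged value function.
def greedy_policy(P, V, gamma=0.9):
    policy = {}
    for s in P:
        # Pick the action whose one-step lookahead value is highest.
        policy[s] = max(
            P[s],
            key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]),
        )
    return policy

# Hypothetical MDP and the values value iteration would produce for it.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
V = {0: 19.0, 1: 20.0}
print(greedy_policy(P, V))  # {0: 'go', 1: 'stay'}
```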
Value iteration is an example of [[Model Based Reinforcement Learning]]