State Value Learning with an Anticipatory Learning Classifier System in a Markov Decision Process
TR No.: 2002018
Abstract:
This paper addresses the combination of an online generalizing model learner with a state value learner in a Markov decision process (MDP). The model learner evolves, online, a generalized representation of the MDP's state transition function; the learned model is called the predictive model. State values are evaluated by means of the evolving predictive model representation. They approximate the solution of the Bellman equation, which gives rise to an optimal policy in the MDP. It is proven that if the reinforcement provision in the MDP depends only on the resulting states, the state values can be redefined to yield an optimal policy independent of any immediate reward representation. Internal temporal difference learning is applied to further speed up learning of an optimal policy.
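The following is a minimal sketch of why a reward that depends only on the resulting state lets the immediate reward representation be dropped. It is not the report's algorithm. It makes four assumptions:

- The environment is deterministic and tabular.
- A plain dictionary (the hypothetical `MODEL`) stands in for the anticipatory classifier system's generalized predictive model.
- `REWARD[s']` is the reward for entering `s'`.
- Plain value iteration is used in place of the report's learning scheme.

Defining U(s) = r(s) + γV(s) turns the Bellman equation into U(s) = r(s) + γ max_a U(δ(s, a)). The greedy policy then only compares U over model-predicted next states. The helper names `learn_state_values`, `learn_standard_values` and `greedy_action` are illustrative.

```python
"""Sketch: redefined state values over a learned, deterministic model.

Assumptions (hypothetical, for illustration only):
- MODEL[s][a] gives the predicted next state (stand-in for the predictive model).
- REWARD[s2] is the reward for entering s2, i.e. it depends only on the
  resulting state.
"""

GAMMA = 0.9

# Toy predictive model: state -> action -> predicted resulting state.
MODEL = {
    "A": {"left": "A", "right": "B"},
    "B": {"left": "A", "right": "C"},
    "C": {"left": "B", "right": "C"},
}
# Reward of the resulting state; entering C is rewarded.
REWARD = {"A": 0.0, "B": 0.0, "C": 1.0}


def learn_standard_values(model, reward, gamma=GAMMA, sweeps=200):
    """Classic values: V(s) = max_a [ r(s') + gamma * V(s') ], s' = model[s][a]."""
    v = {s: 0.0 for s in model}
    for _ in range(sweeps):
        # Synchronous update: the right-hand side reads the previous sweep's v.
        v = {s: max(reward[n] + gamma * v[n] for n in model[s].values())
             for s in model}
    return v


def learn_state_values(model, reward, gamma=GAMMA, sweeps=200):
    """Redefined values: U(s) = r(s) + gamma * max_a U(model[s][a])."""
    u = {s: 0.0 for s in model}
    for _ in range(sweeps):
        u = {s: reward[s] + gamma * max(u[n] for n in model[s].values())
             for s in model}
    return u


def greedy_action(state, model, u):
    """Choose the action whose predicted resulting state has the highest U.

    No immediate-reward estimate is consulted here; the reward is folded
    into U of the resulting state.
    """
    return max(model[state], key=lambda a: u[model[state][a]])


if __name__ == "__main__":
    u = learn_state_values(MODEL, REWARD)
    print({s: round(x, 3) for s, x in u.items()})
    print({s: greedy_action(s, MODEL, u) for s in MODEL})
```

The policy argmax_a [r(s') + γV(s')] coincides with argmax_a U(s'), because U(s') = r(s') + γV(s') term by term. This is why the comparison function `learn_standard_values` is included: in the toy MDP both value definitions select the same actions.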
Posted: April 4th, 2002 under Genetic algorithms.