Report (Research Report) Year: 2012

On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

Bruno Scherrer

Abstract

We consider infinite-horizon discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies $\pi_1,\dots,\pi_k$ it generates until some iteration $k$. We provide performance bounds for non-stationary policies involving the last $m$ generated policies that reduce the state-of-the-art bound for the last stationary policy $\pi_k$ by a factor $\frac{1-\gamma}{1-\gamma^m}$. In other words, and contrary to a common intuition, we show that it may be much easier to find a non-stationary approximately-optimal policy than a stationary one.
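To make the setting concrete, below is a minimal sketch of the objects the abstract refers to: Value Iteration run for $k$ iterations, keeping the greedy policy at each step, and a non-stationary policy built from the last $m$ of them. The cycling order ($\pi_k, \pi_{k-1}, \dots, \pi_{k-m+1}$, then repeat), the array shapes, and all function names are assumptions made for illustration, not taken from the report; for intuition, note that if the stationary-policy bound scales with $\frac{1}{(1-\gamma)^2}$, multiplying by the stated factor $\frac{1-\gamma}{1-\gamma^m}$ leaves a milder $\frac{1}{(1-\gamma)(1-\gamma^m)}$ dependence.

```python
import numpy as np

# Hypothetical sketch (assumptions, not the paper's code): exact Value Iteration
# on a small finite MDP, keeping the greedy policy produced at every iteration,
# then rolling out a non-stationary policy that cycles through the last m of them.

def value_iteration(P, R, gamma, k):
    """P: (A, S, S) transition kernels, R: (A, S) rewards.
    Returns the greedy policies pi_1, ..., pi_k (one per iteration)."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    policies = []
    for _ in range(k):
        # Q(a, s) = R(a, s) + gamma * sum_{s'} P(a, s, s') v(s')
        q = R + gamma * (P @ v)
        policies.append(q.argmax(axis=0))   # greedy policy at this iteration
        v = q.max(axis=0)                   # Bellman optimality backup
    return policies

def rollout_nonstationary(P, R, gamma, policies, m, s0, horizon, rng):
    """Roll out the non-stationary policy that applies the last m greedy
    policies in the (assumed) order pi_k, pi_{k-1}, ..., pi_{k-m+1}, cyclically."""
    cycle = policies[-m:][::-1]             # most recent policy first
    s, total, discount = s0, 0.0, 1.0
    for t in range(horizon):
        a = cycle[t % m][s]
        total += discount * R[a, s]
        s = rng.choice(P.shape[1], p=P[a, s])
        discount *= gamma
    return total
```

With $m = 1$ this reduces to running the last stationary policy $\pi_k$; the abstract's point is that allowing $m > 1$ tightens the performance guarantee by the factor above.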

Dates and versions

hal-00682172 , version 1 (25-03-2012)
hal-00682172 , version 2 (30-03-2012)

Cite

Bruno Scherrer. On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes. [Research Report] 2012. ⟨hal-00682172v1⟩