Report (Research Report) Year: 2012

On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

Bruno Scherrer

Abstract

We consider infinite-horizon discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies $\pi_1,\dots,\pi_k$ it generates until some iteration $k$. We provide performance bounds for non-stationary policies involving the last $m$ generated policies that reduce the state-of-the-art bound for the last stationary policy $\pi_k$ by a factor $\frac{1-\gamma}{1-\gamma^m}$. In other words, and contrary to a common intuition, we show that it may be much easier to find a non-stationary approximately-optimal policy than a stationary one.
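To make the setting concrete, below is a minimal sketch of the objects the abstract refers to: Value Iteration run for $k$ iterations, keeping the greedy policy at each step, and a non-stationary policy built from the last $m$ of them. The cycling order ($\pi_k, \pi_{k-1}, \dots, \pi_{k-m+1}$, then repeat), the array shapes, and all function names are assumptions made for illustration, not taken from the report; for intuition, note that if the stationary-policy bound scales with $\frac{1}{(1-\gamma)^2}$, multiplying by the stated factor $\frac{1-\gamma}{1-\gamma^m}$ leaves a milder $\frac{1}{(1-\gamma)(1-\gamma^m)}$ dependence.

```python
import numpy as np

# Hypothetical sketch (assumptions, not the paper's code): exact Value Iteration
# on a small finite MDP, keeping the greedy policy produced at every iteration,
# then rolling out a non-stationary policy that cycles through the last m of them.

def value_iteration(P, R, gamma, k):
    """P: (A, S, S) transition kernels, R: (A, S) rewards.
    Returns the greedy policies pi_1, ..., pi_k (one per iteration)."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    policies = []
    for _ in range(k):
        # Q(a, s) = R(a, s) + gamma * sum_{s'} P(a, s, s') v(s')
        q = R + gamma * (P @ v)
        policies.append(q.argmax(axis=0))   # greedy policy at this iteration
        v = q.max(axis=0)                   # Bellman optimality backup
    return policies

def rollout_nonstationary(P, R, gamma, policies, m, s0, horizon, rng):
    """Roll out the non-stationary policy that applies the last m greedy
    policies in the (assumed) order pi_k, pi_{k-1}, ..., pi_{k-m+1}, cyclically."""
    cycle = policies[-m:][::-1]             # most recent policy first
    s, total, discount = s0, 0.0, 1.0
    for t in range(horizon):
        a = cycle[t % m][s]
        total += discount * R[a, s]
        s = rng.choice(P.shape[1], p=P[a, s])
        discount *= gamma
    return total
```

With $m = 1$ this reduces to running the last stationary policy $\pi_k$; the abstract's point is that allowing $m > 1$ tightens the performance guarantee by the factor above.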

Dates and versions

hal-00682172 , version 1 (25-03-2012)
hal-00682172 , version 2 (30-03-2012)

Cite

Bruno Scherrer. On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes. [Research Report] 2012. ⟨hal-00682172v1⟩