On the rate of convergence and error bounds for LSTD(λ) - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2015

On the rate of convergence and error bounds for LSTD(λ)

Abstract

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, which extends (and slightly improves) that derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
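For readers unfamiliar with the algorithm, the following is a minimal sketch of the standard LSTD(λ) estimator computed from a single trajectory. It is not taken from the paper: the function name `lstd_lambda`, its argument names, and the two-state toy chain are illustrative assumptions, and no regularisation or mixing-related handling is included.

```python
import numpy as np

def lstd_lambda(phi, rewards, gamma, lam):
    """Estimate theta such that V(x) ~ phi(x) @ theta from one trajectory.

    phi     : array (n + 1, d), feature vectors of the visited states x_0..x_n
    rewards : array (n,), reward observed on the transition x_t -> x_{t+1}
    gamma   : discount factor
    lam     : eligibility-trace parameter lambda
    (All names here are hypothetical, chosen for this illustration.)
    """
    phi = np.asarray(phi, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    n, d = len(rewards), phi.shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for t in range(n):
        # Trace: geometrically discounted sum of past feature vectors.
        z = gamma * lam * z + phi[t]
        A += np.outer(z, phi[t] - gamma * phi[t + 1])
        b += rewards[t] * z
    # Dividing A and b by n would not change the solution of A theta = b.
    return np.linalg.solve(A, b)

# Toy example (an assumption for illustration): a deterministic two-state
# cycle 0 -> 1 -> 0 -> ..., reward 1 when leaving state 0, tabular features.
# With gamma = 0.5 the exact values of this chain are V(0) = 4/3, V(1) = 2/3.
states = np.arange(201) % 2
phi = np.eye(2)[states]
rewards = (states[:-1] == 0).astype(float)
theta = lstd_lambda(phi, rewards, gamma=0.5, lam=0.5)
print(theta)
```

With tabular features the approximation space contains the true value function, so the choice of λ does not matter in this toy case; the trade-off discussed in the abstract only appears when the feature space is imperfect and the number of samples is limited.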
Main file
lstd.pdf (283.5 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01186667, version 1 (25-08-2015)

Identifiers

  • HAL Id: hal-01186667, version 1

Cite

Manel Tagorti, Bruno Scherrer. On the rate of convergence and error bounds for LSTD(λ). ICML 2015, Jul 2015, Lille, France. ⟨hal-01186667⟩