Conference paper, Year: 2020

Zap Q-Learning With Nonlinear Function Approximation

Abstract

Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting, and optimal stopping. This paper introduces a new framework for analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to the choice of function approximation architecture.
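To make the recursion concrete, the following is a minimal, illustrative sketch of a two-time-scale Zap stochastic approximation update of the kind the abstract refers to. It is not the paper's exact algorithm: the names (zap_sa, f_sample, jac_sample), the step-size exponents, and the toy linear root-finding example are assumptions chosen for the demonstration.

import numpy as np

def zap_sa(f_sample, jac_sample, theta0, n_iter=10000, rho=0.85):
    # Two-time-scale Zap-SA sketch (assumed form, not the paper's exact algorithm):
    #   A_hat <- A_hat + gamma_n * (A_n - A_hat)        (matrix gain, faster step size)
    #   theta <- theta - alpha_n * pinv(A_hat) @ f_n    (parameter update, slower step size)
    theta = np.array(theta0, dtype=float)
    A_hat = np.eye(theta.size)                  # placeholder; overwritten at n = 1 since gamma_1 = 1
    for n in range(1, n_iter + 1):
        alpha = 1.0 / n                         # slow step size
        gamma = 1.0 / n ** rho                  # faster step size (rho < 1)
        f_n = f_sample(theta)                   # noisy observation of the mean field f(theta)
        A_n = jac_sample(theta)                 # noisy observation of its Jacobian
        A_hat += gamma * (A_n - A_hat)
        theta -= alpha * np.linalg.pinv(A_hat) @ f_n
    return theta

# Toy usage on a noisy linear root-finding problem, f(theta) = A (theta - theta_star):
rng = np.random.default_rng(0)
A_true = np.array([[2.0, 0.3], [0.0, 1.5]])
theta_star = np.array([1.0, -2.0])
f_noisy = lambda th: A_true @ (th - theta_star) + 0.1 * rng.standard_normal(2)
jac_noisy = lambda th: A_true + 0.1 * rng.standard_normal((2, 2))
print(zap_sa(f_noisy, jac_noisy, np.zeros(2)))  # approaches theta_star = [1, -2]

In the Q-learning instance described in the abstract, f_n would correspond to a temporal-difference term evaluated at the current parameter and A_n to a sample of its parameter Jacobian, with a neural network supplying the nonlinear function approximation.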

Dates and versions

hal-02425985, version 1 (31-12-2019)

Identifiers

Cite

Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn. Zap Q-Learning With Nonlinear Function Approximation. NeurIPS 2020: Thirty-fourth Conference on Neural Information Processing Systems, Dec 2020, Vancouver / Virtual, Canada. ⟨hal-02425985⟩