Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme

Konstantin Avrachenkov; Vivek S Borkar; Harsh P Dolhare; Kishor Patil

doi:10.1007/978-3-030-76928-4_10

Book chapter. Year: 2021


Konstantin Avrachenkov

Role: Author
PersonId: 11963
IdHAL: konstantin-avrachenkov
ORCID: 0000-0002-8124-8272
IdRef: 087245280

Network Engineering and Operations

Vivek S Borkar

Role: Author
PersonId: 994265
ORCID: 0000-0003-0756-5402

Department of Electrical Engineering [IIT-Bombay]

Harsh P Dolhare

Role: Author
PersonId: 1119044

Department of Electrical Engineering [IIT-Bombay]

Kishor Patil

Role: Author

Network Engineering and Operations

Abstract

We analyze the DQN reinforcement learning algorithm as a stochastic approximation scheme using the o.d.e. (ordinary differential equation) approach and point out certain theoretical issues. We then propose a modified scheme, Full Gradient DQN (FG-DQN for short), that has a sound theoretical basis, and compare it with the original scheme on sample problems. We observe better performance for FG-DQN.
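The difference between the two kinds of update can be illustrated with a small sketch. It assumes a linear approximator Q(s, a; θ) = θ · φ(s, a) and a single transition, neither of which comes from this record; the function names, the feature vectors, and the single-sample form of the Bellman-error gradient are illustrative assumptions, not the authors' exact scheme. The semi-gradient step treats the bootstrapped target as a constant, as standard DQN does, while the full-gradient step also differentiates through the target, which is one natural reading of "Bellman error minimization".

```python
from typing import List, Sequence

Vector = List[float]


def q_value(theta: Sequence[float], phi: Sequence[float]) -> float:
    """Linear Q-value: dot product of parameters and features (assumed model)."""
    return sum(t * f for t, f in zip(theta, phi))


def greedy_features(theta: Sequence[float], next_phis: Sequence[Vector]) -> Vector:
    """Features of the next-state action with the largest Q-value under theta."""
    return max(next_phis, key=lambda phi: q_value(theta, phi))


def semi_gradient_step(theta: Vector, target_theta: Vector, phi: Vector,
                       reward: float, next_phis: Sequence[Vector],
                       gamma: float, lr: float) -> Vector:
    """DQN-style update: the target uses frozen parameters and is not differentiated."""
    best = greedy_features(target_theta, next_phis)
    delta = reward + gamma * q_value(target_theta, best) - q_value(theta, phi)
    # Only grad Q(s, a; theta) = phi appears in the update direction.
    return [t + lr * delta * f for t, f in zip(theta, phi)]


def full_gradient_step(theta: Vector, phi: Vector, reward: float,
                       next_phis: Sequence[Vector],
                       gamma: float, lr: float) -> Vector:
    """Gradient descent on 0.5 * delta**2, differentiating through the target too."""
    best = greedy_features(theta, next_phis)
    delta = reward + gamma * q_value(theta, best) - q_value(theta, phi)
    # d(delta)/d(theta) = gamma * phi(s', b*) - phi(s, a) for the greedy action b*.
    return [t - lr * delta * (gamma * b - f) for t, b, f in zip(theta, best, phi)]


if __name__ == "__main__":
    theta = [0.0, 0.0]
    phi, next_phis = [1.0, 0.0], [[0.0, 1.0]]  # hypothetical features
    print(semi_gradient_step(theta, theta, phi, 1.0, next_phis, 0.5, 0.1))
    print(full_gradient_step(theta, phi, 1.0, next_phis, 0.5, 0.1))
```

In this toy transition the semi-gradient step changes only the component tied to the current state-action features, whereas the full-gradient step also adjusts the component tied to the next state, because the target is no longer held fixed.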

Keywords

Markov Decision Process (MDP); approximate dynamic programming; Deep Reinforcement Learning (DRL); stochastic approximation; Deep Q-Network (DQN); Full Gradient DQN; Bellman error minimization

Subject areas

Machine Learning [cs.LG]; Optimization and Control [math.OC]; Probability [math.PR]

Main file

Full_Gradient_main.pdf (1.29 MB)

Origin: Files produced by the author(s)

Contributor: Konstantin Avrachenkov

https://inria.hal.science/hal-03462350

Submitted on: Wednesday, December 1, 2021, 17:26:51

Last modified on: Monday, April 15, 2024, 10:53:24

Long-term archiving on: Wednesday, March 2, 2022, 20:01:24

Dates and versions

hal-03462350, version 1 (01-12-2021)

Identifiers

HAL Id: hal-03462350, version 1
DOI: 10.1007/978-3-030-76928-4_10

Cite

Konstantin Avrachenkov, Vivek S Borkar, Harsh P Dolhare, Kishor Patil. Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme. Alexey Piunovskiy; Yi Zhang. Modern Trends in Controlled Stochastic Processes: Theory and Applications, V.III, 41, Springer International Publishing, pp.192-220, 2021, Emergence, Complexity and Computation, 978-3-030-76928-4. ⟨10.1007/978-3-030-76928-4_10⟩. ⟨hal-03462350⟩


Collections

INRIA INRIA2 TDS-MACS UNIV-COTEDAZUR

51 views

173 downloads
