Offline Reinforcement Learning with Pseudometric Learning

Robert Dadashi; Shideh Rezaeifar; Nino Vieillard; Léonard Hussenot; Olivier Pietquin; Matthieu Geist

Conference paper, Year: 2021



Robert Dadashi

Role: Author

Google Research [Paris]

Shideh Rezaeifar

Role: Author

Université de Genève = University of Geneva

Nino Vieillard

Role: Author

Google Research [Paris]

Biology, genetics and statistics

Institut Élie Cartan de Lorraine

Léonard Hussenot

Role: Author

Google Research [Paris]

Scool

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Olivier Pietquin

Role: Author

Google Research [Paris]

Matthieu Geist

Role: Author
PersonId: 790158
IdRef: 142341819

Google Research [Paris]

Abstract

Offline Reinforcement Learning methods seek to learn a policy from logged transitions of an environment, without any interaction. In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to constrain the policy to visit state-action pairs close to the support of logged transitions. In this work, we propose an iterative procedure to learn a pseudometric (closely related to bisimulation metrics) from logged transitions, and use it to define this notion of closeness. We show its convergence and extend it to the function approximation setting. We then use this pseudometric to define a new lookup-based bonus in an actor-critic algorithm: PLOFF. This bonus encourages the actor to stay close, in terms of the defined pseudometric, to the support of logged transitions. Finally, we evaluate the method on hand manipulation and locomotion tasks.
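To make the idea of a lookup-based bonus concrete, here is a minimal sketch, not the authors' implementation. It assumes the learned pseudometric can be represented as a Euclidean distance between embeddings of state-action pairs (such a distance is a pseudometric, since two distinct pairs may share an embedding). The names `toy_embedding`, `pseudo_distance`, and `lookup_bonus` are hypothetical, and the embedding is a fixed concatenation standing in for a learned one.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A state-action pair, each part given as a sequence of floats.
Pair = Tuple[Sequence[float], Sequence[float]]
Embedding = Callable[[Sequence[float], Sequence[float]], List[float]]


def toy_embedding(state: Sequence[float], action: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a learned embedding: plain concatenation."""
    return list(state) + list(action)


def pseudo_distance(embed: Embedding, x: Pair, y: Pair) -> float:
    """Euclidean distance between the embeddings of two state-action pairs."""
    return math.dist(embed(*x), embed(*y))


def lookup_bonus(embed: Embedding, logged: List[Pair], query: Pair) -> float:
    """Negative distance from `query` to its nearest logged pair.

    The bonus is largest (zero) when the query lies on the logged support
    and decreases as the query moves away from it.
    """
    return -min(pseudo_distance(embed, query, z) for z in logged)


# Two logged transitions (assumed data, for illustration only).
logged_pairs: List[Pair] = [([0.0, 0.0], [1.0]), ([1.0, 0.0], [0.0])]

in_support = lookup_bonus(toy_embedding, logged_pairs, ([0.0, 0.0], [1.0]))
far_away = lookup_bonus(toy_embedding, logged_pairs, ([3.0, 4.0], [1.0]))
```

In an actor-critic setting, a term of this kind could be added to the actor's objective so that actions far from the logged data are penalized. The linear scan over `logged_pairs` is only for clarity; a large dataset would call for a nearest-neighbor index.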

Domains

Machine Learning [cs.LG]


https://inria.hal.science/hal-03468847

Submitted on: Tuesday, December 7, 2021, 13:32:07

Last modified on: Wednesday, January 24, 2024, 09:54:23

Dates and versions

hal-03468847, version 1 (07-12-2021)

Identifiers

HAL Id: hal-03468847, version 1
arXiv: 2103.01948

Cite

Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, et al. Offline Reinforcement Learning with Pseudometric Learning. ICML 2021 - 38th International Conference on Machine Learning, Jun 2021, virtual, France. ⟨hal-03468847⟩


Collections

CNRS INRIA IECN CRISTAL UNIV-LORRAINE INRIA2 IECLPS UNIV-LILLE CRISTAL-SCOOL

