Reward-free exploration beyond finite-horizon - INRIA (Institut National de Recherche en Informatique et en Automatique)
Conference paper, Year: 2020

Reward-free exploration beyond finite-horizon

Abstract

We consider the reward-free exploration framework introduced by Jin et al. (2020), where an RL agent interacts with an unknown environment without any explicit reward function to maximize. The objective is to collect enough information during the exploration phase so that a near-optimal policy can be computed immediately once any reward function is provided. In this paper, we move from the finite-horizon setting studied by Jin et al. (2020) to the more general setting of goal-conditioned RL, often referred to as stochastic shortest path (SSP). We first discuss the challenges specific to SSPs and then study two scenarios: 1) reward-free, goal-free exploration in communicating MDPs, and 2) reward-free, goal-free incremental exploration in non-communicating MDPs where the agent is provided with a reset action to an initial state. In both cases, we provide exploration algorithms and their sample-complexity bounds, which we contrast with the existing guarantees in the finite-horizon case.
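To make the two-phase framework concrete, here is a minimal sketch of reward-free exploration in its simplest finite-horizon, tabular form as introduced by Jin et al. (2020): an exploration phase that only records transition counts (no rewards are observed), followed by a planning phase that runs backward value iteration on the empirical model once a reward function is revealed. The `env.reset()`/`env.step(s, a)` interface and the uniformly random exploration policy are illustrative assumptions; the SSP algorithms studied in the paper rely on more careful exploration strategies and stopping conditions.

```python
import numpy as np

def explore(env, n_episodes, horizon, n_states, n_actions, rng):
    """Exploration phase: interact without rewards and record transition counts.
    A uniformly random policy is used here purely for illustration."""
    counts = np.zeros((n_states, n_actions, n_states))
    for _ in range(n_episodes):
        s = env.reset()                      # hypothetical tabular-env interface
        for _ in range(horizon):
            a = rng.integers(n_actions)
            s_next = env.step(s, a)          # returns only the next state (no reward)
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def plan(counts, reward, horizon):
    """Planning phase: once a reward function r(s, a) is provided, run backward
    value iteration on the empirical transition model to obtain a policy."""
    n_states, n_actions, _ = counts.shape
    totals = counts.sum(axis=2, keepdims=True)
    # Empirical transition probabilities; unvisited (s, a) pairs fall back to uniform.
    p_hat = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        Q = reward + p_hat @ V               # Q[s, a] = r(s, a) + E_{s'~p_hat}[V(s')]
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy
```

The sketch illustrates only the generic protocol: the quality of the planning step depends entirely on how well the exploration phase covers the state space, which is precisely what the paper's sample-complexity bounds quantify in the SSP setting.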
Main file
tarbouriech2020reward-free.pdf (304.89 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03288970, version 1 (16-07-2021)

Identifiers

  • HAL Id: hal-03288970, version 1

Cite

Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric. Reward-free exploration beyond finite-horizon. ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning, 2020, Vienna, Austria. ⟨hal-03288970⟩
62 views
73 downloads
