Heuristic Search Value Iteration for zero-sum Stochastic Games

Olivier Buffet; Jilles Dibangoye; Abdallah Saffidine; Vincent Thomas

doi:10.1109/TG.2020.3005214

Journal article, IEEE Transactions on Games, Year: 2021

Olivier Buffet

Role: Author
PersonId: 1407
IdHAL: olivier-buffet
ORCID: 0000-0002-5072-5857

Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment

Jilles Dibangoye

Role: Author
PersonId: 4917
IdHAL: jilles-steeve-dibangoye
ORCID: 0000-0001-8826-4438
IdRef: 144368145

Robots coopératifs et adaptés à la présence humaine en environnements dynamiques

Abdallah Saffidine

Role: Author
PersonId: 1086345

University of New South Wales [Sydney]

Vincent Thomas

Role: Author
PersonId: 16368
IdHAL: vincent-thomas
ORCID: 0000-0003-3401-4649

Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment

Abstract

In sequential decision-making, heuristic search algorithms exploit both the initial situation and an admissible heuristic to search efficiently for an optimal solution, often for planning purposes. Such algorithms exist for problems with uncertain dynamics, partial observability, multiple criteria, or multiple collaborating agents. Here we consider two-player zero-sum stochastic games with a discounted criterion, with a view to proposing a solution tailored to the fully observable case; existing solutions target particular, though still more general, partially observable cases. This setting requires reasoning on both a lower and an upper bound of the value function, which leads us to propose zsSG-HSVI, an algorithm based on Heuristic Search Value Iteration (HSVI) that therefore relies on generating trajectories. We demonstrate that, when each player acts optimistically and simple heuristic initializations are used, HSVI's finite-time convergence to an ϵ-optimal solution is preserved. An empirical study of the resulting approach is conducted on benchmark problems of various sizes.
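
The abstract gives no formulas. As a reading aid only, the block below sketches the standard fixed-point (Shapley) equation for the value of a discounted two-player zero-sum stochastic game, together with one simple way of bracketing that value between a lower and an upper bound. The notation (states s, action sets A_1 and A_2, reward r paid to the maximizing player, transition function P, discount factor γ, initial state s_0) is assumed here for illustration and is not taken from this record; the bound initialization and stopping test shown are one plausible choice, not necessarily those used in zsSG-HSVI.

```latex
% Assumed notation (not from this record): s, s' states; a_1, a_2 actions;
% r reward to player 1; P transition function; 0 <= \gamma < 1 discount factor.

% Shapley's equation: the optimal value is the value of a matrix game at each state.
V^*(s) = \max_{x \in \Delta(A_1)} \min_{y \in \Delta(A_2)}
  \sum_{a_1, a_2} x(a_1)\, y(a_2)
  \Big[ r(s, a_1, a_2) + \gamma \sum_{s'} P(s' \mid s, a_1, a_2)\, V^*(s') \Big]

% One simple (assumed) pair of initial bounds, valid for any bounded reward:
\underline{V}_0(s) = \frac{r_{\min}}{1 - \gamma}
  \;\le\; V^*(s) \;\le\;
  \frac{r_{\max}}{1 - \gamma} = \overline{V}_0(s)

% A natural stopping test for an \epsilon-optimal solution at the initial state s_0:
\overline{V}(s_0) - \underline{V}(s_0) \le \epsilon
```

Read this way, "each player acting optimistically" would correspond to the maximizing player choosing actions with respect to the upper bound and the minimizing player with respect to the lower bound while trajectories are generated; this interpretation is an assumption and should be checked against the paper itself.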

Keywords

Games; Game theory; Convergence; Heuristic algorithms; Trajectory; Markov processes; Planning

Domains

Artificial Intelligence [cs.AI]

Main file

accepted.pdf (402.73 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-03080314

Submitted on: Thursday, May 27, 2021, 14:58:22

Last modified on: Monday, September 11, 2023, 17:41:19

Long-term archiving on: Saturday, August 28, 2021, 19:34:16

Dates and versions

hal-03080314, version 1 (27-05-2021)

Identifiers

HAL Id: hal-03080314, version 1
DOI: 10.1109/TG.2020.3005214

Cite

Olivier Buffet, Jilles Dibangoye, Abdallah Saffidine, Vincent Thomas. Heuristic Search Value Iteration for zero-sum Stochastic Games. IEEE Transactions on Games, 2021, 13 (3), pp.1-10. ⟨10.1109/TG.2020.3005214⟩. ⟨hal-03080314⟩

Collections

CNRS INRIA INSA-LYON UNIV-LORRAINE INRIA2 LORIA LORIA-AIS CITI INSA-GROUPE UDL ANR

137 views

187 downloads
