Heuristic Search Value Iteration for zero-sum Stochastic Games - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Article Dans Une Revue IEEE Transactions on Games Année : 2021

Heuristic Search Value Iteration for zero-sum Stochastic Games

Résumé

In sequential decision-making, heuristic search algorithms allow exploiting both the initial situation and an admissible heuristic to efficiently search for an optimal solution, often for planning purposes. Such algorithms exist for problems with uncertain dynamics, partial observability, multiple criteria, or multiple collaborating agents. Here we look at two-player zero-sum stochastic games with discounted criterion, in a view to propose a solution tailored to the fully observable case, while solutions have been proposed for particular, though still more general, partially observable cases. This setting induces reasoning on both a lower and an upper bound of the value function, which leads us to proposing zsSG-HSVI, an algorithm based on Heuristic Search Value Iteration (HSVI), and which thus relies on generating trajectories. We demonstrate that, each player acting optimistically, and employing simple heuristic initializations, HSVI's convergence in finite time to an ϵ-optimal solution is preserved. An empirical study of the resulting approach is conducted on benchmark problems of various sizes.
Fichier principal
Vignette du fichier
accepted.pdf (402.73 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-03080314 , version 1 (27-05-2021)

Identifiants

Citer

Olivier Buffet, Jilles Dibangoye, Abdallah Saffidine, Vincent Thomas. Heuristic Search Value Iteration for zero-sum Stochastic Games. IEEE Transactions on Games, 2021, 13 (3), pp.1-10. ⟨10.1109/TG.2020.3005214⟩. ⟨hal-03080314⟩
137 Consultations
187 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More