Optimally Solving Dec-POMDPs as Continuous-State MDPs

Jilles Steeve Dibangoye; Christopher Amato; Olivier Buffet; François Charpillet

doi:10.1613/jair.4623

Journal article. Journal of Artificial Intelligence Research. Year: 2016

Jilles Steeve Dibangoye

Role: Author
PersonId : 4917
IdHAL : jilles-steeve-dibangoye
ORCID : 0000-0001-8826-4438
IdRef : 144368145

Robots coopératifs et adaptés à la présence humaine en environnements dynamiques

Christopher Amato

Role: Author
PersonId : 977293

University of New Hampshire

Olivier Buffet

Role: Author
PersonId : 1407
IdHAL : olivier-buffet
ORCID : 0000-0002-5072-5857

Inria Nancy - Grand Est

François Charpillet

Role: Author
PersonId : 1910
IdHAL : francois-charpillet
ORCID : 0000-0001-8260-1536
IdRef : 070140553

Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be applied to Dec-POMDPs for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis on well-known benchmarks, demonstrating that our approach provides significant scalability improvements over the state of the art.
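The central idea of the abstract, a deterministic MDP over occupancy states with a piecewise-linear and convex value function, can be illustrated with a small sketch. The code below is not the authors' FB-HSVI implementation. It assumes a hypothetical two-state, two-agent toy model, represents an occupancy state as a probability distribution over (hidden state, joint history) pairs, and evaluates a value function stored as a finite set of linear functions (a common representation of piecewise-linear convex functions). All names (`transition_prob`, `observation_prob`, `next_occupancy`, `pwlc_value`, `always_zero`) and all probabilities are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product

STATES = (0, 1)        # hypothetical hidden states
OBSERVATIONS = (0, 1)  # hypothetical per-agent observations
N_AGENTS = 2


def transition_prob(s, joint_action, s_next):
    """Toy P(s' | s, a): the state tends to persist only under joint action (0, 0)."""
    if joint_action == (0, 0):
        return 0.8 if s_next == s else 0.2
    return 0.5


def observation_prob(s_next, joint_obs):
    """Toy P(z | s'): each agent independently observes the true state w.p. 0.7."""
    p = 1.0
    for z in joint_obs:
        p *= 0.7 if z == s_next else 0.3
    return p


def next_occupancy(occupancy, decision_rules):
    """Push an occupancy state one step forward under one decision rule per agent.

    `occupancy` maps (state, joint_history) -> probability. A joint history is a
    tuple with one individual history per agent; an individual history is a tuple
    of (action, observation) pairs. Each decision rule maps an individual history
    to an action, so agents act on local information only. The result is fully
    determined by the inputs, which is the sense in which the MDP is deterministic.
    """
    updated = defaultdict(float)
    for (s, joint_history), p in occupancy.items():
        joint_action = tuple(rule(h) for rule, h in zip(decision_rules, joint_history))
        for s_next in STATES:
            for joint_obs in product(OBSERVATIONS, repeat=N_AGENTS):
                weight = (p * transition_prob(s, joint_action, s_next)
                          * observation_prob(s_next, joint_obs))
                if weight > 0.0:
                    # Append each agent's own (action, observation) to its history.
                    extended = tuple(
                        h + ((a, z),)
                        for h, a, z in zip(joint_history, joint_action, joint_obs)
                    )
                    updated[(s_next, extended)] += weight
    return dict(updated)


def pwlc_value(occupancy, alpha_vectors):
    """Evaluate max over linear functions of <alpha, occupancy> (convex in occupancy)."""
    return max(
        sum(p * alpha.get(key, 0.0) for key, p in occupancy.items())
        for alpha in alpha_vectors
    )


def always_zero(history):
    """Hypothetical decision rule: choose action 0 whatever the local history."""
    return 0


initial = {(0, ((), ())): 0.5, (1, ((), ())): 0.5}
after_one_step = next_occupancy(initial, (always_zero, always_zero))
```

In this sketch the planner works centrally on the full occupancy state, while each decision rule reads only its own agent's history, which mirrors the centralized-planning, decentralized-execution split described in the abstract. The number of joint histories grows quickly with the horizon, which is one reason compact representations matter.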

Keywords

Decentralized control; Optimal planning; Partially observable Markov decision processes

Domains

Artificial Intelligence [cs.AI]

Main file

dibangoye16a.pdf (758.17 KB)

Origin: Publisher files allowed on an open archive

https://inria.hal.science/hal-01279444

Submitted on: Tuesday, March 1, 2016, 08:34:11

Last modified on: Thursday, February 1, 2024, 10:05:17

Long-term archiving on: Tuesday, May 31, 2016, 10:53:28

Dates and versions

hal-01279444 , version 1 (01-03-2016)

License

Public domain

Identifiers

HAL Id : hal-01279444 , version 1
DOI : 10.1613/jair.4623

Cite

Jilles Steeve Dibangoye, Christopher Amato, Olivier Buffet, François Charpillet. Optimally Solving Dec-POMDPs as Continuous-State MDPs. Journal of Artificial Intelligence Research, 2016, 55, pp.443-497. ⟨10.1613/jair.4623⟩. ⟨hal-01279444⟩

Collections

UNIV-RENNES1 CNRS INRIA INSA-LYON IRISA UNIV-LORRAINE INRIA2 LORIA LORIA-AIS UR1-MATH-STIC UR1-UFR-ISTIC LABEXIMU UNIV-RENNES CITI INSA-GROUPE UDL UR1-MATH-NUM
