Conference Paper (Year: 2018)

Bandit learning in concave N-person games

Abstract

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
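For intuition, here is a minimal sketch of the kind of scheme the abstract describes: mirror descent driven by a single-point bandit gradient estimate. It assumes a Euclidean mirror map (so each update is projected gradient ascent on the player's own payoff), ball-shaped action sets, and illustrative step-size and query-radius schedules; the quadratic game at the end is a hypothetical strictly monotone example, not taken from the paper.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def bandit_mirror_descent(payoffs, dims, rounds=50_000, seed=0):
    """Each player runs mirror descent (here, projected gradient ascent)
    using only bandit feedback: her own realized payoff each round."""
    rng = np.random.default_rng(seed)
    actions = [np.zeros(d) for d in dims]
    for t in range(1, rounds + 1):
        eta = 1.0 / np.sqrt(t)   # step-size schedule (illustrative assumption)
        delta = t ** -0.25       # shrinking query radius (illustrative assumption)
        # Each player samples a random unit perturbation and plays x_i + delta * u_i.
        us = [rng.normal(size=d) for d in dims]
        us = [u / np.linalg.norm(u) for u in us]
        played = [project_ball(x + delta * u) for x, u in zip(actions, us)]
        rewards = payoffs(played)  # only scalar payoffs are observed, no gradients
        for i, d in enumerate(dims):
            # Single-point gradient estimate built from the observed payoff.
            g_hat = (d / delta) * rewards[i] * us[i]
            actions[i] = project_ball(actions[i] + eta * g_hat)
    return actions

# Hypothetical strictly monotone quadratic game with Nash equilibrium (0, 0).
def quadratic_payoffs(xs):
    x, y = xs
    return [-(x @ x) + x @ y, -(y @ y) - x @ y]

print(bandit_mirror_descent(quadratic_payoffs, dims=[2, 2]))
```

The single-point estimator is unbiased for a smoothed version of the payoff but has variance of order 1/δ, so the shrinking query radius trades estimation bias against variance; this tension is what governs the attainable convergence rate.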
Main file
BanditConcave-NIPS.pdf (636.43 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01891523, version 1 (09-10-2018)

Identifiers

  • HAL Id: hal-01891523, version 1

Cite

Mario Bravo, David Stuart Leslie, Panayotis Mertikopoulos. Bandit learning in concave N-person games. NIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Dec 2018, Montréal, Canada. pp. 1-24. ⟨hal-01891523⟩
