Conference Paper (Year: 2018)

Bandit learning in concave N-person games

Abstract

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
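For intuition, here is a minimal sketch of the kind of scheme the abstract describes: mirror descent driven by a single-point bandit gradient estimate. It assumes a Euclidean mirror map (so each update is projected gradient ascent on the player's own payoff), ball-shaped action sets, and illustrative step-size and query-radius schedules; the quadratic game at the end is a hypothetical strictly monotone example, not taken from the paper.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def bandit_mirror_descent(payoffs, dims, rounds=50_000, seed=0):
    """Each player runs mirror descent (here, projected gradient ascent)
    using only bandit feedback: her own realized payoff each round."""
    rng = np.random.default_rng(seed)
    actions = [np.zeros(d) for d in dims]
    for t in range(1, rounds + 1):
        eta = 1.0 / np.sqrt(t)   # step-size schedule (illustrative assumption)
        delta = t ** -0.25       # shrinking query radius (illustrative assumption)
        # Each player samples a random unit perturbation and plays x_i + delta * u_i.
        us = [rng.normal(size=d) for d in dims]
        us = [u / np.linalg.norm(u) for u in us]
        played = [project_ball(x + delta * u) for x, u in zip(actions, us)]
        rewards = payoffs(played)  # only scalar payoffs are observed, no gradients
        for i, d in enumerate(dims):
            # Single-point gradient estimate built from the observed payoff.
            g_hat = (d / delta) * rewards[i] * us[i]
            actions[i] = project_ball(actions[i] + eta * g_hat)
    return actions

# Hypothetical strictly monotone quadratic game with Nash equilibrium (0, 0).
def quadratic_payoffs(xs):
    x, y = xs
    return [-(x @ x) + x @ y, -(y @ y) - x @ y]

print(bandit_mirror_descent(quadratic_payoffs, dims=[2, 2]))
```

The single-point estimator is unbiased for a smoothed version of the payoff but has variance of order 1/δ, so the shrinking query radius trades estimation bias against variance; this tension is what governs the attainable convergence rate.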
Main file
BanditConcave-NIPS.pdf (636.43 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01891523, version 1 (09-10-2018)

Identifiers

  • HAL Id: hal-01891523, version 1

Cite

Mario Bravo, David Stuart Leslie, Panayotis Mertikopoulos. Bandit learning in concave N-person games. NIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Dec 2018, Montréal, Canada. pp. 1-24. ⟨hal-01891523⟩
