Explore First, Exploit Next: The True Shape of Regret in Bandit Problems - Université Toulouse III - Paul Sabatier - Toulouse INP
Journal article in Mathematics of Operations Research, 2019

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Abstract

We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.
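The distribution-dependent lower bounds discussed in the abstract are driven by Kullback-Leibler divergences: in the logarithmic final phase, the classical (Lai-Robbins-style) asymptotic rate is the sum, over suboptimal arms, of the gap divided by the KL divergence to the best arm. The following sketch (not code from the paper; arm means and function names are illustrative) computes that coefficient of log(T) for Bernoulli arms:

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lai_robbins_rate(means):
    """Asymptotic coefficient of log(T) in the distribution-dependent regret
    lower bound: sum over suboptimal arms a of (mu* - mu_a) / kl(mu_a, mu*)."""
    mu_star = max(means)
    return sum((mu_star - mu) / kl_bernoulli(mu, mu_star)
               for mu in means if mu < mu_star)

# Two Bernoulli arms with means 0.5 and 0.6: regret eventually grows like
# lai_robbins_rate([0.5, 0.6]) * log(T); before that, it grows almost linearly.
rate = lai_robbins_rate([0.5, 0.6])
```

The smaller the KL divergence between a suboptimal arm and the best arm, the harder the two are to distinguish and the larger this coefficient, which matches the information-theoretic intuition behind the bounds.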
Main file
Bandit-lower-bounds-MOR-v3.pdf (732.16 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01276324 , version 1 (19-02-2016)
hal-01276324 , version 2 (16-06-2016)
hal-01276324 , version 3 (08-10-2018)

Identifiers

Cite

Aurélien Garivier, Pierre Ménard, Gilles Stoltz. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems. Mathematics of Operations Research, 2019, 44 (2), pp.377-399. ⟨10.1287/moor.2017.0928⟩. ⟨hal-01276324v3⟩
794 views
1141 downloads

