Explore First, Exploit Next: The True Shape of Regret in Bandit Problems - Université Toulouse III - Paul Sabatier - Toulouse INP
Journal article in Mathematics of Operations Research, 2019

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Abstract

We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.
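The distribution-dependent lower bounds discussed in the abstract are driven by Kullback-Leibler divergences: in the logarithmic final phase, the classical (Lai-Robbins-style) asymptotic rate is the sum, over suboptimal arms, of the gap divided by the KL divergence to the best arm. The following sketch (not code from the paper; arm means and function names are illustrative) computes that coefficient of log(T) for Bernoulli arms:

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lai_robbins_rate(means):
    """Asymptotic coefficient of log(T) in the distribution-dependent regret
    lower bound: sum over suboptimal arms a of (mu* - mu_a) / kl(mu_a, mu*)."""
    mu_star = max(means)
    return sum((mu_star - mu) / kl_bernoulli(mu, mu_star)
               for mu in means if mu < mu_star)

# Two Bernoulli arms with means 0.5 and 0.6: regret eventually grows like
# lai_robbins_rate([0.5, 0.6]) * log(T); before that, it grows almost linearly.
rate = lai_robbins_rate([0.5, 0.6])
```

The smaller the KL divergence between a suboptimal arm and the best arm, the harder the two are to distinguish and the larger this coefficient, which matches the information-theoretic intuition behind the bounds.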
Main file
Bandit-lower-bounds-MOR-v3.pdf (732.16 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01276324 , version 1 (19-02-2016)
hal-01276324 , version 2 (16-06-2016)
hal-01276324 , version 3 (08-10-2018)

Identifiers

Cite

Aurélien Garivier, Pierre Ménard, Gilles Stoltz. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems. Mathematics of Operations Research, 2019, 44 (2), pp.377-399. ⟨10.1287/moor.2017.0928⟩. ⟨hal-01276324v3⟩
794 views
1141 downloads

