Journal article. Proceedings of Machine Learning Research. Year: 2018

Profitable Bandits

Abstract

Originally motivated by default risk management applications, this paper investigates a novel problem, referred to here as the profitable bandit problem. At each step, an agent chooses a subset of the K ≥ 1 available actions. For each chosen action, she pays the sum of a random number of costs and receives the sum of a random number of rewards. Her objective is to maximize her cumulative profit. For this purpose, we adapt and study three well-known strategies that have proved most efficient in other settings: kl-UCB, Bayes-UCB and Thompson Sampling. For each of them, we prove a finite-time regret bound which, together with a lower bound we also obtain, establishes asymptotic optimality in some cases. We further compare these three strategies from both a theoretical and an empirical perspective. We give simple, self-contained proofs that emphasize their similarities as well as their differences. While both Bayesian strategies automatically adapt to the geometry of information, the numerical experiments carried out show a slight advantage for Thompson Sampling in practice.
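
To make the "play every action that looks profitable" decision rule concrete, below is a minimal Thompson Sampling sketch for a simplified instance of the setting: each arm k yields a single Bernoulli reward per play, costs a known tau[k] per play, and carries a Beta(1,1) prior on its mean reward. The names (thompson_profitable, tau, draw_reward) and these simplifications are assumptions made for illustration; the paper's setting allows a random number of costs and rewards per play.

import numpy as np

def thompson_profitable(tau, draw_reward, T, seed=0):
    """Thompson Sampling sketch for a simplified profitable bandit.

    Assumptions (illustrative, not the paper's exact setting): arm k
    yields one Bernoulli reward per play, costs a known tau[k] per
    play, and is profitable iff its mean reward exceeds tau[k].
    """
    rng = np.random.default_rng(seed)
    K = len(tau)
    successes = np.zeros(K)  # Beta posterior counts, Beta(1,1) prior
    failures = np.zeros(K)
    profit = 0.0
    for _ in range(T):
        # Draw one plausible mean per arm from its Beta posterior.
        theta = rng.beta(successes + 1, failures + 1)
        # Play every arm whose sampled mean beats its cost; the
        # chosen subset may be empty or contain several arms.
        for k in np.flatnonzero(theta > tau):
            r = draw_reward(k, rng)  # observe a 0/1 reward
            profit += r - tau[k]
            successes[k] += r
            failures[k] += 1 - r
    return profit

# Example run: 3 arms with true means (0.3, 0.6, 0.8) and cost 0.5 each;
# only arms 1 and 2 are profitable and should end up played often.
means = np.array([0.3, 0.6, 0.8])
total = thompson_profitable(
    tau=np.full(3, 0.5),
    draw_reward=lambda k, rng: rng.binomial(1, means[k]),
    T=2000,
)
print(f"cumulative profit over 2000 rounds: {total:.1f}")

Roughly, a kl-UCB variant would keep the same "play iff the optimistic estimate exceeds the cost" structure but replace the posterior sample theta by an upper confidence bound on each arm's mean.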
Main file
main_acml.pdf (355.12 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02023057, version 1 (18-02-2019)

Identifiers

HAL Id: hal-02023057

Cite

Mastane Achab, Stéphan Clémençon, Aurélien Garivier. Profitable Bandits. Proceedings of Machine Learning Research, 2018, 95, pp.694-709. ⟨hal-02023057⟩