Finite Continuum-Armed Bandits - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Communication Dans Un Congrès Année : 2020

Finite Continuum-Armed Bandits

Résumé

We consider a situation where an agent has $T$ ressources to be allocated to a larger number $N$ of actions. Each action can be completed at most once and results in a stochastic reward with unknown mean. The goal of the agent is to maximize her cumulative reward. Non trivial strategies are possible when side information on the actions is available, for example in the form of covariates. Focusing on a nonparametric setting, where the mean reward is an unknown function of a one-dimensional covariate, we propose an optimal strategy for this problem. Under natural assumptions on the reward function, we prove that the optimal regret scales as $O(T^{1/3})$ up to poly-logarithmic factors when the budget $T$ is proportional to the number of actions $N$. When $T$ becomes small compared to $N$, a smooth transition occurs. When the ratio $T/N$ decreases from a constant to $N^{-1/3}$, the regret increases progressively up to the $O(T^{1/2})$ rate encountered in continuum-armed bandits.
Fichier principal
Vignette du fichier
main.pdf (465.11 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-02975304 , version 1 (22-10-2020)
hal-02975304 , version 2 (02-11-2020)

Identifiants

Citer

Solenne Gaucher. Finite Continuum-Armed Bandits. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, Online, Canada. pp.3186--3196. ⟨hal-02975304v2⟩
55 Consultations
246 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More