The price of unfairness in linear bandits with biased feedback - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Pré-Publication, Document De Travail Année : 2022

The price of unfairness in linear bandits with biased feedback

Résumé

Artificial intelligence is increasingly used in a wide range of decision making scenarios with higher and higher stakes. At the same time, recent work has highlighted that these algorithms can be dangerously biased, and that their results often need to be corrected to avoid leading to unfair decisions. In this paper, we study the problem of sequential decision making with biased linear bandit feedback. At each round, a player selects an action described by a covariate and by a sensitive attribute. She receives a reward corresponding to the covariates of the action that she has chosen, but only observe a biased evaluation of this reward, where the bias depends on the sensitive attribute. To tackle this problem, we design a Fair Phased Elimination algorithm. We establish an upper bound on its worst-case regret, showing that it is smaller than Cκ 1/3 * log(T) 1/3 T 2/3 , where C is a numerical constant and κ * an explicit geometrical constant characterizing the difficulty of bias estimation. The worst case regret is higher than the dT 1/2 log(T) regret rate obtained under unbiased feedback. We show that this rate cannot be improved for all instances : we obtain lower bounds on the worst-case regret for some sets of actions showing that this rate is tight up to a sub-logarithmic factor. We also obtain gap-dependent upper bounds on the regret, and establish matching lower bounds for some problem instance. Interestingly, the gap-dependent rates reveal the existence of non-trivial instances where the problem is no more difficult than its unbiased counterpart.
Fichier principal
Vignette du fichier
main.pdf (718.52 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-03611628 , version 1 (17-03-2022)
hal-03611628 , version 2 (01-06-2022)
hal-03611628 , version 3 (29-11-2022)

Identifiants

Citer

Solenne Gaucher, Alexandra Carpentier, Christophe Giraud. The price of unfairness in linear bandits with biased feedback. 2022. ⟨hal-03611628v1⟩
217 Consultations
106 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More