HAI Weekly Seminar with Mohsen Bayati
The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandits
The stochastic multi-armed bandit (MAB) is a benchmark model for decision-making under uncertainty. In the classical MAB setting, a decision maker sequentially chooses between a set of alternatives ("arms"), and earns a reward upon each choice. The decision maker's goal is to ensure these rewards are as high as possible over their decision horizon. MABs are used in a wide range of applications, from Internet advertising to healthcare.
It is well known that high-performing MAB algorithms must balance "exploration", i.e., learning about relatively unknown arms, against "exploitation", i.e., leveraging arms that have already been seen to perform reasonably well. Unfortunately, due to practical constraints, fairness requirements, and ethical considerations, active exploration may not be possible in some domains. For example, in healthcare, "exploration" may involve using an untested treatment on a prospective patient, but ethical considerations may preclude such use without appropriate safeguards.
Surprisingly, a body of recent research has suggested that in many practical regimes of interest, MAB algorithms that focus solely on exploitation, i.e., always choosing the empirically best arm (known as "greedy" algorithms), can in fact perform quite well, due to exploration that happens for "free" during the run of the algorithm. In this talk we describe this phenomenon; highlight how it emerges, in particular in MAB problems with large numbers of arms, as well as in a range of other settings; and suggest directions for future investigation.
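To make the greedy policy concrete, here is a minimal simulation sketch. The Bernoulli reward model, the initialization with one pull per arm, and the uniformly random tie-breaking are illustrative assumptions, not the speakers' exact setup; the intuition it illustrates is that with many arms, the initial samples alone are likely to contain a near-optimal arm, so pure exploitation can already perform well.

```python
import random

def greedy_bandit(means, horizon, seed=0):
    """Pure greedy policy on Bernoulli arms: after one initial pull of each
    arm, always play the arm with the highest empirical mean (no exploration).

    Returns the total reward earned and the per-arm pull counts."""
    rng = random.Random(seed)
    k = len(means)
    counts = [1] * k
    # One initial Bernoulli sample per arm (an illustrative convention).
    sums = [1.0 if rng.random() < m else 0.0 for m in means]
    total = sum(sums)
    for _ in range(horizon - k):
        # Exploit only: pick the best empirical mean, breaking ties at random.
        best = max(range(k), key=lambda i: (sums[i] / counts[i], rng.random()))
        reward = 1.0 if rng.random() < means[best] else 0.0
        sums[best] += reward
        counts[best] += 1
        total += reward
    return total, counts

if __name__ == "__main__":
    rng = random.Random(1)
    # Many arms with means drawn uniformly at random: the larger the number
    # of arms, the more likely the initial samples include a near-optimal one.
    means = [rng.random() for _ in range(100)]
    total, counts = greedy_bandit(means, horizon=5000, seed=2)
    print(f"average reward per round: {total / 5000:.3f}, "
          f"best arm mean: {max(means):.3f}")
```

A natural experiment with this sketch is to vary the number of arms while holding the horizon fixed and watch the greedy policy's per-round reward approach the best arm's mean as the arm count grows.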
Joint work with Nima Hamidi, Ramesh Johari, and Khashayar Khosravi.