Title: REX3: An algorithm for the Adversarial Dueling Bandit problem
Speaker: Pratik Gajane (Orange Labs / INRIA Lille)
Details: Thu, 17 Dec 2015, 11:00 AM @ BSB 361
Abstract: The dueling bandit problem is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner, at each time period, pulls two arms instead of one. The learner only sees the outcome of the duel between the selected arms and receives the average of their rewards. The goal of the learner is to maximize her cumulative reward, which is equivalent to minimizing the cumulative regret. We propose an algorithm called "Relative Exponential-weight algorithm for Exploration and Exploitation" (REX3) for this problem. REX3 has a regret bound of O(sqrt(K ln(K) T)), where K is the number of arms and T is the time horizon. In this talk, we will see the algorithm, a brief analysis of it, and the relevant experiments. This work appeared in ICML 2015.
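
To make the setting concrete, below is a minimal Python sketch of an Exp3-style exponential-weighting scheme adapted to dueling (relative) feedback, in the spirit of the abstract. It is only an illustration: the function duel, the exploration rate gamma, and the exact update rule are assumptions for this sketch, not the precise constants or update of REX3 as given in the paper.

import math
import random

def relative_exp3_sketch(K, T, duel, gamma=0.1):
    """Illustrative Exp3-style sketch for dueling (relative) feedback.

    duel(a, b) is assumed to return the relative outcome of arm a against
    arm b, scaled to [-1, 1] (positive when a beats b). The exploration
    rate gamma and the update below are illustrative, not the exact REX3 ones.
    """
    weights = [1.0] * K
    for _ in range(T):
        total = sum(weights)
        # Mix the normalized weights with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        # Draw the two arms to duel independently from the same distribution.
        a = random.choices(range(K), weights=probs)[0]
        b = random.choices(range(K), weights=probs)[0]
        outcome = duel(a, b)  # relative feedback in [-1, 1]
        # Importance-weighted exponential update of the two selected arms.
        weights[a] *= math.exp(gamma / K * outcome / (2 * probs[a]))
        weights[b] *= math.exp(-gamma / K * outcome / (2 * probs[b]))
    return weights

The key point the sketch conveys is that only the relative outcome of the duel is observed and used for the importance-weighted update, while the underlying average reward that the learner accumulates is never revealed to her.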