CS6046 - Multi-armed bandits

Course Data :

Course Objective

The objective of this course is to cover basic topics in multi-armed bandits, in particular emphasizing the popular exploration-exploitation dilemma as well as the best arm identification (or pure exploration) paradigms. Bulk of the course will be geared towards familiarizing the student with the rigorous mathematical foundations of a variety of popular bandit models in the finite armed setting, while the later part of the course will cover advanced topics such as linear bandit models, where the underlying arms are in a large (possibly infinite) set.

Course Contents:

Part 0: Recap of concentration inequalities (Lectures 1-3)

Part I: Exploration-exploitation dilemma in stochastic K-armed bandits (Lectures 4 – 15)
  • Explore-then-commit and ε-greedy strategies
  • Upper Confidence Bound (UCB) algorithm
  • Thompson sampling (TS) algorithm
  • Regret upper bounds for UCB and TS
  • Lower bounds: a) Information theoretic bounds on minimax error, b) Instance-dependent regret lower bounds
Part II: Pure exploration in K-armed bandits (Lectures 16-18)
  • Uniform sampling
  • Successive rejects algorithm
  • Upper bound on probability of error
  • Lower bound on probability of error
Part III: Adversarial bandits (Lectures 19-24)
  • EXP3, EXP3-IX, EXP4 algorithms
  • Upper bounds on regret of EXP3 and its variants
  • Regret lower bounds
Part IV: Advanced topics
  • Stochastic linear bandits (Lectures 25-30): Ellipsoidal confidence sets, UCB, Thompson sampling and lower bound.
  • Adversarial linear bandits (Lectures 31-39): Exp2 with John’s exploration, online mirror descent, lower bound.

Text Book:

Sébastien Bubeck and Nicolo Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 2012.


  • Csaba Szepesvári and Tor Lattimore. http://banditalgs.com/. Blog posts on bandit theory1.
  • Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.

Learning outcomes:

  • Develop a rigorous understanding of mathematical foundations of classic bandit algorithms.
  • Implement bandit algorithms and correlate the empirical findings with the bandit theory learned from the course.
  • Develop familiarity with state-of-the-art bandit models and/or contribute original research through project work.




Credits Type Date of Introduction
3-1-0-0-0-8-12 Elective Aug 2017

Previous Instances of the Course

© 2016 - All Rights Reserved - Dept of CSE, IIT Madras
Website Credits