Title: Causal Contextual Bandits
Speaker: Chandrasekar Subramanian (IITM)
Details: Mon, 15 Jul 2024, 9:30 AM, via Google Meet
Abstract: Contextual bandits have proved to be a powerful framework for modeling real-world problems and have been used in a wide range of applications such as recommendation systems, marketing campaign allocation, software product experimentation, and personalized medical treatments. Contextual bandit agents aim to learn a near-optimal mapping from a context space to an action space from experience collected either interactively or from logs.
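The contextual-bandit loop described above (observe a context, choose an action, receive a reward, update) can be sketched in code. The following is a generic epsilon-greedy agent with per-arm linear reward models, given only as an illustrative assumption; it is not the causal algorithm developed in the thesis, and the class name and simulation setup are invented for this sketch.

```python
import numpy as np

class EpsilonGreedyLinearBandit:
    """Illustrative epsilon-greedy contextual bandit with per-arm
    ridge-regression reward estimates (not the thesis's algorithm)."""

    def __init__(self, n_arms, dim, epsilon=0.1, lam=1.0, seed=0):
        self.n_arms, self.epsilon = n_arms, epsilon
        self.rng = np.random.default_rng(seed)
        # Ridge statistics per arm: A = lam*I + sum x x^T, b = sum r*x
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, context):
        if self.rng.random() < self.epsilon:          # explore uniformly
            return int(self.rng.integers(self.n_arms))
        # exploit: pick the arm with the highest estimated reward
        est = [context @ np.linalg.solve(A, b)
               for A, b in zip(self.A, self.b)]
        return int(np.argmax(est))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Tiny synthetic environment: arm a's true mean reward is context[a].
rng = np.random.default_rng(1)
agent = EpsilonGreedyLinearBandit(n_arms=3, dim=3, seed=1)
correct = 0
for t in range(2000):
    x = rng.random(3)
    a = agent.select(x)
    agent.update(a, x, x[a] + 0.1 * rng.normal())
    if t >= 1000:                 # evaluate after a warm-up period
        correct += (a == int(np.argmax(x)))
print(correct / 1000)  # fraction of optimal pulls in the last 1000 rounds
```

In this sketch the agent learns each arm's reward model from its own interaction data, which is exactly the sample-hungry behavior the abstract's later paragraphs aim to improve on by exploiting causal side information.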
One of the key issues in the wide application of contextual bandits (and reinforcement learning in general) is their need for a large number of samples, which are costly to obtain in practice. For example, each arm might correspond to running a product experiment on users. As a result, there has been increasing interest in developing more specialized frameworks in which additional structure can be exploited to learn good policies faster.
In many applications, the learning agent has the ability to perform interventions on targeted subsets of the population. Further, the agent might also have access to qualitative causal side information, often drawn from domain knowledge. However, none of the existing contextual bandit frameworks captures these real-world intricacies.
In this thesis, we present the first set of formalisms, which we call 'causal contextual bandits', that capture these real-world nuances. We provide two formulations that cater to different sets of applications: a purely interactive setting and a one-shot setting. We provide new contextual bandit algorithms that utilize novel entropy-like measures that we introduce. We show theoretical results on the performance of our algorithms, including regret bounds, and provide extensive experimental results demonstrating their improved performance compared to baselines. Further, we prove that our algorithms achieve counterfactual fairness, and we also provide a way to achieve demographic parity. Finally, we present the first detailed survey of causality in bandits and highlight possible future directions of research.
Web Conference Link: https://meet.google.com/aav-xbwz-kbq