Title | : | A Sliding-Window Approach for RL in MDPs with Arbitrarily Changing Rewards and Transitions |
Speaker | : | Pratik Gajane (University of Leoben) |
Details | : | Fri, 28 Dec 2018, 11:00 AM @ Ada Lovelace Conference Room |
Abstract | : | We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding-window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitable for our algorithm. These results are complemented by a sample complexity bound on the number of sub-optimal steps taken by the algorithm. Finally, we present some experimental results to support our theoretical analysis. |
Speaker Bio | : | Pratik Gajane is a postdoctoral fellow working with Dr. Peter Auer and Dr. Ronald Ortner at Montanuniversität Leoben. He received his Ph.D. in Computer Science from the University of Lille / INRIA (France) in 2017 under the guidance of Dr. Philippe Preux and Dr. Tanguy Urvoy. Previously, he completed his Master's thesis on multi-armed bandits under the guidance of Dr. Balaraman Ravindran at IIT Madras. His research interests include machine learning, reinforcement learning, and algorithm design. Furthermore, he is interested in devising fairness-aware machine learning algorithms. |
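To give a flavour of the sliding-window idea mentioned in the abstract, here is a minimal illustrative sketch in Python: empirical reward and transition estimates are computed only from the most recent observations, so they track an MDP whose rewards and transitions may change over time. The class name `SlidingWindowEstimator`, its API, and the uniform fallback for unvisited state-action pairs are assumptions made for illustration; this is not the speaker's algorithm or its confidence-bound machinery.

```python
from collections import deque

import numpy as np


class SlidingWindowEstimator:
    """Sketch: empirical estimates from the last `window_size` transitions only.

    Illustrative only; the smoothing and defaults below are assumptions.
    """

    def __init__(self, n_states, n_actions, window_size):
        self.n_states = n_states
        self.n_actions = n_actions
        # Keep only the most recent `window_size` transitions.
        self.window = deque(maxlen=window_size)

    def observe(self, state, action, reward, next_state):
        # Older observations fall out of the deque automatically,
        # so the estimates adapt to a changed reward/transition structure.
        self.window.append((state, action, reward, next_state))

    def estimates(self):
        counts = np.zeros((self.n_states, self.n_actions))
        reward_sums = np.zeros((self.n_states, self.n_actions))
        trans_counts = np.zeros((self.n_states, self.n_actions, self.n_states))
        for s, a, r, s_next in self.window:
            counts[s, a] += 1
            reward_sums[s, a] += r
            trans_counts[s, a, s_next] += 1
        visits = np.maximum(counts, 1)  # avoid division by zero
        r_hat = reward_sums / visits
        p_hat = trans_counts / visits[:, :, None]
        # Unvisited (s, a) pairs default to a uniform next-state distribution.
        p_hat[counts == 0] = 1.0 / self.n_states
        return r_hat, p_hat, counts
```

A planning routine (e.g. an optimistic value iteration, as in UCRL-style methods) would then be run on these windowed estimates after each episode; the choice of `window_size` trades off adaptivity to changes against estimation accuracy, which is the trade-off the talk's optimal window size addresses.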