Title: Continuous Tactical Optimism and Pessimism
Speaker: Kartik (IITM)
Details: Tue, 12 Sep 2023, 12:00 PM @ MR - I (SSB 233)
Abstract: In reinforcement learning for continuous control, deep off-policy actor-critic algorithms have become popular because they mitigate function approximation errors through pessimistic value updates. However, this pessimism can reduce exploration, which is typically beneficial for learning in uncertain environments. Tactical Optimism and Pessimism (TOP) is an actor-critic framework that dynamically adjusts the degree of optimism used in value learning based on the task and the stage of learning. However, TOP's fixed bandit framework itself acts as a per-task hyperparameter: one must choose both the number of arms and the arm values. To avoid this, we instead learn the degree of optimism β continuously while training the agent in the environment. We demonstrate that this approach outperforms methods that use a fixed level of optimism on continuous control tasks in the Walker2d-v2 and HalfCheetah-v2 environments, and that it can be easily incorporated into various off-policy algorithms. We call our algorithm cTOP, for continuous TOP.
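For intuition, here is a minimal sketch in Python of the idea the abstract describes: a TOP-style value target blends the mean and spread of twin critic estimates through an optimism parameter β, and a continuous variant adapts β during training instead of selecting it from a fixed set of bandit arms. The β-update rule below (nudging β by the sign of the recent change in episode return) is a hypothetical illustration for this sketch, not necessarily the update rule presented in the talk.

```python
import numpy as np

class ContinuousOptimism:
    """Sketch of a continuously adapted degree of optimism (beta)."""

    def __init__(self, beta=0.0, lr=0.01, beta_min=-1.0, beta_max=1.0):
        self.beta = beta                      # >0 optimistic, <0 pessimistic
        self.lr = lr                          # adaptation rate for beta
        self.beta_min, self.beta_max = beta_min, beta_max
        self.prev_return = None               # last episode's return

    def blended_target(self, q1, q2):
        # Blend twin-critic estimates: mean + beta * spread.
        # beta = -1 roughly recovers a pessimistic (min-style) target;
        # beta = +1 an optimistic (max-style) one.
        mean = 0.5 * (q1 + q2)
        spread = 0.5 * np.abs(q1 - q2)
        return mean + self.beta * spread

    def update(self, episode_return):
        # Hypothetical rule: become more optimistic while returns are
        # improving, more pessimistic when they degrade.
        if self.prev_return is not None:
            improvement = episode_return - self.prev_return
            self.beta = float(np.clip(
                self.beta + self.lr * np.sign(improvement),
                self.beta_min, self.beta_max))
        self.prev_return = episode_return

# Example usage inside a training loop:
opt = ContinuousOptimism()
target = opt.blended_target(q1=10.2, q2=9.6)  # critic outputs for (s, a)
opt.update(episode_return=130.0)              # after each episode
```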