Title | : | Learning Dynamics and Interpretability of Attention Models |
Speaker | : | Rahul Vashisht (IITM) |
Details | : | Mon, 20 Feb, 2023 3:30 PM @ SSB 233 |
Abstract | : | A subclass of deep neural networks known as attention networks has gained much interest recently in tasks such as image captioning, video summarization, and machine translation. Transformer models based on self-attention have shown great promise for a large number of tasks in natural language processing. Yet our understanding of the workings and behavior of these models is limited. In this work, we study several aspects of attention models. Attention models are based on one key idea: "the output depends only on a small (but unknown) segment of the input". We analyze and demonstrate the differences in the learning dynamics of three variants of attention models. Based on these differences, we suggest a hybrid strategy that uses soft attention initially and hard attention later. The strategy is nearly as effective as the latent variable model but does not incur the same compute cost. Using an image-captioning model with attention, we also quantify and show that the learned attention weights are indeed better than random attention weights. |
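For readers unfamiliar with the distinction, the following is a minimal illustrative sketch (not code from the talk) of soft attention, hard attention, and a simple hybrid schedule that starts soft and switches to hard after a chosen training step. All names here (attend, SWITCH_STEP) are hypothetical placeholders.

```python
# Illustrative sketch: soft vs. hard attention over input segments, with a
# hypothetical hybrid schedule (soft early in training, hard later).
import torch
import torch.nn.functional as F

SWITCH_STEP = 1000  # hypothetical step at which to switch from soft to hard


def attend(scores: torch.Tensor, values: torch.Tensor, step: int) -> torch.Tensor:
    """Combine per-segment values according to attention scores.

    scores: (batch, num_segments) unnormalized attention scores
    values: (batch, num_segments, dim) per-segment representations
    """
    weights = F.softmax(scores, dim=-1)  # (batch, num_segments)
    if step < SWITCH_STEP:
        # Soft attention: output is a convex combination of all segments.
        return torch.einsum("bs,bsd->bd", weights, values)
    # Hard attention: commit to a single segment (argmax here; a latent-variable
    # formulation would instead sample the segment from `weights`).
    idx = weights.argmax(dim=-1)  # (batch,)
    return values[torch.arange(values.size(0)), idx]


if __name__ == "__main__":
    scores = torch.randn(2, 5)
    values = torch.randn(2, 5, 8)
    print(attend(scores, values, step=0).shape)     # soft phase
    print(attend(scores, values, step=2000).shape)  # hard phase
```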