Title: Generating Natural-Language Video Descriptions using Deep Recurrent Networks
Speaker: Subhashini Venugopalan (Univ. of Texas at Austin, USA)
Details: Thu, 12 Jan 2017, 11:00 AM @ BSB 361
Abstract: For most people, watching a video and describing what happened (in words) is an easy task. For machines, extracting meaning from video pixels and generating a sentence description is a challenging problem. In this talk, I will present methods that automatically generate natural-language descriptions for events in short videos using deep neural networks. Specifically, I apply convolutional networks and Long Short-Term Memory (LSTM) recurrent networks to translate videos into English descriptions. I will present an end-to-end deep network that jointly models a sequence of video frames and a sequence of words. We will then explore how statistical linguistic knowledge mined from large text corpora, particularly LSTM language models and lexical embeddings, can improve the descriptions. I demonstrate the capabilities of these methods by comparing their output to human-generated descriptions on a corpus of YouTube videos and two large movie datasets annotated with Descriptive Video Service (DVS) descriptions.
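To make the encoder-decoder idea in the abstract concrete, the following is a minimal PyTorch sketch of the general approach: an LSTM reads a sequence of per-frame CNN feature vectors, and its final state conditions a second LSTM that emits a word sequence. The framework, the `VideoCaptioner` name, and all layer sizes are illustrative assumptions, not the speaker's actual architecture or code.

```python
import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    """Sketch of a video-to-text encoder-decoder (hypothetical, for
    illustration): LSTM over CNN frame features -> LSTM over words."""

    def __init__(self, feat_dim=4096, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, n_frames, feat_dim) per-frame CNN activations
        # captions:    (batch, n_words) word indices of the target sentence
        _, state = self.encoder(frame_feats)     # summarize the video
        words = self.embed(captions)             # (batch, n_words, hidden_dim)
        dec_out, _ = self.decoder(words, state)  # decode conditioned on video
        return self.out(dec_out)                 # per-step vocabulary logits

# Toy usage: 2 clips of 8 frames each, captions of length 5.
model = VideoCaptioner()
feats = torch.randn(2, 8, 4096)
caps = torch.randint(0, 10000, (2, 5))
logits = model(feats, caps)                      # shape: (2, 5, 10000)
```

At training time the logits would be scored against the reference caption with a cross-entropy loss; at test time words would be generated one at a time, feeding each prediction back into the decoder.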
Bio: Subhashini Venugopalan is a PhD candidate at the University of Texas at Austin. Her research focuses on deep learning techniques for generating descriptions of events in videos. She is advised by Prof. Raymond Mooney and collaborates with Prof. Kate Saenko (Boston University) and Prof. Trevor Darrell (UC Berkeley). Subhashini holds a master's degree in Computer Science from IIT Madras and a bachelor's degree from NIT Karnataka, Surathkal, India. She was a Blue Scholar with IBM Research, India, and has interned with Google Brain and Google Research.