Title | : | Video Captioning using Object Trajectory Features |
Speaker | : | Pawandeep Singh (IITM) |
Details | : | Tue, 23 Jul, 2019 3:00 PM @ AM Turing Hall |
Abstract | : | In this paper, we address the problem of video captioning, which translates a video into a short natural-language description. Most existing methods combine an ensemble of frame-level features into a compact video representation and feed it to an RNN that outputs the caption. We propose an encoder-decoder model that, along with frame-level features, also uses object-level information to capture per-object motion in the video. The main idea of our method is to introduce object-trajectory-based features in the encoder. We evaluate our model on the MSVD and MSR-VTT datasets and establish new state-of-the-art results on several caption evaluation metrics, including ROUGE-L, METEOR, and CIDEr. |