Title | : | Deep Learning Approaches to Image Paragraph Captioning and Image Captioning for Low Resource Languages |
Speaker | : | Vakada Naveen (IITM) |
Details | : | Thu, 8 Jun, 2023 10:30 AM @ MR - I (SSB 233) |
Abstract | : | Image captioning is the task of generating textual descriptions for images. Our work focuses on two of its many challenges: image paragraph captioning and image captioning for low-resource languages. The main challenge in image paragraph captioning is generating long, coherent, diverse, and descriptive captions. The proposed visual paragraph generation approach eliminates the need for Regions of Interest (RoI) supervision and uses a transformer-based encoder-decoder model to generate coherent, high-quality paragraphs. A post-processing step enhances the semantic relevance of the generated paragraphs by incorporating image-text similarity scores and related-classes similarity scores. The proposed approach yields more coherent paragraphs and a higher Flesch reading ease score. The main challenge in generating captions for low-resource languages is the scarcity of training data. We explore methods to improve image captioning performance for two low-resource languages, Hindi and German. We propose using the English caption as an additional input to the image captioning system when generating the caption in the low-resource language. The output of the image encoder is used as the initial hidden state of an LSTM-based decoder. The output of a second encoder, which encodes the English caption, is supplied to the decoder as an additional input through the attention mechanism. The effectiveness of the proposed methods is demonstrated through experimental studies. |
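
The abstract reports the Flesch reading ease score as one measure of paragraph quality. As a rough illustration of what that score captures, the sketch below applies the standard formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The function names and the vowel-group syllable counter are assumptions made for this example only; they are not taken from the speaker's work, and published readability tools may count syllables differently.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' ("image" -> 2), but keep "-le" and "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Return the Flesch reading ease of `text`; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    paragraph = "A man is riding a horse. The horse is brown. Trees stand behind them."
    print(round(flesch_reading_ease(paragraph), 2))
```

Short sentences and short words push the score up, so a generated paragraph can score well on readability while still being checked separately for coherence and relevance to the image.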