Title: Efficient Adaptation of Language Models: A Comprehensive Analysis
Speaker: Nandini Mundra (IITM)
Details: Thu, 6 Jun 2024, 11:00 AM @ SSB-233
Abstract:
Given the computational and data-intensive nature of training language models from scratch, there is a pressing need for methods that efficiently adapt pre-trained models to specific tasks or languages. Task adaptation is commonly addressed through Full Fine-Tuning (FFT), Multi-Task Learning (MTL), and Adapter Tuning (AT), while language adaptation often involves vocabulary expansion followed by continued pretraining. In this seminar, we discuss efficient methods for both task and language adaptation of language models.

The first part of this work focuses on task adaptation, where we examine the efficiency of FFT, AT, and MTL. We show that for Natural Language Understanding (NLU) tasks, the parameter efficiency of adapters does not translate into overall efficiency gains over FFT: adapters are more expensive to train and incur higher deployment latency. Moreover, the maintainability and extensibility benefits of adapters can be achieved with simpler approaches such as MTL, which also trains faster. For moderately sized models on NLU tasks, we therefore recommend that practitioners rely on FFT or MTL rather than adapters.

The second part of this work focuses on language adaptation, where we study how best to initialize the extended embeddings introduced by vocabulary expansion. We propose constrained Word2Vec, which initializes each extended embedding as a weighted average of the pre-expansion embeddings, with the weights learned through the Word2Vec architecture, and we compare it against five other baselines. Our findings suggest that simple, non-resource-intensive methods such as multivariate and mean initialization perform as well as the more sophisticated approaches, indicating that simple baselines may be sufficient for effective vocabulary expansion in language models.
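To make the adapter comparison concrete, the sketch below shows a standard bottleneck adapter module in PyTorch. The class name, hidden size, and bottleneck size are illustrative assumptions, not the exact configuration studied in the talk; the point is only to show why adapters are parameter-efficient yet add extra compute at inference.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A generic bottleneck adapter: project down, apply a non-linearity,
    project up, and add a residual connection. Only these small layers are
    trained while the backbone stays frozen, which gives the parameter
    efficiency; the extra layers also add inference-time latency."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's representation intact.
        return x + self.up(self.act(self.down(x)))
```

In typical adapter-tuning setups such modules are inserted after transformer sublayers and are the only trainable parameters, whereas FFT updates the entire backbone.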
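The abstract describes constrained Word2Vec as initializing each new embedding as a weighted average of the pre-expansion embeddings, with the weights learned via the Word2Vec architecture. The snippet below sketches only the combination step; the weight matrix is left as an input because the weight-learning procedure is not detailed here, and the function name is my own.

```python
import numpy as np

def weighted_average_init(old_embeddings: np.ndarray,
                          weights: np.ndarray) -> np.ndarray:
    """Combine pre-expansion embeddings into embeddings for new tokens.

    old_embeddings: (old_vocab_size, dim) pre-expansion embedding matrix.
    weights: (num_new_tokens, old_vocab_size); each row is assumed to be a
    normalized weight vector (in constrained Word2Vec these weights are
    learned, which is not shown here).
    """
    # Each new embedding is a weighted combination of the old rows.
    return weights @ old_embeddings
```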
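The simple baselines mentioned for vocabulary expansion can be illustrated as follows. Function names, the use of NumPy, and the toy matrix sizes are my own assumptions; this is a sketch of mean and multivariate-Gaussian initialization, not the exact setup from the talk.

```python
import numpy as np

def mean_init(old_embeddings: np.ndarray, num_new_tokens: int) -> np.ndarray:
    """Initialize every new token embedding to the mean of the existing rows."""
    mean = old_embeddings.mean(axis=0)
    return np.tile(mean, (num_new_tokens, 1))

def multivariate_init(old_embeddings: np.ndarray, num_new_tokens: int,
                      seed: int = 0) -> np.ndarray:
    """Sample new embeddings from a multivariate Gaussian fitted to the
    existing embedding matrix (mean and covariance of the old rows)."""
    rng = np.random.default_rng(seed)
    mean = old_embeddings.mean(axis=0)
    cov = np.cov(old_embeddings, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=num_new_tokens)

# Toy usage: append the new rows to the pre-expansion embedding matrix.
old = np.random.randn(1000, 64)                      # stand-in for a real matrix
new_rows = multivariate_init(old, num_new_tokens=200)
expanded = np.vstack([old, new_rows])
```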