Title: Large Pre-trained Self-supervised Models for Automatic Speech Processing
Speaker: Dr. Srikanth Madikeri (Idiap Research Institute, Martigny, Switzerland)
Details: Wed, 17 Jan 2024, 12:00 PM @ CS-25
Abstract: In this talk, I will present our work on the application of large pre-trained self-supervised models to different speech processing tasks: low-resource automatic speech recognition (ASR), spoken language understanding, and language identification. The success of wav2vec 2.0-style self-supervised training paved the way for rapid training of automatic speech recognition systems, and was later extended to other speech processing tasks such as speaker recognition and language recognition. Our work combines the success of hybrid ASR (the so-called HMM/DNN approach) with pre-trained audio encoders to leverage the best of both systems: from using Lattice-Free Maximum Mutual Information (LF-MMI) as the cost function for acoustic model fine-tuning to using adapters for parameter-efficient training. With effective ASR training methods in place, the focus of research and development on spoken document processing has shifted towards downstream tasks such as intent detection, slot filling, information retrieval, and dialog structure discovery. In our work, we compare different approaches to passing multiple hypotheses from ASR, as opposed to only the one-best, to Natural Language Processing models such as HERMIT and BERT.
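As a minimal illustration of why multiple ASR hypotheses can beat the one-best (the talk's own combination methods involving HERMIT and BERT are more sophisticated), the sketch below pools posterior scores over an n-best list by simple confidence-weighted voting on whole hypotheses. The function name, the example utterances, and the scores are all hypothetical.

```python
from collections import Counter

def combine_nbest(nbest):
    """Combine an ASR n-best list by confidence-weighted voting on
    whole hypotheses, a simple alternative to one-best selection.

    nbest: list of (hypothesis_text, posterior_score) pairs.
    Returns the hypothesis whose (normalised) surface forms
    accumulate the highest total score across the list.
    """
    votes = Counter()
    for text, score in nbest:
        # Normalise case and whitespace so duplicates pool their scores.
        votes[" ".join(text.lower().split())] += score
    return votes.most_common(1)[0][0]

# Hypothetical 3-best output for one utterance.
nbest = [
    ("turn the lights on", 0.40),
    ("turn the light on", 0.35),
    ("Turn the lights  on", 0.25),  # duplicate after normalisation
]
print(combine_nbest(nbest))  # "turn the lights on": 0.40 + 0.25 = 0.65
```

Here the two variants of "turn the lights on" jointly outscore the single second-best hypothesis, information a one-best pipeline would discard.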