Title | : | Towards Developing Robust Self-Supervised Speech Representations |
Speaker | : | Lodagala V S V Durga Prasad (IITM) |
Details | : | Tue, 30 May, 2023 3:30 PM @ MR - I (SSB 233) |
Abstract | : | While Self-Supervised Learning (SSL) has helped reap the benefits of scale from the available unlabeled data, the learning paradigms are continuously being improved. One of the most popular self-supervised learning objectives is the contrastive loss, which aims to distinguish a target sample (positive sample) from a set of distractor samples (negative samples) given an anchor representation. Contrastive Predictive Coding (CPC) and wav2vec 2.0 are two of the most popular techniques that use the contrastive loss as the self-supervised learning objective. Surprisingly, the choice of negative samples in contrastive learning for Spoken Language Processing (SLP) has drawn relatively little attention in the literature. Moreover, the generalizability of representations from such models across domains and downstream speech tasks needs improvement. We introduce two new SSL paradigms for speech representation learning, namely ccc-wav2vec 2.0 and data2vec-aqc. ccc-wav2vec 2.0 uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The cross-contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation, and vice versa, bringing robustness to the pre-training strategy. Building on the recently introduced data2vec, we add modules to the data2vec framework that leverage the benefits of data augmentation, quantized representations, and clustering. The interaction between these modules helps optimize the cross-contrastive loss as an additional self-supervised objective. We evaluate models from the two proposed approaches on several downstream speech tasks from the SUPERB benchmark and observe significant improvements over the baseline wav2vec 2.0 and data2vec models. |
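Note: the sketch below is not part of the original announcement. It is a minimal PyTorch illustration of the symmetric cross-contrastive (InfoNCE-style) objective described in the abstract, where the encoder output of one view is contrasted against the quantized targets of the other view and vice versa. All names (`info_nce`, `cross_contrastive_loss`, `sample_negatives`, `temperature`) are hypothetical and do not reflect the speaker's actual implementation or the fairseq API.

```python
# Illustrative sketch of a symmetric cross-contrastive loss between two views
# of an utterance, in the spirit of ccc-wav2vec 2.0 / data2vec-aqc.
# Names and shapes are assumptions for exposition, not the authors' code.
import torch
import torch.nn.functional as F


def info_nce(anchors, positives, negatives, temperature=0.1):
    """InfoNCE: pull each anchor toward its positive, push it away from negatives.

    anchors, positives: (T, D) frame-level representations.
    negatives:          (T, K, D) K distractor samples per frame.
    """
    pos_sim = F.cosine_similarity(anchors, positives, dim=-1)                # (T,)
    neg_sim = F.cosine_similarity(anchors.unsqueeze(1), negatives, dim=-1)   # (T, K)
    logits = torch.cat([pos_sim.unsqueeze(1), neg_sim], dim=1) / temperature # (T, K+1)
    # The positive is always at index 0 of the logits.
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)


def cross_contrastive_loss(enc_orig, quant_orig, enc_aug, quant_aug, sample_negatives):
    """Contrast the encoder output of one view against the quantized targets of the
    other view, and vice versa, so pre-training stays consistent across augmentations.
    `sample_negatives` is a hypothetical helper that draws distractors per frame."""
    loss_orig_to_aug = info_nce(enc_orig, quant_aug, sample_negatives(quant_aug))
    loss_aug_to_orig = info_nce(enc_aug, quant_orig, sample_negatives(quant_orig))
    return 0.5 * (loss_orig_to_aug + loss_aug_to_orig)
```

In ccc-wav2vec 2.0, a clustering step would additionally down-weight negatives that fall in the same cluster as the positive; that re-weighting is omitted here for brevity.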