Title | : | Feature switching: A new paradigm for speaker recognition and its effect on spoofing attacks |
Speaker | : | Saranya M S (IITM) |
Details | : | Wed, 18 Apr, 2018 5:00 PM @ A M Turing Hall |
Abstract: | : | Speech technologies have become ubiquitous today. This can be seen from the emergence of various intelligent personal assistant (IPA) devices on the market, such as Apple’s Siri, Microsoft’s Cortana and Google’s personal assistant. While communicating with humans, it is essential that the IPA knows the speaker’s identity. Recognizing a person’s identity using a machine is also crucial for applications such as e-commerce, tele-applications, financial transactions, forensics and law enforcement, which use a speech interface for input and output. Humans can easily recognize the identity of a voice when they communicate over the telephone. Machines, however, find it difficult to identify the speaker because of variation in channel, environment and session, the presence of noise, etc. These variations can be addressed either in the feature extraction stage or in the modeling stage. The first part of the work proposes a feature selection technique called “feature switching”, which aims to select an appropriate feature representation for every speaker in an automatic speaker verification (ASV) system. Out of a set of candidate features, the optimal feature for each speaker is determined during enrollment. Verification is then performed using the optimal feature of the claimed speaker. Our results show that feature switching achieves improved performance compared to conventional as well as fusion-based systems. Spoofing poses a real threat to the commercial use of voice as a biometric. The replay attack is a type of spoofing attack in which a pre-recorded utterance of an authenticated speaker is used to gain illegitimate access to an ASV system. The second part of the work addresses the replay detection task using the concept of feature switching. A variant of feature switching, termed Decision Level Feature Switching (DLFS), is used for this task. 
The proposed DLFS approach with a set of hand-crafted features surpasses the state-of-the-art replay detection system. In the final part of this research work, we revisit the Universal Background Model-Gaussian Mixture Model (UBM-GMM) based ASV system with a novel scoring approach. Owing to the mismatch between train and test conditions, the number of maximum contributing components (top-C) need not be constant, even for trials across different sessions of the same speaker. To enable variable top-C scoring, a novel trial-specific scoring approach is proposed. The effectiveness of the proposed trial-specific top-C scoring is tested on three applications, namely replay attack detection, speaker identification and speaker verification. In all three applications, the systems with trial-specific top-C scoring outperform the fixed top-C scoring systems. |
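The enrollment-time selection described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the abstract does not state how the optimal feature is chosen, so the criterion here (largest gap between mean genuine and mean impostor scores) is an assumption, and the feature names `mfcc` and `lfcc`, the score values, and the functions `select_feature`, `enroll` and `verify` are all hypothetical.

```python
from statistics import mean


def select_feature(dev_scores):
    """Pick one feature for a speaker from per-feature enrollment scores.

    dev_scores maps a feature name to (genuine_scores, impostor_scores).
    ASSUMPTION: the best feature is the one with the largest gap between
    the mean genuine score and the mean impostor score.
    """
    def margin(item):
        genuine, impostor = item[1]
        return mean(genuine) - mean(impostor)

    return max(dev_scores.items(), key=margin)[0]


def enroll(speakers):
    """Build the speaker -> selected feature map at enrollment time."""
    return {spk: select_feature(scores) for spk, scores in speakers.items()}


def verify(claimed, trial_scores, feature_map, threshold=0.0):
    """Score a trial using only the feature selected for the claimed speaker."""
    feature = feature_map[claimed]
    return trial_scores[feature] >= threshold, feature


# Hypothetical enrollment scores for two speakers and two candidate features.
speakers = {
    "spk1": {"mfcc": ([2.0, 1.5], [0.5, 0.0]), "lfcc": ([1.0, 1.2], [0.8, 0.9])},
    "spk2": {"mfcc": ([0.6, 0.4], [0.5, 0.3]), "lfcc": ([1.8, 2.2], [-0.5, -0.1])},
}
feature_map = enroll(speakers)

# A trial claiming to be spk2 is scored with spk2's selected feature only.
decision, used = verify("spk2", {"mfcc": -0.2, "lfcc": 1.1}, feature_map)
print(feature_map, decision, used)
```

The point of the sketch is that the feature is switched per claimed speaker at test time, rather than fixing one feature for all speakers or fusing the scores of all features.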