Title | : | Deep Learning for Speech Recognition and Keyword Search |
Speaker | : | Bhuvana Ramabhadran (IBM Research) |
Details | : | Thu, 21 Dec, 2017 10:00 AM @ A. M. Turing Hall |
Abstract | : | Deep learning has had a major impact on the performance of state-of-the-art large-vocabulary speech recognition systems over the last few years. Keyword search, the task of locating an orthographic query in a speech corpus, is typically performed by analyzing the output of automatic speech recognition (ASR). The recently concluded IARPA-funded Babel program focused on the rapid development of speech recognition capability for keyword search in a previously unstudied language, working with speech recorded in a variety of conditions and with limited amounts of transcription.
In this talk, I will focus on the impact of several ideas in deep learning on the Babel task from a speech recognition and keyword search perspective. I will begin with the fundamentals, recent advances and trends in the variants of deep neural networks for acoustic and language modeling in speech recognition. Next, I will address the derivation of multilingual (ML) representations from several languages using different deep neural network architectures, such as LSTMs and VGG-net-inspired convolutional networks, and offer insights on the impact of these diverse ML representations on speech recognition performance. A comparison of these networks as feature extractors (embeddings) and as the model itself will be presented. I will also present a viable alternative to current frameworks for keyword search: the end-to-end system. Lastly, given the non-convex nature of the loss function used in training neural networks, their performance depends heavily on the starting point, as well as on other factors such as the type of input features, batch randomization and the type of non-linearity. I will address these issues and their impact on speech recognition and keyword search performance. |
Biography of Speaker | : | Bhuvana Ramabhadran (IEEE Fellow, ISCA Fellow) is a Distinguished Research Staff Member and Manager at the IBM T. J. Watson Research Center. She leads a team of researchers in the Speech Technologies Research Group and coordinates research activities across IBM's worldwide research labs in the U.S., China, Tokyo, Prague and Haifa in the areas of speech recognition, synthesis, spoken term detection and machine learning. She serves as an adjunct professor at Columbia University, where she co-teaches a course on automatic speech recognition. She served as the elected chair of the Speech and Language Technical Committee (SLTC) of the IEEE Signal Processing Society (2015-2016). She has been named a Master Inventor twice by IBM.
She has served on the editorial board of T-ASLP (2012-2016) and as technical area chair for ICASSP (2011-2017) and Interspeech (2012, 2014-2016), and was one of the lead organizers and technical chair of IEEE ASRU 2011. She delivered a tutorial presentation at Interspeech 2016 and a keynote at IberSPEECH 2016. She has published over 150 papers and been granted over 40 U.S. patents. Her research interests include speech recognition and synthesis algorithms, statistical modeling, signal processing, pattern recognition and machine learning. |