Title: Bootstrapping the learning process for computer audition
Speaker: Prem Seetharaman (Northwestern University)
Details: Wed, 18 Dec 2019, 4:10 PM @ AM Turing Hall
Abstract: Computer audition is the study of how machines can organize, parse, and understand the sounds we hear every day. A fundamental problem in computer audition is audio source separation: the isolation of a specific sound (e.g. a single speaker) in a complex audio scene, such as a cocktail party. Humans, as evidenced by our daily experience with sound as well as by empirical studies, manage the source separation task very effectively, attending to sources of interest in complex scenes. In this talk, I present computational methods for audio source separation that are inspired by these human abilities. Deep learning approaches are currently the state of the art for source separation tasks. They are typically trained on many examples (e.g. tens of thousands) in which each source (e.g. a voice) was recorded in isolation. The sources are then artificially mixed together to create training data for a deep learning model. This artificial training set is at odds with how humans learn to separate sounds: we are never given sounds in isolation, but always hear them in the context of other sounds. Further, while we can train models to separate sounds for which we have sufficient isolated source data (e.g. speech), we cannot do so for the many sounds for which we do not have isolated recordings. However, we do have vast datasets of complex audio mixtures (e.g. recordings on YouTube and Spotify). How do we learn computer audition models directly from these mixtures, rather than from artificial training data? In this talk, I present work on building self-supervised machine learning models that learn to perform audio source separation directly from audio mixtures. These models are bootstrapped from separation algorithms inspired by the primitive grouping mechanisms used in human audition.

Bio: Prem is a teaching postdoctoral scholar at Northwestern University in Evanston, Illinois, USA.
He received his PhD in 2019 from Northwestern University, advised by Bryan Pardo. Prior to his PhD, he studied computer science and music composition at Northwestern. He has worked with Adobe Research, Mitsubishi Electric Research Labs, and Gracenote on problems in computer audition. The objective of his research is to create machines that can understand the auditory world. He works at the intersection of computer audition, machine learning, and human-computer interaction.