Title: Transcription correction using signal processing for continuous speech recognition
Speaker: R Golda Brunet (IITM)
Details: Tue, 27 Mar, 2018 2:00 PM @ A M Turing Hall
Abstract: Building acoustic models for any speech processing application
requires accurately labeled transcripts, preferably at the phoneme level.
Generating precise phonetic labels manually is challenging because of the
short duration of phones, so phonetic labels are usually generated
automatically using a lexicon. Labels produced this way may not agree
with the utterance, because continuous speech often undergoes phone
insertion, deletion, and substitution. It is therefore necessary to
correct the phonetic transcription before it is used in any speech
processing application.
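As a minimal sketch of the lexicon-based labeling described above (the lexicon entries and phone symbols here are illustrative, not the actual lexicon used in the work), canonical phone labels are obtained by expanding each word through a pronunciation dictionary:

```python
# Hypothetical pronunciation lexicon: word -> canonical phone sequence.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "signal": ["s", "ih", "g", "n", "ax", "l"],
}

def words_to_phones(words, lexicon=LEXICON):
    """Expand a word sequence into a flat canonical phone sequence.

    This canonical expansion may disagree with the actual utterance,
    since continuous speech inserts, deletes, and substitutes phones.
    """
    phones = []
    for w in words:
        phones.extend(lexicon[w.lower()])
    return phones

print(words_to_phones(["speech", "signal"]))
```

The mismatch between this canonical sequence and what was actually spoken is exactly what the proposed correction step targets.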
It is well established in the literature that vowels have longer durations than consonants. Hence any modification to vowels during articulation or transcription can hurt the performance of speech processing applications. The acoustic cues of poorly articulated or deleted vowels are inferred using group delay processing. This is then augmented with a multi-layer perceptron based silence-vowel-consonant (SVC) classifier and Viterbi forced alignment. Vowels that are present in the transcription but absent in the utterance, as evidenced by group delay processing and the SVC classifier, are removed from the transcription. The corrected transcripts are used to train a speech recognition system on the TIMIT corpus, yielding a 2.8% improvement in phone error rate (PER) over the baseline.
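The vowel-removal decision can be sketched as follows. This is only an illustration of the combination logic, assuming per-phone boolean evidence in place of the real group-delay peaks and MLP posteriors; the vowel set and function names are hypothetical:

```python
# Illustrative subset of vowel symbols (not the full TIMIT vowel inventory).
VOWELS = {"iy", "ih", "ax", "aa", "eh"}

def correct_transcription(phones, gd_vowel_present, svc_vowel_present):
    """Drop a transcribed vowel when both evidence sources say it is absent.

    phones            : canonical phone sequence from the lexicon
    gd_vowel_present  : per-phone flags from group delay processing (stand-in)
    svc_vowel_present : per-phone flags from the SVC classifier (stand-in)
    """
    corrected = []
    for ph, gd, svc in zip(phones, gd_vowel_present, svc_vowel_present):
        if ph in VOWELS and not gd and not svc:
            continue  # vowel absent in the utterance: remove from transcription
        corrected.append(ph)
    return corrected

# A deleted "ih" is removed only when both sources agree it is absent.
print(correct_transcription(["s", "ih", "g"],
                            [True, False, True],
                            [True, False, True]))
```

Requiring agreement between the two evidence sources before deleting a vowel is a conservative choice that guards against either detector's false negatives.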