Title | : | Ameliorating the need for labeled training data for short text analytics using label semantics |
Speaker | : | Sangameshwar Suryakant Patil (IITM) |
Details | : | Mon, 7 Nov, 2016 2:00 PM @ BSB 361 |
Abstract | : | Limited availability of labeled training data is a major challenge for machine learning. In many real-life applications, a labeled training dataset is either unavailable or too small for supervised learning techniques. Creating a sufficient amount of (expert) labeled training data is typically expensive or infeasible, which renders supervised machine learning methods impractical. For instance, in the case of short (and typically noisy) text analytics, the amount of labeled training data available is limited, and supervised learning models trained on news-wire quality text do not perform well. The traditional supervised learning approach (and variants such as semi-supervised learning) aims to learn a mapping from the feature space to the target label space. Another shortcoming of this approach is that it treats labels as symbols without any intrinsic meaning. However, the target labels often have meaning grounded in human cognition and carry inherent information content. This is especially apparent in the case of text data. We propose to use label semantics, i.e., the information and meaning inherent in the target labels, to tackle the challenge of limited labeled training data in real-life applications. We show the effectiveness of label semantics for different NLP tasks, such as short text classification and named entity extraction. We validate the proposed approach with applications in multiple domains, such as survey analytics, the employee appraisal process, agriculture, and automobile repair data. |