Title | : | Concept-based Classification of Software Defect Reports |
Speaker | : | Sangameshwar Suryakant Patil (IITM) |
Details | : | Mon, 12 Feb, 2018 2:00 PM @ A M Turing Hall |
Abstract | : | The textual description in a software defect report is essential for understanding the defect and for classifying it under a given defect classification scheme. Automatically identifying the defect type from the textual description can significantly reduce defect analysis time and improve the overall defect management process. Standard supervised machine learning and its variants need a significant amount of labeled training data to build a predictive model for defect type classification. Such labeled datasets are typically created by humans with domain knowledge and expertise, which makes the activity both effort-intensive and expensive. We propose using Explicit Semantic Analysis (ESA) to carry out concept-based classification of software defect reports. We compute the 'semantic' similarity between each defect type label and the defect report in a concept space spanned by Wikipedia articles, and then assign the defect type that has the highest similarity with the defect report. This approach circumvents the dependence on labeled training data. We further propose modifications to the original ESA approach and achieve accuracy comparable to the state-of-the-art semi-supervised and active-learning-based approach. Experimental results show that concept-based classification is a viable approach to software defect classification that avoids the expensive process of creating labeled training data. To the best of our knowledge, this is the first use of Wikipedia and ESA for the software defect classification problem. |
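
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation. It assumes a toy concept space of three hand-written "articles" in place of Wikipedia, a plain TF-IDF weighting, and hypothetical defect type labels with short descriptions. All names here (`CONCEPTS`, `LABELS`, `REPORT`, `build_index`, `concept_vector`, `classify`) are invented for illustration, and the modifications to ESA mentioned in the abstract are not modeled.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for Wikipedia: concept title -> article text.
CONCEPTS = {
    "Memory management": "memory allocation heap pointer leak buffer free",
    "User interface": "interface button screen display layout window click",
    "Database": "database query table transaction index record",
}

# Hypothetical defect type labels, each with a short textual description.
LABELS = {
    "Memory": "memory leak pointer",
    "Interface": "interface display",
    "Data": "database query",
}

REPORT = "Application crashes because of a heap buffer leak after allocation."


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(concepts):
    """Build an inverted index: term -> {concept: TF-IDF weight}."""
    n = len(concepts)
    tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for concept, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term]) + 1.0
            index.setdefault(term, {})[concept] = freq * idf
    return index


def concept_vector(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = Counter()
    for term, freq in Counter(tokenize(text)).items():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += freq * weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(report, labels, index):
    """Return the label most similar to the report in concept space."""
    report_vec = concept_vector(report, index)
    scores = {
        label: cosine(concept_vector(desc, index), report_vec)
        for label, desc in labels.items()
    }
    return max(scores, key=scores.get), scores


index = build_index(CONCEPTS)
predicted, scores = classify(REPORT, LABELS, index)
print(predicted, scores)
```

No labeled defect reports are used anywhere in this sketch: the only inputs are the concept articles, the label descriptions, and the report itself, which is the property the abstract highlights.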