Predictive Modeling of Software Defects Using Ensemble Machine Learning Techniques and Feature Extraction from Static Code Metrics and Version Control Histories
Keywords:
Software Defect Prediction, Ensemble Learning, Static Code Metrics, Version Control, Machine Learning, Random Forest, Gradient Boosting
Abstract
In software engineering, predicting defects early in the development lifecycle is essential to improving code quality, reducing maintenance costs, and enhancing software reliability. This study investigates the use of ensemble machine learning techniques to build predictive models for software defect detection, leveraging features extracted from both static code metrics and version control histories. By integrating multiple sources of data, we enhance the predictive capacity of models beyond what traditional defect prediction approaches offer. Our empirical evaluation, conducted on open-source software projects, demonstrates that ensemble models, particularly Random Forest and Gradient Boosting Machines, outperform individual learners in terms of precision, recall, and F1-score. The study provides a framework for early defect prediction that can be integrated into modern DevOps pipelines to proactively manage software quality.
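The evaluation described above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the authors' pipeline: the per-module features are synthesized here as stand-ins for the static code metrics and version-control-history features the study extracts, and the single decision tree serves as the individual-learner baseline against which the two ensembles are scored on precision, recall, and F1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

# Synthetic stand-in for per-module features (hypothetical): columns would
# correspond to static metrics (e.g. cyclomatic complexity, LOC) and
# change-history metrics (e.g. commit count, author churn). Defective
# modules are the minority class, hence the skewed class weights.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           weights=[0.8, 0.2], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=42),  # single-learner baseline
    "random_forest": RandomForestClassifier(random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    p, r, f1, _ = precision_recall_fscore_support(
        y_te, model.predict(X_te), average="binary", zero_division=0)
    scores[name] = {"precision": p, "recall": r, "f1": f1}
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

In a real setting the synthetic matrix would be replaced by features mined from the repository (static analysis output joined with `git log` statistics per file or module), and cross-validation would replace the single train/test split.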
License
Copyright (c) 2021 Sheela Manikandan (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.