Parameter Tuning in KNN for Software Defect Prediction: An Empirical Analysis

Modinat Abolore Mabayoje -  Department of Computer Science, University of Ilorin, Nigeria
*Abdullateef Oluwagbemiga Balogun -  Department of Computer Science, University of Ilorin, Nigeria
Hajarah Afor Jibril -  Department of Computer Science, University of Ilorin, Nigeria
Jelili Olaniyi Atoyebi -  Department of Computer Science and Engineering, Obafemi Awolowo University, Nigeria
Hammed Adeleye Mojeed -  Department of Computer Science, University of Ilorin, Nigeria
Victor Elijah Adeyemo -  Department of Computer Science, University of Ilorin, Nigeria
Received: 27 Jan 2019; Revised: 31 Jul 2019; Accepted: 10 Aug 2019; Published: 31 Oct 2019; Available online: 3 Oct 2019.
Open Access Copyright (c) 2019 Jurnal Teknologi dan Sistem Komputer

Software Defect Prediction (SDP) provides insights that help software teams allocate their limited resources when developing software systems. It predicts which modules are likely to be defective and helps teams avoid the pitfalls associated with such modules. However, these insights may be inaccurate and unreliable if the parameters of SDP models are not taken into consideration. In this study, the effect of parameter tuning on k nearest neighbor (k-NN) in SDP was investigated. More specifically, the study examined the impact of varying and selecting the optimal k value, the influence of distance weighting, and the impact of distance functions on k-NN. An experiment was designed to investigate this problem over 6 software defect datasets. The experimental results revealed that the k value should be greater than 1 (the default), as the average RMSE of k-NN when k > 1 (0.2727) is lower than when k = 1 (0.3296). In addition, the predictive performance of k-NN with distance weighting improved by 8.82% and 1.7% based on AUC and accuracy, respectively. In terms of the distance function, k-NN models based on the Dilca distance function performed better than those based on the Euclidean distance function (the default). Hence, we conclude that parameter tuning has a positive effect on the predictive performance of k-NN in SDP.
Keywords: Software Defect Prediction; Parameter Tuning; k Nearest Neighbor; Distance Function; Distance Weighting
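
The three parameters studied (the k value, distance weighting, and the distance function) can be illustrated with a minimal sketch. It is not the authors' experimental setup: the toy data, the function names, and the inverse-distance weighting scheme are assumptions made for illustration, and the Manhattan distance merely stands in as an alternative distance function because Dilca is not implemented here.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance, the default distance function named in the abstract."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance: an illustrative alternative only (this is not Dilca)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_defect_probability(train_X, train_y, query, k=1, weighted=False,
                           distance=euclidean):
    """Estimate P(defective) for `query` from its k nearest training modules."""
    neighbours = sorted(
        (distance(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for d, label in neighbours:
        # Assumed scheme: inverse-distance weighting, so closer neighbours count more.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return votes[1] / sum(votes.values())


def rmse(predicted, actual):
    """Root mean squared error between predicted probabilities and 0/1 labels."""
    return math.sqrt(
        sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(actual)
    )


# Hypothetical toy data: two metrics per module; label 1 = defective.
# The last training module is a deliberately noisy (mislabelled-looking) point.
TRAIN_X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (2.0, 2.0)]
TRAIN_Y = [0, 0, 0, 1, 1, 1, 1]
TEST_X = [(2.0, 1.8), (5.5, 5.2)]
TEST_Y = [0, 1]

# Compare a few parameter settings on the toy test modules.
for k, weighted, dist in [(1, False, euclidean), (3, False, euclidean),
                          (3, True, euclidean), (3, False, manhattan)]:
    probs = [knn_defect_probability(TRAIN_X, TRAIN_Y, q, k, weighted, dist)
             for q in TEST_X]
    print(f"k={k} weighted={weighted} distance={dist.__name__} "
          f"RMSE={rmse(probs, TEST_Y):.4f}")
```

On this toy data a single noisy neighbour dominates the prediction when k = 1, whereas k = 3 lets the surrounding modules outvote it. This mirrors only the direction of the reported RMSE result, not its magnitude, and the effect of weighting or of the distance function on real defect datasets has to be measured empirically, as the study does.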

