
Parameter tuning in KNN for software defect prediction: an empirical analysis

1Department of Computer Science, University of Ilorin, Nigeria

2Department of Computer Science and Engineering, Obafemi Awolowo University, Nigeria

Received: 27 Jan 2019; Revised: 31 Jul 2019; Accepted: 10 Aug 2019; Available online: 3 Oct 2019; Published: 31 Oct 2019.
Open Access. Copyright (c) 2019 Jurnal Teknologi dan Sistem Komputer
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
Software Defect Prediction (SDP) provides insights that can help software teams allocate their limited resources when developing software systems. It predicts likely defective modules and helps avoid the pitfalls associated with such modules. However, these insights may be inaccurate and unreliable if the parameters of SDP models are not taken into consideration. In this study, the effect of parameter tuning on the k-nearest neighbor (k-NN) classifier in SDP was investigated. More specifically, the study examined the impact of varying and selecting the optimal k value, the influence of distance weighting, and the impact of distance functions on k-NN. An experiment was designed to investigate this problem in SDP over 6 software defect datasets. The experimental results revealed that the k value should be greater than 1 (the default), as the average RMSE of k-NN when k > 1 (0.2727) is lower than when k = 1 (0.3296). In addition, the predictive performance of k-NN with distance weighting improved by 8.82% and 1.7% based on AUC and accuracy, respectively. In terms of the distance function, k-NN models based on the Dilca distance function performed better than those based on the Euclidean distance function (the default). Hence, we conclude that parameter tuning has a positive effect on the predictive performance of k-NN in SDP.
Keywords: software defect prediction; parameter tuning; k-nearest neighbor; distance function; distance weighting
Funding: University of Ilorin; Obafemi Awolowo University
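
The three parameters studied in the abstract (the k value, distance weighting, and the distance function) can be illustrated with a minimal pure-Python sketch. This is not the authors' experimental setup: the function names (`knn_predict`, `euclidean`, `manhattan`), the toy module metrics, and the inverse-distance weighting scheme are all assumptions chosen for illustration. The Dilca distance is not implemented here, so Manhattan distance stands in as an example of an alternative distance function.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance, used here only as an example alternative."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k=1, weighted=False, distance=euclidean):
    """Classify `query` by a vote among its k nearest training points.

    train    -- list of (feature_vector, label) pairs
    weighted -- if True, each neighbor votes with weight 1/distance
                (an assumed weighting scheme); otherwise each votes 1
    distance -- callable taking two feature vectors
    """
    neighbors = sorted(
        ((distance(features, query), label) for features, label in train),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        # Small constant avoids division by zero for exact matches.
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical two-metric modules; the fourth point is a noisy
# "defective" module sitting inside the "clean" cluster.
train = [
    ((1.0, 1.0), "clean"),
    ((1.0, 2.0), "clean"),
    ((2.0, 1.0), "clean"),
    ((1.5, 1.5), "defective"),
    ((4.0, 4.0), "defective"),
    ((4.0, 5.0), "defective"),
]
query = (1.4, 1.4)

print(knn_predict(train, query, k=1))
print(knn_predict(train, query, k=3))
print(knn_predict(train, query, k=3, weighted=True))
print(knn_predict(train, query, k=3, distance=manhattan))
```

On this toy data, k = 1 follows the single noisy neighbor, k = 3 with uniform votes is outvoted by the surrounding clean modules, and inverse-distance weighting again lets the very close neighbor dominate. The example only shows that each parameter can change the prediction; it says nothing about which setting performs better on real defect datasets.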


