K-means-SMOTE untuk menangani ketidakseimbangan kelas dalam klasifikasi penyakit diabetes dengan C4.5, SVM, dan naive Bayes

K-means-SMOTE for handling class imbalance in the classification of diabetes with C4.5, SVM, and naive Bayes

*Hairani Hairani - Universitas Bumigora, Indonesia
Khurniawan Eko Saputro - Universitas Bumigora, Indonesia
Sofiansyah Fadli - Sekolah Tinggi Manajemen Informatika dan Komputer Lombok, Indonesia
Received: 6 Nov 2019; Revised: 10 Feb 2020; Accepted: 14 Feb 2020; Published: 30 Apr 2020; Available online: 15 Feb 2020.
DOI: https://doi.org/10.14710/jtsiskom.8.2.2020.89-93
Supplementary file: Pima Indian Diabetes Dataset (type: data set)
Open Access Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Article Info
Section: Original Research Articles
Language: ID
Abstract
Class imbalance in a dataset biases classification results toward the class with the largest amount of data (the majority class). A sampling method is needed to balance the minority class (positive class) against the majority class so that the class distribution becomes balanced, which leads to better classification results. This study addresses the class imbalance problem in the Pima Indian Diabetes dataset using k-means-SMOTE. The dataset has 268 instances of the positive class (minority class) and 500 instances of the negative class (majority class). Classification was performed with C4.5, SVM, and naive Bayes, each applied to data resampled with k-means-SMOTE, and the three methods were compared. With k-means-SMOTE, SVM achieved the highest accuracy and sensitivity, at 82% and 77% respectively, while naive Bayes produced the highest specificity, at 89%.
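
To make the quantities in the abstract concrete, the following minimal Python sketch computes the class distribution from the reported counts (268 positive, 500 negative) and shows the standard confusion-matrix definitions of accuracy, sensitivity, and specificity. It rests on assumptions: the 1:1 balancing target and the function names `class_balance` and `confusion_metrics` are illustrative, the confusion-matrix counts are hypothetical, and none of it is taken from the authors' implementation.

```python
def class_balance(n_pos, n_neg):
    """Summarize class imbalance for a binary dataset.

    Assumes the resampling target is a 1:1 class distribution, so the
    number of synthetic minority samples is the gap between the classes.
    """
    total = n_pos + n_neg
    minority, majority = min(n_pos, n_neg), max(n_pos, n_neg)
    return {
        "minority_share": minority / total,
        "imbalance_ratio": majority / minority,
        "synthetic_needed": majority - minority,
    }


def confusion_metrics(tp, fn, tn, fp):
    """Standard definitions: sensitivity is the true positive rate,
    specificity is the true negative rate."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Class counts reported in the abstract.
balance = class_balance(n_pos=268, n_neg=500)
print(balance["synthetic_needed"])          # 232
print(round(balance["minority_share"], 3))  # 0.349

# Hypothetical confusion-matrix counts, for illustration only (not results
# from the study).
example = confusion_metrics(tp=40, fn=10, tn=45, fp=5)
print(example)
```

Under the 1:1 assumption, an oversampler such as k-means-SMOTE would need to generate 232 synthetic positive instances to match the 500 negative ones. Sensitivity and specificity are reported separately from accuracy because, on imbalanced data, a classifier can reach high accuracy while still missing many minority-class cases.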

Note: This article has supplementary file(s).

Keywords: k-means-SMOTE; SMOTE; classification performance; class imbalance

