K-means-SMOTE untuk menangani ketidakseimbangan kelas dalam klasifikasi penyakit diabetes dengan C4.5, SVM, dan naive Bayes

K-means-SMOTE for handling class imbalance in the classification of diabetes with C4.5, SVM, and naive Bayes

1Universitas Bumigora, Indonesia

2Sekolah Tinggi Manajemen Informatika dan Komputer Lombok, Indonesia

Received: 6 Nov 2019; Revised: 10 Feb 2020; Accepted: 14 Feb 2020; Available online: 15 Feb 2020; Published: 30 Apr 2020.
Open Access. Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
Class imbalance in a dataset biases classification results toward the class with the most data (the majority class). A sampling method is needed to balance the minority class (the positive class) so that the class distribution becomes even, which leads to better classification results. This study addresses the class imbalance problem in the Pima Indian diabetes dataset using k-means-SMOTE. The dataset has 268 instances of the positive class (minority class) and 500 instances of the negative class (majority class). Classification was performed with C4.5, SVM, and naive Bayes, each combined with k-means-SMOTE for data sampling, and the results were compared. With k-means-SMOTE, SVM achieved the highest accuracy and sensitivity, 82% and 77% respectively, while naive Bayes produced the highest specificity, 89%.
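The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It shows how many synthetic positive instances a 1:1 balance would require given the reported class counts, the interpolation step that SMOTE-style oversamplers use to create a synthetic sample, and the usual confusion-matrix definitions of accuracy, sensitivity, and specificity. The function names and the confusion-matrix counts are hypothetical, and the k-means clustering step that selects where to oversample is omitted for brevity.

```python
import random

# Class counts reported in the abstract.
N_POSITIVE = 268  # minority class
N_NEGATIVE = 500  # majority class


def synthetic_needed(n_minority, n_majority):
    """Number of synthetic minority samples required for a 1:1 class ratio."""
    return max(n_majority - n_minority, 0)


def smote_interpolate(x, neighbor, rng):
    """SMOTE's core step: pick a random point on the line segment between a
    minority sample and one of its minority-class neighbors."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


def evaluate(tp, fn, tn, fp):
    """Accuracy, sensitivity (true positive rate), and specificity
    (true negative rate) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    print(synthetic_needed(N_POSITIVE, N_NEGATIVE))  # 232
    rng = random.Random(0)
    # Two illustrative 2-feature minority samples (made-up values).
    print(smote_interpolate([1.0, 2.0], [3.0, 6.0], rng))
    # Hypothetical counts for illustration only, not results from the study.
    print(evaluate(tp=40, fn=10, tn=45, fp=5))
```

In k-means-SMOTE, this interpolation is applied only inside clusters that contain a high share of minority samples, with more synthetic samples assigned to clusters where the minority class is sparse.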

Note: This article has a supplementary file: Pima Indian Diabetes Dataset (type: Data Set, 71KB).
Keywords: k-means-SMOTE; SMOTE; classification performance; class imbalance
Funding: Universitas Bumigora
