K-means-SMOTE untuk menangani ketidakseimbangan kelas dalam klasifikasi penyakit diabetes dengan C4.5, SVM, dan naive Bayes

Hairani Hairani; Khurniawan Eko Saputro; Sofiansyah Fadli

doi:10.14710/jtsiskom.8.2.2020.89-93


K-means-SMOTE for handling class imbalance in the classification of diabetes with C4.5, SVM, and naive Bayes

Hairani Hairani¹, Khurniawan Eko Saputro¹, Sofiansyah Fadli²

¹Universitas Bumigora, Indonesia

²Sekolah Tinggi Manajemen Informatika dan Komputer Lombok, Indonesia

Received: 6 Nov 2019; Revised: 10 Feb 2020; Accepted: 14 Feb 2020; Available online: 15 Feb 2020; Published: 30 Apr 2020.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Abstract

Class imbalance in a dataset biases classification results toward the class with the most data (the majority class). A sampling method is needed to balance the minority class (positive class) so that the class distribution becomes balanced, which leads to better classification results. This study addresses the class imbalance problem in the Pima Indian diabetes dataset using k-means-SMOTE. The dataset has 268 instances of the positive class (minority class) and 500 instances of the negative class (majority class). C4.5, SVM, and naive Bayes classifiers were compared, with k-means-SMOTE applied for data sampling. With k-means-SMOTE, SVM achieved the highest accuracy and sensitivity, 82% and 77% respectively, while naive Bayes produced the highest specificity, 89%.
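To make the abstract's metrics concrete, the following minimal sketch computes accuracy, sensitivity, and specificity from confusion-matrix counts, using only the class counts stated above (268 positive, 500 negative). The function name `confusion_metrics`, the always-negative baseline, and the assumption that oversampling targets a fully balanced 1:1 distribution are illustrative and not taken from the article.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Class counts as stated in the abstract.
N_POSITIVE = 268  # minority class
N_NEGATIVE = 500  # majority class

# Hypothetical baseline (an assumption, not a model from the article):
# a classifier that always predicts the majority (negative) class.
baseline = confusion_metrics(tp=0, fn=N_POSITIVE, tn=N_NEGATIVE, fp=0)

# Majority-to-minority ratio of the original data.
imbalance_ratio = N_NEGATIVE / N_POSITIVE

# Synthetic minority samples needed for a 1:1 class distribution.
# Assumption: the abstract does not state the target ratio actually used.
synthetic_needed = N_NEGATIVE - N_POSITIVE

if __name__ == "__main__":
    acc, sens, spec = baseline
    print(f"baseline accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
    print(f"imbalance ratio={imbalance_ratio:.2f}, synthetic samples needed={synthetic_needed}")
```

Under these assumptions, the always-negative baseline reaches roughly 65% accuracy while detecting no positive cases at all, which is why sensitivity and specificity are reported alongside accuracy.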

Note: This article has a supplementary file: Pima Indian Diabetes Dataset (data set, 71 KB).

Keywords: k-means-SMOTE; SMOTE; classification performance; class imbalance

Funding: Universitas Bumigora


Article Info

Section: Original Research Articles

Language: Indonesian (ID)

In Volume 8, Issue 2, Year 2020 (April 2020)




