
Optimasi MWMOTE pada data tidak seimbang menggunakan complete linkage

MWMOTE optimization for imbalanced data using complete linkage

Department of Informatics, Institut Teknologi Sumatera, Jl. Ryacudu, Lampung Selatan 35365, Indonesia

Received: 11 May 2020; Revised: 7 Dec 2020; Accepted: 18 Jan 2021; Published: 30 Apr 2021; Available online: 20 Apr 2021.
Open Access Copyright (c) 2021 The Authors. Published by Department of Computer Engineering, Universitas Diponegoro
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
Imbalanced data can cause classification errors, including in MWMOTE, and can reduce its performance and accuracy. The clustering step in MWMOTE can be optimized to improve synthetic data generation and, in turn, MWMOTE performance. This study aims to optimize the MWMOTE algorithm's performance by using complete linkage (CL) in the clustering process that produces synthetic data. The datasets used cover a variety of data ratios to represent different degrees of imbalance. A decision tree was used to measure the performance of MWMOTE and CL-MWMOTE oversampling. In the evaluation, CL-MWMOTE performed better than MWMOTE, increasing precision, recall, F-measure, and accuracy by 0.53%, 0.67%, 0.66%, and 0.67%, respectively.
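As a rough illustration of the idea in the abstract, the following minimal Python sketch clusters minority-class samples with complete linkage and then creates synthetic samples by interpolating between two members of the same cluster. It is not the authors' CL-MWMOTE implementation: it leaves out MWMOTE's weighting of minority samples, and the function names, the distance threshold, and the toy data are all assumptions made for this example.

```python
import math
import random


def complete_linkage_clusters(points, threshold):
    """Merge the two closest clusters repeatedly until the smallest
    complete-linkage distance exceeds `threshold` (hypothetical parameter)."""
    clusters = [[i] for i in range(len(points))]

    def cl_distance(a, b):
        # Complete linkage: distance between the farthest pair of members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cl_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def oversample_within_clusters(points, clusters, n_new, rng):
    """Create synthetic points by linear interpolation between two
    members of the same cluster (uniform selection, no MWMOTE weights)."""
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choice(clusters)
        a = points[rng.choice(cluster)]
        b = points[rng.choice(cluster)]
        alpha = rng.random()
        synthetic.append(tuple(ai + alpha * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


# Toy minority-class samples forming two well-separated groups (made up).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9)]
clusters = complete_linkage_clusters(minority, threshold=1.0)
new_points = oversample_within_clusters(
    minority, clusters, n_new=4, rng=random.Random(0)
)
```

Because each synthetic sample is interpolated only between members of one cluster, it stays inside the region that cluster occupies rather than falling in the gap between distant minority groups.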

Note: This article has supplementary file(s).

Data Analysis: MWMOTE optimization for imbalanced data using complete linkage
Subject imbalanced; clustering; complete linkage; optimization; oversampling
Type Data Analysis
Keywords: imbalanced data; clustering; complete linkage; optimization; oversampling
Funding: Institut Teknologi Sumatera, Indonesia


