A proposed method for handling an imbalance data in classification of blood type based on Myers-Briggs type indicator

Department of Informatics, Universitas Pembangunan Nasional Veteran Yogyakarta, Indonesia

Received: 11 Jan 2020; Revised: 4 Sep 2020; Accepted: 11 Sep 2020; Available online: 16 Sep 2020; Published: 31 Oct 2020.
Open Access Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer under http://creativecommons.org/licenses/by-sa/4.0.

Abstract
Blood type is still widely assumed to be related to certain aspects of personality. This study examines preprocessing methods for improving the accuracy of classifying blood type from Myers-Briggs Type Indicator (MBTI) data. The training and testing data consist of 250 records of MBTI questionnaire answers given by 250 respondents. Classification uses the k-Nearest Neighbor (k-NN) algorithm. Without preprocessing, k-NN achieves only about 32% accuracy, so preprocessing is needed to handle the data imbalance before classification. The proposed preprocessing consists of two stages: the first is an unsupervised resample, and the second is a supervised resample. Validation uses ten-fold cross-validation. After applying the proposed preprocessing stages, k-NN classification shows significantly higher accuracy, F-score, and recall.
Keywords: imbalance data; blood type; resample; k-nearest neighbor; MBTI
Funding: Universitas Pembangunan Nasional Veteran Yogyakarta
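
The pipeline described in the abstract (two resampling stages, then k-NN evaluated with ten-fold cross-validation) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes the unsupervised stage is a uniform draw with replacement and the supervised stage draws with replacement until every class has the same size. The class sizes, the four random numeric features, the value k=3, and all function names are hypothetical stand-ins, since the abstract does not give them.

```python
import random
from collections import Counter

def resample_unsupervised(rows, rng):
    """Stage 1 (assumed): draw len(rows) rows uniformly with replacement."""
    return [rng.choice(rows) for _ in rows]

def resample_supervised(rows, rng):
    """Stage 2 (assumed): draw with replacement so every class has equal size."""
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append((features, label))
    per_class = len(rows) // len(by_class)
    out = []
    for label in sorted(by_class):
        out.extend(rng.choice(by_class[label]) for _ in range(per_class))
    rng.shuffle(out)
    return out

def knn_predict(train, query, k):
    """Majority vote among the k training rows nearest to query (squared Euclidean)."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(rows, k=3, folds=10):
    """Plain k-fold cross-validation: fold i holds out rows i, i+folds, i+2*folds, ..."""
    correct = 0
    for i in range(folds):
        test = rows[i::folds]
        train = [row for j, row in enumerate(rows) if j % folds != i]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)

# Hypothetical imbalanced data: 250 rows, four blood types, random features.
rng = random.Random(0)
sizes = {"O": 120, "A": 70, "B": 45, "AB": 15}
data = [([rng.random() for _ in range(4)], label)
        for label, n in sizes.items() for _ in range(n)]
rng.shuffle(data)

balanced = resample_supervised(resample_unsupervised(data, rng), rng)
print("class counts after resampling:", Counter(label for _, label in balanced))
print("accuracy without preprocessing:", cross_val_accuracy(data))
print("accuracy after resampling:", cross_val_accuracy(balanced))
```

Because the features here are random, the printed accuracies say nothing about real MBTI data; the sketch only shows the order of the steps. Note that resampling with replacement before cross-validation places copies of the same record in both training and test folds, which is one reason accuracy can rise sharply after this kind of preprocessing.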
