A proposed method for handling an imbalance data in classification of blood type based on Myers-Briggs type indicator

*Ahmad Taufiq Akbar  -  Department of Informatics, Universitas Pembangunan Nasional Veteran Yogyakarta, Indonesia
Rochmat Husaini  -  Department of Informatics, Universitas Pembangunan Nasional Veteran Yogyakarta, Indonesia
Bagus Muhammad Akbar  -  Department of Informatics, Universitas Pembangunan Nasional Veteran Yogyakarta, Indonesia
Shoffan Saifullah  -  Department of Informatics, Universitas Pembangunan Nasional Veteran Yogyakarta, Indonesia
Received: 11 Jan 2020; Revised: 4 Sep 2020; Accepted: 11 Sep 2020; Published: 31 Oct 2020; Available online: 16 Sep 2020.
Open Access. Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer, licensed under http://creativecommons.org/licenses/by-sa/4.0.

Article Info
Section: Original Research Articles
Language: EN
Abstract
Blood type is still widely assumed to be related to certain aspects of personality. This study examines preprocessing methods for improving the accuracy of classifying blood type from Myers-Briggs Type Indicator (MBTI) data. The training and testing data consist of 250 records, the MBTI questionnaire answers of 250 respondents. Classification uses the k-Nearest Neighbor (k-NN) algorithm. Without preprocessing, k-NN reaches only about 32% accuracy, so the data imbalance must be handled before classification. The proposed preprocessing has two stages: an unsupervised resample followed by a supervised resample. Validation uses 10-fold cross-validation. With the proposed preprocessing, the accuracy, F-score, and recall of k-NN classification increase significantly.
Keywords: imbalance data; blood type; resample; k-nearest neighbor; MBTI
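
The pipeline summarized in the abstract (an unsupervised resample, then a supervised resample, then k-NN scored with 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that the unsupervised stage samples rows with replacement without looking at the class label, and that the supervised stage samples with replacement toward an equal share per class. The toy data, the class sizes, the four binary features, the Hamming distance, k = 3, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def unsupervised_resample(rows, size, rng):
    """Stage 1 (assumed): draw rows with replacement, ignoring the class label."""
    return [rng.choice(rows) for _ in range(size)]

def supervised_resample(rows, size, rng):
    """Stage 2 (assumed): draw with replacement so each class gets an equal share."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-1]].append(row)      # last element is the class label
    per_class = size // len(by_class)
    out = []
    for label in sorted(by_class):
        out.extend(rng.choice(by_class[label]) for _ in range(per_class))
    rng.shuffle(out)
    return out

def knn_predict(train, features, k=3):
    """Majority vote among the k training rows closest in Hamming distance."""
    nearest = sorted(
        train, key=lambda r: sum(a != b for a, b in zip(r[:-1], features))
    )[:k]
    return Counter(r[-1] for r in nearest).most_common(1)[0][0]

def cross_val_accuracy(rows, folds=10, k=3):
    """Plain k-fold cross-validation; fold i holds rows i, i+folds, i+2*folds, ..."""
    correct = 0
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        correct += sum(knn_predict(train, r[:-1], k) == r[-1] for r in test)
    return correct / len(rows)

rng = random.Random(0)

# Hypothetical imbalanced data: 250 rows, four binary features, blood-type label.
class_sizes = {"O": 100, "A": 70, "B": 60, "AB": 20}   # made-up class sizes
rows = [
    tuple(rng.randint(0, 1) for _ in range(4)) + (label,)
    for label, n in class_sizes.items()
    for _ in range(n)
]
rng.shuffle(rows)

stage1 = unsupervised_resample(rows, len(rows), rng)
balanced = supervised_resample(stage1, len(stage1), rng)
acc = cross_val_accuracy(balanced)

print("before:", Counter(r[-1] for r in rows))
print("after: ", Counter(r[-1] for r in balanced))
print("10-fold accuracy on resampled toy data:", round(acc, 3))
```

Because the toy features are random, the printed accuracy says nothing about the study's results; the sketch only shows how the two resampling stages change the class distribution before classification.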