A proposed method for handling an imbalance data in classification of blood type based on Myers-Briggs type indicator

Ahmad Taufiq Akbar, Rochmat Husaini, Bagus Muhammad Akbar, Shoffan Saifullah

Abstract


The assumption that blood type is related to certain personality traits remains widespread. This study examines preprocessing methods for improving the accuracy of classifying blood type from Myers-Briggs Type Indicator (MBTI) data. The training and testing data comprise 250 records of MBTI questionnaire answers, one from each of 250 respondents. Classification uses the k-Nearest Neighbor (k-NN) algorithm. Without preprocessing, k-NN achieves only about 32% accuracy, so the data imbalance must be handled before classification. The proposed preprocessing consists of two stages: an unsupervised resample followed by a supervised resample. Validation uses ten-fold cross-validation. With the proposed preprocessing stages, the accuracy, F-score, and recall of k-NN classification increase significantly.
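The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how either resample works, so stage one is assumed here to be a class-agnostic bootstrap and stage two a class-aware resample that equalizes class sizes. The data are synthetic, the class counts and four numeric features are hypothetical, and the choices of k = 3 and of resampling only the training part of each fold are assumptions of this sketch.

```python
import random
from collections import Counter


def unsupervised_resample(rows, rng):
    """Stage 1 (assumed): bootstrap sample of the same size, ignoring labels."""
    return [rng.choice(rows) for _ in rows]


def supervised_resample(rows, rng):
    """Stage 2 (assumed): draw with replacement until all classes are equal in size."""
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append((features, label))
    per_class = max(len(group) for group in by_class.values())
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.choice(by_class[label]) for _ in range(per_class))
    return balanced


def knn_predict(train, query, k=3):
    """Majority vote among the k training rows closest in squared Euclidean distance."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(rows, folds=10, k=3, seed=0):
    """Ten-fold cross-validation; both resampling stages touch the training part only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    correct = 0
    for i in range(folds):
        test = shuffled[i::folds]
        train = [row for j, row in enumerate(shuffled) if j % folds != i]
        train = supervised_resample(unsupervised_resample(train, rng), rng)
        correct += sum(knn_predict(train, feats, k) == label for feats, label in test)
    return correct / len(shuffled)


def make_synthetic(seed=1):
    """Hypothetical imbalanced data set: 250 rows, 4 random features, 4 blood types."""
    rng = random.Random(seed)
    counts = {"O": 100, "B": 75, "A": 55, "AB": 20}  # hypothetical, not from the paper
    return [
        ([rng.random() for _ in range(4)], label)
        for label, n in counts.items()
        for _ in range(n)
    ]


data = make_synthetic()
balanced = supervised_resample(data, random.Random(0))
print(Counter(label for _, label in balanced))  # class sizes after stage 2
print(cross_validate(data))                     # accuracy on the synthetic data
```

Because the synthetic features are random, the accuracy printed here says nothing about the paper's results. Resampling inside each training fold keeps duplicated records out of the test fold; resampling the whole data set before splitting would let copies of a test record appear in the training data and would tend to inflate the measured accuracy.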

Keywords


imbalance data; blood type; resample; k-nearest neighbor; MBTI

DOI: https://doi.org/10.14710/jtsiskom.2020.13625

Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer

License URL: http://creativecommons.org/licenses/by-sa/4.0