Tree-based homogeneous ensemble model with feature selection for diabetic retinopathy prediction

Tamunopriye Ene Dagogo-George, Hammed Adeleye Mojeed, Abdulateef Oluwagbemiga Balogun, Modinat Abolore Mabayoje, Shakirat Aderonke Salihu

Abstract


Diabetic Retinopathy (DR) is a condition that emerges from prolonged diabetes and causes severe damage to the eyes. Early diagnosis of this disease is imperative, as late diagnosis may be fatal. Existing studies employed machine learning approaches, with Support Vector Machines (SVM) having the highest performance in most analyses and Decision Trees (DT) the lowest. However, SVM is known to suffer from parameter and kernel selection problems, which undermine its predictive capability. Hence, this study presents homogeneous ensemble classification methods with DT as the base classifier to optimize predictive performance. Boosting and Bagging ensemble methods with feature selection were employed, and experiments were carried out using the Python scikit-learn library on DR datasets extracted from the UCI Machine Learning Repository. Experimental results showed that Bagged and Boosted DT were better than SVM. Specifically, Bagged DT achieved the best f-score and AUC (accuracy 65.38 %, f-score 0.664, AUC 0.731), followed by Boosted DT (accuracy 65.42 %, f-score 0.655, AUC 0.724), compared with SVM (accuracy 65.16 %, f-score 0.652, AUC 0.721). These results indicate that DT's predictive performance can be optimized by employing homogeneous ensemble methods to outperform SVM in predicting DR.
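The comparison described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' pipeline: the data are a synthetic stand-in for the UCI DR dataset, and the feature selection method (univariate ANOVA F-test via `SelectKBest`), the number of selected features, the ensemble sizes, the tree depths, and the 5-fold cross-validation are all assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data standing in for the UCI DR dataset.
X, y = make_classification(
    n_samples=400, n_features=19, n_informative=8, random_state=0
)


def with_selection(clf, k=10):
    """Scale, keep the k best features (assumed ANOVA F-test), then classify."""
    return make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), clf)


# Homogeneous ensembles: every base learner is a decision tree.
models = {
    "DT": with_selection(DecisionTreeClassifier(random_state=0)),
    "Bagged DT": with_selection(
        BaggingClassifier(
            DecisionTreeClassifier(random_state=0),
            n_estimators=25,
            random_state=0,
        )
    ),
    "Boosted DT": with_selection(
        AdaBoostClassifier(
            DecisionTreeClassifier(max_depth=1, random_state=0),
            n_estimators=25,
            random_state=0,
        )
    ),
    "SVM": with_selection(SVC(kernel="rbf", random_state=0)),
}

METRICS = ("accuracy", "f1", "roc_auc")


def evaluate(models, X, y, folds=5):
    """Return the mean cross-validated score per model and metric."""
    results = {}
    for name, model in models.items():
        cv = cross_validate(model, X, y, cv=folds, scoring=METRICS)
        results[name] = {m: float(cv[f"test_{m}"].mean()) for m in METRICS}
    return results


results = evaluate(models, X, y)
for name, scores in results.items():
    print(name, {m: round(v, 3) for m, v in scores.items()})
```

Placing the selector inside the pipeline means features are re-selected on each training fold, which avoids leaking test-fold information into the selection step. Scores on the synthetic data say nothing about the figures reported above.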

Keywords


machine learning; ensemble learning; diabetic retinopathy; decision trees

DOI: https://doi.org/10.14710/jtsiskom.2020.13669

Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer

License URL: http://creativecommons.org/licenses/by-sa/4.0