
Data scaling performance on various machine learning algorithms to identify abalone sex

Marine Information System, Universitas Pendidikan Indonesia, Jl. Dr. Setiabudi No.229, Isola, Sukasari, Bandung City, West Java 40154, Indonesia

Received: 15 Feb 2021; Revised: 22 Jul 2021; Accepted: 10 Aug 2021; Published: 31 Jan 2022.
Open Access. Copyright (c) 2022 The authors. Published by Department of Computer Engineering, Universitas Diponegoro.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
This study analyzes how data scaling affects the performance of machine learning algorithms, in order to assess the effectiveness of scaling. Two scaling techniques, min-max (normalization) and zero-mean (standardization), were applied to the abalone dataset. The stages of the study included scaling the abalone physical measurement features. Models were evaluated using k-fold cross-validation with k = 10. The scaled abalone data were used with the following machine learning algorithms: Random Forest, Naïve Bayes, Decision Tree, and SVM (RBF and linear kernels). Results on the eight features of the abalone dataset show that data scaling did not strongly influence the machine learning algorithms. SVM performance increased, while Random Forest performance decreased, when data scaling was applied to the abalone dataset. Random Forest achieved the highest average balanced accuracy (74.87%) without data scaling.
Keywords: data scaling; machine learning algorithms; min-max normalization; zero-mean standardization
Funding: Universitas Pendidikan Indonesia
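
The two scaling techniques and the evaluation metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the population standard deviation, and the sample values are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Min-max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def zero_mean_scale(values):
    """Zero-mean standardization: subtract the mean, divide by the std. dev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy: the unweighted mean of per-class recall."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return mean(recalls)


# Hypothetical feature values, not taken from the abalone dataset.
lengths = [0.2, 0.4, 0.6, 0.8]
normalized = min_max_scale(lengths)
standardized = zero_mean_scale(lengths)
```

Both transformations are applied per feature. In a cross-validation setting, the scaling parameters (minimum and maximum, or mean and standard deviation) are usually estimated on the training folds only and then reused on the held-out fold. Whether the study followed this practice is not stated in the abstract.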


