Data scaling performance on various machine learning algorithms to identify abalone sex

Willdan Aprizal Arifin; Ishak Ariawan; Ayang Armelita Rosalia; Lukman Lukman; Nabila Tufailah

doi:10.14710/jtsiskom.2021.14105



Marine Information System, Universitas Pendidikan Indonesia, Jl. Dr. Setiabudi No.229, Isola, Sukasari, Bandung City, West Java 40154, Indonesia

Received: 15 Feb 2021; Revised: 22 Jul 2021; Accepted: 10 Aug 2021; Published: 31 Jan 2022.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Abstract

This study analyzes how data scaling affects the performance of machine learning algorithms in order to assess the effectiveness of the scaling step. It applies min-max (normalization) and zero-mean (standardization) scaling to the abalone dataset. The study scaled the physical measurement features of abalone and evaluated the models using 10-fold cross-validation. The scaled data were used with the following machine learning algorithms: Random Forest, Naïve Bayes, Decision Tree, and SVM (RBF and linear kernels). Results on the eight features of the abalone dataset show that data scaling did not strongly influence the machine learning algorithms. When data scaling was applied to the abalone dataset, SVM performance increased, while Random Forest performance decreased. Random Forest achieved the highest average balanced accuracy (74.87%) without data scaling.
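As a rough illustration of the two scaling techniques and the balanced accuracy metric named in the abstract, the following minimal sketch implements them with the Python standard library only. It is not the authors' code: the function names and the sample values are hypothetical, and the use of the population standard deviation for standardization is an assumption.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Min-max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def zero_mean_scale(values):
    """Zero-mean standardization: subtract the mean, divide by the std."""
    mu, sigma = mean(values), pstdev(values)  # population std (assumption)
    if sigma == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return mean(recalls)


# Hypothetical feature values, not taken from the abalone dataset.
lengths = [0.2, 0.4, 0.6, 0.8]
normalized = min_max_scale(lengths)
standardized = zero_mean_scale(lengths)
```

In a cross-validation setting, a common practice is to compute the scaling parameters (minimum, maximum, mean, standard deviation) on the training folds only and reuse them on the held-out fold. The abstract does not state how the scaling was fitted in this study.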


Keywords: data scaling; machine learning algorithms; min-max normalization; zero-mean standardization

Funding: Universitas Pendidikan Indonesia


Article Info

Section: Original Research Articles

Language: EN

In Volume 10, Issue 1, Year 2022 (January 2022)




