Perbandingan Metode Ensemble Machine Learning untuk Klasifikasi Tenaga Kerja di Indonesia dengan Random Forest, XGBoost, dan CatBoost

Bayu Dwi Kurniawan; Arie Wahyu Wijayanto

doi:10.14710/jtsiskom.2022.14031

Bayu Dwi Kurniawan¹, Arie Wahyu Wijayanto²

¹Politeknik Statistika STIS, Indonesia

²Politeknik Statistika STIS, Jl. Otto Iskandardinata 64C Jakarta Timur, Indonesia

Received: 28 Dec 2020; Published: 24 Sep 2024.

Abstract

The National Labor Force Survey (Survei Angkatan Kerja Nasional, Sakernas) is a large periodic survey, so it requires complex data processing and proper validation to maintain data quality. One Sakernas question that is still filled in and validated manually is the main field of employment (lapangan pekerjaan utama). To support validation, machine learning can be applied using the information in the other questionnaire fields. This study uses Random Forest, XGBoost, and CatBoost to classify the main field of employment in the August 2019 Sakernas. The three models perform almost equally well in precision, recall, and F1-score: above 90% for the primary and tertiary sectors and about 80% for the secondary sector. The Random Forest, XGBoost, and CatBoost models reach accuracies of 91.80%, 90.88%, and 91.84%, respectively. The Area Under Curve (AUC) values of all three models are relatively high, with CatBoost obtaining the highest values for the primary, secondary, and tertiary sectors at 1.00, 0.97, and 0.98, respectively.
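To make the reported metrics concrete, the sketch below shows how per-sector precision, recall, and F1, overall accuracy, and a one-vs-rest AUC can be computed for a three-class problem. It is a minimal illustration, not the authors' implementation: the sector labels, function names, and toy data are hypothetical, and the study most likely relied on library routines rather than hand-written code.

```python
from typing import Dict, Sequence

# Hypothetical labels for the three sectors named in the abstract.
SECTORS = ["primary", "secondary", "tertiary"]


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1 for each sector, treating it as the positive class."""
    metrics = {}
    for sector in SECTORS:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == sector and p == sector for t, p in pairs)
        fp = sum(t != sector and p == sector for t, p in pairs)
        fn = sum(t == sector and p != sector for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[sector] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of records whose predicted sector matches the true sector."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest_auc(y_true: Sequence[str], scores: Sequence[float], positive: str) -> float:
    """AUC for one sector against the rest.

    Computed as the probability that a randomly chosen positive record
    receives a higher score than a randomly chosen negative one
    (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration only, not Sakernas records).
y_true = ["primary", "primary", "secondary", "secondary",
          "tertiary", "tertiary", "tertiary", "primary"]
y_pred = ["primary", "primary", "secondary", "tertiary",
          "tertiary", "tertiary", "secondary", "primary"]
# Hypothetical model scores for the "secondary" class.
secondary_scores = [0.1, 0.2, 0.8, 0.4, 0.3, 0.1, 0.6, 0.05]

report = per_class_metrics(y_true, y_pred)
overall_accuracy = accuracy(y_true, y_pred)
secondary_auc = one_vs_rest_auc(y_true, secondary_scores, "secondary")
```

Reporting the metrics per sector, as the abstract does, matters because a single accuracy figure can hide a weaker class: in the toy data the overall accuracy is driven by the primary and tertiary records, while the secondary sector scores noticeably lower.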

Note: This article has supplementary file(s).

Supplementary files:

Deskripsi Atribut Penelitian (Description of Research Attributes). Type: Research Instrument (15KB)
Perjanjian Pengalihan Hak Cipta (Copyright Transfer Agreement). Type: Other (364KB)

Keywords: sakernas; random forest; xgboost; catboost

Article Info

Section: Original Research Articles

Language : ID

In [IN PRESS] Volume 10, Issue 4, Year 2022 (October 2022)

Starting from 2021, the author(s) whose article is published in the JTSiskom journal attain the copyright for their article and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. By submitting the manuscript to JTSiskom, the author(s) agree with this policy. No special document approval is required.
