Decision tree algorithm for multi-label hate speech and abusive language detection in Indonesian Twitter

Original title (Indonesian): Algoritme decision tree untuk mendeteksi ujaran kebencian dan bahasa kasar multilabel pada Twitter berbahasa Indonesia

Fauzi Ihsan, Iwan Iskandar, Nazruddin Safaat Harahap, Surya Agustian

DOI: https://doi.org/10.14710/jtsiskom.2021.13907

Department of Informatics, UIN Sultan Syarif Kasim Riau. Jl. H.R. Soebrantas km 11.5 Simpang Baru Panam, Pekanbaru, Riau 28293, Indonesia

Received: 7 Sep 2020; Revised: 4 Jun 2021; Accepted: 8 Aug 2021; Published: 31 Oct 2021.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Abstract

Hate speech and abusive language are easily found in written communication on social media such as Twitter. They often cause disputes between the parties involved: the victims and the original authors of the tweets. Moreover, for people who take sides, it is difficult to judge whether a tweet contains hate speech, abusive language, or both. This research develops a method that classifies tweets as abusive, as containing hate speech, or both. When hate speech is detected, the system also determines the level of the hate speech. The dataset consists of 13,126 real tweets. Word embeddings are used to represent the input text as features, and the tweets are classified with a Decision Tree algorithm. Feature engineering and parameter tuning improved the classification of the three targets: hate speech, abusive language, and hate speech level. Of the features tested, the lexicon feature gives the Decision Tree its highest accuracy on all three targets, outperforming the engineered special features and the textual features. Averaged over the three targets, accuracy increased from 69.77 % to 70.48 % with a 90:10 training-testing split and from 69.35 % to 69.54 % with an 80:20 split.
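The pipeline summarised above (word-embedding features, a lexicon feature, and a Decision Tree predicting three targets) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the two-dimensional toy embeddings, the `LEXICON` word set, the six example tweets and their labels, the 0/1/2 encoding of hate speech level, averaging word vectors into one tweet vector, training one Gini-based tree per target, and `max_depth=3` are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical 2-d word vectors; a real system would load pretrained
# embeddings (e.g. word2vec) rather than these made-up numbers.
EMBEDDINGS = {
    "bodoh": [0.9, 0.1], "tolol": [0.8, 0.3],        # insults
    "benci": [0.2, 0.9], "usir": [0.1, 0.8],         # hostile words
    "selamat": [-0.7, -0.2], "pagi": [-0.5, -0.4],   # neutral greeting
}
DIM = 2
LEXICON = {"bodoh", "tolol"}  # hypothetical abusive-word lexicon
TARGETS = ("hate_speech", "abusive", "hs_level")

# Hypothetical toy training set; hs_level is encoded 0 = none, 1 = weak, 2 = strong.
TRAIN_TWEETS = ["selamat pagi", "bodoh tolol", "benci usir",
                "benci bodoh usir", "pagi", "tolol"]
TRAIN_LABELS = [dict(zip(TARGETS, row)) for row in
                [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 2), (0, 0, 0), (0, 1, 0)]]


def features(tweet):
    """Mean vector of known tokens, plus the count of lexicon words."""
    tokens = tweet.lower().split()
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if vectors:
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]
    else:
        mean = [0.0] * DIM  # no known words: zero vector
    return mean + [sum(t in LEXICON for t in tokens)]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def build_tree(X, y, max_depth=3):
    """Grow a CART-style binary tree; leaves store the majority label."""
    split = best_split(X, y) if max_depth > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1))


def predict(tree, x):
    """Follow internal nodes (f, thr, left, right) down to a leaf label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def train_multilabel(tweets, labels, max_depth=3):
    """Fit one tree per target (a binary-relevance style decomposition)."""
    X = [features(t) for t in tweets]
    return {name: build_tree(X, [lab[name] for lab in labels], max_depth)
            for name in TARGETS}


def predict_all(models, tweet):
    """Predict every target for a single tweet."""
    x = features(tweet)
    return {name: predict(models[name], x) for name in TARGETS}


def average_accuracy(models, tweets, labels):
    """Mean of the per-target accuracies (one reading of the averaged figures)."""
    X = [features(t) for t in tweets]
    accs = [sum(predict(models[name], x) == lab[name] for x, lab in zip(X, labels))
            / len(labels) for name in TARGETS]
    return sum(accs) / len(accs)


if __name__ == "__main__":
    models = train_multilabel(TRAIN_TWEETS, TRAIN_LABELS)
    print(average_accuracy(models, TRAIN_TWEETS, TRAIN_LABELS))  # 1.0 on the toy training set
    print(predict_all(models, "bodoh"))  # {'hate_speech': 0, 'abusive': 1, 'hs_level': 0}
```

On the real 13,126-tweet dataset, the same functions would be trained on the 90:10 or 80:20 training portion and `average_accuracy` evaluated on the held-out portion. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally replace the hand-written tree; it is spelled out here only to make the split criterion visible.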


Keywords: hate speech; abusive language; decision tree; Twitter; word embeddings

Funding: UIN Sultan Syarif Kasim Riau


Article Info

Section: Original Research Articles

Language: Indonesian (ID)

In Jurnal Teknologi dan Sistem Komputer (JTSiskom), Volume 9, Issue 4, 2021 (October 2021)





Starting from 2021, authors whose articles are published in JTSiskom retain the copyright to their articles and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. By submitting a manuscript to JTSiskom, the author(s) agree to this policy. No separate approval document is required.

The author(s) guarantee that:

their article is original and written by the named author(s),
has never been published before,
does not contain statements that violate the law,
does not violate the rights of others and is subject to copyright held exclusively by the author(s), free from any third-party rights, and
is accompanied by the written permission, obtained by the author(s), needed to quote from other sources.

The author(s) retain all rights to the published work, such as (but not limited to) the following rights:

Copyright and other proprietary rights related to the article, such as patents,
The right to use the substance of the article in their own future works, including lectures and books,
The right to reproduce the article for their own purposes,
The right to archive all versions of the article in any repository, and
The right to enter into separate additional contractual arrangements for the non-exclusive distribution of published versions of the article (for example, posting them to institutional repositories or publishing them in a book), acknowledging its initial publication in this journal (Jurnal Teknologi dan Sistem Komputer).

If the article was prepared jointly by more than one author, each author submitting the manuscript warrants that all co-authors have given permission to agree to the copyright and license notices (agreements) on their behalf, and undertakes to notify the co-authors of the terms of this policy. JTSiskom will not be held responsible for anything arising from internal disputes among the authors. JTSiskom will communicate only with the corresponding author.

Authors should also understand that their articles (and any additional files, including data sets and analysis/computation data) will become publicly available once published. Published articles (and additional data) are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. JTSiskom allows users to copy, distribute, display, and perform the work under this license. Users must attribute the author(s) and JTSiskom when distributing the work in journals and other publication media. Unless otherwise stated, the author(s) become public entities as soon as the article is published.


EDITORIAL OFFICE OF JURNAL TEKNOLOGI DAN SISTEM KOMPUTER