Malicious URLs detection using data streaming algorithms

Kayode Sakariyah Adewole; Muiz Olalekan Raheem; Muyideen Abdulraheem; Idowu Dauda Oladipo; Abdullateef Oluwagbemiga Balogun; Omotola Fatimah Baker

doi:10.14710/jtsiskom.2021.13965

Department of Computer Science, Faculty of Communication and Information Sciences, University of Ilorin. PMB 1515 Ilorin, Kwara State, Nigeria

Received: 29 Oct 2020; Revised: 7 Jul 2021; Accepted: 9 Jul 2021; Published: 31 Oct 2021.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

BibTeX Citation Data:

@article{JTSISKOM13965,
    author = {Kayode Sakariyah Adewole and Muiz Olalekan Raheem and Muyideen Abdulraheem and Idowu Dauda Oladipo and Abdullateef Oluwagbemiga Balogun and Omotola Fatimah Baker},
    title = {Malicious URLs detection using data streaming algorithms},
    journal = {Jurnal Teknologi dan Sistem Komputer},
    volume = {9},
    number = {4},
    year = {2021},
    keywords = {Data streaming; Phishing; Naïve Bayes; Machine learning; Hoeffding Tree.},
    abstract = {As a result of advances in technology and technological devices, data is now generated at an unprecedented rate by a vast array of networks, devices, and everyday activities such as credit card transactions and mobile phone use. A data stream consists of sequential, continuous, real-time data that evolves over time. However, the traditional machine learning approach is characterized by a batch learning model, in which labeled training data are given a priori to train a model with some machine learning algorithm. This technique requires the entire training sample to be available before the learning process begins. Training is mainly done offline in this setting because of the high training cost. Consequently, the traditional batch learning technique suffers severe drawbacks, such as poor scalability for real-time phishing website detection, and the model usually has to be retrained from scratch when new training samples arrive. This paper presents the application of streaming algorithms for detecting malicious URLs based on selected online learners: Hoeffding Tree (HT), Naïve Bayes (NB), and Ozabag. Ozabag produced promising results in terms of accuracy, Kappa, and Kappa Temp on the dataset with large samples, while HT and NB had the lowest prediction times, with accuracy and Kappa comparable to those of Ozabag, for the real-time detection of phishing websites.},
    issn = {2338-0403},
    pages = {224--229},
    doi = {10.14710/jtsiskom.2021.13965},
    url = {https://jtsiskom.undip.ac.id/article/view/13965}
}

Abstract

As a result of advances in technology and technological devices, data is now generated at an unprecedented rate by a vast array of networks, devices, and everyday activities such as credit card transactions and mobile phone use. A data stream consists of sequential, continuous, real-time data that evolves over time. However, the traditional machine learning approach is characterized by a batch learning model, in which labeled training data are given a priori to train a model with some machine learning algorithm. This technique requires the entire training sample to be available before the learning process begins. Training is mainly done offline in this setting because of the high training cost. Consequently, the traditional batch learning technique suffers severe drawbacks, such as poor scalability for real-time phishing website detection, and the model usually has to be retrained from scratch when new training samples arrive. This paper presents the application of streaming algorithms for detecting malicious URLs based on selected online learners: Hoeffding Tree (HT), Naïve Bayes (NB), and Ozabag. Ozabag produced promising results in terms of accuracy, Kappa, and Kappa Temp on the dataset with large samples, while HT and NB had the lowest prediction times, with accuracy and Kappa comparable to those of Ozabag, for the real-time detection of phishing websites.
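
The contrast between batch and streaming learning drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `StreamingNaiveBayes`, the functions `prequential` and `synthetic_url_stream`, the three binary features, and the synthetic data are all assumptions made for illustration. The sketch updates a Naïve Bayes model one sample at a time and evaluates it test-then-train, reporting accuracy and Cohen's Kappa.

```python
import math
import random
from collections import defaultdict


class StreamingNaiveBayes:
    """Incremental Naive Bayes over binary features (hypothetical helper)."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.class_counts = defaultdict(int)
        # feature_counts[label][i] = samples of `label` whose feature i is 1
        self.feature_counts = defaultdict(lambda: [0] * n_features)

    def learn_one(self, x, y):
        """Update the counts with a single labeled sample; no retraining."""
        self.class_counts[y] += 1
        counts = self.feature_counts[y]
        for i, v in enumerate(x):
            counts[i] += v

    def predict_one(self, x, default=0):
        """Return the most probable label, or `default` before any training."""
        total = sum(self.class_counts.values())
        if total == 0:
            return default
        best_label, best_score = default, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            counts = self.feature_counts[label]
            for i, v in enumerate(x):
                p1 = (counts[i] + 1) / (n + 2)  # Laplace smoothing
                score += math.log(p1 if v else 1.0 - p1)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def prequential(stream, model):
    """Test-then-train: predict each sample first, then learn from it."""
    n = correct = 0
    confusion = defaultdict(int)  # (true label, predicted label) -> count
    for x, y in stream:
        y_hat = model.predict_one(x)
        confusion[(y, y_hat)] += 1
        correct += (y_hat == y)
        n += 1
        model.learn_one(x, y)
    accuracy = correct / n
    # Cohen's Kappa: agreement beyond what chance alone would give.
    labels = {t for t, _ in confusion} | {p for _, p in confusion}
    p_chance = sum(
        (sum(c for (t, _), c in confusion.items() if t == k) / n)
        * (sum(c for (_, p), c in confusion.items() if p == k) / n)
        for k in labels
    )
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return accuracy, kappa


def synthetic_url_stream(n, seed=0):
    """Yield (features, label) pairs; the data is synthetic, not from the paper."""
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.randint(0, 1)  # assumed encoding: 1 = malicious, 0 = benign
        p = 0.8 if y else 0.2  # each made-up feature fires more often when malicious
        x = [1 if rng.random() < p else 0 for _ in range(3)]
        yield x, y


accuracy, kappa = prequential(synthetic_url_stream(2000), StreamingNaiveBayes(3))
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")
```

Because each sample is used for prediction before it is used for training, the model never needs the full training set in advance, which is the property that makes this style of learner suitable for real-time detection.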

Keywords: Data streaming; Phishing; Naïve Bayes; Machine learning; Hoeffding Tree.

Funding: University of Ilorin

Article Info

Section: Original Research Articles

Language: EN

In Volume 9, Issue 4, Year 2021 (October 2021)

Starting from 2021, the author(s) whose article is published in the JTSiskom journal retain the copyright for their article and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. By submitting the manuscript to JTSiskom, the author(s) agree with this policy. No special document approval is required.

The author(s) guarantee that:

their article is original, written by the mentioned author(s),
has never been published before,
does not contain statements that violate the law, and
does not violate the rights of others, is subject to copyright held exclusively by the author(s), is free from the rights of third parties, and the necessary written permission to quote from other sources has been obtained by the author(s).

The author(s) retain all rights to the published work, such as (but not limited to) the following rights:

Copyright and other proprietary rights related to the article, such as patents,
The right to use the substance of the article in its own future works, including lectures and books,
The right to reproduce the article for its own purposes,
The right to archive all versions of the article in any repository, and
The right to enter into separate additional contractual arrangements for the non-exclusive distribution of published versions of the article (for example, posting them to institutional repositories or publishing them in a book), acknowledging its initial publication in this journal (Jurnal Teknologi dan Sistem Komputer).

If the article was prepared jointly by more than one author, each author submitting the manuscript warrants that all co-authors have authorized them to agree to the copyright and license notices (agreements) on their behalf, and agrees to notify the co-authors of the terms of this policy. JTSiskom will not be held responsible for anything arising from internal disputes among the authors. JTSiskom will only communicate with the corresponding author.

Authors should also understand that their articles (and any additional files, including data sets and analysis/computation data) will become publicly available once published. The license of published articles (and additional data) will be governed by a Creative Commons Attribution-ShareAlike 4.0 International License. JTSiskom allows users to copy, distribute, display, and perform the work under this license. Users must attribute the author(s) and JTSiskom when distributing the work in journals and other publication media. Unless otherwise stated, the author(s) are considered public entities once the article is published.
