Malicious URLs detection using data streaming algorithms

Department of Computer Science, Faculty of Communication and Information Sciences, University of Ilorin. PMB 1515 Ilorin, Kwara State, Nigeria

Received: 29 Oct 2020; Revised: 7 Jul 2021; Accepted: 9 Jul 2021; Published: 31 Oct 2021.
Open Access Copyright (c) 2021 The authors. Published by Department of Computer Engineering, Universitas Diponegoro
Creative Commons License This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
As a result of advancements in technology and technological devices, data are now generated at an ever-increasing rate, emanating from a vast array of networks, devices, and daily activities such as credit card transactions and mobile phone use. A data stream is sequential, continuous, real-time data that arrives in the form of an evolving stream. However, the traditional machine learning approach is characterized by a batch learning model, in which labeled training data are given a priori to train a model using a machine learning algorithm. This technique requires the entire training sample to be available before the learning process begins, and the training procedure is mainly done offline because of the high training cost. Consequently, the traditional batch learning technique suffers severe drawbacks, such as poor scalability for real-time phishing website detection, since the model usually has to be retrained from scratch when new training samples arrive. This paper presents the application of streaming algorithms for detecting malicious URLs based on selected online learners: Hoeffding Tree (HT), Naïve Bayes (NB), and Ozabag. Ozabag produced promising results in terms of accuracy, Kappa, and Kappa Temp on the dataset with a large number of samples, while HT and NB had the lowest prediction times, with accuracy and Kappa comparable to those of Ozabag, for the real-time detection of phishing websites.
Keywords: Data streaming; Phishing; Naïve Bayes; Machine learning; Hoeffding Tree.
Funding: University of Ilorin
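
The contrast between batch and online learning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four lexical features in `url_features`, the toy stream, and the test-then-train (prequential) evaluation loop are all assumptions made for illustration, and the classifier is a plain incremental Bernoulli Naïve Bayes written from scratch. The point is only that the model predicts each URL as it arrives and then updates its counts with that one example, so no retraining from scratch is needed.

```python
import math
from collections import defaultdict


def url_features(url):
    """Hypothetical binary lexical features (NOT the paper's feature set)."""
    return (
        len(url) > 40,              # unusually long URL
        "@" in url,                 # '@' symbol present
        url.count(".") > 3,         # many dots (deep subdomain nesting)
        url.startswith("https://"), # served over HTTPS
    )


class OnlineNaiveBayes:
    """Incremental Bernoulli Naive Bayes: only per-class counts are stored."""

    def __init__(self, n_features, classes=(0, 1)):
        self.classes = classes
        self.class_counts = {c: 0 for c in classes}
        self.true_counts = {c: [0] * n_features for c in classes}

    def predict(self, x):
        total = sum(self.class_counts.values())
        best, best_score = self.classes[0], -math.inf
        for c in self.classes:
            n_c = self.class_counts[c]
            # Laplace-smoothed log prior and log likelihoods.
            score = math.log((n_c + 1) / (total + len(self.classes)))
            for j, v in enumerate(x):
                p_true = (self.true_counts[c][j] + 1) / (n_c + 2)
                score += math.log(p_true if v else 1.0 - p_true)
            if score > best_score:
                best, best_score = c, score
        return best

    def learn_one(self, x, y):
        """Update counts with a single labeled example."""
        self.class_counts[y] += 1
        for j, v in enumerate(x):
            if v:
                self.true_counts[y][j] += 1


def prequential(stream, model):
    """Test-then-train: predict each example first, then learn from it."""
    n = correct = 0
    confusion = defaultdict(int)
    for url, label in stream:
        x = url_features(url)
        pred = model.predict(x)
        confusion[(label, pred)] += 1
        correct += pred == label
        n += 1
        model.learn_one(x, label)
    accuracy = correct / n
    # Cohen's Kappa: agreement corrected for chance agreement p_e.
    labels = {t for t, _ in confusion} | {p for _, p in confusion}
    p_e = sum(
        (sum(v for (t, _), v in confusion.items() if t == c) / n)
        * (sum(v for (_, p), v in confusion.items() if p == c) / n)
        for c in labels
    )
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return accuracy, kappa


# Toy stream with made-up URLs; label 1 = malicious, 0 = benign.
stream = [
    ("https://example.com/", 0),
    ("http://login.example.com.verify.account.evil.test/@x", 1),
] * 20

accuracy, kappa = prequential(stream, OnlineNaiveBayes(n_features=4))
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")
```

Because every example is used for testing before it is used for training, the reported accuracy and Kappa reflect performance on unseen data at each step, which is one common way to evaluate stream learners. A Hoeffding Tree or a bagging ensemble could be substituted for the classifier as long as it exposes the same `predict` and `learn_one` methods used here.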

