
Malicious URLs Detection Using Data Streaming Algorithms

1Department of Computer Science, Faculty of Communication and Information Sciences, University of Ilorin, Nigeria

2PMB 1515 Ilorin, Nigeria

Received: 29 Oct 2020; Published: 31 Oct 2021.
Open Access Copyright (c) 2021 Jurnal Teknologi dan Sistem Komputer under

As a result of advances in technology and technological devices, data is now generated at an unprecedented rate, emanating from a vast array of networks and devices as well as daily operations such as credit card transactions and mobile phone use. A data stream consists of sequential, continuous, real-time data in the form of an evolving stream. However, the traditional machine learning approach is characterized by a batch learning model in which labelled training data are given a priori to train a model with some machine learning algorithm. This technique requires the entire training set to be available before learning begins. In this setting, training is mostly done offline owing to its high cost. Consequently, traditional batch learning suffers from serious drawbacks, such as poor scalability for real-time phishing website detection, because the model usually has to be retrained from scratch whenever new training samples arrive. This paper therefore presents the application of streaming algorithms to malicious URL detection using selected online learners: Hoeffding Tree (HT), Naïve Bayes (NB), and OzaBag. Experimental results on two prominent phishing datasets show that OzaBag produced promising results in terms of accuracy, Kappa, and Kappa Temporal on the dataset with more samples, while HT and NB had the lowest prediction time, with accuracy and Kappa comparable to OzaBag, for the real-time detection of phishing websites.
Keywords: Data streaming; Phishing; Naïve Bayes; Machine learning; Hoeffding Tree.
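To make the streaming setting concrete, the following is a minimal sketch of prequential (test-then-train) evaluation with an incremental Naïve Bayes learner, reporting accuracy, Kappa, and Kappa Temporal. It is an illustration only, not the experimental setup of the paper: the class `OnlineBernoulliNB`, the function `prequential`, the three binary URL features, and the synthetic stream are all assumptions introduced here, and the paper's datasets and tooling are not reproduced.

```python
import math
import random
from collections import defaultdict


class OnlineBernoulliNB:
    """Hypothetical incremental Naive Bayes over binary features (Laplace smoothing)."""

    def __init__(self, n_features, classes=(0, 1)):
        self.classes = classes
        self.class_counts = {c: 0 for c in classes}
        self.feature_counts = {c: [0] * n_features for c in classes}

    def predict(self, x):
        total = sum(self.class_counts.values())
        best, best_score = self.classes[0], -math.inf
        for c in self.classes:
            n_c = self.class_counts[c]
            # Smoothed log prior plus smoothed log likelihood of each feature.
            score = math.log((n_c + 1) / (total + len(self.classes)))
            for j, v in enumerate(x):
                p = (self.feature_counts[c][j] + 1) / (n_c + 2)
                score += math.log(p if v else 1 - p)
            if score > best_score:
                best, best_score = c, score
        return best

    def learn(self, x, y):
        # One-pass update: only counters change, no retraining from scratch.
        self.class_counts[y] += 1
        for j, v in enumerate(x):
            if v:
                self.feature_counts[y][j] += 1


def prequential(stream, model):
    """Test on each instance first, then train on it; return summary metrics."""
    n = correct = persist_correct = 0
    confusion = defaultdict(int)
    prev_y = None
    for x, y in stream:
        y_hat = model.predict(x)
        n += 1
        correct += int(y_hat == y)
        # Baseline "no-change" classifier: predicts the previous true label.
        persist_correct += int(prev_y == y)
        confusion[(y, y_hat)] += 1
        model.learn(x, y)
        prev_y = y
    p0 = correct / n
    labels = {a for a, _ in confusion} | {b for _, b in confusion}
    # Chance agreement from the marginals of the confusion matrix.
    pc = sum(
        (sum(v for (a, _), v in confusion.items() if a == c) / n)
        * (sum(v for (_, b), v in confusion.items() if b == c) / n)
        for c in labels
    )
    p_per = persist_correct / n
    kappa = (p0 - pc) / (1 - pc) if pc < 1 else 0.0
    kappa_t = (p0 - p_per) / (1 - p_per) if p_per < 1 else 0.0
    return {"accuracy": p0, "kappa": kappa, "kappa_temporal": kappa_t}


# Synthetic stream (assumption): features are [has_ip_address, has_at_symbol, is_long_url],
# and the label simply copies the first feature so the toy problem is learnable.
rng = random.Random(0)
stream = []
for _ in range(500):
    x = [rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)]
    stream.append((x, x[0]))

metrics = prequential(stream, OnlineBernoulliNB(n_features=3))
print(metrics)
```

Because the model is updated one instance at a time, new labelled URLs can be absorbed without the batch retraining the abstract describes. Kappa corrects accuracy for chance agreement, while Kappa Temporal compares the learner against a baseline that always repeats the previous label.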


