A Non-Content Multilayers Hybrid Machine Learning Web Phishing Detection Model

Mohammad Masoud; Yousef Jaradat; Rashed Alsakarnah

doi:10.15866/iremos.v15i2.21975

Mohammad Masoud^(1*), Yousef Jaradat^(2), Rashed Alsakarnah^(3)

^(*) Corresponding author

Abstract

Phishing detection is one of the most important services in cyber security. Phishing is an easy-to-implement attack that relies on social engineering. It is carried out over the web, Short Message Service (SMS), and social media, and it serves as the backbone of other attacks. In this work, a new web phishing detection algorithm based on machine learning is proposed. The proposed algorithm is a hybrid that consists of two main layers. The first layer is the feature extraction and reduction layer, in which multiple machine learning algorithms work in parallel. The second layer is a single feed-forward neural network. The number and types of algorithms used in the first layer can be varied. The proposed model has been tested with four algorithms in the first layer, namely K-means, support vector machine, logistic regression, and random forest. Four features are extracted from this layer and fed into the second layer, the neural network layer. The output of the second layer achieves an accuracy exceeding 99%.
Copyright © 2022 Praise Worthy Prize - All rights reserved.
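
The two-layer arrangement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the synthetic dataset, the choice of one output value per first-layer model, the hyperparameters, and the helper names `fit_first_layer` and `first_layer_features` are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_first_layer(X, y, seed=0):
    """Fit the four first-layer models named in the abstract on the same data."""
    return {
        "kmeans": KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X),
        "svm": SVC(kernel="rbf", random_state=seed).fit(X, y),
        "logreg": LogisticRegression(max_iter=1000).fit(X, y),
        "forest": RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y),
    }


def first_layer_features(models, X):
    """Reduce each sample to four values, one per first-layer model (assumed)."""
    return np.column_stack([
        models["kmeans"].predict(X),              # cluster index (0 or 1)
        models["svm"].decision_function(X),       # signed distance to the boundary
        models["logreg"].predict_proba(X)[:, 1],  # estimated P(class 1)
        models["forest"].predict_proba(X)[:, 1],  # estimated P(class 1)
    ])


# Synthetic stand-in for a phishing dataset (assumption: 20 numeric features).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           class_sep=1.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Layer 1: independent models that reduce 20 inputs to 4 features.
models = fit_first_layer(X_train, y_train)
meta_train = first_layer_features(models, X_train)
meta_test = first_layer_features(models, X_test)

# Layer 2: a small feed-forward network trained on the four features.
network = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
network.fit(meta_train, y_train)
accuracy = network.score(meta_test, y_test)
print(f"meta-feature shape: {meta_train.shape}, test accuracy: {accuracy:.3f}")
```

The sketch fits the first-layer models sequentially and reuses the same training samples for both layers, which keeps it short but can make the second layer over-trust the first-layer outputs. The accuracy it prints applies only to the synthetic data and says nothing about the result reported in the abstract.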

Keywords

Web Phishing; Detection; Machine Learning; Hybrid Models; Security
