
Users Awareness Prediction of Cyber Security Aspects in Twitter Using Machine Learning Algorithms

Muneer Bani Yassein(1*), Mohammad Shatnawi(2), Omar Alomari(3)

(1) Department of Computer Science, Jordan University of Science and Technology (JUST), Jordan
(2) Department of Computer Science, Jordan University of Science and Technology (JUST), Jordan
(3) Department of Computer Science, Jordan University of Science and Technology (JUST), Jordan
(*) Corresponding author



Social media websites hold a huge amount of Personally Identifiable Information for a tremendous number of users. Network security measures play a vital role in protecting such sensitive information. Unfortunately, these measures are plagued with vulnerabilities at all levels, which are destined to be exploited, especially when users lack awareness of the many different aspects of cyber security. Security is only as strong as its weakest link, and often the weakest link is an oblivious user. Therefore, it is in the best interest of social media users to become aware of the various aspects of cyber security in order to keep their sensitive data out of attackers' reach. These aspects include privacy, security, social engineering, and different types of network attacks. In this work, a dataset of tweets related to these four aspects has been collected from Twitter for analysis. The aim is to measure Twitter users' awareness of these aspects and to leverage machine learning in developing a set of recommendations for tuning privacy and network controls. Such recommendations can be used by existing and new users alike. To this end, a new model has been built by applying pre-processing techniques, then extracting features from the dataset using two popular techniques, and finally feeding them into four popular machine learning models: Multinomial Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression. A two-level classification is used: a general category and a specific category. The general category consists of five labels: privacy, security, social engineering, network attacks, and other. In the specific category, each general category label except "other" is further classified into sub-labels. For the general category classification, the model with the Logistic Regression algorithm has given the best results on four evaluation metrics compared with the others. For the specific category, Decision Tree has achieved the highest performance in the privacy and security aspects, Support Vector Machine has given the best results in the social engineering aspect, and both Logistic Regression and Multinomial Naïve Bayes have outperformed the other algorithms in the network attacks aspect. Finally, some recommendations are presented, such as avoiding sharing private information on social media websites, using strong passwords, using different passwords for different social media accounts, and using secure HTTP (HTTPS) instead of plain HTTP to browse social media websites.
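The two-level scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two feature extraction techniques or the sub-labels, so the sketch assumes plain bag-of-words counts, a from-scratch Multinomial Naïve Bayes with Laplace smoothing, and hypothetical sub-label names and toy tweets (`TOY_ROWS`) invented purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a stand-in for real pre-processing)."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLevelClassifier:
    """Level 1 picks the general label; level 2 picks a sub-label within it."""

    def fit(self, rows):
        # rows: (text, general_label, specific_label or None for "other")
        self.general = MultinomialNB().fit([r[0] for r in rows], [r[1] for r in rows])
        by_general = defaultdict(list)
        for text, general, specific in rows:
            if specific is not None:
                by_general[general].append((text, specific))
        self.specific = {}
        for general, pairs in by_general.items():
            texts, labels = zip(*pairs)
            self.specific[general] = MultinomialNB().fit(texts, labels)
        return self

    def predict(self, text):
        general = self.general.predict(text)
        model = self.specific.get(general)  # no second-level model for "other"
        return general, (model.predict(text) if model else None)


# Hypothetical toy data; the sub-label names are assumptions, not from the paper.
TOY_ROWS = [
    ("never share your home address or location online", "privacy", "location_sharing"),
    ("hide your profile photo and birthday from strangers", "privacy", "profile_visibility"),
    ("use a strong password for every account", "security", "passwords"),
    ("browse with https instead of plain http", "security", "secure_browsing"),
    ("great football match tonight", "other", None),
    ("coffee and sunshine this morning", "other", None),
]

if __name__ == "__main__":
    clf = TwoLevelClassifier().fit(TOY_ROWS)
    print(clf.predict("pick a strong password"))
```

Training one second-level model per general label keeps each sub-label decision restricted to tweets of that aspect, which mirrors the abstract's statement that every general label except "other" is further classified. Any of the four algorithms named in the abstract could presumably take the place of the Naïve Bayes class at either level.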
Copyright © 2021 Praise Worthy Prize - All rights reserved.


Social Media; Awareness Prediction; Machine Learning; Cyber Security; Network Recommendations





