EMBEDDING AND WEIGHTING OF WEBSITE FEATURES FOR PHISHING DETECTION

Nikola Stevanović

DOI Number
https://doi.org/10.22190/FUMI221113002S
First page
013
Last page
031

Abstract


Phishing attacks are among the most common cyber threats. In a phishing attack, attackers use various technical and social engineering tricks to lure victims to a phishing website. The website appears to belong to a trusted organization but is actually run by the attackers, who use it to mislead victims into revealing passwords, credit card numbers, or other confidential information. In this paper, we use discrete descriptive website features to determine whether a website is phishing or legitimate. We create a customized embedding layer designed specifically for this type of feature, together with a weighting mechanism that we apply to the resulting embeddings. We propose a convolutional neural network-based model for phishing website detection and demonstrate its efficacy on three datasets. With accuracy of up to 97.56%, the model performs on par with or better than current state-of-the-art approaches on each dataset.


Keywords

phishing website detection, web attacks, embeddings, weighting of embeddings, cybersecurity, deep learning










© University of Niš | Created on November, 2013
ISSN 0352-9665 (Print)
ISSN 2406-047X (Online)