PHIBOOST- A NOVEL PHISHING DETECTION MODEL USING ADAPTIVE BOOSTING APPROACH

(Received: 14-Sep.-2020, Revised: 6-Nov.-2020, Accepted: 15-Dec.-2020)

Authors Ammar Odeh, Ismail Keshta, Eman Abdelfattah

Keywords #Adaptive boost #Feature selection #Correlation-based feature #Machine learning

Abstract Cyberattacks grow in number every day and employ a widening range of strategies. One of the most common is phishing, in which an attacker collects sensitive and confidential information by pretending to be a trusted party. Several traditional anti-phishing strategies have been introduced, such as blacklisting, heuristic search and visual similarity. Most of these traditional methods have a high false-detection rate and take a long time to detect a phishing website. New models based on machine learning techniques have been introduced to improve detection accuracy. Machine learning techniques require a large amount of data, in the form of features collected from different websites. These collected features are classified into four categories. This paper introduces a novel detection model that uses feature selection to pick the features most highly correlated with the class label. The feature selection phase employs the independent significance features library from MATLAB and a heat map from Python to find the highly correlated features. The proposed model then applies an adaptive boosting approach, which combines multiple classifiers, to increase the model's accuracy. The proposed model achieves a predictive accuracy of approximately 99%.
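
The two stages summarized in the abstract, correlation-based feature selection followed by adaptive boosting, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Pearson correlation as the ranking score (in place of the MATLAB and heat-map tooling), one-feature decision stumps as the weak classifiers, and a tiny hand-made dataset with labels +1 (phishing) and -1 (legitimate). All function names and data values below are hypothetical.

```python
import math

# Hypothetical toy data: 8 websites, 3 features each, values in {-1, 0, 1}.
TOY_X = [
    [1, 1, 1], [1, -1, 0], [1, 1, 1], [0, -1, 1],
    [-1, 1, 0], [-1, -1, -1], [0, 1, -1], [-1, -1, 0],
]
TOY_Y = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = phishing, -1 = legitimate


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def select_features(X, y, k):
    """Indices of the k features with the largest |correlation| to the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def stump_predict(row, j, thr, sign):
    """Decision stump: predict `sign` if feature j exceeds thr, else -sign."""
    return sign if row[j] > thr else -sign


def best_stump(X, y, w, features):
    """Stump with the lowest weighted error over the selected features."""
    best = None
    for j in features:
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for wi, row, t in zip(w, X, y)
                          if stump_predict(row, j, thr, sign) != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost_fit(X, y, features, rounds=3):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, sign = best_stump(X, y, w, features)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, j, thr, sign))
        w = [wi * math.exp(-alpha * t * stump_predict(row, j, thr, sign))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, s) for a, j, thr, s in model)
    return 1 if score >= 0 else -1


selected = select_features(TOY_X, TOY_Y, k=2)
model = adaboost_fit(TOY_X, TOY_Y, selected, rounds=3)
print("selected features:", selected)
print("training predictions:", [adaboost_predict(model, r) for r in TOY_X])
```

In this toy setting the uncorrelated middle feature is discarded before boosting, so the weak classifiers only ever split on the two features that carry signal. A real pipeline would evaluate on held-out data rather than on the training set.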
