(Received: 17-Sep.-2023, Revised: 2-Dec.-2023 and 7-Feb.-2024, Accepted: 20-Feb.-2024)
We live in an era in which time is the most precious resource. Handling the vast amount of data collected from different sources for various purposes therefore requires systems that can process the data correctly and make it worthwhile. Using big data in machine-learning (ML) and artificial-intelligence (AI) models enhances the efficiency and robustness of such models. This work proposes a DDoS attack-detection model that uses Apache Spark to handle CIC-DDOS2019, a large public dataset on which the model is trained. The model is trained to predict the type of DDoS attack among multiple classes: SYN, UDP and MSSQL. Two state-of-the-art algorithms, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), have been chosen as the base of the proposed model. Both algorithms owe their robustness and efficiency to their ensemble architecture, in which each is built from several decision trees with different parameters. As the contribution of this work, a stacked ensemble model has been built from both RF and XGBoost to enhance the accuracy of the DDoS attack-detection task. This combination was found to give the best results. The prolonged execution time that results from training on such a large dataset, on the other hand, is another issue that must be handled. To tackle the speed problem, the Apache Spark platform has been used. Apache Spark divides the large dataset into partitions, distributes them and trains on them in parallel using the proposed model. It thus reduces the execution time while preserving the accuracy obtained when training on the same dataset without Apache Spark. The proposed model has achieved a high accuracy of 99.94% while reducing the execution time to almost half of the time required without Apache Spark. Using Apache Spark increases the demand for RAM; building the proposed DDoS attack-detection model on Spark required us to upgrade the hardware used to run the code.
Other relevant research works focus on accuracy measures and lack adequate time analysis, which is crucial in DDoS attack-detection applications; some other models achieve lower accuracy than that reported in this study.
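
The stacking idea described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: one-split decision stumps stand in for RF and XGBoost, a lookup table stands in for the meta-learner, Apache Spark is not involved, and the two numeric features and all data values are synthetic and hypothetical. Only the class names (SYN, UDP, MSSQL) come from the text. The sketch shows the core mechanism: base learners are trained on folds, their out-of-fold predictions become the inputs of a meta-learner, and the meta-learner produces the final class.

```python
from collections import Counter

def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """Fit a one-split decision stump on one feature (toy base learner)."""
    values = sorted({row[feature] for row in rows})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for threshold in candidates:
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = majority(left)
        right_label = majority(right) if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label

def fit_stack(rows, labels, features=(0, 1), folds=2):
    """Train base stumps plus a lookup-table meta-learner on out-of-fold predictions."""
    meta_votes = {}
    for fold in range(folds):
        train = [i for i in range(len(rows)) if i % folds != fold]
        held_out = [i for i in range(len(rows)) if i % folds == fold]
        bases = [fit_stump([rows[i] for i in train],
                           [labels[i] for i in train], f) for f in features]
        # Base predictions on rows the bases never saw become meta-features.
        for i in held_out:
            key = tuple(base(rows[i]) for base in bases)
            meta_votes.setdefault(key, []).append(labels[i])
    meta = {key: majority(votes) for key, votes in meta_votes.items()}
    # Refit the base learners on all rows for use at prediction time.
    final_bases = [fit_stump(rows, labels, f) for f in features]

    def predict(row):
        key = tuple(base(row) for base in final_bases)
        return meta.get(key, key[0])  # unseen combination: trust the first base
    return predict, final_bases

# Synthetic toy rows (two hypothetical numeric features), NOT CIC-DDOS2019 data.
ROWS = [(1, 1), (2, 2), (1, 2), (2, 1),
        (8, 1), (9, 2), (8, 2), (9, 1),
        (8, 8), (9, 9), (8, 9), (9, 8)]
LABELS = ["SYN"] * 4 + ["UDP"] * 4 + ["MSSQL"] * 4

def accuracy(model, rows, labels):
    """Fraction of rows for which the model returns the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

predict, bases = fit_stack(ROWS, LABELS)
for name, model in [("stump on feature 0", bases[0]),
                    ("stump on feature 1", bases[1]),
                    ("stacked", predict)]:
    print(f"{name}: training accuracy {accuracy(model, ROWS, LABELS):.2f}")
```

On this toy data each stump can separate only two of the three classes, whereas the meta-learner combines the two base predictions and separates all three. These numbers say nothing about performance on real traffic; in practice the stumps would be replaced by actual RF and XGBoost models and the lookup table by a trained meta-classifier.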
