CURATING DATASETS TO ENHANCE SPYWARE CLASSIFICATION


(Received: 22-Jun.-2024, Revised: 26-Aug.-2024, Accepted: 14-Sep.-2024)
Current methods for spyware classification are limited by the lack of well-structured datasets, especially datasets whose features capture directionality properties. This work explores the efficacy of directionality properties for classification, using features engineered from those in existing datasets. Two datasets are curated: Dataset A, which includes features extracted from single-directional packet flows only, and Dataset B, which includes features extracted from bi-directional packet flows. Classification with these features is performed using selected classifiers. For Dataset A, the support vector machine (SVM) obtained the highest accuracy at 99.88%; for Dataset B, random forest (RF), decision tree (DT) and XGBoost shared the highest accuracy at 99.24%. Compared with results from existing research, the directional properties in these engineered features improve the accuracy of spyware classification.
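The difference between single-directional and bi-directional flow features can be illustrated with a minimal Python sketch. It is not the feature set used in this study: the packet records, the addresses (taken from documentation ranges), the function names and the chosen features (packet and byte counts) are all assumptions made for illustration, as is the rule that the first packet seen in a conversation defines the forward direction.

```python
from collections import defaultdict

# Hypothetical packet records: (source, destination, size in bytes).
packets = [
    ("192.0.2.10", "198.51.100.7", 120),
    ("198.51.100.7", "192.0.2.10", 1500),
    ("192.0.2.10", "198.51.100.7", 80),
]


def unidirectional_features(packets):
    """Aggregate per (src, dst) pair, so each direction is its own flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, size in packets:
        flow = flows[(src, dst)]
        flow["packets"] += 1
        flow["bytes"] += size
    return dict(flows)


def bidirectional_features(packets):
    """Aggregate both directions of a conversation into a single flow.

    Assumption: the sender of the first packet seen is the initiator,
    and its packets count as the forward (fwd) direction.
    """
    flows = {}
    for src, dst, size in packets:
        # Sorting the endpoints gives one key for both directions.
        key = tuple(sorted((src, dst)))
        if key not in flows:
            flows[key] = {
                "initiator": src,
                "fwd_packets": 0,
                "fwd_bytes": 0,
                "bwd_packets": 0,
                "bwd_bytes": 0,
            }
        flow = flows[key]
        prefix = "fwd" if src == flow["initiator"] else "bwd"
        flow[prefix + "_packets"] += 1
        flow[prefix + "_bytes"] += size
    return flows


uni = unidirectional_features(packets)
bi = bidirectional_features(packets)
```

With the sample packets, `uni` holds two flows (one per direction), while `bi` holds a single flow whose forward and backward counters keep the directional information side by side.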
