LONGCGDROID: ANDROID MALWARE DETECTION THROUGH LONGITUDINAL STUDY FOR MACHINE LEARNING AND DEEP LEARNING


(Received: 30-Aug.-2023, Revised: 3-Oct.-2023 and 24-Oct.-2023, Accepted: 31-Oct.-2023)
This study compares the longitudinal performance of machine-learning and deep-learning classifiers for Android malware detection under different levels of feature abstraction. Using a dataset of 200k Android apps labeled by date over a 10-year range (2013-2022), we propose LongCGDroid, an effective image-based approach for Android malware detection. We use the semantic Call Graph API representation, derived from the Control Flow Graph and the Data Flow Graph, to extract abstracted API calls. We then evaluate the longitudinal performance of LongCGDroid against API changes. Two families of models are used: machine-learning models (LR, RF, KNN, SVM) and deep-learning models (CNN, RNN). Empirical experiments show a progressive decline in performance for all classifiers when they are evaluated on samples from later periods. However, the deep-learning CNN model under class-level abstraction remains comparatively stable over time. In comparison with eight state-of-the-art approaches, LongCGDroid achieves higher accuracy.
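The two ideas the abstract relies on, abstracting API calls to a coarser level and evaluating on samples dated after the training period, can be illustrated with a minimal Python sketch. It is not the LongCGDroid implementation. The function names `abstract_api_call` and `chronological_split`, the `package.Class.method` signature format, the three level names, and the `year` field are all assumptions made for illustration.

```python
def abstract_api_call(signature: str, level: str = "class") -> str:
    """Map a fully qualified API call to a coarser abstraction level.

    Assumes signatures of the form 'package.Class.method' with no nested
    classes; this format is an illustrative assumption.
    """
    parts = signature.split(".")
    if level == "method":
        return signature                 # keep the full call
    if level == "class":
        return ".".join(parts[:-1])      # drop the method name
    if level == "package":
        return ".".join(parts[:-2])      # drop the class and method names
    raise ValueError(f"unknown abstraction level: {level}")


def chronological_split(samples, train_until: int):
    """Split dated samples into a training set and per-year test sets.

    Samples dated up to and including `train_until` form the training set;
    later samples are grouped by year so that performance can be tracked
    as the gap between training and testing periods grows.
    """
    train = [s for s in samples if s["year"] <= train_until]
    test_by_year = {}
    for s in samples:
        if s["year"] > train_until:
            test_by_year.setdefault(s["year"], []).append(s)
    # Return the test sets in ascending year order.
    return train, dict(sorted(test_by_year.items()))


if __name__ == "__main__":
    call = "android.telephony.SmsManager.sendTextMessage"
    for lvl in ("method", "class", "package"):
        print(lvl, "->", abstract_api_call(call, lvl))

    # Hypothetical toy samples; real apps would carry extracted features.
    apps = [{"id": i, "year": y} for i, y in enumerate([2013, 2014, 2016, 2015])]
    train, tests = chronological_split(apps, train_until=2014)
    print(len(train), "training apps; test years:", list(tests))
```

Coarser levels are less sensitive to method-level API churn, which is one plausible reason a class-level representation could degrade more slowly over time.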
