BENCHMARKING MACHINE LEARNING ALGORITHMS FOR ANDROID MALWARE DETECTION

(Received: 3-Jun.-2019, Revised: 31-Jul.-2019 and 12-Aug.-2019, Accepted: 4-Sep.-2019)

Authors Somayyeh Fallah, Amir Jalaly Bidgoly

Keywords #Android malware #Malware detection #Network traffic #Machine learning

Abstract Smartphones have become a significant part of daily life, and the number of people using them continues to grow. This growth has encouraged attackers to create malicious applications, and identifying such malware is critical for preserving the security and privacy of users. Recent trends in cyber security show that threats can be effectively identified using network-based detection techniques and machine learning methods. In this paper, several well-known machine learning methods are investigated for smartphone malware detection using network traffic. A wide range of malware families is used in the investigations, including Adware, Ransomware, Scareware and SMS Malware, and the most widely used supervised and unsupervised machine learning methods are considered. The methods are benchmarked from several points of view: the number of features required, the volume of recorded traffic, the ability to identify the malware family and the ability to detect a new malware family. The results show that, with appropriate features and traffic volume, these methods achieve an F1-measure of about 90% for malware detection. However, they do not show acceptable results in identifying malware families or in detecting new malware families. The paper also discusses some of the challenges and open research problems in this context, which may be useful to researchers interested in this field.
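For readers unfamiliar with the metric reported above, the F1-measure is the harmonic mean of precision and recall. The sketch below shows one way to compute it from predicted and true labels. The function name f1_measure, the label encoding (1 = malicious flow, 0 = benign flow) and the sample labels are assumptions made for illustration only; they are not taken from the paper's data or code.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels for illustration: 1 = malicious flow, 0 = benign flow.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

precision, recall, f1 = f1_measure(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 balances false alarms against missed detections, it is often preferred over plain accuracy when benign traffic greatly outnumbers malicious traffic.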
