A MACHINE LEARNING BASED DECISION SUPPORT FRAMEWORK FOR BIG DATA PIPELINE MODELING AND DESIGN


(Received: 25-Mar.-2024, Revised: 13-May-2024, Accepted: 27-May-2024)
The data warehousing process requires an architectural revolution to address big-data challenges and to integrate new data sources, such as social networks, recommendation systems, smart cities and the web, in order to extract value from shared data. In this respect, the pipeline-modeling community has produced a wide range of technological solutions for the acquisition, storage and processing of data for analysis purposes, and this diversity itself raises significant challenges. More specifically, choosing the most appropriate tool for the user's specific business needs and ensuring interoperability between the different tools have become primary challenges. From this perspective, in this paper we propose a new interactive framework based on machine learning (ML) techniques to assist experts in modeling a customized pipeline for data warehousing. More precisely, the framework (i) analyzes the experts' requirements and the characteristics of the data to be processed, (ii) proposes the architecture best suited to these requirements from a multitude of specific architectures instantiated from a generic one, and (iii) uses several ML methods to predict the most suitable tool for each phase and task within that architecture. Finally, we validate the framework through two real-world use cases and user feedback.
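The abstract does not say which ML methods perform step (iii), so the following is only a minimal sketch of what per-phase tool prediction could look like, assuming a k-nearest-neighbour classifier trained on past (requirement, chosen tool) pairs. Everything in it is hypothetical and not taken from the framework: the four requirement features (volume, velocity, variety, batch vs. streaming), the placeholder labels `ToolA`/`ToolB`/`ToolC`, and the toy training rows, which are invented for illustration rather than being measured data.

```python
from __future__ import annotations

import math
from collections import Counter
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical requirement profile for one pipeline phase (illustrative features only)."""
    volume_tb: float      # expected data volume in terabytes
    velocity_keps: float  # ingestion rate in thousands of events per second
    variety: int          # 0 = structured, 1 = semi-structured, 2 = unstructured
    streaming: int        # 0 = batch processing, 1 = real-time processing

    def vector(self) -> list[float]:
        return [self.volume_tb, self.velocity_keps, float(self.variety), float(self.streaming)]


def fit_scaler(rows: list[list[float]]) -> Callable[[list[float]], list[float]]:
    """Min-max scale each feature to [0, 1] so no single unit (e.g. TB) dominates the distance."""
    bounds = [(min(col), max(col)) for col in zip(*rows)]

    def scale(v: list[float]) -> list[float]:
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(v, bounds)]

    return scale


def predict_tool(history: list[tuple[Requirement, str]], query: Requirement, k: int = 3) -> str:
    """Majority vote among the k past requirement profiles closest to the query (Euclidean distance)."""
    scale = fit_scaler([req.vector() for req, _ in history])
    q = scale(query.vector())
    ranked = sorted(history, key=lambda pair: math.dist(scale(pair[0].vector()), q))
    votes = Counter(tool for _, tool in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy history for a single phase; tool names are placeholders, not real recommendations.
HISTORY = [
    (Requirement(0.5, 0.1, 0, 0), "ToolA"),
    (Requirement(2.0, 0.5, 0, 0), "ToolA"),
    (Requirement(1.0, 0.2, 1, 0), "ToolA"),
    (Requirement(50.0, 200.0, 1, 1), "ToolB"),
    (Requirement(80.0, 500.0, 2, 1), "ToolB"),
    (Requirement(30.0, 150.0, 2, 1), "ToolB"),
    (Requirement(400.0, 5.0, 2, 0), "ToolC"),
    (Requirement(600.0, 2.0, 2, 0), "ToolC"),
    (Requirement(300.0, 1.0, 1, 0), "ToolC"),
]

if __name__ == "__main__":
    # A high-velocity, unstructured, real-time profile lands among the streaming examples.
    print(predict_tool(HISTORY, Requirement(60.0, 300.0, 2, 1)))  # prints "ToolB"
```

Min-max scaling is the design choice that matters most here: without it, the volume feature (hundreds of terabytes) would swamp the binary streaming flag. A real framework would likely compare several classifiers per phase and task, as the abstract indicates, rather than rely on a single k-NN vote.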