(Received: 19-Jul.-2021, Revised: 19-Sep.-2021, Accepted: 5-Oct.-2021)
A growing number of applications depend on big data, and their processing tasks are submitted to cloud computing systems as workflows. With recent advances in Internet technology, cloud computing has become the most popular computing technology, and scheduling in cloud environments has long been a topic of interest to researchers. This paper proposes a new scheduling algorithm for data-intensive workflows in computational clouds, based on the data dependencies between tasks. The proposed algorithm aims to minimize the makespan by considering the details of both the workflow structure and the virtual machines, aspects that have received less emphasis in previous works. According to the results, the proposed algorithm reduced both the communication time between tasks and the task runtimes by taking into account the features of data-intensive workflows and assigning tasks appropriately. Consequently, it reduced the total makespan in comparison with previous algorithms.
