Abstract
Scientific workflows have enabled large-scale scientific computations and data analysis, and lowered the entry barrier for performing computations in distributed heterogeneous platforms (e.g., HTC and HPC). In spite of impressive achievements to date, large-scale modeling, simulation, and data analytics in the long-tail still face several challenges such as efficient scheduling and execution of large-scale workflows (\mathrm{O}(10^{6})) with very short-running tasks (few seconds). While the current trend to support next-generation workflows on leadership class machines have gained much attention in the past years, at the other end of the spectrum scientific workflows from the long-tail science have become larger and require processing massive volumes of data. In this paper, we report on our experience in designing and implementing an HTC workflow for agroecosystem modeling. We leverage well-known (task clustering and co-scheduling) and emerging (hierarchical workflows and containers) workflow optimization techniques to make the workflow planning problem tractable, and maximize resource utilization and the degree of task parallelism. Experimental results, via the implementation of a use case, show that by strategically combining the above strategies and defining an appropriate set of optimization parameters, the overall workflow makespan can be improved by 3.5 orders of magnitude when compared to a regular (non-optimized) execution of the workflow.
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
Editors | Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 4545-4552 |
Number of pages | 8 |
ISBN (Electronic) | 9781728108582 |
DOIs | |
State | Published - Dec 2019 |
Event | 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States Duration: Dec 9 2019 → Dec 12 2019 |
Publication series
Name | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|
Conference
Conference | 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|---|
Country/Territory | United States |
City | Los Angeles |
Period | 12/9/19 → 12/12/19 |
Funding
This work was funded by the Defense Advanced Research Projects Agency under award #W911NF-18-1-0027; and partly funded by the National Science Foundation under
Keywords
- Agroecosystem Modeling
- High-Throughput Computing
- Scientific Workflows