Abstract
The emergence of Big Data in recent years has resulted in a growing need for efficient data processing solutions. While infrastructures with sufficient compute power are available, the I/O bottleneck remains. The Linux page cache is an efficient approach to reducing I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to limitations of real-world experiments. Simulation is a popular approach to address these issues; however, existing simulation frameworks simulate page caching only partially, or not at all. As a result, simulation-based performance studies of data-intensive applications can lead to misleading results and inaccurate conclusions. In this paper, we propose an I/O simulation model that captures the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the popular SimGrid distributed systems simulation framework. Our model and its implementation enable the simulation of both single-threaded and multi-threaded applications, and of both writeback and writethrough caches for local or network-based filesystems. We evaluate the accuracy of our model in different conditions, including sequential and concurrent applications, as well as local and remote I/O. We find that our page cache model reduces the simulation error by up to an order of magnitude when compared to state-of-the-art, cacheless simulations. Our model is publicly available in the WRENCH framework, making it usable in a wide range of simulation studies.
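To make the contrast between cacheless and cache-aware simulation concrete, the toy sketch below estimates I/O times with and without a page cache. It is not the model proposed in the paper, nor the WRENCH or SimGrid API: the class `ToyPageCache`, the helper `cacheless_read_time`, the LRU eviction policy, the block size, and the idea of charging one bandwidth per block are all assumptions made for illustration.

```python
from collections import OrderedDict

BLOCK = 1_000_000  # bytes per cached block (assumed granularity)


def cacheless_read_time(n_blocks, disk_bw):
    """Simulated read time when every block is served from disk."""
    return n_blocks * BLOCK / disk_bw


class ToyPageCache:
    """Hypothetical LRU page cache that returns simulated I/O times in seconds."""

    def __init__(self, capacity_blocks, disk_bw, mem_bw, writeback=True):
        self.capacity = capacity_blocks
        self.disk_bw = disk_bw      # bytes per second
        self.mem_bw = mem_bw        # bytes per second
        self.writeback = writeback  # False selects writethrough
        self.blocks = OrderedDict()  # (file, index) -> dirty flag, in LRU order

    def _insert(self, key, dirty):
        """Cache a block; return the time spent flushing evicted dirty blocks."""
        flush_time = 0.0
        self.blocks[key] = dirty
        self.blocks.move_to_end(key)
        while len(self.blocks) > self.capacity:
            _, was_dirty = self.blocks.popitem(last=False)
            if was_dirty:
                flush_time += BLOCK / self.disk_bw
        return flush_time

    def read(self, name, n_blocks):
        """Cache hits cost memory time; misses cost disk time and fill the cache."""
        elapsed = 0.0
        for i in range(n_blocks):
            key = (name, i)
            if key in self.blocks:
                self.blocks.move_to_end(key)
                elapsed += BLOCK / self.mem_bw
            else:
                elapsed += BLOCK / self.disk_bw
                elapsed += self._insert(key, dirty=False)
        return elapsed

    def write(self, name, n_blocks):
        """Writeback only dirties the cache; writethrough pays the disk cost now."""
        elapsed = 0.0
        for i in range(n_blocks):
            key = (name, i)
            if self.writeback:
                elapsed += BLOCK / self.mem_bw
                elapsed += self._insert(key, dirty=True)
            else:
                elapsed += BLOCK / self.disk_bw
                elapsed += self._insert(key, dirty=False)
        return elapsed
```

With an assumed 100 MB/s disk and 5 GB/s memory, a first read of 50 blocks costs the same as in the cacheless estimate, while a second read of the same file is served from memory and is far cheaper. A cacheless simulator would charge the disk cost both times, which is the kind of error the abstract refers to.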
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 398-408 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781728196664 |
| State | Published - 2021 |
| Externally published | Yes |
| Event | 2021 IEEE International Conference on Cluster Computing, Cluster 2021 - Virtual, Portland, United States; Duration: Sep 7 2021 → Sep 10 2021 |
Publication series
| Name | Proceedings - IEEE International Conference on Cluster Computing, ICCC |
|---|---|
| Volume | 2021-September |
| ISSN (Print) | 1552-5244 |
Conference
| Conference | 2021 IEEE International Conference on Cluster Computing, Cluster 2021 |
|---|---|
| Country/Territory | United States |
| City | Virtual, Portland |
| Period | 09/07/21 → 09/10/21 |
Funding
The computing platform used in the experiments was obtained with funding from the Canada Foundation for Innovation. This work was partially supported by NSF contracts #1923539 and #1923621.