Modeling the Linux page cache for accurate simulation of data-intensive applications

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

The emergence of Big Data in recent years has resulted in a growing need for efficient data processing solutions. While infrastructures with sufficient compute power are available, the I/O bottleneck remains. The Linux page cache is an efficient approach to reduce I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to limitations of real-world experiments. Simulation is a popular approach to address these issues; however, existing simulation frameworks do not simulate page caching fully, or even at all. As a result, simulation-based performance studies of data-intensive applications can lead to misleading results and inaccurate conclusions. In this paper, we propose an I/O simulation model that captures the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the popular SimGrid distributed systems simulation framework. Our model and its implementation enable the simulation of both single-threaded and multithreaded applications, and of both writeback and writethrough caches for local or network-based filesystems. We evaluate the accuracy of our model in different conditions, including sequential and concurrent applications, as well as local and remote I/Os. We find that our page cache model reduces the simulation error by up to an order of magnitude when compared to state-of-the-art, cacheless simulations. Our model is publicly available in the WRENCH framework, making it usable in a wide range of simulation studies.
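To illustrate why a cacheless simulation can misestimate I/O time, the following is a minimal toy sketch, not the model proposed in the paper and not the WRENCH or SimGrid API. It assumes whole-file caching with LRU eviction and fixed disk and memory bandwidths; the function name `simulate_reads` and all parameter values are hypothetical.

```python
from collections import OrderedDict


def simulate_reads(accesses, cache_capacity, disk_bw, mem_bw):
    """Estimate total read time (seconds) for a sequence of (file, size) reads.

    Toy assumptions: files are cached whole, eviction is LRU, and bandwidths
    are constant. Setting cache_capacity=0 yields a cacheless estimate.
    """
    cache = OrderedDict()  # file name -> size in bytes, least recently used first
    used = 0               # bytes currently held in the cache
    total = 0.0
    for name, size in accesses:
        if name in cache:
            # Cache hit: data is served at memory bandwidth.
            cache.move_to_end(name)
            total += size / mem_bw
            continue
        # Cache miss: data is read from disk.
        total += size / disk_bw
        if size > cache_capacity:
            continue  # file cannot fit in the cache at all
        # Evict least recently used files until the new file fits.
        while used + size > cache_capacity:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[name] = size
        used += size
    return total


# Reading the same 100-byte file twice (hypothetical bandwidths in bytes/s).
reads = [("a", 100), ("a", 100)]
print(simulate_reads(reads, cache_capacity=0, disk_bw=10, mem_bw=100))    # 20.0
print(simulate_reads(reads, cache_capacity=100, disk_bw=10, mem_bw=100))  # 11.0
```

In this toy setting the cacheless estimate is nearly twice the cached one for a re-read workload, which is the kind of gap the abstract attributes to ignoring the page cache. Writes, dirty-page flushing, and concurrency are deliberately left out.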

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 398-408
Number of pages: 11
ISBN (Electronic): 9781728196664
DOIs
State: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Cluster Computing, Cluster 2021 - Virtual, Portland, United States
Duration: Sep 7 2021 to Sep 10 2021

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2021-September
ISSN (Print): 1552-5244

Conference

Conference: 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Country/Territory: United States
City: Virtual, Portland
Period: 09/7/21 to 09/10/21

Funding

The computing platform used in the experiments was obtained with funding from the Canada Foundation for Innovation. This work was partially supported by NSF contracts #1923539 and #1923621.
