A quantitative model of application slow-down in multi-resource shared systems

Seung Hwan Lim, Youngjae Kim

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Scheduling multiple jobs onto a platform enhances system utilization by sharing resources. The benefits of higher resource utilization include reduced costs of constructing, operating, and maintaining a system, which often include energy consumption. Maximizing these benefits comes at a price: resource contention among jobs increases job completion time. In this paper, we analyze the slow-down of jobs due to contention for multiple resources in a system, which we refer to as the dilation factor (D-factor). We observe that multiple-resource contention creates non-linear dilation factors of jobs. From this observation, we establish a general quantitative model for the dilation factors of jobs in multi-resource systems. A job is characterized by a vector of loading statistics, and the dilation factors of a job set are given by a quadratic function of their loading vectors. We demonstrate how to systematically characterize a job, maintain the data structure used to calculate the dilation factor (the loading matrix), and calculate the dilation factor of each job. We validate the accuracy of the model with multiple processes running on a native Linux server and on virtualized servers, and with multiple MapReduce workloads co-scheduled in a cluster. Evaluation with measured data shows that the D-factor model has an error margin of less than 16%. We extend the D-factor model to capture the slow-down of applications when multiple identical resources exist, as in multi-core and multi-disk environments. Validation of the extended D-factor model with HPC checkpoint applications on parallel file systems shows that the D-factor accurately captures the slow-down of concurrent applications in such environments.
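The abstract states that each job is characterized by a loading vector and that the dilation factors of co-scheduled jobs are a quadratic function of those vectors. The paper's exact functional form and fitted coefficients are not given here, so the Python sketch below assumes a simple illustrative form only: each entry of a loading vector is a hypothetical fraction of the job's standalone run time spent on one resource, and job i's dilation factor is 1 plus the sum of inner products between its loading vector and those of the other co-scheduled jobs. The function name `dilation_factors` and the example vectors are hypothetical, not taken from the paper.

```python
from typing import List

Vector = List[float]


def dilation_factors(loads: List[Vector]) -> List[float]:
    """Illustrative quadratic slow-down model (an ASSUMPTION, not the paper's fitted model).

    loads[i][r] is a hypothetical fraction of job i's standalone run time spent
    using resource r. Stacking these vectors as rows gives a loading matrix L.
    Job i's dilation factor is modeled as
        d_i = 1 + sum_{j != i} <loads[i], loads[j]>,
    which is a quadratic (bilinear) function of the co-scheduled loading vectors.
    """
    n = len(loads)
    # Pairwise inner products of the loading vectors, i.e. the matrix L L^T.
    interaction = [
        [sum(a * b for a, b in zip(loads[i], loads[j])) for j in range(n)]
        for i in range(n)
    ]
    # A job running alone (no other jobs) has dilation factor 1.0.
    return [1.0 + sum(interaction[i][j] for j in range(n) if j != i) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical loading vectors over two resources: [CPU, disk].
    jobs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
    print([round(d, 2) for d in dilation_factors(jobs)])  # [1.92, 1.44, 2.0]
```

Under this assumed form, the two CPU-heavy jobs slow each other down much more than either slows the disk-heavy job, which illustrates the contention-dependent behavior the abstract describes; reproducing the actual D-factor model would require the paper's characterization and fitting procedure.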

Original language: English
Pages (from-to): 32-47
Number of pages: 16
Journal: Performance Evaluation
Volume: 108
State: Published - Feb 1 2017

Funding

We would like to thank the anonymous reviewers for their detailed comments, which helped us improve the quality of this paper. This work was supported in part by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea Government (MSIP) (No. R0190-15-2012) and by a National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIP) (No. 2015R1C1A1A0152105). The work was also supported by, and used the resources of, the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT-Battelle, LLC for the US DOE under contract No. DE-AC05-00OR22725.

Funders and funder numbers
U.S. Department of Energy: DE-AC05-00OR22725
Ministry of Science, ICT and Future Planning: R0190-15-2012
National Research Foundation of Korea: 2015R1C1A1A0152105
Institute for Information and Communications Technology Promotion

    Keywords

    • Measurement
    • Modeling technique
    • Performance of systems
