TY - GEN
T1 - Achieving performance isolation with lightweight co-kernels
AU - Ouyang, Jiannan
AU - Kocoloski, Brian
AU - Lange, John
AU - Pedretti, Kevin
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/6/15
Y1 - 2015/6/15
N2 - Performance isolation is emerging as a requirement for High Performance Computing (HPC) applications, particularly as HPC architectures turn to in situ data processing and application composition techniques to increase system throughput. These approaches require the co-location of disparate workloads on the same compute node, each with different resource and runtime requirements. In this paper we claim that these workloads cannot be effectively managed by a single Operating System/Runtime (OS/R). Therefore, we present Pisces, a system software architecture that enables the co-existence of multiple independent and fully isolated OS/Rs, or enclaves, that can be customized to address the disparate requirements of next generation HPC workloads. Each enclave consists of a specialized lightweight OS cokernel and runtime, which is capable of independently managing partitions of dynamically assigned hardware resources. Contrary to other co-kernel approaches, in this work we consider performance isolation to be a primary requirement and present a novel co-kernel architecture to achieve this goal. We further present a set of design requirements necessary to ensure performance isolation, including: (1) elimination of cross OS dependencies, (2) internalized management of I/O, (3) limiting cross enclave communication to explicit shared memory channels, and (4) using virtualization techniques to provide missing OS features. The implementation of the Pisces co-kernel architecture is based on the Kitten Lightweight Kernel and Palacios Virtual Machine Monitor, two system software architectures designed specifically for HPC systems. Finally we will show that lightweight isolated co-kernels can provide better performance for HPC applications, and that isolated virtual machines are even capable of outperforming native environments in the presence of competing workloads.
AB - Performance isolation is emerging as a requirement for High Performance Computing (HPC) applications, particularly as HPC architectures turn to in situ data processing and application composition techniques to increase system throughput. These approaches require the co-location of disparate workloads on the same compute node, each with different resource and runtime requirements. In this paper we claim that these workloads cannot be effectively managed by a single Operating System/Runtime (OS/R). Therefore, we present Pisces, a system software architecture that enables the co-existence of multiple independent and fully isolated OS/Rs, or enclaves, that can be customized to address the disparate requirements of next generation HPC workloads. Each enclave consists of a specialized lightweight OS cokernel and runtime, which is capable of independently managing partitions of dynamically assigned hardware resources. Contrary to other co-kernel approaches, in this work we consider performance isolation to be a primary requirement and present a novel co-kernel architecture to achieve this goal. We further present a set of design requirements necessary to ensure performance isolation, including: (1) elimination of cross OS dependencies, (2) internalized management of I/O, (3) limiting cross enclave communication to explicit shared memory channels, and (4) using virtualization techniques to provide missing OS features. The implementation of the Pisces co-kernel architecture is based on the Kitten Lightweight Kernel and Palacios Virtual Machine Monitor, two system software architectures designed specifically for HPC systems. Finally we will show that lightweight isolated co-kernels can provide better performance for HPC applications, and that isolated virtual machines are even capable of outperforming native environments in the presence of competing workloads.
KW - Exascale
KW - Operating Systems
KW - Virtualization
UR - https://www.scopus.com/pages/publications/84987707893
U2 - 10.1145/2749246.2749273
DO - 10.1145/2749246.2749273
M3 - Conference contribution
AN - SCOPUS:84987707893
T3 - HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing
SP - 149
EP - 160
BT - HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery, Inc
T2 - 24th ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC 2015
Y2 - 15 June 2015 through 19 June 2015
ER -