TY - JOUR
T1 - Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems
AU - Oral, Sarp
AU - Simmons, James
AU - Hill, Jason
AU - Leverman, Dustin
AU - Wang, Feiyi
AU - Ezell, Matt
AU - Miller, Ross
AU - Fuller, Douglas
AU - Gunasekaran, Raghul
AU - Kim, Youngjae
AU - Gupta, Saurabh
AU - Tiwari, Devesh
AU - Vazhkudai, Sudharshan S.
AU - Rogers, James H.
AU - Dillow, David
AU - Shipman, Galen M.
AU - Bland, Arthur S.
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/1/16
Y1 - 2014/1/16
N2 - The Oak Ridge Leadership Computing Facility (OLCF) has deployed multiple large-scale parallel file systems (PFS) to support its operations. During this process, OLCF acquired significant expertise in large-scale storage system design, file system software development, technology evaluation, benchmarking, procurement, deployment, and operational practices. Based on the lessons learned from each new PFS deployment, OLCF improved its operating procedures and strategies. This paper provides an account of our experience and lessons learned in acquiring, deploying, and operating large-scale parallel file systems. We believe that these lessons will be useful to the wider HPC community.
UR - http://www.scopus.com/inward/record.url?scp=84936946893&partnerID=8YFLogxK
U2 - 10.1109/SC.2014.23
DO - 10.1109/SC.2014.23
M3 - Conference article
AN - SCOPUS:84936946893
SN - 2167-4329
VL - 2015-January
SP - 217
EP - 228
JO - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
JF - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
IS - January
M1 - 7013005
T2 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014
Y2 - 16 November 2014 through 21 November 2014
ER -