TY - GEN
T1 - The practical obstacles of data transfer
T2 - 3rd International Workshop on Network-Aware Data Management, NDM 2013 - Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
AU - Nam, Hai Ah
AU - Hill, Jason
AU - Parete-Koon, Suzanne
PY - 2013
Y1 - 2013
AB - The importance of computing facilities is heralded every six months with the announcement of the new Top500 list, showcasing the world's fastest supercomputers. Unfortunately, with great computing capability does not come great long-term data storage capacity, which often means users must move their data to their local site archive, to remote sites where they may be doing future computation or analysis, or back to their home institution, or else face the dreaded data purge that most HPC centers employ to keep utilization of large parallel filesystems low and manage performance and capacity. At HPC centers, data transfer is crucial to the scientific workflow and will only increase in importance as computing systems grow in size. The Energy Sciences Network (ESnet) recently launched its fifth-generation network, a 100 Gbps high-performance, unclassified national network connecting more than 40 DOE research sites to support scientific research and collaboration. Despite the tenfold increase in bandwidth to DOE research sites, which is amenable to multiple data transfer streams and high throughput, in practice researchers often under-utilize the network and resort to painfully slow single-stream transfer methods such as scp to avoid the complexity of multi-stream tools such as GridFTP and bbcp, and they contend with frustration from the lack of consistency of available tools between sites. In this study we survey and assess the data transfer methods provided at several DOE-supported computing facilities connected through ESnet, including both leadership computing facilities. We present observed transfer rates and suggested optimizations, and discuss the obstacles these tools must overcome to achieve widespread adoption over scp.
KW - Data transfer
KW - GridFTP
KW - High performance computing
KW - WAN performance
KW - WAN usability
UR - http://www.scopus.com/inward/record.url?scp=84892949592&partnerID=8YFLogxK
U2 - 10.1145/2534695.2534703
DO - 10.1145/2534695.2534703
M3 - Conference contribution
AN - SCOPUS:84892949592
SN - 9781450325226
T3 - Proceedings of NDM 2013: 3rd International Workshop on Network-Aware Data Management - Held in Conjunction with SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis
BT - Proceedings of NDM 2013
Y2 - 17 November 2013 through 17 November 2013
ER -