The practical obstacles of data transfer: Why researchers still love scp

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

The importance of computing facilities is heralded every six months with the announcement of the new Top500 list, showcasing the world's fastest supercomputers. Unfortunately, with great computing capability does not come great long-term data storage capacity, which often means users must move their data to their local site archive, to remote sites where they may be doing future computation or analysis, or back to their home institution, or else face the dreaded data purge that most HPC centers employ to keep utilization of large parallel filesystems low in order to manage performance and capacity. At HPC centers, data transfer is crucial to the scientific workflow and will increase in importance as computing systems grow in size. The Energy Sciences Network (ESnet) recently launched its fifth-generation network, a 100 Gbps high-performance, unclassified national network connecting more than 40 DOE research sites to support scientific research and collaboration. Despite the tenfold increase in bandwidth to DOE research sites, which is amenable to multiple data transfer streams and high throughput, researchers in practice often underutilize the network and resort to painfully slow single-stream transfer methods such as scp to avoid the complexity of multiple-stream tools such as GridFTP and bbcp, and they contend with frustration from the inconsistency of available tools between sites. In this study we survey and assess the data transfer methods provided at several DOE-supported computing facilities, including both leadership computing facilities, connected through ESnet. We present observed transfer rates and suggested optimizations, and discuss the obstacles the tools must overcome to achieve widespread adoption over scp.

Original language: English
Title of host publication: Proc. of NDM 2013
Subtitle of host publication: 3rd Int. Workshop on Network-Aware Data Management - Held in Conjunction with SC 2013: The Int. Conference for High Performance Computing, Networking, Storage and Analysis
State: Published - 2013
Event: 3rd International Workshop on Network-Aware Data Management, NDM 2013 - Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013 - Denver, CO, United States
Duration: Nov 17 2013 to Nov 17 2013

Publication series

Name: Proc. of NDM 2013: 3rd Int. Workshop on Network-Aware Data Management - Held in Conjunction with SC 2013: The Int. Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 3rd International Workshop on Network-Aware Data Management, NDM 2013 - Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
Country/Territory: United States
City: Denver, CO
Period: 11/17/13 to 11/17/13

Keywords

  • Data transfer
  • GridFTP
  • High performance computing
  • WAN performance
  • WAN usability
