TY - GEN
T1 - Improving Data Transfer Throughput with Direct Search Optimization
AU - Balaprakash, Prasanna
AU - Morozov, Vitali
AU - Kettimuthu, Rajkumar
AU - Kumaran, Kalyan
AU - Foster, Ian
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/9/21
Y1 - 2016/9/21
N2 - Improving data transfer throughput over high-speed long-distance networks has become increasingly difficult. Numerous factors such as nondeterministic congestion, dynamics of the transfer protocol, and multiuser and multitask source and destination endpoints, as well as interactions among these factors, contribute to this difficulty. A promising approach to improving throughput consists in using parallel streams at the application layer. We formulate and solve the problem of choosing the number of such streams from a mathematical optimization perspective. We propose the use of direct search methods, a class of easy-to-implement and light-weight mathematical optimization algorithms, to improve the performance of data transfers by dynamically adapting the number of parallel streams in a manner that does not require domain expertise, instrumentation, analytical models, or historic data. We apply our method to transfers performed with the GridFTP protocol, and illustrate the effectiveness of the proposed algorithm when used within Globus, a state-of-the-art data transfer tool, on production WAN links and servers. We show that when compared to user default settings our direct search methods can achieve up to 10x performance improvement under certain conditions. We also show that our method can overcome performance degradation due to external compute and network load on source endpoints, a common scenario at high performance computing facilities.
AB - Improving data transfer throughput over high-speed long-distance networks has become increasingly difficult. Numerous factors such as nondeterministic congestion, dynamics of the transfer protocol, and multiuser and multitask source and destination endpoints, as well as interactions among these factors, contribute to this difficulty. A promising approach to improving throughput consists in using parallel streams at the application layer. We formulate and solve the problem of choosing the number of such streams from a mathematical optimization perspective. We propose the use of direct search methods, a class of easy-to-implement and light-weight mathematical optimization algorithms, to improve the performance of data transfers by dynamically adapting the number of parallel streams in a manner that does not require domain expertise, instrumentation, analytical models, or historic data. We apply our method to transfers performed with the GridFTP protocol, and illustrate the effectiveness of the proposed algorithm when used within Globus, a state-of-the-art data transfer tool, on production WAN links and servers. We show that when compared to user default settings our direct search methods can achieve up to 10x performance improvement under certain conditions. We also show that our method can overcome performance degradation due to external compute and network load on source endpoints, a common scenario at high performance computing facilities.
KW - Data transfer
KW - Direct search
KW - Parallelism
KW - Tuning
UR - http://www.scopus.com/inward/record.url?scp=84990961180&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2016.36
DO - 10.1109/ICPP.2016.36
M3 - Conference contribution
AN - SCOPUS:84990961180
T3 - Proceedings of the International Conference on Parallel Processing
SP - 248
EP - 257
BT - Proceedings - 45th International Conference on Parallel Processing, ICPP 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 45th International Conference on Parallel Processing, ICPP 2016
Y2 - 16 August 2016 through 19 August 2016
ER -