FTLADS: Object-Logging Based Fault-Tolerant Big Data Transfer System Using Layout Aware Data Scheduling

Preethika Kasu, Taeuk Kim, Jung Ho Um, Kyongseok Park, Scott Atchley, Youngjae Kim

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

The layout-aware data scheduling (LADS) data movement framework mitigates congestion in end-to-end data transfers. During a transfer, LADS avoids congested storage elements by exploiting the underlying storage layout at each endpoint. This improves the I/O bandwidth and hence the data transfer rate across high-speed networks. However, LADS has no fault tolerance (FT), so a fault incurs data retransmission overhead and may lead to data integrity issues. In this paper, we propose object-logging FT mechanisms that avoid retransmitting objects already written successfully to the parallel file system (PFS) at the sink end. Depending on the number of log files created for the whole dataset, we classify our FT mechanisms into three categories: file logger, transaction logger, and universal logger. To address the space overhead, we also propose different methods of populating the log files with information about the successfully transferred objects. We evaluate the data transfer performance and recovery time overhead of the proposed object-logging-based FT mechanisms on the LADS data transfer framework. Our experimental results show that the FT mechanisms exhibit negligible overhead (<1%) with respect to the data transfer time. However, the fault recovery time is 10% higher than the total data transfer time at any fault point.
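
The object-logging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `FileLogger`, the helper `objects_to_send`, the one-index-per-line log format, and the file name `data.bin` are all hypothetical, chosen only to show how a per-file log of objects already written at the sink lets a restarted transfer skip them. The abstract does not describe the actual log layout or the paper's space-saving variants, so none of that is modeled here.

```python
import os
import tempfile


class FileLogger:
    """Hypothetical per-file object log: one log file per transferred file."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def _log_path(self, file_name):
        return os.path.join(self.log_dir, file_name + ".log")

    def record(self, file_name, object_index):
        # Append the index of an object confirmed written at the sink.
        # fsync so the entry survives a crash right after the write.
        with open(self._log_path(file_name), "a") as log:
            log.write(f"{object_index}\n")
            log.flush()
            os.fsync(log.fileno())

    def completed(self, file_name):
        # Return the set of object indices already logged for this file.
        path = self._log_path(file_name)
        if not os.path.exists(path):
            return set()
        with open(path) as log:
            return {int(line) for line in log if line.strip()}


def objects_to_send(logger, file_name, total_objects):
    # After a fault, only objects missing from the log are retransmitted.
    done = logger.completed(file_name)
    return [i for i in range(total_objects) if i not in done]


# Simulate a transfer of 6 objects that faults after objects 0, 1, and 3
# were written (objects may complete out of order).
logger = FileLogger(tempfile.mkdtemp())
for index in (0, 1, 3):
    logger.record("data.bin", index)

print(objects_to_send(logger, "data.bin", 6))  # [2, 4, 5]
```

In this sketch each file gets its own log, which corresponds loosely to the file logger category. A transaction logger or universal logger would share one log across more of the dataset, which means fewer log files but each entry must also identify the file it belongs to.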

Original language: English
Article number: 8672553
Pages (from-to): 37448-37462
Number of pages: 15
Journal: IEEE Access
Volume: 7
State: Published - 2019

Funding

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (Ministry of Science and ICT) under Grant 2018R1A1A1A05079398, in part by the Korea Institute of Science and Technology Information (KISTI) under Grant K-17-L03-C01-S03, and in part by the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC for the U.S. DOE under Contract DE-AC05-00OR22725.

Keywords

  • Big data
  • fault tolerance
  • geo-distributed data centers
  • parallel system
