Abstract
The ever-growing trend of big data has led scientists to share and transfer simulation and analytical data across geo-distributed research and computing facilities. However, the existing data transfer frameworks used for data sharing cannot exploit the attributes of the underlying parallel file systems (PFS). LADS (Layout-Aware Data Scheduling) is an end-to-end data transfer tool optimized for terabit networks that uses layout-aware data scheduling via the PFS. LADS, however, does not consider the NUMA (Non-Uniform Memory Access) architecture. In this paper, we propose NUMA-aware thread and resource scheduling for optimized data transfer in terabit networks. First, we propose distributed RMA buffers to reduce memory controller contention in CPU sockets; we then schedule threads according to the CPU socket and the NUMA nodes within each socket to reduce memory access latency. We design and implement the proposed resource and thread scheduling in the existing LADS framework. Experimental results show that the memory-level optimizations in the LADS framework yield improvements of 21.7% to 44% over the baseline without any optimization.
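The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of the general idea: spread RMA buffers across NUMA nodes and place each thread on a CPU local to one node. The `NumaNode` type, the `plan_placement` function, the round-robin policy, and the example topology are all hypothetical and are not taken from LADS.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NumaNode:
    """Hypothetical description of one NUMA node (not a LADS type)."""
    socket: int             # CPU socket the node belongs to
    node: int               # NUMA node id
    cpus: Tuple[int, ...]   # CPU ids local to this node


def plan_placement(
    nodes: List[NumaNode], num_threads: int, num_buffers: int
) -> Tuple[Dict[int, List[int]], List[Tuple[int, int, int, int]]]:
    """Assign buffers and threads to NUMA nodes round-robin.

    Assumption: round-robin is used here purely for illustration;
    the paper's actual policy may differ.
    """
    ordered = sorted(nodes, key=lambda n: (n.socket, n.node))

    # Distribute buffers so that no single memory controller
    # serves all of them.
    buffers: Dict[int, List[int]] = {n.node: [] for n in ordered}
    for buf_id in range(num_buffers):
        target = ordered[buf_id % len(ordered)]
        buffers[target.node].append(buf_id)

    # Place each thread on a CPU that is local to its NUMA node.
    threads: List[Tuple[int, int, int, int]] = []
    for tid in range(num_threads):
        target = ordered[tid % len(ordered)]
        cpu = target.cpus[(tid // len(ordered)) % len(target.cpus)]
        threads.append((tid, target.socket, target.node, cpu))

    return buffers, threads


# Hypothetical topology: 2 sockets, 2 NUMA nodes each, 2 CPUs per node.
example_nodes = [
    NumaNode(socket=0, node=0, cpus=(0, 1)),
    NumaNode(socket=0, node=1, cpus=(2, 3)),
    NumaNode(socket=1, node=2, cpus=(4, 5)),
    NumaNode(socket=1, node=3, cpus=(6, 7)),
]
example_buffers, example_threads = plan_placement(
    example_nodes, num_threads=8, num_buffers=4
)
```

The sketch only computes a placement plan. On Linux, a thread could then be pinned to its planned CPU with `os.sched_setaffinity`, and binding the buffer memory itself to a node would need a NUMA library such as libnuma, which is outside this example.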
Original language | English |
---|---|
Article number | 4120561 |
Journal | Scientific Programming |
Volume | 2018 |
State | Published - 2018 |
Funding
This work was supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korean Government (MSIT) (no. 2015-0-00590, High Performance Big Data Analytic Platform Performance Acceleration Technologies Development). This work also used the resources of the Korea Institute of Science and Technology Information (KISTI), located in Daedeok Science Town, Daejeon, South Korea. The authors thank Dr. Sungyong Park for his constructive comments, which have significantly improved the paper.