TY - JOUR
T1 - Transparent runtime parallelization of the R scripting language
AU - Li, Jiangtian
AU - Ma, Xiaosong
AU - Yoginath, Srikanth
AU - Kora, Guruprasad
AU - Samatova, Nagiza F.
PY - 2011/2
Y1 - 2011/2
AB - Scripting languages such as R and Matlab are widely used in scientific data processing. As the data volume and the complexity of analysis tasks both grow, sequential data processing using these tools often becomes the bottleneck in scientific workflows. We describe pR, a runtime framework for automatic and transparent parallelization of the popular R language used in statistical computing. Recognizing scripting languages' interpreted nature and data analysis codes' use pattern, we propose several novel techniques: (1) applying parallelizing compiler technology to runtime, whole-program dependence analysis of scripting languages, (2) incremental code analysis assisted with evaluation results, and (3) runtime parallelization of file accesses. Our framework does not require any modification to either the source code or the underlying R implementation. Experimental results demonstrate that pR can exploit both task and data parallelism transparently and overall has better performance as well as scalability compared to an existing parallel R package that requires code modification.
KW - Incremental analysis
KW - Runtime parallelization
KW - Scripting languages
UR - http://www.scopus.com/inward/record.url?scp=78650418266&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2010.08.013
DO - 10.1016/j.jpdc.2010.08.013
M3 - Article
AN - SCOPUS:78650418266
SN - 0743-7315
VL - 71
SP - 157
EP - 168
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
IS - 2
ER -