TY - GEN
T1 - A Scalable Graph Analytics Framework for Programming with Big Data in R (pbdR)
AU - Shamimul Hasan, S. M.
AU - Schmidt, Drew
AU - Kannan, Ramakrishnan
AU - Imam, Neena
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - Many disciplines such as biology, economics, engineering, physics, and the social sciences represent their data as graphs to capture patterns, trends, and associations. There are many commercially available graph libraries in different programming languages to analyze these complex graphs. However, there is no distributed graph library package in R, the popular statistical programming language, for analyzing graphs that are bigger than a single machine's memory. Many domain experts prefer R over the numerous other alternatives. To address this gap, we present a distributed graph analytics framework for R called programming with big graph using R (pBGR). Our proposed framework leverages the Programming with Big Data in R (pbdR) ecosystem that provides scalable R packages for distributed computing in data science. We present an early prototype implementation of this framework using the distributed-memory parallel graph library CombBLAS and evaluate the framework's performance on leadership-class computing platforms. Our experimental results demonstrate that the proposed framework is capable of performing large-scale parallel graph mining through the easy-to-use R language. This enhanced graph processing capability, coupled with other statistical tools already available in R, should be valuable to many domain experts.
AB - Many disciplines such as biology, economics, engineering, physics, and the social sciences represent their data as graphs to capture patterns, trends, and associations. There are many commercially available graph libraries in different programming languages to analyze these complex graphs. However, there is no distributed graph library package in R, the popular statistical programming language, for analyzing graphs that are bigger than a single machine's memory. Many domain experts prefer R over the numerous other alternatives. To address this gap, we present a distributed graph analytics framework for R called programming with big graph using R (pBGR). Our proposed framework leverages the Programming with Big Data in R (pbdR) ecosystem that provides scalable R packages for distributed computing in data science. We present an early prototype implementation of this framework using the distributed-memory parallel graph library CombBLAS and evaluate the framework's performance on leadership-class computing platforms. Our experimental results demonstrate that the proposed framework is capable of performing large-scale parallel graph mining through the easy-to-use R language. This enhanced graph processing capability, coupled with other statistical tools already available in R, should be valuable to many domain experts.
KW - CombBLAS
KW - R
KW - Titan
KW - pBGR
KW - pbdR
UR - http://www.scopus.com/inward/record.url?scp=85081335237&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006155
DO - 10.1109/BigData47090.2019.9006155
M3 - Conference contribution
AN - SCOPUS:85081335237
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 4783
EP - 4792
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
Y2 - 9 December 2019 through 12 December 2019
ER -