TY - GEN
T1 - NIMBLE
T2 - 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011
AU - Ghoting, Amol
AU - Kambadur, Prabhanjan
AU - Pednault, Edwin
AU - Kannan, Ramakrishnan
PY - 2011
Y1 - 2011
N2 - In the last decade, advances in data collection and storage technologies have led to an increased interest in designing and implementing large-scale parallel algorithms for machine learning and data mining (ML-DM). Existing programming paradigms for expressing large-scale parallelism such as MapReduce (MR) and the Message Passing Interface (MPI) have been the de facto choices for implementing these ML-DM algorithms. The MR programming paradigm has been of particular interest as it gracefully handles large datasets and has built-in resilience against failures. However, the existing parallel programming paradigms are too low-level and ill-suited for implementing ML-DM algorithms. To address this deficiency, we present NIMBLE, a portable infrastructure that has been specifically designed to enable the rapid implementation of parallel ML-DM algorithms. The infrastructure allows one to compose parallel ML-DM algorithms using reusable (serial and parallel) building blocks that can be efficiently executed using MR and other parallel programming models; it currently runs on top of Hadoop, which is an open-source MR implementation. We show how NIMBLE can be used to realize scalable implementations of ML-DM algorithms and present a performance evaluation.
AB - In the last decade, advances in data collection and storage technologies have led to an increased interest in designing and implementing large-scale parallel algorithms for machine learning and data mining (ML-DM). Existing programming paradigms for expressing large-scale parallelism such as MapReduce (MR) and the Message Passing Interface (MPI) have been the de facto choices for implementing these ML-DM algorithms. The MR programming paradigm has been of particular interest as it gracefully handles large datasets and has built-in resilience against failures. However, the existing parallel programming paradigms are too low-level and ill-suited for implementing ML-DM algorithms. To address this deficiency, we present NIMBLE, a portable infrastructure that has been specifically designed to enable the rapid implementation of parallel ML-DM algorithms. The infrastructure allows one to compose parallel ML-DM algorithms using reusable (serial and parallel) building blocks that can be efficiently executed using MR and other parallel programming models; it currently runs on top of Hadoop, which is an open-source MR implementation. We show how NIMBLE can be used to realize scalable implementations of ML-DM algorithms and present a performance evaluation.
KW - Data mining
KW - Machine learning
KW - MapReduce
KW - Tasks
UR - http://www.scopus.com/inward/record.url?scp=80052651384&partnerID=8YFLogxK
U2 - 10.1145/2020408.2020464
DO - 10.1145/2020408.2020464
M3 - Conference contribution
AN - SCOPUS:80052651384
SN - 9781450308137
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 334
EP - 342
BT - Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'11
PB - Association for Computing Machinery
Y2 - 21 August 2011 through 24 August 2011
ER -