TY - GEN
T1 - Parallel, scalable, memory-efficient backtracking for combinatorial modeling of large-scale biological systems
AU - Park, Byung Hoon
AU - Schmidt, Matthew
AU - Thomas, Kevin
AU - Karpinets, Tatiana
AU - Samatova, Nagiza F.
PY - 2008
Y1 - 2008
N2 - Data-driven modeling of biological systems such as protein-protein interaction networks is data-intensive and combinatorially challenging. Backtracking can constrain a combinatorial search space. Yet, its recursive nature, exacerbated by data-intensity, limits its applicability for large-scale systems. Parallel, scalable, and memory-efficient backtracking is a promising approach. Parallel backtracking suffers from unbalanced loads. Load rebalancing via synchronization and data movement is prohibitively expensive. Balancing these discrepancies, while minimizing end-to-end execution time and memory requirements, is desirable. This paper introduces such a framework. Its scalability and efficiency, demonstrated on the maximal clique enumeration problem, are attributed to the proposed: (a) representation of search tree decomposition to enable parallelization; (b) depth-first parallel search to minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing with stack splitting to minimize processors' idle time. The applications of this framework to real biological problems related to bioethanol production are discussed.
AB - Data-driven modeling of biological systems such as protein-protein interaction networks is data-intensive and combinatorially challenging. Backtracking can constrain a combinatorial search space. Yet, its recursive nature, exacerbated by data-intensity, limits its applicability for large-scale systems. Parallel, scalable, and memory-efficient backtracking is a promising approach. Parallel backtracking suffers from unbalanced loads. Load rebalancing via synchronization and data movement is prohibitively expensive. Balancing these discrepancies, while minimizing end-to-end execution time and memory requirements, is desirable. This paper introduces such a framework. Its scalability and efficiency, demonstrated on the maximal clique enumeration problem, are attributed to the proposed: (a) representation of search tree decomposition to enable parallelization; (b) depth-first parallel search to minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing with stack splitting to minimize processors' idle time. The applications of this framework to real biological problems related to bioethanol production are discussed.
UR - http://www.scopus.com/inward/record.url?scp=51049100135&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2008.4536180
DO - 10.1109/IPDPS.2008.4536180
M3 - Conference contribution
AN - SCOPUS:51049100135
SN - 9781424416943
T3 - IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM
BT - IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM
T2 - IPDPS 2008 - 22nd IEEE International Parallel and Distributed Processing Symposium
Y2 - 14 April 2008 through 18 April 2008
ER -