Abstract
This paper introduces an end-to-end framework for efficiently computing and merging Monte Carlo simulations on heterogeneous distributed systems. Simulations are parallelized using a dynamic load-balancing approach and multiple parallel mergers. Checkpointing is used to improve reliability and to enable incremental merging of partial results. A model is proposed to analyze the behavior of the framework and to help tune its parameters. Experimental results obtained on a production grid infrastructure show that the model fits the real makespan with a relative error of at most 10%, that using multiple parallel mergers reduces the makespan by 40% on average, that checkpointing enables the completion of very long simulations, and that it can be used without penalizing the makespan.
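The three ideas named in the abstract (dynamic load balancing, checkpointed partial results, and multiple parallel mergers) can be illustrated with a minimal single-machine sketch. This is not the paper's framework: the function `run_simulation`, its parameters, the use of threads in place of grid jobs, and the toy pi-estimation workload are all assumptions made for illustration.

```python
import queue
import random
import threading


def run_simulation(total_events, n_workers=4, n_mergers=2, chunk=1000, seed=0):
    """Toy Monte Carlo (pi estimation); returns (events_merged, hits_merged).

    Hypothetical illustration only: threads stand in for grid jobs.
    """
    remaining = [total_events]      # shared pool of events still to simulate
    pool_lock = threading.Lock()
    checkpoints = queue.Queue()     # partial results waiting to be merged
    merged = [[0, 0] for _ in range(n_mergers)]  # per-merger [events, hits]

    def claim():
        # Dynamic load balancing: no static split, each worker claims the
        # next chunk when it is free, so faster workers process more events.
        with pool_lock:
            n = min(chunk, remaining[0])
            remaining[0] -= n
            return n

    def worker(wid):
        rng = random.Random(seed + wid)
        while True:
            n = claim()
            if n == 0:
                return
            hits = sum(
                1 for _ in range(n)
                if rng.random() ** 2 + rng.random() ** 2 <= 1.0
            )
            # Checkpoint: publish the partial result right away, so work
            # completed so far survives if this worker later fails.
            checkpoints.put((n, hits))

    def merger(mid):
        # Incremental merging: fold in partial results as they arrive.
        while True:
            item = checkpoints.get()
            if item is None:        # sentinel: no more checkpoints
                return
            merged[mid][0] += item[0]
            merged[mid][1] += item[1]

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    mergers = [threading.Thread(target=merger, args=(i,)) for i in range(n_mergers)]
    for t in workers + mergers:
        t.start()
    for t in workers:
        t.join()
    for _ in mergers:
        checkpoints.put(None)       # one sentinel per merger
    for t in mergers:
        t.join()

    # Final merge of the per-merger partial totals.
    events = sum(m[0] for m in merged)
    hits = sum(m[1] for m in merged)
    return events, hits


if __name__ == "__main__":
    events, hits = run_simulation(100_000)
    print("events merged:", events, "pi estimate:", 4.0 * hits / events)
```

In this sketch each merger owns its own accumulator, so mergers never contend with each other, and the last step combines their totals. On a real grid the checkpoints would be files uploaded by jobs rather than queue items.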
Original language | English |
---|---|
Pages (from-to) | 728-738 |
Number of pages | 11 |
Journal | Future Generation Computer Systems |
Volume | 29 |
Issue number | 3 |
DOIs | |
State | Published - Mar 2013 |
Externally published | Yes |
Funding
This work is co-funded by the French national research agency (ANR), hGATE project, under contract no. ANR-09-COSI-004-01. It also falls within the scope of the scientific topics of the French National Grid Institute (IdG). The authors would like to thank the site administrators of the European Grid Initiative and the GGUS support for their work.
Funders | Funder number |
---|---|
Seventh Framework Programme | 261323 |
Agence Nationale de la Recherche | ANR-09-COSI-004-01 |
Keywords
- Checkpointing
- Dynamic parallelization
- Grid computing
- Merge
- Monte Carlo
- Workflow