Monte Carlo simulation on heterogeneous distributed systems: A computing framework with parallel merging and checkpointing strategies

Sorina Camarasu-Pop, Tristan Glatard, Rafael Ferreira Da Silva, Pierre Gueth, David Sarrut, Hugues Benoit-Cattin

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

This paper introduces an end-to-end framework for efficient computing and merging of Monte Carlo simulations on heterogeneous distributed systems. Simulations are parallelized using a dynamic load-balancing approach and multiple parallel mergers. Checkpointing is used to improve reliability and to enable incremental merging of partial results. A model is proposed to analyze the behavior of the framework and to help tune its parameters. Experimental results obtained on a production grid infrastructure show that the model fits the real makespan with a relative error of at most 10%, that using multiple parallel mergers reduces the makespan by 40% on average, and that checkpointing enables the completion of very long simulations without penalizing the makespan.
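The three ideas named in the abstract (dynamic load balancing, checkpointed partial results, and multiple mergers) can be pictured with a toy example. The sketch below is not the authors' framework: it is a minimal single-process illustration that estimates pi by Monte Carlo sampling, and every name in it (`Partial`, `merge`, `simulate_chunk`, `run`, `worker_speeds`, `checkpoint_every`, `n_mergers`) is hypothetical. It assumes that workers of different speeds keep simulating until a global event target is reached rather than receiving a fixed share up front, that each worker uploads a partial result at every checkpoint, and that partial results are combined by simple summation so that several mergers can each handle a subset.

```python
import random
from dataclasses import dataclass


@dataclass
class Partial:
    """A checkpointed partial result: events simulated and hits counted."""
    events: int = 0
    hits: int = 0


def merge(parts):
    """Merge partial results by summing counters.

    Summation is associative, so several mergers can each combine a
    subset of the partial results and their outputs can be merged again.
    """
    return Partial(sum(p.events for p in parts), sum(p.hits for p in parts))


def simulate_chunk(rng, n):
    """Simulate n events: count random points inside the unit quarter-disc."""
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return Partial(n, hits)


def run(total_events, worker_speeds, checkpoint_every, n_mergers, seed=0):
    """Toy model of dynamic load balancing with checkpoints and mergers.

    worker_speeds is an assumed relative throughput per worker; in each
    round a worker of speed s simulates s * checkpoint_every events and
    then uploads one checkpoint. No worker gets a fixed share: all of
    them continue until the global target is reached.
    """
    rngs = [random.Random(seed + i) for i in range(len(worker_speeds))]
    checkpoints = []  # partial results uploaded so far
    done = 0
    while done < total_events:
        for rng, speed in zip(rngs, worker_speeds):
            remaining = total_events - done
            if remaining == 0:
                break
            n = min(checkpoint_every * speed, remaining)
            checkpoints.append(simulate_chunk(rng, n))
            done += n
    # Each merger combines its own slice of the checkpoints; a final
    # merge combines the mergers' outputs.
    groups = [checkpoints[i::n_mergers] for i in range(n_mergers)]
    return merge([merge(g) for g in groups]), len(checkpoints)


if __name__ == "__main__":
    result, n_checkpoints = run(20000, [1, 2, 4], 500, n_mergers=3)
    print("checkpoints:", n_checkpoints)
    print("pi estimate:", 4 * result.hits / result.events)
```

Because merging is a plain sum, the merged result does not depend on how many mergers are used; in a real distributed setting the number of mergers would affect only how long merging takes, which is the quantity the abstract reports on.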

Original language: English
Pages (from-to): 728-738
Number of pages: 11
Journal: Future Generation Computer Systems
Volume: 29
Issue number: 3
DOIs
State: Published - Mar 2013
Externally published: Yes

Funding

This work is co-funded by the French National Research Agency (ANR) through the hGATE project, under contract no. ANR-09-COSI-004-01. It also falls within the scope of the scientific topics of the French National Grid Institute (IdG). The authors would like to thank the site administrators of the European Grid Initiative and the GGUS support for their work.

Funders                             Funder number
Seventh Framework Programme         261323
Agence Nationale de la Recherche    ANR-09-COSI-004-01

    Keywords

    • Checkpointing
    • Dynamic parallelization
    • Grid computing
    • Merge
    • Monte Carlo
    • Workflow
