TY - GEN
T1 - Scalable multi-purpose network representation for large scale distributed system simulation
AU - Bobelin, Laurent
AU - Legrand, Arnaud
AU - Márquez, David A. González
AU - Navarro, Pierre
AU - Quinson, Martin
AU - Suter, Frédéric
AU - Thiéry, Christophe
PY - 2012
Y1 - 2012
N2 - Conducting experiments in large-scale distributed systems is usually time-consuming and labor-intensive. Uncontrolled external load variation prevents experiments from being reproduced, and such systems are often unavailable for research experiments, e.g., production or yet-to-be-deployed systems. Hence, many researchers in the area of distributed computing rely on simulation to perform their studies. However, the simulation of large-scale computing systems raises several scalability issues in terms of speed and memory. Indeed, such systems now comprise millions of hosts interconnected through a complex network and run billions of processes. Most simulators thus trade accuracy for speed and rely on very simple, easy-to-implement models. However, the assumptions underlying these models are often questionable, especially when it comes to network modeling. In this paper, we show that, despite a widespread belief in the community, achieving high scalability does not necessarily require resorting to overly simple models and ignoring important phenomena. We show that relying on a modular and hierarchical platform representation, while taking advantage of regularity when possible, allows us to model systems such as data and computing centers, peer-to-peer networks, grids, or clouds in a scalable way. This approach has been integrated into the open-source SimGrid simulation toolkit. We show that our solution allows us to model such systems much more accurately than other state-of-the-art simulators without sacrificing simulation speed. SimGrid is sometimes even orders of magnitude faster.
AB - Conducting experiments in large-scale distributed systems is usually time-consuming and labor-intensive. Uncontrolled external load variation prevents experiments from being reproduced, and such systems are often unavailable for research experiments, e.g., production or yet-to-be-deployed systems. Hence, many researchers in the area of distributed computing rely on simulation to perform their studies. However, the simulation of large-scale computing systems raises several scalability issues in terms of speed and memory. Indeed, such systems now comprise millions of hosts interconnected through a complex network and run billions of processes. Most simulators thus trade accuracy for speed and rely on very simple, easy-to-implement models. However, the assumptions underlying these models are often questionable, especially when it comes to network modeling. In this paper, we show that, despite a widespread belief in the community, achieving high scalability does not necessarily require resorting to overly simple models and ignoring important phenomena. We show that relying on a modular and hierarchical platform representation, while taking advantage of regularity when possible, allows us to model systems such as data and computing centers, peer-to-peer networks, grids, or clouds in a scalable way. This approach has been integrated into the open-source SimGrid simulation toolkit. We show that our solution allows us to model such systems much more accurately than other state-of-the-art simulators without sacrificing simulation speed. SimGrid is sometimes even orders of magnitude faster.
KW - Grid computing
KW - High-performance computing
KW - Large-scale distributed systems
KW - Peer-to-Peer
KW - Simulation
KW - Volunteer Computing
UR - https://www.scopus.com/pages/publications/84863661668
U2 - 10.1109/CCGrid.2012.31
DO - 10.1109/CCGrid.2012.31
M3 - Conference contribution
AN - SCOPUS:84863661668
SN - 9780769546919
T3 - Proceedings - 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012
SP - 220
EP - 227
BT - Proceedings - 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012
T2 - 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012
Y2 - 13 May 2012 through 16 May 2012
ER -