Poster: Scalable infrastructure to support supercomputer resiliency-aware applications and load balancing

Yoav Tock, Benjamin Mandler, José Moreira, Terry Jones

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

2 Scopus citations

Abstract

High-performance computing systems display increasing complexity and component counts. This trend exposes weaknesses in the underlying clustering infrastructure needed for continuous availability, maximal utilization, and efficient administration of such systems. To mitigate the problem, we present a highly scalable clustering infrastructure, based on peer-to-peer technologies, that supports resiliency-aware applications as well as efficient monitoring and load balancing. Supported services include Membership, Publish-subscribe messaging, Convergecast, Attribute replication, and a DHT. We present a preliminary evaluation on an IBM BlueGene/P, demonstrating scalability up to ~256K nodes.

Original language: English
Title of host publication: SC'11 - Proceedings of the 2011 High Performance Computing Networking, Storage and Analysis Companion, Co-located with SC'11
Pages: 9-10
Number of pages: 2
State: Published - 2011
Event: 2011 High Performance Computing Networking, Storage and Analysis, SC'11, Co-located with SC'11 - Seattle, WA, United States
Duration: Nov 12 2011 to Nov 18 2011

Publication series

Name: SC'11 - Proceedings of the 2011 High Performance Computing Networking, Storage and Analysis Companion, Co-located with SC'11

Conference

Conference: 2011 High Performance Computing Networking, Storage and Analysis, SC'11, Co-located with SC'11
Country/Territory: United States
City: Seattle, WA
Period: 11/12/11 to 11/18/11

Keywords

  • Clustering
  • Membership
  • Middleware
  • Peer-to-peer
  • Pub/sub systems
  • Scalability
