Poster: Scalable infrastructure to support supercomputer resiliency-aware applications and load balancing

  • Yoav Tock
  • Benjamin Mandler
  • Josè Moreira
  • Terry Jones

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    2 Scopus citations

    Abstract

    High performance computing systems display increasing complexity and component counts. This trend exposes weaknesses in the underlying clustering infrastructure needed for continuous availability, maximizing utilization, and efficient administration of such systems. To mitigate the problem, we present a highly scalable clustering infrastructure, based on peer-to-peer technologies, for supporting resiliency-aware applications as well as efficient monitoring and load balancing. Supported services include Membership, Publish-subscribe messaging, Convergecast, Attribute replication and a DHT. We present a preliminary evaluation taken from an IBM BlueGene/P, demonstrating scalability up to ∼256K nodes.

    Original language: English
    Title of host publication: SC'11 - Proceedings of the 2011 High Performance Computing Networking, Storage and Analysis Companion, Co-located with SC'11
    Pages: 9-10
    Number of pages: 2
    State: Published - 2011
    Event: 2011 High Performance Computing Networking, Storage and Analysis, SC'11, Co-located with SC'11 - Seattle, WA, United States
    Duration: Nov 12 2011 - Nov 18 2011

    Publication series

    Name: SC'11 - Proceedings of the 2011 High Performance Computing Networking, Storage and Analysis Companion, Co-located with SC'11

    Conference

    Conference: 2011 High Performance Computing Networking, Storage and Analysis, SC'11, Co-located with SC'11
    Country/Territory: United States
    City: Seattle, WA
    Period: 11/12/11 - 11/18/11

    Keywords

    • Clustering
    • Membership
    • Middleware
    • Peer-to-peer
    • Pub/sub systems
    • Scalability
