RingX: Scalable Parallel Attention for Long-Context Learning on HPC

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The attention mechanism has become foundational for remarkable AI breakthroughs since the introduction of the Transformer, driving the demand for increasingly long contexts to power frontier models such as large-scale reasoning language models and high-resolution image/video generators. However, its quadratic computational and memory complexities present substantial challenges. Current state-of-the-art parallel attention methods, such as ring attention, are widely adopted for long-context training but rely on a point-to-point communication strategy that fails to fully exploit the capabilities of modern HPC network architectures. In this work, we propose ringX, a scalable family of parallel attention methods optimized explicitly for HPC systems. By enhancing workload partitioning, refining communication patterns, and improving load balancing, ringX achieves up to a 3.4× speedup compared to conventional ring attention on the Frontier supercomputer. Optimized for both bi-directional and causal attention mechanisms, ringX demonstrates its effectiveness through training benchmarks of a Vision Transformer (ViT) on a climate dataset and a Generative Pre-Trained Transformer (GPT) model, Llama3 8B. Our method attains an end-to-end training speedup of approximately 1.5× in both scenarios. To our knowledge, the achieved 38% model FLOPs utilization (MFU) for training Llama3 8B with a 1M-token sequence length on 4,096 GPUs represents one of the highest training efficiencies reported for long-context learning on HPC systems. Our code implementation is available at https://github.com/jqyin/ringX-attention.
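For readers unfamiliar with the baseline that ringX improves on, the sketch below simulates ring attention's point-to-point pattern in a single process: query blocks stay resident on each (simulated) rank while key/value blocks rotate around the ring, and partial results are merged with an online softmax. This is a minimal illustrative sketch, not the ringX implementation; the names P, T, D, and ring_attention are assumptions, and a real ring would exchange the KV blocks with send/recv between GPUs rather than index a Python list.

```python
# Minimal single-process sketch of ring attention's point-to-point pattern.
# A Python list stands in for P GPUs; rotating the KV block index mimics the
# send/recv ring described in the abstract. Illustrative only, not ringX.
import numpy as np

P, T, D = 4, 32, 16            # simulated ranks, tokens per rank, head dim
rng = np.random.default_rng(0)
Q = [rng.standard_normal((T, D)) for _ in range(P)]   # local query blocks
K = [rng.standard_normal((T, D)) for _ in range(P)]   # local key blocks
V = [rng.standard_normal((T, D)) for _ in range(P)]   # local value blocks

def ring_attention(Q, K, V):
    """Blockwise softmax(QK^T)V with an online (streaming) softmax:
    each step consumes one rotated KV block, as a ring pass would."""
    out = []
    for r in range(P):                       # each simulated rank
        m = np.full((T, 1), -np.inf)         # running row max
        s = np.zeros((T, 1))                 # running softmax denominator
        acc = np.zeros((T, D))               # running weighted sum of V
        for step in range(P):                # local block + P-1 ring hops
            src = (r + step) % P             # KV block arriving at this step
            logits = Q[r] @ K[src].T / np.sqrt(D)
            m_new = np.maximum(m, logits.max(axis=1, keepdims=True))
            scale = np.exp(m - m_new)        # rescale old accumulators
            p = np.exp(logits - m_new)
            s = s * scale + p.sum(axis=1, keepdims=True)
            acc = acc * scale + p @ V[src]
            m = m_new
        out.append(acc / s)
    return np.concatenate(out)

# Check against ordinary full (bi-directional) attention over the
# gathered sequence: the blockwise ring result must match exactly.
Qf, Kf, Vf = (np.concatenate(x) for x in (Q, K, V))
logits = Qf @ Kf.T / np.sqrt(D)
ref = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ Vf
assert np.allclose(ring_attention(Q, K, V), ref, atol=1e-8)
```

Because each of the P steps touches a different KV block, the communication is strictly neighbor-to-neighbor; the abstract's point is that this pattern leaves the richer topology of modern HPC interconnects underused, which is what ringX's revised partitioning and communication patterns address.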

Original language: English
Title of host publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025
Publisher: Association for Computing Machinery, Inc
Pages: 1395-1408
Number of pages: 14
ISBN (Electronic): 9798400714665
DOIs:
State: Published - Nov 15, 2025
Event: 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 - St. Louis, United States
Duration: Nov 16, 2025 - Nov 21, 2025

Publication series

Name: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025

Conference

Conference: 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025
Country/Territory: United States
City: St. Louis
Period: 11/16/25 - 11/21/25

Keywords

  • HPC for AI
  • Long-context learning
  • Parallel attention

