Mapping applications for high performance on multithreaded, NUMA systems

Guojing Cong, Huifang Wen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The communication latency and available resources for a group of logical processors are determined by their relative position in the hierarchy of chips, cores, and threads on modern shared-memory systems. Multithreaded applications exhibit different performance behavior depending on the mapping of software threads to logical processors. We observe that the execution time under one mapping can be 5.4 times that under another. Applications with irregular access patterns show the worst performance under the default OS mapping. Mapping alone does not reduce remote accesses on NUMA machines when the logical processors span multiple chips. We present new data replication and distribution optimizations for two irregular applications. We further show that locality optimization simultaneously reduces remote accesses and improves cache performance, and achieves better performance than prior NUMA-specific techniques.
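
As an illustration of the thread-to-logical-processor binding the abstract refers to (not code from the paper), the sketch below pins each software thread to a specific logical processor on Linux via pthread_setaffinity_np. The thread count and the identity mapping of thread i to CPU i are assumptions for demonstration; the paper's point is that the choice of mapping across the chip/core/thread hierarchy strongly affects performance.

```c
/* Minimal sketch, assuming a Linux system with glibc.
 * Pins NTHREADS software threads to logical processors 0..NTHREADS-1.
 * The identity mapping is purely illustrative; on a NUMA machine one
 * would choose CPUs so that cooperating threads share (or deliberately
 * span) chips, depending on the application's access pattern. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NTHREADS 4

static void *worker(void *arg) {
    long id = (long)arg;
    /* Report which logical processor the OS actually placed this thread on. */
    printf("thread %ld running on CPU %d\n", id, sched_getcpu());
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++) {
        pthread_attr_t attr;
        cpu_set_t cpus;
        pthread_attr_init(&attr);
        CPU_ZERO(&cpus);
        CPU_SET((int)i, &cpus);                       /* assumed mapping: thread i -> CPU i */
        pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);
        pthread_create(&tid[i], &attr, worker, (void *)i);
        pthread_attr_destroy(&attr);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```

Compile with `gcc -pthread`. As the abstract notes, binding alone does not eliminate remote accesses once threads span multiple chips; the paper's data replication and distribution optimizations address that separately.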

Original language: English
Title of host publication: Proceedings of the ACM International Conference on Computing Frontiers, CF 2013
DOIs
State: Published - 2013
Externally published: Yes
Event: 2013 ACM International Conference on Computing Frontiers, CF 2013 - Ischia, Italy
Duration: May 14, 2013 - May 16, 2013

Publication series

Name: Proceedings of the ACM International Conference on Computing Frontiers, CF 2013

Conference

Conference: 2013 ACM International Conference on Computing Frontiers, CF 2013
Country/Territory: Italy
City: Ischia
Period: 05/14/13 - 05/16/13

Keywords

  • Binding
  • Locality
  • Multithreading
  • NUMA
