
Scidac-Data: Enabling Data Driven Modeling of Exascale Computing

  • Misbah Mubarak
  • Pengfei Ding
  • Leo Aliaga
  • Aristeidis Tsaris
  • Andrew Norman
  • Adam Lyon
  • Robert Ross

Research output: Contribution to journal › Conference article › peer-review

Abstract

The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project examines the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structure that are prerequisites for mapping and optimizing modern HEP analysis chains to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run at the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behavior of workloads running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular, we describe how the Sequential Access via Metadata (SAM) data-handling system, in combination with the dCache/Enstore-based data archive facilities, has been used to develop radically different models for analyzing HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.
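The abstract does not describe the simulator itself, so the following is only a minimal, hypothetical Python sketch of the kind of trace-driven queuing simulation it mentions. Jobs drawn from a synthetic distribution (standing in for the measured SciDAC-Data distributions) read files through an LRU disk cache in front of a slower tape archive, and a fixed number of worker slots serve jobs first-come, first-served. Every name (`synthetic_trace`, `simulate`, `LRUCache`), the Zipf-like popularity model, the LRU eviction policy, uniform file sizes, and the hit/miss cost parameters are illustrative assumptions, not the authors' model, tools, or data.

```python
from __future__ import annotations

import heapq
import random
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class SimResult:
    hit_rate: float   # fraction of file reads served from the disk cache
    mean_wait: float  # mean time a job waits for a free worker slot
    makespan: float   # time at which the last job finishes


class LRUCache:
    """Disk cache of whole files; capacity is counted in files (uniform size assumed)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._files = OrderedDict()  # file_id -> None, least recently used first

    def access(self, file_id: int) -> bool:
        """Return True on a hit; on a miss, stage the file in and evict the LRU file if full."""
        if file_id in self._files:
            self._files.move_to_end(file_id)
            return True
        self._files[file_id] = None
        if len(self._files) > self.capacity:
            self._files.popitem(last=False)
        return False


def synthetic_trace(n_jobs: int, files_per_job: int, n_files: int,
                    zipf_s: float, mean_interarrival: float,
                    seed: int) -> list[tuple[float, list[int]]]:
    """Hypothetical stand-in for measured distributions: Poisson arrivals and
    Zipf-like file popularity (file 0 is the most popular)."""
    rng = random.Random(seed)
    weights = [1.0 / rank ** zipf_s for rank in range(1, n_files + 1)]
    file_ids = list(range(n_files))
    t = 0.0
    jobs = []
    for _ in range(n_jobs):
        t += rng.expovariate(1.0 / mean_interarrival)
        jobs.append((t, rng.choices(file_ids, weights=weights, k=files_per_job)))
    return jobs


def simulate(jobs: list[tuple[float, list[int]]], cache_capacity: int,
             n_workers: int, hit_cost: float, miss_cost: float) -> SimResult:
    """First-come, first-served multi-server queue; each file read costs
    hit_cost if cached on disk, miss_cost if it must be recalled from the archive."""
    cache = LRUCache(cache_capacity)
    free_at = [0.0] * n_workers  # min-heap: time at which each worker slot frees up
    hits = accesses = 0
    total_wait = makespan = 0.0
    for arrival, files in jobs:  # jobs are dispatched in arrival order
        start = max(arrival, heapq.heappop(free_at))
        service = 0.0
        for f in files:
            hit = cache.access(f)
            hits += hit
            accesses += 1
            service += hit_cost if hit else miss_cost
        end = start + service
        heapq.heappush(free_at, end)
        total_wait += start - arrival
        makespan = max(makespan, end)
    return SimResult(hits / accesses, total_wait / len(jobs), makespan)


if __name__ == "__main__":
    trace = synthetic_trace(n_jobs=2000, files_per_job=5, n_files=1000,
                            zipf_s=1.0, mean_interarrival=1.0, seed=42)
    # Sweep the disk-cache size as one example of an archive design choice.
    for capacity in (0, 50, 200, 800):
        r = simulate(trace, capacity, n_workers=16, hit_cost=0.1, miss_cost=2.0)
        print(f"cache={capacity:4d} files  hit_rate={r.hit_rate:.2f}  "
              f"mean_wait={r.mean_wait:.1f}  makespan={r.makespan:.0f}")
```

Because LRU is a stack algorithm, replaying the same trace with a larger cache never lowers the hit rate, which makes a capacity sweep like the one above a simple way to compare hypothetical archive configurations. Replacing `synthetic_trace` with empirically measured job and file-access distributions would be the natural next step.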

Original language: English
Article number: 062048
Journal: Journal of Physics: Conference Series
Volume: 898
Issue number: 6
State: Published - Nov 23 2017
Externally published: Yes
Event: 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016 - San Francisco, United States
Duration: Oct 10 2016 to Oct 14 2016

Funding

The research at Argonne is based upon work supported by the U.S. Department of Energy, Office of Science, under Contract No. DE-AC02-06CH11357. Fermilab is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy.
