Apply Block Index Technique to Scientific Data Analysis and I/O Systems

Tzuhsien Wu, Jerry Chou, Norbert Podhorszki, Junmin Gu, Yuan Tian, Scott Klasky, Kesheng Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

Scientific discoveries increasingly rely on the analysis of massive amounts of data. The ability to directly access the most relevant data records through a query, without sifting through all of them, becomes essential. However, scientific datasets are commonly stored on parallel file systems and I/O systems that are optimized for reading and writing large chunks of data, and many scientific datasets exhibit spatial-temporal data similarity, such that records with similar values are often located in close proximity to each other. Therefore, our previous work started to investigate the benefit of using a block range index technique for scientific datasets, which records only the value range of all the records in a data block. In this paper, we extend our work in several aspects. First, we implement and integrate our block index technique with the ADIOS I/O system. Second, we show that our proposed method can be significantly better than the existing minmax and bitmap indexing methods supported in ADIOS, and can still achieve comparable performance in the worst case. Third, we propose several techniques that take advantage of the block index information to greatly reduce the data retrieval time for query results. Fourth, we evaluate our approach using several real scientific datasets and analyze the spatial-temporal data similarity characteristics in them. Through our study, we believe the block index can be an effective indexing technique for scientific datasets with little implementation and operating overhead. Its size is small enough for building the indexes on-the-fly, and yet its query information is sufficient for efficient data access.
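The block range index idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ADIOS implementation: the function names (`build_block_index`, `candidate_blocks`, `range_query`), the fixed block size, and the sample values are all assumptions made for illustration. The sketch keeps only a (min, max) pair per block and uses it to skip blocks that cannot contain matching records.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper names; a sketch of the general idea only.

def build_block_index(values: Sequence[float],
                      block_size: int) -> List[Tuple[float, float]]:
    """Record only the (min, max) value range of each fixed-size block."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((min(block), max(block)))
    return index

def candidate_blocks(index: List[Tuple[float, float]],
                     lo: float, hi: float) -> List[int]:
    """Return ids of blocks whose value range overlaps the query [lo, hi]."""
    return [i for i, (bmin, bmax) in enumerate(index)
            if bmax >= lo and bmin <= hi]

def range_query(values: Sequence[float],
                index: List[Tuple[float, float]],
                block_size: int, lo: float, hi: float) -> List[int]:
    """Scan only the candidate blocks and return matching record positions."""
    hits = []
    for b in candidate_blocks(index, lo, hi):
        start = b * block_size
        for offset, v in enumerate(values[start:start + block_size]):
            if lo <= v <= hi:
                hits.append(start + offset)
    return hits

# Made-up data in which neighbouring records hold similar values.
data = [1.0, 1.2, 1.1, 0.9, 5.0, 5.3, 5.1, 4.8, 9.7, 9.9, 9.5, 9.6]
idx = build_block_index(data, block_size=4)
print(candidate_blocks(idx, 5.0, 6.0))
print(range_query(data, idx, 4, 5.0, 6.0))
```

When similar values cluster in nearby records, as in this toy data, most blocks are excluded by the range check alone, so only a small part of the data needs to be read.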

Original language: English
Title of host publication: Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 865-871
Number of pages: 7
ISBN (Electronic): 9781509066100
State: Published - Jul 10 2017
Externally published: Yes
Event: 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017 - Madrid, Spain
Duration: May 14 2017 to May 17 2017

Publication series

Name: Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017

Conference

Conference: 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017
Country/Territory: Spain
City: Madrid
Period: 05/14/17 to 05/17/17

Keywords

  • IO systems
  • Indexing
  • Query analysis
  • Scientific data
