DIBS: A data integration benchmark suite

Anthony M. Cabrera, Clayton J. Faber, Kyle Cepeda, Robert Derber, Cooper Epstein, Jason Zheng, Ron K. Cytron, Roger D. Chamberlain

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

13 Scopus citations

Abstract

As the generation of data becomes more prolific, the amount of time and resources necessary to perform analyses on these data increases. What are less understood, however, are the data preprocessing steps that must be applied before any meaningful analysis can begin. This problem of taking data in some initial form and transforming it into a desired one is known as data integration. Here, we introduce the Data Integration Benchmarking Suite (DIBS), a suite of applications that are representative of data integration workloads across many disciplines. We apply a comprehensive characterization to these applications to better understand the general behavior of data integration tasks. As a result of our benchmark suite and characterization methods, we offer insight regarding data integration tasks that will guide other researchers designing solutions in this area.

Original language: English
Title of host publication: ICPE 2018 - Companion of the 2018 ACM/SPEC International Conference on Performance Engineering
Publisher: Association for Computing Machinery, Inc
Pages: 25-28
Number of pages: 4
ISBN (Electronic): 9781450356299
State: Published - Apr 2 2018
Externally published: Yes
Event: 9th ACM/SPEC International Conference on Performance Engineering, ICPE 2018 - Berlin, Germany
Duration: Apr 9 2018 - Apr 13 2018

Publication series

Name: ICPE 2018 - Companion of the 2018 ACM/SPEC International Conference on Performance Engineering
Volume: 2018-January

Conference

Conference: 9th ACM/SPEC International Conference on Performance Engineering, ICPE 2018
Country/Territory: Germany
City: Berlin
Period: 04/9/18 - 04/13/18

Funding

This work was supported by the NSF under grant CNS-1527510.

Funder: National Science Foundation
Funder number: CNS-1527510

Keywords

• Big data
• Data integration
• Data wrangling
