Abstract
As data generation becomes more prolific, the time and resources required to analyze these data increase. Less well understood, however, are the preprocessing steps that must be applied before any meaningful analysis can begin. This problem of taking data in some initial form and transforming it into a desired one is known as data integration. Here, we introduce the Data Integration Benchmarking Suite (DIBS), a suite of applications representative of data integration workloads across many disciplines. We perform a comprehensive characterization of these applications to better understand the general behavior of data integration tasks. Using our benchmark suite and characterization methods, we offer insights into data integration tasks that will guide other researchers designing solutions in this area.
| Original language | English |
|---|---|
| Title of host publication | ICPE 2018 - Companion of the 2018 ACM/SPEC International Conference on Performance Engineering |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 25-28 |
| Number of pages | 4 |
| ISBN (Electronic) | 9781450356299 |
| State | Published - Apr 2 2018 |
| Externally published | Yes |
| Event | 9th ACM/SPEC International Conference on Performance Engineering, ICPE 2018 - Berlin, Germany. Duration: Apr 9 2018 → Apr 13 2018 |
Publication series
| Name | ICPE 2018 - Companion of the 2018 ACM/SPEC International Conference on Performance Engineering |
|---|---|
| Volume | 2018-January |
Conference
| Conference | 9th ACM/SPEC International Conference on Performance Engineering, ICPE 2018 |
|---|---|
| Country/Territory | Germany |
| City | Berlin |
| Period | 04/09/18 → 04/13/18 |
Funding
This work was supported by the NSF under grant CNS-1527510.
Keywords
- Big data
- Data integration
- Data wrangling