Abstract
The rapid generation of data from distributed IoT devices, scientific instruments, and compute clusters presents unique data management challenges. The influx of large, heterogeneous, and complex data causes repositories to become siloed or generally unsearchable, two problems that distributed file systems do not currently address well. In this work, we propose Xtract, a serverless middleware to extract metadata from files spread across heterogeneous edge computing resources. In future work, we intend to study how Xtract can automatically construct file extraction workflows subject to users' cost, time, security, and compute allocation constraints. To this end, Xtract will enable the creation of a searchable centralized index across distributed data collections.
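The idea of extracting per-file metadata and collecting it into one searchable index can be illustrated with a minimal sketch. This is not Xtract's implementation or API: the extractor functions, the `EXTRACTORS` table, and the `(field, value)` index layout are all hypothetical names chosen for illustration, and the files are passed in memory rather than fetched from remote endpoints.

```python
import csv
import io
import json
from collections import defaultdict
from pathlib import PurePosixPath


def extract_text(data: bytes) -> dict:
    """Hypothetical extractor: line and word counts for plain text."""
    text = data.decode("utf-8", errors="replace")
    return {"lines": len(text.splitlines()), "words": len(text.split())}


def extract_csv(data: bytes) -> dict:
    """Hypothetical extractor: column names and data-row count for CSV."""
    text = data.decode("utf-8", errors="replace")
    rows = list(csv.reader(io.StringIO(text)))
    return {"columns": rows[0] if rows else [], "rows": max(len(rows) - 1, 0)}


def extract_json(data: bytes) -> dict:
    """Hypothetical extractor: top-level keys of a JSON object."""
    doc = json.loads(data)
    return {"keys": sorted(doc) if isinstance(doc, dict) else []}


# Assumed mapping from file suffix to extractor; a real system would
# likely infer file types rather than trust the extension.
EXTRACTORS = {".txt": extract_text, ".csv": extract_csv, ".json": extract_json}


def extract_metadata(path: str, data: bytes) -> dict:
    """Build one metadata record, applying a type-specific extractor if any."""
    suffix = PurePosixPath(path).suffix.lower()
    record = {"path": path, "size_bytes": len(data), "type": suffix or "unknown"}
    extractor = EXTRACTORS.get(suffix)
    if extractor is not None:
        record.update(extractor(data))
    return record


def build_index(files: dict) -> dict:
    """Map each (field, value) pair to the set of paths that carry it."""
    index = defaultdict(set)
    for path, data in files.items():
        record = extract_metadata(path, data)
        for field, value in record.items():
            if field == "path":
                continue
            values = value if isinstance(value, list) else [value]
            for item in values:
                index[(field, str(item))].add(path)
    return index


files = {
    "site-a/readings.csv": b"time,temp\n0,21.5\n1,21.7\n",
    "site-b/notes.txt": b"hello world\nsecond line\n",
}
index = build_index(files)
```

Looking up `index[("columns", "temp")]` then returns the paths of every file with a `temp` column, regardless of which collection the file came from. Constraint-aware workflow construction (cost, time, security, compute allocation) is outside the scope of this sketch.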
| Original language | English |
|---|---|
| Title of host publication | Middleware 2019 - Proceedings of the 2019 20th International Middleware Conference Doctoral Symposium, Part of Middleware 2019 |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 51-53 |
| Number of pages | 3 |
| ISBN (Electronic) | 9781450370394 |
| State | Published - Dec 9 2019 |
| Externally published | Yes |
| Event | 20th International Middleware Conference Doctoral Symposium, Middleware 2019, Part of Middleware 2019 - Davis, United States. Duration: Dec 9 2019 → Dec 13 2019 |
Publication series
| Name | Middleware 2019 - Proceedings of the 2019 20th International Middleware Conference Doctoral Symposium, Part of Middleware 2019 |
|---|---|
Conference
| Conference | 20th International Middleware Conference Doctoral Symposium, Middleware 2019, Part of Middleware 2019 |
|---|---|
| Country/Territory | United States |
| City | Davis |
| Period | 12/9/19 → 12/13/19 |
Funding
This research is conducted under the guidance of Dr. Ian Foster and Dr. Kyle Chard, and with contributions from Dr. Ryan Chard, Dr. Zhuozhao Li, Yadu Babuji, and Ryan Wong. We gratefully acknowledge the use of compute resources from the Jetstream cloud for science and engineering [12].
Keywords
- Data lakes
- File systems
- Metadata extraction
- Serverless