Dredging a data lake: Decentralized metadata extraction

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid generation of data from distributed IoT devices, scientific instruments, and compute clusters presents unique data management challenges. The influx of large, heterogeneous, and complex data causes repositories to become siloed or generally unsearchable, two problems that current distributed file systems do not address well. In this work, we propose Xtract, a serverless middleware that extracts metadata from files spread across heterogeneous edge computing resources. In future work, we intend to study how Xtract can automatically construct file extraction workflows subject to users' cost, time, security, and compute allocation constraints. Ultimately, Xtract will enable the creation of a searchable, centralized index across distributed data collections.

Original language: English
Title of host publication: Middleware 2019 - Proceedings of the 2019 20th International Middleware Conference Doctoral Symposium, Part of Middleware 2019
Publisher: Association for Computing Machinery, Inc
Pages: 51-53
Number of pages: 3
ISBN (Electronic): 9781450370394
State: Published - Dec 9 2019
Externally published: Yes
Event: 20th International Middleware Conference Doctoral Symposium, Middleware 2019, Part of Middleware 2019 - Davis, United States
Duration: Dec 9 2019 to Dec 13 2019

Publication series

Name: Middleware 2019 - Proceedings of the 2019 20th International Middleware Conference Doctoral Symposium, Part of Middleware 2019

Conference

Conference: 20th International Middleware Conference Doctoral Symposium, Middleware 2019, Part of Middleware 2019
Country/Territory: United States
City: Davis
Period: 12/9/19 to 12/13/19

Funding

This research is conducted under the guidance of Dr. Ian Foster and Dr. Kyle Chard, and with contributions from Dr. Ryan Chard, Dr. Zhuozhao Li, Yadu Babuji, and Ryan Wong. We gratefully acknowledge the use of compute resources from the Jetstream cloud for science and engineering [12].

Keywords

  • Data lakes
  • File systems
  • Metadata extraction
  • Serverless
