A Lakehouse Architecture for the Management and Analysis of Heterogeneous Data for Biomedical Research and Mega-biobanks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Scopus citations

Abstract

Data Lakehouse is a new paradigm in data architectures that embodies and integrates already established concepts for the systematic management of disparate, large-scale data: a data lake for heterogeneous data management, use of open standards for high-performance querying, and systematic maintenance of the data "freshness". In addition to being a new concept, the data lakehouse is also still a conceptual construct. Many projects that use the lakehouse require maturing, empirical studies, and specific implementations. In this paper, we present our implementation of the data lakehouse concept in a biomedical research and health data analytics domain, and we discuss the implementation of some unique and novel features such as support for specialized access controls in support of HIPAA regulation and IRB protocols, and support for the FAIR standard.

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
Editors: Yixin Chen, Heiko Ludwig, Yicheng Tu, Usama Fayyad, Xingquan Zhu, Xiaohua Tony Hu, Suren Byna, Xiong Liu, Jianping Zhang, Shirui Pan, Vagelis Papalexakis, Jianwu Wang, Alfredo Cuzzocrea, Carlos Ordonez
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4643-4651
Number of pages: 9
ISBN (Electronic): 9781665439022
State: Published - 2021
Event: 2021 IEEE International Conference on Big Data, Big Data 2021 - Virtual, Online, United States
Duration: Dec 15 2021 → Dec 18 2021

Publication series

Name: Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021

Conference

Conference: 2021 IEEE International Conference on Big Data, Big Data 2021
Country/Territory: United States
City: Virtual, Online
Period: 12/15/21 → 12/18/21

Funding

This manuscript has been in part co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy, and under a joint program (VICTOR), between the U.S. Department of Energy (DOE), and the U.S. Department of Veterans Affairs (VA).
