Real-Time Discovery Services over Large, Heterogeneous and Complex Healthcare Datasets Using Schema-Less, Column-Oriented Methods

Edmon Begoli, Ted Dunning, Charlie Frasure

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

We present a service platform for schema-less exploration of data and discovery of patient-related statistics from healthcare data sets. The architecture of this platform is motivated by the need for fast, schema-less, and flexible approaches to SQL-based exploration and discovery of information embedded in the common, heterogeneously structured healthcare data sets and supporting components (electronic health records, practice management systems, etc.). The motivating use cases described in the paper are clinical trials candidate discovery and a treatment effectiveness analysis. Following the use cases, we discuss the key features and software architecture of the platform, the underlying core components (Apache Parquet, Drill, the web services server), and the runtime profiles and performance characteristics of the platform. We conclude by showing dramatic speedup with some approaches, and the performance tradeoffs and limitations of others.

Original language: English
Title of host publication: Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 257-264
Number of pages: 8
ISBN (Electronic): 9781509022519
State: Published - May 19 2016
Externally published: Yes
Event: 2nd IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2016 - Oxford, United Kingdom
Duration: Mar 29 2016 to Apr 1 2016

Publication series

Name: Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016

Conference

Conference: 2nd IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2016
Country/Territory: United Kingdom
City: Oxford
Period: 03/29/16 to 04/1/16

Keywords

  • Apache Drill
  • Apache Parquet
  • Column Oriented Stores
  • Data Analysis
  • Data Cyclone
  • Healthcare
  • Schema-less data management
  • Services Platform
