Apache Calcite: A foundational framework for optimized query processing over heterogeneous data sources

Edmon Begoli, Jesús Camacho-Rodríguez, Julian Hyde, Michael J. Mior, Daniel Lemire

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

115 Scopus citations

Abstract

Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. The goal of this paper is to formally introduce Calcite to the broader research community, briefly present its history, and describe its architecture, features, functionality, and patterns for adoption. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.

Original language: English
Title of host publication: SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data
Editors: Gautam Das, Christopher Jermaine, Ahmed Eldawy, Philip Bernstein
Publisher: Association for Computing Machinery
Pages: 221-230
Number of pages: 10
ISBN (Electronic): 9781450317436
State: Published - May 27, 2018
Event: 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018 - Houston, United States
Duration: Jun 10, 2018 to Jun 15, 2018

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018
Country/Territory: United States
City: Houston
Period: 06/10/18 to 06/15/18

Keywords

  • Apache Calcite
  • Data management
  • Modular query optimization
  • Query algebra
  • Relational semantics
  • Storage adapters
