TagIt: An integrated indexing and search service for file systems

Hyogi Sim, Youngjae Kim, Sudharshan S. Vazhkudai, Geofroy R. Vallée, Seung Hwan Lim, Ali R. Butt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Scopus citations

Abstract

Data services such as search, discovery, and management in scalable distributed environments have traditionally been decoupled from the underlying file systems, and are often deployed using external databases and indexing services. However, modern data production rates, looming data movement costs, and the lack of metadata entail revisiting the decoupled file system-data services design philosophy. In this paper, we present TagIt, a scalable data management service framework aimed at scientific datasets, which is tightly integrated into a shared-nothing distributed file system. A key feature of TagIt is a scalable, distributed metadata indexing framework, using which we implement a flexible tagging capability to support data discovery. The tags can also be associated with an active operator, for pre-processing, filtering, or automatic metadata extraction, which we seamlessly offload to file servers in a load-aware fashion. Our evaluation shows that TagIt can expedite data search by up to 10× over the extant decoupled approach.

Original language: English
Title of host publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450351140
State: Published - Nov 12 2017
Event: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017 - Denver, United States
Duration: Nov 12 2017 to Nov 17 2017

Publication series

Name: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017

Conference

Conference: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
Country/Territory: United States
City: Denver
Period: 11/12/17 to 11/17/17

Funding

We would like to thank our shepherd, Suzanne McIntosh, for her feedback. This research was supported in part by the U.S. DOE’s Scientific data management program, by NSF through grants CNS-1615411, CNS-1405697 and CNS-1565314, and by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea Government (MSIP) (No. R0190-15-2012). The work was also supported by, and used the resources of, the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT Battelle, LLC for the U.S. DOE (under the contract No. DE-AC05-00OR22725).

Funder: Funder number
U.S. DOE
National Science Foundation: CNS-1615411, CNS-1565314, CNS-1405697
U.S. Department of Energy
Ministry of Science, ICT and Future Planning
National Science Foundation
Institute for Information and Communications Technology Promotion

Keywords

• Distributed file systems
• Indexing methods
• Search process
