Clustering-Based Predictive Analytics to Improve Scientific Data Discovery

Ranjeet Devarakonda, Jitendra Kumar, Giri Prakash

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Given the sheer volume of scientific data archived within the data-intensive projects at the US Department of Energy's Oak Ridge National Laboratory, finding precisely what data we are looking for may not be a trivial task; conversely, we may also miss a more prominent data product. To address such issues, we propose improving the data discovery system and using data analytics methods to comprehend what specific users might be interested in based on their physiological state, search patterns, and past data usage history. This work's primary goal is to prune the complexity, increase the visibility of popular data products, and direct users toward the data that best meet their needs. The proposed algorithm constructs a user profile based on the user's explicit or implicit interactions with the system, such as items they are currently looking at on-site and the key metadata mappings related to the data set. The pattern is then used to build a training data set, which will help find relevant data to recommend to the user.

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
EditorsXintao Wu, Chris Jermaine, Li Xiong, Xiaohua Tony Hu, Olivera Kotevska, Siyuan Lu, Weijia Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5658-5661
Number of pages4
ISBN (Electronic)9781728162515
DOIs
StatePublished - Dec 10 2020
Event8th IEEE International Conference on Big Data, Big Data 2020 - Virtual, Atlanta, United States
Duration: Dec 10 2020Dec 13 2020

Publication series

NameProceedings - 2020 IEEE International Conference on Big Data, Big Data 2020

Conference

Conference8th IEEE International Conference on Big Data, Big Data 2020
Country/TerritoryUnited States
CityVirtual, Atlanta
Period12/10/2012/13/20

Funding

ACKNOWLEDGMENT This research was supported by the Atmospheric Radiation Measurement (ARM) user facility, a US Department of Energy (DOE) Office of Science user facility managed by the Office of Biological and Environmental Research. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://www.energy.gov/downloads/doe-

FundersFunder number
U.S. Department of Energy
Office of Science
Biological and Environmental ResearchDE-AC05-00OR22725

    Keywords

    • clustering
    • collaborative filtering
    • content-based filtering
    • data discovery
    • data recommended system

    Fingerprint

    Dive into the research topics of 'Clustering-Based Predictive Analytics to Improve Scientific Data Discovery'. Together they form a unique fingerprint.

    Cite this