Machine Learning and Social Media to Mine and Disseminate Big Scientific Data

Ranjeet Devarakonda, Michael Giansiracusa, Jitendra Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

One of the challenges in giving communities wider access to scientific databases is that querying them requires knowledge of a database language such as Structured Query Language (SQL). Although SQL is widely documented, not everyone is able to write SQL queries. Another challenge is that it may not be practical to make the public aware of the structure of the databases. Novice users need a way to query relational databases in their own natural language, and many natural language interfaces to structured databases have been developed to address this problem. The goal is to provide a more intuitive way to generate database queries and deliver responses. By combining social media, which makes it possible to interact with a wide section of the population, with natural language processing, researchers at the Atmospheric Radiation Measurement (ARM) Data Center at Oak Ridge National Laboratory (ORNL) have developed a concept that lets the scientific community easily search for and retrieve data from several environmental data centers through social media. Using a machine learning framework that maps natural language text to thousands of datasets, instruments, variables, and data streams, the prototype system would allow users to request data through Twitter and receive a link (via tweet) to the applicable results in the project's search catalog, tailored to their keywords. This automated identification of relevant data across several petascale archives at ORNL could increase the convenience, accessibility, and use of the project's data by the broader community. In this paper we discuss how several data-intensive projects at ORNL are using new approaches to support data discovery.
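The abstract describes mapping a free-text request to catalog entries and replying with a search link. As a rough illustration only, the sketch below swaps in simple TF-IDF keyword ranking with cosine similarity in place of the paper's machine learning framework, which is not detailed here. The catalog entries, the stopword list, and the search endpoint `https://catalog.example.org/search` are hypothetical, and the sketch does not call any Twitter API; it only builds the reply text that a bot might post.

```python
import math
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical catalog: dataset name -> descriptive keywords. A real system would
# draw these from the data center's metadata (data streams, instruments, variables).
CATALOG = {
    "hypothetical_met_stream": "surface meteorology temperature humidity pressure wind",
    "hypothetical_radar_stream": "cloud radar reflectivity precipitation",
    "hypothetical_lidar_stream": "lidar aerosol backscatter cloud base height",
}

# Hypothetical search endpoint, used only for illustration.
SEARCH_URL = "https://catalog.example.org/search"

# Small, assumed stopword list for filtering filler words out of requests.
STOPWORDS = {"i", "need", "the", "a", "an", "for", "of", "and", "data", "please", "on"}


def tokenize(text):
    """Strip @handles, lowercase, keep alphabetic words, and drop stopwords."""
    text = re.sub(r"@\w+", " ", text)
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def build_idf(docs):
    """Smoothed inverse document frequency for each term in the catalog."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """TF-IDF vector as a dict; terms unknown to the catalog are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_datasets(query, catalog, top_k=2):
    """Return up to top_k catalog names with nonzero similarity, best first."""
    docs = {name: tokenize(desc) for name, desc in catalog.items()}
    idf = build_idf(docs)
    q_vec = tfidf(tokenize(query), idf)
    scored = []
    for name, tokens in docs.items():
        score = cosine(q_vec, tfidf(tokens, idf))
        if score > 0:
            scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]], q_vec


def reply_for(tweet, catalog=CATALOG):
    """Build the text of a reply containing a catalog search link."""
    matches, q_vec = rank_datasets(tweet, catalog)
    if not matches:
        return "Sorry, no matching datasets found."
    keywords = " ".join(sorted(q_vec))
    link = SEARCH_URL + "?" + urlencode({"q": keywords, "datasets": ",".join(matches)})
    return f"Data matching your request: {link}"


if __name__ == "__main__":
    print(reply_for("@ARMbot I need cloud radar reflectivity data please"))
```

For the request "@ARMbot I need cloud radar reflectivity data please", the radar entry ranks first and the lidar entry second, because they share only the term "cloud". IDF weighting down-weights "cloud" since it appears in more than one entry. A production system would replace this ranking with a trained model and add the actual posting step.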

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5312-5315
Number of pages: 4
ISBN (Electronic): 9781538650356
State: Published - Jul 2 2018
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: Dec 10 2018 → Dec 13 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory: United States
City: Seattle
Period: 12/10/18 → 12/13/18

Funding

ARM is funded through the DOE Office of Science and is managed through the Biological and Environmental Research (BER) Division. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725.

Funders (funder number):
U.S. Department of Energy (DE-AC05-00OR22725)
Biological and Environmental Research

    Keywords

    • machine learning
    • natural language processing
    • scientific data mining
    • social media interaction
    • stream pipelining
