TY - GEN
T1 - Clustering-Based Predictive Analytics to Improve Scientific Data Discovery
AU - Devarakonda, Ranjeet
AU - Kumar, Jitendra
AU - Prakash, Giri
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/12/10
Y1 - 2020/12/10
N2 - Given the sheer volume of scientific data archived within the data-intensive projects at the US Department of Energy's Oak Ridge National Laboratory, finding precisely what data we are looking for may not be a trivial task; conversely, we may also miss a more prominent data product. To address such issues, we propose improving the data discovery system and using data analytics methods to comprehend what specific users might be interested in based on their physiological state, search patterns, and past data usage history. This work's primary goal is to prune the complexity, increase the visibility of popular data products, and direct users toward the data that best meet their needs. The proposed algorithm constructs a user profile based on the user's explicit or implicit interactions with the system, such as items they are currently looking at on-site and the key metadata mappings related to the data set. The pattern is then used to build a training data set, which will help find relevant data to recommend to the user.
AB - Given the sheer volume of scientific data archived within the data-intensive projects at the US Department of Energy's Oak Ridge National Laboratory, finding precisely what data we are looking for may not be a trivial task; conversely, we may also miss a more prominent data product. To address such issues, we propose improving the data discovery system and using data analytics methods to comprehend what specific users might be interested in based on their physiological state, search patterns, and past data usage history. This work's primary goal is to prune the complexity, increase the visibility of popular data products, and direct users toward the data that best meet their needs. The proposed algorithm constructs a user profile based on the user's explicit or implicit interactions with the system, such as items they are currently looking at on-site and the key metadata mappings related to the data set. The pattern is then used to build a training data set, which will help find relevant data to recommend to the user.
KW - clustering
KW - collaborative filtering
KW - content-based filtering
KW - data discovery
KW - data recommended system
UR - http://www.scopus.com/inward/record.url?scp=85103823444&partnerID=8YFLogxK
U2 - 10.1109/BigData50022.2020.9377797
DO - 10.1109/BigData50022.2020.9377797
M3 - Conference contribution
AN - SCOPUS:85103823444
T3 - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
SP - 5658
EP - 5661
BT - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
A2 - Wu, Xintao
A2 - Jermaine, Chris
A2 - Xiong, Li
A2 - Hu, Xiaohua Tony
A2 - Kotevska, Olivera
A2 - Lu, Siyuan
A2 - Xu, Weijia
A2 - Aluru, Srinivas
A2 - Zhai, Chengxiang
A2 - Al-Masri, Eyhab
A2 - Chen, Zhiyuan
A2 - Saltz, Jeff
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE International Conference on Big Data, Big Data 2020
Y2 - 10 December 2020 through 13 December 2020
ER -