TY - GEN
T1 - Stream processing for near real-time scientific data analysis
AU - Choi, Jong Youl
AU - Kurc, Tahsin
AU - Logan, Jeremy
AU - Wolf, Matthew
AU - Suchyta, Eric
AU - Kress, James
AU - Pugmire, David
AU - Podhorszki, Norbert
AU - Byun, Eun Kyu
AU - Ainsworth, Mark
AU - Parashar, Manish
AU - Klasky, Scott
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/11/17
Y1 - 2016/11/17
N2 - The demand for near real-time analysis of streaming data is increasing rapidly in scientific projects. This trend is driven by the fact that it is expensive and time-consuming to design and execute complex experiments and simulations. During an experiment, the research team and the team at the experiment facility will want to analyze data as it is generated, interpret it, and collaboratively make decisions to modify the experiment parameters or abort the experiment in order to prevent events that may damage experimental instruments or to avoid wasting resources if there is a problem. The increasing velocity and volume of streaming data and the multi-institutional nature of large-scale scientific projects present challenges to near real-time analysis of streaming data. In this work, we develop a framework to address these challenges. This framework provides an interface for applications to define and interact with named, self-describing streams, takes advantage of advanced network technologies, and implements support for the reduction and compression of data at the source. We describe this framework and demonstrate its application in three scientific applications.
AB - The demand for near real-time analysis of streaming data is increasing rapidly in scientific projects. This trend is driven by the fact that it is expensive and time-consuming to design and execute complex experiments and simulations. During an experiment, the research team and the team at the experiment facility will want to analyze data as it is generated, interpret it, and collaboratively make decisions to modify the experiment parameters or abort the experiment in order to prevent events that may damage experimental instruments or to avoid wasting resources if there is a problem. The increasing velocity and volume of streaming data and the multi-institutional nature of large-scale scientific projects present challenges to near real-time analysis of streaming data. In this work, we develop a framework to address these challenges. This framework provides an interface for applications to define and interact with named, self-describing streams, takes advantage of advanced network technologies, and implements support for the reduction and compression of data at the source. We describe this framework and demonstrate its application in three scientific applications.
UR - http://www.scopus.com/inward/record.url?scp=85006842009&partnerID=8YFLogxK
U2 - 10.1109/NYSDS.2016.7747804
DO - 10.1109/NYSDS.2016.7747804
M3 - Conference contribution
AN - SCOPUS:85006842009
T3 - 2016 New York Scientific Data Summit, NYSDS 2016 - Proceedings
BT - 2016 New York Scientific Data Summit, NYSDS 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 New York Scientific Data Summit, NYSDS 2016
Y2 - 14 August 2016 through 17 August 2016
ER -