TY - GEN
T1 - Building a large scale climate data system in support of HPC environment
AU - Wang, Feiyi
AU - Harney, John
AU - Shipman, Galen
AU - Williams, Dean
AU - Cinquini, Luca
PY - 2011
Y1 - 2011
N2 - The Earth System Grid Federation (ESG) is a large scale, multi-institutional, interdisciplinary project that aims to provide climate scientists and impact policy makers worldwide a web-based and client-based platform to publish, disseminate, compare and analyze ever increasing climate related data. This paper describes our practical experiences on the design, development and operation of such a system. In particular, we focus on the support of the data lifecycle from a high performance computing (HPC) perspective that is critical to the end-to-end scientific discovery process. We discuss three subjects that interconnect the consumer and producer of scientific datasets: (1) the motivations, complexities and solutions of deep storage access and sharing in a tightly controlled environment; (2) the importance of scalable and flexible data publication/population; and (3) high performance indexing and search of data with geospatial properties. These perceived corner issues collectively contributed to the overall user experience and proved to be as important as any other architectural design considerations. Although the requirements and challenges are rooted and discussed from a climate science domain context, we believe the architectural problems, ideas and solutions discussed in this paper are generally useful and applicable in a larger scope.
UR - http://www.scopus.com/inward/record.url?scp=83755206339&partnerID=8YFLogxK
U2 - 10.1109/NWeSP.2011.6088209
DO - 10.1109/NWeSP.2011.6088209
M3 - Conference contribution
AN - SCOPUS:83755206339
SN - 9781457711268
T3 - Proceedings of the 2011 7th International Conference on Next Generation Web Services Practices, NWeSP 2011
SP - 380
EP - 385
BT - Proceedings of the 2011 7th International Conference on Next Generation Web Services Practices, NWeSP 2011
T2 - 2011 7th International Conference on Next Generation Web Services Practices, NWeSP 2011
Y2 - 19 October 2011 through 21 October 2011
ER -