TY - GEN
T1 - IPMI-based efficient notification framework for large scale cluster computing
AU - Leangsuksun, Chokchai
AU - Rao, Tirumala
AU - Tikotekar, Anand
AU - Scott, Stephen L.
AU - Libby, Richard
AU - Vetter, Jeffrey S.
AU - Fang, Yung Chin
AU - Ong, Hong
PY - 2006
Y1 - 2006
N2 - The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and event management. The increasing level of details at the hardware and software layer clearly affects the scalability and performance of monitoring and management tools. In this paper, we propose a problem notification framework that directly addresses the issue of monitor scalability. We first present the design and implementation of our step-by-step approach to analysing, filtering, and classifying the plethora of node statistics. Then, we present experimental results to show that our approach only needs minimal system resource and thus has low overhead. Finally, we introduce our web-based cluster management system that provides hardware controls at both cluster and nodal levels.
AB - The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and event management. The increasing level of details at the hardware and software layer clearly affects the scalability and performance of monitoring and management tools. In this paper, we propose a problem notification framework that directly addresses the issue of monitor scalability. We first present the design and implementation of our step-by-step approach to analysing, filtering, and classifying the plethora of node statistics. Then, we present experimental results to show that our approach only needs minimal system resource and thus has low overhead. Finally, we introduce our web-based cluster management system that provides hardware controls at both cluster and nodal levels.
KW - High-availability IPMI
KW - Scalability
UR - http://www.scopus.com/inward/record.url?scp=42549172783&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:42549172783
SN - 0769525857
SN - 9780769525853
T3 - Sixth IEEE International Symposium on Cluster Computing and the Grid Workshops, 2006. CCGRID 06
BT - Sixth IEEE International Symposium on Cluster Computing and the Grid Workshops, 2006. CCGRID 06
T2 - 6th IEEE International Symposium on Cluster Computing and the Grid, 2006. CCGRID 06
Y2 - 16 May 2006 through 19 May 2006
ER -