Filtering log data: Finding the needles in the Haystack

Li Yu, Ziming Zheng, Zhiling Lan, Terry Jones, Jim M. Brandt, Ann C. Gentile

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

11 Scopus citations

Abstract

Log data is an incredible asset for troubleshooting in large-scale systems. Nevertheless, due to the ever-growing system scale, the volume of such data becomes overwhelming, imposing enormous burdens on both data storage and data analysis. To address this problem, we present a two-dimensional online filtering mechanism that removes redundant and noisy data via feature selection and instance selection. The objective of this work is two-fold: (i) to significantly reduce data volume without losing important information, and (ii) to make data analysis more effective. We evaluate this new filtering mechanism using real environmental data from the production supercomputers at Oak Ridge National Laboratory and Sandia National Laboratories. Our preliminary results demonstrate that our method can reduce disk space by more than 85%, thereby significantly reducing analysis time. Moreover, it improves failure prediction and diagnosis by more than 20% compared to the conventional predictive approach relying on RAS (Reliability, Availability, and Serviceability) events alone.

Original language: English
Title of host publication: 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
State: Published - 2012
Event: 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012 - Boston, MA, United States
Duration: Jun 25 2012 to Jun 28 2012

Publication series

Name: Proceedings of the International Conference on Dependable Systems and Networks

Conference

Conference: 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
Country/Territory: United States
City: Boston, MA
Period: 06/25/12 to 06/28/12
