Adaptive event prediction strategy with dynamic time window for large-scale HPC systems

Ana Gainaru, Franck Cappello, Joshi Fullop, Stefan Trausan-Matu, William Kramer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

31 Scopus citations

Abstract

In this paper, we analyse messages generated by different large-scale HPC systems in order to extract sequences of correlated events, which we later use to predict the normal and faulty behaviour of the system. Our method uses a dynamic window strategy that is able to find frequent sequences of events regardless of the time delay between them. Most current related research narrows the correlation extraction to fixed and relatively small time windows that do not reflect the whole behaviour of the system. The generated events change constantly during the lifetime of the machine. We consider it important to update the sequences at runtime by applying modifications after each prediction phase, according to the forecast's accuracy and the difference between what was expected and what really happened. Our experiments show that our analysing system is able to predict around 60% of events with a precision of around 85% at a lower event granularity than before.
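
The two figures quoted at the end of the abstract can be read as recall (share of events that were predicted) and precision (share of predictions that were correct). That reading is an assumption, since the abstract does not define the terms. The sketch below is a minimal illustration of how the two metrics relate; the function name and the counts are hypothetical, chosen only to land near the quoted percentages, and are not data from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw prediction counts."""
    predicted = true_positives + false_positives   # all predictions issued
    actual = true_positives + false_negatives      # all events that occurred
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (NOT from the paper): 100 events occurred,
# 60 of them were predicted, and 11 further predictions were wrong.
tp, fp, fn = 60, 11, 40
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these illustrative counts, recall is 60/100 and precision is 60/71, so a predictor can miss a sizeable share of events while still being right most of the time it does issue a prediction.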

Original language: English
Title of host publication: Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques, SLAML'11
State: Published - 2011
Externally published: Yes
Event: Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques, SLAML'11 - Cascais, Portugal
Duration: Oct 23 2011 to Oct 26 2011

Publication series

Name: Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques, SLAML'11

Conference

Conference: Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques, SLAML'11
Country/Territory: Portugal
City: Cascais
Period: 10/23/11 to 10/26/11

Keywords

  • Event prediction
  • HPC systems
  • Logfile analysis
