Sequence-RTG: Efficient and Production-Ready Pattern Mining in System Log Messages

Louise Harding, Fabien Wernli, Frédéric Suter

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

System logs are a wealth of information that can be leveraged to control the behaviour of a computing and storage infrastructure, detect deviations from normal behaviour, and react accordingly by triggering predefined actions. System log management usually consists of a complex workflow that collects, standardises, indexes, stores, and visualises log messages to help system administration teams in their daily operations. In large-scale data centres, such log management infrastructures can collect millions, if not billions, of messages per day. A key component of this workflow is the identification of message patterns, which requires the expertise of administrators. These patterns are templates, made of both static and variable message parts, against which a new log message can be matched. This crucial task is often done manually, but the patterns can change frequently, making it time-consuming for human operators to keep up. In this paper, we therefore propose to automate the discovery of patterns in system log messages by extending the functionalities of an existing pattern mining framework called Sequence. Our main objectives are to improve both the scalability of this framework and its capacity to be integrated into a complete system log management workflow. We present how we addressed six main limitations of the seminal Sequence tool. These modifications led us to propose Sequence-RTG (Sequence-Ready-To-Go), a more efficient and production-ready version. We analyse its performance in terms of both speed, using datasets of increasing size, and accuracy, using datasets from the literature. We also show that, two months after the introduction of Sequence-RTG into the system log management framework of the IN2P3 Computing Centre, we reduced the fraction of messages that are not matched to any pattern from 75–80% to only 15%.
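To make the notion of a pattern concrete, here is a minimal Python sketch of a template made of static and variable parts, a naive way to derive one from similar messages, and the "fraction of unmatched messages" metric. It rests on simplifying assumptions that are not from the paper: whitespace tokenisation, messages merged only when they have equal token counts, and a hypothetical `%name%` placeholder syntax. All names (`Pattern`, `mine_pattern`, `unmatched_fraction`) are illustrative; this is not the Sequence or Sequence-RTG algorithm, which the abstract does not describe.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical placeholder syntax: a token written as "%name%" is a variable part.
VARIABLE = re.compile(r"^%(\w+)%$")


@dataclass
class Pattern:
    tokens: list[str]  # static words and "%name%" placeholders, in order

    def match(self, message: str) -> dict[str, str] | None:
        """Return the extracted variables if the message fits, else None."""
        words = message.split()
        if len(words) != len(self.tokens):
            return None
        extracted = {}
        for tok, word in zip(self.tokens, words):
            var = VARIABLE.match(tok)
            if var:
                extracted[var.group(1)] = word  # variable part: capture it
            elif tok != word:
                return None  # static part differs: no match
        return extracted


def mine_pattern(messages: list[str]) -> Pattern:
    """Naive miner: token positions where the messages disagree become variables."""
    if not messages:
        raise ValueError("need at least one message")
    split = [m.split() for m in messages]
    if any(len(s) != len(split[0]) for s in split):
        raise ValueError("this sketch only merges messages with equal token counts")
    tokens = []
    for i, column in enumerate(zip(*split)):
        tokens.append(column[0] if len(set(column)) == 1 else f"%var{i}%")
    return Pattern(tokens)


def unmatched_fraction(patterns: list[Pattern], messages: list[str]) -> float:
    """Share of messages that no pattern matches."""
    if not messages:
        return 0.0
    unmatched = sum(1 for m in messages if all(p.match(m) is None for p in patterns))
    return unmatched / len(messages)


if __name__ == "__main__":
    training = [
        "Accepted password for alice from 10.0.0.1 port 22",
        "Accepted password for bob from 10.0.0.7 port 22",
    ]
    pattern = mine_pattern(training)
    print(pattern.tokens)
    # ['Accepted', 'password', 'for', '%var3%', 'from', '%var5%', 'port', '22']
    print(pattern.match("Accepted password for carol from 10.0.0.9 port 22"))
    # {'var3': 'carol', 'var5': '10.0.0.9'}
    stream = training + ["Failed password for carol from 10.0.0.9 port 22"]
    print(unmatched_fraction([pattern], stream))
    # 0.3333333333333333
```

Positional comparison keeps the sketch short but breaks as soon as messages of the same type differ in length; a production miner has to handle that and to scale to millions of messages per day, which is the kind of limitation the paper sets out to address.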

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 623–631
Number of pages: 9
ISBN (Electronic): 9781728196664
State: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Cluster Computing, Cluster 2021 - Virtual, Portland, United States
Duration: Sep 7 2021 to Sep 10 2021

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2021-September
ISSN (Print): 1552-5244

Conference

Conference: 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Country/Territory: United States
City: Virtual, Portland
Period: 09/7/21 to 09/10/21

