Abstract
We present a mathematical analysis of the Isolation Random Forest Method (IRF Method) for anomaly detection, proposed by Liu F.T., Ting K.M. and Zhou Z.H. in their seminal work as a heuristic method for anomaly detection in Big Data. We prove that the IRF space can be endowed with a probability induced by the Isolation Tree algorithm (iTree). In this setting, the convergence of the IRF method is proved using the Law of Large Numbers. A couple of counterexamples are presented to show that the method is inconclusive and that no certificate of quality can be given when it is used to detect anomalies. Hence, an alternative version of the method is proposed whose mathematical foundation is fully justified. Furthermore, a criterion is presented for choosing the number of sampled trees needed to guarantee confidence intervals for the numerical results. Finally, numerical experiments are presented to compare the performance of the classic method with that of the proposed one.
| Original language | English |
|---|---|
| Pages (from-to) | 1156-1177 |
| Number of pages | 22 |
| Journal | Mathematical Methods in the Applied Sciences |
| Volume | 46 |
| Issue number | 1 |
| State | Published - Jan 15 2023 |
Funding
This material is based in part upon work supported by grant HERMES 54748 from Universidad Nacional de Colombia, Sede Medellín. The first author wishes to thank Universidad Nacional de Colombia, Sede Medellín for supporting the production of this work through the project Hermes 54748, as well as for granting access to Gauss Server, financed by “Proyecto Plan 150 150 Fomento de la cultura de evaluación continua a través del apoyo a planes de mejoramiento de los programas curriculares” (gauss.medellin.unal.edu.co), where the numerical experiments were executed. Special thanks to Mr. Jorge Humberto Moreno Córdoba, our former student, who introduced us to the IRF method. The authors wish to acknowledge the anonymous reviewers whose deep insight and kind suggestions decisively enhanced the quality of this work.
Keywords
- anomaly detection
- isolation random forest
- Monte Carlo methods
- probabilistic algorithms