Outlier detection for text data

Ramakrishnan Kannan, Hyenkyun Woo, Charu C. Aggarwal, Haesun Park

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

42 Scopus citations

Abstract

Outlier detection is extremely challenging in domains such as text, where the attribute values are typically non-negative and most values are zero. In such cases, it is often difficult to separate the outliers from the natural variations in the patterns of the underlying data. In this paper, we present a matrix factorization method that naturally distinguishes anomalies by using low-rank approximations of the underlying data. Our iterative algorithm, TONMF, is based on the Block Coordinate Descent (BCD) framework. Our approach has significant advantages over traditional methods for text outlier detection. Finally, we present experimental results illustrating the effectiveness of our method over competing methods.
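
As a rough illustration of the low-rank idea described in the abstract, the sketch below factorizes a small non-negative term-document matrix and scores each document by how poorly the low-rank model reconstructs it. It is not the authors' TONMF algorithm: the abstract does not give the BCD updates, so plain multiplicative-update NMF is swapped in as a stand-in. The function names `nmf` and `outlier_scores`, the rank, the iteration count, and the toy data are all assumptions made for this example.

```python
import numpy as np

def nmf(A, rank, n_iter=1000, seed=0, eps=1e-9):
    """Plain multiplicative-update NMF: A ~ W @ H with W, H >= 0.

    Illustrative stand-in only; this is NOT the TONMF/BCD procedure
    from the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative because A >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def outlier_scores(A, W, H):
    """L2 norm of each column (document) of the residual A - W @ H."""
    return np.linalg.norm(A - W @ H, axis=0)

# Hypothetical toy corpus: rows are terms, columns are documents.
t1 = np.array([3, 2, 1, 0, 0, 0, 0, 0], dtype=float)   # "topic 1" term profile
t2 = np.array([0, 0, 0, 1, 2, 3, 0, 0], dtype=float)   # "topic 2" term profile
odd = np.array([0, 0, 0, 0, 0, 0, 1, 1], dtype=float)  # terms used by no topic
docs = [s * t1 for s in (1, 2, 1, 2)] + [s * t2 for s in (2, 1, 2, 1)] + [odd]
A = np.column_stack(docs)  # shape: (8 terms, 9 documents)

W, H = nmf(A, rank=2)
scores = outlier_scores(A, W, H)
print(np.round(scores, 3))
print("most anomalous document index:", int(np.argmax(scores)))
```

Documents that follow one of the two dominant term patterns should be reconstructed almost exactly by a rank-2 model, while a document built from otherwise unused terms keeps most of its norm in the residual. Scoring by column-wise residual norm is one simple design choice; per the abstract, the paper's own method handles the separation within the factorization itself.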

Original language: English
Title of host publication: Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
Editors: Nitesh Chawla, Wei Wang
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 489-497
Number of pages: 9
ISBN (Electronic): 9781611974874
State: Published - 2017
Event: 17th SIAM International Conference on Data Mining, SDM 2017 - Houston, United States
Duration: Apr 27 2017 to Apr 29 2017

Publication series

Name: Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017

Conference

Conference: 17th SIAM International Conference on Data Mining, SDM 2017
Country/Territory: United States
City: Houston
Period: 04/27/17 to 04/29/17

Funding

This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. This project was partially funded by the Laboratory Director’s Research and Development fund, National Science Foundation (NSF) grant IIS-1348152, Defense Advanced Research Projects Agency (DARPA) XDATA program grant FA8750-12-2-0309 and also sponsored by the Army Research Laboratory (ARL) accomplished under Cooperative Agreement Number W911NF-09-2-0053. Also, H. Woo is supported by NRF-2015R101A1A01061261.

Funders: Funder number
U.S. Department of Energy: DE-AC05-00OR22725
Laboratory Directed Research and Development
National Science Foundation: IIS-1348152
Defense Advanced Research Projects Agency: FA8750-12-2-0309
Army Research Laboratory: W911NF-09-2-0053
NRF: NRF-2015R101A1A01061261
