TF-ICF: A new term weighting scheme for clustering dynamic data streams

Joel W. Reed, Jiao Yu, Thomas E. Potok, Brian A. Klump, Mark T. Elmore, Ali R. Hurson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

110 Scopus citations

Abstract

In this paper, we propose a new term weighting scheme called Term Frequency - Inverse Corpus Frequency (TF-ICF). It does not require term frequency information from other documents within the document collection, and thus enables us to generate the document vectors of N streaming documents in linear time. We evaluated the effectiveness of the proposed approach against five widely used term weighting schemes through extensive experimentation in the context of a machine learning application, unsupervised document clustering. Our results show that TF-ICF produces document clusters of quality comparable to those generated by the widely recognized term weighting schemes, and that it is significantly faster than those methods.
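The idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the form used here (a log-scaled term frequency multiplied by a smoothed log inverse corpus frequency taken from a fixed reference corpus) is an assumption for illustration, and the function names `build_icf` and `tf_icf_vector` are hypothetical. The point it shows is that each streaming document is weighted using only its own term counts plus a precomputed table, so N documents cost time linear in N.

```python
import math
from collections import Counter


def build_icf(reference_docs):
    """Precompute inverse corpus frequencies once, from a fixed reference corpus.

    reference_docs: list of token lists. The smoothed form log((N + 1) / (n + 1))
    is an assumed formula, not quoted from the paper.
    """
    n_docs = len(reference_docs)
    doc_freq = Counter()
    for tokens in reference_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    icf = {t: math.log((n_docs + 1) / (df + 1)) for t, df in doc_freq.items()}
    default_icf = math.log(n_docs + 1)  # weight for terms absent from the reference corpus
    return icf, default_icf


def tf_icf_vector(tokens, icf, default_icf):
    """Weight one document using only its own counts and the static ICF table."""
    counts = Counter(tokens)
    return {t: math.log(1 + c) * icf.get(t, default_icf) for t, c in counts.items()}


# Hypothetical toy data: a static reference corpus and one incoming document.
reference = [["a", "b"], ["a", "c"], ["a", "d"]]
icf, default_icf = build_icf(reference)
vec = tf_icf_vector(["a", "b", "b", "z"], icf, default_icf)
```

Because the ICF table never changes as documents arrive, a vector produced for one document does not need to be recomputed when later documents enter the stream. A TF-IDF scheme computed over the stream itself would not have this property.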

Original language: English
Title of host publication: Proceedings - 5th International Conference on Machine Learning and Applications, ICMLA 2006
Pages: 258-263
Number of pages: 6
State: Published - 2006
Event: 5th International Conference on Machine Learning and Applications, ICMLA 2006 - Orlando, FL, United States
Duration: Dec 14 2006 to Dec 16 2006

Publication series

Name: Proceedings - 5th International Conference on Machine Learning and Applications, ICMLA 2006

Conference

Conference: 5th International Conference on Machine Learning and Applications, ICMLA 2006
Country/Territory: United States
City: Orlando, FL
Period: 12/14/06 to 12/16/06
