Classification of distributed data using topic modeling and maximum variation sampling

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

3 Scopus citations

Abstract

From a management perspective, understanding what information exists on a network and how it is distributed provides a critical advantage. This work explores topic modeling as an approach to automatically determine the classes of information that exist on an organization's network. The resulting topics are then used as centroid vectors for classifying individual documents, in order to understand how information topics are distributed across the enterprise network. The approach is tested using the 20 Newsgroups dataset.
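
As a rough illustration of the classification step described in the abstract (not the authors' implementation), the sketch below assigns each document to the topic whose term-weight vector is most similar to it under cosine similarity, then tallies topics per host. The topic vectors, labels, host names, and the function names `cosine`, `classify`, and `topic_distribution` are all hypothetical; in practice the vectors would come from a topic model fitted to the network's documents, and the similarity measure is an assumption.

```python
import math
from collections import Counter

# Hypothetical topic-term weights, standing in for the output of a topic model.
TOPICS = {
    "space": {"orbit": 0.5, "nasa": 0.3, "launch": 0.2},
    "hockey": {"goal": 0.4, "team": 0.3, "season": 0.3},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, topics=TOPICS):
    """Return the label of the most similar topic centroid, or None if no term overlaps."""
    doc = Counter(text.lower().split())  # raw term counts as the document vector
    scores = {label: cosine(doc, vec) for label, vec in topics.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0.0 else None


def topic_distribution(docs_by_host, topics=TOPICS):
    """Count classified documents per topic for each host on the network."""
    return {
        host: Counter(classify(doc, topics) for doc in docs)
        for host, docs in docs_by_host.items()
    }
```

With these toy vectors, `classify("nasa plans a launch into orbit")` returns `"space"`, and a document sharing no terms with any topic returns `None`. Calling `topic_distribution` on a mapping of host names to document lists gives a per-host count of topics, which is the kind of distribution view the abstract describes.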

Original language: English
Title of host publication: Proceedings of the 44th Annual Hawaii International Conference on System Sciences, HICSS-44 2010
State: Published - 2011
Event: 44th Hawaii International Conference on System Sciences, HICSS-44 2010 - Koloa, Kauai, HI, United States
Duration: Jan 4 2011 to Jan 7 2011

Publication series

Name: Proceedings of the Annual Hawaii International Conference on System Sciences
ISSN (Print): 1530-1605

Conference

Conference: 44th Hawaii International Conference on System Sciences, HICSS-44 2010
Country/Territory: United States
City: Koloa, Kauai, HI
Period: 01/4/11 to 01/7/11
