Information fusion for text classification - An experimental comparison

Venu Dasigi, Reinhold C. Mann, Vladimir A. Protopopescu

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

This article reports our experiments on the effectiveness of different feature sets, and of information fusion from some combinations of them, in classifying free-text documents into a given number of categories. We use different feature sets and integrate neural network learning into the method. The feature sets are based on the "latent semantics" of a reference library, a collection of documents adequately representing the desired concepts. We found that a larger reference library is not necessarily better. Information fusion almost always gives better results than the individual constituent feature sets, with certain combinations doing better than others.

Original language: English
Pages (from-to): 2413-2425
Number of pages: 13
Journal: Pattern Recognition
Volume: 34
Issue number: 12
State: Published - Dec 2001

Funding

The first author thanks the Air Force Office of Scientific Research (AFOSR) for the summer fellowship that made this work possible, and acknowledges Dr. Heather Dussault's invaluable support. Thanks are also due to Professor Mike Berry of the Computer Science Department at the University of Tennessee, Knoxville, and his students for distributing the SVDPACKC software.

About the Author: REINHOLD C. MANN is Director of the Life Sciences Division at the Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee, a Department of Energy laboratory managed by UT-Battelle, LLC. His research interests have been in pattern recognition, intelligent systems and robotics, bioinformatics, and computational biology. Prior to assuming his current position, Mann was a staff member in the former ORNL Biology Division from 1983 to 1986, leader of the ORNL Advanced Computing and Integrated Sensor Systems Group from 1987 until 1989, Director of the ORNL Center for Engineering Systems Advanced Research (CESAR) from 1989 until 1994, and head of the Intelligent Systems Section from 1989 until 1997. He came to ORNL as a visiting scientist in 1981, supported by a Feodor-Lynen Fellowship from the Alexander von Humboldt Foundation in Bonn, Germany. Mann received a Diplom-Mathematiker degree (M.S. in mathematics) in 1977 and a Dr. rer. nat. degree (Ph.D.) in physics in 1980 from the Johannes Gutenberg University in Mainz, Federal Republic of Germany. He is an Adjunct Associate Professor at the University of Tennessee in Knoxville. He is a member of the AAAS, IEEE, and the ACM.

Funders | Funder number
Air Force Office of Scientific Research
Alexander von Humboldt-Stiftung

    Keywords

    • Features
    • Information fusion
    • Latent semantic indexing
    • Neural networks
    • Reference library
    • Text classification
