Abstract
This article reports experiments on how effectively different feature sets, and information fusion across combinations of them, classify free-text documents into a given number of categories. The method uses several feature sets and integrates neural network learning. The feature sets are based on the "latent semantics" of a reference library, a collection of documents that adequately represents the desired concepts. We found that a larger reference library is not necessarily better. Information fusion almost always gives better results than the individual constituent feature sets, with certain combinations doing better than others.
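To illustrate the general idea of latent-semantic features drawn from a reference library, a minimal sketch follows. It is not the authors' implementation. The toy libraries, the raw term counts, the fold-in projection, the use of NumPy's dense SVD (rather than SVDPACKC), concatenation as the fusion step, and all function names are assumptions made here for illustration. The neural network classifier that would consume the fused vector is omitted.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Rows are terms, columns are documents; entries are raw term counts."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():
            if token in index:  # words outside the library vocabulary are ignored
                counts[index[token], j] += 1.0
    return counts

def build_lsi(library, k):
    """Truncated SVD of a reference library: keep the top-k latent dimensions."""
    vocab = sorted({tok for doc in library for tok in doc.lower().split()})
    A = term_doc_matrix(library, vocab)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return vocab, U[:, :k], s[:k]

def lsi_features(doc, model):
    """Fold a new document into the latent space: Sigma_k^{-1} U_k^T q."""
    vocab, U_k, s_k = model
    q = term_doc_matrix([doc], vocab)[:, 0]
    return (q @ U_k) / s_k

def fuse(*feature_vectors):
    """Assumed fusion step: concatenate the individual feature vectors."""
    return np.concatenate(feature_vectors)

# Two hypothetical reference libraries, each yielding its own feature set.
library_a = [
    "stocks bonds market",
    "market trading stocks",
    "football match goal",
    "goal team football",
]
library_b = [
    "market price stocks",
    "team match score",
    "price score report",
]

model_a = build_lsi(library_a, k=2)
model_b = build_lsi(library_b, k=2)

doc = "stocks market report"
fused = fuse(lsi_features(doc, model_a), lsi_features(doc, model_b))
print(fused.shape)
```

In this sketch each reference library contributes a two-dimensional feature vector, and the fused vector is what a downstream classifier would be trained on. Other fusion schemes, such as combining classifier outputs instead of features, would fit the same description.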
Original language | English |
---|---|
Pages (from-to) | 2413-2425 |
Number of pages | 13 |
Journal | Pattern Recognition |
Volume | 34 |
Issue number | 12 |
State | Published - Dec 2001 |
Funding
The first author thanks the Air Force Office of Scientific Research (AFOSR) for the summer fellowship that made this work possible, and acknowledges Dr. Heather Dussault's invaluable support. Thanks are also due to Professor Mike Berry of the Computer Science Department at the University of Tennessee, Knoxville, and his students for distributing the SVDPACKC software.

About the Author: REINHOLD C. MANN is Director of the Life Sciences Division at the Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee, a Department of Energy laboratory managed by UT-Battelle, LLC. His research interests have been in pattern recognition, intelligent systems and robotics, bioinformatics, and computational biology. Prior to assuming his current position, Mann was a staff member in the former ORNL Biology Division from 1983 to 1986, leader of the ORNL Advanced Computing and Integrated Sensor Systems Group from 1987 until 1989, Director of the ORNL Center for Engineering Systems Advanced Research (CESAR) from 1989 until 1994, and head of the Intelligent Systems Section from 1989 until 1997. He came to ORNL as a visiting scientist in 1981, supported by a Feodor-Lynen Fellowship from the Alexander von Humboldt Foundation in Bonn, Germany. Mann received a Diplom-Mathematiker degree (M.S. in mathematics) in 1977 and a Dr. rer. nat. degree (Ph.D.) in physics in 1980 from the Johannes Gutenberg University in Mainz, Federal Republic of Germany. He is an Adjunct Associate Professor at the University of Tennessee in Knoxville and a member of the AAAS, IEEE, and the ACM.
Funders | Funder number |
---|---|
Air Force Office of Scientific Research | |
Alexander von Humboldt-Stiftung | |
Keywords
- Features
- Information fusion
- Latent semantic indexing
- Neural networks
- Reference library
- Text classification