Abstract
Social animals and insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking-based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered by observing bird flocks and fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking-based algorithm does not require initial partition seeds. The algorithm generates a clustering of a given set of data by embedding the high-dimensional data items on a two-dimensional grid, which makes the clustering result easy to retrieve and visualize. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid lead the entire document flock to generate complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance than the K-means and the Ant clustering algorithms for real document clustering.
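The local-rule idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact rules, so the sketch assumes that boids with similar document vectors attract and align while dissimilar ones repel. The names `cosine`, `step`, `radius`, `sim_threshold`, `min_dist`, `max_speed`, and `size` are hypothetical, as is the choice of cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def step(pos, vel, feats, radius=10.0, sim_threshold=0.5,
         min_dist=1.0, max_speed=2.0, size=100.0):
    """Advance every boid by one time step and return new (pos, vel) lists.

    pos, vel: lists of (x, y) tuples; feats: one feature vector per boid.
    All rule weights below are illustrative assumptions, not published values.
    """
    new_pos, new_vel = [], []
    for i, (p, v) in enumerate(zip(pos, vel)):
        ax = ay = 0.0
        for j, q in enumerate(pos):
            if i == j:
                continue
            dx, dy = q[0] - p[0], q[1] - p[1]
            dist = math.hypot(dx, dy)
            # Only neighbors inside the sensing radius matter.
            # Simplification: distances ignore the wrap-around applied below.
            if dist == 0.0 or dist > radius:
                continue
            similar = cosine(feats[i], feats[j]) >= sim_threshold
            if similar and dist >= min_dist:
                # Similar documents: move closer and match velocity.
                ax += dx / dist + 0.1 * (vel[j][0] - v[0])
                ay += dy / dist + 0.1 * (vel[j][1] - v[1])
            else:
                # Dissimilar (or too close): steer away, stronger when nearer.
                ax -= dx / (dist * dist)
                ay -= dy / (dist * dist)
        vx, vy = v[0] + ax, v[1] + ay
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            # Cap the speed so the flock stays stable.
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_vel.append((vx, vy))
        # Keep boids on the two-dimensional grid by wrapping at the edges.
        new_pos.append(((p[0] + vx) % size, (p[1] + vy) % size))
    return new_pos, new_vel
```

Calling `step` repeatedly on randomly placed boids, one per document, would be expected to let groups of similar documents drift together on the grid. No initial cluster seeds are supplied, which is the property the abstract contrasts with K-means.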
Original language | English |
---|---|
Pages (from-to) | 505-515 |
Number of pages | 11 |
Journal | Journal of Systems Architecture |
Volume | 52 |
Issue number | 8-9 |
State | Published - Aug 2006 |
Funding
Oak Ridge National Laboratory is managed by UT-Battelle LLC for the US Department of Energy under contract number DE-AC05_00OR22725.

Xiaohui Cui received his M.S. degree in computer science from Wuhan University, China in July 2000, and his Ph.D. degree in Computer Science and Engineering from University of Louisville, USA in November 2004. He is currently a Postdoctoral Research Associate at the Applied Software Engineering Research Group in Oak Ridge National Laboratory. His research interests include collective intelligence of multi-agent systems, swarm modeling, data mining and knowledge discovery, distributed computing, and sensor networks. He is a member of the IEEE Computer Society.

Jinzhu Gao received her B.S. and M.S. degrees from Huazhong University of Science and Technology, China, and her Ph.D. degree in Computer Science and Engineering from The Ohio State University, USA in June 2004. She is currently a Postdoctoral Research Associate at the Network and Cluster Computing Group in Oak Ridge National Laboratory. Her research interests include scientific visualization, parallel and distributed computing, and computer graphics. She is a member of the ACM and the IEEE Computer Society.

Thomas E. Potok is the leader of the Applied Software Engineering Research Group at the Oak Ridge National Laboratory, where he manages a staff of 14 researchers. He is the principal investigator on a number of intelligent software agent research projects. Prior to this he worked for 14 years at IBM’s Software Solutions Laboratory in Research Triangle Park, North Carolina. Dr. Potok has a BS, MS, and Ph.D. in Computer Engineering, all from North Carolina State University. He is an adjunct faculty member at the University of Tennessee, and a member of the ACM and the IEEE Computer Society. He has authored numerous publications, has filed five software patents, and has organized several workshops.
Funders | Funder number |
---|---|
US Department of Energy | DE-AC05_00OR22725 |
UT-Battelle LLC | |
Oak Ridge National Laboratory | |
Keywords
- Agent
- Bio-inspired
- Document clustering
- F-measure
- Flocking model