TY - CHAP
T1 - Flocking-based document clustering on the graphics processing unit
AU - St.Charles, Jesse
AU - Potok, Thomas E.
AU - Patton, Robert
AU - Cui, Xiaohui
PY - 2008
Y1 - 2008
N2 - Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n2). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with many naturally inspired algorithms such as ant colony optimization and particle swarm optimization, are highly parallel and have found increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highlyparallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to many naturally inspired algorithms. Using the CUDA platform from NIVIDA®, we developed a document flocking implementation to be run on the NIVIDA® GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. Our results also confirm that each implementation is of similar complexity, confirming that gains are from the hardware and not from algorithmic benefits. This improvement in runtime makes the GPU a potentially powerful new platform for document analysis.
AB - Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n2). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with many naturally inspired algorithms such as ant colony optimization and particle swarm optimization, are highly parallel and have found increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highlyparallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to many naturally inspired algorithms. Using the CUDA platform from NIVIDA®, we developed a document flocking implementation to be run on the NIVIDA® GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. Our results also confirm that each implementation is of similar complexity, confirming that gains are from the hardware and not from algorithmic benefits. This improvement in runtime makes the GPU a potentially powerful new platform for document analysis.
UR - http://www.scopus.com/inward/record.url?scp=44949199293&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-78987-1_3
DO - 10.1007/978-3-540-78987-1_3
M3 - Chapter
AN - SCOPUS:44949199293
SN - 9783540789864
T3 - Studies in Computational Intelligence
SP - 27
EP - 37
BT - Nature Inspired Cooperative Strategies for Optimization (NICSO 2007)
A2 - Krasnogor, Natalio
A2 - Nicosia, Giuseppe
A2 - Pavone, Mario
A2 - Pelta, David
ER -