Characterizing large text corpora using a maximum variation sampling genetic algorithm

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    5 Scopus citations

    Abstract

    An enormous amount of information is available via the Internet, much of it in the form of text-based documents. These documents cover a variety of topics that are vitally important to the scientific, business, and defense/security communities. Many techniques currently exist for processing and analyzing such data. However, quickly characterizing a large set of documents remains challenging. Previous work has successfully demonstrated the use of a genetic algorithm to provide a representative subset of text documents via adaptive sampling. In this work, we expand and explore this approach on much larger data sets using a parallel genetic algorithm (GA) with adaptive parameter control. Experimental results are presented and discussed.
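The abstract does not spell out the algorithm, so the following is only a minimal sketch of what maximum variation sampling with a genetic algorithm might look like, under several assumptions that are not taken from the paper: documents are represented as sets of terms, dissimilarity is Jaccard distance, fitness is the sum of pairwise distances within a sample, and the operators are a simple union-based crossover with single-index mutation. All function names are hypothetical. The parallelism and adaptive parameter control mentioned above are deliberately left out.

```python
import random
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two term sets (assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def fitness(sample, docs):
    """Assumed fitness: sum of pairwise distances among sampled documents."""
    return sum(jaccard_distance(docs[i], docs[j])
               for i, j in combinations(sample, 2))


def crossover(p1, p2, k, rng):
    """Draw k distinct indices from the union of both parents."""
    pool = sorted(set(p1) | set(p2))
    return tuple(sorted(rng.sample(pool, k)))


def mutate(sample, n_docs, rate, rng):
    """With probability `rate`, swap one index for a document not in the sample."""
    if rng.random() >= rate:
        return sample
    outside = [i for i in range(n_docs) if i not in sample]
    if not outside:
        return sample
    child = list(sample)
    child[rng.randrange(len(child))] = rng.choice(outside)
    return tuple(sorted(child))


def max_variation_sample(docs, k, pop_size=30, generations=50,
                         mutation_rate=0.3, seed=0):
    """Evolve a size-k subset of document indices with high internal variation."""
    rng = random.Random(seed)
    n = len(docs)
    population = [tuple(sorted(rng.sample(range(n), k)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism), refill with offspring.
        population.sort(key=lambda s: fitness(s, docs), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = crossover(p1, p2, k, rng)
            children.append(mutate(child, n, mutation_rate, rng))
        population = parents + children
    return max(population, key=lambda s: fitness(s, docs))


if __name__ == "__main__":
    corpus = [
        {"gene", "protein", "cell"},
        {"gene", "protein", "dna"},
        {"market", "stock", "trade"},
        {"network", "attack", "security"},
    ]
    print(max_variation_sample(corpus, k=3))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases. A fixed `seed` makes runs repeatable.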

    Original language: English
    Title of host publication: GECCO 2006 - Genetic and Evolutionary Computation Conference
    Publisher: Association for Computing Machinery (ACM)
    Pages: 1877-1878
    Number of pages: 2
    ISBN (Print): 1595931864, 9781595931863
    State: Published - 2006
    Event: 8th Annual Genetic and Evolutionary Computation Conference 2006 - Seattle, WA, United States
    Duration: Jul 8 2006 → Jul 12 2006

    Publication series

    Name: GECCO 2006 - Genetic and Evolutionary Computation Conference
    Volume: 2

    Conference

    Conference: 8th Annual Genetic and Evolutionary Computation Conference 2006
    Country/Territory: United States
    City: Seattle, WA
    Period: 07/8/06 → 07/12/06

    Keywords

    • Intelligent agents
    • Parallel genetic algorithm
    • Text analysis
