Rapid and robust ranking of text documents in a dynamically changing corpus

Byung Hoon Park, Nagiza F. Samatova, Rajesh Munavalli, Ramya Krishnamurthy, Houssain Kettani, Al Geist

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Ranking documents in a selected corpus plays an important role in information retrieval systems. Despite notable advances, keeping the ordering of documents in the domains of interest up to date is challenging as text documents continuously accumulate. Conventional approaches produce an ordering that is valid only within a given corpus, so the ordering must be completely redone whenever documents are added to or deleted from the corpus. In this paper, we introduce a corpus-independent framework for rapid ordering of documents in a dynamically changing corpus. As in many practical approaches, our framework uses a similarity measure in some metric space to indicate the degree of relevance of a document to the domain of interest. Unlike in corpus-dependent approaches, however, the relevance score of a document remains valid when the corpus changes (for example, when new documents are inserted), which allows rapid ordering within the corpus. This paper details a statistical approach to computing such relevance scores.
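
The practical consequence of a corpus-independent score can be illustrated with a small sketch. It is not the statistical method of the paper, which the abstract does not specify. It assumes a hypothetical fixed domain profile (a bag of terms) and uses plain cosine similarity as a stand-in relevance score. The class name RankedCorpus and its methods are invented for illustration. Because each score depends only on the document and the fixed profile, inserting or deleting a document never changes the score of any other document, so the ranking can be updated in place instead of being recomputed.

```python
import bisect
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RankedCorpus:
    """Keeps documents ordered by a score that ignores the rest of the corpus.

    Assumption: relevance is the cosine similarity to a fixed domain
    profile. This is a stand-in, not the score defined in the paper.
    """

    def __init__(self, domain_terms):
        self.profile = Counter(domain_terms)
        self._scores = {}   # doc_id -> score
        self._order = []    # sorted list of (-score, doc_id)

    def add(self, doc_id, text):
        # The score depends only on this document and the fixed profile.
        score = cosine(Counter(text.lower().split()), self.profile)
        self._scores[doc_id] = score
        bisect.insort(self._order, (-score, doc_id))
        return score

    def remove(self, doc_id):
        # Locate the stored entry by its key; other entries are untouched.
        score = self._scores.pop(doc_id)
        index = bisect.bisect_left(self._order, (-score, doc_id))
        del self._order[index]

    def ranking(self):
        """Document ids, most relevant first."""
        return [doc_id for _, doc_id in self._order]


corpus = RankedCorpus(["protein", "gene", "sequence"])
corpus.add("a", "gene sequence alignment")
corpus.add("b", "stock market report")
corpus.add("c", "protein gene sequence")
print(corpus.ranking())
```

By contrast, a corpus-dependent weighting such as TF-IDF changes the weight of every term whenever the document frequencies change, which is why such rankings have to be rebuilt after each update.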

Original language: English
Title of host publication: AICCSA 08 - 6th IEEE/ACS International Conference on Computer Systems and Applications
Pages: 149-155
Number of pages: 7
State: Published - 2008
Event: 6th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2008 - Doha, Qatar
Duration: Mar 31 2008 to Apr 4 2008

Publication series

Name: AICCSA 08 - 6th IEEE/ACS International Conference on Computer Systems and Applications

Conference

Conference: 6th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2008
Country/Territory: Qatar
City: Doha
Period: 03/31/08 to 04/4/08
