TY - GEN
T1 - Rapid and robust ranking of text documents in a dynamically changing corpus
AU - Park, Byung Hoon
AU - Samatova, Nagiza F.
AU - Munavalli, Rajesh
AU - Krishnamurthy, Ramya
AU - Kettani, Houssain
AU - Geist, Al
PY - 2008
Y1 - 2008
N2 - Ranking documents in a selected corpus plays an important role in information retrieval systems. Despite notable advances in this direction, maintaining an up-to-date ordering of documents in the domains of interest remains a challenging task as text documents continuously accumulate. Conventional approaches can produce an ordering that is valid only within a given corpus. Thus, with such approaches, the ordering must be completely redone whenever documents are added to or deleted from the corpus. In this paper, we introduce a corpus-independent framework for rapid ordering of documents in a dynamically changing corpus. As in many practical approaches, our framework uses a similarity measure in some metric space to indicate the degree of relevance of a document to the domain of interest. However, unlike in corpus-dependent approaches, the relevance score of a document remains valid as changes are introduced into the corpus (the insertion of new documents, for example), thus allowing rapid ordering within the corpus. In particular, this paper details a statistical approach to computing such relevance scores.
AB - Ranking documents in a selected corpus plays an important role in information retrieval systems. Despite notable advances in this direction, maintaining an up-to-date ordering of documents in the domains of interest remains a challenging task as text documents continuously accumulate. Conventional approaches can produce an ordering that is valid only within a given corpus. Thus, with such approaches, the ordering must be completely redone whenever documents are added to or deleted from the corpus. In this paper, we introduce a corpus-independent framework for rapid ordering of documents in a dynamically changing corpus. As in many practical approaches, our framework uses a similarity measure in some metric space to indicate the degree of relevance of a document to the domain of interest. However, unlike in corpus-dependent approaches, the relevance score of a document remains valid as changes are introduced into the corpus (the insertion of new documents, for example), thus allowing rapid ordering within the corpus. In particular, this paper details a statistical approach to computing such relevance scores.
UR - http://www.scopus.com/inward/record.url?scp=50049130256&partnerID=8YFLogxK
U2 - 10.1109/AICCSA.2008.4493529
DO - 10.1109/AICCSA.2008.4493529
M3 - Conference contribution
AN - SCOPUS:50049130256
SN - 9781424419685
T3 - AICCSA 08 - 6th IEEE/ACS International Conference on Computer Systems and Applications
SP - 149
EP - 155
BT - AICCSA 08 - 6th IEEE/ACS International Conference on Computer Systems and Applications
T2 - 6th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2008
Y2 - 31 March 2008 through 4 April 2008
ER -