An optimal hierarchical clustering algorithm for gene expression data

Sudip Seal, Srikanth Komarina, Srinivas Aluru

Research output: Contribution to journal › Article › peer-review

16 Scopus citations


Microarrays are used for measuring expression levels of thousands of genes simultaneously. Clustering algorithms are used on gene expression data to find co-regulated genes. An often used clustering strategy is the Pearson correlation coefficient based hierarchical clustering algorithm presented in [Proc. Nat. Acad. Sci. 95 (25) (1998) 14863-14868], which takes O(N³) time. We note that this run time can be reduced to O(N²) by applying known hierarchical clustering algorithms [Proc. 9th Annual ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 619-628] to this problem. In this paper, we present an algorithm which runs in Θ(N log N) time using a geometrical reduction and show that it is optimal.
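
The abstract does not spell out the geometrical reduction. As a hedged illustration only, the sketch below checks one standard identity that lets correlation-based similarity be treated geometrically: if two expression profiles are centered to zero mean and scaled to unit norm, their Pearson correlation r and their squared Euclidean distance d² satisfy r = 1 - d²/2, so the most correlated pair is also the closest pair of points. Whether the paper uses exactly this relation is an assumption; the function names and the sample profiles are hypothetical, and this is not the paper's algorithm.

```python
import math

def standardize(profile):
    """Center a profile to zero mean and scale it to unit Euclidean norm.

    Assumes the profile is not constant (otherwise the norm is zero).
    """
    mean = sum(profile) / len(profile)
    centered = [x - mean for x in profile]
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered]

def pearson(a, b):
    """Pearson correlation: dot product of the standardized profiles."""
    ua, ub = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(ua, ub))

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical expression profiles for two genes over four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 1.0, 4.0, 3.0]

r = pearson(gene_a, gene_b)
d2 = squared_distance(standardize(gene_a), standardize(gene_b))

# The identity r = 1 - d^2 / 2 holds for standardized profiles.
assert math.isclose(r, 1.0 - d2 / 2.0)
```

Because the identity is monotone (larger r means smaller d²), ranking gene pairs by correlation and ranking them by distance between standardized profiles give the same order.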

Original language: English
Pages (from-to): 143-147
Number of pages: 5
Journal: Information Processing Letters
Issue number: 3
State: Published - Feb 14 2005
Externally published: Yes


  • Algorithms
  • Computational geometry
  • Gene expression
  • Hierarchical clustering
  • Microarrays

