High performance dimension reduction and visualization for large high-dimensional data analysis

Jong Youl Choi, Seung Hee Bae, Xiaohong Qiu, Geoffrey Fox

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

24 Scopus citations

Abstract

Large high-dimensional datasets are of growing importance in many fields, and it is important to be able to visualize them, either to understand the results of data mining approaches or simply to browse them, in such a way that the distance between points in the visualization (2D or 3D) space tracks that in the original high-dimensional space. Dimension reduction is a well-understood approach but can be very time- and memory-intensive for large problems. Here we report on parallel algorithms for Scaling by MAjorizing a COmplicated Function (SMACOF), which solves the Multidimensional Scaling (MDS) problem, and for Generative Topographic Mapping (GTM). The former is particularly time-consuming, with complexity that grows as the square of the dataset size, but has the advantage that it does not require explicit vectors for the dataset points, only measurements of inter-point dissimilarities. We compare SMACOF and GTM on a subset of the NIH PubChem database, which has binary vectors of length 166 bits. We find good parallel performance for both GTM and SMACOF and a strong correlation between the dimension-reduced PubChem data from these two methods.
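
To make the SMACOF idea in the abstract concrete, the following is a minimal serial sketch, not the parallel implementation reported in the paper. It assumes unit weights and uses the standard Guttman transform update; the function names (`pairwise_distances`, `stress`, `smacof`) and parameters (`dim`, `iters`, `seed`) are hypothetical choices for illustration. Note that the input is only a dissimilarity matrix, never the original vectors.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X (n x n matrix)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(delta, X):
    """Raw stress: sum over pairs i < j of (delta_ij - d_ij(X))^2."""
    d = pairwise_distances(X)
    upper = np.triu_indices_from(d, k=1)
    return ((delta - d) ** 2)[upper].sum()

def smacof(delta, dim=2, iters=100, seed=0):
    """Unweighted SMACOF sketch (assumption: all weights equal 1).

    delta : symmetric n x n dissimilarity matrix with zero diagonal.
    Returns the low-dimensional coordinates and the stress per iteration.
    """
    n = delta.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))        # random initial configuration
    history = [stress(delta, X)]
    for _ in range(iters):
        d = pairwise_distances(X)
        # ratio_ij = delta_ij / d_ij, defined as 0 where d_ij == 0
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(d > 0, delta / d, 0.0)
        B = -ratio
        # Diagonal entries make each row of B sum to zero
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        X = B @ X / n                        # Guttman transform
        history.append(stress(delta, X))
    return X, history
```

Both `delta` and the intermediate matrix `B` are n x n, which is where the quadratic growth in time and memory mentioned in the abstract comes from. Each iteration of the majorization update does not increase the stress, so `history` can be used as a simple convergence check.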

Original language: English
Title of host publication: CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing
Pages: 331-340
Number of pages: 10
State: Published - 2010
Externally published: Yes
Event: 10th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2010 - Melbourne, VIC, Australia
Duration: May 17 2010 to May 20 2010

Publication series

Name: CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing

Conference

Conference: 10th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2010
Country/Territory: Australia
City: Melbourne, VIC
Period: 05/17/10 to 05/20/10
