Browsing large-scale cheminformatics data with dimension reduction

Jong Youl Choi, Seung Hee Bae, Judy Qiu, Bin Chen, David Wild

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Visualization of large-scale, high-dimensional data is highly valuable for data analysis that facilitates scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data on commodity clients. As in Geographic Information System browsers for Earth and environmental data, chemical compounds with similar properties appear near one another in the browser. PubChemBrowse is built around in-house, high-performance parallel multidimensional scaling (MDS) and generative topographic mapping (GTM) services and supports fast interaction with an external property database. These properties can be overlaid on the 3D-mapped compound space or queried for individual points. We prototype the integration with the Chem2Bio2RDF system, using a SPARQL endpoint to access over 20 publicly accessible bioinformatics databases. We describe our design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies are generally applicable to developing high-performance scientific data browsing systems for other applications.
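
To illustrate the idea that compounds with similar properties end up near one another in a 3D view, here is a minimal sketch of classical (Torgerson) MDS using NumPy. It is not the parallel MDS service described in the abstract; the function name `classical_mds`, the random stand-in data, and the use of Euclidean distances are all assumptions made for illustration.

```python
import numpy as np

def classical_mds(dist, dim=3):
    """Embed points in `dim` dimensions from a symmetric distance matrix.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the `dim` largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip small negative eigenvalues caused by rounding or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Toy stand-in for compound descriptors: 50 points in 10 dimensions (assumed data).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 10))
dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)

# One 3D coordinate per point, suitable for plotting in a point browser.
coords = classical_mds(dist, dim=3)
print(coords.shape)
```

Classical MDS needs the full pairwise distance matrix, so its memory use grows quadratically with the number of points; this is one reason a large-scale setting such as the one in the abstract calls for parallel or interpolated variants.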

Original language: English
Pages (from-to): 2315-2325
Number of pages: 11
Journal: Concurrency and Computation: Practice and Experience
Volume: 23
Issue number: 17
State: Published - Dec 10 2011
Externally published: Yes

Keywords

  • GTM
  • MDS
  • interpolation
  • semantic web
  • visualization
