Collective principal component analysis from distributed, heterogeneous data

Hillol Kargupta, Weiyun Huang, Krishnamoorthy Sivakumar, Byung Hoon Park, Shuren Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

27 Scopus citations

Abstract

Principal component analysis (PCA) is a statistical technique to identify the dependency structure of multivariate stochastic observations. PCA is frequently used in data mining applications. This paper considers PCA in the context of the emerging network-based computing environments. It offers a technique to perform PCA from distributed and heterogeneous data sets with relatively small communication overhead. The technique is evaluated against different data sets, including a data set for a web mining application. This approach is likely to facilitate the development of distributed clustering, associative link analysis, and other heterogeneous data mining applications that frequently use PCA.

Original language: English
Title of host publication: Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings
Editors: Djamel A. Zighed, Jan Komorowski, Jan Zytkow
Publisher: Springer Verlag
Pages: 452-457
Number of pages: 6
ISBN (Print): 9783540410669
DOIs
State: Published - 2000
Externally published: Yes
Event: 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2000 - Lyon, France
Duration: Sep 13 2000 to Sep 16 2000

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1910
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2000
Country/Territory: France
City: Lyon
Period: 09/13/00 to 09/16/00

Funding

The authors thank the American Cancer Society for supporting part of this research.
