TY - GEN
T1 - Motivating complex dependence structures in data mining
T2 - 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009
AU - Kao, Shih Chieh
AU - Ganguly, Auroop R.
AU - Steinhaeuser, Karsten
PY - 2009
Y1 - 2009
N2 - While data mining aims to identify hidden knowledge in massive, high-dimensional datasets, the dependence structure across time, across space, and between different variables receives less emphasis. Analogous to the use of probability density functions in modeling individual variables, it is now possible to characterize the complete dependence space mathematically through the application of copulas. By adopting copulas, the multivariate joint probability distribution can be constructed without being constrained to specific types of marginal distributions. Some common assumptions, such as normality and independence between variables, can also be relaxed. This study provides a fundamental introduction to and illustration of dependence structure, aimed at the potential applicability of copulas in general data mining. A case study in hydro-climatic anomaly detection shows that the frequency of multivariate anomalies is affected by the level of dependence between variables. Appropriate multivariate thresholds can be determined through a copula-based approach.
UR - http://www.scopus.com/inward/record.url?scp=77951173971&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2009.37
DO - 10.1109/ICDMW.2009.37
M3 - Conference contribution
AN - SCOPUS:77951173971
SN - 9780769539027
T3 - ICDM Workshops 2009 - IEEE International Conference on Data Mining
SP - 223
EP - 230
BT - ICDM Workshops 2009 - IEEE International Conference on Data Mining
Y2 - 6 December 2009
ER -