TY - GEN
T1 - Fast rule mining over multi-dimensional windows
AU - Das, Mahashweta
AU - Deepak, P.
AU - Deshpande, Prasad M.
AU - Kannan, Ramakrishnan
PY - 2011
Y1 - 2011
AB - Association rule mining is an indispensable tool for discovering insights from large databases and data warehouses. The data in a warehouse being multi-dimensional, it is often useful to mine rules over subsets of data defined by selections over the dimensions. Such interactive rule mining over multi-dimensional query windows is difficult since rule mining is computationally expensive. Current methods using pre-computation of frequent itemsets require counting of some itemsets by revisiting the transaction database at query time, which is very expensive. We develop a method (RMW) that identifies the minimal set of itemsets to compute and store for each cell, so that rule mining over any query window may be performed without going back to the transaction database. We give formal proofs that the set of itemsets chosen by RMW is sufficient to answer any query and also prove that it is the optimal set to be computed for one-dimensional queries. We demonstrate through an extensive empirical evaluation that RMW achieves extremely fast query response time compared to existing methods, with only moderate overhead in pre-computation and storage.
UR - http://www.scopus.com/inward/record.url?scp=84880098686&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972818.50
DO - 10.1137/1.9781611972818.50
M3 - Conference contribution
AN - SCOPUS:84880098686
SN - 9780898719925
T3 - Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011
SP - 582
EP - 593
BT - Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011
PB - Society for Industrial and Applied Mathematics Publications
T2 - 11th SIAM International Conference on Data Mining, SDM 2011
Y2 - 28 April 2011 through 30 April 2011
ER -