Abstract
A truncated singular value decomposition (SVD) is a powerful tool for analyzing modern datasets. However, the massive volume and rapidly changing nature of these datasets often make it too expensive to compute the SVD of the whole dataset at once. It is more attractive to use only a part of the dataset at a time and incrementally update the SVD. A randomized algorithm has been shown to be a great alternative to a traditional updating algorithm due to its ability to efficiently filter out noise and extract the relevant features of the dataset. Though it is often faster than the traditional algorithm, the randomized algorithm may need to access the data multiple times in order to extract the relevant features, and this data access creates a significant performance bottleneck. To improve the performance of the randomized algorithm for updating the SVD, we study, in this paper, two sampling algorithms that access the data only two or three times, respectively. We present several case studies to show that only a small fraction of the data may be needed to maintain the quality of the updated SVD, while our performance results on a hybrid CPU/GPU computer demonstrate the potential of the sampling algorithms to improve the performance of the randomized algorithm.
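As an illustration of why data access dominates the cost, the following is a minimal NumPy sketch of a generic randomized update of a truncated SVD when new rows arrive. It is not the sampling algorithms studied in the paper; the function name `randomized_update_svd` and the parameters `oversample` and `power_iters` are assumptions made for this example. Each product with `D` or `D.T` is one pass over the new data, so every additional power iteration costs two more passes.

```python
import numpy as np

def randomized_update_svd(U, s, Vt, D, k, oversample=10, power_iters=1, seed=0):
    """Hypothetical sketch: update A ~ U @ diag(s) @ Vt after appending rows D.

    Works on B = [U diag(s) Vt; D] without forming B explicitly.
    """
    rng = np.random.default_rng(seed)
    m, n = U.shape[0], Vt.shape[1]
    d = D.shape[0]
    ell = min(k + oversample, n, m + d)  # sketch width, capped by the dimensions

    def apply_B(X):
        # B @ X; touches the new data D once.
        return np.vstack([U @ (s[:, None] * (Vt @ X)), D @ X])

    def apply_Bt(Y):
        # B.T @ Y; touches the new data D once.
        return Vt.T @ (s[:, None] * (U.T @ Y[:m])) + D.T @ Y[m:]

    # Randomized range finder: orthonormal basis Q for the sampled range of B.
    Q, _ = np.linalg.qr(apply_B(rng.standard_normal((n, ell))))
    for _ in range(power_iters):
        # Each power iteration adds two more passes over D.
        Z, _ = np.linalg.qr(apply_Bt(Q))
        Q, _ = np.linalg.qr(apply_B(Z))

    # Project B onto the basis and take the SVD of the small ell-by-n matrix.
    C = apply_Bt(Q).T  # equals Q.T @ B
    Uc, s_new, Vt_new = np.linalg.svd(C, full_matrices=False)
    return (Q @ Uc)[:, :k], s_new[:k], Vt_new[:k]
```

With `power_iters=0` this sketch reads `D` twice (once to sample, once to project), and with `power_iters=1` it reads `D` four times, which is the kind of repeated access the abstract identifies as the bottleneck.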
Original language | English |
---|---|
Title of host publication | Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 |
Editors | Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 817-826 |
Number of pages | 10 |
ISBN (Electronic) | 9781538627143 |
State | Published - Jul 1 2017 |
Externally published | Yes |
Event | 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States; Duration: Dec 11 2017 → Dec 14 2017 |
Publication series
Name | Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 |
---|---|
Volume | 2018-January |
Conference
Conference | 5th IEEE International Conference on Big Data, Big Data 2017 |
---|---|
Country/Territory | United States |
City | Boston |
Period | 12/11/17 → 12/14/17 |
Funding
This research was supported in part by the National Science Foundation (NSF) OAC Award number 1708299.
Keywords
- out-of-core
- randomize
- sample
- update SVD