TY - GEN
T1 - A MapReduce approach to Gi*(d) spatial statistic
AU - Liu, Yan
AU - Wu, Kaichao
AU - Wang, Shaowen
AU - Zhao, Yanli
AU - Huang, Qian
PY - 2010
Y1 - 2010
N2 - Managing and analyzing massive spatial datasets as supported by GIS and spatial analysis is becoming crucial to geospatial problem-solving and decision-making. MapReduce provides a data-centric computational model through which highly scalable spatial analysis computation can be achieved. However, it is challenging to leverage multi-dimensional spatial characteristics on the horizontally partitioned and transparently managed MapReduce data system for improving the computational performance of spatial analysis. This paper tackles this challenge through the development of MapReduce-based computation of Gi*(d), a spatial statistic for detecting local clustering. Without exploiting spatial characteristics, Gi*(d) computation for a particular location requires pair-wise distance calculation for all points of a given dataset. A spatial locality-based storage and indexing strategy is developed to associate spatial locality with storage locality on the MapReduce platform. Based on a spatial indexing method, unnecessary map tasks can be eliminated for a MapReduce job, thus significantly improving the overall computation performance. To leverage underlying parallelism on storage nodes, an application-level load balancing mechanism is developed to produce even loads among map tasks based on adaptive spatial domain decomposition. Experiments show the effectiveness of the developed storage and indexing strategy with different distance parameter settings. A significant reduction in execution time for all-point computation is observed through the use of the application-level load balancing mechanism.
AB - Managing and analyzing massive spatial datasets as supported by GIS and spatial analysis is becoming crucial to geospatial problem-solving and decision-making. MapReduce provides a data-centric computational model through which highly scalable spatial analysis computation can be achieved. However, it is challenging to leverage multi-dimensional spatial characteristics on the horizontally partitioned and transparently managed MapReduce data system for improving the computational performance of spatial analysis. This paper tackles this challenge through the development of MapReduce-based computation of Gi*(d), a spatial statistic for detecting local clustering. Without exploiting spatial characteristics, Gi*(d) computation for a particular location requires pair-wise distance calculation for all points of a given dataset. A spatial locality-based storage and indexing strategy is developed to associate spatial locality with storage locality on the MapReduce platform. Based on a spatial indexing method, unnecessary map tasks can be eliminated for a MapReduce job, thus significantly improving the overall computation performance. To leverage underlying parallelism on storage nodes, an application-level load balancing mechanism is developed to produce even loads among map tasks based on adaptive spatial domain decomposition. Experiments show the effectiveness of the developed storage and indexing strategy with different distance parameter settings. A significant reduction in execution time for all-point computation is observed through the use of the application-level load balancing mechanism.
KW - Cloud computing
KW - Data-centric computing
KW - Spatial statistics
UR - http://www.scopus.com/inward/record.url?scp=78650889463&partnerID=8YFLogxK
U2 - 10.1145/1869692.1869695
DO - 10.1145/1869692.1869695
M3 - Conference contribution
AN - SCOPUS:78650889463
SN - 9781450304320
T3 - Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010
SP - 11
EP - 18
BT - Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010
T2 - 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010
Y2 - 2 November 2010 through 2 November 2010
ER -