A MapReduce approach to Gi*(d) spatial statistic

Yan Liu, Kaichao Wu, Shaowen Wang, Yanli Zhao, Qian Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Managing and analyzing massive spatial datasets as supported by GIS and spatial analysis is becoming crucial to geospatial problem-solving and decision-making. MapReduce provides a data-centric computational model through which highly scalable spatial analysis computation can be achieved. However, it is challenging to leverage multi-dimensional spatial characteristics on the horizontally partitioned and transparently managed MapReduce data system to improve the computational performance of spatial analysis. This paper tackles this challenge through the development of MapReduce-based computation of Gi*(d), a spatial statistic for detecting local clustering. Without exploiting spatial characteristics, Gi*(d) computation for a particular location requires pair-wise distance calculation for all points of a given dataset. A spatial locality-based storage and indexing strategy is developed to associate spatial locality with storage locality on the MapReduce platform. Based on a spatial indexing method, unnecessary map tasks can be eliminated for a MapReduce job, thus significantly improving the overall computation performance. To leverage underlying parallelism on storage nodes, an application-level load balancing mechanism is developed to produce even loads among map tasks based on adaptive spatial domain decomposition. Experiments show the effectiveness of the developed storage and indexing strategy with different distance parameter settings. A significant reduction in execution time for all-point computation is observed through the use of the application-level load balancing mechanism.
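
The two ideas in the abstract, the all-points distance scan and the elimination of work for blocks that lie outside the distance band, can be illustrated with a minimal single-machine sketch. This is not the paper's implementation: it assumes the commonly used standardized Getis-Ord form of Gi*(d) with binary weights (w_ij = 1 when point j is within distance d of point i, including i itself), which the abstract does not specify, and the function names `gi_star` and `blocks_to_scan` and the bounding-box block layout are hypothetical.

```python
import math


def gi_star(points, values, i, d):
    """Standardized Gi*(d) for location i, assuming binary distance-band weights.

    Scans every point, which is the pair-wise cost the abstract describes.
    The result is undefined (division by zero) when all values are equal
    or when every point falls within distance d of location i.
    """
    n = len(points)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    xi, yi = points[i]
    # w_ij = 1 when point j lies within distance d of point i (j == i included).
    w = [1.0 if math.hypot(x - xi, y - yi) <= d else 0.0 for x, y in points]
    sum_w = sum(w)
    sum_w2 = sum(wj * wj for wj in w)
    weighted = sum(wj * v for wj, v in zip(w, values))
    denom = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    return (weighted - mean * sum_w) / denom


def blocks_to_scan(block_bounds, center, d):
    """Return ids of blocks whose bounding box intersects the circle of radius d.

    block_bounds maps a (hypothetical) block id to (xmin, ymin, xmax, ymax).
    Blocks that are not returned cannot contain a neighbor of center, so a
    scan over them could be skipped.
    """
    cx, cy = center
    hits = []
    for block_id, (xmin, ymin, xmax, ymax) in block_bounds.items():
        # Distance from center to the nearest point of the box (0 if inside).
        dx = max(xmin - cx, 0.0, cx - xmax)
        dy = max(ymin - cy, 0.0, cy - ymax)
        if math.hypot(dx, dy) <= d:
            hits.append(block_id)
    return hits


# Toy data: a high-valued cluster near the origin, a low-valued one far away.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
values = [10, 12, 11, 1, 2]
hot = gi_star(points, values, 0, 1.5)
cold = gi_star(points, values, 3, 1.5)
```

In this toy dataset `hot` is positive and `cold` is negative, reflecting high-value and low-value clustering respectively. In a MapReduce setting, a filter in the spirit of `blocks_to_scan` would decide which storage blocks need a map task at all; how the paper's index actually makes that decision is not described in the abstract.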

Original language: English
Title of host publication: Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010
Pages: 11-18
Number of pages: 8
DOIs
State: Published - 2010
Externally published: Yes
Event: 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information System, ACM SIGSPATIAL HPDGIS 2010 - San Jose, CA, United States
Duration: Nov 2 2010 → Nov 2 2010

Publication series

Name: Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010

Conference

Conference: 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information System, ACM SIGSPATIAL HPDGIS 2010
Country/Territory: United States
City: San Jose, CA
Period: 11/2/10 → 11/2/10

Keywords

  • Cloud computing
  • Data-centric computing
  • Spatial statistics
