TY - JOUR
T1 - Assessing spatial and attribute errors in large national datasets for population distribution models
T2 - A case study of Philadelphia county schools
AU - Patterson, Lauren
AU - Urban, Marie
AU - Myers, Aaron
AU - Bhaduri, Budhendra
AU - Bright, Eddie
AU - Coleman, Phillip
PY - 2007/6
Y1 - 2007/6
N2 - Geospatial technologies and digital data have developed and disseminated rapidly in conjunction with increasing computing efficiency and Internet availability. The ability to store and transmit large datasets has encouraged the development of national infrastructure datasets in geospatial formats. National datasets are used by numerous agencies for analysis and modeling purposes because these datasets are standardized and considered to be of acceptable accuracy for national scale applications. At Oak Ridge National Laboratory, a population model has been developed that incorporates national schools data as one of the model inputs. This paper evaluates spatial and attribute inaccuracies present within two national school datasets, Tele Atlas North America and National Center for Education Statistics (NCES). Schools are an important component of the population model because they are spatially dense clusters of vulnerable populations. It is therefore essential to validate the quality of school input data. Schools were also chosen because a validated schools dataset had been produced in geospatial format for Philadelphia County, enabling a comparison between a local dataset and the national datasets. Analyses found that the national datasets are neither standardized nor complete, containing 76 to 90 percent of existing schools. The temporal accuracy of updating annual enrollment values resulted in 89 percent inaccuracy for 2003. Spatial rectification was required for 87 percent of NCES points, with 58 percent of those errors attributed to the geocoding process. Lastly, combining the two national datasets produced a more useful and accurate dataset.
AB - Geospatial technologies and digital data have developed and disseminated rapidly in conjunction with increasing computing efficiency and Internet availability. The ability to store and transmit large datasets has encouraged the development of national infrastructure datasets in geospatial formats. National datasets are used by numerous agencies for analysis and modeling purposes because these datasets are standardized and considered to be of acceptable accuracy for national scale applications. At Oak Ridge National Laboratory, a population model has been developed that incorporates national schools data as one of the model inputs. This paper evaluates spatial and attribute inaccuracies present within two national school datasets, Tele Atlas North America and National Center for Education Statistics (NCES). Schools are an important component of the population model because they are spatially dense clusters of vulnerable populations. It is therefore essential to validate the quality of school input data. Schools were also chosen because a validated schools dataset had been produced in geospatial format for Philadelphia County, enabling a comparison between a local dataset and the national datasets. Analyses found that the national datasets are neither standardized nor complete, containing 76 to 90 percent of existing schools. The temporal accuracy of updating annual enrollment values resulted in 89 percent inaccuracy for 2003. Spatial rectification was required for 87 percent of NCES points, with 58 percent of those errors attributed to the geocoding process. Lastly, combining the two national datasets produced a more useful and accurate dataset.
KW - Data integrity
KW - LandScan USA
KW - National dataset
KW - Population distribution model
KW - Quality control
KW - Validation and verification
UR - http://www.scopus.com/inward/record.url?scp=35648946419&partnerID=8YFLogxK
U2 - 10.1007/s10708-007-9099-3
DO - 10.1007/s10708-007-9099-3
M3 - Article
AN - SCOPUS:35648946419
SN - 0343-2521
VL - 69
SP - 93
EP - 102
JO - GeoJournal
JF - GeoJournal
IS - 1-2
ER -