Machine learning to improve retrieval by category in big volunteered geodata

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Volunteered Geographic Information (VGI) is now commonly used in research and practical applications. However, quality assurance of such geographic data remains a problem. In this study we use machine learning and natural language processing to improve record retrieval by category (e.g. restaurant, museum) from Wikimapia Points of Interest data. We evaluate how well the textual information contained in VGI records determines the category label. The performance of the trained classifier is evaluated on the complete dataset and then compared with its performance on regional subsets. Preliminary analysis shows a significant difference in classifier performance across regions. Such geographic differences will have a significant effect on data enrichment efforts such as labeling entities with missing categories.

Original language: English
Title of host publication: Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018
Editors: Christopher B. Jones, Ross S. Purves
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450360340
DOIs
State: Published - Nov 6 2018
Event: 12th Workshop on Geographic Information Retrieval, GIR 2018 - Seattle, United States
Duration: Nov 6 2018 → …

Publication series

Name: Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018

Conference

Conference: 12th Workshop on Geographic Information Retrieval, GIR 2018
Country/Territory: United States
City: Seattle
Period: 11/6/18 → …

Funding

Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders: US Department of Energy

    Keywords

    • Crowd-sourcing
    • Machine learning
    • Natural language processing
