Automatic Categorization of Social Sensor Data

Olivera Kotevska, Sarala Padi, Ahmed Lbath

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

Today, microblogging sites such as Twitter, Facebook, and other social networking websites have a huge impact on the generation of data in everyday life. If captured and analyzed properly and in a timely manner, the data broadcast through microblogging can provide useful information in many different situations. In a Smart City context, automatically identifying the messages communicated via Twitter can contribute to situation awareness about the city, and it also yields a great deal of useful information for people seeking information about the city. This paper addresses the processing and automatic categorization of microblogging data, in particular Twitter data, using Natural Language Processing (NLP) techniques together with a Random Forest classifier. Because processing Twitter messages is a challenging task, we propose an algorithm to preprocess them automatically. To this end, we collected Twitter messages for sixteen different categories from one geo-location. We used the proposed algorithm to preprocess the Twitter messages, and a Random Forest classifier then automatically categorized these tweets into the predefined categories. The results show that the Random Forest classifier outperformed Support Vector Machines (SVM) and Naive Bayes classifiers.
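The abstract does not spell out the steps of the proposed preprocessing algorithm. As a rough illustration only, a minimal Python sketch follows. It assumes common tweet-cleaning steps: lowercasing, stripping URLs and @mentions, keeping hashtag words, and removing non-letters and stop words. It then turns the cleaned tokens into bag-of-words counts that a classifier could consume. The function names `preprocess_tweet` and `bag_of_words` and the small stop-word list are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# "rt" (retweet marker) is included because it carries no topical meaning.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on", "at", "and", "for", "rt"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # web links
MENTION_RE = re.compile(r"@\w+")               # @user mentions
NON_ALPHA_RE = re.compile(r"[^a-z\s]")         # anything that is not a lowercase letter or whitespace


def preprocess_tweet(text: str) -> list[str]:
    """Normalize a raw tweet into lowercase content tokens (assumed cleaning steps)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links before punctuation is stripped
    text = MENTION_RE.sub(" ", text)    # drop @mentions entirely
    text = text.replace("#", " ")       # keep the hashtag word, drop the '#' symbol
    text = NON_ALPHA_RE.sub(" ", text)  # drop digits, punctuation and emoji
    # Discard stop words and single-letter fragments left over from cleaning.
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 1]


def bag_of_words(tokens: list[str]) -> Counter:
    """Count token occurrences: a simple feature representation for a classifier."""
    return Counter(tokens)


if __name__ == "__main__":
    raw = "RT @cityDOT: Heavy #traffic on I-95 near exit 27! https://t.co/abc"
    tokens = preprocess_tweet(raw)
    print(tokens)  # ['heavy', 'traffic', 'near', 'exit']
    print(bag_of_words(tokens))
```

Count dictionaries like these could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a `RandomForestClassifier`. The abstract does not state which library, features, or parameters the authors actually used.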

Funding

This work was supported by the National Institute of Standards and Technology (NIST) and was conducted within a collaboration between the Information Technology Laboratory, Advanced Network Technologies Division (ANTD), and the University of Grenoble. Special thanks to Dr. Abdella Battou, ANTD division chief, for his support and advice.

Funders
University of Grenoble
National Institute of Standards and Technology

    Keywords

    • Automatic Categorization of Twitter data
    • Micro blogging data
    • Random Forest classifier
    • Smart Cities
    • Twitter Data Analysis
