Abstract
Microblogging sites such as Twitter, Facebook, and other social networking websites now generate enormous volumes of data in everyday life. The data broadcast through microblogging can provide useful information in many situations if it is captured and analyzed properly and in a timely manner. In a Smart City context, automatically identifying messages communicated via Twitter can contribute to situation awareness about the city and surface useful information for people who seek information about it. This paper addresses the processing and automatic categorization of microblogging data, in particular Twitter data, using Natural Language Processing (NLP) techniques together with a Random Forest classifier. Because processing Twitter messages is a challenging task, we propose an algorithm to preprocess them automatically. To this end, we collected Twitter messages for sixteen different categories from one geo-location. We used the proposed algorithm to preprocess the Twitter messages, and a Random Forest classifier then automatically categorized these tweets into the predefined categories. The Random Forest classifier is shown to outperform Support Vector Machines (SVM) and Naive Bayes classifiers.
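The abstract does not spell out the steps of the proposed preprocessing algorithm, so the following is only a minimal sketch of the kind of tweet cleaning such a pipeline typically involves, not the authors' method. The function name `preprocess_tweet`, the regular expressions, and the tiny stopword list are all assumptions introduced here for illustration. The resulting tokens could then be vectorized and passed to a classifier such as Random Forest.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "is", "at", "on", "in", "to", "of", "and", "rt"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links carry little category signal
MENTION_RE = re.compile(r"@\w+")                # user mentions such as @cityhall
TOKEN_RE = re.compile(r"[a-z0-9']+")            # simple lowercase word tokens


def preprocess_tweet(text: str) -> List[str]:
    """Return lowercase tokens with URLs, mentions, and stopwords removed."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the marker
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess_tweet("RT @cityhall: Road closed on Main St! #traffic http://t.co/abc"))
```

Keeping the hashtag word while dropping the `#` marker is a design choice made for this sketch: hashtags often name the topic directly, which is useful for categorization.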
Original language | English |
---|---|
Pages (from-to) | 596-603 |
Number of pages | 8 |
Journal | Procedia Computer Science |
Volume | 58 |
DOIs | |
State | Published - 2016 |
Externally published | Yes |
Event | 7th International Conference on Emerging Ubiquitous Systems and Pervasive Networks, EUSPN 2016 / The 6th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare, ICTH-2016 / Affiliated Workshops, 2016 - London, United Kingdom; Duration: Sep 19 2016 → Sep 22 2016 |
Funding
This work was supported by the National Institute of Standards and Technology (NIST) and conducted within a collaboration under the Information Technology Laboratory, Advanced Network Technologies Division (ANTD), and the University of Grenoble. Our special thanks go to Dr. Abdella Battou, ANTD division chief, for his support and advice.
Funders | Funder number |
---|---|
University of Grenoble | |
National Institute of Standards and Technology | |
Keywords
- Automatic Categorization of Twitter data
- Micro blogging data
- Random Forest classifier
- Smart Cities
- Twitter Data Analysis