Sentiment analysis on (Bengali horoscope) corpus

Tirthankar Ghosal, Sajal K. Das, Saprativa Bhattacharjee

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

21 Scopus citations

Abstract

In its simplest form, sentiment analysis is the classification of a piece of text into the positive or negative class based on its polarity. Horoscopes consist of future predictions for each of the twelve zodiac signs and are very popular in India. All major TV channels and newspapers publish their horoscope experts' predictions daily. These daily horoscopes are well suited to sentiment analysis because a high proportion of their sentences carry strong sentiment. This work deals with sentiment analysis of Bengali daily horoscopes. A corpus of 6000 sentences is created by crawling the daily horoscope section of a leading Bengali newspaper's website. Each sentence is annotated with its polarity (positive or negative) by a team of three independent annotators. A lexicon of 58 stop words is also created from frequently occurring words in the corpus. Five well-known classification algorithms, namely Naïve Bayes, Support Vector Machines (SVM), k-Nearest Neighbours, Decision Tree and Random Forest, are compared. For each algorithm, three different input features (unigram, bigram and trigram presence) are tested. Stop word removal and feature selection using the information gain metric are also explored. SVM with all unigram features, without stop word removal and without information-gain-based feature selection, proves to be the best combination, achieving an accuracy of 98.7%.
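The abstract names the pipeline's ingredients (binary n-gram presence features, optional stop word removal, information gain feature selection, and a classifier) but not how they are implemented. What follows is a minimal, stdlib-only Python sketch of those steps under stated assumptions: whitespace tokenization, a hypothetical stop-word set, and four made-up English toy sentences that stand in for the annotated Bengali corpus (they are not taken from it). Naïve Bayes, in its Bernoulli form because that form models presence features directly, is used here only because it is the shortest of the five compared classifiers to write out; the paper's best result came from SVM, and this sketch does not reproduce it.

```python
import math

# Hypothetical toy data (NOT from the paper's corpus): label 1 = positive, 0 = negative.
TOY_SENTENCES = [
    ("good news at work today", 1),
    ("good time for travel", 1),
    ("bad news about health", 0),
    ("avoid travel today", 0),
]
# Hypothetical stop-word set; the paper builds its own 58-word Bengali lexicon.
STOP_WORDS = frozenset({"at", "for", "about", "today"})


def ngram_presence(tokens, n, stop_words=frozenset()):
    """Set of n-grams in a tokenized sentence: binary presence, counts are discarded."""
    kept = [t for t in tokens if t not in stop_words]
    return {tuple(kept[i:i + n]) for i in range(len(kept) - n + 1)}


def entropy(pos, neg):
    """Binary entropy (in bits) of a class distribution given its two counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, feature):
    """IG = H(class) - sum over {present, absent} of P(split) * H(class | split)."""
    n = len(docs)
    positives = sum(labels)
    gain = entropy(positives, n - positives)
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    for group in (present, absent):
        if group:
            pos = sum(group)
            gain -= len(group) / n * entropy(pos, len(group) - pos)
    return gain


def select_top_k(docs, labels, k):
    """Keep the k features with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]


def train_bernoulli_nb(docs, labels, vocab):
    """Per class: log prior and Laplace-smoothed P(feature present | class)."""
    model = {}
    for c in (0, 1):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior = math.log(len(class_docs) / len(docs))
        probs = {f: (sum(f in d for d in class_docs) + 1) / (len(class_docs) + 2)
                 for f in vocab}
        model[c] = (log_prior, probs)
    return model


def predict(model, doc):
    """Return the class with the highest log posterior; absent features also contribute."""
    scores = {}
    for c, (log_prior, probs) in model.items():
        score = log_prior
        for f, p in probs.items():
            score += math.log(p if f in doc else 1 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Unigram presence features with stop words removed.
docs = [ngram_presence(s.split(), 1, STOP_WORDS) for s, _ in TOY_SENTENCES]
labels = [y for _, y in TOY_SENTENCES]
print(select_top_k(docs, labels, 1))  # → [('good',)]
model = train_bernoulli_nb(docs, labels, set().union(*docs))
print(predict(model, ngram_presence("good day".split(), 1, STOP_WORDS)))  # → 1
```

Setting `n` to 2 or 3 gives the bigram and trigram presence variants. Passing an empty stop-word set and training on the full vocabulary instead of the `select_top_k` output mirrors the "all unigram features, no stop word removal, no information gain" configuration the abstract reports as best, although the abstract reports that result for SVM rather than for this classifier.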

Original language: English
Title of host publication: 12th IEEE International Conference Electronics, Energy, Environment, Communication, Computer, Control
Subtitle of host publication: (E3-C3), INDICON 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467373999
State: Published - Mar 29 2016
Externally published: Yes
Event: 12th IEEE International Conference Electronics, Energy, Environment, Communication, Computer, Control, INDICON 2015 - New Delhi, India
Duration: Dec 17 2015 to Dec 20 2015

Publication series

Name: 12th IEEE International Conference Electronics, Energy, Environment, Communication, Computer, Control: (E3-C3), INDICON 2015

Conference

Conference: 12th IEEE International Conference Electronics, Energy, Environment, Communication, Computer, Control, INDICON 2015
Country/Territory: India
City: New Delhi
Period: 12/17/15 to 12/20/15
