TY - GEN
T1 - Sentiment analysis on (Bengali horoscope) corpus
AU - Ghosal, Tirthankar
AU - Das, Sajal K.
AU - Bhattacharjee, Saprativa
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/3/29
Y1 - 2016/3/29
AB - Sentiment analysis, in its simplest form, is the classification of a piece of text into a positive or negative class based on the polarity of the text. Horoscopes consist of future predictions for each of the twelve zodiac signs and are very popular in India. All major TV channels and newspapers publish their horoscope experts' predictions on a daily basis. These daily horoscopes are well suited to the task of sentiment analysis, as they contain a high percentage of strongly sentiment-bearing sentences. This work deals with sentiment analysis of Bengali daily horoscopes. A corpus of 6000 sentences is created by crawling the daily horoscope section of a leading Bengali newspaper's website. Each sentence is annotated with its polarity (positive or negative) by a team of three independent annotators. A lexicon of 58 stop words is also created from frequently occurring words in the corpus. A comparative analysis of five well-known classification algorithms, namely Naïve Bayes, Support Vector Machines (SVM), k-Nearest Neighbours, Decision Tree and Random Forest, is carried out. For each classification algorithm, three different input features (unigram, bigram and trigram presence) are evaluated. Stop-word removal and feature selection using the information gain metric are also applied. SVM with all unigram features, using neither stop-word removal nor information-gain-based feature selection, proves to be the best combination, achieving an accuracy of 98.7%.
UR - http://www.scopus.com/inward/record.url?scp=84994341234&partnerID=8YFLogxK
U2 - 10.1109/INDICON.2015.7443551
DO - 10.1109/INDICON.2015.7443551
M3 - Conference contribution
AN - SCOPUS:84994341234
T3 - 12th IEEE International Conference Electronics, Energy, Environment, Communication, Computer, Control: (E3-C3), INDICON 2015
BT - 12th IEEE International Conference Electronics, Energy, Environment, Communication, Computer, Control
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th IEEE International Conference Electronics, Energy, Environment, Communication, Computer, Control, INDICON 2015
Y2 - 17 December 2015 through 20 December 2015
ER -