Abstract
Machine-derived sentiment analysis has become a pervasive and useful tool for a wide array of problems in natural language processing. Leading technology companies such as Google now offer sentiment analysis tools (SATs) as readily accessible online products, and academic researchers develop and release SATs to support the research enterprise. A major challenge is that different SATs produce inconsistent results, so the choice of SAT for a specific purpose may significantly affect the application. This study addresses that problem by using structural equation modeling to merge the outputs of multiple SATs into a combined sentiment metric, without the need for a labeled training dataset. The method is data-driven, replicable, and applicable to a wide range of text-based problems. It was tested on three publicly available datasets and compared against seven different SATs. The results indicate that, as a continuous measure, the proposed method outperformed the other SATs on the movie reviews and SemEval datasets and tied for first place with IBM Watson on the Sentiment 140 dataset. It also performed better across all three datasets than the main published alternative, the arithmetic mean solution.
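The idea of merging several SAT outputs without labels can be illustrated with a minimal sketch. It is not the structural equation model used in the study: as a simplified stand-in, it approximates a one-factor measurement model with the first principal component of the tools' correlation matrix, and it computes the arithmetic mean of the standardized scores as the baseline. The function names, the synthetic data, and the noise levels are all assumptions made for illustration.

```python
import numpy as np


def standardize(scores):
    """Z-score each tool's column (assumes no column is constant)."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)


def combine_one_factor(scores):
    """Combine tool scores with loadings from the first principal component.

    This is a rough proxy for a one-factor latent model, not a fitted SEM.
    """
    z = standardize(scores)
    corr = np.corrcoef(z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column
    # is the eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(corr)
    loadings = eigvecs[:, -1]
    # An eigenvector's sign is arbitrary; orient it so higher means more positive.
    if loadings.sum() < 0:
        loadings = -loadings
    return z @ loadings / loadings.sum()


def combine_mean(scores):
    """Baseline: arithmetic mean of the standardized tool scores."""
    return standardize(scores).mean(axis=1)


# Hypothetical data: four tools observing one latent sentiment with
# different strengths and noise levels (values chosen for illustration).
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
strength = np.array([1.0, 0.8, 0.6, 0.9])
noise_sd = np.array([0.3, 0.5, 1.0, 0.4])
scores = latent[:, None] * strength + rng.normal(size=(500, 4)) * noise_sd

combined = combine_one_factor(scores)
baseline = combine_mean(scores)
```

Neither function needs labeled data: the weights come only from how the tools co-vary. A full SEM would additionally estimate per-tool error variances and provide fit statistics, which this sketch omits.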
Original language | English |
---|---|
Article number | 3 |
Journal | Journal of Computational Social Science |
Volume | 8 |
Issue number | 1 |
State | Published - Feb 2025 |
Funding
This research is not supported under any grant but by the institutional resources of Oak Ridge National Laboratory and Mississippi State University. This material is based upon Cindy Bethel's work supported while serving at the National Science Foundation. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://energy.gov/downloads/doe-public-access-plan).
Keywords
- Amazon Comprehend
- Combining sentiment analysis tools
- Google NLP
- IBM Watson
- Natural language processing
- Sentiment 140
- Sentiment analysis
- Sentiment tools
- Stanford CoreNLP
- Structural equation modeling
- TextBlob
- Text processing
- VADER