Classification of Hate, Offensive and Profane content from Tweets using an Ensemble of Deep Contextualized and Domain Specific Representations

Basavraj Chinagundi, Muskaan Singh, Tirthankar Ghosal, Prashant Singh Rana, Guneet Singh Kohli

Research output: Contribution to journal › Conference article › peer-review

Abstract

The explosive growth of social media has also led to the unfortunate emergence of hate, offensive, and profane content on the web. A conversational thread can contain hate, offensive, or profane content that is not apparent from a standalone tweet or reply but can be identified given the context of the parent content. Such content spreads in many languages, including code-mixed languages like Hinglish (English code-mixed with Hindi). Social media sites therefore bear a major responsibility to identify hate content before it reaches the general population, where it may trigger havoc. The Hate Speech and Offensive Content Identification (HASOC) track [1] at FIRE 2021 provides a forum and a data challenge for multilingual research on identifying such problematic content. In this paper, we describe our submission to English Subtask A of this track. Our proposed approach uses a transformer-based embedding with HateBERT and achieves a Macro F1 score of 79% on the test data, which is 3.96% behind the best-performing system. We make our system run available at https://github.com/basavraj-chinagundi/HASOC_2021.
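For readers unfamiliar with the Macro F1 metric reported above, the following is a minimal sketch of how it is computed: the F1 score is calculated per class and then averaged with equal weight per class. This is not the authors' evaluation code. The function name `macro_f1`, the labels `HOF` and `NOT`, and the toy predictions are illustrative assumptions.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data; the binary labels HOF / NOT are assumed for illustration only.
y_true = ["HOF", "HOF", "NOT", "NOT", "HOF"]
y_pred = ["HOF", "NOT", "NOT", "NOT", "HOF"]
score = macro_f1(y_true, y_pred)
print(f"Macro F1: {score:.2f}")
```

Because every class contributes equally to the average, Macro F1 penalizes a system that performs well only on the majority class, which is why it is a common choice for imbalanced classification tasks.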

Original language: English
Pages (from-to): 491-500
Number of pages: 10
Journal: CEUR Workshop Proceedings
Volume: 3159
State: Published - 2021
Externally published: Yes
Event: Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 - Gandhinagar, India
Duration: Dec 13 2021 - Dec 17 2021

Keywords

  • HateBERT
  • Profane Content
  • Text Classification
  • Hate Speech
