A Mixed-Method Design Approach for Empirically Based Selection of Unbiased Data Annotators

Gautam Thakur, Janna Caspersen, Drahomira Herrmannova, Bryan Eaton, Jordan Burdette

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Implicit bias embedded in the annotated data is by far the greatest impediment in the effectual use of supervised machine learning models in tasks involving race, ethics, and geopolitical polarization. For societal good and demonstrable positive impact on wider society, it is paramount to carefully select data annotators and rigorously validate the annotation process. Current approaches to selecting annotators are not sufficiently grounded in scientific principles and are limited at the policy-guidance level, thereby rendering them unusable for machine learning practitioners. This work proposes a new approach based on the mixed-methods design that is functional, adaptable, and simpler to implement in selecting unbiased annotators for any machine learning problem. By demonstrating it on a real-world geopolitical problem, we also identified and ranked key inane profile characteristics towards an empirically-based selection of unbiased data annotators.

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: ACL-IJCNLP 2021
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Publisher: Association for Computational Linguistics (ACL)
Pages: 1930-1938
Number of pages: 9
ISBN (Electronic): 9781954085541
State: Published - 2021
Event: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 - Virtual, Online
Duration: Aug 1 2021 to Aug 6 2021

Publication series

Name: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Conference

Conference: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
City: Virtual, Online
Period: 08/1/21 to 08/6/21

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics

Funding

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors are grateful to RJ Moquito of National Geospatial Intelligence Agency for their support and guidance. The authors also extend their thanks to Budhendra “Budhu” Bhaduri, Amy Rose, Marie Urban, Supriya Chinthavali for allocating necessary resources to complete the research work.

Funders
DOE Public Access Plan
Moquito of National Geospatial Intelligence Agency
United States Government
U.S. Department of Energy