Abstract
Implicit bias embedded in annotated data is by far the greatest impediment to the effective use of supervised machine learning models in tasks involving race, ethics, and geopolitical polarization. For societal good and a demonstrable positive impact on wider society, it is paramount to select data annotators carefully and to validate the annotation process rigorously. Current approaches to selecting annotators are not sufficiently grounded in scientific principles and are limited to policy-level guidance, which renders them unusable for machine learning practitioners. This work proposes a new approach, based on a mixed-methods design, that is functional, adaptable, and simple to implement for selecting unbiased annotators for any machine learning problem. By demonstrating it on a real-world geopolitical problem, we also identify and rank key innate profile characteristics, a step towards an empirically based selection of unbiased data annotators.
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics |
Subtitle of host publication | ACL-IJCNLP 2021 |
Editors | Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1930-1938 |
Number of pages | 9 |
ISBN (Electronic) | 9781954085541 |
State | Published - 2021 |
Event | Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 - Virtual, Online; Duration: Aug 1 2021 → Aug 6 2021 |
Publication series
Name | Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 |
---|---|
Conference
Conference | Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 |
---|---|
City | Virtual, Online |
Period | 08/1/21 → 08/6/21 |
Bibliographical note
Publisher Copyright: © 2021 Association for Computational Linguistics
Funding
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors are grateful to RJ Moquito of National Geospatial Intelligence Agency for their support and guidance. The authors also extend their thanks to Budhendra “Budhu” Bhaduri, Amy Rose, Marie Urban, Supriya Chinthavali for allocating necessary resources to complete the research work.
Funders | Funder number |
---|---|
DOE Public Access Plan | |
Moquito of National Geospatial Intelligence Agency | |
United States Government | |
U.S. Department of Energy | DE-AC05-00OR22725 |