TY - GEN
T1 - A Mixed-Method Design Approach for Empirically Based Selection of Unbiased Data Annotators
AU - Thakur, Gautam
AU - Caspersen, Janna
AU - Herrmannova, Drahomira
AU - Eaton, Bryan
AU - Burdette, Jordan
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
N2 - Implicit bias embedded in the annotated data is by far the greatest impediment in the effectual use of supervised machine learning models in tasks involving race, ethics, and geopolitical polarization. For societal good and demonstrable positive impact on wider society, it is paramount to carefully select data annotators and rigorously validate the annotation process. Current approaches to selecting annotators are not sufficiently grounded in scientific principles and are limited at the policy-guidance level, thereby rendering them unusable for machine learning practitioners. This work proposes a new approach based on the mixed-methods design that is functional, adaptable, and simpler to implement in selecting unbiased annotators for any machine learning problem. By demonstrating it on a real-world geopolitical problem, we also identified and ranked key inane profile characteristics towards an empirically-based selection of unbiased data annotators.
AB - Implicit bias embedded in the annotated data is by far the greatest impediment in the effectual use of supervised machine learning models in tasks involving race, ethics, and geopolitical polarization. For societal good and demonstrable positive impact on wider society, it is paramount to carefully select data annotators and rigorously validate the annotation process. Current approaches to selecting annotators are not sufficiently grounded in scientific principles and are limited at the policy-guidance level, thereby rendering them unusable for machine learning practitioners. This work proposes a new approach based on the mixed-methods design that is functional, adaptable, and simpler to implement in selecting unbiased annotators for any machine learning problem. By demonstrating it on a real-world geopolitical problem, we also identified and ranked key inane profile characteristics towards an empirically-based selection of unbiased data annotators.
UR - http://www.scopus.com/inward/record.url?scp=85123956422&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85123956422
T3 - Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
SP - 1930
EP - 1938
BT - Findings of the Association for Computational Linguistics
A2 - Zong, Chengqing
A2 - Xia, Fei
A2 - Li, Wenjie
A2 - Navigli, Roberto
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Y2 - 1 August 2021 through 6 August 2021
ER -