Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study

Drahomira Herrmannova, Steven R. Young, Robert M. Patton, Christopher G. Stahl, Nicole C. Kleinstreuer, Mary S. Wolfe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Identifying and extracting data elements such as study descriptors from publication full texts is a critical yet manual and labor-intensive step required in a number of tasks. In this paper we address the question of identifying data elements in an unsupervised manner. Specifically, given a set of criteria describing specific study parameters, such as species, route of administration, and dosing regimen, we develop an unsupervised approach to identify text segments (sentences) relevant to the criteria. A binary classifier trained to identify publications that meet the criteria performs better when trained on the candidate sentences than when trained on sentences randomly picked from the text, supporting the intuition that our method is able to accurately identify study descriptors.
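The abstract does not say how candidate sentences are scored, so the following is only a minimal sketch of the general idea: rank the sentences of a publication by their similarity to a textual description of the criteria. The bag-of-words cosine similarity, the function names (`tokenize`, `cosine`, `rank_sentences`), and the example criterion and sentences are all assumptions made for illustration, not the authors' method or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(criterion, sentences, top_k=2):
    """Return up to top_k sentences most similar to the criterion text.

    Assumption: plain term overlap stands in for whatever relevance
    measure the paper actually uses. Sentences with zero overlap are dropped.
    """
    query = Counter(tokenize(criterion))
    scored = [(cosine(query, Counter(tokenize(s))), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]


# Hypothetical criterion description and sentences, invented for this sketch.
criterion = ("species rat mouse route of administration oral "
             "dosing regimen dose daily")
sentences = [
    "Female rats received the compound by oral gavage at a daily dose of 10 mg/kg.",
    "We thank the anonymous reviewers for their comments.",
    "The route of administration was subcutaneous injection.",
]
candidates = rank_sentences(criterion, sentences, top_k=2)
```

No labeled sentences are needed for this kind of ranking, which is the sense in which such an approach is unsupervised. The selected candidates could then be passed to a downstream binary classifier and compared against randomly sampled sentences, as in the evaluation the abstract describes.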

Original language: English
Title of host publication: EMNLP 2018 - 9th International Workshop on Health Text Mining and Information Analysis, LOUHI 2018 - Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 71-82
Number of pages: 12
ISBN (Electronic): 9781948087742
State: Published - 2018
Event: 9th International Workshop on Health Text Mining and Information Analysis, LOUHI 2018, co-located with EMNLP 2018 - Brussels, Belgium
Duration: Oct 31 2018 → …

Publication series

Name: EMNLP 2018 - 9th International Workshop on Health Text Mining and Information Analysis, LOUHI 2018 - Proceedings of the Workshop

Conference

Conference: 9th International Workshop on Health Text Mining and Information Analysis, LOUHI 2018, co-located with EMNLP 2018
Country/Territory: Belgium
City: Brussels
Period: 10/31/18 → …

Bibliographical note

Publisher Copyright:
© 2018 Association for Computational Linguistics.

Funding

Support for this research was provided by a grant from the National Institute of Environmental Health Sciences (AES 16002-001), National Institutes of Health to Oak Ridge National Laboratory. This research was supported in part by an appointment to the Oak Ridge National Laboratory ASTRO Program, sponsored by the U.S. Department of Energy and administered by the Oak Ridge Institute for Science and Education. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

Funders and funder numbers
U.S. Department of Energy
National Institute of Environmental Health Sciences: AES 16002-001
Oak Ridge Institute for Science and Education
