Abstract
Deciphering, extracting, and compiling information from texts dense with domain-specific terminology and technical jargon is challenging. It demands considerable expertise and deep knowledge of the field, which makes it labor-intensive when done by humans. Furthermore, identifying multiple class labels in long texts is subject to intra- and inter-reader variability, which makes the process time-consuming and costly. We introduce a user-friendly graphical interface backed by a BERT-based decision support system. The system aims to increase efficiency, reduce data collection time, and maintain high precision in data acquisition. It helps readers decipher and synthesize intricate texts that use a wide range of expressions, even within similar mitigation categories. Such tasks traditionally demand substantial human effort and specialized domain knowledge. Our system is specifically engineered to extract environmental mitigation information from licenses issued by the Federal Energy Regulatory Commission (FERC), in support of sustainable hydropower development. These license documents are comprehensive: each contains over 15,000 words and requires the identification of 135 different class labels. We anticipate that our system will boost reading speed, improve the consistency of classification outputs among readers, and contribute to the development of a robust scientific database of environmental mitigations associated with the 2,000+ non-federal hydropower facilities licensed by FERC in the United States.
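The abstract does not say how licenses of more than 15,000 words are passed to the model. Standard BERT encoders accept at most 512 tokens per input, so one common approach is to score overlapping windows of the text and merge the results. The sketch below illustrates that idea only and should not be read as the authors' pipeline: `score_window` is a hypothetical stand-in for a fine-tuned multi-label classifier, and the window size, stride, threshold, and label names are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List


def split_into_windows(words: List[str], window_size: int = 400,
                       stride: int = 300) -> List[List[str]]:
    """Split a word list into overlapping windows (assumed sizes, not from the paper)."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if not words:
        return []
    windows = []
    start = 0
    while True:
        windows.append(words[start:start + window_size])
        if start + window_size >= len(words):
            break  # the last window already reaches the end of the document
        start += stride
    return windows


def aggregate_labels(window_scores: List[Dict[str, float]],
                     threshold: float = 0.5) -> Dict[str, float]:
    """Keep each label's maximum score across windows, then apply a threshold."""
    best: Dict[str, float] = {}
    for scores in window_scores:
        for label, score in scores.items():
            if score > best.get(label, 0.0):
                best[label] = score
    return {label: score for label, score in best.items() if score >= threshold}


def suggest_labels(text: str,
                   score_window: Callable[[str], Dict[str, float]],
                   window_size: int = 400, stride: int = 300,
                   threshold: float = 0.5) -> Dict[str, float]:
    """Suggest document-level labels for a reader to confirm or reject.

    score_window is a hypothetical placeholder for a model that returns one
    independent probability per class label for a short passage.
    """
    windows = split_into_windows(text.split(), window_size, stride)
    scores = [score_window(" ".join(window)) for window in windows]
    return aggregate_labels(scores, threshold)
```

Taking the maximum over windows means a label is suggested if any single passage supports it, which suits a decision support setting where a human reader makes the final call. Averaging over windows would be a more conservative alternative.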
Original language | English |
---|---|
Title of host publication | Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024 |
Editors | Wei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 4265-4268 |
Number of pages | 4 |
ISBN (Electronic) | 9798350362480 |
State | Published - 2024 |
Event | 2024 IEEE International Conference on Big Data, BigData 2024 - Washington, United States. Duration: Dec 15 2024 → Dec 18 2024 |
Publication series
Name | Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024 |
---|---|
Conference
Conference | 2024 IEEE International Conference on Big Data, BigData 2024 |
---|---|
Country/Territory | United States |
City | Washington |
Period | 12/15/24 → 12/18/24 |
Funding
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- BERT
- Decision support system
- Environmental mitigation
- FERC licences
- Information extraction
- Large Language Models
- Natural Language Processing
- US hydropower