Abstract
This work presents the system proposed by team Innovators for SemEval 2022 Task 8: Multilingual News Article Similarity (Chen et al., 2022). Similar multilingual news articles should match irrespective of the style of writing, the language of conveyance, and the subjective decisions and biases induced by the medium or outlet. The proposed architecture includes a machine translation system that translates multilingual news articles into English, followed by a multitask learning model trained simultaneously on three distinct datasets. The system leverages the PageRank algorithm for long-form text alignment. The multitask learning approach trains multiple tasks simultaneously while sharing the same encoder, which facilitates knowledge transfer between tasks. Our best model ranked 16th with a Pearson score of 0.733. We make our code publicly accessible.
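For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Pearson score between gold and predicted similarity ratings can be computed. It is not taken from the authors' code; the function name `pearson` and the sample scores are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, each without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical similarity scores, for illustration only (not task data).
gold = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 1.9, 3.4, 3.8]
score = pearson(gold, predicted)
print(f"Pearson correlation: {score:.3f}")
```

A value of 1 means the predicted scores are a perfect increasing linear function of the gold scores, 0 means no linear relationship, and -1 a perfect decreasing one.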
Original language | English |
---|---|
Title of host publication | SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop |
Editors | Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1163-1170 |
Number of pages | 8 |
ISBN (Electronic) | 9781955917803 |
State | Published - 2022 |
Externally published | Yes |
Event | 16th International Workshop on Semantic Evaluation, SemEval 2022 - Seattle, United States; Duration: Jul 14 2022 → Jul 15 2022 |
Publication series
Name | SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop |
---|---|
Conference
Conference | 16th International Workshop on Semantic Evaluation, SemEval 2022 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 07/14/22 → 07/15/22 |
Funding
This work was supported by the European Union's Horizon 2020 research and innovation program under grant agreement No. 833635 (project ROXANNE: Real-time network, text, and speaker analytics for combating organized crime, 2019-2022).