KLUE: Korean Language Understanding Evaluation

  • Sungjoon Park
  • Jihyung Moon
  • Sungdong Kim
  • Won Ik Cho
  • Jiyoon Han
  • Jangwon Park
  • Chisung Song
  • Junseong Kim
  • Youngsook Song
  • Taehwan Oh
  • Joohong Lee
  • Juhyun Oh
  • Sungwon Lyu
  • Younghoon Jeong
  • Inkwon Lee
  • Sangwoo Seo
  • Dongjun Lee
  • Hyunwoo Kim
  • Myeonghwa Lee
  • Seongbo Jang
  • Seungwon Do
  • Sunkyoung Kim
  • Kyungtae Lim
  • Jongwon Lee
  • Kyumin Park
  • Jamin Shin
  • Seonghyun Kim
  • Lucy Park
  • Alice Oh
  • Jung Woo Ha
  • Kyunghyun Cho

Research output: Contribution to journal › Conference article › peer-review

96 Scopus citations

Abstract

We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of eight Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We create all of the datasets from scratch in a principled way. We design the tasks to have diverse formats, and we build each task upon various source corpora that respect copyrights. We also propose suitable evaluation metrics and organize annotation protocols to ensure quality. To prevent ethical risks in KLUE, we proactively remove examples that reflect social biases or contain toxic content or personally identifiable information (PII). Along with the benchmark datasets, we release pretrained language models (PLMs) for Korean, KLUE-BERT and KLUE-RoBERTa, and find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. The fine-tuning recipes are publicly available so that anyone can reproduce our baseline results. We believe our work will facilitate future research on cross-lingual as well as Korean language models, and the creation of similar resources for other languages.

Original language: English
Journal: Advances in Neural Information Processing Systems
State: Published - 2021
Externally published: Yes
Event: 35th Conference on Neural Information Processing Systems - Track on Datasets and Benchmarks, NeurIPS Datasets and Benchmarks 2021 - Virtual, Online
Duration: Dec 6 2021 to Dec 14 2021

Funding

Data annotation costs were covered by Upstage, NAVER CLOVA, Scatter Lab, SelectStar, Riiid!, DeepNatural and KAIST. The leaderboard is built and supported by Upstage. GPU cloud computing was provided by NAVER CLOVA NSML [64], Google TensorFlow Research Cloud (TFRC), and Kakao Enterprise BrainCloud; these three computing resources were used to pretrain and fine-tune the language models. News articles for the MRC datasets were provided by the Korea Economy Daily and Acrofan. The authors thank Cheoneum Park for discussions about task selection and the DP task, Jinhyuk Lee and Minjoon Seo for discussions on the MRC task, Sujeong Kim and DongYeon Kim for their considerable efforts in managing the annotation of the MRC dataset, and Sangah Park for careful consideration of data construction in DP, NER, and RE. We thank Junyeop Lee, Geonhee Lee, Jiho Lee, Daehyun Nam, and Yongjin Cho for the leaderboard and the evaluation system. This study was reviewed and approved by the KAIST Institutional Review Board (#KH2020-173).
