Downstream datasets

embedding edited this page Oct 17, 2022 · 7 revisions

CLUE benchmark

CLUE is the Chinese Language Understanding Evaluation benchmark. It contains classification, named entity recognition, and machine reading comprehension tasks, and its datasets are distributed in JSON format. For the classification and named entity recognition datasets, we convert the JSON files to TSV so that UER can load them directly. For machine reading comprehension, the original format is retained and the dataset pre-processing is included in the project.
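The JSON-to-TSV conversion described above can be sketched as follows. This is a minimal illustration, not the script used to produce the files linked below. It assumes the input has one JSON object per line, and the names `jsonl_to_tsv`, `field_map`, the input keys (`label`, `sentence`) and the output columns (`label`, `text_a`) are assumptions chosen for the example.

```python
import json


def clean(value):
    """Collapse tabs, newlines and repeated spaces so a value fits in one TSV cell."""
    return " ".join(str(value).split())


def jsonl_to_tsv(jsonl_text, field_map):
    """Convert JSON-lines text to TSV text with a header row.

    field_map maps each output column name to the JSON key it is read from,
    for example {"label": "label", "text_a": "sentence"} (assumed names).
    """
    columns = list(field_map)
    rows = ["\t".join(columns)]  # header row
    for line in jsonl_text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        rows.append("\t".join(clean(record[field_map[c]]) for c in columns))
    return "\n".join(rows) + "\n"
```

Calling `jsonl_to_tsv(text, {"label": "label", "text_a": "sentence"})` returns a string that can be written to a `.tsv` file. Sentence-pair tasks would need a second text column in `field_map`.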

Classification:

Dataset Link or path in the project
TNEWS https://share.weiyun.com/maExfIeO
CSL https://share.weiyun.com/LftIGlIT
CMNLI https://share.weiyun.com/hn3kTeKm
OCNLI https://share.weiyun.com/3DlKxB3q
AFQMC https://share.weiyun.com/CdlEKMON
IFLYTEK https://share.weiyun.com/ldiLjnZJ
CLUEWSC2020 https://share.weiyun.com/RLL1ShBi

Machine reading comprehension:

Dataset Link or path in the project
CMRC2018 https://share.weiyun.com/KwAbnX60
C3 https://share.weiyun.com/JDpgczdp
ChID https://share.weiyun.com/8KJE3NOz

Named entity recognition:

Dataset Link or path in the project
CLUENER2020 https://share.weiyun.com/smSMtLkn

Baidu ERNIE

The first version of ERNIE provides 5 Chinese datasets and uses them to evaluate ERNIE's performance.

Dataset Link or path in the project
ChnSentiCorp https://share.weiyun.com/BRujeOQT
LCQMC https://share.weiyun.com/5Fmf2SZ
XNLI https://share.weiyun.com/mcd8EApl
MSRA-NER https://share.weiyun.com/ua1Z5w2r
NLPCC-DBQA https://share.weiyun.com/5HJMbih

Competition datasets

Dataset Link or path in the project
SMP2020-EWECT https://share.weiyun.com/uFGEhrWp
SMP2019-ECISA https://share.weiyun.com/MgHL8QSI
CCF-BDCI2021-Corrupted_Short_Message_Reconstruction https://share.weiyun.com/xHr6OkQw

GLUE benchmark

GLUE is the General Language Understanding Evaluation benchmark for English. It contains classification and regression tasks. We convert the datasets in GLUE to TSV format so that UER can load them directly.

Dataset Link or path in the project
CoLA https://share.weiyun.com/n5kPUmsr
SST-2 https://share.weiyun.com/48noHt6Y
MRPC https://share.weiyun.com/7nXAjpYo
STS-B https://share.weiyun.com/hMuQmwMx
QQP https://share.weiyun.com/1k6IGbfj
MNLI https://share.weiyun.com/9QbFtF02
QNLI https://share.weiyun.com/J7LQKCYY
RTE https://share.weiyun.com/EnGVoElX
WNLI https://share.weiyun.com/752vzwjP
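To show what loading a TSV file with a header row can look like, here is a minimal sketch. It is not UER's actual loader. The function name `read_tsv`, the column names (`label`, `text_a`, `text_b`) and the `regression` flag are assumptions made for the example.

```python
def read_tsv(tsv_text, regression=False):
    """Parse TSV text whose first line is a header into a list of dicts.

    If a "label" column exists (assumed name), it is converted to float
    for regression tasks and to int for classification tasks.
    """
    lines = [line for line in tsv_text.split("\n") if line.strip()]
    header = lines[0].split("\t")
    examples = []
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError("row has %d fields, expected %d" % (len(fields), len(header)))
        example = dict(zip(header, fields))
        if "label" in example:
            example["label"] = float(example["label"]) if regression else int(example["label"])
        examples.append(example)
    return examples
```

Looking columns up by header name rather than by position keeps the same function usable for single-sentence and sentence-pair files, as long as the header names match.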