Downstream datasets
CLUE (Chinese Language Understanding Evaluation) is a Chinese benchmark that contains classification, named entity recognition, and machine reading comprehension tasks. The CLUE datasets are distributed in JSON format. We convert the classification and named entity recognition datasets from JSON to TSV so that UER can load them directly. The machine reading comprehension datasets keep their original format, and the code that pre-processes them is included in the project.
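The JSON-to-TSV step for classification data can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's own conversion script: it assumes JSON-lines input (one object per line), assumes UER reads a tab-separated file whose header row names the columns `label`, `text_a` and, for sentence pairs, `text_b`, and assumes raw labels should be mapped to contiguous integer ids. The function names `clean` and `json_lines_to_tsv` are hypothetical, and the field names passed in (such as `sentence`) must be checked against each dataset.

```python
import io
import json
from typing import Dict, Iterable, Optional, TextIO


def clean(text: str) -> str:
    # Collapse every whitespace run (including tabs and newlines) into one space,
    # so a sentence can never break the TSV row layout.
    return " ".join(text.split())


def json_lines_to_tsv(
    lines: Iterable[str],
    out: TextIO,
    text_a_key: str,
    text_b_key: Optional[str] = None,
    label_key: str = "label",
    label_map: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Hypothetical helper: write JSON-lines records as a TSV file with a header row.

    Assumed output layout: label, text_a[, text_b]. Returns the raw-label -> id map,
    which should be passed back in when converting dev/test splits so ids agree.
    """
    if label_map is None:
        label_map = {}
    header = ["label", "text_a"] + (["text_b"] if text_b_key else [])
    out.write("\t".join(header) + "\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        raw_label = str(record[label_key])
        # Assign ids 0, 1, 2, ... in order of first appearance.
        if raw_label not in label_map:
            label_map[raw_label] = len(label_map)
        row = [str(label_map[raw_label]), clean(record[text_a_key])]
        if text_b_key:
            row.append(clean(record[text_b_key]))
        out.write("\t".join(row) + "\n")
    return label_map


if __name__ == "__main__":
    # Illustrative records only; the field name "sentence" is an assumption.
    sample = [
        '{"label": "102", "sentence": "first example"}',
        '{"label": "103", "sentence": "second example"}',
    ]
    buf = io.StringIO()
    mapping = json_lines_to_tsv(sample, buf, text_a_key="sentence")
    print(mapping)
    print(buf.getvalue(), end="")
```

Because ids are assigned in order of first appearance, the map returned for the training split is meant to be reused for the dev and test splits; a fixed, sorted label list would be an equally reasonable design.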
Classification:
Dataset | Link or path in the project |
---|---|
TNEWS | https://share.weiyun.com/maExfIeO |
CSL | https://share.weiyun.com/LftIGlIT |
CMNLI | https://share.weiyun.com/hn3kTeKm |
OCNLI | https://share.weiyun.com/3DlKxB3q |
AFQMC | https://share.weiyun.com/CdlEKMON |
IFLYTEK | https://share.weiyun.com/ldiLjnZJ |
CLUEWSC2020 | https://share.weiyun.com/RLL1ShBi |
Machine reading comprehension:
Dataset | Link or path in the project |
---|---|
CMRC2018 | https://share.weiyun.com/KwAbnX60 |
C3 | https://share.weiyun.com/JDpgczdp |
ChID | https://share.weiyun.com/8KJE3NOz |
Named entity recognition:
Dataset | Link or path in the project |
---|---|
CLUENER2020 | https://share.weiyun.com/smSMtLkn |
ERNIE released five Chinese datasets with its first version and used them to evaluate ERNIE's performance.
Dataset | Link or path in the project |
---|---|
ChnSentiCorp | https://share.weiyun.com/BRujeOQT |
LCQMC | https://share.weiyun.com/5Fmf2SZ |
XNLI | https://share.weiyun.com/mcd8EApl |
MSRA-NER | https://share.weiyun.com/ua1Z5w2r |
NLPCC-DBQA | https://share.weiyun.com/5HJMbih |
Other datasets:
Dataset | Link or path in the project |
---|---|
SMP2020-EWECT | https://share.weiyun.com/uFGEhrWp |
SMP2019-ECISA | https://share.weiyun.com/MgHL8QSI |
CCF-BDCI2021-Corrupted_Short_Message_Reconstruction | https://share.weiyun.com/xHr6OkQw |
GLUE (General Language Understanding Evaluation) is an English benchmark that contains classification and regression tasks. We convert the GLUE datasets to TSV format so that UER can load them directly.
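The original GLUE files are already tab-separated, so conversion is mostly a matter of selecting and renaming columns. The sketch below illustrates that idea; it is not the project's script. It assumes the target header is `label`, `text_a` and optionally `text_b`, and the function name `remap_tsv_columns` and the example column names are hypothetical, so check them against each dataset's actual header (regression labels, as in STS-B, are simply copied through as text).

```python
import csv
import io
from typing import Dict, TextIO


def remap_tsv_columns(src: TextIO, dst: TextIO, column_map: Dict[str, str]) -> int:
    """Hypothetical helper: copy a TSV file, keeping and renaming selected columns.

    column_map maps each target column name (e.g. "text_a") to the source column
    that holds it (e.g. "sentence"). Targets are written in the map's order.
    Returns the number of data rows written.
    """
    # QUOTE_NONE treats quote characters as ordinary text, so a sentence with an
    # unbalanced quote does not swallow the following fields or lines.
    reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    targets = list(column_map)
    dst.write("\t".join(targets) + "\n")
    count = 0
    for row in reader:
        dst.write("\t".join(row[column_map[t]] for t in targets) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Illustrative single-sentence file with an assumed "sentence<TAB>label" header.
    src = io.StringIO("sentence\tlabel\na charming film\t1\n\"dull\t0\n")
    dst = io.StringIO()
    n = remap_tsv_columns(src, dst, {"label": "label", "text_a": "sentence"})
    print(n)
    print(dst.getvalue(), end="")
```

For a sentence-pair task the same call would take a third entry, for example `{"label": ..., "text_a": ..., "text_b": ...}`, with the source column names read from that dataset's header.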
Dataset | Link or path in the project |
---|---|
CoLA | https://share.weiyun.com/n5kPUmsr |
SST-2 | https://share.weiyun.com/48noHt6Y |
MRPC | https://share.weiyun.com/7nXAjpYo |
STS-B | https://share.weiyun.com/hMuQmwMx |
QQP | https://share.weiyun.com/1k6IGbfj |
MNLI | https://share.weiyun.com/9QbFtF02 |
QNLI | https://share.weiyun.com/J7LQKCYY |
RTE | https://share.weiyun.com/EnGVoElX |
WNLI | https://share.weiyun.com/752vzwjP |