
Segatron

License: MIT

This repo contains the code and pre-trained models for our paper:

Segatron: Segment-aware Transformer for Language Modeling and Understanding

He Bai, Peng Shi, Jimmy Lin, Yuqing Xie, Luchen Tan, Kun Xiong, Wen Gao, Ming Li

AAAI 2021

Setup

To use this repo, please install NVIDIA APEX. We recommend using this Docker image or building your own environment from NGC's PyTorch container nvcr.io/nvidia/pytorch:20.03-py3.
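Before launching anything, a quick sanity check can save time. The snippet below is a minimal sketch, not part of the repo; it only verifies that a CUDA-enabled PyTorch and an importable APEX are present:

```python
# Environment sanity check (illustrative; not repo-specific).
import torch

assert torch.cuda.is_available(), "Segatron training/evaluation expects a CUDA GPU"
try:
    from apex import amp  # mixed-precision utilities used by Megatron-style code
except ImportError as e:
    raise SystemExit("APEX not found; install it from https://github.com/NVIDIA/apex") from e
print(f"PyTorch {torch.__version__}: CUDA ok, APEX ok")
```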

Download Checkpoints

We have uploaded the following checkpoints to the Hugging Face model hub:
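Whichever checkpoint you pick, loading should follow the usual `from_pretrained` pattern. A minimal sketch, assuming the modified `transformers` package from this repo is on your path; the model id below is a hypothetical placeholder, so substitute a real id from the list:

```python
# Hedged loading sketch. The checkpoint id is a placeholder -- use an actual
# Segatron/SegaBERT id from the list above. This assumes this repo's modified
# `transformers` package, since the upstream library does not know about
# Segatron's extra position indices.
from transformers import BertTokenizer, BertForPreTraining

model_id = "your-org/segabert-large"  # hypothetical placeholder id
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertForPreTraining.from_pretrained(model_id)
```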

Pre-training

Evaluation

1. Wikitext-103


2. GLUE and Machine Reading Comprehension

  • The source code is in the transformers folder, which is based on Hugging Face's Transformers repository. Note that Segatron needs a paragraph position index, a sentence position index, and a token position index in its input features. We therefore changed the input feature extraction and model forward functions, so our code is not compatible with upstream Hugging Face Transformers. A sketch of the index layout follows this list.

  • Please refer to transformers/README.md for details.
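To make the three-index input format concrete, here is a minimal sketch of how segment-aware position ids could be laid out. It is illustrative only: the function name and the nested-list input convention are assumptions, not the repo's actual feature extractor.

```python
# Illustrative sketch of Segatron's segment-aware position indices.
# Every token carries three indices: which paragraph it is in, which sentence
# within that paragraph, and its token offset within the sentence.

def segment_position_ids(paragraphs):
    """paragraphs: list of paragraphs, each a list of sentences,
    each sentence a list of token ids (hypothetical input layout)."""
    para_ids, sent_ids, tok_ids = [], [], []
    for p, paragraph in enumerate(paragraphs):
        for s, sentence in enumerate(paragraph):
            for t, _ in enumerate(sentence):
                para_ids.append(p)
                sent_ids.append(s)
                tok_ids.append(t)
    return para_ids, sent_ids, tok_ids

# One paragraph containing two short sentences (token ids are dummies):
para_ids, sent_ids, tok_ids = segment_position_ids([[[101, 2023, 102], [2003, 102]]])
print(para_ids)  # [0, 0, 0, 0, 0]
print(sent_ids)  # [0, 0, 0, 1, 1]
print(tok_ids)   # [0, 1, 2, 0, 1]
```

Each token thus carries its paragraph index, its sentence index within the paragraph, and its token offset within the sentence, which is the input structure the modified forward functions consume.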


3. SST


Citation

Please cite the AAAI 2021 paper:

@inproceedings{bai2021segatron,
  title={Segatron: Segment-Aware Transformer for Language Modeling and Understanding},
  author={Bai, He and Shi, Peng and Lin, Jimmy and Xie, Yuqing and Tan, Luchen and Xiong, Kun and Gao, Wen and Li, Ming},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={14},
  pages={12526--12534},
  year={2021}
}