MathBERT

MathBERT is a BERT model pre-trained on the following mathematics texts:

  • pre-k to high school math curriculum from engageny.org
  • G6-8 math curriculum from utahmiddleschoolmath.org
  • G6-high school math from illustrativemathematics.org
  • high school to college math textbooks from openculture.com
  • G6-8 math curriculum from ck12.org
  • college to graduate level MOOC math course syllabi from classcentral.com
  • math paper abstracts from arxiv.org

MathBERT has its own vocabulary (mathVocab), built via BertTokenizer to best match the training corpus. We also trained MathBERT with the original BERT vocabulary (baseVocab) for comparison. Both models are uncased.
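
For context on how a corpus-specific vocabulary such as mathVocab can be produced, the sketch below trains an uncased WordPiece vocabulary with the Hugging Face tokenizers library. It is illustrative only: the corpus file name and vocabulary size are assumptions, not the exact procedure or settings used to build mathVocab.

from tokenizers import BertWordPieceTokenizer

# Train an uncased WordPiece vocabulary on an (assumed) pre-training corpus file.
tok = BertWordPieceTokenizer(lowercase=True)
tok.train(files=["math_corpus.txt"], vocab_size=30522, min_frequency=2)

# Writes vocab.txt, which BertTokenizer can load directly.
tok.save_model("mathvocab")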

Downloading Trained Models

We release both the TensorFlow and the PyTorch versions of the trained models. The TensorFlow version is compatible with code that works with the original BERT release from Google Research. The PyTorch version is created with the Hugging Face Transformers library.

  • TensorFlow download
    • note: to download the mathbert-mathvocab version, change the model name to mathbert-mathvocab-uncased in the commands below
    wget http://tracy-nlp-models.s3.amazonaws.com/mathbert-basevocab-uncased/bert_config.json
    wget http://tracy-nlp-models.s3.amazonaws.com/mathbert-basevocab-uncased/vocab.txt
    wget http://tracy-nlp-models.s3.amazonaws.com/mathbert-basevocab-uncased/bert_model.ckpt.index
    wget http://tracy-nlp-models.s3.amazonaws.com/mathbert-basevocab-uncased/bert_model.ckpt.meta
    wget http://tracy-nlp-models.s3.amazonaws.com/mathbert-basevocab-uncased/bert_model.ckpt.data-00000-of-00001
    
  • PyTorch download (a short usage sketch follows after this list)
from transformers import AutoTokenizer, AutoModel

# MathBERT with the original BERT vocabulary (baseVocab)
tokenizer = AutoTokenizer.from_pretrained('tbs17/MathBERT')
model = AutoModel.from_pretrained('tbs17/MathBERT')

# MathBERT with the custom math vocabulary (mathVocab)
tokenizer = AutoTokenizer.from_pretrained('tbs17/MathBERT-custom')
model = AutoModel.from_pretrained('tbs17/MathBERT-custom')
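
As a quick usage sketch (not part of the original repository), the PyTorch models above can be used to embed a math sentence; the example sentence below is only illustrative.

from transformers import AutoTokenizer, AutoModel
import torch

# Load the baseVocab PyTorch model and tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('tbs17/MathBERT')
model = AutoModel.from_pretrained('tbs17/MathBERT')

# Example sentence (illustrative only).
text = "Students compute the slope of a line from two points."
inputs = tokenizer(text, return_tensors='pt')

# Run the encoder without gradients and take the [CLS] token embedding.
with torch.no_grad():
    outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0, :]  # shape: (batch_size, hidden_size)
print(cls_embedding.shape)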

Pretraining and fine-tuning

The pretraining code is located at /mathbert/ and the fine-tuning notebook is at /scripts/MathBERT_finetune.ipynb. Unfortunately, we can't release the fine-tuning data set per the data owner's request. All the packages we use are listed in the requirements.txt file.
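
Since the fine-tuning data cannot be released, below is a minimal, hedged sketch of how MathBERT could be fine-tuned for a downstream text-classification task with the Hugging Face Trainer API. The example texts, labels, number of classes, and hyperparameters are placeholders, not the settings used in /scripts/MathBERT_finetune.ipynb.

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
import torch

# Placeholder data; replace with your own labeled math text.
texts = ["Solve 2x + 3 = 7 for x.", "Graph the line y = 2x - 1."]
labels = [0, 1]

tokenizer = AutoTokenizer.from_pretrained('tbs17/MathBERT')
model = AutoModelForSequenceClassification.from_pretrained('tbs17/MathBERT', num_labels=2)

class MathDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(output_dir='mathbert-finetuned', num_train_epochs=3,
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args, train_dataset=MathDataset(texts, labels))
trainer.train()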
