Skip to content

isamplesorg/content-classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

content-classification

Training, evaluation code of classifiers that were created for the purpose of metadata enhancement.

How to use

Setup

Download the dataset, and the checkpoint of the finetuned model based on the schema.

Dataset Finetuned Model Dataset
SESAR Model Dataset
OPENCONTEXT: Material Model Dataset
OPENCONTEXT: Sample Model Dataset

Place the downloaded data in the data folder, and the model checkpoint in the model folder.



Finetuning the model

  1. If needed, conduct hyperparameter search by running hyperparameter_search.py file
python OPENCONTEXT/hyperparameter_search.py --nb_epochs 3 --batch_size 32 --lr_rate 3e-5 --label_type "material" --train_mode "custom" --pretrained_model PRETRAINED_MODEL --output_dir $OUTPUT_DIR 
  1. Finetune the model using the train.py file Running this file expects the following arguments:
  • --nb_epochs : number of epochs
  • --batch_size : training batch size
  • --lr_rate : value of learning rate
  • --train_mode: whether or not to account for the class imbalance
  • --output_dir : the directory where the results, finetuned model checkpoint will exist
  • --label_type ( if OPENCONTEXT dataset is used for training) : The type of label (material / sample)

For example, to train a model that classifies the material type of the OPENCONTEXT dataset
python3 OPENCONTEXT/train.py --nb_epochs 3 --batch_size 32 --lr_rate 3e-5 --label_type "material" --train_mode "custom" --output_dir $OUTPUT_DIR 

Evaluate the finetuned model

  1. Load your model and use it for evaluation using eval.py file
  • --batch_size : training batch size
  • --train_mode: whether or not to account for the class imbalance
  • --label_type ( if OPENCONTEXT dataset is used for training) : The type of label (material / sample)
  • --model_dir : the directory that stores the checkpoint of the finetuned model. This would be the model folder where the downloaded model exists.
  • --output_dir : the directory where the results, finetuned model checkpoint will exist

For example, to load a finetuned model that was trained on the OPENCONTEXT dataset and see its performance on the test dataset:

python3 OPENCONTEXT/eval.py  --batch_size 32 --label_type "material" --train_mode "custom" --model_dir OPENCONTEXT/model/material_checkpoint --output_dir $OUTPUT_DIR 

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published