This project compares self-supervised learning methods on downstream tasks for the Medieval Handwriting in the Latin Script (ICDAR CLaMM) dataset. Self-supervised learning has shown promise in many computer vision and natural language processing applications, but its effectiveness on historical scripts has not been extensively explored.
Three self-supervised learning methods are compared in this work:

- A Simple Framework for Contrastive Learning of Visual Representations (SimCLR); its contrastive objective is sketched after this list
- Masked Autoencoders Are Scalable Vision Learners (MAE)
- Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL)
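As an illustration of the best-performing method, the sketch below shows a generic NT-Xent (normalized temperature-scaled cross-entropy) loss, the contrastive objective behind SimCLR. It is a minimal sketch, not the implementation used in this repository; the function name, tensor shapes, and temperature value are illustrative.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Generic NT-Xent loss for two batches of projections, each of shape (N, D)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # The positive for row i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```

In SimCLR, `z1` and `z2` would be the projection-head outputs of two random augmentations of the same batch of manuscript crops.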
The performance evaluation was conducted on a single downstream task: script type classification. The results indicate that SimCLR outperforms the other methods on this task for the Medieval Handwriting in the Latin Script dataset. The study also provides insights into the factors influencing the performance of self-supervised learning methods in this context, including the choice of pre-training data and the size of the pre-training dataset. In conclusion, it demonstrates the potential of self-supervised learning for historical handwritten document classification and highlights the importance of selecting a suitable method for a given downstream task.
The ICDAR CLaMM Challenge dataset is used for this project. The dataset can be found here.
API Documentation is available at DOCUMENTATION.md
```bash
pip install -r requirements.txt
```
```bash
cd src/
python train.py +experiment=simclr_bolts
```
```bash
cd src/
python evaluate.py +experiment=simclr_eval
```
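The `+experiment=...` override syntax used above appears to follow Hydra-style configuration composition. The sketch below shows how such an entry point is typically wired; the config path and group names are assumptions for illustration, not this repository's actual layout.

```python
# Sketch of a Hydra-style entry point consistent with the `+experiment=...`
# overrides used above. Config path and names are illustrative assumptions.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="configs", config_name="train")
def main(cfg: DictConfig) -> None:
    # `+experiment=simclr_bolts` would merge configs/experiment/simclr_bolts.yaml
    # into the base config before this function runs.
    print(OmegaConf.to_yaml(cfg))
    # ... build the datamodule, model, and trainer from cfg here ...


if __name__ == "__main__":
    main()
```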
Check the notebook here.
| Model | Pre-training epochs | Pre-training batch size | Linear evaluation epochs | Top-1 accuracy |
| --- | --- | --- | --- | --- |
| SimCLR | 500 | 256 | 100 | 71.8 % |
| MAE | 500 | 256 | 100 | 36.1 % |
| BYOL | 500 | 64 | 100 | 45.2 % |
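Linear evaluation in the table means the pre-trained encoder is frozen and only a linear classifier is trained on the script-type labels. A minimal sketch of that protocol follows; the encoder, data loader, and hyperparameter names are placeholders, not the actual classes used in this repository.

```python
import torch
import torch.nn as nn


def linear_evaluation(encoder: nn.Module, feature_dim: int, num_classes: int,
                      train_loader, epochs: int = 100, lr: float = 1e-3) -> nn.Linear:
    """Train a linear classifier on top of a frozen, pre-trained encoder."""
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False                     # freeze the pre-trained backbone

    head = nn.Linear(feature_dim, num_classes)      # the only trainable module
    optimizer = torch.optim.Adam(head.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            with torch.no_grad():
                features = encoder(images)          # frozen representations
            loss = criterion(head(features), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return head
```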
Image sources: ICDAR CLaMM