trOCR base model 30% CER on IAM word dataset vs 4% for IAM line dataset, is this normal? #1653

slender9168 · 2024-11-12T11:07:11Z

Describe the bug
Model I am using: trocr-base-handwritten

Dataset:

IAM word dataset: https://www.kaggle.com/datasets/nibinv23/iam-handwriting-word-database
IAM line dataset: https://huggingface.co/datasets/Teklia/IAM-line

The problem arises when using:

my own modified scripts:
```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# Initialize processor and model
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten").to(device)

# Prepare image and move pixel values to the device
# (image_path is the path to a single word or line image from the dataset)
image = Image.open(image_path).convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Generate text
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A clear and concise description of what the bug is:
When running microsoft/trocr-base-handwritten against the IAM word dataset (single words), I get a CER of about 30%.
When running it against the IAM line dataset, the CER is about 4%.

Is this expected?
Can I train the model on single-word images to bring its CER on single words down to about 4%, or is it inherently bad on single words?
Is the 30% CER caused by the model having been trained on full lines rather than single words?

To Reproduce
Steps to reproduce the behavior:

Use the sample code above with microsoft/trocr-base-handwritten against the IAM word dataset; the CER will be around 30%.
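For reference, here is a minimal sketch of how a corpus-level CER can be computed. It assumes the usual definition (total character edit distance divided by total reference characters). The helper names `levenshtein` and `cer` are illustrative only and are not part of the script above; the figures reported in this issue may have been produced with a different implementation.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0
```

Pooling edits over the whole dataset, as this sketch does, and averaging a per-image CER can give different numbers, since short words weigh more heavily in a per-image average. The two settings are only comparable if the word and line figures are computed the same way.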

Platform: Windows 10
Python version: 3.8
PyTorch version (GPU?): 2.5.1+cu124
