trOCR base model 30% CER on IAM word dataset vs 4% for IAM line dataset, is this normal? #1653

Open
slender9168 opened this issue Nov 12, 2024 · 0 comments

slender9168 commented Nov 12, 2024

Describe the bug
Model I am using: trocr-base-handwritten

Dataset:

The problem arises when using:

  • my own modified scripts:
    ```python
    import torch
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Select the GPU when available, otherwise fall back to the CPU
    self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(self.device)

    # Initialize processor and model
    self.processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
    self.model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten").to(self.device)

    # Prepare image and move pixel values to the device
    image = Image.open(image_path).convert("RGB")
    pixel_values = self.processor(image, return_tensors="pt").pixel_values.to(self.device)

    # Generate text
    generated_ids = self.model.generate(pixel_values)
    generated_text = self.processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    ```

A clear and concise description of what the bug is:
When running microsoft/trocr-base-handwritten against the IAM word dataset (single words), I get a CER of about 30%.
When running it against the IAM line dataset, the CER is about 4%.

  1. Is this expected?
  2. Can I train the model on single-word images to bring its CER on single words down to about 4%, or is it inherently bad on single words?
  3. Is the 30% CER caused by the model having been trained on full lines rather than single words?
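
For reference, here is a minimal sketch of how the CER figures above could be computed. It is not the evaluation script used for the numbers in this issue. The helper names (`edit_distance`, `corpus_cer`, `mean_sample_cer`) are hypothetical, and the code assumes every reference string is non-empty. It shows two common aggregations: total edits over total characters, and the mean of per-image CER values. The two can differ noticeably on very short references such as single words, so it may be worth checking that the word and line results use the same one.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a character from ref
                curr[j - 1] + 1,         # insert a character into ref
                prev[j - 1] + (r != h),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def corpus_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def mean_sample_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Average of per-sample CER values (assumes non-empty references)."""
    per_sample = [edit_distance(r, h) / len(r) for r, h in zip(refs, hyps)]
    return sum(per_sample) / len(per_sample)


if __name__ == "__main__":
    # Made-up strings for illustration only, not IAM data.
    refs = ["a", "hello world"]
    hyps = ["b", "hello world"]
    print(corpus_cer(refs, hyps))       # 1 edit over 12 characters
    print(mean_sample_cer(refs, hyps))  # mean of 1.0 and 0.0
```

In this toy case a single wrong character in a one-letter word gives a per-sample mean of 0.5 but a corpus-level CER of about 0.083, which illustrates how the choice of aggregation can matter for word-level data.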

To Reproduce
Steps to reproduce the behavior:

  1. Use the sample code with microsoft/trocr-base-handwritten against the IAM word dataset; the CER will be around 30%.
  • Platform: Windows 10
  • Python version: 3.8
  • PyTorch version (GPU?): 2.5.1+cu124