Ablation study on `Multi-way Transformer Decoder` #9

Open

amos-x-wang opened this issue Dec 5, 2023 · 0 comments

Dear authors:

First of all, amazing work, and I enjoyed it a lot.

Your paper states:

> In this ablation study, we use a hybrid transformer as a baseline encoder and train both models from scratch with a longer side of the input image of 768, and test them with an image size of 1280.

According to Table 7, the improvements in DET and E2E are minor (this is only my personal view, given that ICDAR 2015 and Total-Text are small-scale datasets).

Would you be willing to share more about this ablation study? For example:
1. What is the improvement from the Multi-way Decoder in the final model, i.e., when using the Swin transformer with an image resolution of 1920?
2. What do the convergence curves look like with and without the Multi-way Decoder?
