Dear authors:
First of all, this is amazing work, and I enjoyed reading it a lot.
Your paper stated:
"In this ablation study, we use a hybrid transformer as a baseline encoder and train both models from scratch with a longer side of the input image of 768, and test them with an image size of 1280."
According to Table 7, the improvements in DET and E2E are minor (in my personal view only, since ICDAR 2015 and Total-Text are small-scale datasets).
Would you be willing to share more about this ablation study? For example:
1. What is the improvement from the Multi-way Decoder in the final model, i.e., when using the Swin transformer and an image resolution of 1920?
2. What do the convergence curves look like with and without the Multi-way Decoder?