
Request for an explanation of a paper detail #97

Open
menglin0320 opened this issue Jul 11, 2023 · 0 comments
@menglin0320

Another observation is that Transformer-based methods
generally suffer from unaligned-length problem [49], which
denotes that the Transformer is hard to correct the vision
prediction if character number is unaligned with ground truth.
The unaligned-length problem is caused by the inevitable
implementation of padding mask which is fixed for filtering
context outside text length. Our iterative LM can alleviate
this problem as the visual feature and linguistic feature are
fused several times, and thus the predicted text length is also
refined gradually.
What problem is this passage referring to? This framework presumably doesn't handle complex layouts or very long text anyway, right? Could someone explain what problem is actually being solved here?
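The iterative correction described in the quoted passage can be sketched as a toy loop: a vision prediction whose character count is wrong is repeatedly fused with a language model's output, so the predicted text length is refined over iterations. This is a minimal illustration only; the lexicon, the edit-distance LM, and the fusion step are all hypothetical stand-ins, not the ABINet implementation.

```python
# Toy sketch of iterative vision/language refinement (NOT ABINet code).
# A lexicon-based "language model" snaps the prediction to the closest
# dictionary word, which may change the predicted text length.

LEXICON = ["HELLO", "WORLD", "VISION"]  # hypothetical vocabulary

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j - 1] + 1,   # insertion
                                     dp[j] + 1,       # deletion
                                     prev + (ca != cb))  # substitution
        # `prev` always holds the diagonal (previous row, previous column)
    return dp[-1]

def lm_correct(pred):
    """Hypothetical LM step: return the nearest lexicon word."""
    return min(LEXICON, key=lambda w: edit_distance(pred, w))

def iterative_refine(vision_pred, iterations=3):
    """Fuse the vision prediction with the LM output several times;
    the predicted text (and its length) is refined gradually."""
    pred = vision_pred
    for _ in range(iterations):
        pred = lm_correct(pred)  # toy "fusion": take the LM correction
    return pred

# A 4-character vision prediction is corrected to a 5-character word,
# i.e. the unaligned length is fixed by the iterative LM loop.
print(iterative_refine("HELO"))  # → HELLO
```

In the real model the fusion is done on feature maps rather than on decoded strings, but the point of the passage is the same: because prediction and correction are repeated several times, a length mismatch from the first vision pass does not get frozen in by a fixed padding mask.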
