
Request for an explanation of a paper detail #97

Open
menglin0320 opened this issue Jul 11, 2023 · 0 comments
@menglin0320

Another observation is that Transformer-based methods
generally suffer from unaligned-length problem [49], which
denotes that the Transformer is hard to correct the vision
prediction if character number is unaligned with ground truth.
The unaligned-length problem is caused by the inevitable
implementation of padding mask which is fixed for filtering
context outside text length. Our iterative LM can alleviate
this problem as the visual feature and linguistic feature are
fused several times, and thus the predicted text length is also
refined gradually.
What problem is this passage referring to? This framework presumably doesn't handle complex layouts or very long text anyway, right? Could someone explain what problem is actually being solved here?
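The iterative correction described in the quoted passage can be sketched as a toy loop: a vision prediction whose character count is wrong is repeatedly fused with a language model's output, so the predicted text length is refined over iterations. This is a minimal illustration only; the lexicon, the edit-distance LM, and the fusion step are all hypothetical stand-ins, not the ABINet implementation.

```python
# Toy sketch of iterative vision/language refinement (NOT ABINet code).
# A lexicon-based "language model" snaps the prediction to the closest
# dictionary word, which may change the predicted text length.

LEXICON = ["HELLO", "WORLD", "VISION"]  # hypothetical vocabulary

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j - 1] + 1,   # insertion
                                     dp[j] + 1,       # deletion
                                     prev + (ca != cb))  # substitution
        # `prev` always holds the diagonal (previous row, previous column)
    return dp[-1]

def lm_correct(pred):
    """Hypothetical LM step: return the nearest lexicon word."""
    return min(LEXICON, key=lambda w: edit_distance(pred, w))

def iterative_refine(vision_pred, iterations=3):
    """Fuse the vision prediction with the LM output several times;
    the predicted text (and its length) is refined gradually."""
    pred = vision_pred
    for _ in range(iterations):
        pred = lm_correct(pred)  # toy "fusion": take the LM correction
    return pred

# A 4-character vision prediction is corrected to a 5-character word,
# i.e. the unaligned length is fixed by the iterative LM loop.
print(iterative_refine("HELO"))  # → HELLO
```

In the real model the fusion is done on feature maps rather than on decoded strings, but the point of the passage is the same: because prediction and correction are repeated several times, a length mismatch from the first vision pass does not get frozen in by a fixed padding mask.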
