How to parse Simple Chinese PDF #2536
Replies: 8 comments 3 replies
-
Is there a problem with parsing Simplified Chinese PDFs? 😊
-
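Sections can be identified either by computing them from coordinates or by having a vision model recognize the regions. If you want to filter out headers and footers, the only option is to run visual layout recognition on the whole page first. The PDF itself does not contain this information.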
-
This happens to be exactly what I do at work.
-
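As for the garbled text under images: get each image's coordinates and seqno, get each text span's coordinates and seqno, and then check whether the text is covered by an image. If it is covered, it is invisible text.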
-
Use get_texttrace to get the text and get_bboxlog to get the image coordinates.
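The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily the rule set the poster uses: it assumes a span counts as hidden when an image drawn later (larger seqno) fully contains the span's bbox, and it assumes the index of an entry in the get_bboxlog list plays the role of that entry's seqno. The helper names `covers`, `images_from_bboxlog` and `find_covered_text` are hypothetical, and the sample data is made up to mimic the shape of the two calls' output.

```python
def covers(outer, inner, tol=1.0):
    """Return True if rectangle `outer` contains rectangle `inner`.

    Rectangles are (x0, y0, x1, y1); `tol` is a slack in points.
    """
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return (ox0 - tol <= ix0 and oy0 - tol <= iy0
            and ix1 <= ox1 + tol and iy1 <= oy1 + tol)


def images_from_bboxlog(bboxlog):
    """Extract (seqno, bbox) pairs for images from a bboxlog-like list.

    Assumption: the list index of an entry serves as its seqno.
    """
    return [(i, item[1]) for i, item in enumerate(bboxlog)
            if item[0] == "fill-image"]


def find_covered_text(spans, images):
    """Return the spans that lie fully under an image drawn after them.

    Assumption: "covered" means full containment plus a larger image seqno.
    """
    hidden = []
    for span in spans:
        for img_seqno, img_bbox in images:
            if img_seqno > span["seqno"] and covers(img_bbox, span["bbox"]):
                hidden.append(span)
                break
    return hidden


# Made-up sample data shaped like get_bboxlog / get_texttrace output.
sample_bboxlog = [
    ("fill-text", (10, 10, 50, 20)),     # seqno 0: text later covered by the image
    ("fill-text", (200, 10, 250, 20)),   # seqno 1: text outside the image
    ("fill-image", (0, 0, 100, 100)),    # seqno 2: the image
    ("fill-text", (10, 30, 50, 40)),     # seqno 3: text drawn on top of the image
]
sample_spans = [
    {"seqno": 0, "bbox": (10, 10, 50, 20)},
    {"seqno": 1, "bbox": (200, 10, 250, 20)},
    {"seqno": 3, "bbox": (10, 30, 50, 40)},
]

hidden = find_covered_text(sample_spans, images_from_bboxlog(sample_bboxlog))
print([span["seqno"] for span in hidden])
```

With a real page, one would presumably pass `page.get_texttrace()` as `spans` and `page.get_bboxlog()` to `images_from_bboxlog`; whether full containment is the right test (rather than partial overlap) depends on the documents at hand.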
-
Whether text is invisible is determined by the following rules:
-
I am not sure what to make of this post.