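The two variants (generated by `"dict"` and `"rawdict"`, respectively) differ at the span level only: a `"dict"` span has the key `"text"`, while a `"rawdict"` span has `"chars"` instead, a list of per-character dictionaries. So you could generate `"rawdict"` output, walk through each of its spans, concatenate the `char["c"]` values of all its `"chars"`, and store the result in the span under the `"text"` key:

```python
# `page` is a PyMuPDF Page; add flags=..., clip=..., etc. as needed.
# Keep a reference to the result so the added "text" keys are not lost.
data = page.get_text("rawdict")
for block in data["blocks"]:
    for line in block.get("lines", []):  # image blocks have no "lines"
        for span in line["spans"]:
            span["text"] = "".join(char["c"] for char in span["chars"])
```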
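Because the two formats are said to differ only in the spans, the same loop can be wrapped in a helper that also drops `"chars"`, so each span carries `"text"` instead, as in `"dict"` output. This is a minimal sketch assuming the rawdict layout described above; `rawdict_to_dict` and its `drop_chars` option are hypothetical names, and the sample input is hand-made, not real PyMuPDF output. It needs no PyMuPDF import, so it runs on any dict with that shape.

```python
from typing import Any


def rawdict_to_dict(raw: dict[str, Any], drop_chars: bool = True) -> dict[str, Any]:
    """Give every span of rawdict-shaped data a "text" key (modifies `raw` in place).

    With drop_chars=True the per-character "chars" list is removed as well,
    so each span carries "text" instead of "chars", like "dict" output.
    """
    for block in raw.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line["spans"]:
                chars = span.pop("chars") if drop_chars else span["chars"]
                span["text"] = "".join(ch["c"] for ch in chars)
    return raw


# Hand-made sample in the rawdict shape (only the keys used above are filled in).
sample = {
    "blocks": [
        {"type": 0, "lines": [{"spans": [{"chars": [{"c": "H"}, {"c": "i"}]}]}]},
        {"type": 1, "image": b"\x89PNG"},
    ]
}
print(rawdict_to_dict(sample)["blocks"][0]["lines"][0]["spans"][0])  # {'text': 'Hi'}
```

With real data you would pass `page.get_text("rawdict")` instead of `sample`; keep `drop_chars=False` if you still need the per-character boxes.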
To write the result to a JSON file, you could even write a small JSON output plugin that does the above.
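One possible shape for such a plugin, as a hedged sketch: `rawdict_to_json` is a hypothetical helper, not a PyMuPDF API. It adds the span `"text"` as above and serializes with `json.dumps`. Image blocks in `"rawdict"` output carry the image as raw bytes, which `json.dumps` rejects, so the `default` hook base64-encodes them (an arbitrary choice made for this sketch).

```python
import base64
import json
from typing import Any, Optional


def _encode(obj: Any) -> str:
    """json.dumps fallback: image blocks hold raw bytes, which JSON cannot represent."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


def rawdict_to_json(raw: dict[str, Any], indent: Optional[int] = None) -> str:
    """Hypothetical 'JSON output plugin': rawdict-shaped data -> JSON with span "text"."""
    for block in raw.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line["spans"]:
                span["text"] = "".join(ch["c"] for ch in span["chars"])
    return json.dumps(raw, default=_encode, indent=indent)
```

With real data you would call `rawdict_to_json(page.get_text("rawdict"), indent=2)` and write the returned string to a file of your choice.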