
How to output text and char to JSON file at the same time? #1785

Answered by JorjMcKie
trulymust asked this question in Q&A

The two variants (generated by "dict" and "rawdict", respectively) differ at the span level only.
In the first case, the span has the key "text"; the rawdict spans have "chars" instead.

So you could generate "rawdict" output, then walk through each of its spans, concatenate the char["c"] values of its "chars" list, and store the result in the span under the key "text". Each span then carries both the text and the per-character data.

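# "page" is an existing page object; optional arguments (flags=..., clip=..., etc.) are omitted
for b in page.get_text("rawdict")["blocks"]:
    for l in b.get("lines", []):  # image blocks have no "lines" key
        for s in l["spans"]:
            # rebuild the span text from its per-character entries
            s["text"] = "".join(c["c"] for c in s["chars"])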
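
To get the merged result into a JSON file, something like the following sketch might work. It uses only the standard library and operates on a dictionary shaped like "rawdict" output. The helper names add_span_text and to_json are made up for this illustration and are not part of PyMuPDF. It also assumes that image blocks, if present, carry raw bytes, which JSON cannot store, so those values are replaced by a short placeholder.

```python
import json


def add_span_text(page_dict):
    """Add a "text" key to every span of a rawdict-style page dictionary.

    Blocks without "lines" (for example image blocks) are skipped.
    The dictionary is modified in place and returned for convenience.
    """
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                # join the single characters stored in each "chars" entry
                span["text"] = "".join(ch["c"] for ch in span.get("chars", []))
    return page_dict


def to_json(page_dict):
    """Serialize the dictionary to a JSON string.

    Assumption: bytes values (such as image data) are not wanted in the
    output, so they are replaced by a placeholder like "<123 bytes>".
    """
    def fallback(obj):
        if isinstance(obj, (bytes, bytearray)):
            return "<%d bytes>" % len(obj)
        raise TypeError("not JSON serializable: " + type(obj).__name__)

    return json.dumps(page_dict, default=fallback, indent=2)
```

With a real page you would pass in page.get_text("rawdict") and write the string returned by to_json(add_span_text(...)) to a file opened in text mode. Each span in the file then has both "text" and "chars".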

Answer selected by trulymust