-
This works pretty well, but is not perfect. How can I modify the invisible text to fix OCR errors without otherwise affecting the layout of the document? For example, I want to change
I found a lot of people asking for this:
-
One comment says:
-
You can use the ocrmypdf API. You could do small edits this way, but it mostly amounts to changing words in place. It would take a full word processor to do more complex changes. The APIs are currently private because they are subject to change. The author of gscan2pdf is rewriting that program in Python and planning to use ocrmypdf's features, including an interface to edit hOCR files. I have no idea of his timeline.
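For a sense of what "changing words in place" could look like, here is a minimal sketch that edits hOCR markup directly with the Python standard library. It does not use ocrmypdf's private API. The sample markup, the `fix_words` helper, and the corrections mapping are all hypothetical, and the sketch assumes the hOCR parses as well-formed XHTML with each word's text sitting directly inside an `ocrx_word` span.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical two-word hOCR fragment; real files carry more metadata.
SAMPLE_HOCR = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div class="ocr_page" title="bbox 0 0 600 800">'
    '<span class="ocr_line" title="bbox 10 10 300 40">'
    '<span class="ocrx_word" title="bbox 10 10 120 40; x_wconf 61">He1lo</span> '
    '<span class="ocrx_word" title="bbox 130 10 300 40; x_wconf 95">world</span>'
    '</span></div></body></html>'
)


def fix_words(hocr_text, corrections):
    """Replace whole words in ocrx_word spans; return (new_hocr, count).

    Only the span text changes. The bounding boxes in the title
    attributes are left alone, so word positions are unaffected.
    """
    # Keep XHTML as the default namespace when serializing.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(hocr_text)
    changed = 0
    for span in root.iter(f"{{{XHTML_NS}}}span"):
        if "ocrx_word" not in span.get("class", "").split():
            continue
        # Assumption: text is a direct child of the span. Words wrapped
        # in <strong> or <em> would have span.text == None and be skipped.
        if span.text in corrections:
            span.text = corrections[span.text]
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


if __name__ == "__main__":
    fixed, count = fix_words(SAMPLE_HOCR, {"He1lo": "Hello"})
    print(count, "word(s) corrected")
```

Because only the text nodes change, whatever turns the hOCR back into a PDF text layer should place the corrected word in the same box as the misrecognized one. Anything beyond one-for-one word replacement (splitting, merging, or reflowing words) would also need the boxes adjusted, which is where a real editor becomes necessary.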
-
Just a side comment you may find interesting: while looking for hOCR-editing tools, I found Scribe OCR today, which seems quite promising!
OK I finally got it working on a different machine:
Then inside ipython:
Without any text modifications it outputs slightly different text than what I had before, with extra line breaks, but I guess that's from newer versions of various things.