How to fix OCR errors? #1254

Answered by endolith
endolith asked this question in Q&A
Feb 17, 2024 · 3 comments · 9 replies

OK I finally got it working on a different machine:

sudo apt install ocrmypdf
conda create --name ocrmypdf pip ipython
conda activate ocrmypdf
pip install git+https://github.com/ocrmypdf/OCRmyPDF.git
ipython

Then inside ipython:

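import ocrmypdf
from pathlib import Path
# Step 1: run OCR and write the intermediate hOCR files into ./output
# (the leading underscore marks these functions as private, so they may change)
ocrmypdf.api._pdf_to_hocr(input_pdf=Path("png2pdf.pdf"), output_folder=Path("./output"))
# Step 2: build the OCRed PDF from the hOCR files in ./output
ocrmypdf.api._hocr_to_ocr_pdf(work_folder=Path("./output/"), output_file=Path("OCRed.pdf"))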

Even without any text modifications, it outputs slightly different text than what I had before, with extra line breaks, but I guess that's due to newer versions of various things.
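Since the two calls are separate, the hOCR files in ./output can be edited between them to correct misrecognized words. Here is a minimal sketch of that editing step, not an official OCRmyPDF feature. `fix_hocr_words` and the `corrections` dict are hypothetical names of my own. It assumes the work folder contains files ending in `.hocr`, and that words appear as Tesseract-style `<span class='ocrx_word' ...>word</span>` elements with plain text inside (no nested tags, no HTML entities).

```python
import re
from pathlib import Path

# Assumption: each word is a Tesseract-style span such as
# <span class='ocrx_word' id='word_1_1' title='bbox ...'>word</span>
WORD_SPAN = re.compile(r"""(<span class=['"]ocrx_word['"][^>]*>)([^<]*)(</span>)""")


def fix_hocr_words(work_folder, corrections, pattern="*.hocr"):
    """Replace whole misrecognized words in every hOCR file under work_folder.

    corrections maps the wrong word to the right one, e.g. {"teh": "the"}.
    Returns the number of words that were changed.
    """
    changed = 0

    def swap(match):
        nonlocal changed
        word = match.group(2)
        if word in corrections:
            changed += 1
            # Keep the opening tag (bbox, confidence) and swap only the text.
            return match.group(1) + corrections[word] + match.group(3)
        return match.group(0)

    # rglob also finds files in subfolders, in case the layout differs.
    for hocr_file in sorted(Path(work_folder).rglob(pattern)):
        text = hocr_file.read_text(encoding="utf-8")
        fixed = WORD_SPAN.sub(swap, text)
        if fixed != text:
            hocr_file.write_text(fixed, encoding="utf-8")
    return changed
```

Call it after `_pdf_to_hocr` and before `_hocr_to_ocr_pdf`, for example `fix_hocr_words("./output", {"teh": "the"})`. Only the word text is touched, so the bounding boxes in the `title` attributes stay as they were.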