HocrConverter

Create PDFs and plain text from hOCR documents

Changes by C.Holtermann

The original script didn't work for me, so I made some changes to get it running.

My setup is ocropus 0.7 and tesseract 3.02.02.

I included some aspects of the fork at https://github.com/zw/HocrConverter:

Some command line arguments:

For command line parsing and validation I use some external libraries:

As it stands, the script is more a way to understand the concept than a finished tool.

Maybe it's useful for others trying to understand OCR.
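
To illustrate the concept: an hOCR document is HTML in which each recognized word usually sits in an element with class "ocrx_word", and its position is stored in the title attribute as "bbox x0 y0 x1 y1". The sketch below is not code from this script. It is a minimal, standard-library-only example that assumes well-formed hOCR; the names HocrWordParser, hocr_words and SAMPLE are made up for illustration. It collects each word with its bounding box, which is the information needed both for plain text output and for placing text in a PDF.

```python
import re
from html.parser import HTMLParser

# The hOCR "title" attribute carries properties such as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from elements with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []      # finished (text, (x0, y0, x1, y1)) pairs
        self._bbox = None    # bbox of the word currently being read
        self._text = []      # text fragments of the current word
        self._depth = 0      # nesting depth of tags inside the current word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._bbox is not None:
            # A nested tag inside a word, e.g. <strong> or <em>.
            self._depth += 1
            return
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []
                self._depth = 0

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Closing tag of the word element itself: store the word.
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None


def hocr_words(hocr):
    """Return a list of (text, bbox) tuples in document order."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hand-written sample for illustration, not real OCR output.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 600 800'>
<span class='ocr_line' title='bbox 10 10 200 40'>
<span class='ocrx_word' title='bbox 10 10 90 40'>Hello</span>
<span class='ocrx_word' title='bbox 100 10 200 40; x_wconf 93'>world</span>
</span></div>"""

if __name__ == "__main__":
    for text, bbox in hocr_words(SAMPLE):
        print(text, bbox)
```

Joining the collected words with spaces gives a rough plain text version. For a PDF, each bbox would still have to be converted from image pixels to page units, which this sketch leaves out.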

Work in progress.

Feedback is welcome.