Feature Request: "Try to get the orientation right" #912

Open
lopiuh opened this issue Jul 3, 2021 · 2 comments

Comments


lopiuh commented Jul 3, 2021

As far as I know, one trick is to run OCR four times, rotating the document by 90° each time. The run with the best OCR result (most hits in a dictionary scan?) indicates the "right" orientation. Paperless-ng seems to use this trick. Would it be possible to integrate it into docspell?

Thanks

lopiuh

Owner

eikek commented Jul 3, 2021

Thanks for reporting. This is quite an expensive trick, but it is simple and of course it works. For reference, #554 contains similar requests. At some point, changing the orientation should be possible both automatically and manually. It is possible to integrate this into docspell, but I want to put more thought into it first.
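
For illustration, the trick discussed above could be sketched roughly as follows. This is only a minimal sketch, not how docspell or Paperless-ng actually implement it: `rotate` and `ocr` are hypothetical callables supplied by the caller (for example wrappers around an image library and an OCR engine), and `dictionary` is an assumed set of lowercase known words. It also shows why the approach is expensive: OCR runs once per candidate angle.

```python
from typing import Any, Callable, Set, Tuple

# Candidate rotations in degrees; one OCR pass is run for each.
ANGLES = (0, 90, 180, 270)


def dictionary_hits(text: str, dictionary: Set[str]) -> int:
    """Count how many whitespace-separated words appear in the dictionary."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return sum(1 for w in words if w in dictionary)


def best_orientation(
    page: Any,
    rotate: Callable[[Any, int], Any],  # hypothetical: returns the page rotated by `angle`
    ocr: Callable[[Any], str],          # hypothetical: returns recognized text
    dictionary: Set[str],
) -> Tuple[int, int]:
    """Return (angle, score) for the rotation whose OCR text scores best.

    Ties keep the earliest angle, so an already upright page stays at 0.
    """
    best_angle, best_score = 0, -1
    for angle in ANGLES:
        text = ocr(rotate(page, angle))
        score = dictionary_hits(text, dictionary)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score
```

A cheaper variant might run the four passes on a downscaled image or a single page only, and then run full OCR once at the chosen angle.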

Author

lopiuh commented Jul 3, 2021

Thanks. I manually changed the orientation of a PDF with the Linux tool "PDF Arranger" and, interestingly, the OCR result was no better. Maybe it does not actually rotate the image but only changes a stored "orientation" value. Scanning the same document in the correct orientation gives the correct OCR result, by the way. Maybe there is a flag that tells the OCR tools to use the orientation information saved in the PDF (if my reasoning is right).

UPDATE 21-07-07:
Testing again with another PDF: OCR worked correctly after rotating with PDF Arranger (Linux). I don't know what the problem was with the first sample, which paperless OCRed correctly but docspell did not...
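
On the "stored orientation" guess: PDF pages can carry a `/Rotate` entry (a multiple of 90 degrees) that viewers apply at display time without touching the embedded image data. Whether PDF Arranger or docspell's OCR pipeline relied on it in this case is not confirmed here. As a crude way to check a file, the sketch below scans the raw bytes for `/Rotate` entries. It is an assumption-laden shortcut, not a PDF parser: it will miss entries stored inside compressed object streams, and the function name is made up for illustration.

```python
import re
from typing import List

# Matches "/Rotate <integer>" as it appears in an uncompressed page dictionary.
ROTATE_RE = re.compile(rb"/Rotate\s+(-?\d+)")


def rotate_entries(pdf_bytes: bytes) -> List[int]:
    """Return all /Rotate values found in the raw bytes, normalized to 0..359."""
    return [int(m.group(1)) % 360 for m in ROTATE_RE.finditer(pdf_bytes)]
```

Reading a file in binary mode and passing its contents to `rotate_entries` gives a quick hint: non-zero values suggest the rotation is stored as page metadata rather than baked into the scanned image.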
