An approach to optical character recognition (OCR) in Python using OpenCV and Google's Tesseract.
Other libraries: PyTesseract (Python-tesseract, a Python wrapper for Google's Tesseract-OCR), NumPy, Pillow.
- Tesseract Documentation
- Original Repository: Tesseract at UB Mannheim
- The Mannheim University Library (UB Mannheim) uses Tesseract to perform OCR (optical character recognition) of historical German newspapers (Allgemeine Preußische Staatszeitung, Deutscher Reichsanzeiger). The latest results with OCR from more than 360,000 scans are available online.
- OpenCV Documentation
- I used PyCharm for most of this. The code can also be run in JupyterLab or Jupyter Notebooks.
- In a notebook, use a function that renders images inline, because notebook cells do not support OpenCV's 'imshow()' function.
- The inspiration for this project and the data used in it come from the following work:
- Main Paper: (https://www.cse.iitb.ac.in/~rohitsaluja/PID6011473.pdf)
- Main Repository By: @rohitsaluja22
- OpenOCRCorrect: (https://github.com/rohitsaluja22/OpenOCRCorrect/)
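
For the notebook note above, one possible way to view an OpenCV image inline is sketched below. It assumes Matplotlib is installed, which is not among the libraries listed for this project, and the helper names `bgr_to_rgb` and `show_inline` are hypothetical. OpenCV stores color images in BGR channel order, so the channels are reversed before display.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Reverse the channel order of a 3-channel BGR image.

    Grayscale (2-D) images are returned unchanged.
    """
    if image.ndim == 3 and image.shape[2] == 3:
        return image[:, :, ::-1]
    return image


def show_inline(image: np.ndarray, title: str = "") -> None:
    """Display an OpenCV-style image in a notebook cell using Matplotlib."""
    # Imported here so the conversion helper works without Matplotlib installed.
    import matplotlib.pyplot as plt

    rgb = bgr_to_rgb(image)
    # Use a gray colormap for single-channel images; color images ignore cmap.
    plt.imshow(rgb, cmap="gray" if rgb.ndim == 2 else None)
    plt.title(title)
    plt.axis("off")
    plt.show()
```

In a notebook, `show_inline(cv2.imread("page.png"), "input")` would take the place of a `cv2.imshow()` call; the file name is a placeholder.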