
An approach to use OpenCV and Google's Tesseract to do OCR in Python

sarbhanub/opencv-ocr-indic

GPLv3 License

OCR for Indic Languages

Introduction

An approach to performing OCR in Python with OpenCV and Google's Tesseract.

Other libraries: PyTesseract (Python-tesseract, a Python wrapper for Google's Tesseract-OCR Engine), NumPy, and Pillow.

  • Tesseract Documentation
    • Original repository: Tesseract at UB Mannheim
    • The Mannheim University Library (UB Mannheim) uses Tesseract to perform OCR (optical character recognition) of historical German newspapers (Allgemeine Preußische Staatszeitung, Deutscher Reichsanzeiger). The latest results with OCR from more than 360,000 scans are available online.
  • OpenCV Documentation

Deployment

  • I used PyCharm for most of the development. The code can also be run in JupyterLab or Jupyter Notebook.
  • In notebooks, use another way to display images, because OpenCV's 'imshow()' function opens a separate window that notebook consoles do not support.

Acknowledgements