OCR for Indic-Languages

Introduction

This project uses OpenCV and Google's Tesseract to perform OCR on Indic-language text in Python.

Other libraries: PyTesseract (a Python wrapper for Google's Tesseract-OCR), NumPy, and Pillow.

Tesseract Documentation
- Original Repository: Tesseract at UB Mannheim link
- The Mannheim University Library (UB Mannheim) uses Tesseract to perform OCR (optical character recognition) of historical German newspapers (Allgemeine Preußische Staatszeitung, Deutscher Reichsanzeiger). The latest results with OCR from more than 360,000 scans are available online.
OpenCV Documentation: link

I used PyCharm for most of this work. The project can also be run in JupyterLab or Jupyter notebooks.
In a notebook, use a different function to view images, since the notebook console does not support OpenCV's 'imshow()' function.

The inspiration for this project and the data used in it are taken from here.
Main Paper: (https://www.cse.iitb.ac.in/~rohitsaluja/PID6011473.pdf)
Main Repository By: @rohitsaluja22
OpenOCRCorrect: (https://github.com/rohitsaluja22/OpenOCRCorrect/)
