EXTRACT is an optical character recognition (OCR) engine for various operating systems that extracts text from an image and converts it to plain text.
This model is a very primitive form of the original Google Tesseract: it recognizes ONLY CAPITAL LETTERS in an image and converts them to plain text.
The script uses the following Python modules:
- os
- numpy
- PIL
- sys
- keras
- cropyble
- cv2
- shutil
NOTE1: The trained model is not provided, so run the script unmodified the first time. Once the model is trained, comment out 'Train_Model' on line 65 and run the script again for all further use.
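The train-once workflow in NOTE1 could also be handled without editing the script. The following is a minimal sketch of one possible approach, not the script's actual code: `load_or_train`, `model_path`, `train_fn` and `load_fn` are hypothetical names introduced here for illustration.

```python
import os


def load_or_train(model_path, train_fn, load_fn):
    """Return a model, training only when no saved file exists yet.

    All three parameters are assumptions made for this sketch:
    train_fn(model_path) must train a model and save it to model_path,
    load_fn(model_path) must load and return the saved model.
    """
    if not os.path.exists(model_path):
        # First run: no saved model on disk, so train and save one.
        train_fn(model_path)
    # Every run (including the first): load the saved model from disk.
    return load_fn(model_path)
```

With Keras, `load_fn` could be `keras.models.load_model`, and `train_fn` could be a small wrapper that calls the script's 'Train_Model' step and then saves the result to `model_path`. How 'Train_Model' stores its output is not stated here, so that wrapper is left as an assumption.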
NOTE2: Only some fonts were taken into account, so use the default font (Calibri) with a FONT SIZE of 72 for the text in the image; the letter extraction relies on these assumptions.
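A test image that follows NOTE2 can be generated with PIL. This is a rough sketch under stated assumptions: the function name `make_test_image`, the font file name `calibri.ttf`, the canvas size, and black text on a white background are all choices made here, not requirements confirmed by the script.

```python
from PIL import Image, ImageDraw, ImageFont


def make_test_image(text, path, font_path="calibri.ttf", font_size=72, margin=20):
    """Draw upper-cased text in black on a white canvas and save it to path."""
    try:
        # Assumption: Calibri is installed under this file name.
        font = ImageFont.truetype(font_path, font_size)
    except OSError:
        # Fallback so the sketch still runs; the result will NOT match
        # the Calibri / size 72 assumptions described in NOTE2.
        print("Calibri not found, falling back to PIL's default font")
        font = ImageFont.load_default()

    text = text.upper()  # the model handles capital letters only
    # Rough canvas estimate: one font_size-wide cell per character.
    width = font_size * len(text) + 2 * margin
    height = 2 * font_size
    image = Image.new("L", (width, height), color=255)  # white background
    ImageDraw.Draw(image).text((margin, margin), text, font=font, fill=0)
    image.save(path)
    return image.size
```

For example, `make_test_image("HELLO", "hello.png")` writes a single-line grayscale image that could then be passed to the script as its input.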
Run the script from your terminal: 'python3 tesseract.py'
Input image:
Output (the predicted result is at the bottom):