By default, tesseract does both detection and recognition.
Is it possible to have a recognize() API that performs only recognition and returns the output text with its confidence?
Or at least simulate it?
The pytesseract.image_to_string() call returns only the recognized text, without any confidence.
For image_recognize(), we could do something like this with image_to_data() and output_type 'dict':
import pytesseract

def recognize(img, lang='eng'):
    # Per-word results as a dict of parallel lists ('text', 'conf', ...)
    data = pytesseract.image_to_data(img, lang=lang, output_type='dict')
    texts = []
    avg_confidence = 0
    total_bboxes = 0
    # assert len(data['text']) == 1 # Should contain only 1 bbox
    for i in range(len(data['text'])):
        # Scale confidence from 0..100 to 0..1
        text, conf = data['text'][i].strip(), float(data['conf'][i]) / 100.0
        # Skip rows with negative confidence (not words) and empty strings
        if conf < 0 or not text:
            continue
        total_bboxes += 1
        avg_confidence += conf
        texts.append(text)
    if not total_bboxes:
        return {}
    return {
        'text': ' '.join(texts),
        'confidence': avg_confidence / total_bboxes
    }
Can you please take this as a feature request?
This would be helpful for anyone who uses their own detector and wants to perform only recognition with tesseract.
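For crops that already come from an external detector, one possible approximation is to pass tesseract's page segmentation mode 7 ("treat the image as a single text line") through the config argument. This is only a sketch, not an existing pytesseract API: summarize_words() and recognize_crop() are hypothetical helper names, and --psm 7 reduces layout analysis rather than switching detection off entirely.

```python
from typing import Any, Dict, List, Optional


def summarize_words(data: Dict[str, List[Any]]) -> Optional[Dict[str, Any]]:
    """Join recognized words and average their confidences (scaled to 0..1).

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(..., output_type='dict').
    """
    words, confs = [], []
    for text, conf in zip(data["text"], data["conf"]):
        text, conf = str(text).strip(), float(conf)
        # Rows with negative confidence are not words; also skip empty strings.
        if conf < 0 or not text:
            continue
        words.append(text)
        confs.append(conf / 100.0)
    if not words:
        return None
    return {"text": " ".join(words), "confidence": sum(confs) / len(confs)}


def recognize_crop(crop, lang: str = "eng") -> Optional[Dict[str, Any]]:
    """Hypothetical helper: recognize one pre-cropped text line."""
    # Imported lazily so summarize_words() can be used without pytesseract.
    import pytesseract

    data = pytesseract.image_to_data(
        crop,
        lang=lang,
        config="--psm 7",  # assumption: each crop holds a single text line
        output_type=pytesseract.Output.DICT,
    )
    return summarize_words(data)
```

If a crop contains a single word instead of a line, --psm 8 may be a better fit. Returning None for an empty result is a choice made for this sketch; the snippet above returns an empty dict instead.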
The only complication is that both tesseract and pytesseract support batch recognition via a list of images.
So that use case should be taken into account as well.
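For the batch case, a rough sketch of one way to get a per-image result is to group the image_to_data() rows by their page_num field. This assumes that tesseract assigns a separate page_num to each image in the list, which should be verified against the installed version; summarize_by_page() is a hypothetical helper name, not part of pytesseract.

```python
from collections import defaultdict
from typing import Any, Dict, List


def summarize_by_page(data: Dict[str, List[Any]]) -> Dict[int, Dict[str, Any]]:
    """Aggregate text and mean confidence (0..1) per page_num value.

    Assumption: in batch mode each input image gets its own page_num.
    """
    words = defaultdict(list)
    confs = defaultdict(list)
    for page, text, conf in zip(data["page_num"], data["text"], data["conf"]):
        text, conf = str(text).strip(), float(conf)
        # Skip rows that are not words (negative confidence) or are empty.
        if conf < 0 or not text:
            continue
        words[int(page)].append(text)
        confs[int(page)].append(conf / 100.0)
    return {
        page: {
            "text": " ".join(words[page]),
            "confidence": sum(confs[page]) / len(confs[page]),
        }
        for page in sorted(words)
    }
```

The input would be the dict returned by pytesseract.image_to_data() when it is given a text file listing the image paths. Images with no recognized words are simply absent from the result, so the caller has to handle missing keys.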