In light of tesseract-ocr/tesseract#3560 (which describes not only how tesstrain's own CER estimation is completely off, but also why its checkpoint selection uses the wrong criterion), I would recommend not just using the "best" model picked by `make training`, but implementing your own checkpoint selection based on `make traineddata` and a subsequent (external, not lstmeval-based) CER measurement of each checkpoint on the validation subset.
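For illustration, here is a minimal sketch of what such an external checkpoint evaluation could look like. The model name `foo`, the directory layout (`data/foo/checkpoints/*.checkpoint`, validation pairs `*.png` / `*.gt.txt`), and the page segmentation mode are assumptions following common tesstrain conventions, not part of any particular setup:

```python
#!/usr/bin/env python3
"""Sketch: select the best tesstrain checkpoint by external CER on a validation set.

Assumptions (adjust to your setup):
- checkpoints live in data/foo/checkpoints/*.checkpoint
- the starter traineddata is data/foo/foo.traineddata
- validation lines are pairs <name>.png / <name>.gt.txt in data/validation/
"""
import glob
import os
import subprocess
import tempfile

MODEL = "foo"  # hypothetical model name
CHECKPOINT_DIR = f"data/{MODEL}/checkpoints"
STARTER = f"data/{MODEL}/{MODEL}.traineddata"
VALIDATION_DIR = "data/validation"


def levenshtein(a: str, b: str) -> int:
    """Plain DP edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                    # deletion
                            curr[j - 1] + 1,                # insertion
                            prev[j - 1] + (ca != cb)))      # substitution
        prev = curr
    return prev[-1]


def checkpoint_to_traineddata(checkpoint: str, out_dir: str) -> str:
    """Convert a single .checkpoint into a standalone .traineddata (as `make traineddata` does)."""
    out = os.path.join(out_dir, MODEL + ".traineddata")
    subprocess.run(["lstmtraining", "--stop_training",
                    "--continue_from", checkpoint,
                    "--traineddata", STARTER,
                    "--model_output", out],
                   check=True, capture_output=True)
    return out


def cer_on_validation(tessdata_dir: str) -> float:
    """Recognize every validation line with the given model and return the aggregate CER."""
    errors, chars = 0, 0
    for gt_file in sorted(glob.glob(os.path.join(VALIDATION_DIR, "*.gt.txt"))):
        image = gt_file.replace(".gt.txt", ".png")
        with open(gt_file, encoding="utf-8") as f:
            gt = f.read().strip()
        result = subprocess.run(["tesseract", image, "stdout",
                                 "--tessdata-dir", tessdata_dir,
                                 "-l", MODEL, "--psm", "13"],
                                check=True, capture_output=True, text=True)
        errors += levenshtein(result.stdout.strip(), gt)
        chars += len(gt)
    return errors / chars if chars else 0.0


def main():
    results = []
    for checkpoint in sorted(glob.glob(os.path.join(CHECKPOINT_DIR, "*.checkpoint"))):
        with tempfile.TemporaryDirectory() as tmp:
            checkpoint_to_traineddata(checkpoint, tmp)
            cer = cer_on_validation(tmp)
        results.append((cer, checkpoint))
        print(f"{cer:.4%}  {os.path.basename(checkpoint)}")
    best_cer, best = min(results)
    print(f"best: {os.path.basename(best)} with CER {best_cer:.4%}")


if __name__ == "__main__":
    main()
```

The key point is that the ranking comes from actual recognition output compared against held-out ground truth, not from the CER estimate that lstmtraining/lstmeval reports during training.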
Thanks for your advice! We'll do some evaluation regarding this issue, since we plan to use this model (or a model based on this workflow / training data) for the currently running digitization of historical newspapers / "Zeitungsprojekt HP II".
Now that your report is published, may I ask again about the model selection for ulbhdz1.traineddata? Was the checkpoint selected by Tesseract already the best one according to a proper external evaluation? How much did the CER results differ?