do external model/checkpoint selection #1

bertsky · 2021-10-21T09:51:36Z

In light of tesseract-ocr/tesseract#3560 (which describes not only how tesstrain's own CER estimation is completely off, but also why its checkpoint selection uses the wrong criterion), I would recommend not simply using the "best" model picked by make training. Instead, implement your own checkpoint selection based on make traineddata, followed by an external (not lstmeval-based) CER measurement of each checkpoint on the validation subset.
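To make the suggestion concrete, here is a minimal sketch of the selection step only. It assumes you have already run each checkpoint's traineddata over the validation lines and collected (OCR output, ground truth) pairs per checkpoint. The function names (`levenshtein`, `cer`, `select_best`) and the checkpoint labels are hypothetical, and the CER here is plain character-level edit distance without any Unicode normalization, so a dedicated evaluation tool may report slightly different numbers.

```python
from typing import Dict, Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca from a
                current[j - 1] + 1,            # insert cb into a
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Aggregate CER: total edits divided by total ground-truth characters."""
    edits = 0
    chars = 0
    for ocr_text, ground_truth in pairs:
        edits += levenshtein(ocr_text, ground_truth)
        chars += len(ground_truth)
    return edits / chars if chars else 0.0


def select_best(results: Dict[str, List[Tuple[str, str]]]) -> Tuple[str, float]:
    """Return the (hypothetical) checkpoint label with the lowest validation CER."""
    scores = {name: cer(pairs) for name, pairs in results.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy data standing in for real recognition output on the validation subset.
    results = {
        "checkpoint_a": [("Zeitnng", "Zeitung"), ("Hallo", "Hallo")],
        "checkpoint_b": [("Zeitung", "Zeitung"), ("Hallo", "Hallo")],
    }
    print(select_best(results))
```

Aggregating edits over all lines before dividing weights long lines more heavily than averaging per-line rates would; which of the two is appropriate depends on how you want to compare the result with other reports.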

M3ssman · 2021-10-23T09:34:59Z

Thanks for your advice! We'll do some evaluation regarding this issue, since we plan to use this model (or a model based on this workflow / training data) for the ongoing digitization of historical newspapers ("Zeitungsprojekt HP II").

bertsky · 2023-01-26T17:39:17Z

Now that your report has been published, may I ask about model selection for ulbhdz1.traineddata again? Was the checkpoint selected by Tesseract already the best one according to a true evaluator? How much did the CER results differ?
