The CIS language-aware OCR document error profiler

cisocrgroup/Profiler

Profiler

Source code for the language-aware OCR document error profiler. See the Profiler Manual for a description.

References

The profiler was originally written by Uli Reffle as part of his PhD thesis in computational linguistics at CIS during the IMPACT project (2008-2011).

It was further developed by Florian Fink at CIS as a CLARIN-D Kurationsprojekt (curation project).

Its underlying technology is described in the following publications:

Mihov, Stoyan, and Klaus U. Schulz. 2004. “Fast Approximate Search in Large Dictionaries.” Computational Linguistics 30 (4). MIT Press: 451–77.

Reffle, Ulrich. 2011. Algorithmen und Methoden zur dokumentenspezifischen Analyse historischer und OCR-erfasster Texte. Verlag Dr. Hut.

Reffle, Ulrich, and Christoph Ringlstetter. 2013. “Unsupervised Profiling of OCRed Historical Documents.” Pattern Recognition 46 (5): 1346–57. doi:http://dx.doi.org/10.1016/j.patcog.2012.10.002.

Schulz, Klaus U., and Stoyan Mihov. 2002. “Fast String Correction with Levenshtein Automata.” International Journal on Document Analysis and Recognition 5 (1). Springer: 67–85.