OCR-D/gt_structure_all

gt_structure_all

This meta-repository is a comprehensive collection of all official OCR-D Ground Truth repositories with structural annotations (i.e. only layout, but no text).

Together, these datasets make up the OCR-D Structure GT corpus: page images with their respective annotations in PAGE format. The annotations capture the structural elements of printed pages at the region level (segments are regions, not lines). The corpus comprises 25441 pages in total.

It was established as part of the DFG-funded project OCR-D.

Data repositories

Cloning the repository with submodules

git clone --recurse-submodules -j8 https://github.com/OCR-D/gt_structure_all.git
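
If the repository was already cloned without `--recurse-submodules`, the submodule directories will be empty. The following is a minimal sketch for fetching them afterwards with standard Git commands. The helper names `init_submodules` and `list_submodules` are hypothetical, and the default directory `gt_structure_all` assumes the clone was made with Git's default target name for the URL above.

```shell
# Hypothetical helper: populate the submodules of an existing clone.
# Assumes the clone lives in ./gt_structure_all unless a path is passed.
init_submodules() {
  repo_dir="${1:-gt_structure_all}"
  # --init registers the submodules listed in .gitmodules,
  # --recursive also handles nested submodules,
  # --jobs 8 fetches up to eight submodules in parallel (like -j8 above).
  git -C "$repo_dir" submodule update --init --recursive --jobs 8
}

# Hypothetical helper: list each submodule with its checked-out commit.
# In the output, a leading "-" marks a submodule that is not yet initialized.
list_submodules() {
  repo_dir="${1:-gt_structure_all}"
  git -C "$repo_dir" submodule status
}
```

For example, `init_submodules path/to/gt_structure_all` followed by `list_submodules path/to/gt_structure_all` should show no entries with a leading `-` once every submodule has been fetched.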

Zenodo

All data records are also published in Zenodo, and thus have a DOI. Whenever changes are made and a new release is created, the respective dataset will receive a new DOI.

The OCR-D datasets can be accessed in Zenodo via this search.

Text Data

If you wish to incorporate text data into these structural datasets, please use the datasets or data from the gt_structure_dtaText repository.
