I noticed that on some pages only the segments within the print space are annotated, so there are no text regions for catch-words, page numbers, headers etc. These pages have only a Border annotation and no PrintSpace element, so this seems somewhat inconsistent. Also, it only affects some pages.
This is a problem if used as structural GT to train segmentation models.
I could run an incremental segmentation to automatically "find" these segments and make a PR or visual comparison if you want.
@bertsky I encourage any kind of improvement to enhance data usability, but can you point me to an example?
I'm not sure whether I got the issue right and how something within the PrintSpace can be marked without being a TextRegion.
Here, in the footer of the page, the signature mark and page number are not annotated.
In this example, the running title in the header and the catch word in the footer are not annotated.
In both cases, there is a Border element (more or less precisely) around the physical page (as it should be), but no PrintSpace element. The latter is only required on GT level 3, but in practice a page with neither a PrintSpace element nor any segments outside of the print space (headers/footers) is difficult to use as layout training data: a model cannot tell whether the marginal text was deliberately excluded or simply left unannotated.
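To find the affected pages systematically, something like the following could be run over the PAGE-XML files. This is only a rough sketch, not a definitive check: the function name `check_page` and the set `MARGINAL_TYPES` are my own, and it assumes that the GT marks headers, footers, page numbers, catch-words and signature marks through the `type` attribute of `TextRegion`. Pages that genuinely have no such elements would be flagged as well, so the result still needs a visual check.

```python
import xml.etree.ElementTree as ET

# TextRegion @type values that usually lie outside the print space
# (assumption: the GT uses these PAGE type values for such regions).
MARGINAL_TYPES = {"header", "footer", "page-number", "catch-word", "signature-mark"}


def local(tag):
    """Strip the XML namespace so that any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def check_page(xml_text):
    """Report Border/PrintSpace presence and whether marginal regions exist."""
    root = ET.fromstring(xml_text)
    names = {local(el.tag) for el in root.iter()}
    # root.iter() also visits nested TextRegions
    region_types = {
        el.get("type") for el in root.iter() if local(el.tag) == "TextRegion"
    }
    has_marginal = bool(region_types & MARGINAL_TYPES)
    return {
        "has_border": "Border" in names,
        "has_printspace": "PrintSpace" in names,
        "has_marginal_regions": has_marginal,
        # candidate for the problem described above
        "suspicious": "PrintSpace" not in names and not has_marginal,
    }
```

Calling `check_page(open(path, encoding="utf-8").read())` for each file and collecting the ones with `suspicious` set to `True` would give a first list of pages to inspect.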