Document workflow

Document preparation

Prerequisites

To extract text and structure from a printed book you need a digitized scan of its pages, ideally at 300 dpi or higher and with little to no compression. The Transkribus server prefers JPG to high-resolution TIFF: the quality is equivalent for processing purposes and the files are smaller. If you have TIFF files, convert them to JPG without changing their dimensions and at quality 9-10 (90%-100%), i.e. with little compression.
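The TIFF-to-JPG conversion could be scripted roughly as follows. This is a minimal sketch, not part of the project: it assumes ImageMagick 7 is installed and reachable as `magick`, that each TIFF holds a single page, and the function names and the default quality of 95 are illustrative choices.

```python
import shutil
import subprocess
from pathlib import Path


def jpg_target(tiff_path: Path) -> Path:
    """Return the sibling .jpg path for a TIFF file."""
    return tiff_path.with_suffix(".jpg")


def convert_command(tiff_path: Path, quality: int = 95) -> list[str]:
    """Build an ImageMagick command that re-encodes without resizing."""
    if not 90 <= quality <= 100:
        raise ValueError("quality should stay in the 90-100 range")
    # No -resize option is passed, so the pixel dimensions stay unchanged.
    return [
        "magick",
        str(tiff_path),
        "-quality",
        str(quality),
        str(jpg_target(tiff_path)),
    ]


def convert_folder(folder: Path, quality: int = 95) -> list[Path]:
    """Convert every .tif/.tiff file in `folder`; return the JPG paths written."""
    # Assumption: ImageMagick 7 provides the `magick` executable on PATH.
    if shutil.which("magick") is None:
        raise RuntimeError("ImageMagick ('magick') was not found on PATH")
    written = []
    for tiff_path in sorted(folder.iterdir()):
        if tiff_path.suffix.lower() in {".tif", ".tiff"}:
            subprocess.run(convert_command(tiff_path, quality), check=True)
            written.append(jpg_target(tiff_path))
    return written
```

Calling `convert_folder(Path("scans"))` on a hypothetical `scans` folder would write one JPG next to each TIFF and leave the originals untouched.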

Visual analysis

To decode the structure it would be helpful to have standard references for book page structures. @liladude should create some basic structure standards for P2PLa.

Creating the project on Transkribus

Create a new document in Transkribus by uploading the images and filling in the required metadata.

Document processing

Run a printed block detection on text pages

If paragraphs are detected and placed in separate regions, continue with line detection.
If paragraphs are not placed in separate regions, either
- manually create the paragraph regions, or
- train a model with the new layout (@me: check with the team whether this is possible)

Run line detection
Recognize text (on some pages) in order to get some basic text for creating the new training

Text and character styles

normal text
italics
bold
superscript
superscript for footnote reference
superscript as footnote number
underlined
gesperrt (expanded)
antiqua in Fraktur
small caps

Paragraph styles

body text -> paragraph
running header
headings (levels?)
page number
quote
footnote
abstract
centered text
poetry
commentary
continued elements
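Once regions carry structure tags, it can help to count how many regions of each type a page contains, for example to see whether paragraphs ended up in separate regions. The sketch below assumes the exported PAGE XML stores tags in a `custom` attribute of the form `structure {type:paragraph;}`. The type names in the sample (`header`, `paragraph`) and all function names are illustrative, and values containing `;` or `}` are not handled.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Matches one "name {key:value; key:value;}" group inside a custom attribute.
CUSTOM_TAG = re.compile(r"(\w+)\s*\{([^}]*)\}")

# Hypothetical sample page; region ids and structure types are made up.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_001.jpg">
    <TextRegion id="r1" custom="readingOrder {index:0;} structure {type:header;}"/>
    <TextRegion id="r2" custom="readingOrder {index:1;} structure {type:paragraph;}"/>
    <TextRegion id="r3" custom="readingOrder {index:2;} structure {type:paragraph;}"/>
    <TextRegion id="r4" custom="readingOrder {index:3;}"/>
  </Page>
</PcGts>"""


def parse_custom(custom: str) -> list[tuple[str, dict[str, str]]]:
    """Split a custom attribute into (tag name, attributes) pairs."""
    tags = []
    for name, body in CUSTOM_TAG.findall(custom):
        attrs = {}
        for part in body.split(";"):
            if ":" in part:
                key, value = part.split(":", 1)
                attrs[key.strip()] = value.strip()
        tags.append((name, attrs))
    return tags


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def region_types(page_xml: str) -> Counter:
    """Count TextRegion elements per structure type ('untagged' if none)."""
    counts = Counter()
    for elem in ET.fromstring(page_xml).iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        kind = "untagged"
        for name, attrs in parse_custom(elem.get("custom", "")):
            if name == "structure" and "type" in attrs:
                kind = attrs["type"]
        counts[kind] += 1
    return counts


print(region_types(SAMPLE))
```

Matching on the local element name rather than a fixed namespace keeps the sketch independent of the PAGE schema version used by the export.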

Manually correct a set of pages (minimum to be set, depends on the complexity of the document)

Document export and conversion

Export the document from Transkribus in PAGE format (save it locally)
Process it with a transformation cascade from this project, which takes care of
- Transformation with Page2Tei
- Substituting tag symbols in the text with the corresponding XML tags
- Fixing notes, page numbers etc. (XSLT)
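The symbol substitution step might look like the sketch below. The symbol convention (`[[i]]`, `[[/i]]`, and so on) is hypothetical, since this document does not specify which symbols are typed during transcription, and the target `<hi rend="...">` elements are one plausible TEI encoding rather than the project's confirmed output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical symbol convention, chosen only for this sketch.
SYMBOL_TO_TAG = {
    "[[i]]": '<hi rend="italic">',
    "[[/i]]": "</hi>",
    "[[sup]]": '<hi rend="superscript">',
    "[[/sup]]": "</hi>",
}


def symbols_to_xml(text: str) -> str:
    """Escape the transcribed text, then swap tag symbols for XML tags."""
    # Escaping first protects any &, < or > typed in the transcription.
    result = escape(text)
    for symbol, tag in SYMBOL_TO_TAG.items():
        result = result.replace(symbol, tag)
    return result


def as_paragraph(text: str) -> ET.Element:
    """Wrap the converted text in <p>; raises ParseError if tags are unbalanced."""
    return ET.fromstring(f"<p>{symbols_to_xml(text)}</p>")


print(symbols_to_xml("See [[i]]Opera[[/i]] & notes[[sup]]1[[/sup]]"))
```

Parsing the result with `as_paragraph` is a cheap well-formedness check: a forgotten closing symbol surfaces as a parse error before the later XSLT steps run.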

Publish final document

Create the final TEI document for TEI Publisher (manual fixes, re-combination of parts, etc.)
Export to Word (docx) for the editorial/commentary workflow
Word processing for TEI export -> see Reto END
