USFM Parsing and Translation
John Lambert edited this page Apr 9, 2024 · 6 revisions
Serval seeks to mirror how Paratext parses and interprets USFM files, both by supporting USFM as documented and by accommodating some non-standard formats.
- Unique identifiers are generated to reference a specific text segment in a scripture text and act as a primary anchor point.
- The reference is serialized in the following format: `[verse reference]/[path element 1]/[path element 2]/...`.
- Verse references follow the standard USFM identification and naming.
- Non-verse paths are identified by `[localized instance #]:[USFM tag]`.
  - For example, the reference for the section header that occurs directly after MAT 1:1 would be represented as `MAT 1:1/1:s`.
- Positions are 1-based (the position 0 is used when a position is not specified or unknown).
- Some non-verse text segments can be nested in another element.
  - For example, a table cell might be represented as `MAT 1:1/1:tr/1:tc1`.
- Introductory material that occurs at the beginning of a book before the first verse is referenced by the `1:0` verse reference.
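The serialization format described above can be illustrated with a minimal Python sketch. The names `PathElement`, `SegmentRef`, and `parse_segment_ref` are hypothetical and are not taken from Serval's code base; the sketch only assumes that the verse reference itself never contains a `/`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PathElement:
    """One non-verse path element (hypothetical type for illustration)."""
    position: int  # 1-based; 0 means the position is unspecified or unknown
    tag: str       # USFM tag, e.g. "s", "tr", "tc1"

    def __str__(self) -> str:
        return f"{self.position}:{self.tag}"


@dataclass(frozen=True)
class SegmentRef:
    """A verse reference followed by zero or more path elements."""
    verse_ref: str                       # e.g. "MAT 1:1"
    path: Tuple[PathElement, ...] = ()

    def __str__(self) -> str:
        # [verse reference]/[path element 1]/[path element 2]/...
        return "/".join([self.verse_ref, *map(str, self.path)])


def parse_segment_ref(text: str) -> SegmentRef:
    """Split a serialized reference into its verse part and path elements."""
    verse_ref, *elements = text.split("/")
    path = []
    for element in elements:
        position, sep, tag = element.partition(":")
        if not sep or not tag:
            raise ValueError(f"malformed path element: {element!r}")
        path.append(PathElement(int(position), tag))
    return SegmentRef(verse_ref, tuple(path))


if __name__ == "__main__":
    ref = parse_segment_ref("MAT 1:1/1:tr/1:tc1")
    print(ref.verse_ref)                 # MAT 1:1
    print([str(e) for e in ref.path])    # ['1:tr', '1:tc1']
    print(str(ref))                      # MAT 1:1/1:tr/1:tc1
```

Parsing and then re-serializing a reference returns the original string, which is the property an anchor point needs.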
When projects are read in, they are put in the original versification. The source and target verse ranges are then merged with each other. All text from a verse range (or range of segments) is put on the first verse or segment of the range.
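One possible reading of the range merging is sketched below in Python. This is an illustration under stated assumptions, not Serval's implementation: the function name `merge_verse_ranges` is hypothetical, verses are plain integers within a single chapter, every range is non-empty and contiguous, and the mapping to the original versification is assumed to have already happened.

```python
from typing import List, Set, Tuple

# A row is (verses covered by the segment, segment text). Hypothetical shape.
Row = Tuple[List[int], str]


def merge_verse_ranges(source: List[Row], target: List[Row]) -> List[Tuple[int, str, str]]:
    """Merge overlapping source/target verse ranges.

    Returns (first verse, source text, target text) for each merged range,
    with all text of the range placed on its first verse.
    """
    # Union the ranges from both sides that share at least one verse.
    groups: List[Set[int]] = []
    for verses, _ in sorted(source + target, key=lambda row: row[0][0]):
        current = set(verses)
        if groups and groups[-1] & current:
            groups[-1] |= current
        else:
            groups.append(current)

    # Collect each side's text for every merged range.
    merged = []
    for group in groups:
        src = " ".join(text for verses, text in source if group & set(verses))
        trg = " ".join(text for verses, text in target if group & set(verses))
        merged.append((min(group), src, trg))
    return merged


if __name__ == "__main__":
    source = [([1, 2], "A B"), ([3], "C"), ([4], "D")]
    target = [([1], "a"), ([2, 3], "b c"), ([4], "d")]
    for row in merge_verse_ranges(source, target):
        print(row)
    # (1, 'A B C', 'a b c')
    # (4, 'D', 'd')
```

In the example, the source combines verses 1-2 and the target combines verses 2-3, so the merged range covers verses 1-3 on both sides and its text is attached to verse 1.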
Non-scripture text within the USFM structure is also translated. This includes:
- Section Headers (any USFM paragraph type)
- Footnotes, endnotes, etc.
- Tables (USFM cells)
- Note that tables and paragraphs are stripped out when they occur inside a verse (segment) or a footnote. Paragraph and table formatting is otherwise preserved.