Itstool experimental branch #303

jralls · 2023-03-27T19:51:32Z

Recreate #120

@gjanssens original description:

This branch is meant to experiment with an alternative workflow for documentation authoring and translation. In this workflow, the independent per-language documents are replaced with a single master document, and translation happens using gettext and the ITS extensions.

The single master document will serve for all languages as follows:

- Sections whose content is valid for all languages will be written in English.
- Using the proper tooling, the translatable messages will be extracted into a message catalog (pot file), which can be translated into each language we support.
- Sections that are relevant only for a single language will also be added to the master document, but marked with special ITS tags. As a result, messages from these sections will not appear in the catalog files.
- ITS also supports sections that are valid for some but not all languages, or that are not relevant for some languages.
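ITS expresses "do not translate this" through its Translate data category, written locally as an `its:translate="no"` attribute that descendants inherit. The sketch below is a simplified Python stand-in, not itstool's actual code. It shows why messages from such a section never reach the catalog: the extractor simply stops collecting text below the marked element. It treats each element's direct text as one message, which is cruder than itstool's handling of inline markup. It also does not cover how the section is then shown only in one language's output.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"  # ElementTree's name for its:translate


def extract_messages(xml_text: str) -> list[str]:
    """Collect whitespace-normalized element text, skipping subtrees marked
    its:translate="no". Simplification: each element's direct text is one message."""
    messages: list[str] = []

    def walk(elem: ET.Element, translate: bool) -> None:
        value = elem.get(TRANSLATE)
        if value is not None:  # a local its:translate overrides the inherited value
            translate = value == "yes"
        text = " ".join((elem.text or "").split())
        if translate and text:
            messages.append(text)
        for child in elem:  # descendants inherit the current setting
            walk(child, translate)

    walk(ET.fromstring(xml_text), True)
    return messages


SAMPLE = """<chapter xmlns:its="http://www.w3.org/2005/11/its">
  <title>Getting Started</title>
  <para>Shared by all languages.</para>
  <sect1 its:translate="no">
    <para>Nur für die deutsche Ausgabe.</para>
  </sect1>
</chapter>"""

print(extract_messages(SAMPLE))  # ['Getting Started', 'Shared by all languages.']
```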
This branch is working with a reduced sample book (many chapters removed) for simplicity.

It has converted this subset into a master document and has created a po file for the German language.

Note that creating this initial PO file is fairly complicated (the pot file, on the other hand, is easy). I have written a script that extracts msgids and their corresponding msgstrs from the original and translated files. The only caveat is that the XML tags must appear in exactly the same order in both documents; if they don't, the script will fail. So before using the script, any tag mismatches must be corrected manually (a smart diff tool helps find the differences, but it's still a huge one-time manual effort).
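The script itself isn't part of this description. Here is a minimal sketch of the idea, assuming both documents are available as XML strings and treating each element's direct text as one message (itstool's real message splitting is more refined). `pair_messages` and `TagMismatch` are hypothetical names. The sketch walks both trees in document order and refuses to continue when the tag sequences differ. Otherwise it pairs the whitespace-normalized texts.

```python
import xml.etree.ElementTree as ET


class TagMismatch(Exception):
    """Raised when the original and translated documents' tag sequences differ."""


def _flatten(xml_text: str) -> list[tuple[str, str]]:
    # (tag, whitespace-normalized direct text) for every element, in document order.
    return [(e.tag, " ".join((e.text or "").split()))
            for e in ET.fromstring(xml_text).iter()]


def pair_messages(original_xml: str, translated_xml: str) -> list[tuple[str, str]]:
    """Return (msgid, msgstr) pairs; hypothetical helper, not the branch's script."""
    orig, trans = _flatten(original_xml), _flatten(translated_xml)
    if [tag for tag, _ in orig] != [tag for tag, _ in trans]:
        raise TagMismatch("tag order differs: correct the XML by hand first")
    # Skip elements with no direct text (e.g. <chapter> wrapping other elements).
    return [(msgid, msgstr) for (_, msgid), (_, msgstr) in zip(orig, trans) if msgid]


EN = """<chapter>
  <title>Accounts</title>
  <para>An account
    is a container.</para>
</chapter>"""
DE = "<chapter><title>Konten</title><para>Ein Konto ist ein Behälter.</para></chapter>"

for msgid, msgstr in pair_messages(EN, DE):
    print(f"{msgid!r} -> {msgstr!r}")
```

Whitespace is normalized before comparing, so only the tag order has to match, not the line layout.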

Lastly, one thing gettext with ITS won't solve for us: incomplete translations will be filled in with English text instead of being left out, as they were in the original workflow. I'll admit that I personally prefer complete documentation that may be partially untranslated over incomplete documentation whose content is fully translated. Including untranslated content may also better expose the remaining work to potential new translators. We can still investigate how tooling or scripting could improve the experience.

Note also that the last few commits simulate large documentation changes, to evaluate how such changes affect the translatability of the master document, and of the language-specific po files in particular.

…te files. xmllint yields:
ch_basics.xml:1520: parser error : Premature end of data in tag chapter line 18
gnucash-guide.xml:497: parser error : Failure to process entity chapter2
&chapter2;
gnucash-guide.xml:497: parser error : Entity 'chapter2' not defined
&chapter2;

This is the first step toward extracting translations into a po file. When a document file has exactly the same tag structure in every language, the translatable strings and their translations can be extracted in a semi-automated way. Note that some sections exist in only one language. ITS has a mechanism to handle this, but it can only be used *after* the documents have been converted to po-based translations. So in the interim, missing sections are added with markers that can be searched for later. I am currently using these markers:

* LANG-DE: appears in the C documents to indicate that this section exists only in the German translation, not in the English original.
* UNTRANSLATED-DE: used in the German translation to indicate that this section does not exist in the translation, but does exist in the English original.
* FUZZY-DE: used in the German translation to indicate that the context of this section has changed and needs review.

Sections marked with LANG-DE or UNTRANSLATED-DE are candidates either for a special ITS marker (if the section is relevant for only one language) or for future translation. Note also that whitespace differences are not important: ITS cleans up whitespace before extracting the strings. What matters is the sequence of the tags in a file.
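Because the markers are plain strings, listing the remaining work is a simple text search. Below is a sketch that assumes the markers are written literally in the XML (here inside comments). It also generalizes the `-DE` suffix to any two-letter uppercase language code; both are assumptions, not something the branch specifies.

```python
import re
from collections import Counter

# Assumed marker syntax: LANG-XX, UNTRANSLATED-XX or FUZZY-XX (XX = language code).
MARKER_RE = re.compile(r"\b(LANG|UNTRANSLATED|FUZZY)-([A-Z]{2})\b")


def find_markers(text: str) -> list[tuple[int, str]]:
    """Return (line number, marker) for every marker found in the text."""
    return [(lineno, m.group(0))
            for lineno, line in enumerate(text.splitlines(), start=1)
            for m in MARKER_RE.finditer(line)]


def summarize(hits: list[tuple[int, str]]) -> Counter:
    """Count how often each marker occurs, e.g. to track remaining work."""
    return Counter(marker for _, marker in hits)


SAMPLE = """<sect1>
  <!-- UNTRANSLATED-DE -->
  <para>Only in the English original so far.</para>
</sect1>
<sect1><!-- FUZZY-DE --><para>Geänderter Abschnitt</para></sect1>"""

hits = find_markers(SAMPLE)
print(hits)  # [(2, 'UNTRANSLATED-DE'), (5, 'FUZZY-DE')]
print(summarize(hits))
```

Running it over every file in a language directory gives a per-file to-do list for the conversion.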

…r conversion to po

At the top level, these global targets have been defined:

- gnucash-docs-de-english.fpot, gnucash-docs-de-native.fpot, gnucash-docs-it-english.fpot, gnucash-docs-it-native.fpot, gnucash-docs-ja-english.fpot, gnucash-docs-ja-native.fpot, gnucash-docs-pt-english.fpot, gnucash-docs-pt-native.fpot, gnucash-docs-ru-english.fpot, gnucash-docs-ru-native.fpot: These all generate a po template file based on the native language (de, it, ja, pt, ru). The files with "native" in their name will have msgids in the native language. The files with "english" in their name will have English msgids to go with the native language. (For example, for the de language these fpot files will have msgids for both the guide and the help manual, while the fpot files for ru will only consider the guide.) Note that the 'f' stands for 'fake': the template files for other languages are not really meant as template files, since they have msgids in the native language instead of English. But such a fake template file can later be used to compile a proper po message catalog that includes that language's translations.
- gnucash-docs-de-english.struct, gnucash-docs-de-native.struct, ...: these rules extract the msgid order from the fpot with the same base name and store it as gnucash-docs-<lang>-<english,native>.struct. This order roughly maps to the XML tag hierarchy of the original XML files. It can be used to verify that the msgid order in the English pot file (in srcdir/po) is identical to the msgid order in the fpot file for another language. This is crucial for the later automatic generation of po files: if the msgid order doesn't match exactly, the automatic compilation can't work.
  Note that to fix misalignments, XML nodes may have to be added or combined in the original XML files (either the English ones or the translated ones, depending on the mismatch), or extracted strings may have to be harmonized (matching capitalization and punctuation within the same language so that a msgid is used the same number of times in both files).
- fpot-de, fpot-pt, ...: pseudo targets that generate the proper gnucash-docs-xy.fpot and .struct files for that language.
- de.po, it.po, ...: generate a po file in the given language, starting from gnucash-docs.pot and the fpot file for that language. Again, if the associated .struct files differ, this command prints an error and exits.

Aside from these global targets, there are similar targets per book (guide/C, help/de, ...):

- <entity>.fpot: generates an fpot file for a single entity. For example, ch_accts.fpot will generate such a file for ch_accts.xml. It also generates the associated ch_accts.struct, which again can be used to compare the same source file across languages.
- fpots: runs the *.fpot rule for each source file in the current directory.

These additional targets are provided because it's likely easier to start the msgid alignment file by file. After all files in a language have been lined up with the same files in the C sources, the global fpot file for that language can be evaluated to make sure the alignment holds when all files are parsed into one pot file.
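The .struct comparison and the de.po rule can be sketched in Python. This is not the branch's Makefile code, and the real .struct format isn't described here. The sketch therefore assumes a hypothetical "structure": the number of `#:` source references each msgid has, in file order. That captures both the msgid order and the requirement that a msgid is used the same number of times in both files. The PO reader is deliberately minimal and ignores msgctxt, plurals and obsolete entries.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    msgid: str
    refs: int  # number of "#:" source references (how often the msgid occurs)


_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unquote(token: str) -> str:
    """Convert a quoted PO string literal into its text value."""
    inner = token.strip()[1:-1]
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), inner)


def _quote(text: str) -> str:
    """Escape text for use inside a PO string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\t", "\\t"))


def read_entries(po_text: str) -> list[Entry]:
    """Minimal PO reader: msgids in file order plus their reference counts."""
    entries: list[Entry] = []
    refs = 0
    parts = None  # fragments of the msgid being read, or None
    for raw in po_text.splitlines() + [""]:  # sentinel flushes the last entry
        line = raw.strip()
        if parts is not None and line.startswith('"'):
            parts.append(_unquote(line))  # multi-line msgid continuation
            continue
        if parts is not None:
            msgid = "".join(parts)
            if msgid:  # the empty msgid is the PO header
                entries.append(Entry(msgid, refs))
            parts = None
        if line.startswith("#:"):
            refs += len(line[2:].split())
        elif line.startswith("msgid "):
            parts = [_unquote(line[len("msgid "):])]
        elif line.startswith("msgstr"):
            refs = 0  # references that follow belong to the next entry
    return entries


def structure(entries: list[Entry]) -> list[int]:
    """Hypothetical stand-in for a .struct file: occurrence count per msgid, in order."""
    return [e.refs for e in entries]


def build_po(english_pot: str, native_fpot: str) -> str:
    en, native = read_entries(english_pot), read_entries(native_fpot)
    if structure(en) != structure(native):
        raise ValueError("struct mismatch: align the XML sources first")
    # Pair the n-th English msgid with the n-th native msgid.
    # A real de.po would also need a header entry (charset etc.).
    return "\n".join(f'msgid "{_quote(e.msgid)}"\nmsgstr "{_quote(n.msgid)}"\n'
                     for e, n in zip(en, native))


EN_POT = r'''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#: ch_accts.xml:10
msgid "Accounts"
msgstr ""

#: ch_accts.xml:12 ch_accts.xml:40
msgid "Account types"
msgstr ""
'''

DE_FPOT = r'''#: ch_accts.xml:9
msgid "Konten"
msgstr ""

#: ch_accts.xml:11 ch_accts.xml:38
msgid "Kontoarten"
msgstr ""
'''

print(build_po(EN_POT, DE_FPOT))
```

Pairing by position is only safe because the structure check runs first: one extra or merged msgid on either side would shift every later translation.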

sunfish62 and others added 19 commits September 12, 2018 20:44

Bug 796855. Bringing Chapter 3 of Help into Chapter 2 of Guide. Note:…

fe7107e

… this commit required additional content be created to accommodate links, resulting in the addition of ch_configuring.xml and ch_importing.xml to the documentation source.

no message

98648ac

Fix error in file structure.

2a06b97

Replace closing sect1 tag that was mistakenly commented out in Help_c…

4ce507a

…h_GettingStarted.xml

Bug-796855 - Add lengthy text to ch_Importing from David C. that was …

32ca395

…originally slated for Help in bug 796856

Adding changes to Makefile.am, per fellen comment.

11310a4

Deleting commented out content per jralls.

7ebab26

Adding back two entity declarations to Makefile.am, per fellen

1d8f1f0

Add rule to generate a pot file based on our two documents

a065e68

Note that this commit re-adds the custom HTML entity definitions. itstool 2.0.2 needs them to parse the files because, due to a bug, it doesn't parse the DTD properly.

Merge branch 'bug-796855' of https://github.com/sunfish62/gnucash-docs …

4bb344a

…into itstool

With some adjustments to get the branch up to speed with the XInclude changes.

Reduce C books to a minimum test set

0651f65

The XML files I have kept are those needed to merge in sunfish62's PR. That should give enough context to evaluate how pot-based translations work when text is moved from one book to another.

Commit first version of pot and de.po

400b086

Merge branch 'merge-sunfish62-pr' into itstool

3b37c09

Updated pot file after merge of sunfish62's branch

8dc33e7

Add rule to update <lang>.po file

02917a9

Note that this likely obscures the rule that extracts a <lang>.po file from two fake pot files. If that happens, temporarily rename src/po/<lang>.po so the extraction can work.

de.po after merging with newest pot file

38e595c

jralls changed the title ~~Itstool~~ Itstool experimental branch Mar 27, 2023
