Consider adding more control to the way that translation is performed on a per-marker basis. #560
Comments
Issue #306 is an earlier version of this request. Since this one has more detail, let's keep this one.
Here are some ideas of how we could add more flexibility to the AI drafting.
Retain the marker and place it:
These are options for what to do with the content of the marker.
For paragraph markers we could calculate the position of the marker in the source verse and reinsert it at the same relative position (to the nearest word) in the translation. This may be a more effective solution than using word alignments, since it is easy to understand and has no chance of mixing up the order of the markers in the original.
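The relative-position idea above can be sketched roughly as follows. This is only an illustration, not SILNLP code: the function name `reinsert_markers` is hypothetical, it assumes the verse text is whitespace-tokenised, that every token starting with a backslash is a paragraph marker, and that the `\v` number has already been stripped. The sample translation is invented for the example.

```python
def reinsert_markers(source: str, translation: str) -> str:
    """Put each source marker at the same relative word position in the translation.

    Assumption: any whitespace-separated token starting with a backslash is a
    paragraph marker; everything else counts as a word.
    """
    tokens = source.split()
    words = [t for t in tokens if not t.startswith("\\")]

    # Record each marker with the fraction of source words that precede it.
    placements = []
    seen = 0
    for tok in tokens:
        if tok.startswith("\\"):
            placements.append((tok, seen / len(words) if words else 0.0))
        else:
            seen += 1

    target = translation.split()
    out = []
    cursor = 0
    for marker, fraction in placements:
        # Round to the nearest word; never move backwards, so the markers
        # always keep the order they had in the source.
        pos = max(cursor, round(fraction * len(target)))
        out.extend(target[cursor:pos])
        out.append(marker)
        cursor = pos
    out.extend(target[cursor:])
    return " ".join(out)


source = "Abraham was the father of Isaac, \\q1 Isaac the father of Jacob,"
draft = "Abraham engendró a Isaac, Isaac engendró a Jacob,"  # invented sample
print(reinsert_markers(source, draft))
```

Because the cursor only ever moves forward, two markers can end up adjacent but can never swap places, which is the property that makes this approach attractive compared with word alignments.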
Here's another option for how we deal with paragraph-level markers. It might make a good default.
There will be some challenges in implementing this. Many verses that are split over multiple paragraph style markers also have punctuation that precedes the marker in the original AND in the translation.
\v 2 Abraham was the father of Isaac,
And the translated version looks like this:
In this case the punctuation matches exactly and we should have great confidence producing
One challenge is coping with translation across different scripts. Here we need to know the relationship between Arabic script punctuation and Latin script punctuation. We will need to know that for all the script pairs. (Ulf's URoman may have this information.)
\v 2 إِبْرَاهِيمُ أَنْجَبَ إِسْحَاقَ. وَإِسْحاقُ أَنْجَبَ يَعْقُوبَ. وَيَعْقُوبُ أَنْجَبَ يَهُوذَا وَإِخْوَتَهُ.
Another is ensuring that quotation marks are not split off from the quotation even though they contain punctuation. There are also quotations that are split over multiple paragraphs:
\v 5 “Because the poor are plundered and the needy groan,
Although this looks like a simple mechanical fix, it's much more difficult to achieve than it first appears.
Two translators have indicated that matching on punctuation would be good to try first, falling back to counting words or characters and putting markers back in the same relative position in cases where we can't match on punctuation. "... assuming that would get it right most of the time. Moving an occasional marker over a word or two isn't a big deal but inserting all the paragraph and quote marks manually is a significant amount of time."
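A minimal sketch of that "punctuation first, relative position as fallback" strategy is below. It is an illustration under several assumptions, not an existing SILNLP function: `place_markers` and `PUNCT` are hypothetical names, both texts use the same Latin punctuation, a punctuation match only counts when the whole sequence of word-final punctuation is identical in source and translation, and closing quotation marks after punctuation are not handled. The sample translations are invented.

```python
PUNCT = ",.;:!?"  # assumption: same-script, Latin punctuation only


def punct_breaks(words):
    """Return (word_count_up_to_break, punctuation_char) for words ending in punctuation."""
    return [(i + 1, w[-1]) for i, w in enumerate(words) if w[-1] in PUNCT]


def place_markers(source: str, translation: str) -> str:
    tokens = source.split()
    src_words = [t for t in tokens if not t.startswith("\\")]
    tgt_words = translation.split()

    src_breaks = punct_breaks(src_words)
    tgt_breaks = punct_breaks(tgt_words)
    # Only trust punctuation when the sequences agree exactly.
    if [p for _, p in src_breaks] == [p for _, p in tgt_breaks]:
        src_to_tgt = {s: t for (s, _), (t, _) in zip(src_breaks, tgt_breaks)}
    else:
        src_to_tgt = {}

    out, cursor, seen = [], 0, 0
    for tok in tokens:
        if not tok.startswith("\\"):
            seen += 1
            continue
        if seen in src_to_tgt:
            # The marker follows a punctuated word: reuse the matching break.
            pos = src_to_tgt[seen]
        else:
            # Fallback: same relative position, to the nearest word.
            pos = round(seen / len(src_words) * len(tgt_words)) if src_words else 0
        pos = max(cursor, pos)  # keep markers in source order
        out.extend(tgt_words[cursor:pos])
        out.append(tok)
        cursor = pos
    out.extend(tgt_words[cursor:])
    return " ".join(out)


source = "Abraham was the father of Isaac, \\q1 Isaac the father of Jacob,"
print(place_markers(source, "Abraham fathered Isaac, and in turn Isaac later became the father of Jacob,"))
print(place_markers(source, "Abraham fathered Isaac and Isaac fathered Jacob"))
```

In the first call the punctuation lines up, so the marker lands directly after the matching comma even though the translation is much longer than the source. In the second call there is no punctuation to match, so the marker is placed by word count and may be off by a word or so, which is the trade-off the translators describe.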
It would be very helpful to have finer control over the way that SILNLP produces translations. It would be ideal to be able to specify what should happen with the data in each marker or group of markers. The translate_config.yml file might be a good place to configure these settings.
There are these actions that could be considered possible for Paragraph style markers:
Delete: Ignore the marker and its data and omit it from the output.
Translate: Copy the marker to the output along with the translation of its content.
Copy: Copy the marker and text to the output verbatim. Do not attempt to translate - useful for references for example.
Other actions are possible for Character style markers:
Translate without marker: Extract the text from the marker and translate it, but don't add the marker to the output.
Translate and move marker: Extract the text from the marker and translate it. Add the marker and end marker to the output.
This would have the option of adding the marker and end marker together at either the beginning or the end of the paragraph, or of adding the marker at the beginning of the paragraph and the end marker at the end.
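To make the proposed actions concrete, here is a small sketch of how they might be modelled. Everything in it is hypothetical: the `Action` names, the `apply_action` helper, and the `translate` callback are illustrations, not part of SILNLP or of translate_config.yml. "Translate and move marker" is left out because it needs the whole paragraph, not just the marker's own text.

```python
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    DELETE = "delete"                                       # omit marker and its data
    TRANSLATE = "translate"                                 # keep marker, translate content
    COPY = "copy"                                           # keep marker and text verbatim
    TRANSLATE_WITHOUT_MARKER = "translate_without_marker"   # translate content, drop marker


def apply_action(
    action: Action,
    marker: str,
    text: str,
    translate: Callable[[str], str],
) -> Optional[str]:
    """Return the output for one marker and its text, or None if it is dropped."""
    if action is Action.DELETE:
        return None
    if action is Action.COPY:
        return f"\\{marker} {text}"
    translated = translate(text)
    if action is Action.TRANSLATE:
        return f"\\{marker} {translated}"
    return translated  # TRANSLATE_WITHOUT_MARKER


# str.upper stands in for a real translation model in this illustration.
print(apply_action(Action.COPY, "r", "(Luke 3:23-38)", str.upper))
print(apply_action(Action.TRANSLATE, "s1", "The genealogy", str.upper))
```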
Every marker has a \StyleType which is one of: Paragraph, Character, Milestone or Note. It might (or might not) be useful to be able to apply one action to all those markers with a specific \StyleType. Although this would likely not be very useful for the Paragraph or Character Styles which are widely used, it could be useful as a way to decide what should happen with Notes and Milestones.
Most, but not all, markers have a \TextType which is one of: Title, ChapterNumber, VerseNumber, VerseText, Other, NoteText, Section.
It might be useful to be able to apply one action to all those markers with a specific \TextType.
According to the USFM Reference, markers that are used within a broader text "environment" are named using a reserved initial letter rather than an opening and closing tag.
In other words, the markers beginning with \i form the introduction, all those beginning with \f belong to a given footnote, etc.
Ideally, we would be able to specify what happens to these as a group without having to specify what happens to each individual marker within the group.
\i - Introductions
\f - Footnotes
\x - Cross references
\e - Explanatory (study) material
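One possible way to combine per-marker, per-group, \TextType and \StyleType settings is a simple precedence lookup, sketched below. The config layout, the key names (`markers`, `groups`, `text_types`, `style_types`, `default`) and the function `resolve_action` are all assumptions made for illustration; none of them exist in translate_config.yml today. Note that a plain initial-letter match is too coarse on its own: character markers such as `\it` and `\em`, or `\fig`, share a first letter with a group without belonging to it, so in this sketch explicit per-marker entries take priority over groups.

```python
DEFAULT_ACTION = "translate"


def resolve_action(marker, config, style_type=None, text_type=None):
    """Pick the action for a marker: marker > group > TextType > StyleType > default."""
    markers = config.get("markers", {})
    if marker in markers:
        return markers[marker]

    # Group match on the reserved initial letter (e.g. "f" covers ft, fr, fq).
    for prefix, action in config.get("groups", {}).items():
        if marker.startswith(prefix):
            return action

    text_types = config.get("text_types", {})
    if text_type in text_types:
        return text_types[text_type]

    style_types = config.get("style_types", {})
    if style_type in style_types:
        return style_types[style_type]

    return config.get("default", DEFAULT_ACTION)


# Hypothetical settings, written as the dict a YAML loader might produce.
config = {
    "markers": {"it": "translate", "em": "translate", "fig": "delete"},
    "groups": {"i": "copy", "f": "delete", "x": "delete"},
    "style_types": {"Milestone": "copy"},
    "default": "translate",
}

print(resolve_action("ft", config, style_type="Character", text_type="NoteText"))
print(resolve_action("it", config, style_type="Character", text_type="VerseText"))
print(resolve_action("p", config, style_type="Paragraph", text_type="VerseText"))
```

A real implementation would probably take group membership from the stylesheet rather than from the first letter alone.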