Replies: 5 comments
-
Trying to find a way to implement this in a clean way, I have difficulty with our
--- a/ocrd_models/ocrd_models/ocrd_page_generateds.py
+++ b/ocrd_models/ocrd_models/ocrd_page_generateds.py
@@ -14387,7 +14387,7 @@ class TextRegionType(RegionType):
already_processed.add('secondaryLanguage')
self.secondaryLanguage = value
self.validate_LanguageSimpleType(self.secondaryLanguage) # validate type LanguageSimpleType
- value = find_attr_value_('primaryScript', node)
+ value = find_attr_value_('primaryScript', node) or self.parent_object_.primaryScript
if value is not None and 'primaryScript' not in already_processed:
already_processed.add('primaryScript')
self.primaryScript = value
So instead of copying the existing |
-
If you can boil this down to a single |
-
It can be a single
The problem with copying the whole generated |
-
Gotcha, yes, a
I tend to check the |
-
I don't see the point in moving this to a discussion. We don't have multiple threads here yet, and we were in the middle of finishing the PR that fixes it. (Part of the relevant discussion is in that PR now, BTW.) In my understanding, GH discussions should be used for overly complex problems (multi-issues), debated features/functionality (non-issues), or even questions (not-yet-issues).
-
PAGE-XML features an implicit inheritance relation between various elements of the hierarchy:
Page/TextStyle → TextRegion*/TextStyle → TextLine/TextStyle → Word/TextStyle → Glyph/TextStyle
TextRegion*/@production → TextLine/@production → Word/@production → Glyph/@production
Page/@primaryScript → TextRegion*/@primaryScript → TextLine/@primaryScript → Word/@primaryScript → Glyph/@script
Page/@secondaryScript → TextRegion*/@secondaryScript → TextLine/@secondaryScript → Word/@secondaryScript → Glyph/@script
Page/@primaryLanguage → TextRegion*/@primaryLanguage → TextLine/@primaryLanguage → Word/@language
Page/@secondaryLanguage → TextRegion*/@secondaryLanguage → TextLine/@secondaryLanguage → Word/@language
Page/@readingDirection → TextRegion*/@readingDirection → TextLine/@readingDirection → Word/@readingDirection
Page/@textLineOrder → TextRegion*/@textLineOrder
These relations are only documented and cannot be implemented automatically in a generated DOM. But their semantics are important, and writing processors would be much easier if they were implemented.
For example, if I want to know whether the current segment belongs to a certain script, I'd currently have to check its own @script (on Glyph) or @primaryScript/@secondaryScript, then (if unset) the @primaryScript of its parent, and so on up the hierarchy.
This is very hard to achieve with XPath (because disjunctions/unions are only possible on node-sets, not on predicates). And with the DOM, it requires a lot of code each time.
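To illustrate the fallback lookup described above, here is a minimal sketch of how it could be factored out once instead of being re-coded in every processor. It assumes plain Python with the standard-library xml.etree.ElementTree rather than the generated ocrd_page_generateds classes. The CHAINS table is transcribed from part of the inheritance chains listed above; parent_map and effective_value are hypothetical helper names, not part of OCR-D.

```python
import xml.etree.ElementTree as ET  # stdlib; ocrd_page_generateds is not used here

# Attribute name per hierarchy level, transcribed from (part of) the chains above.
# A level missing from a feature's dict does not take part in that chain.
CHAINS = {
    "primaryScript": {
        "Page": "primaryScript", "TextRegion": "primaryScript",
        "TextLine": "primaryScript", "Word": "primaryScript", "Glyph": "script",
    },
    "primaryLanguage": {
        "Page": "primaryLanguage", "TextRegion": "primaryLanguage",
        "TextLine": "primaryLanguage", "Word": "language",
    },
    "production": {
        "TextRegion": "production", "TextLine": "production",
        "Word": "production", "Glyph": "production",
    },
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parent_map(root):
    """ElementTree has no parent pointers, so precompute child -> parent once."""
    return {child: parent for parent in root.iter() for child in parent}


def effective_value(elem, parents, feature):
    """Hypothetical helper: return the nearest explicitly set value of `feature`,
    starting at `elem` and walking up; levels outside the chain are skipped."""
    chain = CHAINS[feature]
    while elem is not None:
        attr = chain.get(local_name(elem.tag))
        if attr is not None and elem.get(attr) is not None:
            return elem.get(attr)
        elem = parents.get(elem)
    return None
```

With this, effective_value(glyph, parents, "primaryScript") returns the Glyph's own @script if set, otherwise the nearest @primaryScript of its Word, TextLine, TextRegion or Page. Nested TextRegions need no special handling because the walk simply continues upwards. Secondary script/language and reading direction would be further entries in CHAINS.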
But we could facilitate this by simply propagating all inherited features during .build(), in a patched ocrd_page_generateds. We already have the user methods mechanism for patching, and we could simply use buildChildren to propagate all of the above attributes (as a bottom-up post-hook), because attributes of parents are built before those of children.
But for TextStyle, it's more complicated: on all hierarchy levels except the Page level, TextStyle sorts after the logical children and thus is only built after they are built. Also, one would need to unify style attributes between levels (we usually have True, False and None, so True/False from parents replaces None in children).
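The propagation could look roughly like the following sketch. It is an illustration under stated assumptions, not the proposed patch: instead of hooking into .build()/buildChildren of ocrd_page_generateds, it runs as a separate pass over an xml.etree.ElementTree tree after parsing. That sidesteps the TextStyle ordering problem, because the whole tree (including each level's TextStyle, wherever it sorts) is already available. CHAINS, STYLE_FLAGS, propagate_attributes and effective_styles are hypothetical names, and only a subset of the chains and of the boolean TextStyle attributes is covered.

```python
import xml.etree.ElementTree as ET  # used to parse documents before calling these helpers

# Subset of the inheritance chains: hierarchy level -> attribute name on that level.
CHAINS = {
    "primaryScript": {
        "Page": "primaryScript", "TextRegion": "primaryScript",
        "TextLine": "primaryScript", "Word": "primaryScript", "Glyph": "script",
    },
    "readingDirection": {
        "Page": "readingDirection", "TextRegion": "readingDirection",
        "TextLine": "readingDirection", "Word": "readingDirection",
    },
}
SEGMENTS = ("Page", "TextRegion", "TextLine", "Word", "Glyph")
# A few of the boolean TextStyle attributes (XML "true"/"false"; absent = None).
STYLE_FLAGS = ("bold", "italic", "underlined", "smallCaps")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def propagate_attributes(root):
    """Hypothetical pass: copy each unset inheritable attribute from the
    nearest ancestor that takes part in the same chain (modifies the tree)."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for elem in root.iter():  # pre-order, so ancestors are already filled in
        level = local_name(elem.tag)
        for chain in CHAINS.values():
            attr = chain.get(level)
            if attr is None or elem.get(attr) is not None:
                continue
            anc = parents.get(elem)
            while anc is not None and local_name(anc.tag) not in chain:
                anc = parents.get(anc)
            if anc is not None:
                value = anc.get(chain[local_name(anc.tag)])
                if value is not None:
                    elem.set(attr, value)


def effective_styles(root):
    """Hypothetical pass: map each segment to its effective TextStyle flags,
    where True/False from the nearest ancestor replaces None in the child.
    The merged result is returned, not written back into the XML."""
    result = {}

    def visit(elem, inherited):
        if local_name(elem.tag) in SEGMENTS:
            # The segment's own TextStyle may sort after its children; that is
            # irrelevant here because the whole element is already parsed.
            own = next((c for c in elem if local_name(c.tag) == "TextStyle"), None)
            style = dict(inherited)
            for flag in STYLE_FLAGS:
                raw = own.get(flag) if own is not None else None
                if raw is not None:
                    style[flag] = raw in ("true", "1")
            result[elem] = style
            inherited = style
        for child in elem:
            visit(child, inherited)

    visit(root, {flag: None for flag in STYLE_FLAGS})
    return result
```

Doing this inside .build() instead would avoid a second traversal, but for TextStyle it would have to be deferred until the element's own TextStyle child has been built, which is exactly the ordering problem described above.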