XML predicate mapping repeating child elements getting concatenated if reference includes concatenation #235
Comments
Thanks for the very detailed bug report! I'm afraid this is an old RML spec issue: the spec underspecifies how to handle multi-valued references, which sometimes gives very weird results like the ones you've detailed here, e.g. in combination with rr:template or a function. We're working on improving the new version of the spec and on a more global solution using the Logical Views extension, with a PoC implementation available (and a paper being presented next month); however, that's all still in alpha stage. So, there are actually 3 paths that can be taken in parallel, I think:
We'll check when we can dedicate some time to this bug report, but as you can imagine, as an academic institution we always have to find a balance with our research roadmaps and paid projects. If this is really blocking you, feel free to reach out at [email protected] to see how we can prioritize this!
Thank you for the swift response @bjdmeest! It already helps a lot to know that I'm not (likely) making a mistake somewhere. I understand that offering a resolution is not always possible, which is totally fine. We will reach out if it indeed turns out to be a blocker.
There are potentially other solutions depending on the use case, e.g. in our case it was originally related to a lookup based on modified source values, but we decided to encode certain values in the lookup table as a workaround instead, so that we need not modify the reference.
Otherwise, I took a look again at the Logical Views extension, which I did check out briefly once before for tabular lookups. However, I don't see XML as a supported source format in the reference/PoC implementation, and I also think it attacks a different problem.
Nevertheless, I took the liberty to try and figure out where in the code this is likely happening. It appears to be an issue with the dataio library's XMLRecord.get() implementation as called in ReferenceExtractor::extract(). Trying to reproduce the issue
instead of
or in the case of a plain reference:
It could very well be that the concatenation causes unexpected behaviour in the evaluation of the XPaths (using Saxon?), as a direct concat on repeating elements would otherwise raise an error of the form:
Tested using:
But we are not getting an error in the mapping itself, just unexpected concatenation, which indicates that the function works but is being evaluated on the entire set of XPath query results.
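To make the distinction concrete, here is a minimal Python sketch of the two evaluation strategies: applying the concatenation once per repeating child versus once over the whole result set. It does not use rmlmapper or Saxon. The XML, the element values, the `[`/`]` prefix and suffix, and both helper functions are hypothetical stand-ins rather than the snippets from this report, and joining the values without a separator is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; not the XML from the original report.
XML = """
<Organizations>
  <Organization id="org-1">
    <Name>Alpha</Name>
    <Name>Beta</Name>
  </Organization>
  <Organization id="org-2">
    <Name>Gamma</Name>
  </Organization>
</Organizations>
"""

PREFIX, SUFFIX = "[", "]"  # hypothetical prefix and suffix


def concat_per_value(org):
    """Apply the concatenation once per repeating child (the expected outcome)."""
    return [f"{PREFIX}{name.text}{SUFFIX}" for name in org.findall("Name")]


def concat_over_all(org):
    """Apply the concatenation once over all children together.

    Joining without a separator is an assumption made for illustration.
    """
    joined = "".join(name.text for name in org.findall("Name"))
    return [f"{PREFIX}{joined}{SUFFIX}"]


root = ET.fromstring(XML)
first_org = root.find("Organization")

per_value = concat_per_value(first_org)
over_all = concat_over_all(first_org)

print(per_value)  # one value per Name element
print(over_all)   # a single value built from every Name element
```

The first list has one entry per child element, which matches what a plain reference yields; the second collapses them into one value, which resembles the behaviour described above.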
I realized after all that we also have template, which works:
So, this is a very valid alternative for simple cases not involving other XPath expressions (added as a workaround in the original post).
Extract languageMaps into their own TriplesMaps as RML is "underspecified" for multi-valued properties, leading to odd behaviour in beyond-basic use cases (in this case language code transformation via CSV lookup function). Effectively fixes gh-66 but implements an unintuitive mapping approach (predicate at its own iterator). See also RMLio/rmlmapper-java#235.
Affects:
- ChangeInformation
- ContactPoint
- Lot
- LotGroup
- Organization
- Procedure
- BT-75-Lot versioned, with fix for version range and dupe conflict
- BT-772-Lot versioned
ChangeInformation tested with 673305-2023.xml. Many of the Lot fields could not be tested due to unavailability of data (not easily searchable or requirements expressible through currently available means).
The change for BT-75-Lot fixes the situation where there was a redundant 1.4+ mapping in the common Lot RML file, and a mistaken version annotation for 1.3-1.3 (being marked as min 1.4). Because later versions are less restrictive, this cannot easily be caught, but was otherwise wrong (the common would override).
Additionally, remove some conflicting dupe references in Procedure:
- BT-01(d)-Procedure
- BT-1351-Procedure
This may or may not have led to undefined behaviour in the mapping (there was no change in outputs, so hopefully this had no impact).
Extract languageMaps into their own TriplesMaps as RML is "underspecified" for multi-valued properties, leading to odd behaviour in beyond-basic use cases (in this case language code transformation via CSV lookup function). Effectively fixes gh-66 but implements an unintuitive mapping approach (predicate at its own iterator). See also RMLio/rmlmapper-java#235.
Affects:
- Contract
- Procedure CAN
- BT-554-Tender versioned
Environment
rmlmapper v6.5.1 (reproducible also as far back as v6.1.3)
Linux/WSL2
Java 17, 11
Namespaces
Problem
Given the following kind of input XML with two Organization elements, where the first has two child Name elements:
and the following kind of RML mapping involving a custom concatenated value in the source reference:
Actual
Results in an unexpected output of the first resource's name concatenating the repeating values in between the prefix and suffix, instead of multiple comma-separated RDF/Turtle values:
Expected
Should result in multiple comma-separated values mapped from the XML child elements, adhering to the condition of the reference:
Workaround
Template bypassing XPath expressions
This is perhaps the closest thing to an actual solution (if you don't need additional XPath complexity):
producing the correct result:
Plain reference with out-of-band strategies
One could skip the concatenated reference altogether and replicate the desired outcome with something external, e.g. (custom) functions, or simply a lookup in a mapping table via a parentTriplesMap.
Removing the concatenation obviously makes it work:
resulting in:
Reoriented iterator
Using an iterator on the child element which repeats but creating the subject using the ancestor element appears to work:
However, this is unintuitive and convoluted. The correct solution would be if repeating child elements were also repeated as values for a predicateObjectMap, as they normally are with a plain reference (or template).
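The reoriented-iterator idea can be sketched outside RML as follows: iterate over the repeating child and derive the subject from its ancestor, so each child yields its own value. This is only an illustration in Python, not the mapping from this report. The XML, the `id` attribute, the `http://example.org/org/` base IRI, the `name` predicate label and the `[`/`]` prefix and suffix are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; not the XML from the original report.
XML = """
<Organizations>
  <Organization id="org-1">
    <Name>Alpha</Name>
    <Name>Beta</Name>
  </Organization>
  <Organization id="org-2">
    <Name>Gamma</Name>
  </Organization>
</Organizations>
"""

root = ET.fromstring(XML)

# ElementTree elements carry no parent pointer, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}

triples = []
for name in root.iter("Name"):          # iterate over the repeating child
    org = parent_of[name]               # the subject comes from the ancestor element
    subject = f"http://example.org/org/{org.get('id')}"  # hypothetical IRI pattern
    value = f"[{name.text}]"            # the concatenation sees exactly one value
    triples.append((subject, "name", value))

for triple in triples:
    print(triple)
```

Because the iteration unit is the child element, the concatenation never receives more than one value at a time, at the cost of splitting one logical resource across two iterators.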
MWE
rml-mwe-concat-multivalue.zip (excludes template example)
Context
This may or may not be related to #227 #228.