Unreproducible results for type values from CSV vocabularies with empty first column #227
Comments
Hi @schivmeister, thanks for the detailed issue.
Hi @DylanVanAssche thanks for looking! That's the thing - the input data is the same, the version is the same, but we are seeing different results! The attached MWEs show exactly this. The expected result was the one we last generated. The MWE will produce new output that is different. So, we were wondering if you might have any clue as to what else it could be. It requires a bit of time investment in following the MWE.
I already had a look at the MWE, but I fail to understand which data was the 'old' data and which is the 'new' data.
@DylanVanAssche sorry about the confusion. The file The scope of the MWE is to generate the "new" output Given this context, let me know if the MWE then is still hard to follow. We'll attempt to minimize whatever complexity still remains.
We are in the process of preparing simpler MWEs for reporting the discovered causes as new tickets. Perhaps that will allow us to better comprehend these issues, and lead the way to finding the root cause of the behaviour described here (inability to reproduce specific prior results). We will start with the potential cause identified in #226, as that is more pressing at this time.
Background
RMLMapper is a tool used as part of another library (basically a wrapper around `rmlmapper` and other tools) by Meaningfy to aid in the mapping of OP TED notices from XML to RDF. However, the results of a transformation run by the mapping team at Meaningfy in July 2023 can no longer be reproduced in November 2023, despite using the same version of `rmlmapper`, v6.1.3.
One of these potential regressions relates to the introduction, among some of the data, of properties called `epo:hasBuyerLegalType` and `epo:hasMainActivityType`, which are themselves related to the corresponding object/value data vocabularies buyer_legal_type.csv and main_activity.csv, respectively. Help is now sought to determine what the root cause for this behaviour could be.
Problem
Expected
No occurrence of `epo:hasBuyerLegalType` or `epo:hasMainActivityType` in the resulting RDF data, wherever there is no XML element mapping in the object/value reference data vocabulary (empty first column).
Actual
Occurrences of `epo:hasBuyerLegalType` and `epo:hasMainActivityType` in the resulting RDF data with unexpected values, wherever there is no XML element mapping in the object/value reference data vocabulary (empty first column).
Observations
It was later found that the issue occurs in cases where the above-cited CSV vocabulary file has an empty cell value (no XML element and therefore no mapping to be expected). Placing a hyphen (`-`) or a white space in place of the empty first cells appears to fix this. However, this is unexpected, as the previous transformation in July 2023 did not exhibit this behaviour, and there were no such occurrences. It is uncertain if this relates in any way to #140.
MWE
As the transformation involves multiple RML files/modules, and it is not useful to prepare a very minimal example without all the contextual data, a reproduction test suite (a mostly-minimal working example) is attached to this ticket. It also contains the MWE for another potential regression, #226, identified alongside this one.
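For anyone running the suite, the Expected/Actual condition described under Problem could be checked with something like the sketch below. It is only an illustration and not part of the attached MWE: the function name and the sample triples are hypothetical, and it assumes the output is a line-oriented text serialization in which each triple that uses one of the two properties mentions its local name on that line.

```python
from collections import Counter

# Local names of the two properties named in this issue. Matching on the
# local name is an assumption so that both prefixed (epo:...) and full-IRI
# serializations are covered.
PROPERTIES = ("hasBuyerLegalType", "hasMainActivityType")


def count_property_lines(rdf_text: str) -> Counter:
    """Count the lines of serialized RDF that mention each property."""
    counts = Counter({name: 0 for name in PROPERTIES})
    for line in rdf_text.splitlines():
        for name in PROPERTIES:
            if name in line:
                counts[name] += 1
    return counts


# Hypothetical output fragment, for illustration only (not taken from the MWE).
sample = """\
<http://example.org/buyer/1> epo:hasBuyerLegalType <http://example.org/type/x> .
<http://example.org/buyer/1> <http://example.org/p> "Example" .
"""
print(count_property_lines(sample))
```

Running this on the July 2023 output and on the newly generated output should show whether the two properties appear only in the latter.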
mfy-rml-mwe.zip
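The workaround noted under Observations (putting a hyphen in place of the empty first cells) could be applied to a vocabulary file with a small helper along these lines. This is a minimal sketch, not something shipped with the MWE: the function name and the column names in the sample are hypothetical, and it assumes a plain comma-separated file where only exactly-empty first cells need replacing.

```python
import csv
import io


def fill_empty_first_column(csv_text: str, placeholder: str = "-") -> str:
    """Return the CSV text with empty first-column cells set to a placeholder."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Only touch rows whose first cell is exactly empty; all other
        # cells and rows are written back unchanged.
        if row and row[0] == "":
            row[0] = placeholder
        writer.writerow(row)
    return out.getvalue()


# Hypothetical vocabulary fragment; the real column names may differ.
sample_csv = "xml_element,code\n,value-a\nelem-b,value-b\n"
print(fill_empty_first_column(sample_csv))
```

This only masks the symptom; it does not explain why the July 2023 run produced no such occurrences with the empty cells in place.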