Unreproducible results for type values from CSV vocabularies with empty first column #227

schivmeister · 2023-11-13T19:12:02Z

Background

RMLMapper is used by Meaningfy as part of another library (essentially a wrapper around rmlmapper and other tools) to help map OP TED notices from XML to RDF. However, the results of a transformation run by the Meaningfy mapping team in July 2023 can no longer be reproduced in November 2023, despite using the same rmlmapper version, v6.1.3.

One of these potential regressions is the appearance, in some of the data, of the properties epo:hasBuyerLegalType and epo:hasMainActivityType, whose object values come from the vocabularies buyer_legal_type.csv and main_activity.csv, respectively. We are seeking help to determine the root cause of this behaviour.

Problem

Expected

No occurrence of epo:hasBuyerLegalType or epo:hasMainActivityType in the resulting RDF data, wherever there is no XML element mapping in the object/value reference data vocabulary (empty first column).

Actual

Occurrences of epo:hasBuyerLegalType and epo:hasMainActivityType in the resulting RDF data with unexpected values, wherever there is no XML element mapping in the object/value reference data vocabulary (empty first column).

Observations

We later found that the issue occurs where the CSV vocabulary files cited above have an empty first cell (no XML element, and therefore no mapping to be expected). Placing a hyphen - or a whitespace character in place of the empty first cells appears to fix this. However, this is unexpected, as the previous transformation in July 2023 did not exhibit this behaviour, and there were no such occurrences. It is uncertain whether this relates in any way to #140.
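
To make the workaround concrete, here is a minimal Python sketch of how rows with an empty first cell could be located and filled with a placeholder before running the mapping. It is only an illustration: the function names, the column names (`xml_code`, `uri`) and the example.org URIs are hypothetical and are not taken from the actual vocabulary files or from RMLMapper.

```python
import csv
import io


def find_empty_first_cells(csv_text):
    """Return 1-based row numbers (the header is row 1) whose first cell is empty."""
    reader = csv.reader(io.StringIO(csv_text))
    return [n for n, row in enumerate(reader, start=1) if row and row[0] == ""]


def fill_empty_first_cells(csv_text, placeholder="-"):
    """Return the CSV text with every empty first cell replaced by `placeholder`."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Blank lines come back as empty lists; leave them untouched.
        if row and row[0] == "":
            row = [placeholder] + row[1:]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical vocabulary content; the real files have their own columns.
SAMPLE = (
    "xml_code,uri\n"
    "BODY_PUBLIC,http://example.org/body-pl\n"
    ",http://example.org/unmapped\n"
)

empty_rows = find_empty_first_cells(SAMPLE)
patched = fill_empty_first_cells(SAMPLE)
```

Running the first function on buyer_legal_type.csv and main_activity.csv should show which rows are affected; the second applies the hyphen workaround described above without touching any other cell.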

MWE

As the transformation involves multiple RML files/modules, and a very minimal example without all the contextual data would not be useful, a reproduction test suite (a mostly-minimal working example) is attached to this ticket. It also contains the MWE for another potential regression, #226, identified alongside this one.

mfy-rml-mwe.zip

DylanVanAssche · 2023-11-13T20:00:13Z

Hi @schivmeister ,

Thanks for the detailed issue.
You mention that both executions used the same version of RMLMapper, so I'm not sure this is a bug in RMLMapper; the same goes for #226. If the input data differed between the two executions, the results would indeed not be the same.
Empty values should be ignored by RMLMapper.

schivmeister · 2023-11-13T20:09:48Z

Hi @DylanVanAssche thanks for looking! That's the thing - the input data is the same, the version is the same, but we are seeing different results! The attached MWEs show exactly this.

The expected result was the one we last generated. The MWE will produce new output that is different. So, we were wondering if you might have any clue as to what else it could be. It requires a bit of time investment in following the MWE.

DylanVanAssche · 2023-11-13T20:12:21Z

I already had a look at the MWE, but I fail to understand which data is the 'old' data and which is the 'new' data.
I would expect the MWE to have 2 versions then, one from July and one from November?
Maybe I missed it :)

schivmeister · 2023-11-14T07:45:57Z

@DylanVanAssche sorry about the confusion. The file expected.ttl is the "old" output from July. The rest of the files (the XML, RMLs, CSVs and JSONs) are all the original files used to generate that output TTL.

The purpose of the MWE is to generate the "new" output actual.ttl from exactly these old resources, so that the tester can follow and compare how it comes about, both with and without applying the discovered workarounds.

Given this context, let me know if the MWE is still hard to follow. We'll attempt to reduce whatever complexity still remains.
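
As a rough aid for that comparison, the following Python sketch counts how often the two properties appear in the expected and actual Turtle text and reports any difference. It is a naive textual count, not an RDF-aware comparison, and it assumes the output is serialized with the `epo:` prefix; the function names, the sample triples and the example.org namespace are hypothetical.

```python
import re

PREDICATES = ("epo:hasBuyerLegalType", "epo:hasMainActivityType")


def count_predicates(ttl_text, predicates=PREDICATES):
    """Count textual occurrences of each prefixed name in a Turtle string."""
    return {
        p: len(re.findall(re.escape(p) + r"\b", ttl_text))
        for p in predicates
    }


def diff_predicate_counts(expected_ttl, actual_ttl, predicates=PREDICATES):
    """Return {predicate: (expected_count, actual_count)} where the counts differ."""
    expected = count_predicates(expected_ttl, predicates)
    actual = count_predicates(actual_ttl, predicates)
    return {
        p: (expected[p], actual[p])
        for p in predicates
        if expected[p] != actual[p]
    }


# Hypothetical snippets standing in for expected.ttl and actual.ttl.
EXPECTED = (
    "@prefix epo: <http://example.org/epo#> .\n"
    "<http://example.org/org1> a epo:Organisation .\n"
)
ACTUAL = EXPECTED + (
    "<http://example.org/org1> epo:hasBuyerLegalType <http://example.org/x> .\n"
)

differences = diff_predicate_counts(EXPECTED, ACTUAL)
```

In practice the two strings would be read from expected.ttl and actual.ttl in the MWE. An empty result would mean the two properties occur equally often in both files, which says nothing about whether their values match.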

schivmeister · 2023-11-14T16:11:50Z

We are in the process of preparing simpler MWEs for reporting the discovered causes as new tickets. Perhaps that will allow us to better comprehend these issues, and lead the way to finding the root cause of the behaviour described here (inability to reproduce specific prior results). We will start with the potential cause identified in #226, as that is more pressing at this time.

schivmeister mentioned this issue Nov 13, 2023

Unreproducible results for repeated XML NUTS elements with JSON vocabulary mapping #226

Open

schivmeister changed the title ~~Unexpected properties with type values since v6.1.3~~ Unreproducible results for values from CSV vocabularies with empty first column Nov 14, 2023

schivmeister changed the title ~~Unreproducible results for values from CSV vocabularies with empty first column~~ Unreproducible results for type values from CSV vocabularies with empty first column Nov 14, 2023

DylanVanAssche added the bug label Mar 15, 2024

schivmeister mentioned this issue Apr 17, 2024

XML predicate mapping repeating child elements getting concatenated if reference includes concatenation #235

Open
