Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate statement ids #166

Open
mkroetzsch opened this issue Aug 26, 2015 · 9 comments · May be fixed by #500
Open

Validate statement ids #166

mkroetzsch opened this issue Aug 26, 2015 · 9 comments · May be fixed by #500
Labels
enhancement good first issue Well-scoped issue that can be a good first step to get started

Comments

@mkroetzsch
Copy link
Member

Statement ids in Wikibase have a fixed format, e.g., Q1294$AFB6473A-362F-429C-B2D8-8. This is validated in Wikibase when making API calls, and Wikidata Toolkit should likewise implement a basic validation in its data model. The PHP validation code can be seen at http://wbdoc.wmflabs.org/de/df9/ClaimGuidValidator_8php_source.html

@wetneb wetneb added the good first issue Well-scoped issue that can be a good first step to get started label Mar 5, 2020
@wetneb
Copy link
Member

wetneb commented Mar 5, 2020

Marking this as a good first issue, with the understanding that this would only validate non-empty statement ids: empty statement ids would still be allowed to represent new statements (which are not saved in the target Wikibase yet). Making sure that even new statements have valid ids is what #447 proposes (so that's a different issue, probably a bit more involved).

@kanikasaini
Copy link
Contributor

Hi @wetneb @mkroetzsch, I tried finding how Wikibase validates statement IDs but couldn't find the exact logic/code for it. Can you please help with that? I'm a student and I hope to contribute to this issue. Thank you!

@wetneb
Copy link
Member

wetneb commented May 26, 2020

I think this is the file mentioned above:
https://github.com/wmde/WikibaseLib/blob/bf563124cbab1ccc28367dcc5886fe23cac73bde/includes/ClaimGuidValidator.php
This library has been refactored but I do not think the claim id format has changed since, so you should still be able to rely on it to understand the format.

@kanikasaini
Copy link
Contributor

Thank you for your help.

From my understanding, we are passing the statement id and subject (EntityIdValue) to the below function where the format of statement id is AFB6473A-362F-429C-B2D8-8 and subject has Id of format Q1294. The issue description asks to validate the whole string Q1294$AFB6473A-362F-429C-B2D8-8. After going through the PHP code, I have understood that the latter half of the string can be validated using regex and the first half is checked for its existence. Can you please suggest how I can check if the EntityIdValue exists or not in the wdtk as I couldn't find a function that does it? Also, please confirm if we have to validate the whole string and not just the latter half, i.e., statement id.

image

Thank you!

@wetneb
Copy link
Member

wetneb commented May 26, 2020

For this sort of validation we would not check that the id exists in the Wikibase instance, as this would require an HTTP query. You could just check that it is a valid entity id by parsing it (see #424) perhaps.

@kanikasaini
Copy link
Contributor

Hi, I have implemented the below logic. I wanted to discuss where this function should go.

There are two thoughts:

  1. We can put it in Datamodel.java and call in the makeStatement function.
  2. We put it in StatementImpl.java and check before constructing the object.

Also, I wanted to know your thoughts on what should happen when it is not valid. Is it good to generate an exception and log it while returning a null object to the function call? Please let me know.

Thank you!

image

@wetneb
Copy link
Member

wetneb commented May 27, 2020

I would put statement id validation in the statement constructor. When validation fails, an exception should be thrown (for instance IllegalArgumentException). The standard output should not be used for this. Statement id validation should only be enforced when a non-null statement id is provided.

Perhaps it would be worth exposing statement id validation as a static method somewhere, so that it can also be used in tests, to validate the output of statement id generators.

@kanikasaini
Copy link
Contributor

kanikasaini commented May 28, 2020

Hi,

Thanks a lot for your help so far. For now, I declared and defined the validate function in StatementImpl and calling it within the constructor.

I wanted to confirm the format of the statement id once. We ran this query. and the statement id seems to be the format of 'Q5721-b763ede3-42b3-5ecb-ec0e-4bb85d4d348d'. Please notice there's no '$' sign in it. Although, according to the GuidGenerator, '$' is the separator.
Can you confirm which is it actually?

Also, this change would require as to change all other test cases that pass a statement id since they are not following this particular format and all fail when this validate function is added. Right?

Thank you for your time and patience.

@wetneb
Copy link
Member

wetneb commented May 28, 2020

Yes, this is a deliberate difference between the format of statement ids in JSON and in RDF. I suspect this was introduced because having a $ sign in a URI is not very clean, perhaps.
You can see that in org.wikidata.wdtk.rdf.Vocabulary where the statement ids are converted to RDF with PREFIX_WIKIDATA_STATEMENT + statementId.replaceFirst("\\$", "-").

Also, this change would require as to change all other test cases that pass a statement id since they are not following this particular format and all fail when this validate function is added. Right?

If these tests start failing, then yes, we will need to change the ids they provide so that they match the expected format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement good first issue Well-scoped issue that can be a good first step to get started
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants