Analyse openaire similarity mechanism #9
Comments
Documentation from OpenAIRE on their similarity mechanism:
OpenAIRE assigns internal identifiers for each object it collects. By default, the internal identifier is generated as sourcePrefix::md5(localId) where:
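The identifier scheme quoted above can be sketched in a few lines of Python. This is only an illustration, not OpenAIRE's own code: the function name `openaire_id` is hypothetical, and the sketch assumes the local id is hashed as UTF-8 bytes with no prior normalisation and that the digest is written as lowercase hex.

```python
import hashlib


def openaire_id(source_prefix: str, local_id: str) -> str:
    """Build an identifier of the form sourcePrefix::md5(localId).

    Assumptions (not confirmed by the thread): UTF-8 encoding of the
    local id, no normalisation, lowercase hexadecimal digest.
    """
    digest = hashlib.md5(local_id.encode("utf-8")).hexdigest()
    return f"{source_prefix}::{digest}"


if __name__ == "__main__":
    # Prefix taken from the thread; the local id is the DOI quoted there.
    print(openaire_id("doi_________", "10.1029/2018WR024608"))
```

Because the md5 hash is one-way, the local id cannot be recovered from the identifier itself; only the prefix remains readable.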
@pvgenuchten, I had a look at this issue, not sure I fully understand it, but this is what I did:
The results are in the attached CSV. The origin of a record is the 'sourcePrefix' (a namespace prefix of 12 chars) of the obj_identifier, e.g. for _____OmicsDI::47167d2e7a363dcb907e77d4a5c948d7 the 'sourcePrefix' is '_____OmicsDI'. This 'sourcePrefix' does not always seem to be unique, e.g. 'doi_________' is used for Datacite, Crossref and Zenodo (see more info and examples at the bottom of the following webpage). In some cases the sourcePrefix can be used to generate the URL from the id, e.g. for the prefix 'doi_________' the recurring pattern is URL = 'https://doi.org/' + original_id. But this is not always the case. See the first 3 records in the attached CSV, where _____OmicsDI is linked to the following record URLs: https://www.omicsdi.org/dataset/gpmdb/GPM11210027561, https://www.omicsdi.org/dataset/omics_ena_project/PRJNA267992, https://www.omicsdi.org/dataset/geo/GSE63974. For the mapping from id to url, I think this will be a combination of I have added the code I use in the notebook 'soilwise_openaire' in the GitHub repository. I hope I understood the question correctly, if not let me know.
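The prefix-based approach described in this comment could look roughly like the sketch below. It is not the code from the 'soilwise_openaire' notebook: the names `split_openaire_id`, `URL_PATTERNS` and `url_from_original_id` are hypothetical, and the only pattern included is the 'doi_________' one mentioned in the thread. Prefixes without a known pattern (such as '_____OmicsDI') return `None`.

```python
from typing import Optional, Tuple

# URL patterns keyed by sourcePrefix. Only the DOI pattern comes from the
# thread; any further entries would have to be verified per platform.
URL_PATTERNS = {
    "doi_________": "https://doi.org/{original_id}",
}


def split_openaire_id(obj_identifier: str) -> Tuple[str, str]:
    """Split 'sourcePrefix::hash' into (sourcePrefix, hash)."""
    prefix, sep, digest = obj_identifier.partition("::")
    if not sep:
        raise ValueError(f"not an OpenAIRE-style identifier: {obj_identifier!r}")
    return prefix, digest


def url_from_original_id(source_prefix: str, original_id: str) -> Optional[str]:
    """Return a URL if a pattern is known for this prefix, else None."""
    pattern = URL_PATTERNS.get(source_prefix)
    if pattern is None:
        return None
    return pattern.format(original_id=original_id)


if __name__ == "__main__":
    prefix, _ = split_openaire_id("_____OmicsDI::47167d2e7a363dcb907e77d4a5c948d7")
    print(prefix)
    print(url_from_original_id("doi_________", "10.1029/2018WR024608"))
```

Keeping the patterns in a plain dictionary makes it easy to add or drop platforms once it is clear which prefixes map reliably to a URL.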
Thank you for the work. The main question is: for any harvested record, can we capture the platforms on which it has been found and, for each of those platforms, the URL to access it? See for example this record. I think you are on the right track, introducing a concatenation pattern based on sourcePrefix. My suggestion would be to try it out for some of the popular platforms (cordis, zenodo, dataverse, gbif), then evaluate if it makes sense. In https://api.openaire.eu/search/publications?format=json&page=16&size=200 the information is apparently available in the collectedfrom and originalid sections, but there is no direct URL. I wonder if we can derive the direct URL for popular platforms from those 2 properties.
On the other hand, I like what OpenAIRE states here: it's probably best to only link to formal PIDs, because the localId seems unstable over time.
my suggestion would be to close this issue, but document its findings (as evidence in our reports) |
Hi @pvgenuchten, I probably don't fully understand the issue, but there seems to be a direct URL in the JSON file for each record, in the fields "children.result.instance.url" or "children.result.instance.webresources"? In some cases you can also derive the direct URL from the collectedfrom and originalId fields. For the example you provide in the comments above, this gives sourcePrefix = "openaire____" -> doi.org/10.1029/2018WR024608 ('sourcePrefix' + correct originalId). Fine for me to close the issue.
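Reading the direct URL from those fields could be sketched as follows. The helper `collect_values` is hypothetical, and the sample record is a simplified, made-up structure that only mirrors the dotted path "children.result.instance.url"; the real OpenAIRE JSON may nest or wrap these values differently, so this would need checking against an actual response.

```python
from typing import Any, List


def collect_values(node: Any, keys: List[str]) -> List[Any]:
    """Follow a key path through nested dicts, fanning out over lists."""
    if isinstance(node, list):
        values: List[Any] = []
        for item in node:
            values.extend(collect_values(item, keys))
        return values
    if not keys:
        return [] if node is None else [node]
    if isinstance(node, dict) and keys[0] in node:
        return collect_values(node[keys[0]], keys[1:])
    return []


def direct_urls(record: dict) -> List[str]:
    """Collect everything found under children.result.instance.url."""
    return collect_values(record, "children.result.instance.url".split("."))


if __name__ == "__main__":
    # Hypothetical record shape, for illustration only.
    sample = {
        "children": {
            "result": [
                {"instance": [{"url": "https://doi.org/10.1029/2018WR024608"}]},
                {"instance": {"url": ["https://www.omicsdi.org/dataset/geo/GSE63974"]}},
            ]
        }
    }
    print(direct_urls(sample))
```

The same helper can be pointed at "children.result.instance.webresources" by changing the path, which keeps the lookup independent of whether a level holds a single object or a list.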
OpenAIRE has a mechanism to identify in which repositories a resource is included. The origins of the record are stored with the record, including a string identifier of the platform. This identifier should be used to retrieve the URL of the record, so users can click through from the record in soilwise to one of its origins.