consistent hierarchy levels in centre-id? #136
Comments
Multiple hyphens are allowed in centre-ids. The first hyphen separates the TLD from the centre name. See https://wmo-im.github.io/wis2-topic-hierarchy/standard/wis2-topic-hierarchy-DRAFT.html#_centre_identification, Permission 2A, for more information. Beyond this, no hierarchy is assumed or implied.
If everything after the first hyphen is interpreted as When just inspecting for example
a machine wouldn't know a priori that the "root center name" in one case is
When inspecting the following two centre-ids
That means it cannot be established unambiguously which institution actually released the data. If I want more data from the institution that released with centre-id
Of course that might not have been the primary goal here, but for us one of the reasons to adhere to the WMO scheme for all our open data metadata (instead of plain UUIDs) was that it would be clear at which institution a dataset originates. In the above scenario, the information about what the institution actually is gets lost, because after the country and the first hyphen, the scheme does not define where the institution part ends and where a routine or other component at that institution begins.
Description of datasets is in the remit of WCMP2 / discovery metadata. WIS2 Global Discovery Catalogue (GDC) search results have the core discovery/description constructs (identification, data policy, access links, spatiotemporal extents). WTH itself is in support of a topic structure for Pub/Sub and event-driven architecture. Likewise, the centre-id is not responsible for articulating the dataset originator (again, in the remit of WCMP2).
@tomkralidis OK, then I misinterpreted the introduction at https://wmo-im.github.io/wis2-topic-hierarchy/standard/wis2-topic-hierarchy-DRAFT.html#_centre_identification From
I (wrongly) deduced that it would in fact be clear which of the listed possibilities actually applies, for example whether it is the "issuing centre of a given dataset" or a "data product". Now I think I understand that another resource will in fact be needed to understand what hierarchy level is actually given in the
implied in my view that it will be clear from the
We never considered extracting the name of the institution or the name of the service (typically dwd or gts-to-wis2) to be a requirement.
One can derive this for global services by always checking the last token for an approved global service type (see https://github.com/wmo-im/wcmp2-codelists/blob/main/codelists/global-service-type.csv). But that's only a partial use case. Having said this, the centre-id lookup clearly provides attribution of the publishing centre along with the associated WCMP2 record, which is available in
Nice!
Implicit scheme now
To understand a
Dedicated separator in the centre-id scheme?
However, I still think a more robust scheme would be one where the separator of functional units within the
(A) Hyphens
(B) Alternative: hyphens are so ubiquitous in names that it is not easy to use them as a reserved separator, so another character could be used as the separator, as was done in WIS before, e.g.,
@golfvert Thank you for the question
We assumed as the open data team at DWD that adopting the WIS2.0 scheme for all our metadata IDs would bring the advantage of being able to identify which country and which institution a dataset comes from without opening it. Mixing the function of hyphens as separators with their use as part of names makes the automatic/machine interpretation of a
It may very well be that this type of machine readability was never in the scope of the
Best regards!
I don't think we will change the hyphen to something else to make the country more visible. Sorry.
We discussed dotted paths previously and decided not to use them due to edge cases involving other message queuing protocols (for example, when running an MQTT/AMQP bridge/facade). We also decided against
The centre-id.csv has the institution name, but not the country in a human-readable form. Using TLDs helps in providing the country name. Having said this, the centre-id implementation does have some heuristics to identify global services, which would be implemented with something like:
global_services = [
    'global-broker',
    'global-cache',
    'global-discovery-catalogue',
    'global-monitoring'
]
centre_id = 'ca-eccc-msc-global-discovery-catalogue'
# split centre-id on the first dash
tld, centre = centre_id.split('-', 1)
# strip any global service function identification
for gs in global_services:
    centre = centre.replace(f'-{gs}', '')
print(tld, centre)
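For readers who want to reuse the heuristic above, here is a minimal sketch that wraps it in a function and also reports which global service, if any, was recognised. The function name `split_centre_id` and the constant `GLOBAL_SERVICES` are hypothetical, and the list of service names is copied from the snippet above rather than read from the codelist, so treat it as an assumption. It matches the service only at the end of the centre-id, which is slightly stricter than replacing it anywhere in the string.

```python
# Assumption: service names copied from the snippet above, not loaded
# from the global-service-type codelist.
GLOBAL_SERVICES = (
    "global-broker",
    "global-cache",
    "global-discovery-catalogue",
    "global-monitoring",
)


def split_centre_id(centre_id):
    """Return (tld, centre, service); service is None if no suffix matches.

    Hypothetical helper: only the first hyphen is treated as a separator,
    plus an optional trailing global service name.
    """
    tld, sep, rest = centre_id.partition("-")
    if not sep or not tld or not rest:
        raise ValueError(f"centre-id needs a TLD and a centre name: {centre_id!r}")
    for service in GLOBAL_SERVICES:
        suffix = f"-{service}"
        if rest.endswith(suffix):
            # Drop the trailing service name, keep everything before it.
            return tld, rest[: -len(suffix)], service
    return tld, rest, None


print(split_centre_id("ca-eccc-msc-global-discovery-catalogue"))
print(split_centre_id("de-dwd-gts-to-wis2"))
```

For centre-ids that do not end in a known service name, everything after the first hyphen is returned unchanged, so no further hierarchy is inferred.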
Dear colleagues,
when looking at the centre-id.csv I noticed that at first it seems to separate hierarchy levels by hyphens, e.g. for DWD:
de-dwd: <country>-<institution>
But that doesn't seem to be the case in
de-dwd-gts-to-wis2
where the last 3 items seem to be one name, but suggest further hierarchy levels via the hyphens, or in
fr-meteo-france: after the country, only "meteo" would be the institution when machine-parsing with a hyphen as separator.
As far as I understood, the scheme
urn:wmo:md:{centre_id}:{local_identifier}
offers the opportunity to parse the origin of a dataset without opening it. In the examples above, hyphens as hierarchy level separators are mixed with hyphens as part of names. That will make automatic parsing of the data source ambiguous.
Best regards,
Hella Riede (DWD)
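The ambiguity described above can be illustrated with a small sketch. It assumes identifiers of the form urn:wmo:md:{centre_id}:{local_identifier} as quoted in this issue; the helper name `parse_metadata_id` and the local identifier `example-dataset` are made up for illustration. It contrasts splitting the centre-id on every hyphen with splitting on the first hyphen only.

```python
URN_PREFIX = "urn:wmo:md:"


def parse_metadata_id(identifier):
    """Split urn:wmo:md:{centre_id}:{local_identifier} into its two parts.

    Hypothetical helper; the first colon after the prefix is taken as the
    boundary between centre-id and local identifier.
    """
    if not identifier.startswith(URN_PREFIX):
        raise ValueError(f"not a urn:wmo:md identifier: {identifier!r}")
    centre_id, sep, local_identifier = identifier[len(URN_PREFIX):].partition(":")
    if not sep or not centre_id or not local_identifier:
        raise ValueError(f"missing centre-id or local identifier: {identifier!r}")
    return centre_id, local_identifier


for example in ("de-dwd", "de-dwd-gts-to-wis2", "fr-meteo-france"):
    # 'example-dataset' is a made-up local identifier.
    centre_id, _ = parse_metadata_id(f"{URN_PREFIX}{example}:example-dataset")
    every_hyphen = centre_id.split("-")      # treats each hyphen as a level
    first_hyphen = centre_id.split("-", 1)   # only the TLD is separated
    print(centre_id, every_hyphen, first_hyphen)
```

Splitting on every hyphen turns fr-meteo-france into three apparent levels, while splitting on the first hyphen keeps meteo-france together but says nothing about where an institution name ends in de-dwd-gts-to-wis2.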