Validate that country code is an ISO_3166-2 2-Digit country code #272

Closed
siwhitehouse opened this issue Jan 28, 2020 · 12 comments

Comments

@siwhitehouse
Contributor

At the moment the schema only states in the description that a country code should be an ISO_3166-2 2-Digit country code. Both min and max length are set to 2, but there is no explicit check that the two digits published are valid.

We have had an internal discussion about creating an enum that can be checked. Three options have been suggested:

  • create and populate the enum by hand
  • maintain the codelist as CSV files so that we can include metadata for codelist items. We should document here what the workflow is for updating them and embedding codelist items as enums in the schema
  • fetch the codelist from a canonical online source and populate the enum from that.

Pinging @kd-ods and @Bjwebb to check that I have captured the options correctly.
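
For illustration, the second option (a CSV codelist embedded as an enum) could look roughly like the sketch below. The CSV columns, the two sample rows and the function names are assumptions made for this example; they are not taken from the BODS repository, and a real codelist would contain every ISO 3166-1 alpha-2 code.

```python
import csv
import io
import json

# Hypothetical codelist content with a metadata column ("title").
CODELIST_CSV = """code,title
GB,United Kingdom
FR,France
"""


def enum_from_codelist(csv_text):
    """Return the sorted list of codes found in the 'code' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(row["code"] for row in reader)


def country_code_schema(codes):
    """Build a JSON Schema fragment that keeps the length check and adds an enum."""
    return {
        "type": "string",
        "minLength": 2,
        "maxLength": 2,
        "enum": codes,
    }


schema = country_code_schema(enum_from_codelist(CODELIST_CSV))
print(json.dumps(schema, indent=2))
```

The third option would differ only in where the CSV text comes from: it would be fetched from an online source rather than kept in the repository.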

@siwhitehouse
Contributor Author

This has been moved from the (internal ODSC) issue at http://bods.opendataservices.coop/redmine/issues/310

@stevenday

FWIW, the register uses option 3 and this ruby gem to provide the external source, which eventually traces back to this Debian package and specifically these files: ISO3166-1, ISO3166-2

From finding that, I got confused as to whether it should be ISO3166-1 or ISO3166-2 though.

The reference for country.code seems to imply ISO 3166-1 alpha-2, but isn't explicit, whereas jurisdiction.code explicitly states ISO 3166-2.

I think the docs for jurisdiction are slightly mistaken: it should be either an ISO3166-1 alpha-2 country code (2 digits), or an ISO3166-2 subdivision code (6 digits including the hyphen).
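
A minimal sketch of telling the two formats apart by shape alone is given below. It assumes the general ISO 3166-2 pattern of an alpha-2 country code, a hyphen, and one to three alphanumeric characters, so a subdivision code may be shorter than six characters (for example `US-CA`). The names `ALPHA2`, `SUBDIVISION` and `classify_code` are made up for this example, and the check says nothing about whether a code is actually assigned.

```python
import re

# Two uppercase letters, e.g. "GB" (ISO 3166-1 alpha-2 shape).
ALPHA2 = re.compile(r"[A-Z]{2}")
# Country code, hyphen, then 1-3 alphanumerics, e.g. "GB-ENG" (ISO 3166-2 shape).
SUBDIVISION = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def classify_code(code):
    """Return 'country', 'subdivision', or None based on format only."""
    if ALPHA2.fullmatch(code):
        return "country"
    if SUBDIVISION.fullmatch(code):
        return "subdivision"
    return None
```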

Tightening up both the docs and the validation gets a 👍 from me.

@Bjwebb
Collaborator

Bjwebb commented Feb 18, 2020

I had a look into what we do for OCDS, but we just don't use country codes at the moment!

There's also a list of ISO 3166-1 alpha-2 codes on datahub (which seems to have some relation to Open Knowledge): https://datahub.io/core/country-list
The license information there is an interesting read.

@odscjames
Collaborator

Tidying up the docs to clear up @stevenday's question sounds good.

In terms of validation, I'm not sure. Don't all 3 suggestions in the first post basically mean we are setting the list into the standard? What happens if then the list changes - a country is created or removed - should we then technically be releasing another version of the standard just to add a codelist entry?

Would it be better if we tighten up the docs to be clear but leave the technical schema as "min and max length are set to 2" only?

Then other software like CoVE and its lib can download the list from an official source and check against that (like the register currently does).

Tho, potentially do we have to worry about issues in historical data? If "Landy McLand Face" joins with its neighbour and stops being a country in 2020, but we have historical immutable statements sitting around from 2019 with country.code = LL, would any validator have to know how to handle that?

@stevenday

Thanks @odscjames, these are great points!

I agree that we can't set the list into the standard, so it would have to be additional validation provided by CoVE etc., not a codelist.

> Tho, potentially do we have to worry about issues in historical data?

It was news to me that ISO 3166 has an issue of code re-use, as well as deletion! It makes sense if you think of it solely as a standard for current countries I suppose.

I guess we have three options:

  1. Downgrade it to a 'warning' rather than an 'error', with guidance to express the fact we can't be 100% sure.
  2. Build knowledge of ISO3166-3 into the validator, so it can say whether your codes are correct w.r.t the statementDates
  3. Punt the problem to a future time - waiting for the first instance of historical data that overlaps with a country code change, and use of our validator, to decide to fix that problem.

I'd probably err on the side of doing 1 and 3.
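
As a rough sketch of option 1, a validator could treat a malformed code as an error but report an unrecognised two-letter code only as a warning. The `Finding` class, the `check_country_code` function and the shape of `known_codes` are hypothetical and do not reflect the actual lib-cove API.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    level: str    # "error" or "warning"
    message: str


def check_country_code(code, known_codes):
    """Check one country code against a set of currently assigned codes.

    A code of the wrong shape is an error. A well-formed code that is not in
    known_codes is only a warning, because it may be a code that was valid
    when a historical statement was published.
    """
    findings = []
    if not isinstance(code, str) or len(code) != 2:
        findings.append(Finding("error", f"{code!r} is not a 2-letter code"))
    elif code not in known_codes:
        findings.append(
            Finding("warning", f"{code!r} is not a currently assigned code")
        )
    return findings


# Example with a deliberately tiny, illustrative set of codes.
known = {"GB", "FR"}
for finding in check_country_code("LL", known):
    print(finding.level, finding.message)
```

Option 2 would replace the simple membership test with a lookup that also takes the statement date into account.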

@stevenday stevenday added this to the 1.0 Release Candidate milestone Jul 16, 2020
@jpmckinney

FYI, as far as updating the list of ISO 3166-1 alpha-2 codes, we use this script to update directly from ISO. (Rubygems, Debian, Open Knowledge, etc. are not always up-to-date.) There are some manual steps, but it's fairly quick: https://github.com/open-contracting-extensions/ocds_countryCode_extension/tree/master/script

@kd-ods
Collaborator

kd-ods commented Nov 2, 2021

Tightening up both the docs and the validation gets a +1 from me.

On the tightening up the docs side of things. I can see various places that we need to work on. Here are my suggestions:

Address.country

  • Edit description to say "The 2-digit country code (ISO 3166-1) for this address."

Country

  • Edit Country description to say "A country MUST have a name. A country SHOULD have a 2-digit country code (ISO 3166-1)."
  • Make Country.name required in the schema.
  • Edit Country.code description to say "The 2-digit country code (ISO 3166-1) for this country."

Jurisdiction.code

  • Edit description to say "The 2-digit country code (ISO 3166-1), or the subdivision code (ISO 3166-2) for the jurisdiction."

SecuritiesListing.stockExchangeJurisdiction

  • Edit description to say "The 2-digit country code (ISO 3166-1), or the subdivision code (ISO 3166-2) for the jurisdiction under which the exchange, market or trading platform is regulated."

> I guess we have three options:

On those 3 options on the validation side of things, @odscjames is working on validation issues at the moment and may have thoughts. I suggest spinning up a ticket in lib-cove or lib-cove-bods and taking this discussion there. This ticket will then solely relate to the standard changes.

@jpmckinney

A digit is 0-9. Please use "2-letter" :)

@odscjames
Collaborator

Merged to master

@odscjames odscjames reopened this Nov 4, 2021
@odscjames
Collaborator

Sorry - leaving open for the other part of this - validation in Cove & other places

@kd-ods
Collaborator

kd-ods commented Nov 4, 2021

@odscjames - are you happy to do that cove thinking too & add any related ticket to a cove library? (Then we can close this issue.)

@odscjames
Collaborator

openownership/cove-bods#78 created, closing this
