Currently every change we make to the schema incurs a high risk of service interruption because we do not have a fully automated, consistent, and reproducible deployment regime.
From the fmu-dataio perspective, deployment should ideally look like so:
fmu-dataio 3.0.0 represents and produces version 3.0.0 of the schema
fmu-dataio 3.1.2 represents and produces version 3.0.0 of the schema
A new optional field is added
fmu-dataio 3.2.0 represents and produces version 3.1.0 of the schema
This means that these versions should be decoupled.
It tracks the evolution of how data is produced
It is more easily auditable if some version of dataio begins producing data differently
It is consistent
It is reproducible
It provides backward compatibility by always allowing validation against an existing schema, even if that schema is not the latest
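As a minimal sketch of the decoupling described above, the relationship between package releases and schema versions can be pictured as a simple lookup. The table name and helper function are hypothetical and only mirror the example release sequence; they are not part of fmu-dataio.

```python
# Hypothetical release table taken from the example sequence above:
# fmu-dataio version -> schema version it represents and produces.
SCHEMA_VERSION_BY_DATAIO_VERSION = {
    "3.0.0": "3.0.0",
    "3.1.2": "3.0.0",  # the package changed, the schema did not
    "3.2.0": "3.1.0",  # a new optional field bumped the schema's minor version
}


def schema_version_for(dataio_version: str) -> str:
    """Return the schema version a given fmu-dataio release produces."""
    try:
        return SCHEMA_VERSION_BY_DATAIO_VERSION[dataio_version]
    except KeyError:
        raise ValueError(f"unknown fmu-dataio version: {dataio_version}") from None
```

Several package releases can map to the same schema version, which is what makes it auditable when a given release starts producing data differently.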
Schema versioning
The schema already follows semantic versioning. This gives every schema version a specific number in the form X.Y.Z where X is the major version number, Y is the minor version number, and Z is the patch version number.
Schema version numbers change when a schema update is made. When deciding which version a changed schema should become, the primary concern should be whether or not the change is backward compatible. Backward compatibility is broken if metadata generated for and valid against a previous version is invalid against the updated version.
Therefore schema version numbers should change like so:
Major
Any schema change that breaks backwards compatibility with metadata created using the previous version. These scenarios are candidates for a major version change:
Adding a required field
Removing a required or optional field
Moving an optional field to a required field
Changing the name or a field
Changing the type of a field (e.g. number to string)
Removing a value from a controlled vocabulary (e.g. 'OWC' is no longer a valid contact [unlikely, but an example!])
Adding a regular expression to a field
Minor
Any schema change that preserves backwards compatibility with metadata created using the previous version.
Adding an optional field
Making a required field optional
Changing a field from a controlled vocabulary to free text without changing the field type
Removing a regular expression from a field
Patch
Any change to auxiliary information that does not affect the structure or semantics of the schema itself. Also, any bug fixes to the schema.
Adding or updating the field description to improve readability
Adding or updating the field example, comment, or user-friendly name
Extending a controlled vocabulary enumeration
Fixing an incorrect regular expression
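The rules above could be encoded roughly as follows. This is a sketch under stated assumptions: the change labels (`add_required_field` and so on) are hypothetical names invented here, one per bullet in the lists above, and the helper functions are not part of any existing tooling.

```python
from enum import IntEnum


class Bump(IntEnum):
    """Kinds of version bump, ordered so the most severe compares highest."""
    PATCH = 0
    MINOR = 1
    MAJOR = 2


# Hypothetical change labels, one per bullet in the lists above.
CHANGE_BUMPS = {
    "add_required_field": Bump.MAJOR,
    "remove_field": Bump.MAJOR,
    "make_optional_field_required": Bump.MAJOR,
    "rename_field": Bump.MAJOR,
    "change_field_type": Bump.MAJOR,
    "remove_vocabulary_value": Bump.MAJOR,
    "add_regex": Bump.MAJOR,
    "add_optional_field": Bump.MINOR,
    "make_required_field_optional": Bump.MINOR,
    "vocabulary_to_free_text": Bump.MINOR,
    "remove_regex": Bump.MINOR,
    "update_description": Bump.PATCH,
    "update_example_or_comment": Bump.PATCH,
    "extend_vocabulary": Bump.PATCH,
    "fix_regex": Bump.PATCH,
}


def bump_version(version: str, bump: Bump) -> str:
    """Apply a semantic-version bump to an X.Y.Z version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if bump is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


def next_version(version: str, changes: list[str]) -> str:
    """Return the next schema version for a non-empty list of change labels.

    The most severe change in the update decides the bump.
    """
    if not changes:
        raise ValueError("at least one change is required")
    return bump_version(version, max(CHANGE_BUMPS[change] for change in changes))
```

For example, `next_version("3.0.0", ["add_optional_field"])` yields `"3.1.0"`, while an update that mixes a description fix with a new required field yields a major bump.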
Initial impact
Sumo will need to reference the schema url from the metadata.
This should be the only initial impact. In practical terms, nothing else changes except that the schema version number will tick up according to the versioning conditions above. As long as we continue to make all changes backward compatible, i.e. we continue to work toward a version 1.0.0 of the schema, nothing changes from the consumer perspective except that they now have metadata on the metadata to tie the ongoing changes to.
Deployment
fmu-dataio 3.2.0 is released
This schema is deployed to radix as schemas/3.0.0/fmu_results.json, or schemas/fmu_results-3.0.0.json
This schema exists as a real file always committed to this repository (?)
We could start generating these for radix by checking out every version tag and writing it... but that is probably less ideal
All metadata produced with the schema is self-referential, i.e. it points to the schema that produced it and can validate it
fmu-dataio is now staged for release to Komodo + RMS
Each Komodo version points to a distinct RMS version that contains the same fmu-dataio version (in progress!)
Metadata should be consistent and reproducible between the RMS and Komodo versions now, 1 to 1
When uploaded to Sumo, Sumo should validate metadata against the schema url referenced within the metadata
Consumers can also reference this as needed
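A minimal sketch of the self-referential metadata described above, using the `schemas/3.0.0/fmu_results.json` layout. The base URL is a placeholder, and the `$schema` key used to hold the reference is an assumption made for illustration; neither is confirmed by this proposal.

```python
import re

# Placeholder host; the real deployment URL is not specified here.
SCHEMA_BASE_URL = "https://example.invalid/schemas"

# Matches the version directory in the schemas/X.Y.Z/<file> layout.
_VERSION_IN_URL = re.compile(r"/schemas/(\d+\.\d+\.\d+)/")


def schema_url(version: str, filename: str = "fmu_results.json") -> str:
    """Build the URL of one deployed, immutable schema version."""
    return f"{SCHEMA_BASE_URL}/{version}/{filename}"


def schema_version_from_metadata(metadata: dict) -> str:
    """Extract the schema version from the URL the metadata refers to.

    Assumes the reference is stored under a "$schema" key.
    """
    match = _VERSION_IN_URL.search(metadata["$schema"])
    if match is None:
        raise ValueError(f"no schema version in URL: {metadata['$schema']}")
    return match.group(1)


# Metadata produced against schema 3.0.0 carries its own reference.
metadata = {"$schema": schema_url("3.0.0")}
```

With this in place, a validating service such as Sumo would fetch the schema at `metadata["$schema"]` instead of always validating against the latest version.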
Or, fmu-schemas
Another, possibly better solution is to host and add schema updates statically to their own repository as it could be cumbersome to continue to stack them here.
Open questions
How does this affect consumers and their expectations about what exists in metadata?
These sorts of version expectations are burdensome for consumers, but offer consistent and long-term guarantees. I.e. once version 3.0.0 is released, every version prior to it cannot possibly have spec.num_rows so logic built to handle this can persist long-term.
However, if we are inconsiderate with our changes this can lead to a miasma of spaghetti conditionals for consumers to handle. Therefore we would need a sensible strategy for attaching metadata changes to a version.
A sensible strategy is bundling them into major versions. This makes sense from a semantic versioning perspective and also makes version checking simpler, i.e. it would become cumbersome if version 3.1.0 added spec.num_columns and version 3.2.1 added spec.num_awesome_columns.
Despite these hurdles, even if some extra conditionals are added, it gives consumers predictive power so that they can tie functionality to something concrete rather than trying to infer it or deal with optional patterns.
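Tying consumer logic to a concrete schema version could look like the following sketch. It assumes, for illustration only, that `spec.num_rows` is guaranteed to be present from schema version 3.0.0 onward; the function names are hypothetical.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn "X.Y.Z" into a tuple so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def num_rows(metadata: dict, schema_version: str) -> Optional[int]:
    """Read spec.num_rows only where the schema version guarantees it.

    Assumption: spec.num_rows exists in every schema from 3.0.0 onward
    and in none before it.
    """
    if parse_version(schema_version) >= (3, 0, 0):
        return metadata["spec"]["num_rows"]
    return None
```

Comparing parsed tuples rather than raw strings matters once a component reaches two digits: as strings, `"3.10.0"` sorts before `"3.9.0"`.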