Schema versioning and deployment proposal #787

mferrera opened this issue Sep 19, 2024

Currently every change we make to the schema incurs a high risk of service interruption because we do not have a fully automated, consistent, and reproducible deployment regime.

From the fmu-dataio perspective, deployment should ideally look like this:

  • fmu-dataio 3.0.0 represents and produces version 3.0.0 of the schema
  • fmu-dataio 3.1.2 represents and produces version 3.0.0 of the schema
  • A new optional field is added
    • fmu-dataio 3.2.0 represents and produces version 3.1.0 of the schema

This means that the fmu-dataio version and the schema version should be decoupled, as sketched after the list below. A decoupled schema version has several advantages:

  • It tracks the evolution of how data is produced
  • It is more easily auditable if some version of dataio begins producing data differently
  • It is consistent
  • It is reproducible
  • It provides backward compatibility by always allowing validation against an existing schema, even if that schema is not the latest
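
As a sketch, this decoupling could look like the following inside fmu-dataio. The module layout, constant names, and URL are assumptions for illustration only, not the actual code:

```python
# Hypothetical fmu-dataio module; names and URL are illustrative only.
# The package version and the schema version evolve independently.

__version__ = "3.1.2"     # fmu-dataio package version
SCHEMA_VERSION = "3.0.0"  # fmu_results schema version this release produces

def schema_url(base: str = "https://example.radix.equinor.com/schemas") -> str:
    """Return the URL of the schema this release produces metadata for."""
    return f"{base}/{SCHEMA_VERSION}/fmu_results.json"
```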

Schema versioning

The schema is already versioned using semantic versioning. This gives every schema version a number of the form X.Y.Z, where X is the major version, Y is the minor version, and Z is the patch version.

Schema version numbers change when a schema update is made. When deciding what version a changed schema should become, the primary concern is whether or not the change is backward compatible. Backwards compatibility is broken if metadata generated for, and valid against, a previous version is invalid against the updated version.
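
This criterion can be checked mechanically. A minimal sketch of such a check using the jsonschema package; the file paths are placeholders:

```python
import json

import jsonschema

# Placeholder paths; one possible deployed layout is discussed under
# "Deployment" below.
with open("schemas/3.0.0/fmu_results.json") as f:
    previous_schema = json.load(f)
with open("schemas/3.1.0/fmu_results.json") as f:
    updated_schema = json.load(f)

# Metadata known to be valid against the previous schema version.
with open("some_existing_metadata.json") as f:
    metadata = json.load(f)

jsonschema.validate(metadata, previous_schema)  # passes by assumption
try:
    jsonschema.validate(metadata, updated_schema)
except jsonschema.ValidationError:
    print("Backwards compatibility is broken: major version change")
else:
    print("Still valid: minor or patch version change")
```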

Therefore, schema version numbers should change as follows:

Major

Any schema change that breaks backwards compatibility with metadata created using the previous version. These scenarios are candidates for a major version change:

  • Adding a required field
  • Removing a required or optional field
  • Moving an optional field to a required field
  • Changing the name of a field
  • Changing the type of a field (e.g. number to string)
  • Removing a value from a controlled vocabulary (e.g. 'OWC' is no longer a valid contact [unlikely, but an example!])
  • Adding a regular expression to a field
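
For instance, adding a required field invalidates every existing metadata document that lacks it. A minimal sketch with a toy schema (not the real fmu_results schema):

```python
import jsonschema

old_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}
# Major change: a new required field "depth" is added.
new_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "depth": {"type": "number"}},
    "required": ["name", "depth"],
}

old_metadata = {"name": "surface_a"}
jsonschema.validate(old_metadata, old_schema)  # passes
try:
    jsonschema.validate(old_metadata, new_schema)
except jsonschema.ValidationError as err:
    print(f"breaking change: {err.message}")  # 'depth' is a required property
```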

Minor

Any schema change that preserves backwards compatibility with metadata created using the previous version. These scenarios are candidates for a minor version change:

  • Adding an optional field
  • Making a required field optional
  • Changing a field from a controlled vocabulary to free text without changing the field type
  • Removing a regular expression from a field
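
By contrast, adding an optional field leaves all existing metadata valid. Continuing the same toy example:

```python
import jsonschema

old_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}
# Minor change: "depth" is added but stays optional.
new_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "depth": {"type": "number"}},
    "required": ["name"],
}

old_metadata = {"name": "surface_a"}
jsonschema.validate(old_metadata, old_schema)  # passes
jsonschema.validate(old_metadata, new_schema)  # still passes: no breakage
```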

Patch

Any change to auxiliary information that does not affect the structure or semantics of the schema itself, as well as any bug fixes to the schema. These scenarios are candidates for a patch version change:

  • Adding or updating the field description to improve readability
  • Adding or updating the field example, comment, or user-friendly name
  • Extending a controlled vocabulary enumeration
  • Fixing an incorrect regular expression

Initial impact

  • Sumo will need to reference the schema url from the metadata.

This should be the only initial impact. In practical terms, nothing else changes except that the schema version number will tick up according to the versioning conditions above. As long as we continue to make all changes backward compatible, i.e. we continue to work toward a version 1.0.0 of the schema, nothing changes from the consumer perspective except that they gain metadata about the metadata to tie the ongoing changes to.
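
For illustration, each metadata document would then carry a pointer to the exact schema it was produced against, roughly like this (the key names, URL, and host are placeholders, not the actual deployed layout):

```python
# Illustrative metadata fragment; "$schema" key, URL, and host are
# placeholders for whatever the deployment ends up using.
metadata = {
    "$schema": "https://example.radix.equinor.com/schemas/3.0.0/fmu_results.json",
    "version": "3.0.0",
    # ... the rest of the fmu_results metadata ...
}
```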

Deployment

  • fmu-dataio 3.2.0 is released
  • Its schema is deployed to radix as schemas/3.1.0/fmu_results.json, or schemas/fmu_results-3.1.0.json
  • This schema exists as a real file always committed to this repository (?)
    • We could start generating these for radix by checking out every version tag and writing it... but that is probably less ideal
  • All metadata produced with the schema is self-referential, i.e. it points to the schema that produced it and that can validate it
  • fmu-dataio is now staged for release to Komodo + RMS
    • Each Komodo version points to a distinct RMS version that contains the same fmu-dataio version (in progress!)
    • Metadata should now be consistent and reproducible between the RMS and Komodo versions, 1 to 1
  • When uploaded to Sumo, Sumo should validate metadata against the schema url referenced within the metadata
  • Consumers can also reference this as needed
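
The last two points could reduce to a straightforward fetch-and-validate on the Sumo side. A minimal sketch using the requests and jsonschema packages; the function name and the "$schema" key are assumptions:

```python
import jsonschema
import requests

def validate_against_referenced_schema(metadata: dict) -> None:
    """Fetch the schema a metadata document points to and validate against it."""
    response = requests.get(metadata["$schema"], timeout=30)
    response.raise_for_status()
    schema = response.json()
    jsonschema.validate(metadata, schema)  # raises ValidationError if invalid
```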

Or, fmu-schemas

Another, possibly better, solution is to host schema updates statically in their own repository, e.g. fmu-schemas, as it could become cumbersome to keep stacking them in this one.

Open questions

  • How does this affect consumers and their expectations about what exists in metadata?
    • Suppose fmu-dataio 3.0.0 adds spec.num_rows
    • ConsumerA wants to display this property
    • Is this pattern fine?

      ```python
      from packaging import version

      metadata = get_metadata()  # hypothetical consumer helper
      if version.parse(metadata.version) >= version.parse("3.0.0"):
          do_something_with(metadata.spec.num_rows)
      ```
  • These sorts of version expectations are burdensome for consumers, but they offer consistent, long-term guarantees. I.e. once version 3.0.0 is released, no version prior to it can possibly have spec.num_rows, so logic built to handle this can persist long-term.
    • However, if we are inconsiderate with our changes this can lead to a miasma of spaghetti conditionals for consumers to handle. We would therefore need a sensible strategy for attaching metadata changes to a version
    • A sensible strategy is bundling them into major versions. This makes sense from a semantic versioning perspective and also keeps version checking simple; it would become cumbersome if version 3.1.0 added spec.num_columns and version 3.2.1 added spec.num_awesome_columns
  • Despite these hurdles, even if some extra conditionals are added, it gives consumers predictive power: they can tie functionality to something concrete rather than trying to infer it or deal with optional patterns like

    ```python
    metadata = get_metadata()  # hypothetical consumer helper
    if hasattr(metadata.spec, "num_rows"):
        do_something_with(metadata.spec.num_rows)
    ```
  • There are of course a number of possible issues not yet contained here