Update bootimage management enhancement #1698

Draft · wants to merge 1 commit into base: master
60 changes: 49 additions & 11 deletions in enhancements/machine-config/manage-boot-images.md

@@ -27,7 +27,7 @@ superseded-by:

## Summary

This is a proposal to manage bootimages via the `Machine Config Operator` (MCO), leveraging some of the [pre-work](https://github.com/openshift/installer/pull/4760) done as a result of the discussion in [#201](https://github.com/openshift/enhancements/pull/201). This feature will only target standalone OCP installs. It is now released as an opt-in feature and will be rolled out on a per-platform basis (see the projected timeline below). It will eventually be on by default, at which point the MCO will enforce an accepted skew and require non-platform-managed bootimage updates to be acknowledged by the cluster admin.

For `MachineSet` managed clusters, the end goal is to create an automated mechanism that can:
- update the boot image references in `MachineSets` to the latest in the payload image
@@ -49,6 +49,8 @@ There has been [a previous effort](https://github.com/openshift/machine-config-o

In certain long-lived clusters, the MCS TLS cert contained within the above Ignition configuration may be out of date. Example issue [here](https://issues.redhat.com/browse/OCPBUGS-1817). While this has been partly solved by [MCO-642](https://issues.redhat.com/browse/MCO-642) (which allows the user to manually rotate the cert), it would be very beneficial for the MCO to actively manage this TLS cert and take this concern away from the user.

This is also a soft prerequisite for both dual-stream RHEL support in OpenShift and on-cluster layered builds. RPM-OSTree presently does a deploy-from-self to get a new-enough rpm-ostree to deploy image-based RHEL CoreOS systems, and we would like to avoid doing this for bootc if possible. We would also like to prevent direct RHEL8->RHEL10 updates once RHEL10 is available for OpenShift.

### User Stories

* As an OpenShift engineer, I consider nodes booting up on an unsupported OCP version a security liability. By having nodes boot on the latest supported boot image for a given OCP release, there will be less skew with the release payload image. This helps me avoid tracking incompatibilities across OCP release versions and shore up technical debt (see issues linked above).
@@ -59,7 +59,7 @@ In certain long lived clusters, the MCS TLS cert contained within the above Igni

The MCO will take over management of the boot image references and the stub Ignition configuration. The installer is still responsible for creating the `MachineSet` at cluster bring-up, but once cluster installation is complete the MCO will ensure that boot images are in sync with the latest payload. From the user's standpoint, this should cause fewer compatibility issues, as nodes will no longer need to pivot to a different version of RHCOS during node scaleup.

This should not interfere with existing workflows such as Hive and ArgoCD. As this is an opt-in mechanism, the cluster admin will be protected against such scenarios of accidental "reconciliation", and for additional safety, the MSBIC will also ensure that machinesets that have a valid OwnerReference are excluded from boot image updates (a hypothetical example is sketched below). We will work with affected teams to transition them to the new workflow before we turn this feature on by default.
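For illustration, a minimal sketch of a `MachineSet` that the MSBIC would exclude; the owning resource (here a Hive `ClusterDeployment`) and all names are hypothetical:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-worker-a                   # hypothetical name
  namespace: openshift-machine-api
  ownerReferences:                         # any valid owner reference causes exclusion
    - apiVersion: hive.openshift.io/v1
      kind: ClusterDeployment              # hypothetical owning resource
      name: example-cluster
      uid: 00000000-0000-0000-0000-000000000000
spec: {}                                   # replicas and providerSpec elided
```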

### Non-Goals

@@ -77,7 +77,7 @@ __Overview__
- `ManagedBootImages` feature gate is active
- The cluster and/or the machineset has opted in to boot image updates. This is done at the operator level, via the `MachineConfiguration` API object (see the sketch after this list).
- The `machineset` does not have a valid owner reference. Having a valid owner reference typically indicates that the `MachineSet` is managed by another workflow, and updates to it are likely to cause thrashing.
- The golden configmap is verified to be in sync with the current version of the MCO. The MCO will update ("stamp") the golden configmap with the version of the new MCO image after at least one master node has successfully completed an update to the new OCP image. This helps prevent `machinesets` from being updated too soon at the end of a cluster upgrade, before the MCO itself has updated and had a chance to roll out the new OCP image to the cluster.

If any of the above checks fail, the MSBIC will exit the sync.
- Based on platform and architecture type, the MSBIC will check whether the boot images referenced in the `providerSpec` field of the `MachineSet` match those in the ConfigMap. Each platform (GCP, AWS, and so on) does this differently, so this part of the implementation will have to be special-cased. The ConfigMap is considered to be the golden set of bootimage values, i.e. they will never go out of date. If there is a mismatch, the `providerSpec` field is cloned and updated with the new boot image reference.
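For reference, a minimal sketch of the operator-level opt-in mentioned in the checks above, assuming the `MachineConfiguration` API shape currently proposed (field names may change as the API evolves):

```yaml
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  managedBootImages:
    machineManagers:
      - resource: machinesets
        apiGroup: machine.openshift.io
        selection:
          mode: All   # opt in every MachineSet; a partial mode could select by label
```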
@@ -115,23 +117,37 @@ Any form factor using the MCO and `MachineSets` will be impacted by this proposa

##### Supported platforms

The initial release (phase 0) will support GCP. In future releases, we will add support for the remaining platforms as we gain confidence in the functionality and understand the specific needs of those platforms. For platforms that cannot be supported, we aim to at least provide documentation for performing boot image updates manually. Here is an exhaustive list of all the platforms:

- gcp
- aws
- azure
- alibabacloud
- baremetal
- ibmcloud
- nutanix
- powervs
- openstack
- vsphere
- libvirt
- ovirt

This work will be tracked in [MCO-793](https://issues.redhat.com/browse/MCO-793).

##### Projected timeline

This is a tentative timeline, subject to change (GA = General Availability, TP = Tech Preview, EF = Enforced).

| Platform    | TP   | GA   | EF   |
| ----------- | ---- | ---- | ---- |
| gcp         | 4.16 | 4.17 | 4.19 |
| aws         | 4.17 | 4.18 | 4.19 |
| vsphere     | 4.19 | 4.20 | 4.21 |
| baremetal   |      | 4.21 | 4.22 |
| openstack   |      | 4.21 | 4.22 |
| nutanix     |      | 4.22 | 4.23 |
| ibmcloud    |      | 4.23 | 4.24 |
| non-managed |      | 4.20 | 4.21 |

*non-managed in this case indicates enforcement of user-initiated bootimage updates. See enforcement section below.

##### Cluster API backed machinesets

As the Cluster API move is impending (initial release in 4.16 and default-on release in 4.17), it is necessary that this enhancement plan for the changes required in a CAPI-backed cluster. Here are a couple of sample YAMLs used in CAPI-backed `MachineSets`, from the [official OpenShift documentation](https://docs.openshift.com/container-platform/4.14/machine_management/capi-machine-management.html#capi-sample-yaml-files-gcp).
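The sample YAMLs themselves are elided from this diff view. As a stand-in, here is a minimal sketch of a CAPI `GCPMachineTemplate` following the upstream cluster-api-provider-gcp v1beta1 API (all values are placeholders); the boot image reference that boot image management would need to track lives at `spec.template.spec.image`:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: GCPMachineTemplate
metadata:
  name: example-machine-template           # hypothetical name
  namespace: openshift-cluster-api
spec:
  template:
    spec:
      instanceType: n1-standard-4
      image: projects/example-project/global/images/example-rhcos-image
      rootDeviceSize: 128
```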
@@ -490,6 +506,28 @@ This is a solution aimed at reducing usage of outdated artifacts and should not
How will UX be reviewed and by whom? TBD.
The UX elements involved include the user opt-in and opt-out flows, which are currently up for debate.

### Enforcement of bootimage skew

There should be some mechanism that will alert the user and fail if an unsupported bootimage is used. For example, the first boot pivot should fail if a RHEL8->RHEL10 pivot is attempted.

#### Acceptable skew

The release payload will describe the current skew policy. Generally speaking, we would like to keep the bootimage version aligned with the RHEL version we are shipping in the payload. For example, a 9.6 bootimage will be allowed until 9.8 is shipped via RHCOS. We would like to keep this customizable, so that any major breaking change outside of a RHEL major/minor boundary can still be enforced as a one-off. A hypothetical encoding is sketched below.
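As a purely hypothetical sketch (no such manifest exists today, and every field name below is made up), the payload could describe the skew policy with something like:

```yaml
# Hypothetical skew-policy manifest shipped in the release payload;
# field names are illustrative, not a settled API.
bootimageSkewPolicy:
  minimumRHELVersion: "9.6"    # oldest bootimage allowed to scale into the cluster
  retiredBy: "9.8"             # shipping RHEL 9.8 in RHCOS retires 9.6 bootimages
  exceptions:
    - reason: "example one-off breaking change"
      minimumBootimageBuild: "9.6.20240800-0"   # hypothetical build ID
```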

#### Enforcement options

Some combination of the following mechanisms should be implemented to alert users, particularly in non-machineset-backed scaling environments. The options generally fall under proactive enforcement (require users to update and acknowledge before upgrading to a new version) vs. reactive enforcement (only fail when a non-compliant bootimage is used to scale into the cluster).

> **Review comment from @wking (Member), Oct 30, 2024:** "require users to updated and acknowledge upon upgrading to a new version" is still reactive, isn't it? The admin ack approach is proactive, so the admin is aware (and ideally can set things up ahead of time), before updating from 4.y to a 4.(y+1) that would require a newer boot image. Can we change "updated" to "update" and "upon" to "before" here?

1. Add an Admin Ack for bootimages during the update to the release in which bootimage updates will be enforced/turned on by default. Optionally, we could add additional acks in prior releases to prepare customers on a per-platform basis, once opt-in is available.
2. Add a ConfigMap to the cluster for non-machineset-backed clusters, initially created by the MCO, to indicate the last user-updated bootimage (a hypothetical shape is sketched after this list). The user would need to update this ConfigMap every few releases, when the RHEL minor version on which the RHCOS container is built changes (e.g. 9.6->9.8). The user may also delete it, indicating that they will not need to scale nodes and thereby opting out of bootimage updates and scaling functionality.
3. Have the MCS reject new Ignition requests if:
   a. in a machineset-backed cluster, the user has not opted into bootimage management
   b. in a non-machineset-backed cluster, the ConfigMap referenced above has not been updated
4. Add a service, shipped via RHCOS/MCO templates, that checks the incoming OS container image against the currently booted RHCOS version. This runs on firstboot, right after the MCD pulls the new image, and will prevent the node from rebasing to the updated image if the skew is too large.
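A minimal sketch of the ConfigMap from option 2, assuming hypothetical names and keys (none of this is a settled interface):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-managed-bootimages            # hypothetical name
  namespace: openshift-machine-config-operator
data:
  lastUpdatedBootimage: "9.6"              # RHEL minor of the bootimage last applied
  updatedAt: "2024-10-30"                  # when the admin last performed an update
```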


RHEL major versions will no longer be cross-compatible; i.e., if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.

> **Review comment (Member):** This creates a skew issue, right? E.g.
>
> 1. Cluster is happy on an OCP that likes RHEL 9.
> 2. ClusterVersion update requested for a new release that likes RHEL 10.
> 3. MCO sets the MachineConfigPools up for a transition from the RHEL 9 target to the RHEL 10 target.
> 4. Up through this point, MCO will target RHEL 9, and if you have a RHEL 10 boot image, the MCP will fail to scale.
> 5. The first node being updated in the MCP successfully goes Ready=True (with disk space, and the other things that MCPs watch to decide the node is happy) on RHEL 10.
> 6. From this point on, MCO will target RHEL 10 for new nodes scaling into this MCP, and if you have a RHEL 9 boot image, the MCP will fail to scale.
>
> Trying to time the bootimage bump to exactly match the "MCP associated with these MachineSets has decided new nodes will be RHEL 10" seems tricky. But to avoid that timing issue, you'd either need RHEL 10 targets to be more flexible about boot image matching (e.g. both RHEL 9 and RHEL 10 boot images would work with RHEL 10 MCPs), or some way to select from RHEL 9 or RHEL 10 boot images at Machine-creation time depending on what the target MCP was expecting (and even then, there would still be a race if you selected a RHEL 9 boot image but the MCP got a happy RHEL 10 node before the new RHEL 9 Machine made its MCS Ignition request).



### Drawbacks

TBD, based on the open questions below.