Update bootimage management enhancement #1698
Add some timeline and enforcement options.
#### Enforcement options
Some combination of the following mechanisms should be implemented to alert users, particularly non-machineset backed scaled environments. The options generally fall under proactive enforcement (require users to updated and acknowledge upon upgrading to a new version) vs. reactive enforcement (only fail when a non-compliant bootimage is being used to scale into the cluster).
"require users to updated and acknowledge upon upgrading to a new version" is still reactive, isn't it? The admin ack approach is proactive, so the admin is aware (and ideally can set things up ahead of time), before updating from 4.y to a 4.(y+1) that would require a newer boot image. Can we change "updated" to "update" and "upon" to "before" here?
4. Add a service to be shipped via RHCOS/MCO templates, which will do a check on incoming OS container image vs currently booted RHCOS version. This runs on firstboot right after the MCD pulls the new image, and will prevent the node to rebase to the updated image if the drift is too far.
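A rough sketch of the kind of comparison such a firstboot check might perform. Everything here is an assumption for illustration: the `rebase_allowed` and `parse_version` names, the plain `major.minor` version strings, and the `max_minor_drift` threshold are hypothetical, and the proposal does not say how drift would actually be measured.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a hypothetical 'major.minor' version string into integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def rebase_allowed(booted: str, incoming: str, max_minor_drift: int = 2) -> bool:
    """Return True if the incoming image is within the allowed drift.

    Assumption: drift is counted in minor versions within one major stream.
    The threshold of 2 is a placeholder, not a value from the proposal.
    """
    booted_major, booted_minor = parse_version(booted)
    incoming_major, incoming_minor = parse_version(incoming)
    if booted_major != incoming_major:
        # This sketch only models drift inside a single major stream.
        raise ValueError("cross-major comparison is out of scope for this sketch")
    # An incoming image that is older than or equal to the booted one
    # has a non-positive drift and is always allowed here.
    return (incoming_minor - booted_minor) <= max_minor_drift
```

With these placeholder values, a node booted from a 4.12 image would be allowed to rebase to 4.14 but blocked from rebasing to 4.19.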
RHEL major versions will no longer be cross-compatible. i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.
This creates a skew issue, right? E.g.
- Cluster is happy on an OCP that likes RHEL 9.
- ClusterVersion update requested for a new release that likes RHEL 10.
- MCO sets the MachineConfigPools up for a transition from the RHEL 9 target to the RHEL 10 target.
- Up through this point, MCO will target RHEL 9, and if you have a RHEL 10 boot image, the MCP will fail to scale.
- The first node being updated in the MCP successfully goes `Ready=True` (with disk space, and the other things that MCPs watch to decide the node is happy) on RHEL 10.
- From this point on, MCO will target RHEL 10 for new nodes scaling into this MCP, and if you have a RHEL 9 boot image, the MCP will fail to scale.
Trying to time the bootimage bump to exactly match the moment when the "MCP associated with these MachineSets has decided new nodes will be RHEL 10" seems tricky. But to avoid that timing issue, you'd either need RHEL 10 targets to be more flexible about boot image matching (e.g. both RHEL 9 and RHEL 10 boot images would work with RHEL 10 MCPs), or some way to select from RHEL 9 or RHEL 10 boot images at Machine-creation time depending on what the target MCP was expecting (and even then, there would still be a race if you selected a RHEL 9 boot image but the MCP got a happy RHEL 10 node before the new RHEL 9 Machine made its MCS Ignition request).
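To make the race concrete, here is a toy model of the sequence above. It is only a sketch: `Pool`, `select_boot_image`, and `can_scale` are hypothetical names, and the real MCO/MCS logic is not represented here.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy MachineConfigPool that tracks the RHEL major it targets for new nodes."""
    target_major: int = 9

    def first_node_ready_on(self, major: int) -> None:
        # Assumption: the pool flips its target once one node is Ready on the new major.
        self.target_major = major


def select_boot_image(pool: Pool) -> int:
    """Pick a boot image major at Machine-creation time from the pool's current target."""
    return pool.target_major


def can_scale(pool: Pool, boot_image_major: int, flexible: bool = False) -> bool:
    """Evaluate the match at Ignition-request time.

    strict:   the boot image major must equal the pool's target major.
    flexible: the previous major is accepted as well (e.g. RHEL 9 into a RHEL 10 pool).
    """
    if flexible:
        return pool.target_major - 1 <= boot_image_major <= pool.target_major
    return boot_image_major == pool.target_major


pool = Pool()                        # pool currently targets RHEL 9
chosen = select_boot_image(pool)     # Machine is created with a RHEL 9 boot image
pool.first_node_ready_on(10)         # pool flips to RHEL 10 before the Ignition request
strict_ok = can_scale(pool, chosen)                    # False: the race is lost
flexible_ok = can_scale(pool, chosen, flexible=True)   # True: flexible matching absorbs it
```

In this model, selecting the boot image at Machine-creation time still loses under strict matching whenever the pool flips in between, while flexible matching on the new target tolerates it.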
cc @dustymabe @jlebon