Wait for a stable cluster in the suite #51
base: main
Some remediations are more invasive than others, and make changes to the cluster that require time to propagate through the system. Before the suite starts running subsequent scans, we should wait for it to become stable so that we know the remediations at least applied properly, or at the very least didn't make things worse.
@@ -673,6 +673,14 @@ func (ctx *e2econtext) waitForMachinePoolUpdate(t *testing.T, name string) {
 	}
 }

+func (ctx *e2econtext) waitForStableCluster() error {
+	_, err := exec.Command("oc", "adm", "wait-for-stable-cluster", "--minimum-stable-period=2m").Output()
The assumption here is that we don't care about the command output, just that the command doesn't time out waiting for a stable cluster.
Using a client library here instead would be nice because it might give us more useful error messages without having to parse raw output.
Per the testing for PR ComplianceAsCode/content#12220, the remediation took about 25-30 minutes to roll out on a 6-node cluster. If the suite doesn't wait that long, the ingress or apiserver will still be in an updating status.
Could the cluster be modified to have a faster rollout? The Machine Config Operator used to have such an option.
Yeah - that's a significant increase in our testing times. I'll do some digging around to see if there is a way to speed this up.