
Add retry for testrunner validation #65

Merged

6 commits merged into eiffel-community:main from dev/retry-suite-validation on Jun 26, 2024

Conversation

Member

@fredjn fredjn commented Jun 3, 2024

Description of the Change

Add a retry when validating test runners during test suite validation. This increases resilience against random network outages and other intermittent, non-permanent network-related problems.

Alternate Designs

I could have added retries in several places in the code, since there are chains of HEAD requests made throughout the validation code. However, since the validation is fairly quick, I opted for one retry loop to rule them all.

Possible Drawbacks

The backoff is currently very rudimentary and short-lived; we will not "survive" longer outages with the current code.
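
As a rough illustration of the approach described above (not the PR's exact code; the probe and function names here are hypothetical, and aiohttp stands in for whatever HTTP client the validation actually uses), the retry wraps the whole availability check in a short, fixed backoff loop:

import asyncio

import aiohttp


async def head_ok(url: str) -> bool:
    # Hypothetical single availability probe: the validation code makes
    # chains of HEAD requests, so one transient failure fails the whole check.
    try:
        async with aiohttp.ClientSession() as session:
            async with session.head(url) as response:
                return response.status < 400
    except aiohttp.ClientError:
        return False


async def validate_with_retry(url: str, attempts: int = 3, delay: float = 1.0) -> bool:
    # One retry loop "to rule them all": wrap the quick validation instead
    # of retrying every individual request inside it.
    for _ in range(attempts):
        if await head_ok(url):
            return True
        await asyncio.sleep(delay)  # rudimentary, short-lived backoff
    return False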

Sign-off

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

Signed-off-by: Fredrik Fristedt <[email protected]>

@fredjn fredjn requested a review from a team as a code owner June 3, 2024 11:42
@fredjn fredjn requested review from t-persson and andmat900 and removed request for a team June 3, 2024 11:42
                result = await docker.pull(test_runner)
                if result:
                    break

Contributor

Add a log print maybe? I guess opentelemetry will not work here, but maybe just a print to see when retries are done.
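
As a sketch of that suggestion (illustrative only: pull_with_retry is not a name from the PR, and docker.pull is simply the call shown in the diff at this point in the review):

import logging

LOGGER = logging.getLogger(__name__)


async def pull_with_retry(docker, test_runner, attempts=3):
    # Log each failed attempt so retries are visible in the output.
    for attempt in range(attempts):
        if await docker.pull(test_runner):
            return True
        LOGGER.warning(
            "Test runner %r not reachable (attempt %d/%d), retrying",
            test_runner, attempt + 1, attempts,
        )
    return False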

Member Author

Good point, I'll add that.

Member Author

@fredjn fredjn Jun 4, 2024

Fixed.

Collaborator

If it's not too hard, we should add some data to an OTel span here describing how many retries were done.
This is not required. We could add it to metrics later instead.
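
A sketch of what that could look like with the standard opentelemetry-api package (the span and attribute names here are illustrative, not from this PR):

from opentelemetry import trace

TRACER = trace.get_tracer(__name__)


async def pull_with_retry(docker, test_runner, attempts=3):
    # Record how many retries were needed as an attribute on the span.
    with TRACER.start_as_current_span("validate test runner") as span:
        for attempt in range(attempts):
            if await docker.pull(test_runner):
                span.set_attribute("testrunner.retries", attempt)
                return True
        span.set_attribute("testrunner.retries", attempts)
        return False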

@@ -191,6 +193,13 @@ async def validate(self, test_suite_url):
                test_runners.add(constraint.value)
        docker = Docker()
        for test_runner in test_runners:
            for _ in range(3):
                result = await docker.pull(test_runner)
Contributor

Is a complete pull necessary here? Can it be avoided?

Member Author

@fredjn fredjn Jun 4, 2024

IIRC there is no other way to verify that the image is available for download. @t-persson you're the original author, do you remember?
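
For context, the usual lighter-weight alternative to a full pull is a HEAD request against the registry's manifest endpoint (Docker Registry HTTP API v2); this sketch is illustrative and not necessarily what this project does:

import aiohttp

MANIFEST_TYPE = "application/vnd.docker.distribution.manifest.v2+json"


async def manifest_exists(registry, repository, reference, token=None):
    # HEAD /v2/<name>/manifests/<reference> returns 200 if the image
    # exists, without downloading any layers.
    headers = {"Accept": MANIFEST_TYPE}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"https://{registry}/v2/{repository}/manifests/{reference}"
    async with aiohttp.ClientSession() as session:
        async with session.head(url, headers=headers) as response:
            return response.status == 200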

Member Author

@fredjn fredjn Jun 4, 2024

Wait a moment, looking at this more closely, that shouldn't be docker.pull() at all. The Docker() class doesn't even have a pull() method. This is most likely a Copilot hallucination that slipped past my weary eyes.

Member Author

That also means that there is no pulling of the actual image.

Member Author

@fredjn fredjn Jun 4, 2024

Fixed.

@fredjn fredjn requested a review from andmat900 June 4, 2024 09:26
@fredjn fredjn force-pushed the dev/retry-suite-validation branch from dc421c9 to 2bc0d0f on June 20, 2024 12:52
@fredjn fredjn merged commit 672f982 into eiffel-community:main Jun 26, 2024
5 checks passed