Composer object store download retry #3140
Some tests are failing, but overall this looks good. Let's clean up the unused import and the pyright ignores. Note that this retries on every exception, not just the ones the object store doesn't already retry, so you would 5x the number of retries that were already happening.
@b-chu, for the retries that are already happening: if the OCI internal retry succeeds, we don't do any extra retry. We only do the extra retry if the internal retry fails (I think that's what we want :) ). By the way, the test error is unrelated; it's fixed in PR #3142.
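The layering discussed in this exchange can be sketched as follows. This is a minimal illustration, not the code in this PR: `retry_call` and `FlakyStore` are hypothetical names, the inner attempt count of 2 is made up, and the outer count of 5 is only inferred from the "5x" figure above. It shows both points: the outer retry adds no calls when the client's internal retry succeeds, and in the worst case the raw attempts multiply.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    exc_classes: Tuple[Type[BaseException], ...] = (Exception,),
    num_attempts: int = 3,
    backoff_s: float = 0.0,
) -> T:
    """Call fn, retrying on the listed exception types (hypothetical helper)."""
    if num_attempts < 1:
        raise ValueError("num_attempts must be at least 1")
    for attempt in range(1, num_attempts + 1):
        try:
            return fn()
        except exc_classes:
            if attempt == num_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(backoff_s * attempt)  # simple linear backoff
    raise AssertionError("unreachable")


class FlakyStore:
    """Hypothetical stand-in for an object store client with its own internal retry."""

    def __init__(self, failures_before_success: int, internal_attempts: int = 2):
        self.remaining_failures = failures_before_success
        self.internal_attempts = internal_attempts
        self.raw_calls = 0  # counts every underlying network attempt

    def _raw_download(self) -> bytes:
        self.raw_calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated transient failure")
        return b"checkpoint"

    def download(self) -> bytes:
        # Inner retry, owned by the client library in this sketch.
        return retry_call(self._raw_download, (ConnectionError,), self.internal_attempts)


if __name__ == "__main__":
    # Inner retry recovers on its own: the outer wrapper calls download() once.
    store = FlakyStore(failures_before_success=1)
    retry_call(store.download, (ConnectionError,), num_attempts=5)
    print("raw calls when inner retry succeeds:", store.raw_calls)

    # Persistent failure: outer attempts x inner attempts raw calls in total.
    store = FlakyStore(failures_before_success=100)
    try:
        retry_call(store.download, (ConnectionError,), num_attempts=5)
    except ConnectionError:
        print("raw calls in the worst case:", store.raw_calls)
```

Passing a narrow `exc_classes` tuple instead of the default `(Exception,)` is one way to keep the outer retry from re-running errors that are not transient.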
Couple of minor comments
Thanks for the cleanup!
* retry * up * up * up * fix * up * a * up * up * up * up * lint * up * up * lint
What does this PR do?
Adds a retry to checkpoint downloads from the object store.
Test
Run bigning-debug-cp-loading-2-J6z0wB: ConnectionError. In the logs, one rank succeeded on the 2nd attempt.
Another run, bigning-debug-cp-loading-2-9Lq8am: ChunkedEncodingError.