
Retry after failing data transfer #24

Closed
wagdav opened this issue Sep 9, 2019 · 6 comments
Comments


wagdav commented Sep 9, 2019

Hi @lehins!

We're using cache-s3 in a hybrid infra: we have workers in the cloud and on-premises as well. cache-s3 is working great, but sometimes uploading the .cache file to S3 fails:

# Saving the compilation cache...
$ cache-s3 --prefix=macos save --relative-path ./ccache

[cache-s3][Info ][Mon,  2 Sep 2019 07:02:04 UTC]: Caching: ./ccache
[cache-s3][Info ][Mon,  2 Sep 2019 07:02:17 UTC]: <cache-s3/macos/master.cache> - No previously stored cache was found.
[cache-s3][Info ][Mon,  2 Sep 2019 07:02:17 UTC]: <cache-s3/macos/master.cache> - Data change detected, caching 65.6 MiB with sha256: sKL7Z16jyBiNDy+nZl+NSxyLVJtwM8aR0wvJYB+ZRKE=
[cache-s3][Error][Mon,  2 Sep 2019 07:03:29 UTC]: <cache-s3/macos/master.cache> - Critical HTTPException

The resulting cache database is only 65 MB, but the compilation took more than an hour to finish (because we started from an empty cache). Since the HTTP error fails our build, we have to restart the whole build from scratch.

To solve this issue I propose implementing a retry mechanism for the data transfer. I was looking at the retry package, which could be used to recover from an HTTPException.

I'm not sure why this exception occurs, so I'd probably start by fixing #18.

What do you think? If you agree with this direction I'd be happy to submit a PR.
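A minimal sketch of what the proposed retry could look like with the retry package (`recovering`, `exponentialBackoff`, and `limitRetries` are the actual retry API; `uploadWithRetry` is a hypothetical name, and catching `HttpException` from http-client is an assumption about where the failure surfaces):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad.Catch (Handler (..))
import Control.Retry (RetryStatus, exponentialBackoff, limitRetries, recovering)
import Network.HTTP.Client (HttpException)

-- Retry an upload action when it throws an HttpException, with
-- exponential backoff starting at 1 second and at most 5 retries.
uploadWithRetry :: IO a -> IO a
uploadWithRetry upload =
  recovering
    (exponentialBackoff 1000000 <> limitRetries 5)
    -- Only HttpException triggers a retry; anything else propagates.
    [\(_ :: RetryStatus) -> Handler (\(_ :: HttpException) -> pure True)]
    (\_ -> upload)
```

Any non-HTTP exception still fails immediately, so a genuine bug in the uploader would not be masked by silent retries.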


lehins commented Sep 9, 2019

@wagdav I made a long overdue upgrade of the lts snapshot and improved the exception reporting. So if you want to give that version a shot, it is on the switch-to-lts-14 branch.

With regards to a retry mechanism, I would not be opposed to it. An optional flag --retry-upload=<n>, with a sensible default for n like 3 or 5 tries, would be good. At the same time, I think using the retry package is overkill. Adding a counter with a threadDelay to runLoggingAWS that recursively calls itself until the tries are exhausted would be a much easier approach, I think.
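The simpler counter-based approach could be sketched roughly like this, using only base (`retryUpload` is a hypothetical name, and catching `SomeException` stands in for whatever exception runLoggingAWS would actually see; the real change would live inside runLoggingAWS itself):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, throwIO, try)

-- Run an action, retrying up to `tries` times with a fixed delay
-- (in microseconds) between attempts; rethrow once tries are exhausted.
retryUpload :: forall a. Int -> Int -> IO a -> IO a
retryUpload tries delayMicros action = go tries
  where
    go :: Int -> IO a
    go n = do
      result <- try action :: IO (Either SomeException a)
      case result of
        Right a -> pure a
        Left e
          | n <= 1 -> throwIO e          -- tries exhausted: propagate the failure
          | otherwise -> do
              threadDelay delayMicros    -- back off before the next attempt
              go (n - 1)
```

With a default like `retryUpload 3 1000000`, a transient network hiccup costs a few seconds instead of a failed build, while a persistent failure still surfaces the original exception.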

If you decide to submit a PR, start off from the switch-to-lts-14 branch, since I'll be merging it into master soon.


wagdav commented Sep 19, 2019

This was addressed in PR #25, will be part of the next release.


lehins commented Sep 19, 2019

It was indeed.

@lehins lehins closed this as completed Sep 19, 2019

wagdav commented Sep 24, 2019

@lehins Do you have an ETA when you can tag a new release? Thanks!


lehins commented Sep 24, 2019

So, @wagdav, because of #26 we have two possible versions:

  • v0.1.9 - concurrent uploads, but no retry
  • v0.1.10 - retry functionality, but no concurrent uploads.

Hopefully the next version will not have to make any compromises :)

Although we use cache-s3 internally, unfortunately neither of those features is critical for us, so I can't spend much more time right now on getting these issues resolved. Naturally, PRs are always welcome, and of course I can always spend more time and take care of it myself; FP Complete is always happy to work with new customers ;)


wagdav commented Sep 25, 2019

@lehins Thanks a lot!

Recently we managed to cut down our cache sizes, so we could probably live without concurrent uploads for now.

Also, thanks for the constant support, cache-s3 works great for us, and I'll try to keep contributing.
