Lack of context for failing reconciliation #498
Take a look at the events of that object using …
I have a pretty good idea what the underlying issue is here, but it would still be nice to see more info in the error itself. Current idea: we have one git repo for two clusters, both with webhooks for image updates. It's quite likely a race condition between the two Flux instances (one in each cluster): by the time one cluster tries to push, its HEAD is no longer up to date, as it was already altered by the other cluster. I saw the same with Flux v1, but we didn't care for its logs as much as we do with v2 😉 Assuming the above is correct, it'd be nice for it to retry (basically rebase?) instead of failing. On the other hand, we should probably invest some time in proper monitoring & alerting, instead of just dumping everything to Slack directly, as Kubernetes is kinda all about "it's okay to fail, sometimes", being (primarily) stateless and all that.
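The retry-instead-of-fail suggestion above boils down to optimistic concurrency on the branch head: fetch, commit on top of what you saw, and rebase-and-retry when the push is rejected because the other cluster moved HEAD first. A minimal sketch of that loop, with purely illustrative names (this is not Flux's actual code):

```python
class RemoteRef:
    """Stands in for the branch head on the shared Git remote."""

    def __init__(self, head="base"):
        self.head = head

    def push(self, expected_head, new_head):
        # Compare-and-swap: reject non-fast-forward pushes, i.e. pushes
        # built on a HEAD that another writer has already moved.
        if self.head != expected_head:
            return False
        self.head = new_head
        return True


def push_with_retry(remote, make_commit, max_attempts=3):
    """Fetch, commit on top, push; on rejection, retry from the new tip."""
    for _ in range(max_attempts):
        seen = remote.head              # "fetch" the current tip
        candidate = make_commit(seen)   # build our commit on top of it
        if remote.push(seen, candidate):
            return candidate
        # HEAD moved underneath us: loop again, effectively rebasing
        # our change onto the other cluster's push.
    raise RuntimeError("push still rejected after retries")
```

With two writers racing, the loser's first push fails the compare-and-swap and its second attempt lands on top of the winner's commit, instead of surfacing an opaque one-off error.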
Oh I see. Agree that the …
I have the same error... also no context, and I have no idea what's going on here :)
There are a couple of reports of this type of failure (or potentially unrelated failures, e.g. git error code 128) showing up in the Slack channel. I haven't seen them filter down to reports for IAC as of yet, but it's something to be aware of. I will load up some Image Update Automation controllers today or tomorrow and try to reproduce this issue, one or the other; there is not much context to go on for what is causing the failure. I understand this report is not about one specific failure, but about the general case of failures not being reported very clearly, with an obvious link to a specific root cause.
This is another example of that. This is the error returned from Git, and I'm not sure how much helpful parsing we can do, but to refocus: the subject of this report is making it clearer what has gone wrong when IAC fails. Maybe we can come up with some common failure scenarios and start classifying errors, raising those as conditions based on pattern matching.
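Classifying raw Git errors into a small set of condition reasons could be as simple as pattern matching on the message. A hedged sketch: the patterns and reason names below are made up for illustration and are not the controller's actual taxonomy.

```python
import re

# Illustrative patterns only; real failure modes would need to be
# collected from actual controller logs.
ERROR_PATTERNS = [
    (re.compile(r"object not found"), "GitObjectMissing"),
    (re.compile(r"non-fast-forward|fetch first"), "GitPushRejected"),
    (re.compile(r"exit (status|code) 128"), "GitCommandFailed"),
    (re.compile(r"authentication|permission denied", re.IGNORECASE), "GitAuthFailed"),
]


def classify_git_error(message):
    """Map a raw Git error message to a condition reason, with a fallback."""
    for pattern, reason in ERROR_PATTERNS:
        if pattern.search(message):
            return reason
    return "ReconciliationFailed"  # generic catch-all condition reason
```

A controller could then surface the matched reason on its Ready condition, so an "object not found" failure and an auth failure no longer look identical in the object's events.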
We have migrated our repository to another provider (from bitbucket.org to GitLab), and it seems these errors are gone.
I'm having this same issue and went down the route of changing gitImplementation to use libgit2, but we're using source.toolkit.fluxcd.io/v1beta2, where this has been deprecated (https://github.com/fluxcd/source-controller/blob/main/docs/spec/v1beta1/gitrepositories.md#git-implementation). Furthermore, v1beta2 recommends setting … @mantasaudickas, we're also using Bitbucket and have the scenario of multiple clusters using the same repo. Are you having good results so far?
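For context, the deprecated switch being discussed was a field on the GitRepository spec. A sketch of how it was set; the repo name, namespace, and URL here are invented placeholders, and the field is ignored by newer source-controller releases:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: fleet-manifests        # placeholder name
  namespace: flux-system
spec:
  interval: 1m
  url: https://bitbucket.org/example/fleet-manifests.git  # placeholder URL
  ref:
    branch: main
  gitImplementation: libgit2   # deprecated; newer releases ignore this field
```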
I have switched one project to GitLab and another to GitHub (2 independent clients). So far the "object not found" error message is gone in both of them. I did not try the options you mentioned. The actual reason for the switch was a Bitbucket issue: once FluxCD makes a push, it's not possible to fetch that last push anymore (it is visible in the UI, but not fetchable to local copies and not visible in the git command-line history)... I don't know if it's a Flux or Bitbucket issue, but it was solved by migrating to other providers.
We've seen the same behaviour and have raised a support ticket with Bitbucket... still no solution though.
Same here. Since it was blocking us, we switched the manifest repository to another provider... and are now thinking of switching everything :)
Also getting the "Object not found" error using Flux with Bitbucket. Image automation gets stuck, starting with …
@tobiasjochheimenglund We're seeing the same behaviour; the image updater is returning … @dewe @mantasaudickas Do you have any more technical details you sent to Bitbucket to push the problem onto them, if it does seem to be Bitbucket-specific? Bitbucket didn't resolve it for us either, though they were helpful and pointed me towards git shallow clone potentially causing the issue. My support ticket was less technical and more a query about shallow clone, repo health, and FluxCD. I'll report back with any findings.
They did not ask for any technical details. All their communication sounded more like: please check this, or that, and maybe we can run GC for your repo. And I have not heard from them since last Friday :)
Hello there, we have the same issue. Our configuration is multi-cluster with different branches on the same Bitbucket repo.
@PaulSxxxs At the same time we get …
We had a similar issue committing from any git client for a time when "object not found" was occurring.
I received this message from Bitbucket: …
Wondering if the "object not found" issue will be fixed, or if it's related to something else :)
I specifically spoke to them about "object not found" and gave some technical details... I'm fairly sure it will fix this.
Can't find any apparently related issue over at go-git... 🤔
As I happen to be a go-git maintainer as well, we would be really happy to see an issue being created in …
We use both Flux and Bitbucket and have been absolutely pulling our hair out over this issue. For what it's worth, we found that moving from https:// to ssh:// git URLs seemed to make the behavior go away. That isn't always practical to do, however, so here's hoping that Bitbucket's fix works out.
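The HTTPS-to-SSH switch described above is a change to the GitRepository's `spec.url`, plus an SSH key secret in place of basic-auth credentials. A hedged sketch with placeholder names and URL:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: fleet-manifests        # placeholder name
  namespace: flux-system
spec:
  interval: 1m
  # before: url: https://bitbucket.org/example/fleet-manifests.git
  url: ssh://git@bitbucket.org/example/fleet-manifests.git  # placeholder URL
  secretRef:
    name: fleet-manifests-ssh  # placeholder secret with identity + known_hosts
  ref:
    branch: main
```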
Bitbucket rolled out their fix, and for us everything has been working perfectly again.
Yeah... I reverted my manifests back to Bitbucket as well, and it works, but I am again getting "object not found" messages :)
Given that these …
@tun0 The Image Automation Controller automatically retries on the next reconciliation, so yes, it should be safe to disregard the "one-off" … However, if you do find a pattern where you can reliably reproduce the issue, please report it upstream so it can be investigated and fixed.
It is still happening with Bitbucket Cloud :)
Then please report it upstream with more details about any patterns you observe when the error occurs (or, e.g., information about the contents of your repository, its size, etc.). There is little we can do from within the context of this repository; it really has to be addressed there. Thanks for your cooperation.
This doesn't provide enough context to determine what is actually going wrong here.