Velero schedule is replaced on every helm upgrade #298

Closed
branttaylor opened this issue Sep 8, 2021 · 25 comments · Fixed by #490
Labels: bug (Something isn't working), velero

@branttaylor

What steps did you take and what happened:

Every time we run a helm upgrade in CI, the schedule is replaced due to this:
https://github.com/vmware-tanzu/helm-charts/blob/main/charts/velero/templates/schedule.yaml#L12
This causes Velero to run a new backup every time our CI pipeline runs.
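For reference, a rough sketch of what that template renders, with the hook annotations that cause the delete-and-recreate behavior (illustrative only, not the chart file verbatim; names and cron expression are made up):

# Sketch of a Schedule rendered from charts/velero/templates/schedule.yaml (not verbatim)
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup                       # illustrative schedule name
  annotations:
    # These hooks make helm treat the Schedule as a hook resource: it is
    # deleted and recreated on every install/upgrade/rollback, and Velero
    # then sees a brand-new Schedule and immediately runs a backup.
    "helm.sh/hook": post-install,post-upgrade,post-rollback
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  schedule: "0 2 * * *"                    # illustrative cron expression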

What did you expect to happen:

The schedule should not be replaced every time, or there should at least be a way to prevent a new backup running when a new schedule resource is created.

Environment:

  • helm version (use helm version): 3.3.0
  • helm chart version and app version (use helm list -n <YOUR NAMESPACE>): chart: velero-2.23.6 app version: 1.6.3
  • Kubernetes version (use kubectl version): 1.20.9
  • Kubernetes installer & version: RKE 1.2.11
  • Cloud provider or hardware configuration: vSphere 6.7
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.2 LTS
@jenting added the velero label Sep 9, 2021
@jenting (Collaborator) commented Sep 9, 2021

From my perspective, this should be fixed in Velero core; see vmware-tanzu/velero#1980.
Once that lands, Velero won't trigger a new backup immediately whenever a new Schedule CR is applied; a backup will only run when the cron schedule fires.

@jenting added the wontfix (This will not be worked on) label Sep 9, 2021
@branttaylor (Author)

Thank you @jenting, I see the referenced issue is nearly 2 years old. Hopefully it gets resolved soon.

@jenting (Collaborator) commented Sep 9, 2021

> Thank you @jenting, I see the referenced issue is nearly 2 years old. Hopefully it gets resolved soon.

I saw it is scheduled for v1.8.0; hopefully it will be fixed in that release.

@javanthropus

I think the fact that the Schedule resource is deleted and then re-created is actually causing loss of the backup resources tied to it when using the schedule's useOwnerReferencesInBackup setting under ArgoCD. In this case the backup's owner is deleted, and if that alone is not enough to trigger removal of the backup resource, Argo will then identify the backup as something it should manage, see that the chart itself does not create such a resource, and prune it.
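For context, a minimal sketch of how that setting is typically supplied through the chart's schedules values (the schedule name, cron expression, and ttl below are illustrative):

# values.yaml (sketch): each backup created by this schedule carries an
# ownerReference pointing at the Schedule, so recreating the Schedule leaves
# existing backups referencing an owner UID that no longer exists.
schedules:
  daily:
    useOwnerReferencesInBackup: true
    schedule: "0 2 * * *"
    template:
      ttl: "240h"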

In other words, the identified helm hook does not make sense for schedule resources. Why is that being done in the first place?

@jenting (Collaborator) commented Sep 23, 2021

> In other words, the identified helm hook does not make sense for schedule resources. Why is that being done in the first place?

The problem we originally wanted to solve is that helm upgrade does not upgrade the CRD manifests.
That caused a lot of issues where users ran old CRDs against a new Velero core or new CRs.

After digging through the helm docs, the best approach we could come up with was to create a job that upgrades the CRDs,
make sure that job finishes, and only then apply the Backup/Restore/Schedule CRs to the cluster.
We did not notice that this would cause the Backup/Restore/Schedule CRs to be recreated.

As far as I know, ArgoCD has made some enhancements to how it manages helm charts, and we did hit some issues with it before. However, it's really hard to keep the chart compatible with both the native helm command and GitOps tools (ArgoCD, FluxCD, etc.). What we can do is focus on making the native helm command work correctly and verify that in the CI pipeline.

@javanthropus

ArgoCD, unlike helm, will perform CRD upgrades unconditionally. It uses helm template to render all the manifests in order to apply them itself, so it doesn't have helm's CRD upgrade limitation.

Probably the only way to handle this for both native helm and ArgoCD is to offer an additional chart that only supplies the CRDs as regular templates: https://helm.sh/docs/chart_best_practices/custom_resource_definitions/#method-2-separate-charts. I think doing this and making that new chart a dependency of this one would solve the CRD upgrade issue without the need to use hooks at all.
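For illustration, a rough sketch of that layout, assuming a hypothetical CRDs-only chart named velero-crds (not something this repo ships today):

# Chart.yaml of the main velero chart (sketch): the CRDs live in their own
# chart, declared as a dependency here, so helm (and ArgoCD) upgrade them as
# regular templates instead of relying on hooks.
apiVersion: v2
name: velero
version: x.y.z
dependencies:
  - name: velero-crds                                       # hypothetical CRDs-only chart
    version: x.y.z
    repository: https://vmware-tanzu.github.io/helm-charts  # assumed location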

@jenting (Collaborator) commented Sep 23, 2021

Thanks for the suggestion; we did consider splitting out a separate chart before, but I can't remember the main reason we didn't choose that solution.

Anyway, for this chart we want to make sure native helm works well; for GitOps tools, we'll try our best to support them.

@javanthropus

Regardless of the method used to operate helm, the current hook will cause backups associated with a schedule to be deleted when the chart is re-applied for any reason if the useOwnerReferencesInBackup setting is included. We need to use that setting to prevent ArgoCD from nuking the backups itself, so we're stuck with building a wrapper chart that provides the schedule resources itself without any of the hooks defined.

Would it be possible to add a feature that allows disabling all the hooks used to work around helm's CRD management issues? That would satisfy our needs for using this chart with ArgoCD and should be pretty simple to implement without risking the current behavior for helm native users.

@jenting (Collaborator) commented Sep 23, 2021

Sure, we did have that before, but after implementing the CRD upgrade job we removed the setting. We could add it back.

@javanthropus

That would be great! Thanks.

@jenting added the enhancement (New feature or request) label and removed the wontfix label Sep 25, 2021
@jenting added the good first issue (Good for newcomers) label Oct 13, 2021
@jsnider-mtu

I've just run into this issue and now have backups that exist in S3, but the owner ref has a different UID than the newly created schedule, causing Velero to continuously find them and attempt to sync. Is there any way to rescue those backups, or is that a lost cause?

@jsnider-mtu

I was able to modify the velero-backup.json file in S3 and change the UID of the owner schedule to match the newly created one. They are now syncing correctly.
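For anyone hitting the same problem, a hedged sketch of the fragment involved (the stored file is JSON; it is shown as YAML here for readability, and the names and UID are illustrative). The uid under metadata.ownerReferences must be changed to the UID of the newly created Schedule, which kubectl get schedule <name> -n velero -o yaml will show:

# velero-backup.json in the backup storage location (sketch of the relevant part)
metadata:
  name: velero-daily-20210908020000                 # illustrative backup name
  ownerReferences:
    - apiVersion: velero.io/v1
      kind: Schedule
      name: velero-daily                            # illustrative schedule name
      uid: 11111111-2222-3333-4444-555555555555     # set this to the new Schedule's UID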

@lucagouty commented Mar 29, 2022

EDIT: OK, now I think I get it: those annotations are just there to make sure these resources are created after the upgrade, right? So the CRDs and other things already exist. But so far, removing them hasn't caused any issue for me.


To be honest I don't see the link with vmware-tanzu/velero#1980

Afaik the recreation of the Schedule on each helm upgrade is because of the two hook annotations, and that fix in Velero core won't change that: as long as those annotations are in the chart, the Schedule will be deleted on every helm upgrade.

When using helm upgrade --debug I can see those two annotations trigger deletions for BackupStorageLocation, VolumeSnapshotLocation and Schedule, but I have no idea why that is necessary:

client.go:299: [debug] Starting delete for "default" BackupStorageLocation
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "default" VolumeSnapshotLocation
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "velero-xxx-daily" Schedule
client.go:128: [debug] creating 1 resource(s)
...

So anyway, here's how I "fixed" it, but as I don't understand why those annotations are there, I'm not sure it's a good idea 😓

I always encapsulate this kind of chart in one of my own:

# /Chart.yaml

apiVersion: v2
name: velero
description: A Helm chart to install a Velero server
version: 1.0.7
appVersion: 1.0.7
dependencies:
  - name: velero
    version: 2.29.1
    repository: https://vmware-tanzu.github.io/helm-charts

Then all I needed to do was copy templates/schedule.yaml into my own templates folder, remove the two hook annotations, and add schedules to the root of my values:

# /values.yaml

schedules:
   # Schedules created by the template without hooks annotations
   some-schedule:
      ...

velero:
   # Velero values, if you set schedules here it will be recreated on each upgrade
   ...

Now it doesn't recreate schedules on each helm upgrade.
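For completeness, a rough sketch of what the copied-and-trimmed template can look like (simplified, not the upstream file verbatim; the fields passed through depend on your values):

# /templates/schedule.yaml -- same shape as the upstream template but without
# the helm.sh/hook annotations, so helm patches the Schedule in place instead
# of deleting and recreating it on every upgrade
{{- range $name, $schedule := .Values.schedules }}
---
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: {{ $name }}
  namespace: {{ $.Release.Namespace }}
spec:
  schedule: {{ $schedule.schedule | quote }}
  {{- with $schedule.useOwnerReferencesInBackup }}
  useOwnerReferencesInBackup: {{ . }}
  {{- end }}
  template:
    {{- with $schedule.template }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
{{- end }}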

If it's not a problem to remove those annotations, maybe the chart should allow disabling them conditionally? I don't remember seeing hook annotations outside of jobs, tbh.

@javanthropus

We ultimately ended up doing the same thing in a wrapper chart, @lucagouty.

@sotiriougeorge

Wouldn't that be solved if the original chart was modified to allow changing the hooks for the Schedule resource through its values.yaml? Sorry, chart newbie here trying to see if my logic stands.

@javanthropus

Yes. I requested that earlier in this conversation, and @jenting said that the feature could be added back in.

@machine424 (Contributor)

Hello,

We also suffer from this "workaround". As suggested above, another solution for managing CRDs (which we consider simpler and less hacky) is to put them in their own chart, as suggested by Helm (https://helm.sh/docs/chart_best_practices/custom_resource_definitions/#method-2-separate-charts). We have been doing it this way for all our CRDs for almost a year and it works well (we use Helmfile to ease the management of multiple releases).

We host Velero's CRD chart here: https://github.com/wiremind/wiremind-helm-charts/tree/main/charts/velero-crds. If you want, we can move it to this repo.

PS: We're trying to do the same for prometheus-operator's CRDs: prometheus-community/helm-charts#2612

@jenting (Collaborator) commented Nov 24, 2022

@machine424
Thank you for the feedback. I'm curious about the Helmfile setup that manages multiple releases.
Would you mind sharing more details about it?

@machine424 (Contributor)

@jenting

For CRDs, we have a helmfile.yaml file that looks like this:

...
repositories:
  - name: wiremind
    url: https://wiremind.github.io/wiremind-helm-charts

templates:
  common: &common
    namespace: crds
    missingFileHandler: Error

releases:
  - name: velero-crds
    <<: *common
    chart: wiremind/velero-crds
    version: x.y.z
  - name: prometheus-operator-crds
    <<: *common
    chart: wiremind/prometheus-operator-crds
    version: x.y.z
... 

Then a CD Pod (that only has permissions to create/edit CRDs) runs helmfile sync; this is how we manage all our CRDs.

(more about helmfile: https://github.com/helmfile/helmfile)

@qiuming-best (Collaborator) commented Dec 1, 2022

Indeed, BackupStorageLocation, VolumeSnapshotLocation, and Schedule are recreated by the helm install, upgrade, and rollback commands because of the hook "helm.sh/hook": post-install,post-upgrade,post-rollback. I think the intent of these hooks is to enforce an ordering: these resources are recreated after install/upgrade/rollback.

For Velero, it does not matter whether these CRs are created before or after the Velero controller: if the CRs are created before the controller, the controller will list them and handle them; if they are created after the controller, they are still handled by the resource watch mechanism. So the ordering enforced by the hooks does not matter.

We want to separate the CRDs to avoid recreating the CRs; since the recreation is caused by the hooks, in fact we could simply remove them.

@LarsBingBong commented Apr 18, 2023

Any news on this? It would be really awesome to fix. Fixing this would make it possible to avoid a lot of magic, e.g. trying to remove the "helm.sh/hook": post-install,post-upgrade,post-rollback annotation with ytt.

@LarsBingBong

So because of this Helm issue, "Post-renderer not receiving chart hook manifests", this can't even be worked around with e.g. ytt (another VMware-backed project, as you likely know).
So I would have to work around it in a somewhat "sad" way: by disabling the schedule in the Helm values file and then creating the schedule with the Velero CLI. Sad because that means more tools in the air and more logic.
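A minimal sketch of that workaround, assuming the chart's top-level schedules values key (the schedule name and cron expression below are illustrative):

# values.yaml for the velero chart: define no schedules, so the chart renders
# no hook-annotated Schedule resources at all
schedules: {}

# then create the schedule out of band with the Velero CLI, for example:
#   velero schedule create daily-backup --schedule="0 2 * * *"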

What's the holdup with removing the helm.sh hook annotations on Schedule, BackupStorageLocation, and VolumeSnapshotLocation?

Thank you 💯

@jenting added the help wanted (Extra attention is needed) label and removed the good first issue label Aug 6, 2023
@jenting added the bug (Something isn't working) label and removed the enhancement label Aug 6, 2023
@jenting (Collaborator) commented Aug 8, 2023

> Would it be possible to add a feature that allows disabling all the hooks used to work around helm's CRD management issues? That would satisfy our needs for using this chart with ArgoCD and should be pretty simple to implement without risking the current behavior for helm native users.

@javanthropus
Sorry for the late reply. PR #487 adds a toggle to disable all the hooks.

@javanthropus

Thank you, @jenting. I've opened a ticket in my team's issue tracker to take a look at the new chart release and hopefully to start using it.

@jenting self-assigned this Aug 13, 2023
@jenting removed the help wanted (Extra attention is needed) label Aug 13, 2023
@jenting (Collaborator) commented Aug 14, 2023

@LarsBingBong @lucagouty @javanthropus @branttaylor
I created a PR to address it.
Would you mind helping me verify it? Appreciated 🙏
