Design to enable concurrency of operations in velero pod for backup and restore #5510

Closed
wants to merge 2 commits into from

kaovilai
Contributor

@kaovilai kaovilai commented Oct 28, 2022

⚠️ This PR is being updated to remove references to worker pods.

The intent is to move towards the ability to run backup/restore operations concurrently inside a single Velero pod.

Details will follow.

Original PR description below, pending removal:

Thank you for contributing to Velero!

Please add a summary of your change

This PR extends the original PR #1653. The major difference is the use of Kubernetes Jobs, instead of Pods directly, for worker pod lifecycle management.

graph LR;

subgraph Velero Managed Resources
    A[Backup CR-1] & C[Backup CR-2] & M[Backup CR-3] -->|Watched by|B;
    B((Velero Controller))-->|Create|D & E & N;
    end
    D[Worker Job-1]-->|Create|F & FF;
    E[Worker Job-2]-->|Create|G;
    N[Worker Job-3]-->|Create|O & OO & OOO;
    F((Worker Pod#1))-->|PodStatus|J;
    FF((Worker Pod#2))-->|PodStatus|L;
    G((Worker Pod#1))-->|PodStatus|K;
    O((Worker Pod#1))-->|PodStatus|Q;
    OO((Worker Pod#2))-->|PodStatus|R;
    OOO((Worker Pod#3))-->|PodStatus|S;
    J[Failed];
    L[Succeeded]-->|Update job status|D;
    K[Succeeded]-->|Update job status|E;
    Q[Failed];
    R[Failed];
    S[Failed]-->|Update job status|N;
    N-->|Failed|B;
    E-->|Succeeded|B;
    D-->|Succeeded|B;
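
One reading of the diagram above is that a worker Job reports Succeeded to the Velero controller as soon as any of its pods succeeds, and Failed only when every pod it created has failed. The sketch below encodes that reading only; `podStatus` and `jobOutcome` are hypothetical names used for illustration and are not part of Velero or the Kubernetes API.

```go
package main

import "fmt"

// podStatus is a hypothetical type mirroring the two terminal pod
// states shown in the diagram.
type podStatus string

const (
	podFailed    podStatus = "Failed"
	podSucceeded podStatus = "Succeeded"
)

// jobOutcome returns Succeeded if any pod succeeded and Failed otherwise.
// This is an assumption drawn from the diagram, not documented behavior.
func jobOutcome(pods []podStatus) podStatus {
	for _, p := range pods {
		if p == podSucceeded {
			return podSucceeded
		}
	}
	return podFailed
}

func main() {
	// The three worker Jobs from the diagram, with their pod results.
	jobs := []struct {
		name string
		pods []podStatus
	}{
		{"Worker Job-1", []podStatus{podFailed, podSucceeded}},
		{"Worker Job-2", []podStatus{podSucceeded}},
		{"Worker Job-3", []podStatus{podFailed, podFailed, podFailed}},
	}
	for _, j := range jobs {
		fmt.Printf("%s: %s\n", j.name, jobOutcome(j.pods))
	}
}
```

Under this reading, Job-1 and Job-2 report Succeeded and Job-3 reports Failed, matching the edges back to the controller in the diagram.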

Does your change fix a particular issue?

Fixes #(issue)
#2601

Please indicate you've done the following:

  • [] Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

/kind changelog-not-required

@github-actions github-actions bot added the Area/Design Design Documents label Oct 28, 2022
@kaovilai kaovilai changed the title Enable concurrency of operations using worker pod for Backup and Restore Design to enable concurrency of operations using worker pod for backup and restore Oct 28, 2022
@kaovilai kaovilai force-pushed the design-concurrent-backup branch 7 times, most recently from 2e51083 to 5aa815a Compare October 31, 2022 07:23
@kaovilai
Contributor Author

Received comments to avoid using Jobs because of the unpredictability of "non-parallel Jobs".

From the Kubernetes documentation:

Note that even if you specify .spec.parallelism = 1 and .spec.completions = 1 and .spec.template.spec.restartPolicy = "Never", the same program may sometimes be started twice.

Reverting from Jobs back to Pods. The initial motivation for using Jobs was to make it easier to restart failed backup CRs in place (with Jobs, this happens automatically). This could be a separate enhancement (will file an issue if there isn't one).
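
The duplicate-start caveat quoted above means a worker cannot assume it is the only instance started for a given backup. A minimal sketch of one common guard follows: each worker first tries to claim the backup, and only the winner does the work. `claimStore`, `tryClaim`, and `startWorkers` are hypothetical names, and the in-memory map and phase strings are stand-ins chosen for illustration. A real controller would more likely rely on the API server's optimistic concurrency (update conflicts on `resourceVersion`) than on an in-process mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// claimStore is a hypothetical in-memory stand-in for the phase
// recorded on each backup CR.
type claimStore struct {
	mu    sync.Mutex
	phase map[string]string
}

// tryClaim moves a backup from "New" to "InProgress" and reports
// whether this caller won the claim. Later callers get false.
func (s *claimStore) tryClaim(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.phase[name] != "New" {
		return false
	}
	s.phase[name] = "InProgress"
	return true
}

// startWorkers simulates the same program being started several times
// for one backup and returns how many instances actually ran the work.
func startWorkers(s *claimStore, name string, workers int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	ran := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !s.tryClaim(name) {
				return // another instance already owns this backup
			}
			mu.Lock()
			ran++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return ran
}

func main() {
	s := &claimStore{phase: map[string]string{"backup-1": "New"}}
	fmt.Println("instances that ran:", startWorkers(s, "backup-1", 2))
}
```

With a guard like this, a second start of the same program becomes a no-op instead of a second backup run.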

@jiangfoxi

Hello, is there any progress on this concurrency topic?
Velero works well for small amounts of data, but it cannot handle massive amounts of data, such as more than 1T, in a real production environment.

@shawn-hurley
Contributor

Unless there is some way to share a cache of resources listed, I would worry about this DDOSing the API server.

We could make sure that only X operations get kicked off at a time, but sharing a cache would generally make this process more performant and would allow the process to look and feel like most other controllers IMO.
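
The two ideas in this comment, capping how many operations run at once and sharing a cache of listed resources, can be sketched together. This is an illustration under stated assumptions, not Velero's design: `listCache`, `runBounded`, and `countListCalls` are hypothetical names, and the `list` function is a stand-in for an API server list call.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheEntry holds the result of one list call, populated exactly once.
type cacheEntry struct {
	once  sync.Once
	items []string
}

// listCache is a hypothetical shared cache: each resource kind is listed
// at most once, however many concurrent operations ask for it.
type listCache struct {
	mu      sync.Mutex
	entries map[string]*cacheEntry
	list    func(kind string) []string // stand-in for an API server list call
}

func newListCache(list func(kind string) []string) *listCache {
	return &listCache{entries: map[string]*cacheEntry{}, list: list}
}

// Get returns the cached items for kind, listing them on first use.
func (c *listCache) Get(kind string) []string {
	c.mu.Lock()
	e, ok := c.entries[kind]
	if !ok {
		e = &cacheEntry{}
		c.entries[kind] = e
	}
	c.mu.Unlock()
	e.once.Do(func() { e.items = c.list(kind) })
	return e.items
}

// runBounded runs every operation, but never more than limit at a time.
func runBounded(limit int, ops []func()) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for _, op := range ops {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit operations are in flight
		go func(op func()) {
			defer wg.Done()
			defer func() { <-sem }()
			op()
		}(op)
	}
	wg.Wait()
}

// countListCalls runs n operations that all need the same resource kind
// and returns how many list calls actually reached the stand-in API.
func countListCalls(limit, n int) int {
	var mu sync.Mutex
	calls := 0
	cache := newListCache(func(kind string) []string {
		mu.Lock()
		calls++
		mu.Unlock()
		return []string{kind + "-a", kind + "-b"}
	})
	ops := make([]func(), n)
	for i := range ops {
		ops[i] = func() { cache.Get("pods") }
	}
	runBounded(limit, ops)
	return calls
}

func main() {
	fmt.Println("list calls:", countListCalls(2, 5))
}
```

Five operations share a single list call here, and at most two run at once, so API server load grows with the number of distinct resource kinds rather than with the number of concurrent backups.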

@kaovilai kaovilai changed the title Design to enable concurrency of operations using worker pod for backup and restore Design to enable concurrency of operations in velero pod for backup and restore Aug 9, 2023
@kaovilai
Contributor Author

kaovilai commented Aug 9, 2023

⚠️ This PR is being updated to remove references to worker pods.

The intent is to move towards the ability to run backup/restore operations concurrently inside a single Velero pod.

Details will follow.

@sseago
Collaborator

sseago commented Aug 10, 2023

Closing this in favor of an approach that does not create new pods.

@sseago sseago closed this Aug 10, 2023
@jiangfoxi

jiangfoxi commented Sep 8, 2023

So, what is the progress on backup/restore concurrency? I really love Velero, but with a large amount of data, such as the 100T in our project, there is no concurrent backup and the speed is very, very, very slow. That drives me crazy!

@jiangfoxi

When will this feature be available?
