Backup sometimes fails with 'TLS handshake timeout' #5739

Closed
nomaster opened this issue Jan 5, 2023 · 26 comments
@nomaster

nomaster commented Jan 5, 2023

What steps did you take and what happened:

Backup tasks sometimes fail with 'TLS handshake timeout' when trying to reach the Kubernetes controller.

What did you expect to happen:

Velero should wait for the Kubernetes controller to be reachable again.

The following information will help us better understand what's going on:

log output:

time="2023-01-02T07:32:46Z" level=error msg="backup failed" controller=backup error="rpc error: code = Unknown desc = Get "[https://10.0.0.1:443/api](https://10.0.0.1/api%5C)": net/http: TLS handshake timeout" key=velero/daily0 logSource="pkg/controller/backup_controller.go:298"

Anything else you would like to add:

Maybe there is a retry loop missing?

Environment:

  • Velero version (use velero version): 1.10.0
  • Velero features (use velero client config get features): EnableCSI
  • Kubernetes version (use kubectl version): 1.24.6
  • Kubernetes installer & version: Azure Kubernetes Service with Terraform
  • Cloud provider or hardware configuration: Azure
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.6 LTS

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@allenxu404
Contributor

allenxu404 commented Jan 6, 2023

A timeout is expected at some point, since Velero can't wait forever. As for the TLS handshake timeout, you may have an environment issue; please make sure you unset any http_proxy and https_proxy variables. If that doesn't help, it sometimes comes down to a resource issue, so check whether all kinds of resources (e.g. memory, network) on the node are sufficient.
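For illustration, the proxy check amounts to making sure no proxy variables are set on the velero container, or that the in-cluster API endpoint is exempted. A rough sketch only; the env block and addresses below are examples rather than anything taken from this issue:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: velero
  namespace: velero
spec:
  template:
    spec:
      containers:
        - name: velero
          env:
            # Either remove HTTP_PROXY/HTTPS_PROXY entirely, or exempt the
            # API server address from the error above via NO_PROXY.
            - name: NO_PROXY
              value: "10.0.0.1,kubernetes.default.svc"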

@nomaster
Author

nomaster commented Jan 6, 2023

Yes, of course, every operation needs to time out at some point. What I'm missing is Velero retrying when this happens, or when any other error occurs.

I'm not using a proxy. Resources should be sufficient: I had OOM kills in the past, but they vanished once I configured a memory request of 512 MiB for the pod.
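For reference, that is just the standard resources block on the velero Deployment. A rough sketch of what I mean; only the 512 MiB request is what I actually set, the limit is illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: velero
  namespace: velero
spec:
  template:
    spec:
      containers:
        - name: velero
          resources:
            requests:
              memory: 512Mi   # the request mentioned above
            limits:
              memory: 1Gi     # illustrative value, not something I have verified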

@VickyWinner

@nomaster, Has your issue been resolved? I have the same issue. Here is the error from today.

time="2023-02-01T14:00:30Z" level=debug msg="Error from backupItemActionResolver.ResolveActions" backup=velero/velero-astro-daily-20230201140019 error="rpc error: code = Unknown desc = Get "https://xx.x.x.x:443/api\": net/http: TLS handshake timeout" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/backup/backup.go:219" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*kubernetesBackupper).BackupWithResolvers" logSource="pkg/backup/backup.go:219"

Can someone help on this?

@nomaster
Author

nomaster commented Feb 2, 2023

@nomaster, Has your issue been resolved? I have the same issue. Here is the error from today.

Unfortunately not. The error still comes up for me every few days.

@VickyWinner

VickyWinner commented Feb 8, 2023

We need some kind of dynamic timeout setting so we can control how long Velero waits before timing out.

@stale

stale bot commented May 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label May 1, 2023
@nomaster
Author

nomaster commented May 2, 2023

I'm still seeing this issue

@stale stale bot removed the staled label May 2, 2023
@aquarelacs-arthur-rosa

I'm also having this problem

time="2023-05-22T02:00:44Z" level=error msg="backup failed" controller=backup error="rpc error: code = Unknown desc = Get "https://10.0.0.1:443/api?timeout=32s": net/http: TLS handshake timeout" key=velero/generalbackup01backup01-20230522020033 logSource="pkg/controller/backup_controller.go:282"

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

@nomaster
Author

I haven't seen this issue lately. Maybe a fix was included in one of the recent releases?

Anyone else?

@github-actions github-actions bot removed the staled label Jul 25, 2023
@tberreis

tberreis commented Aug 30, 2023

Still seeing this issue (but rarely).
AKS 1.26.3, Velero 1.11.0, Azure Plugin 1.7.0

@pandvag

pandvag commented Oct 19, 2023

Facing the same error regularly with
AKS v1.27.3, Velero v1.11.1, Azure Plugin v1.7.1.
We are using two schedules at the same time; one always succeeds.

@dpunkturban

Hi,

we are seeing the handshake problem very regularly on our hourly cron jobs:

NAME                                 STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
velero-daily-backup-20231104070024   Completed   0        0          2023-11-04 08:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231104060024   Completed   0        0          2023-11-04 07:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231104050024   Failed      0        0          2023-11-04 06:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231104040024   Completed   0        0          2023-11-04 05:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231104030024   Failed      0        0          2023-11-04 04:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231104020024   Failed      0        0          2023-11-04 03:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231104010024   Failed      0        0          2023-11-04 02:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231104000024   Completed   0        0          2023-11-04 01:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103230024   Completed   0        0          2023-11-04 00:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103220024   Completed   0        0          2023-11-03 23:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103210024   Failed      0        0          2023-11-03 22:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103200024   Completed   0        0          2023-11-03 21:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103190024   Completed   0        0          2023-11-03 20:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103180024   Completed   0        0          2023-11-03 19:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103170024   Completed   0        0          2023-11-03 18:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103160024   Completed   0        0          2023-11-03 17:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103150024   Completed   0        0          2023-11-03 16:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103140024   Failed      0        0          2023-11-03 15:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103130024   Failed      0        0          2023-11-03 14:00:24 +0100 CET   2d        default            <none>
velero-daily-backup-20231103120624   Completed   0        0          2023-11-03 13:06:24 +0100 CET   2d        default            <none>
time="2023-11-04T03:00:36Z" level=error msg="backup failed" backuprequest=velero/velero-daily-backup-20231104030024 controller=backup error="rpc error: code = Unknown desc = Get \"https://10.0.0.1:443/api\": net/http: TLS handshake timeout" logSource="pkg/controller/backup_controller.go:290"

time="2023-11-04T02:00:35Z" level=error msg="backup failed" backuprequest=velero/velero-daily-backup-20231104020024 controller=backup error="rpc error: code = Unknown desc = Get \"https://10.0.0.1:443/api\": net/http: TLS handshake timeout" logSource="pkg/controller/backup_controller.go:290"

AKS 1.26.3 / 1.27.3, Velero 1.11.0, Azure Plugin 1.8.1

@fk-flip

fk-flip commented Nov 22, 2023

Also seeing this issue regularly and haven't been able to pinpoint a culprit so far.

AKS 1.26.3 / 1.27.3, Velero 1.11.1, Azure Plugin 1.8.1

Out of curiosity:

  • Is everyone experiencing this running AKS? Or is someone experiencing this regularly and not using AKS?
  • Are your schedules always on the "full hour", or at some odd time?

@pandvag

pandvag commented Nov 22, 2023

We are running a couple of AKS clusters, all experiencing the same thing: regularly failing backups due to timeouts.
Changing the schedules to pin each cluster to a separate time frame has not solved the issue so far.
We are wondering if it would be possible to integrate a retry within Velero for such cases.
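For clarity, "pinning each cluster to a separate time frame" just means giving each cluster's Schedule a different cron expression away from the full hour. A minimal sketch; the name, time and template below are made up for illustration:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup          # illustrative name
  namespace: velero
spec:
  schedule: "17 2 * * *"      # 02:17 instead of 02:00; offset differently per cluster
  template:
    ttl: 48h0m0s              # roughly the 2d expiry seen in the listing above
    includedNamespaces:
      - "*"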


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

@pandvag

pandvag commented Feb 6, 2024

Still the same problem with AKS v1.27.3, Velero v1.12.3 & velero-plugin-for-microsoft-azure v1.8.2.

@github-actions github-actions bot removed the staled label Feb 7, 2024

github-actions bot commented Apr 9, 2024

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

@github-actions github-actions bot added the staled label Apr 9, 2024
@monotek

monotek commented Apr 9, 2024

At KubeCon Paris, some Azure folks told us that this is an issue in the network layer of AKS.
It's hard to debug, but they are working on a fix.

@github-actions github-actions bot removed the staled label Apr 10, 2024

github-actions bot commented Jun 9, 2024

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

@github-actions github-actions bot added the staled label Jun 9, 2024
@monotek

monotek commented Jun 10, 2024

not stale

@github-actions github-actions bot removed the staled label Jun 11, 2024
@pandvag

pandvag commented Aug 6, 2024

Hi there,
we finally got rid of this behaviour by setting storeValidationFrequency.

@dpunkturban

Hi there, we finally got rid of this behaviour by setting storeValidationFrequency.

@pandvag What value did you set for storeValidationFrequency? Thx :)

@pandvag

pandvag commented Aug 12, 2024

Hi there, we finally got rid of this behaviour by setting storeValidationFrequency.

@pandvag What value did you set for storeValidationFrequency? Thx :)

we used 30m
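For anyone else landing here: assuming storeValidationFrequency maps to the validationFrequency field on the BackupStorageLocation (there is also a matching velero server flag, --store-validation-frequency), the change looks roughly like the sketch below. The bucket name is a placeholder and the provider-specific config is omitted:

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: azure
  objectStorage:
    bucket: <blob-container>   # placeholder
  # config: {...}              # resourceGroup/storageAccount etc. omitted here
  validationFrequency: 30m     # the value we ended up using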


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.


This issue was closed because it has been stalled for 14 days with no activity.

@github-actions github-actions bot closed this as not planned on Oct 30, 2024