
issue triggering upgrade #412

Open
ysineil opened this issue Aug 14, 2024 · 7 comments

Labels
bug Bug

Comments

@ysineil

ysineil commented Aug 14, 2024

Describe the bug

Issue upgrading a workload cluster from 1.25 to 1.26; the operation fails with the error below:

tanzu-mission-control_tanzu_kubernetes_cluster.tkgs_cluster[0]: Refreshing state... [id=pct-ha-a/pct-ha-a-1547/pct-qa-mlai]
Planning failed. Terraform encountered an error while generating this plan.

│ Error: Request cancelled

│ with tanzu-mission-control_tanzu_kubernetes_cluster.tkgs_cluster[0],
│ on main.tf line 219, in resource "tanzu-mission-control_tanzu_kubernetes_cluster" "tkgs_cluster":
│ 219: resource "tanzu-mission-control_tanzu_kubernetes_cluster" "tkgs_cluster" {

│ The plugin.(*GRPCProvider).ReadResource request was cancelled.

Stack trace from the terraform-provider-tanzu-mission-control_v1.4.5 plugin:
panic: runtime error: index out of range [3] with length 3
goroutine 44 [running]:
github.com/vmware/terraform-provider-tanzu-mission-control/internal/resources/tanzukubernetescluster.removeUnspecifiedNodePoolsOverrides({0xc00069bd80?, 0x4, 0x1c5e659?}, 0xc0005a0ff0)
github.com/vmware/terraform-provider-tanzu-mission-control/internal/resources/tanzukubernetescluster/helper.go:402 +0x3c5
github.com/vmware/terraform-provider-tanzu-mission-control/internal/resources/tanzukubernetescluster.resourceTanzuKubernetesClusterRead({0x1f711a0, 0xc000b08c00}, 0xc00092b580, {0x1b65c40?, 0xc000b04180})
github.com/vmware/terraform-provider-tanzu-mission-control/internal/resources/tanzukubernetescluster/resource_tanzu_kuberenetes_cluster.go:154 +0x57e
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).read(0x1f711a0?, {0x1f711a0?, 0xc000b08c00?}, 0xd?, {0x1b65c40?, 0xc000b04180?})
github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/schema/resource.go:719 +0x87
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).RefreshWithoutUpgrade(0xc00035db20, {0x1f711a0, 0xc000b08c00}, 0xc0004d9450, {0x1b65c40, 0xc000b04180})
github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/schema/resource.go:1015 +0x585
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*GRPCProviderServer).ReadResource(0xc000576570, {0x1f710f8?, 0xc0004b4480?}, 0xc0004b4500)
github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/schema/grpc_provider.go:613 +0x4a5
github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).ReadResource(0xc00047e820, {0x1f711a0?, 0xc000b08210?}, 0xc000b040c0)
github.com/hashicorp/[email protected]/tfprotov5/tf5server/server.go:746 +0x43d
github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_ReadResource_Handler({0x1b802a0?, 0xc00047e820}, {0x1f711a0, 0xc000b08210}, 0xc00015a1c0, 0x0)
github.com/hashicorp/[email protected]/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:349 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0002241e0, {0x1f78078, 0xc000007380}, 0xc0004fa000, 0xc0005840c0, 0x2d70050, 0x0)
google.golang.org/[email protected]/server.go:1335 +0xdf0
google.golang.org/grpc.(*Server).handleStream(0xc0002241e0, {0x1f78078, 0xc000007380}, 0xc0004fa000, 0x0)
google.golang.org/[email protected]/server.go:1712 +0xa2f
google.golang.org/grpc.(*Server).serveStreams.func1.1()
google.golang.org/[email protected]/server.go:947 +0xca
created by google.golang.org/grpc.(*Server).serveStreams.func1
google.golang.org/[email protected]/server.go:958 +0x15c
Error: The terraform-provider-tanzu-mission-control_v1.4.5 plugin crashed!
This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.
::debug::{"message":"command terminated with non-zero exit code: error executing command [sh -e /__w/_temp/7d55ac10-5a43-11ef-980e-43d815da43f2.sh], exit code 1","details":{"causes":[{"reason":"ExitCode","message":"1"}]}}
Error: Error: failed to run script step: command terminated with non-zero exit code: error executing command [sh -e /__w/_temp/7d55ac10-5a43-11ef-980e-43d815da43f2.sh], exit code 1
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.

Reproduction steps

  1. Attempt upgrade from 1.25 to 1.26

...

Expected behavior

The upgrade should complete successfully. This appears to be a new bug in provider version 1.4.5.

Additional context

No response

@ysineil ysineil added the bug Bug label Aug 14, 2024
@warroyo
Collaborator

warroyo commented Aug 14, 2024

Which TKr versions exactly are being used here? I'm trying to replicate the issue.

@ysineil
Author

ysineil commented Aug 14, 2024

Going from v1.25.7+vmware.3-fips.1-tkg.1 to v1.26.5+vmware.2-fips.1-tkg.1

@warroyo
Collaborator

warroyo commented Aug 14, 2024

OK, I was able to replicate it; it's not exactly upgrade related. Can you check whether this cluster has a node pool that was deleted outside of Terraform? I tried a few scenarios, and the crash seems to occur when the node pool list returned by the TMC API does not match the node pool list the provider expects. My suspicion is that Terraform thinks there's an extra node pool that doesn't actually exist in TMC.
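
Roughly, the failure mode looks like the minimal sketch below. This is only an illustration of the index-out-of-range panic (the `nodePool` type and `applyOverrides` function are hypothetical, not the provider's actual `removeUnspecifiedNodePoolsOverrides` code): the state carries four node pools while the API only returns three, and positional indexing into the shorter slice panics.

```go
package main

import "fmt"

// nodePool is a stand-in for the provider's node pool model (hypothetical).
type nodePool struct {
	Name string
}

// applyOverrides walks the node pools recorded in Terraform state and touches
// the corresponding entry returned by the API. Indexing by position assumes
// both slices have the same length, which is exactly the assumption that
// breaks when a node pool was deleted outside of Terraform.
func applyOverrides(stateNodePools, apiNodePools []nodePool) {
	for i := range stateNodePools {
		// Panics when len(apiNodePools) < len(stateNodePools).
		fmt.Printf("overriding %s\n", apiNodePools[i].Name)
	}
}

func main() {
	// Four node pools in state, but only three still exist in TMC.
	state := []nodePool{{"np-0"}, {"np-1"}, {"np-2"}, {"np-3"}}
	api := []nodePool{{"np-0"}, {"np-1"}, {"np-2"}}

	applyOverrides(state, api) // panic: runtime error: index out of range [3] with length 3
}
```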

@ysineil
Author

ysineil commented Aug 14, 2024

Interesting. Yes, you're correct: there's an extra node pool in the state file.

Thank you for responding so quickly!

@warroyo
Collaborator

warroyo commented Aug 14, 2024

No problem. I think this is something we should handle gracefully rather than panic 😄, so we will keep looking into it, but removing the extra node pool from the state should get you past this.
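
For context, the kind of defensive handling meant here could look roughly like the sketch below. This is an assumption about one possible approach, not the change actually made in the provider: match node pools by name and skip entries missing from the API response instead of indexing by position.

```go
package main

import "fmt"

// nodePool is a stand-in for the provider's node pool model (hypothetical).
type nodePool struct {
	Name string
}

// applyOverridesSafely indexes the API response by node pool name, so a node
// pool that exists only in state is skipped instead of causing a panic.
func applyOverridesSafely(stateNodePools, apiNodePools []nodePool) {
	byName := make(map[string]nodePool, len(apiNodePools))
	for _, np := range apiNodePools {
		byName[np.Name] = np
	}
	for _, np := range stateNodePools {
		apiNP, ok := byName[np.Name]
		if !ok {
			// Present in state but missing from the API response
			// (e.g. deleted outside of Terraform): skip it.
			fmt.Printf("node pool %s not found in API response, skipping\n", np.Name)
			continue
		}
		fmt.Printf("overriding %s\n", apiNP.Name)
	}
}

func main() {
	state := []nodePool{{"np-0"}, {"np-1"}, {"np-2"}, {"np-3"}}
	api := []nodePool{{"np-0"}, {"np-1"}, {"np-2"}}

	applyOverridesSafely(state, api) // completes without panicking
}
```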

@ysineil
Author

ysineil commented Aug 14, 2024

Absolutely. I deleted the reference in the state file and it's all happy now. Thanks again.

@warroyo
Collaborator

warroyo commented Aug 14, 2024

Tracking this here: #413
