jlphillipsphd changed the title from "Nodes () not configuring anymore for v1.0.40" to "Nodes (Standard_NC24ads_A100_v4) not configuring anymore for v1.0.40" on Aug 26, 2024.
Version
v1.0.40
slurm: 22.05.3
cyclecloud: 2.7.2
In what area(s)?
/area ansible
/area autoscaling
/area configuration
/area cyclecloud
Expected Behavior
Nodes (Standard_NC24ads_A100_v4) should autoscale and be configured properly for workloads.
Actual Behavior
Nodes spawn and report that they are being configured for workloads, but they terminate before the job starts. I can log into a node via SSH and confirm that it is active, but something is not connecting correctly. I will post the specific error message (a Python error from jetpack) when I get a chance to try again. These nodes were working until about a week ago.
Steps to Reproduce the Problem
Deploy a GPU cluster (Standard_NC24ads_A100_v4) using the versions listed above.