When both of the following conditions are true, a new controller may terminate existing resources that are being served by another controller:
The two user ID hashes share the same last 4 characters but otherwise differ.
The service names are identical.
This can result in a new model node whose cluster_name is identical to that of an existing model node, depending on the version.
For example:
Existing controller --
USER_ID_HASH=12345678
Controller node: sky-sky-serve-controller-12345678-5678-head
Model node: sky-<SERVICE_NAME>-<VERSION>-5678-head
New controller --
USER_ID_HASH=11115678
Controller node: sky-sky-serve-controller-11115678-5678-head
Model node: sky-<SERVICE_NAME>-<VERSION>-5678-head
In this case, if the <VERSION> also matches, the existing model node may get terminated.
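A minimal sketch of the collision, assuming node names follow exactly the pattern in the example above. The helpers `controller_node_name` and `model_node_name`, the service name `my-service`, and the version `1` are hypothetical stand-ins, not SkyPilot's actual naming code.

```python
# Hypothetical helpers that mimic the naming pattern from the example above;
# they are NOT SkyPilot's real implementation.


def controller_node_name(user_hash: str) -> str:
    # Controller node names embed the full user hash plus its last 4 characters.
    return f"sky-sky-serve-controller-{user_hash}-{user_hash[-4:]}-head"


def model_node_name(service: str, version: int, user_hash: str) -> str:
    # Model node names embed only the last 4 characters of the user hash.
    return f"sky-{service}-{version}-{user_hash[-4:]}-head"


existing_hash = "12345678"
new_hash = "11115678"

# The two controllers get distinct node names...
existing_ctrl = controller_node_name(existing_hash)
new_ctrl = controller_node_name(new_hash)

# ...but their model nodes for the same service and version do not.
existing_model = model_node_name("my-service", 1, existing_hash)
new_model = model_node_name("my-service", 1, new_hash)

print(existing_ctrl, new_ctrl)
print(existing_model, new_model, existing_model == new_model)
```

Because only the 4-character suffix reaches the model node name, any two hashes ending in `5678` collide once the service name and version line up.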
I believe this happens because the filter used when terminating looks only at the Name:
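To illustrate the suspected cause, here is a minimal local simulation. It assumes (as the report suspects, not as confirmed SkyPilot code) that termination selects instances by the `Name` tag alone. The filter list uses the `[{"Name": "tag:<key>", "Values": [...]}]` shape accepted by boto3's EC2 `describe_instances(Filters=...)`; the instance IDs and the `matches` helper are hypothetical, and no AWS call is made.

```python
# Simulated EC2 instances as (instance_id, tags). Both nodes carry the same
# Name because only the last 4 characters of the user hash are in the name.
instances = [
    ("i-existing", {"Name": "sky-my-service-1-5678-head"}),  # old controller's node
    ("i-new", {"Name": "sky-my-service-1-5678-head"}),       # new controller's node
]


def name_only_filters(cluster_name: str) -> list:
    # Same shape as boto3 EC2 describe_instances(Filters=...).
    return [{"Name": "tag:Name", "Values": [cluster_name]}]


def matches(tags: dict, filters: list) -> bool:
    # Local stand-in for EC2 filtering; only "tag:<key>" filters are handled.
    # Filters are ANDed; an instance matches a filter if its tag value is
    # one of that filter's Values.
    for f in filters:
        key = f["Name"][len("tag:"):]
        if tags.get(key) not in f["Values"]:
            return False
    return True


# The new controller asks to terminate "its" model node by name only.
to_terminate = [
    iid
    for iid, tags in instances
    if matches(tags, name_only_filters("sky-my-service-1-5678-head"))
]
print(to_terminate)
```

Both instances match, so the existing controller's node would be selected alongside the new one.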
I'm not sure whether this issue is widespread across other deployment platforms.
On AWS, this could be resolved by adding a new tag, in addition to the Name, that records the correct controller/cluster association, and by including that tag in the filter as well.
Possible tags:
USER_ID_HASH (easy fix)
A UUID for each controller, applied to every model node associated with that controller (needs to be stored in the DB)
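A sketch of the proposed fix, again simulated locally with no AWS calls. The tag key `skypilot-user-id-hash` is a hypothetical name chosen for illustration; SkyPilot does not necessarily define such a tag. The filter list follows the boto3 EC2 `describe_instances(Filters=...)` shape, where separate filters are ANDed together.

```python
# Hypothetical tag key recording which controller (user hash) owns a node.
USER_HASH_TAG = "skypilot-user-id-hash"

# Simulated EC2 instances: same Name, but different owner tags.
instances = [
    ("i-existing", {"Name": "sky-my-service-1-5678-head", USER_HASH_TAG: "12345678"}),
    ("i-new", {"Name": "sky-my-service-1-5678-head", USER_HASH_TAG: "11115678"}),
]


def scoped_filters(cluster_name: str, user_hash: str) -> list:
    # Match on the Name tag AND on the owning controller's user hash.
    return [
        {"Name": "tag:Name", "Values": [cluster_name]},
        {"Name": f"tag:{USER_HASH_TAG}", "Values": [user_hash]},
    ]


def matches(tags: dict, filters: list) -> bool:
    # Local stand-in for EC2 filtering; only "tag:<key>" filters are handled.
    # Every filter must match for the instance to be selected.
    for f in filters:
        key = f["Name"][len("tag:"):]
        if tags.get(key) not in f["Values"]:
            return False
    return True


# The new controller now selects only its own model node.
to_terminate = [
    iid
    for iid, tags in instances
    if matches(tags, scoped_filters("sky-my-service-1-5678-head", "11115678"))
]
print(to_terminate)
```

The per-controller UUID option would work the same way, swapping the user hash for a UUID persisted in the database; it is more robust because it distinguishes controllers even if two users somehow shared a full hash.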
Hi @JGSweets! Thanks for reporting this error. Just want to make sure: is this for multiple users (on multiple laptops) running SkyPilot in a shared AWS project?
@cblmemo Actually, this case is for a single compute resource where SKYPILOT_USER_ID is set via an environment variable. I would expect multiple laptops to have a similar effect.