[GCS FT] GCS FT misconfiguration #2694
Comments
/assign @rueian
Ray now relies on the … To disable GCS FT explicitly from KubeRay, we now have three options:
The first two options require us to make changes to Ray. Of those two, I prefer the second one, as it only slightly extends the … However, I prefer the third option overall since it doesn't depend on Ray and can work with all current Ray versions. It also seems to be the easiest to implement. Please let me know what you think about these approaches, @kevin85421. Thanks.
After discussing this offline with @kevin85421, we decided not to override users' configuration by explicitly disabling GCS FT, but to raise an error instead. That is, if users configure GCS FT in a way KubeRay doesn't support, the KubeRay operator will reject the RayCluster. This mechanism will be added after #2712.
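The rejection rule agreed on above can be sketched as a small validation function. This is a minimal, hypothetical Go version, not KubeRay's actual validation code: the `RayCluster` and `EnvVar` types are simplified stand-ins, `validateGCSFT` and `HeadEnv` are illustrative names, and it assumes that `RAY_REDIS_ADDRESS` on the head container is what turns on GCS FT (the scenario in the issue description).

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the fields this check reads; the
// real KubeRay API types are different and richer.
type EnvVar struct {
	Name  string
	Value string
}

type RayCluster struct {
	Annotations map[string]string
	HeadEnv     []EnvVar // env vars of the Ray head container
}

const (
	ftEnabledAnnotation = "ray.io/ft-enabled"
	// Assumption: setting this env var is what makes Ray enable GCS FT
	// and write to an external Redis, as described in this issue.
	redisAddressEnv = "RAY_REDIS_ADDRESS"
)

// validateGCSFT rejects a RayCluster whose head container points Ray at an
// external Redis while the ray.io/ft-enabled annotation is not "true",
// i.e. GCS FT would be on without KubeRay configuring it.
func validateGCSFT(rc RayCluster) error {
	if rc.Annotations[ftEnabledAnnotation] == "true" {
		return nil // GCS FT is managed by KubeRay
	}
	for _, env := range rc.HeadEnv {
		if env.Name == redisAddressEnv {
			return fmt.Errorf("%s is set but annotation %s is not \"true\"",
				redisAddressEnv, ftEnabledAnnotation)
		}
	}
	return nil
}

func main() {
	rc := RayCluster{
		Annotations: map[string]string{},
		HeadEnv:     []EnvVar{{Name: redisAddressEnv, Value: "redis:6379"}},
	}
	fmt.Println(validateGCSFT(rc)) // non-nil error: annotation missing

	rc.Annotations[ftEnabledAnnotation] = "true"
	fmt.Println(validateGCSFT(rc)) // <nil>: accepted
}
```

Returning an error instead of silently overriding the spec keeps the user's configuration as the source of truth, which matches the decision above. The same check also rejects an explicit `ray.io/ft-enabled: "false"` combined with a Redis address.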
@rueian Another validation is also required: the annotation explicitly disables GCS FT (i.e., …
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
See this Slack thread for more details.
The user creates a RayService without setting the annotation `ray.io/ft-enabled: "true"`, but they do set `RAY_REDIS_ADDRESS` and `REDIS_PASSWORD`. As a result, the RayCluster enables GCS FT and writes data to the external Redis, but KubeRay is unaware of this. Because KubeRay doesn't configure `RAY_external_storage_namespace`, the RayCluster writes data to the key `default` in Redis.

When the user triggers a zero-downtime upgrade, the new RayCluster also attempts to read metadata from the `default` key in Redis. Therefore, the new RayCluster sees some information from the old RayCluster.

Solution:
Reproduction script
TODO
Anything else
No response
Are you willing to submit a PR?