Replies: 1 comment
@cjstntjd Hi, thank you very much for raising this use case. In the current version of NVFlare, a job can only run with the pre-arranged, pre-approved list of participants. We are working on a feature that allows clients to pause and rejoin, or even register, during a job run. This will be coming in the next release. For the multiple-server scenario, clients can connect to only a single server, and the workflow execution is controlled by a single server controller. However, we do have HA servers, which provide server high availability in case of a server crash or failure.
Python version (python3 -V): 3.8
NVFlare version (python3 -m pip list | grep "nvflare"): 2.5.0
NVFlare branch (if running examples, please use the branch that corresponds to the NVFlare version; git branch): 2.5
Operating system: Ubuntu 20.04
Have you successfully run any of the following examples?
Please describe your question
We are considering a scenario where random participants who are not pre-approved join and drop out of training. Each client may lack stable training resources, or its network conditions may be inconsistent. In this setting, each client trains locally but may drop out and rejoin in a later global iteration. At that point, the participant list may no longer include that client. Of course, we could create an agent to manage and track these clients, but the number of clients will gradually grow as the global iterations progress. Does NVFlare have an example of a scenario where a client joins or drops out during training and then reconnects? Or is there an example designed to allow multiple servers to cooperate to manage clients?
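To make the scenario concrete, the tracking agent mentioned above could look roughly like the sketch below. It is plain Python and does not use any NVFlare API; the names `ClientRecord`, `ClientTracker`, `join`, `drop`, and `participants_for_round` are hypothetical and only illustrate the join / drop / rejoin bookkeeping we have in mind.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClientRecord:
    """Bookkeeping for one client (hypothetical structure, not an NVFlare type)."""
    client_id: str
    joined_round: int
    last_seen_round: int
    active: bool = True
    rejoin_count: int = 0


class ClientTracker:
    """Tracks clients that join, drop out, and rejoin across global rounds."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._clients: Dict[str, ClientRecord] = {}
        self._rng = random.Random(seed)

    def join(self, client_id: str, current_round: int) -> ClientRecord:
        """Register a new client, or mark a known client as active again."""
        rec = self._clients.get(client_id)
        if rec is None:
            rec = ClientRecord(client_id, current_round, current_round)
            self._clients[client_id] = rec
        else:
            if not rec.active:
                rec.rejoin_count += 1  # count only real rejoins after a drop
            rec.active = True
            rec.last_seen_round = current_round
        return rec

    def drop(self, client_id: str) -> None:
        """Mark a client as inactive; unknown ids are ignored."""
        rec = self._clients.get(client_id)
        if rec is not None:
            rec.active = False

    def active_clients(self) -> List[str]:
        """Return the ids of all currently active clients, sorted."""
        return sorted(cid for cid, rec in self._clients.items() if rec.active)

    def participants_for_round(self, current_round: int, max_clients: int) -> List[str]:
        """Sample up to max_clients active clients for one global round."""
        active = self.active_clients()
        chosen = self._rng.sample(active, min(max_clients, len(active)))
        for cid in chosen:
            self._clients[cid].last_seen_round = current_round
        return sorted(chosen)


tracker = ClientTracker(seed=0)
tracker.join("site-a", current_round=0)
tracker.join("site-b", current_round=0)
tracker.drop("site-a")                    # site-a loses its connection
tracker.join("site-a", current_round=3)   # and comes back in a later round
print(tracker.participants_for_round(current_round=3, max_clients=5))
```

The tracker keeps one record per client ever seen, so its memory grows with the total number of clients, which is the growth concern described above.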