Problem with request authentication when running job #1740
-
Hi! I am working on a federated, non-HA project and I am trying to run it in Azure ML workspaces (https://github.com/eduardpauliuc/fedpix/blob/main/project.yml, app dppix). Both the server and the clients start fine; here is part of the client log.
However, when I submit the job from the admin console, the client crashes:
Some of the server log:
I opened ports 8002-8003 on the server (which is an Azure VM) and manually uploaded the provisioned site directories to each client. Is there something else regarding authentication that I am missing?
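For anyone reproducing this setup, one way to confirm that the opened ports are actually reachable from a client machine is a plain TCP connection test. This is only a minimal sketch using the Python standard library: the function names `port_is_open` and `check_ports` are made up for illustration and are not part of NVFlare, and a successful TCP connect says nothing about whether the certificates are accepted.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True/False depending on whether it is reachable."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling `check_ports(server_address, [8002, 8003])` from each client, with `server_address` replaced by the real address of the server VM, should show whether the firewall rule took effect.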
-
From the server log, both FL-Europe-Site and FL-US-Site registered without issues, so those certificates are good. Upon job submission, the get_task request from the EU client failed authentication. It seems the cell communication could not authenticate the peer. @yhwen, can you check whether the log shows any cell issues?
-
If it is of any help, here are the full logs from the server and client: https://gist.github.com/eduardpauliuc/c9f9b223da3be68494e13eeaa704829c In some runs, there are multiple get_task requests.
Hi @eduardpauliuc, the error you are seeing is not caused by a client request authentication issue when running the job. Both "FL-Europe-Site" and "FL-US-Site" registered successfully with the server and were able to send "get_task" requests to it. However, when the server ran the job, the following error caused the server job process to crash. It is due to the TensorFlow library being missing. Because the server job process failed, it was not able to authenticate the client's get_task request.
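A quick way to catch this kind of problem before submitting a job is to check, in the Python environment that runs the server job, whether the required libraries can be found. The snippet below is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the list of required modules is an assumption: only `tensorflow` is named in this thread. It expects top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "tensorflow" is the library named in this thread; extend the list as needed.
    missing = missing_modules(["tensorflow"])
    if missing:
        print("Missing on this machine:", ", ".join(missing))
    else:
        print("All required modules can be found.")
```

Running this on the server (and on each client) with the same interpreter that launches the job should show whether the library needs to be installed there.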