feat(ai): add hardware info from Orchestrators and expand network information available. #3246
* expand net.PriceInfo to include optional capability and constraint information
* update /getNetworkCapabilities to return summary information relevant for data aggregation
* add /getOrchestratorInfo endpoint to get raw OrchestratorInfo data for one Orchestrator
Thanks! For future reference, this was a follow-up on #3052.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #3246 +/- ##
===================================================
+ Coverage 34.74775% 34.86813% +0.12038%
===================================================
Files 136 136
Lines 36175 36400 +225
===================================================
+ Hits 12570 12692 +122
- Misses 22893 22989 +96
- Partials 712 719 +7
... and 1 file with indirect coverage changes
Continue to review full report in Codecov by Sentry.
@ad-astra-video is this ready for review?
Yep! I will fix the editor config issue real quick.
What does this pull request do? Explain your changes. (required)
This PR enables enhanced reporting on Livepeer network state for a Gateway to use in monitoring or analysis processes. This could also be the start of exporting all relevant information from a Gateway to implement an external selection process using the orch webhook.
The Orchestrator is updated in a couple of ways to provide more information to the Gateway during the polling process. The ai-worker and ai-runner are updated to retrieve hardware information at startup of the runner (PR #273). The hardware information is stored in the in-memory ai-worker list and is removed when the ai-worker disconnects. Orchestrators are also updated to provide additional information to Gateways in the OrchestratorInfo gRPC message: hardware information and capabilities prices are added, so that Gateways receive all prices the Orchestrator has set at the time of the GetOrchestrator request.

The Gateway is updated to save additional information from Orchestrators in the polling process. This enables the Gateway to export useful information on the state of the network to external monitoring or analysis processes with the added /getNetworkCapabilities endpoint.

example data for an orchestrator
Open to all feedback and design change suggestions. I went with using the local db because this is not intended to serve a real time process and the polling process already saved some data to the db.
cc @ecmulli @gioelecerati
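As one illustration of the kind of external analysis the exported network information could feed, here is a minimal Python sketch that picks the cheapest Orchestrator per pipeline/model pair. The response schema is not shown in this PR, so every field name below (address, capabilities_prices, pipeline, model_id, price_per_unit, pixels_per_unit) and the sample records are assumptions made up for the example, not the actual output of /getNetworkCapabilities.

```python
def cheapest_by_model(orchestrators):
    """Return {(pipeline, model_id): (address, unit_price)} with the lowest price per pair.

    `orchestrators` is a list of dicts in a HYPOTHETICAL shape; adapt the keys
    to whatever the Gateway endpoint really returns.
    """
    best = {}
    for orch in orchestrators:
        for entry in orch.get("capabilities_prices", []):
            key = (entry["pipeline"], entry["model_id"])
            # Normalize to a price per single unit so entries are comparable.
            unit_price = entry["price_per_unit"] / entry["pixels_per_unit"]
            if key not in best or unit_price < best[key][1]:
                best[key] = (orch["address"], unit_price)
    return best


# Made-up sample records, for illustration only.
SAMPLE = [
    {
        "address": "0xaaa",
        "capabilities_prices": [
            {"pipeline": "text-to-image", "model_id": "model-a",
             "price_per_unit": 100, "pixels_per_unit": 1},
        ],
    },
    {
        "address": "0xbbb",
        "capabilities_prices": [
            {"pipeline": "text-to-image", "model_id": "model-a",
             "price_per_unit": 50, "pixels_per_unit": 1},
            {"pipeline": "audio-to-text", "model_id": "model-b",
             "price_per_unit": 30, "pixels_per_unit": 2},
        ],
    },
]

if __name__ == "__main__":
    for (pipeline, model_id), (address, price) in cheapest_by_model(SAMPLE).items():
        print(pipeline, model_id, address, price)
```

Because the Gateway stores polled data in the local db and is not meant to serve a real-time process, a periodic batch job of this shape is probably a better fit than per-request lookups.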
Specific updates (required)
Gateway Updates
* The OrchestratorInfo JSON received from each Orchestrator in the polling process is saved in a new remoteInfo column in the db orchestrators table.
* A /getNetworkCapabilities endpoint is added to the local cli webserver to enable exporting lightly formatted network information, including capabilities_prices and hardware information deployed by Orchestrators for each pipeline/model id.
* A /getOrchestratorInfo endpoint is added to the local cli webserver to get raw OrchestratorInfo data from an Orchestrator.

Orchestrator Updates
* The additional information is included in the OrchestratorInfo response only when the capabilities included in the GetOrchestrator request are nil. All transcoding and AI job requests include capabilities, which prevents the additional information from being included in the response, so the response size and the information processing during actual work do not increase further.
* The PriceInfo gRPC message is updated to add two optional fields, capability and constraint, to support sending capabilities_prices to the Gateway.

AI Runner Updates PR #273
* /hardware/info provides the basic information on the hardware at startup and continues to be available after startup.
* /hardware/stats provides more focused information on utilization of the GPU to assist Orchestrators in getting current information for monitoring.
* Keeping the /hardware/stats endpoint from being blocked while the pipeline is running: this can be achieved by adding outputs = await asyncio.to_thread([pipeline object], **kwargs) and updating the routes to return await pipeline([...]), as well as updating the route and pipeline call functions to be async functions where needed. I tested this on the text-to-image and audio-to-text pipelines and confirmed the /hardware/stats endpoint is blocked until the pipeline call returns.

How did you test each of these updates (required)
Built docker image and put on my mainnet Orchestrator
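The asyncio.to_thread idea mentioned in the AI Runner notes above can be sketched in isolation. This is a minimal, self-contained illustration rather than the ai-runner code: blocking_pipeline, hardware_stats, and run_pipeline are hypothetical stand-ins, and time.sleep simulates a long synchronous pipeline call. It shows that a lightweight coroutine can complete while the blocking call runs in a worker thread.

```python
import asyncio
import time


def blocking_pipeline(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous, long-running pipeline call."""
    time.sleep(0.2)
    return f"result for {prompt}"


async def hardware_stats() -> dict:
    """Hypothetical stand-in for a lightweight stats handler."""
    return {"ok": True}


async def run_pipeline(prompt: str) -> str:
    # Offload the blocking call to a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_pipeline, prompt)


async def demo() -> tuple[float, str]:
    start = time.perf_counter()
    pipeline_task = asyncio.create_task(run_pipeline("hello"))
    await asyncio.sleep(0)  # yield once so the pipeline task gets started
    await hardware_stats()  # served while the pipeline is still running
    stats_latency = time.perf_counter() - start
    output = await pipeline_task
    return stats_latency, output


if __name__ == "__main__":
    latency, output = asyncio.run(demo())
    print(f"stats answered after {latency:.3f}s; pipeline returned {output!r}")
```

If run_pipeline called blocking_pipeline directly instead of going through asyncio.to_thread, the stats coroutine could not run until the pipeline call returned, which matches the blocking behavior described above. asyncio.to_thread requires Python 3.9 or newer.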
Does this pull request close any open issues?
No
Checklist:
* make runs successfully
* ./test.sh passes