You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
My understanding of the documentation and the code is that llm-guard will lazy-load the models required by the chosen scanners from Huggingface. I apologize if this is incorrect
This is not ideal for consumers like Kubernetes workloads because :
When llm-guard is used as a library
each pod will download the same models, wasting resources
k8s workloads are usually preferred with low resource allocations to do efficient horizontal scaling.
With "usage as API" scenario to have an llm-guard-api dedicated deployment with more resources
you might still want your llm-guard-api deployment to scale too, and you face the same resource optimization issue.
A third option is that you already have the models deployed somewhere in a central place so that the only information required by the scanners would be the inference URL and the authentication.
Describe the solution you'd like
Users that use a platform to host and run models in a central place should be able to provide inference URLs and authentication to the scanners, instead of lazy-loading the models.
Describe alternatives you've considered
The existing possible usages described by the documentation (as a library or as API).
The text was updated successfully, but these errors were encountered:
Is your feature request related to a problem? Please describe.
My understanding of the documentation and the code is that
llm-guard
will lazy-load the models required by the chosen scanners from Huggingface. I apologize if this is incorrectThis is not ideal for consumers like Kubernetes workloads because :
llm-guard
is used as a libraryllm-guard-api
dedicated deployment with more resourcesllm-guard-api
deployment to scale too, and you face the same resource optimization issue.A third option is that you already have the models deployed somewhere in a central place so that the only information required by the scanners would be the inference URL and the authentication.
Describe the solution you'd like
Users that use a platform to host and run models in a central place should be able to provide inference URLs and authentication to the scanners, instead of lazy-loading the models.
Describe alternatives you've considered
The existing possible usages described by the documentation (as a library or as API).
The text was updated successfully, but these errors were encountered: