Support inference URLs for models used by scanners #101

adrien-lesur · 2024-02-22T08:52:57Z

Is your feature request related to a problem? Please describe.
My understanding of the documentation and the code is that llm-guard will lazy-load the models required by the chosen scanners from Huggingface. I apologize if this is incorrect

This is not ideal for consumers like Kubernetes workloads because :

When llm-guard is used as a library
- each pod will download the same models, wasting resources
- k8s workloads are usually preferred with low resource allocations to do efficient horizontal scaling.
With "usage as API" scenario to have an llm-guard-api dedicated deployment with more resources
- you might still want your llm-guard-api deployment to scale too, and you face the same resource optimization issue.

A third option is that you already have the models deployed somewhere in a central place so that the only information required by the scanners would be the inference URL and the authentication.

Describe the solution you'd like
Users that use a platform to host and run models in a central place should be able to provide inference URLs and authentication to the scanners, instead of lazy-loading the models.

Describe alternatives you've considered
The existing possible usages described by the documentation (as a library or as API).

The text was updated successfully, but these errors were encountered:

asofter · 2024-02-23T20:30:57Z

Hey @adrien-lesur , at some point, we considered having the support of HuggingFace Inference Endpoints but we learned that it's not used widely.

How would you usually deploy those models? I assume https://github.com/neuralmagic/deepsparse or something.

adrien-lesur · 2024-02-26T08:17:59Z

Hi @asofter,
The models would usually be deployed via vLLM like documented here for Mistral.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support inference URLs for models used by scanners #101

Support inference URLs for models used by scanners #101

adrien-lesur commented Feb 22, 2024 •

edited

Loading

asofter commented Feb 23, 2024

adrien-lesur commented Feb 26, 2024

Support inference URLs for models used by scanners #101

Support inference URLs for models used by scanners #101

Comments

adrien-lesur commented Feb 22, 2024 • edited Loading

asofter commented Feb 23, 2024

adrien-lesur commented Feb 26, 2024

adrien-lesur commented Feb 22, 2024 •

edited

Loading