
Add support for Hugging Face #14412

Merged
merged 2 commits into master on Nov 12, 2024

Conversation

JonasHelming (Contributor)

fixed #14411

What it does

Adds support for using Hugging Face as an inference provider. This was possible before by using the OpenAI API, but some models require custom parameters.

How to test

Create a Hugging Face account and an API key. Copy any model into the settings and select it to be used in the AI configuration view.
For simplicity, select a model that supports serverless inference so that inference is free.
For StarCoder code completion you can use:

The language is {{language}}.
<fim_prefix>{{textUntilCurrentPosition}}<fim_suffix>{{textAfterCurrentPosition}}<fim_middle>
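
As a minimal sketch of how such a template could be filled in (the placeholder names come from the prompt above, but the `fillTemplate` helper is illustrative and not the actual Theia implementation):

```typescript
// Hypothetical helper: substitutes {{placeholder}} variables in a prompt
// template. Unknown placeholders are left untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
    return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
        key in values ? values[key] : match);
}

// The StarCoder fill-in-the-middle template from the PR description.
const starcoderTemplate =
    'The language is {{language}}.\n' +
    '<fim_prefix>{{textUntilCurrentPosition}}<fim_suffix>{{textAfterCurrentPosition}}<fim_middle>';

const prompt = fillTemplate(starcoderTemplate, {
    language: 'typescript',
    textUntilCurrentPosition: 'function add(a: number, b: number) {',
    textAfterCurrentPosition: '}'
});
```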

Follow-ups

  • We should think about prompt management for agents so that you can define different prompts depending on the connected model.
  • We might want to harmonize the LLM providers we have and extract common behavior soon.

Review checklist

Reminder for reviewers

fixed #14411

Signed-off-by: Jonas Helming <[email protected]>
@sdirix (Member) left a comment

In principle it seems to work. I was able to communicate with StarCoder hosted at Hugging Face.

  • There is an issue with the language model management which I described in detail in the comments
  • I used the suggested
    The language is {{language}}.
    <fim_prefix>{{textUntilCurrentPosition}}<fim_suffix>{{textAfterCurrentPosition}}<fim_middle>
    
    as the code-completion prompt but only got very bad completions back 🤷

Comment on lines 61 to 63
if (!token) {
    throw new Error('Please provide a Hugging Face API token.');
}
sdirix (Member)

We should not throw errors for expected code paths. This is even the default code path, as the API key will be undefined by default; the error would therefore show up in the logs of all users.

If we don't want to allow configuring Hugging Face models without keys, then the huggingface-language-model-manager should not even create them.

Note that if the user deletes the apiKey in their preferences, you will still use it, as the hfInference is only created once here. In fact, the whole dynamic apiKey provider is not utilized in this implementation, as it is only called once.

I would suggest either making the apiKey static here and ensuring that the models are removed/recreated when the apiKey changes, or refactoring them to handle non-existent API keys at request time, like in the OpenAI language model implementation.
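
The request-time variant the reviewer describes could look roughly like this (all names here are hypothetical stand-ins, not the actual Theia API):

```typescript
// Hypothetical sketch: resolve the API key through a provider on every
// request instead of capturing it once at construction time, so that
// preference changes take effect immediately.
type ApiKeyProvider = () => string | undefined;

class HuggingFaceModelSketch {
    constructor(private readonly apiKeyProvider: ApiKeyProvider) {}

    request(prompt: string): string {
        // Re-read the key on each request.
        const apiKey = this.apiKeyProvider();
        if (!apiKey) {
            // Surface the missing key to the caller (e.g. the chat UI)
            // only when a request is actually made, rather than logging
            // an error on the default code path at startup.
            throw new Error('Please provide a Hugging Face API token.');
        }
        return `sending "${prompt}" with key ${apiKey.slice(0, 4)}...`;
    }
}
```

With this shape, deleting the key in the preferences immediately affects the next request, because nothing about the key is cached in the model instance.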

@JonasHelming (Contributor, Author)

@sdirix I changed the behavior so that API key changes are respected (I used the ApiKeyProvider). It is now similar to the OpenAI provider: if no API key is set, the models are visible, but an error is thrown. This is caught by the chat and shown to the user (just like with the OpenAI provider).

Signed-off-by: Jonas Helming <[email protected]>
@sdirix (Member) left a comment

Works for me with the suggested starcoder model. I have one more question regarding the code.

Comment on lines +94 to +106
parameters: {
    temperature: 0.1,
    max_new_tokens: 200,
    return_full_text: false,
    do_sample: true,
    stop: ['<|endoftext|>']
}
});

const asyncIterator = {
    async *[Symbol.asyncIterator](): AsyncIterator<LanguageModelStreamResponsePart> {
        for await (const chunk of stream) {
            const content = chunk.token.text.replace(/<\|endoftext\|>/g, '');
sdirix (Member)

Will this work for most/all Hugging Face models or only for StarCoder? If this is StarCoder-specific, then we should check for that model before setting the parameters / replacing the content.

JonasHelming (Contributor, Author)

I found that almost all Hugging Face models return stop characters, and that it is most robust to explicitly specify them and then replace them. A good test case is meta-llama/Llama-3.2-3B-Instruct, which is always "warm".
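
The approach discussed above can be sketched as follows; the chunk shape mirrors the snippet quoted in the review, but everything else here is a hypothetical stand-in for the real streaming client:

```typescript
// Illustrative sketch: specify the stop sequence explicitly in the
// request parameters, then strip it from each streamed chunk.
const STOP_TOKEN = '<|endoftext|>';

// Minimal chunk shape, matching the quoted snippet's chunk.token.text access.
interface StreamChunk {
    token: { text: string };
}

// Wraps a raw token stream and yields cleaned text, skipping chunks that
// become empty once the stop token is removed.
async function* stripStopTokens(
    stream: AsyncIterable<StreamChunk>
): AsyncGenerator<string> {
    for await (const chunk of stream) {
        const content = chunk.token.text.replace(/<\|endoftext\|>/g, '');
        if (content.length > 0) {
            yield content;
        }
    }
}
```

Since the stop token is both requested (via `stop`) and stripped, the same code path works for models that echo the stop sequence and for those that do not.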

@JonasHelming JonasHelming merged commit 8c69d73 into master Nov 12, 2024
11 checks passed
@github-actions github-actions bot added this to the 1.56.0 milestone Nov 12, 2024
Successfully merging this pull request may close these issues.

[Theia AI] Support for Hugging Face