Commit

Deployed 67d748c to master with MkDocs 1.6.0 and mike 2.1.1
github-actions[bot] committed Jun 10, 2024
1 parent dc3b40e commit c6ae8db
Showing 2 changed files with 6 additions and 7 deletions.
11 changes: 5 additions & 6 deletions master/modelserving/v1beta1/llm/huggingface/index.html
@@ -1263,7 +1263,7 @@ #### Perform Model Inference

      {"id":"cmpl-87ee252062934e2f8f918dce011e8484","choices":[{"finish_reason":"length","index":0,"message":{"content":"<generated_response>","tool_calls":null,"role":"assistant","function_call":null},"logprobs":null}],"created":1715353461,"model":"llama3","system_fingerprint":null,"object":"chat.completion","usage":{"completion_tokens":30,"prompt_tokens":3,"total_tokens":33}}
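The chat-completion response shown above can be unpacked with a few lines of Python. This is a sketch against the sample payload from the docs, not a KServe client; the field paths follow the OpenAI chat-completion response shape:

```python
import json

# Sample chat-completion response, copied from the docs above.
response_json = """{"id":"cmpl-87ee252062934e2f8f918dce011e8484","choices":[{"finish_reason":"length","index":0,"message":{"content":"<generated_response>","tool_calls":null,"role":"assistant","function_call":null},"logprobs":null}],"created":1715353461,"model":"llama3","system_fingerprint":null,"object":"chat.completion","usage":{"completion_tokens":30,"prompt_tokens":3,"total_tokens":33}}"""

response = json.loads(response_json)

# The generated text lives under choices[0].message.content.
content = response["choices"][0]["message"]["content"]
usage = response["usage"]

print(content)                # the model's generated text
print(usage["total_tokens"])  # prompt + completion token count
```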
  ### Serve the Hugging Face LLM model using HuggingFace Backend

- You can use `--backend=huggingface` arg to perform the inference using Hugging Face. KServe Hugging Face backend runtime also
+ You can use `--backend=huggingface` argument to perform the inference using Hugging Face API. KServe Hugging Face backend runtime also
  supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for inference.

  === "Yaml"
@@ -1284,12 +1284,12 @@ ### Serve the Hugging Face LLM model using HuggingFace Backend

        - --backend=huggingface
        resources:
          limits:
-           cpu: "6"
-           memory: 24Gi
+           cpu: "1"
+           memory: 2Gi
            nvidia.com/gpu: "1"
          requests:
-           cpu: "6"
-           memory: 24Gi
+           cpu: "1"
+           memory: 2Gi
            nvidia.com/gpu: "1"
  EOF
@@ -1301,7 +1301,6 @@ #### Perform Model Inference

      MODEL_NAME=t5
      SERVICE_HOSTNAME=$(kubectl get inferenceservice huggingface-t5 -o jsonpath='{.status.url}' | cut -d "/" -f 3)

- KServe Hugging Face vLLM runtime supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for inference
  Sample OpenAI Completions request:

      curl -H "content-type:application/json" -H "Host: ${SERVICE_HOSTNAME}" -v http://${INGRESS_HOST}:${INGRESS_PORT}/openai/v1/completions -d '{"model": "${MODEL_NAME}", "prompt": "translate English to German: The house is wonderful.", "stream":false, "max_tokens": 30 }'
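The curl request above can also be issued from Python. This is a hedged sketch that only builds the request payload and headers; it assumes the same `SERVICE_HOSTNAME`, `INGRESS_HOST`, and `INGRESS_PORT` environment variables the shell example uses, and the actual send is left commented out since it needs a reachable cluster:

```python
import json
import os

def build_completions_request(model, prompt, max_tokens=30, stream=False):
    """Build the OpenAI /v1/completions payload used in the curl example above."""
    headers = {
        "content-type": "application/json",
        # The Host header routes the request to the right InferenceService.
        "Host": os.environ.get("SERVICE_HOSTNAME", ""),
    }
    body = {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "max_tokens": max_tokens,
    }
    return headers, json.dumps(body)

headers, body = build_completions_request(
    "t5", "translate English to German: The house is wonderful."
)

# To actually send it (requires a reachable ingress), something like:
#   import requests
#   url = f"http://{os.environ['INGRESS_HOST']}:{os.environ['INGRESS_PORT']}/openai/v1/completions"
#   print(requests.post(url, headers=headers, data=body).json())
```

Note that building the JSON body in code sidesteps a subtlety of the shell version: inside single quotes, `${MODEL_NAME}` is not expanded by the shell.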
2 changes: 1 addition & 1 deletion master/search/search_index.json

Large diffs are not rendered by default.
