Commit

[ci skip] iter 323fea6
glemaitre committed Apr 19, 2024
1 parent 093a750 commit d8e04e8
Showing 9 changed files with 72 additions and 55 deletions.
6 changes: 3 additions & 3 deletions _sources/user_guide/information_retrieval.rst.txt
@@ -44,15 +44,15 @@ approximate nearest neighbor algorithm, namely `FAISS
As embedding, we provide a :class:`~ragger_duck.embedding.SentenceTransformer` that
download any pre-trained sentence transformers from HuggingFace.

Reranker: merging lexical and semantic retrievers
=================================================
Reranker: merging lexical and semantic retrievers results
=========================================================

If we use both lexical and semantic retrievers, we need to merge the results of both
retrievers. :class:`~ragger_duck.retrieval.RetrieverReranker` makes such reranking by
using a cross-encoder model. In our case, cross-encoder model is trained on Microsoft
Bing query-document pairs and is available on HuggingFace.

API of retrivers and Reranker
API of retrivers and reranker
=============================

All retrievers and reranker adhere to the same API with a `fit` and `query` method.
35 changes: 26 additions & 9 deletions _sources/user_guide/large_language_model.rst.txt
@@ -1,13 +1,30 @@
.. _large_language_model:

=========
Prompting
=========
====================
Large Language Model
====================

Prompting for API documentation
===============================
In the RAG framework, the Large Language Model (LLM) is the cherry on top. It is in
charge of generating the answer to the query based on the context retrieved.

:class:`~ragger_duck.prompt.BasicPromptingStrategy` implements a prompting
strategy to answer documentation questions. We get context by reranking the
search from a lexical and semantic retrievers. Once the context is retrieved,
we request a Large Language Model (LLM) to answer the question.
A rather important part of the LLM is related to the prompt to trigger the generation.
In this POC, we did not intend to optimize the prompt because we did not have the data
at hand to make a proper evaluation.

:class:`~ragger_duck.prompt.BasicPromptingStrategy` allows to interface the LLM with
the context found by the retriever. For prototyping purposes, we also allow the
retrievers to be bypassed. The prompt provided to the LLM is the following::

prompt = (
"[INST] You are a scikit-learn expert that should be able to answer"
" machine-learning question.\n\nAnswer to the query below using the"
" additional provided content. The additional content is composed of"
" the HTML link to the source and the extracted contextual"
" information.\n\nBe succinct.\n\n"
"Make sure to use backticks whenever you refer to class, function, "
"method, or name that contains underscores.\n\n"
f"query: {query}\n\n{context_query} [/INST]."
)

When bypassing the retrievers, we do not provide any context and the sentence related
to this part.
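The context/no-context behaviour described in the added documentation can be sketched as follows. This is a minimal illustration, not the `ragger_duck` implementation: the helper name `build_prompt` is hypothetical, and the wording used when the retrievers are bypassed is an assumption (the documentation only says the context and its related sentence are dropped).

```python
def build_prompt(query, context_query=None):
    """Hypothetical helper assembling the prompt quoted in the documentation.

    Not part of ragger_duck; the no-context wording below is an assumption.
    """
    parts = [
        "[INST] You are a scikit-learn expert that should be able to answer"
        " machine-learning question.\n\n"
    ]
    if context_query is None:
        # Retrievers bypassed: drop the sentence describing the context.
        parts.append("Answer to the query below.\n\n")
    else:
        parts.append(
            "Answer to the query below using the additional provided content."
            " The additional content is composed of the HTML link to the"
            " source and the extracted contextual information.\n\n"
        )
    parts.append("Be succinct.\n\n")
    parts.append(
        "Make sure to use backticks whenever you refer to class, function, "
        "method, or name that contains underscores.\n\n"
    )
    parts.append(f"query: {query}\n\n")
    if context_query is not None:
        # Context is appended after the query, as in the documented prompt.
        parts.append(f"{context_query} ")
    parts.append("[/INST].")
    return "".join(parts)


with_context = build_prompt("What is `fit_transform`?", "source: <link>\ncontent: ...")
without_context = build_prompt("What is `fit_transform`?")
```

With a context, the string matches the layout of the documented prompt; without one, only the instruction, the query, and the closing `[/INST].` marker remain.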
Binary file modified objects.inv
Binary file not shown.
4 changes: 2 additions & 2 deletions references/index.html
@@ -46,7 +46,7 @@
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Scraping the documentation" href="scraping.html" />
<link rel="prev" title="Prompting" href="../user_guide/large_language_model.html" />
<link rel="prev" title="Large Language Model" href="../user_guide/large_language_model.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
</head>
@@ -501,7 +501,7 @@
<i class="fa-solid fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
<p class="prev-next-title">Prompting</p>
<p class="prev-next-title">Large Language Model</p>
</div>
</a>
<a class="right-next"
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.

11 changes: 4 additions & 7 deletions user_guide/index.html
@@ -384,7 +384,7 @@
<div class="bd-toc-item navbar-nav"><ul class="nav bd-sidenav">
<li class="toctree-l1"><a class="reference internal" href="text_scraping.html">Text Scraping</a></li>
<li class="toctree-l1"><a class="reference internal" href="information_retrieval.html">Retriever</a></li>
<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Prompting</a></li>
<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Large Language Model</a></li>
</ul>
</div>
</nav></div>
@@ -533,14 +533,11 @@ <h2>Implementation details<a class="headerlink" href="#implementation-details" t
<li class="toctree-l1"><a class="reference internal" href="information_retrieval.html">Retriever</a><ul>
<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#lexical-retrievers">Lexical retrievers</a></li>
<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#semantic-retrievers">Semantic retrievers</a></li>
<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#reranker-merging-lexical-and-semantic-retrievers">Reranker: merging lexical and semantic retrievers</a></li>
<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#api-of-retrivers-and-reranker">API of retrivers and Reranker</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Prompting</a><ul>
<li class="toctree-l2"><a class="reference internal" href="large_language_model.html#prompting-for-api-documentation">Prompting for API documentation</a></li>
<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#reranker-merging-lexical-and-semantic-retrievers-results">Reranker: merging lexical and semantic retrievers results</a></li>
<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#api-of-retrivers-and-reranker">API of retrivers and reranker</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Large Language Model</a></li>
</ul>
</div>
</section>
16 changes: 8 additions & 8 deletions user_guide/information_retrieval.html
@@ -45,7 +45,7 @@
<link rel="author" title="About these documents" href="../about.html" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Prompting" href="large_language_model.html" />
<link rel="next" title="Large Language Model" href="large_language_model.html" />
<link rel="prev" title="Text Scraping" href="text_scraping.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
@@ -384,7 +384,7 @@
<div class="bd-toc-item navbar-nav"><ul class="current nav bd-sidenav">
<li class="toctree-l1"><a class="reference internal" href="text_scraping.html">Text Scraping</a></li>
<li class="toctree-l1 current active"><a class="current reference internal" href="#">Retriever</a></li>
<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Prompting</a></li>
<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Large Language Model</a></li>
</ul>
</div>
</nav></div>
@@ -475,15 +475,15 @@ <h2>Semantic retrievers<a class="headerlink" href="#semantic-retrievers" title="
<p>As embedding, we provide a <a class="reference internal" href="../references/generated/ragger_duck.embedding.SentenceTransformer.html#ragger_duck.embedding.SentenceTransformer" title="ragger_duck.embedding.SentenceTransformer"><code class="xref py py-class docutils literal notranslate"><span class="pre">SentenceTransformer</span></code></a> that
download any pre-trained sentence transformers from HuggingFace.</p>
</section>
<section id="reranker-merging-lexical-and-semantic-retrievers">
<h2>Reranker: merging lexical and semantic retrievers<a class="headerlink" href="#reranker-merging-lexical-and-semantic-retrievers" title="Link to this heading">#</a></h2>
<section id="reranker-merging-lexical-and-semantic-retrievers-results">
<h2>Reranker: merging lexical and semantic retrievers results<a class="headerlink" href="#reranker-merging-lexical-and-semantic-retrievers-results" title="Link to this heading">#</a></h2>
<p>If we use both lexical and semantic retrievers, we need to merge the results of both
retrievers. <a class="reference internal" href="../references/generated/ragger_duck.retrieval.RetrieverReranker.html#ragger_duck.retrieval.RetrieverReranker" title="ragger_duck.retrieval.RetrieverReranker"><code class="xref py py-class docutils literal notranslate"><span class="pre">RetrieverReranker</span></code></a> makes such reranking by
using a cross-encoder model. In our case, cross-encoder model is trained on Microsoft
Bing query-document pairs and is available on HuggingFace.</p>
</section>
<section id="api-of-retrivers-and-reranker">
<h2>API of retrivers and Reranker<a class="headerlink" href="#api-of-retrivers-and-reranker" title="Link to this heading">#</a></h2>
<h2>API of retrivers and reranker<a class="headerlink" href="#api-of-retrivers-and-reranker" title="Link to this heading">#</a></h2>
<p>All retrievers and reranker adhere to the same API with a <code class="docutils literal notranslate"><span class="pre">fit</span></code> and <code class="docutils literal notranslate"><span class="pre">query</span></code> method.
For the retrievers, the <code class="docutils literal notranslate"><span class="pre">fit</span></code> method is used to create the index while the <code class="docutils literal notranslate"><span class="pre">query</span></code>
method is used to retrieve the top-k documents given a query.</p>
@@ -514,7 +514,7 @@ <h2>API of retrivers and Reranker<a class="headerlink" href="#api-of-retrivers-a
title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
<p class="prev-next-title">Prompting</p>
<p class="prev-next-title">Large Language Model</p>
</div>
<i class="fa-solid fa-angle-right"></i>
</a>
@@ -538,8 +538,8 @@ <h2>API of retrivers and Reranker<a class="headerlink" href="#api-of-retrivers-a
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lexical-retrievers">Lexical retrievers</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#semantic-retrievers">Semantic retrievers</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#reranker-merging-lexical-and-semantic-retrievers">Reranker: merging lexical and semantic retrievers</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#api-of-retrivers-and-reranker">API of retrivers and Reranker</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#reranker-merging-lexical-and-semantic-retrievers-results">Reranker: merging lexical and semantic retrievers results</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#api-of-retrivers-and-reranker">API of retrivers and reranker</a></li>
</ul>
</nav></div>

51 changes: 27 additions & 24 deletions user_guide/large_language_model.html
@@ -8,7 +8,7 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>Prompting &#8212; Ragger Duck 0.0.1.dev0 documentation</title>
<title>Large Language Model &#8212; Ragger Duck 0.0.1.dev0 documentation</title>



@@ -384,7 +384,7 @@
<div class="bd-toc-item navbar-nav"><ul class="current nav bd-sidenav">
<li class="toctree-l1"><a class="reference internal" href="text_scraping.html">Text Scraping</a></li>
<li class="toctree-l1"><a class="reference internal" href="information_retrieval.html">Retriever</a></li>
<li class="toctree-l1 current active"><a class="current reference internal" href="#">Prompting</a></li>
<li class="toctree-l1 current active"><a class="current reference internal" href="#">Large Language Model</a></li>
</ul>
</div>
</nav></div>
@@ -425,7 +425,7 @@

<li class="breadcrumb-item"><a href="index.html" class="nav-link">User Guide</a></li>

<li class="breadcrumb-item active" aria-current="page">Prompting</li>
<li class="breadcrumb-item active" aria-current="page">Large Language Model</li>
</ul>
</nav>
</div>
@@ -442,15 +442,30 @@
<div id="searchbox"></div>
<article class="bd-article">

<section id="prompting">
<span id="large-language-model"></span><h1>Prompting<a class="headerlink" href="#prompting" title="Link to this heading">#</a></h1>
<section id="prompting-for-api-documentation">
<h2>Prompting for API documentation<a class="headerlink" href="#prompting-for-api-documentation" title="Link to this heading">#</a></h2>
<p><a class="reference internal" href="../references/generated/ragger_duck.prompt.BasicPromptingStrategy.html#ragger_duck.prompt.BasicPromptingStrategy" title="ragger_duck.prompt.BasicPromptingStrategy"><code class="xref py py-class docutils literal notranslate"><span class="pre">BasicPromptingStrategy</span></code></a> implements a prompting
strategy to answer documentation questions. We get context by reranking the
search from a lexical and semantic retrievers. Once the context is retrieved,
we request a Large Language Model (LLM) to answer the question.</p>
</section>
<section id="large-language-model">
<span id="id1"></span><h1>Large Language Model<a class="headerlink" href="#large-language-model" title="Link to this heading">#</a></h1>
<p>In the RAG framework, the Large Language Model (LLM) is the cherry on top. It is in
charge of generating the answer to the query based on the context retrieved.</p>
<p>A rather important part of the LLM is related to the prompt to trigger the generation.
In this POC, we did not intend to optimize the prompt because we did not have the data
at hand to make a proper evaluation.</p>
<p><a class="reference internal" href="../references/generated/ragger_duck.prompt.BasicPromptingStrategy.html#ragger_duck.prompt.BasicPromptingStrategy" title="ragger_duck.prompt.BasicPromptingStrategy"><code class="xref py py-class docutils literal notranslate"><span class="pre">BasicPromptingStrategy</span></code></a> allows to interface the LLM with
the context found by the retriever. For prototyping purposes, we also allow the
retrievers to be bypassed. The prompt provided to the LLM is the following:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">prompt</span> <span class="o">=</span> <span class="p">(</span>
<span class="s2">&quot;[INST] You are a scikit-learn expert that should be able to answer&quot;</span>
<span class="s2">&quot; machine-learning question.</span><span class="se">\n\n</span><span class="s2">Answer to the query below using the&quot;</span>
<span class="s2">&quot; additional provided content. The additional content is composed of&quot;</span>
<span class="s2">&quot; the HTML link to the source and the extracted contextual&quot;</span>
<span class="s2">&quot; information.</span><span class="se">\n\n</span><span class="s2">Be succinct.</span><span class="se">\n\n</span><span class="s2">&quot;</span>
<span class="s2">&quot;Make sure to use backticks whenever you refer to class, function, &quot;</span>
<span class="s2">&quot;method, or name that contains underscores.</span><span class="se">\n\n</span><span class="s2">&quot;</span>
<span class="sa">f</span><span class="s2">&quot;query: </span><span class="si">{</span><span class="n">query</span><span class="si">}</span><span class="se">\n\n</span><span class="si">{</span><span class="n">context_query</span><span class="si">}</span><span class="s2"> [/INST].&quot;</span>
<span class="p">)</span>
</pre></div>
</div>
<p>When bypassing the retrievers, we do not provide any context and the sentence related
to this part.</p>
</section>


@@ -492,18 +507,6 @@ <h2>Prompting for API documentation<a class="headerlink" href="#prompting-for-ap


<div class="sidebar-secondary-item">
<div
id="pst-page-navigation-heading-2"
class="page-toc tocsection onthispage">
<i class="fa-solid fa-list"></i> On this page
</div>
<nav class="bd-toc-nav page-toc" aria-labelledby="pst-page-navigation-heading-2">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#prompting-for-api-documentation">Prompting for API documentation</a></li>
</ul>
</nav></div>

<div class="sidebar-secondary-item">


<div class="tocsection editthispage">
2 changes: 1 addition & 1 deletion user_guide/text_scraping.html
@@ -384,7 +384,7 @@
<div class="bd-toc-item navbar-nav"><ul class="current nav bd-sidenav">
<li class="toctree-l1 current active"><a class="current reference internal" href="#">Text Scraping</a></li>
<li class="toctree-l1"><a class="reference internal" href="information_retrieval.html">Retriever</a></li>
<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Prompting</a></li>
<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Large Language Model</a></li>
</ul>
</div>
</nav></div>