[ci skip] iter 323fea6

probabl-ai · Apr 19, 2024 · d8e04e8
1 parent 093a750
Showing 9 changed files with 72 additions and 55 deletions.
diff --git a/_sources/user_guide/information_retrieval.rst.txt b/_sources/user_guide/information_retrieval.rst.txt
@@ -44,15 +44,15 @@ approximate nearest neighbor algorithm, namely `FAISS
 As embedding, we provide a :class:`~ragger_duck.embedding.SentenceTransformer` that
 download any pre-trained sentence transformers from HuggingFace.
 
-Reranker: merging lexical and semantic retrievers
-=================================================
+Reranker: merging lexical and semantic retrievers results
+=========================================================
 
 If we use both lexical and semantic retrievers, we need to merge the results of both
 retrievers. :class:`~ragger_duck.retrieval.RetrieverReranker` makes such reranking by
 using a cross-encoder model. In our case, cross-encoder model is trained on Microsoft
 Bing query-document pairs and is available on HuggingFace.
 
-API of retrivers and Reranker
+API of retrivers and reranker
 =============================
 
 All retrievers and reranker adhere to the same API with a `fit` and `query` method.

diff --git a/_sources/user_guide/large_language_model.rst.txt b/_sources/user_guide/large_language_model.rst.txt
@@ -1,13 +1,30 @@
 .. _large_language_model:
 
-=========
-Prompting
-=========
+====================
+Large Language Model
+====================
 
-Prompting for API documentation
-===============================
+In the RAG framework, the Large Language Model (LLM) is the cherry on top. It is in
+charge of generating the answer to the query based on the context retrieved.
 
-:class:`~ragger_duck.prompt.BasicPromptingStrategy` implements a prompting
-strategy to answer documentation questions. We get context by reranking the
-search from a lexical and semantic retrievers. Once the context is retrieved,
-we request a Large Language Model (LLM) to answer the question.
+A rather important part of the LLM is related to the prompt to trigger the generation.
+In this POC, we did not intend to optimize the prompt because we did not have the data
+at hand to make a proper evaluation.
+
+:class:`~ragger_duck.prompt.BasicPromptingStrategy` allows to interface the LLM with
+the context found by the retriever. For prototyping purposes, we also allow the
+retrievers to be bypassed. The prompt provided to the LLM is the following::
+
+    prompt = (
+        "[INST] You are a scikit-learn expert that should be able to answer"
+        " machine-learning question.\n\nAnswer to the query below using the"
+        " additional provided content. The additional content is composed of"
+        " the HTML link to the source and the extracted contextual"
+        " information.\n\nBe succinct.\n\n"
+        "Make sure to use backticks whenever you refer to class, function, "
+        "method, or name that contains underscores.\n\n"
+        f"query: {query}\n\n{context_query} [/INST]."
+    )
+
+When bypassing the retrievers, we do not provide any context and the sentence related
+to this part.
diff --git a/objects.inv b/objects.inv
diff --git a/references/index.html b/references/index.html
@@ -46,7 +46,7 @@
     <link rel="index" title="Index" href="../genindex.html" />
     <link rel="search" title="Search" href="../search.html" />
     <link rel="next" title="Scraping the documentation" href="scraping.html" />
-    <link rel="prev" title="Prompting" href="../user_guide/large_language_model.html" />
+    <link rel="prev" title="Large Language Model" href="../user_guide/large_language_model.html" />
   <meta name="viewport" content="width=device-width, initial-scale=1"/>
   <meta name="docsearch:language" content="en"/>
   </head>
@@ -501,7 +501,7 @@
       <i class="fa-solid fa-angle-left"></i>
       <div class="prev-next-info">
         <p class="prev-next-subtitle">previous</p>
-        <p class="prev-next-title">Prompting</p>
+        <p class="prev-next-title">Large Language Model</p>
       </div>
     </a>
     <a class="right-next"

diff --git a/searchindex.js b/searchindex.js
diff --git a/user_guide/index.html b/user_guide/index.html
@@ -384,7 +384,7 @@
   <div class="bd-toc-item navbar-nav"><ul class="nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="text_scraping.html">Text Scraping</a></li>
 <li class="toctree-l1"><a class="reference internal" href="information_retrieval.html">Retriever</a></li>
-<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Prompting</a></li>
+<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Large Language Model</a></li>
 </ul>
 </div>
 </nav></div>
@@ -533,14 +533,11 @@ <h2>Implementation details<a class="headerlink" href="#implementation-details" t
 <li class="toctree-l1"><a class="reference internal" href="information_retrieval.html">Retriever</a><ul>
 <li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#lexical-retrievers">Lexical retrievers</a></li>
 <li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#semantic-retrievers">Semantic retrievers</a></li>
-<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#reranker-merging-lexical-and-semantic-retrievers">Reranker: merging lexical and semantic retrievers</a></li>
-<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#api-of-retrivers-and-reranker">API of retrivers and Reranker</a></li>
-</ul>
-</li>
-<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Prompting</a><ul>
-<li class="toctree-l2"><a class="reference internal" href="large_language_model.html#prompting-for-api-documentation">Prompting for API documentation</a></li>
+<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#reranker-merging-lexical-and-semantic-retrievers-results">Reranker: merging lexical and semantic retrievers results</a></li>
+<li class="toctree-l2"><a class="reference internal" href="information_retrieval.html#api-of-retrivers-and-reranker">API of retrivers and reranker</a></li>
 </ul>
 </li>
+<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Large Language Model</a></li>
 </ul>
 </div>
 </section>

diff --git a/user_guide/information_retrieval.html b/user_guide/information_retrieval.html
@@ -45,7 +45,7 @@
     <link rel="author" title="About these documents" href="../about.html" />
     <link rel="index" title="Index" href="../genindex.html" />
     <link rel="search" title="Search" href="../search.html" />
-    <link rel="next" title="Prompting" href="large_language_model.html" />
+    <link rel="next" title="Large Language Model" href="large_language_model.html" />
     <link rel="prev" title="Text Scraping" href="text_scraping.html" />
   <meta name="viewport" content="width=device-width, initial-scale=1"/>
   <meta name="docsearch:language" content="en"/>
@@ -384,7 +384,7 @@
   <div class="bd-toc-item navbar-nav"><ul class="current nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="text_scraping.html">Text Scraping</a></li>
 <li class="toctree-l1 current active"><a class="current reference internal" href="#">Retriever</a></li>
-<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Prompting</a></li>
+<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Large Language Model</a></li>
 </ul>
 </div>
 </nav></div>
@@ -475,15 +475,15 @@ <h2>Semantic retrievers<a class="headerlink" href="#semantic-retrievers" title="
 <p>As embedding, we provide a <a class="reference internal" href="../references/generated/ragger_duck.embedding.SentenceTransformer.html#ragger_duck.embedding.SentenceTransformer" title="ragger_duck.embedding.SentenceTransformer"><code class="xref py py-class docutils literal notranslate"><span class="pre">SentenceTransformer</span></code></a> that
 download any pre-trained sentence transformers from HuggingFace.</p>
 </section>
-<section id="reranker-merging-lexical-and-semantic-retrievers">
-<h2>Reranker: merging lexical and semantic retrievers<a class="headerlink" href="#reranker-merging-lexical-and-semantic-retrievers" title="Link to this heading">#</a></h2>
+<section id="reranker-merging-lexical-and-semantic-retrievers-results">
+<h2>Reranker: merging lexical and semantic retrievers results<a class="headerlink" href="#reranker-merging-lexical-and-semantic-retrievers-results" title="Link to this heading">#</a></h2>
 <p>If we use both lexical and semantic retrievers, we need to merge the results of both
 retrievers. <a class="reference internal" href="../references/generated/ragger_duck.retrieval.RetrieverReranker.html#ragger_duck.retrieval.RetrieverReranker" title="ragger_duck.retrieval.RetrieverReranker"><code class="xref py py-class docutils literal notranslate"><span class="pre">RetrieverReranker</span></code></a> makes such reranking by
 using a cross-encoder model. In our case, cross-encoder model is trained on Microsoft
 Bing query-document pairs and is available on HuggingFace.</p>
 </section>
 <section id="api-of-retrivers-and-reranker">
-<h2>API of retrivers and Reranker<a class="headerlink" href="#api-of-retrivers-and-reranker" title="Link to this heading">#</a></h2>
+<h2>API of retrivers and reranker<a class="headerlink" href="#api-of-retrivers-and-reranker" title="Link to this heading">#</a></h2>
 <p>All retrievers and reranker adhere to the same API with a <code class="docutils literal notranslate"><span class="pre">fit</span></code> and <code class="docutils literal notranslate"><span class="pre">query</span></code> method.
 For the retrievers, the <code class="docutils literal notranslate"><span class="pre">fit</span></code> method is used to create the index while the <code class="docutils literal notranslate"><span class="pre">query</span></code>
 method is used to retrieve the top-k documents given a query.</p>
@@ -514,7 +514,7 @@ <h2>API of retrivers and Reranker<a class="headerlink" href="#api-of-retrivers-a
        title="next page">
       <div class="prev-next-info">
         <p class="prev-next-subtitle">next</p>
-        <p class="prev-next-title">Prompting</p>
+        <p class="prev-next-title">Large Language Model</p>
       </div>
       <i class="fa-solid fa-angle-right"></i>
     </a>
@@ -538,8 +538,8 @@ <h2>API of retrivers and Reranker<a class="headerlink" href="#api-of-retrivers-a
     <ul class="visible nav section-nav flex-column">
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lexical-retrievers">Lexical retrievers</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#semantic-retrievers">Semantic retrievers</a></li>
-<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#reranker-merging-lexical-and-semantic-retrievers">Reranker: merging lexical and semantic retrievers</a></li>
-<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#api-of-retrivers-and-reranker">API of retrivers and Reranker</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#reranker-merging-lexical-and-semantic-retrievers-results">Reranker: merging lexical and semantic retrievers results</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#api-of-retrivers-and-reranker">API of retrivers and reranker</a></li>
 </ul>
   </nav></div>
 

diff --git a/user_guide/large_language_model.html b/user_guide/large_language_model.html
@@ -8,7 +8,7 @@
     <meta charset="utf-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />
 
-    <title>Prompting &#8212; Ragger Duck 0.0.1.dev0 documentation</title>
+    <title>Large Language Model &#8212; Ragger Duck 0.0.1.dev0 documentation</title>
 
 
 
@@ -384,7 +384,7 @@
   <div class="bd-toc-item navbar-nav"><ul class="current nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="text_scraping.html">Text Scraping</a></li>
 <li class="toctree-l1"><a class="reference internal" href="information_retrieval.html">Retriever</a></li>
-<li class="toctree-l1 current active"><a class="current reference internal" href="#">Prompting</a></li>
+<li class="toctree-l1 current active"><a class="current reference internal" href="#">Large Language Model</a></li>
 </ul>
 </div>
 </nav></div>
@@ -425,7 +425,7 @@
 
     <li class="breadcrumb-item"><a href="index.html" class="nav-link">User Guide</a></li>
 
-    <li class="breadcrumb-item active" aria-current="page">Prompting</li>
+    <li class="breadcrumb-item active" aria-current="page">Large Language Model</li>
   </ul>
 </nav>
 </div>
@@ -442,15 +442,30 @@
 <div id="searchbox"></div>
                 <article class="bd-article">
 
-  <section id="prompting">
-<span id="large-language-model"></span><h1>Prompting<a class="headerlink" href="#prompting" title="Link to this heading">#</a></h1>
-<section id="prompting-for-api-documentation">
-<h2>Prompting for API documentation<a class="headerlink" href="#prompting-for-api-documentation" title="Link to this heading">#</a></h2>
-<p><a class="reference internal" href="../references/generated/ragger_duck.prompt.BasicPromptingStrategy.html#ragger_duck.prompt.BasicPromptingStrategy" title="ragger_duck.prompt.BasicPromptingStrategy"><code class="xref py py-class docutils literal notranslate"><span class="pre">BasicPromptingStrategy</span></code></a> implements a prompting
-strategy to answer documentation questions. We get context by reranking the
-search from a lexical and semantic retrievers. Once the context is retrieved,
-we request a Large Language Model (LLM) to answer the question.</p>
-</section>
+  <section id="large-language-model">
+<span id="id1"></span><h1>Large Language Model<a class="headerlink" href="#large-language-model" title="Link to this heading">#</a></h1>
+<p>In the RAG framework, the Large Language Model (LLM) is the cherry on top. It is in
+charge of generating the answer to the query based on the context retrieved.</p>
+<p>A rather important part of the LLM is related to the prompt to trigger the generation.
+In this POC, we did not intend to optimize the prompt because we did not have the data
+at hand to make a proper evaluation.</p>
+<p><a class="reference internal" href="../references/generated/ragger_duck.prompt.BasicPromptingStrategy.html#ragger_duck.prompt.BasicPromptingStrategy" title="ragger_duck.prompt.BasicPromptingStrategy"><code class="xref py py-class docutils literal notranslate"><span class="pre">BasicPromptingStrategy</span></code></a> allows to interface the LLM with
+the context found by the retriever. For prototyping purposes, we also allow the
+retrievers to be bypassed. The prompt provided to the LLM is the following:</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">prompt</span> <span class="o">=</span> <span class="p">(</span>
+    <span class="s2">&quot;[INST] You are a scikit-learn expert that should be able to answer&quot;</span>
+    <span class="s2">&quot; machine-learning question.</span><span class="se">\n\n</span><span class="s2">Answer to the query below using the&quot;</span>
+    <span class="s2">&quot; additional provided content. The additional content is composed of&quot;</span>
+    <span class="s2">&quot; the HTML link to the source and the extracted contextual&quot;</span>
+    <span class="s2">&quot; information.</span><span class="se">\n\n</span><span class="s2">Be succinct.</span><span class="se">\n\n</span><span class="s2">&quot;</span>
+    <span class="s2">&quot;Make sure to use backticks whenever you refer to class, function, &quot;</span>
+    <span class="s2">&quot;method, or name that contains underscores.</span><span class="se">\n\n</span><span class="s2">&quot;</span>
+    <span class="sa">f</span><span class="s2">&quot;query: </span><span class="si">{</span><span class="n">query</span><span class="si">}</span><span class="se">\n\n</span><span class="si">{</span><span class="n">context_query</span><span class="si">}</span><span class="s2"> [/INST].&quot;</span>
+<span class="p">)</span>
+</pre></div>
+</div>
+<p>When bypassing the retrievers, we do not provide any context and the sentence related
+to this part.</p>
 </section>
 
 
@@ -492,18 +507,6 @@ <h2>Prompting for API documentation<a class="headerlink" href="#prompting-for-ap
 
 
   <div class="sidebar-secondary-item">
-<div
-    id="pst-page-navigation-heading-2"
-    class="page-toc tocsection onthispage">
-    <i class="fa-solid fa-list"></i> On this page
-  </div>
-  <nav class="bd-toc-nav page-toc" aria-labelledby="pst-page-navigation-heading-2">
-    <ul class="visible nav section-nav flex-column">
-<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#prompting-for-api-documentation">Prompting for API documentation</a></li>
-</ul>
-  </nav></div>
-
-  <div class="sidebar-secondary-item">
 
 
   <div class="tocsection editthispage">

diff --git a/user_guide/text_scraping.html b/user_guide/text_scraping.html
@@ -384,7 +384,7 @@
   <div class="bd-toc-item navbar-nav"><ul class="current nav bd-sidenav">
 <li class="toctree-l1 current active"><a class="current reference internal" href="#">Text Scraping</a></li>
 <li class="toctree-l1"><a class="reference internal" href="information_retrieval.html">Retriever</a></li>
-<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Prompting</a></li>
+<li class="toctree-l1"><a class="reference internal" href="large_language_model.html">Large Language Model</a></li>
 </ul>
 </div>
 </nav></div>