
Commit

first draft of alignment chapter
souzatharsis committed Dec 15, 2024
1 parent 3f9d131 commit 1e6ce32
Showing 22 changed files with 84 additions and 82 deletions.
17 changes: 9 additions & 8 deletions README.md
@@ -108,14 +108,15 @@ Abstract: *The current discourse around Large Language Models (LLMs) tends to fo
- 6.5.1 Building a RAG Pipeline
- 6.5.2 Testing and Validation

## Chapter 7: Safety Concerns
- 7.1 Common Safety Issues
- 7.2 Implementation of Safety Guards
- 7.3 Content Filtering
- 7.4 Input Validation
- 7.5 Output Sanitization
- 7.6 Monitoring and Alerts
- 7.7 Best Practices
## Chapter 7: [Preference-based Alignment](https://www.souzatharsis.com/tamingLLMs/notebooks/alignment.html)
- 7.1 Introduction
- 7.2 From Raw Capabilities to Preference Alignment
- 7.3 On the Misalignment of Language Models
- 7.4 Aligning Language Models with Human Preferences
- 7.5 Supervised Fine-Tuning (SFT) for Model Alignment
- 7.6 Augmenting SFT with Human Preferences
- 7.7 Case Study: Aligning a Language Model to a Policy
- 7.8 Discussion

## Chapter 8: The Cost Factor
- 8.1 Understanding LLM Costs
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/markdown/intro.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/markdown/toc.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/alignment.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
Binary file not shown.
17 changes: 9 additions & 8 deletions tamingllms/_build/html/_sources/markdown/toc.md
@@ -105,14 +105,15 @@ Abstract: *The current discourse around Large Language Models (LLMs) tends to fo
- 6.5.1 Building a RAG Pipeline
- 6.5.2 Testing and Validation

## Chapter 7: Safety Concerns
- 7.1 Common Safety Issues
- 7.2 Implementation of Safety Guards
- 7.3 Content Filtering
- 7.4 Input Validation
- 7.5 Output Sanitization
- 7.6 Monitoring and Alerts
- 7.7 Best Practices
## Chapter 7: [Preference-based Alignment](https://www.souzatharsis.com/tamingLLMs/notebooks/alignment.html)
- 7.1 Introduction
- 7.2 From Raw Capabilities to Preference Alignment
- 7.3 On the Misalignment of Language Models
- 7.4 Aligning Language Models with Human Preferences
- 7.5 Supervised Fine-Tuning (SFT) for Model Alignment
- 7.6 Augmenting SFT with Human Preferences
- 7.7 Case Study: Aligning a Language Model to a Policy
- 7.8 Discussion

## Chapter 8: The Cost Factor
- 8.1 Understanding LLM Costs
7 changes: 3 additions & 4 deletions tamingllms/_build/html/_sources/notebooks/alignment.ipynb
@@ -254,12 +254,11 @@
" 2. Training the model to assign higher probability to the chosen response\n",
" 3. Minimizing the KL divergence between the original and fine-tuned model to preserve general capabilities\n",
"\n",
"At a high-level DPO maximizes the probability of preferred output and minimize rejected output as defined in {eq}`dpo-loss`.\n",
"At a high level, DPO maximizes the probability of the preferred output and minimizes that of the rejected output, as defined in the following equation:\n",
"\n",
"```{math}\n",
":label: dpo-loss\n",
"\\begin{gather*}\n",
"\\mathcal{L}_{\\text{DPO}}(\\pi_\\theta; \\pi_\\text{ref}) = -\\mathbb{E}_{(x,y_w,y_l) \\sim \\mathcal{D}} \\left[\\log \\sigma \\left(\\beta \\underbrace{\\log \\frac{\\pi_\\theta(y_w | x)}{\\pi_\\text{ref}(y_w | x)}}_{\\color{green}\\text{preferred}} - \\beta \\underbrace{\\log \\frac{\\pi_\\theta(y_l | x)}{\\pi_\\text{ref}(y_l | x)}}_{\\color{red}\\text{rejected}}\\right)\\right]\n",
"```\n",
"\\end{gather*}\n",
"\n",
"This approach is more straightforward than PPO, as it avoids the need for a reward model and instead uses a direct comparison of model outputs against human preferences.\n",
"\n",
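To make the objective above concrete, the following is a minimal PyTorch sketch of the batch DPO loss; the log-probability tensors and the β value are illustrative placeholders rather than values from the chapter's case study:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of (prompt, chosen, rejected) triples.

    Each tensor holds per-sequence log-probabilities, i.e. log pi(y | x)
    summed over the response tokens, for the policy and the frozen reference model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps        # "preferred" term
    rejected_logratio = policy_rejected_logps - ref_rejected_logps  # "rejected" term
    # -log sigmoid(beta * (preferred - rejected)), averaged over the batch
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Illustrative values for a batch of two preference pairs
loss = dpo_loss(torch.tensor([-12.3, -8.1]), torch.tensor([-14.0, -9.5]),
                torch.tensor([-12.9, -8.4]), torch.tensor([-13.1, -9.0]))
```

The β-scaled log-ratios against the reference model are what anchor the fine-tuned policy to the original model, which is how DPO preserves general capabilities without an explicit KL penalty term.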
6 changes: 3 additions & 3 deletions tamingllms/_build/html/markdown/intro.html
@@ -208,7 +208,7 @@
<hr>
<div class="content" role="main" v-pre>

<section class="tex2jax_ignore mathjax_ignore" id="introduction">
<section id="introduction">
<span id="intro"></span><h1><a class="toc-backref" href="#id1" role="doc-backlink"><span class="section-number">1. </span>Introduction</a><a class="headerlink" href="#introduction" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>I am always doing that which I cannot do, in order that I may learn how to do it.</p>
@@ -286,7 +286,7 @@ <h2><a class="toc-backref" href="#id5" role="doc-backlink"><span class="section-
<li><p>Share their own experiences and solutions with the community</p></li>
<li><p>Propose new chapters or sections that address emerging challenges</p></li>
</ul>
<p>The repository can be found at <a class="reference external" href="https://github.com/souzatharsis/tamingllms">https://github.com/souzatharsis/tamingllms</a>. Whether you’ve found a typo, have a better solution to share, or want to contribute an entirely new section, your contributions are welcome.</p>
<p>The repository can be found at https://github.com/souzatharsis/tamingllms. Whether you’ve found a typo, have a better solution to share, or want to contribute an entirely new section, your contributions are welcome.</p>
</section>
<section id="a-note-on-perspective">
<h2><a class="toc-backref" href="#id6" role="doc-backlink"><span class="section-number">1.5. </span>A Note on Perspective</a><a class="headerlink" href="#a-note-on-perspective" title="Permalink to this heading"></a></h2>
@@ -399,7 +399,7 @@ <h3><a class="toc-backref" href="#id14" role="doc-backlink"><span class="section
<h2><a class="toc-backref" href="#id15" role="doc-backlink"><span class="section-number">1.10. </span>About the Author(s)</a><a class="headerlink" href="#about-the-author-s" title="Permalink to this heading"></a></h2>
<p>Dr. Tharsis Souza is a computer scientist and product leader specializing in AI-based products. He is a Lecturer at Columbia University’s Master of Science program in Applied Analytics, (<em>incoming</em>) Head of Product, Equities at Citadel, and former Senior VP at Two Sigma Investments. He also enjoys mentoring under-represented students &amp; working professionals to help create a more diverse global AI ecosystem.</p>
<p>With over 15 years of experience delivering technology products across startups and Fortune 500 companies, Dr. Souza is also an author of numerous scholarly publications and is a frequent speaker at academic and business conferences. Grounded in an academic background and drawing on practical experience building and scaling products powered by language models at early-stage startups and major institutions, as well as advising non-profit organizations and contributing to open source projects, he brings a unique perspective on bridging the gap between LLMs’ promised potential and their practical implementation challenges to enable the next generation of AI-powered products.</p>
<p>Dr. Tharsis holds a Ph.D. in Computer Science from UCL, University of London following an M.Phil. and <a class="reference external" href="http://M.Sc">M.Sc</a>. in Computer Science and a <a class="reference external" href="http://B.Sc">B.Sc</a>. in Computer Engineering.</p>
<p>Dr. Tharsis holds a Ph.D. in Computer Science from UCL, University of London, following an M.Phil. and M.Sc. in Computer Science and a B.Sc. in Computer Engineering.</p>
</section>
</section>

21 changes: 11 additions & 10 deletions tamingllms/_build/html/markdown/toc.html
@@ -180,7 +180,7 @@
<div class="content" role="main" v-pre>

<p>Sign-up to receive updates on <a class="reference external" href="https://tamingllm.substack.com/">new Chapters here</a>.</p>
<section class="tex2jax_ignore mathjax_ignore" id="taming-llms">
<section id="taming-llms">
<h1>Taming LLMs<a class="headerlink" href="#taming-llms" title="Permalink to this heading"></a></h1>
<section id="a-practical-guide-to-llm-pitfalls-with-open-source-software">
<h2><em>A Practical Guide to LLM Pitfalls with Open Source Software</em><a class="headerlink" href="#a-practical-guide-to-llm-pitfalls-with-open-source-software" title="Permalink to this heading"></a></h2>
@@ -336,16 +336,17 @@ <h2>Chapter 6: Hallucination: The Reality Gap<a class="headerlink" href="#chapte
</li>
</ul>
</section>
<section id="chapter-7-safety-concerns">
<h2>Chapter 7: Safety Concerns<a class="headerlink" href="#chapter-7-safety-concerns" title="Permalink to this heading"></a></h2>
<section id="chapter-7-preference-based-alignment">
<h2>Chapter 7: <a class="reference external" href="https://www.souzatharsis.com/tamingLLMs/notebooks/alignment.html">Preference-based Alignment</a><a class="headerlink" href="#chapter-7-preference-based-alignment" title="Permalink to this heading"></a></h2>
<ul class="simple">
<li><p>7.1 Common Safety Issues</p></li>
<li><p>7.2 Implementation of Safety Guards</p></li>
<li><p>7.3 Content Filtering</p></li>
<li><p>7.4 Input Validation</p></li>
<li><p>7.5 Output Sanitization</p></li>
<li><p>7.6 Monitoring and Alerts</p></li>
<li><p>7.7 Best Practices</p></li>
<li><p>7.1 Introduction</p></li>
<li><p>7.2 From Raw Capabilities to Preference Alignment</p></li>
<li><p>7.3 On the Misalignment of Language Models</p></li>
<li><p>7.4 Aligning Language Models with Human Preferences</p></li>
<li><p>7.5 Supervised Fine-Tuning (SFT) for Model Alignment</p></li>
<li><p>7.6 Augmenting SFT with Human Preferences</p></li>
<li><p>7.7 Case Study: Aligning a Language Model to a Policy</p></li>
<li><p>7.8 Discussion</p></li>
</ul>
</section>
<section id="chapter-8-the-cost-factor">
15 changes: 8 additions & 7 deletions tamingllms/_build/html/notebooks/alignment.html
@@ -29,8 +29,7 @@
<script src="../_static/design-tabs.js"></script>
<script>const THEBE_JS_URL = "https://unpkg.com/[email protected]/lib/index.js"; const thebe_selector = ".thebe,.cell"; const thebe_selector_input = "pre"; const thebe_selector_output = ".output, .cell_output"</script>
<script async="async" src="../_static/sphinx-thebe.js"></script>
<script>window.MathJax = {"options": {"processHtmlClass": "tex2jax_process|mathjax_process|math|output_area"}}</script>
<script defer="defer" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script type="module" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/mermaid.esm.min.mjs"></script>
<script type="module" src="https://cdn.jsdelivr.net/npm/@mermaid-js/[email protected]/dist/mermaid-layout-elk.esm.min.mjs"></script>
<script type="module">import mermaid from "https://cdn.jsdelivr.net/npm/[email protected]/dist/mermaid.esm.min.mjs";import elkLayouts from "https://cdn.jsdelivr.net/npm/@mermaid-js/[email protected]/dist/mermaid-layout-elk.esm.min.mjs";mermaid.registerLayoutLoaders(elkLayouts);mermaid.initialize({startOnLoad:false});</script>
@@ -203,7 +202,7 @@
<hr>
<div class="content" role="main" v-pre>

<section class="tex2jax_ignore mathjax_ignore" id="preference-based-alignment">
<section id="preference-based-alignment">
<h1><a class="toc-backref" href="#id126" role="doc-backlink"><span class="section-number">5. </span>Preference-Based Alignment</a><a class="headerlink" href="#preference-based-alignment" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>Move fast and be responsible.</p>
@@ -431,9 +430,11 @@ <h4><a class="toc-backref" href="#id132" role="doc-backlink"><span class="sectio
<li><p>Training the model to assign higher probability to the chosen response</p></li>
<li><p>Minimizing the KL divergence between the original and fine-tuned model to preserve general capabilities</p></li>
</ol>
<p>At a high-level DPO maximizes the probability of preferred output and minimize rejected output as defined in <a class="reference internal" href="#equation-dpo-loss">(5.1)</a>.</p>
<div class="math notranslate nohighlight" id="equation-dpo-loss">
<span class="eqno">(5.1)<a class="headerlink" href="#equation-dpo-loss" title="Permalink to this equation"></a></span>\[\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_\text{ref}) = -\mathbb{E}_{(x,y_w,y_l) \sim \mathcal{D}} \left[\log \sigma \left(\beta \underbrace{\log \frac{\pi_\theta(y_w | x)}{\pi_\text{ref}(y_w | x)}}_{\color{green}\text{preferred}} - \beta \underbrace{\log \frac{\pi_\theta(y_l | x)}{\pi_\text{ref}(y_l | x)}}_{\color{red}\text{rejected}}\right)\right]\]</div>
<p>At a high level, DPO maximizes the probability of the preferred output and minimizes that of the rejected output, as defined in the following equation:</p>
<div class="amsmath math notranslate nohighlight">
\[\begin{gather*}
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_\text{ref}) = -\mathbb{E}_{(x,y_w,y_l) \sim \mathcal{D}} \left[\log \sigma \left(\beta \underbrace{\log \frac{\pi_\theta(y_w | x)}{\pi_\text{ref}(y_w | x)}}_{\color{green}\text{preferred}} - \beta \underbrace{\log \frac{\pi_\theta(y_l | x)}{\pi_\text{ref}(y_l | x)}}_{\color{red}\text{rejected}}\right)\right]
\end{gather*}\]</div>
<p>This approach is more straightforward than PPO, as it avoids the need for a reward model and instead uses a direct comparison of model outputs against human preferences.</p>
<p>Modern libraries such as HuggingFace’s TRL <span id="id21">[<a class="reference internal" href="#id125" title="Hugging Face. Trl. 2024d. TRL. URL: https://huggingface.co/docs/trl/en/index.">Face, 2024d</a>]</span> offer a suite of techniques for fine-tuning language models with reinforcement learning, including PPO and DPO. It provides a user-friendly interface and a wide range of features for fine-tuning and aligning LLMs, which will be the focus of the next section as we go through a case study.</p>
</section>
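As a preview of the case study, a minimal sketch of running DPO with TRL's `DPOTrainer` on the chapter's base model could look as follows; the toy preference rows and hyperparameters are illustrative assumptions, and keyword names vary slightly across TRL versions (newer releases use `processing_class` where older ones used `tokenizer`):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "HuggingFaceTB/SmolLM2-360M-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy preference data: each row pairs a prompt with a chosen and a rejected response.
train_dataset = Dataset.from_dict({
    "prompt": ["Is it ok to share my bank password with a stranger?"],
    "chosen": ["No. Never share passwords; no legitimate party needs them."],
    "rejected": ["Sure, sharing passwords is usually harmless."],
})

config = DPOConfig(
    output_dir="smollm2-dpo",       # where checkpoints are written
    beta=0.1,                       # strength of the implicit anchor to the reference model
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,     # older TRL releases take tokenizer=... instead
)
trainer.train()
```

When no `ref_model` is passed, recent TRL versions create a frozen copy of the policy to serve as the reference model in the loss above.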
@@ -853,7 +854,7 @@ <h4><a class="toc-backref" href="#id141" role="doc-backlink"><span class="sectio
</div>
<p>Recall our base model is <code class="docutils literal notranslate"><span class="pre">HuggingFaceTB/SmolLM2-360M-Instruct</span></code>. Here, we will use the HuggingFace Inference API to generate rejected responses from a cloud endpoint for enhanced performance:</p>
<ol class="arabic simple">
<li><p>Visit the HuggingFace Endpoints UI: <a class="reference external" href="https://ui.endpoints.huggingface.co/">https://ui.endpoints.huggingface.co/</a></p></li>
<li><p>Visit the HuggingFace Endpoints UI: https://ui.endpoints.huggingface.co/</p></li>
<li><p>Click “New Endpoint” and select the model <code class="docutils literal notranslate"><span class="pre">HuggingFaceTB/SmolLM2-360M-Instruct</span></code></p></li>
<li><p>Choose the compute resources (e.g., CPU or GPU instance, GPU preferred)</p></li>
<li><p>Configure the endpoint settings:</p>
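To complement the endpoint setup steps above, here is a minimal sketch of collecting the base model's raw (to-be-rejected) responses from such an endpoint with `huggingface_hub`'s `InferenceClient`; the endpoint URL, token, and prompt are placeholders, and `chat_completion` assumes the endpoint exposes the chat (messages) API:

```python
from huggingface_hub import InferenceClient

# Placeholders: paste the URL shown in the Endpoints UI and your own access token.
client = InferenceClient(
    model="https://your-endpoint-name.endpoints.huggingface.cloud",
    token="hf_xxx",
)

prompts = ["Describe how you handle requests that violate your usage policy."]
rejected_responses = []
for prompt in prompts:
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
        temperature=1.0,  # sample rather than greedy-decode the base model's raw responses
    )
    rejected_responses.append(out.choices[0].message.content)
```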
2 changes: 1 addition & 1 deletion tamingllms/_build/html/notebooks/evals.html
@@ -210,7 +210,7 @@
<hr>
<div class="content" role="main" v-pre>

<section class="tex2jax_ignore mathjax_ignore" id="the-evals-gap">
<section id="the-evals-gap">
<h1><a class="toc-backref" href="#id120" role="doc-backlink"><span class="section-number">4. </span>The Evals Gap</a><a class="headerlink" href="#the-evals-gap" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>It doesn’t matter how beautiful your theory is, <br>
2 changes: 1 addition & 1 deletion tamingllms/_build/html/notebooks/output_size_limit.html
@@ -202,7 +202,7 @@
<hr>
<div class="content" role="main" v-pre>

<section class="tex2jax_ignore mathjax_ignore" id="output-size-limitations">
<section id="output-size-limitations">
<h1><a class="toc-backref" href="#id85" role="doc-backlink"><span class="section-number">2. </span>Output Size Limitations</a><a class="headerlink" href="#output-size-limitations" title="Permalink to this heading"></a></h1>
<blockquote class="epigraph">
<div><p>Only those who will risk going too far can possibly find out how far one can go.</p>