ready for review: Chapter 7: Breaking Free from Cloud-Based Models
souzatharsis committed Dec 23, 2024
1 parent c16db0d commit 06f8bdf
Showing 74 changed files with 3,027 additions and 868 deletions.
26 changes: 14 additions & 12 deletions README.md
@@ -10,21 +10,23 @@ Please [open an issue](https://github.com/souzatharsis/tamingLLMs/issues) with y
*Publication Date: February 2, 2025*
### *A Practical Guide to LLM Pitfalls with Open Source Software*

Abstract: *The current discourse around Large Language Models (LLMs) tends to focus heavily on their capabilities while glossing over fundamental challenges. Conversely, this book takes a critical look at the key limitations and implementation pitfalls that engineers and technical product managers encounter when building LLM-powered applications. Through practical Python examples and proven open source solutions, it provides an introductory yet comprehensive guide for navigating these challenges. The focus is on concrete problems - from handling unstructured output to managing context windows - with reproducible code examples and battle-tested open source tools. By understanding these pitfalls upfront, readers will be better equipped to build products that harness the power of LLMs while sidestepping their inherent limitations.*
Abstract: *The current discourse around Large Language Models (LLMs) tends to focus heavily on their capabilities while glossing over fundamental challenges. Conversely, this book takes a critical look at the key limitations and implementation pitfalls that engineers and technical leaders encounter when building LLM-powered applications. Through practical Python examples and proven open source solutions, it provides an introductory yet comprehensive guide for navigating these challenges. The focus is on concrete problems with reproducible code examples and battle-tested open source tools.

By understanding these pitfalls upfront, readers will be better equipped to build products that harness the power of LLMs while sidestepping their inherent limitations.*

| Chapter | Website | Notebook | Status |
|-------------------------------------------|--------------|---------------|----------------------|
| Preface | [html](https://www.souzatharsis.com/tamingLLMs/markdown/preface.html) | N/A | *Ready for Review* |
| Chapter 1: Introduction | [html](https://www.souzatharsis.com/tamingLLMs/markdown/intro.html) | N/A | *Ready for Review* |
| Chapter 2: Wrestling with Structured Output| [html](https://www.souzatharsis.com/tamingLLMs/notebooks/structured_output.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/structured_output.ipynb) | *Ready for Review* |
| Chapter 3: The Input Data Challenge | | | |
| Chapter 4: Output Size Limitations | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/output_size_limit.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/output_size_limit.ipynb) | *Ready for Review* |
| Chapter 5: The Evals Gap | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/evals.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/evals.ipynb) | *Ready for Review* |
| Chapter 6: Safety Concerns | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/safety.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/safety.ipynb) | *Ready for Review* |
| Chapter 7: Preference-Based Alignment | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/alignment.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/alignment.ipynb) | *Ready for Review* |
| Chapter 8: Breaking Free from Cloud Providers | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/local.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/local.ipynb) | WIP |
| Chapter 9: The Cost Factor | | | |
| Chapter 10: Frontiers | | | |
| Foreword | [html](https://www.souzatharsis.com/tamingLLMs/markdown/preface.html) | N/A | *Ready for Review* |
| Preface | [html](https://www.souzatharsis.com/tamingLLMs/markdown/intro.html) | N/A | *Ready for Review* |
| Chapter 1: Wrestling with Structured Output| [html](https://www.souzatharsis.com/tamingLLMs/notebooks/structured_output.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/structured_output.ipynb) | *Ready for Review* |
| Chapter 2: The Input Data Challenge | | | |
| Chapter 3: Output Size Limitations | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/output_size_limit.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/output_size_limit.ipynb) | *Ready for Review* |
| Chapter 4: The Evals Gap | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/evals.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/evals.ipynb) | *Ready for Review* |
| Chapter 5: Safety Concerns | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/safety.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/safety.ipynb) | *Ready for Review* |
| Chapter 6: Preference-Based Alignment | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/alignment.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/alignment.ipynb) | *Ready for Review* |
| Chapter 7: Breaking Free from Cloud-Based Models | [html](https://www.souzatharsis.com/tamingLLMs/notebooks/local.html) | [ipynb](https://github.com/souzatharsis/tamingLLMs/blob/master/tamingllms/notebooks/local.ipynb) | *Ready for Review* |
| Chapter 8: The Cost Factor | | | |
| Chapter 9: Frontiers | | | |
| Appendix A: Tools and Resources | | | |


Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file modified tamingllms/_build/.doctrees/markdown/preface.doctree
Binary file modified tamingllms/_build/.doctrees/markdown/toc.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/alignment.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/local.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/safety.doctree
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
Binary file added tamingllms/_build/html/_images/downloads.png
Binary file added tamingllms/_build/html/_images/latency.png
116 changes: 116 additions & 0 deletions tamingllms/_build/html/_images/model_types.svg
Binary file added tamingllms/_build/html/_images/p1.png
Binary file added tamingllms/_build/html/_images/p2.png
Binary file added tamingllms/_build/html/_images/perf_.png
Binary file added tamingllms/_build/html/_images/qwen_perf.png
Binary file added tamingllms/_build/html/_images/task_number.png
26 changes: 13 additions & 13 deletions tamingllms/_build/html/_sources/markdown/toc.md
@@ -14,33 +14,33 @@ Sign-up to receive updates on [new Chapters here](https://tamingllm.substack.com
# [Taming LLMs](https://www.souzatharsis.com/tamingLLMs)
## *A Practical Guide to LLM Pitfalls with Open Source Software*

Abstract: *The current discourse around Large Language Models (LLMs) tends to focus heavily on their capabilities while glossing over fundamental challenges. Conversely, this book takes a critical look at the key limitations and implementation pitfalls that engineers and technical product managers encounter when building LLM-powered applications. Through practical Python examples and proven open source solutions, it provides an introductory yet comprehensive guide for navigating these challenges. The focus is on concrete problems - from handling unstructured output to managing context windows - with reproducible code examples and battle-tested open source tools. By understanding these pitfalls upfront, readers will be better equipped to build products that harness the power of LLMs while sidestepping their inherent limitations.*
Abstract: *The current discourse around Large Language Models (LLMs) tends to focus heavily on their capabilities while glossing over fundamental challenges. Conversely, this book takes a critical look at the key limitations and implementation pitfalls that engineers and technical leaders encounter when building LLM-powered applications. Through practical Python examples and proven open source solutions, it provides an introductory yet comprehensive guide for navigating these challenges. The focus is on concrete problems with reproducible code examples and battle-tested open source tools. By understanding these pitfalls upfront, readers will be better equipped to build products that harness the power of LLMs while sidestepping their inherent limitations.*

## [Preface](https://www.souzatharsis.com/tamingLLMs/markdown/preface.html)
## [Foreword](https://www.souzatharsis.com/tamingLLMs/markdown/preface.html)

## [Chapter 1: Introduction](https://www.souzatharsis.com/tamingLLMs/markdown/intro.html)
## [Preface](https://www.souzatharsis.com/tamingLLMs/markdown/intro.html)

## [Chapter 2: Wrestling with Structured Output](https://www.souzatharsis.com/tamingLLMs/notebooks/structured_output.html)
## [Chapter 1: Wrestling with Structured Output](https://www.souzatharsis.com/tamingLLMs/notebooks/structured_output.html)

## Chapter 3: Input Data Challenge
## Chapter 2: Input Data Challenge

## [Chapter 4: Output Size and Length Limitations](https://www.souzatharsis.com/tamingLLMs/notebooks/output_size_limit.html)
## [Chapter 3: Output Size and Length Limitations](https://www.souzatharsis.com/tamingLLMs/notebooks/output_size_limit.html)

## [Chapter 5: The Evals Gap](https://www.souzatharsis.com/tamingLLMs/notebooks/evals.html)
## [Chapter 4: The Evals Gap](https://www.souzatharsis.com/tamingLLMs/notebooks/evals.html)

## [Chapter 6: Safety Concerns](https://www.souzatharsis.com/tamingLLMs/notebooks/safety.html)
## [Chapter 5: Safety Concerns](https://www.souzatharsis.com/tamingLLMs/notebooks/safety.html)

## [Chapter 7: Preference-based Alignment](https://www.souzatharsis.com/tamingLLMs/notebooks/alignment.html)
## [Chapter 6: Preference-based Alignment](https://www.souzatharsis.com/tamingLLMs/notebooks/alignment.html)

## [Chapter 8: Breaking Free from Cloud Providers](https://www.souzatharsis.com/tamingLLMs/notebooks/local.html)
## [Chapter 7: Breaking Free from Cloud-Based Models](https://www.souzatharsis.com/tamingLLMs/notebooks/local.html)

## Chapter 9: The Cost Factor
## Chapter 8: The Cost Factor

## Chapter 10: Frontiers
## Chapter 9: Frontiers

## Appendix A: Tools and Resources

## Citation

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
9 changes: 4 additions & 5 deletions tamingllms/_build/html/_sources/notebooks/alignment.ipynb
@@ -414,6 +414,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(alignment-case-study)=\n",
"## Case Study: Aligning a Language Model to a Policy\n",
"\n",
"In this case study, we will align a language model to a policy. The policy is a set of principles and rules that we want the language model to adhere to. All methodology and code available solves this general problem of policy-based alignment. However, we will describe a specific case study to illustrate our approach.\n",
@@ -427,9 +428,7 @@
"2. Fine-tuning a base model using Direct Preference Optimization (DPO)\n",
"3. Evaluating the aligned model against the base model and measuring alignment with Acme Inc.'s educational policies\n",
"\n",
"\n",
"### Introduction\n",
"#### Experimental Setup\n",
"### Experimental Setup\n",
"\n",
"We will use the following base model: `HuggingFaceTB/SmolLM2-360M-Instruct` {cite}`smollm2024model`, a compact open source language model that is part of the SmolLM2 family published by HuggingFace.\n",
"\n",
@@ -448,15 +447,15 @@
"OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>\n",
"```\n",
"\n",
"#### Deliverables\n",
"### Deliverables\n",
"\n",
"As a result, we will have:\n",
"\n",
"* `smolK-12`, a fine-tuned model aligned with Acme Inc.'s policy \n",
"* A DPO-based reusable dataset capturing policy preferences\n",
"* Evaluation metrics to measure alignment\n",
"\n",
"#### A Note on smolLM2 Models\n",
"### A Note on smolLM2 Models\n",
"\n",
"Since we have decided to anchor our Case Study on HuggingFace's SmolLM2 models {cite}`smollm2024`, it is worth providing a reason for this choice.\n",
"\n",
1 change: 1 addition & 0 deletions tamingllms/_build/html/_sources/notebooks/evals.ipynb
@@ -4,6 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(evals)=\n",
"# The Evals Gap\n",
"```{epigraph}\n",
"It doesn't matter how beautiful your theory is, <br>\n",