
Commit

update output size limit
souzatharsis committed Dec 7, 2024
1 parent 06ddf6e commit 44de3e0
Showing 13 changed files with 141 additions and 126 deletions.
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -90,7 +90,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we will utilize `langchain` for a content-aware sentence-splitting strategy for chunking. We will use the `CharacterTextSplitter` with `tiktoken` as our tokenizer to count the number of tokens per chunk which we can use to ensure that we do not surpass the input token limit of our model."
"Here, we will utilize `langchain` for a content-aware, sentence-based chunking strategy. LangChain offers several text splitters {cite}`langchain_text_splitters`, including JSON-, Markdown-, and HTML-based splitters as well as token-based splitting. We will use the `CharacterTextSplitter` with `tiktoken` as our tokenizer to count the tokens in each chunk, ensuring that we do not surpass our model's input token limit."
]
},
{
@@ -471,8 +471,9 @@
"\n",
"\n",
"## References\n",
"\n",
"- [LangChain Text Splitter](https://langchain.readthedocs.io/en/latest/modules/text_splitter.html)."
"```{bibliography}\n",
":filter: docname in docnames\n",
"```"
]
},
{
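The token-bounded chunking strategy described in the cell above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's actual code: the whitespace word counter is a hypothetical stand-in for `tiktoken`, and `chunk_sentences` is a hypothetical stand-in for LangChain's `CharacterTextSplitter`.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a tiktoken encoder: approximate
    # the token count by the number of whitespace-separated words.
    return len(text.split())

def chunk_sentences(sentences, max_tokens: int):
    # Greedily pack whole sentences into chunks so that no chunk
    # exceeds max_tokens; sentences are never split mid-way.
    chunks, current, current_tokens = [], [], 0
    for sentence in sentences:
        tokens = count_tokens(sentence)
        if current and current_tokens + tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += tokens
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = ["One two three.", "Four five.", "Six seven eight nine."]
print(chunk_sentences(doc, max_tokens=5))
```

The greedy packing keeps chunk boundaries on sentence edges, which is the "content-aware" property: each chunk stays under the token budget without cutting a sentence in half.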
80 changes: 40 additions & 40 deletions tamingllms/_build/html/notebooks/evals.html

Large diffs are not rendered by default.

72 changes: 38 additions & 34 deletions tamingllms/_build/html/notebooks/output_size_limit.html

Large diffs are not rendered by default.

80 changes: 40 additions & 40 deletions tamingllms/_build/html/notebooks/structured_output.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tamingllms/_build/html/searchindex.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tamingllms/_build/jupyter_execute/markdown/intro.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "8e40cf5d",
"id": "cc6b3dc4",
"metadata": {},
"source": [
"(intro)=\n",
@@ -90,7 +90,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we will utilize `langchain` for a content-aware sentence-splitting strategy for chunking. We will use the `CharacterTextSplitter` with `tiktoken` as our tokenizer to count the number of tokens per chunk which we can use to ensure that we do not surpass the input token limit of our model."
"Here, we will utilize `langchain` for a content-aware, sentence-based chunking strategy. LangChain offers several text splitters {cite}`langchain_text_splitters`, including JSON-, Markdown-, and HTML-based splitters as well as token-based splitting. We will use the `CharacterTextSplitter` with `tiktoken` as our tokenizer to count the tokens in each chunk, ensuring that we do not surpass our model's input token limit."
]
},
{
@@ -471,8 +471,9 @@
"\n",
"\n",
"## References\n",
"\n",
"- [LangChain Text Splitter](https://langchain.readthedocs.io/en/latest/modules/text_splitter.html)."
"```{bibliography}\n",
":filter: docname in docnames\n",
"```"
]
},
{
7 changes: 4 additions & 3 deletions tamingllms/notebooks/output_size_limit.ipynb
@@ -90,7 +90,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we will utilize `langchain` for a content-aware sentence-splitting strategy for chunking. We will use the `CharacterTextSplitter` with `tiktoken` as our tokenizer to count the number of tokens per chunk which we can use to ensure that we do not surpass the input token limit of our model."
"Here, we will utilize `langchain` for a content-aware, sentence-based chunking strategy. LangChain offers several text splitters {cite}`langchain_text_splitters`, including JSON-, Markdown-, and HTML-based splitters as well as token-based splitting. We will use the `CharacterTextSplitter` with `tiktoken` as our tokenizer to count the tokens in each chunk, ensuring that we do not surpass our model's input token limit."
]
},
{
@@ -471,8 +471,9 @@
"\n",
"\n",
"## References\n",
"\n",
"- [LangChain Text Splitter](https://langchain.readthedocs.io/en/latest/modules/text_splitter.html)."
"```{bibliography}\n",
":filter: docname in docnames\n",
"```"
]
},
{
10 changes: 9 additions & 1 deletion tamingllms/references.bib
@@ -227,4 +227,12 @@ @article{long2024llms
author={Long, Do Xuan and Ngoc, Hai Nguyen and Sim, Tiviatis and Dao, Hieu and Joty, Shafiq and Kawaguchi, Kenji and Chen, Nancy F and Kan, Min-Yen},
journal={arXiv preprint arXiv:2408.08656},
year={2024}
}
}

@misc{langchain_text_splitters,
title={Text Splitters - LangChain Documentation},
author={{LangChain}},
year={2024},
howpublished={\url{https://python.langchain.com/docs/how_to/#text-splitters}},
note={Accessed: 12/07/2024}
}

