Late chunking #257

Closed
wants to merge 155 commits into from
155 commits
c9c2373
assests and app name
PrashantDixit0 Feb 6, 2024
433fde0
update README
PrashantDixit0 Feb 6, 2024
306c75e
Merge branch 'lancedb:main' into main
PrashantDixit0 Feb 8, 2024
6f85869
demo gifs
PrashantDixit0 Feb 10, 2024
b7ac9b2
Merge branch 'lancedb:main' into main
PrashantDixit0 Feb 15, 2024
8b66fdd
talk with github codespaces
PrashantDixit0 Feb 15, 2024
55d01de
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Feb 15, 2024
1600106
talk with github codespaces
PrashantDixit0 Feb 15, 2024
5895c96
gitignore
PrashantDixit0 Feb 16, 2024
eb4eae7
linted
PrashantDixit0 Feb 17, 2024
35fd84f
added version
PrashantDixit0 Feb 21, 2024
ef8b2be
link fix
PrashantDixit0 Feb 23, 2024
17cd7f1
link fix
PrashantDixit0 Feb 23, 2024
a561df0
added local llm tag
PrashantDixit0 Feb 25, 2024
4d772a9
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Feb 25, 2024
01d4964
Merge branch 'lancedb:main' into main
PrashantDixit0 Feb 25, 2024
71b6fe2
crag
PrashantDixit0 Feb 25, 2024
dc2f214
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Feb 25, 2024
ea854fa
link fix
PrashantDixit0 Feb 25, 2024
85a8e8b
lint
PrashantDixit0 Feb 25, 2024
c2313fd
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Feb 26, 2024
1750478
llm tags
PrashantDixit0 Feb 26, 2024
f8672ae
non-clickable badge
PrashantDixit0 Feb 27, 2024
3ede6c0
non-clickable badge
PrashantDixit0 Feb 27, 2024
5b00604
non-clickable badge
PrashantDixit0 Feb 27, 2024
e54f95d
fix
PrashantDixit0 Feb 27, 2024
964139b
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Feb 28, 2024
f91482b
tutorial llm tags
PrashantDixit0 Feb 28, 2024
6842b72
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Mar 1, 2024
c2be50b
added instructions and fix
PrashantDixit0 Mar 1, 2024
0ab5372
Merge branch 'lancedb:main' into main
PrashantDixit0 Mar 2, 2024
1e71e1a
colab fix
PrashantDixit0 Mar 2, 2024
a005a42
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Mar 13, 2024
1609b05
fix
PrashantDixit0 Mar 13, 2024
208aa06
formatted
PrashantDixit0 Mar 13, 2024
de5b31d
hybrid search and rag colab
PrashantDixit0 Mar 13, 2024
871d4f0
colab format
PrashantDixit0 Mar 13, 2024
56e5571
python test
PrashantDixit0 Mar 13, 2024
ea0dad6
node test
PrashantDixit0 Mar 13, 2024
f7637a1
python test
PrashantDixit0 Mar 13, 2024
cfefcd9
blog link update
PrashantDixit0 Mar 17, 2024
d549b2c
blog
PrashantDixit0 Mar 17, 2024
6c28504
rag mlx
PrashantDixit0 Mar 20, 2024
732ae4e
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Mar 23, 2024
ff5cdf5
myntra search engine app
PrashantDixit0 Mar 23, 2024
a15c763
link fix
PrashantDixit0 Mar 24, 2024
754413c
CrewAI Example
PrashantDixit0 Mar 27, 2024
3ff4f50
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Mar 27, 2024
ea9badb
lint
PrashantDixit0 Mar 27, 2024
6d93933
node test
PrashantDixit0 Mar 27, 2024
be65606
node test
PrashantDixit0 Mar 27, 2024
538cabc
node test
PrashantDixit0 Mar 27, 2024
27b497b
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Mar 27, 2024
875fdde
Merge branch 'lancedb:main' into main
PrashantDixit0 Mar 31, 2024
297de9c
added readme
PrashantDixit0 Mar 31, 2024
416a71c
Merge branch 'lancedb:main' into main
PrashantDixit0 Apr 5, 2024
5c6f72d
support for Gemini Pro
PrashantDixit0 Apr 5, 2024
fed4be9
fix
PrashantDixit0 Apr 8, 2024
d12eeca
Merge branch 'lancedb:main' into main
PrashantDixit0 Apr 11, 2024
f93218a
chunking techniques
PrashantDixit0 Apr 18, 2024
ec2370a
lint
PrashantDixit0 Apr 18, 2024
f8a9c54
Merge branch 'lancedb:main' into main
PrashantDixit0 Apr 21, 2024
816f1ec
Locally RAG from Scratch
PrashantDixit0 Apr 21, 2024
4c7f42a
lint
PrashantDixit0 Apr 21, 2024
5c20f98
llama3 added
PrashantDixit0 Apr 21, 2024
1801ddd
link finx
PrashantDixit0 Apr 21, 2024
e070896
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Apr 24, 2024
1cec0d8
sdk manual cli chatbot phidata
PrashantDixit0 Apr 24, 2024
c6578cb
sdk manual cli chatbot phidata
PrashantDixit0 Apr 24, 2024
8cee229
link fix
PrashantDixit0 Apr 24, 2024
6b416ab
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Apr 25, 2024
53803a4
tags
PrashantDixit0 Apr 25, 2024
e7a3b1e
advanced
PrashantDixit0 Apr 26, 2024
c272e7a
update readme
PrashantDixit0 Apr 26, 2024
299b14a
Update README
PrashantDixit0 Apr 26, 2024
1b35bf2
remove key
PrashantDixit0 Apr 27, 2024
5cecc19
lint
PrashantDixit0 Apr 27, 2024
886f069
lint
PrashantDixit0 Apr 27, 2024
6b81c5b
link fix
PrashantDixit0 May 13, 2024
2290b4a
broken link fix
PrashantDixit0 May 13, 2024
2095456
formatting fixes
PrashantDixit0 May 26, 2024
5258e6b
formatting fixes
PrashantDixit0 May 26, 2024
c5cce3b
lint
PrashantDixit0 May 26, 2024
b4d9ba5
updated image
PrashantDixit0 May 27, 2024
cfc1138
added demo image
PrashantDixit0 May 27, 2024
e7d7996
change autogen notebook
PrashantDixit0 May 28, 2024
981fea6
lint
PrashantDixit0 May 28, 2024
1b65b84
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 May 28, 2024
3b28aad
lint
PrashantDixit0 May 28, 2024
a3f0404
rag evaluation with ragas
PrashantDixit0 May 29, 2024
735231b
README update
PrashantDixit0 Jun 3, 2024
00483e4
broken link fix
PrashantDixit0 Jun 3, 2024
a023440
Restructured README
PrashantDixit0 Jun 6, 2024
ec82f13
updated titles
PrashantDixit0 Jun 6, 2024
3d549b6
restructed README
PrashantDixit0 Jun 7, 2024
feb8127
changes
PrashantDixit0 Jun 7, 2024
6eccb04
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Jun 8, 2024
80d2b3a
sectional description
PrashantDixit0 Jun 9, 2024
f8a808f
lint
PrashantDixit0 Jun 9, 2024
0837a64
Merge branch 'lancedb:main' into main
PrashantDixit0 Jun 17, 2024
fa5c411
dataset with Instructor
PrashantDixit0 Jun 17, 2024
ea43855
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Jun 17, 2024
5df0f36
lint
PrashantDixit0 Jun 17, 2024
afccc89
remove blog link
PrashantDixit0 Jun 17, 2024
0dbb293
img update
PrashantDixit0 Jun 17, 2024
6b58eb2
broken link
PrashantDixit0 Jun 17, 2024
d449a17
structured dataset using Instructor
PrashantDixit0 Jun 19, 2024
cf6363a
lint
PrashantDixit0 Jun 19, 2024
8d35cc5
lint test update
PrashantDixit0 Jun 19, 2024
63e9b9b
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Jun 29, 2024
882776e
lint workflow test
PrashantDixit0 Jul 1, 2024
eb776aa
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Jul 1, 2024
5573424
linting
PrashantDixit0 Jul 1, 2024
1466151
linting workflow
PrashantDixit0 Jul 1, 2024
eb234c5
linting workflow test
PrashantDixit0 Jul 1, 2024
f7ed743
linting workflow test
PrashantDixit0 Jul 1, 2024
b123719
linting workflow test
PrashantDixit0 Jul 1, 2024
08d6c80
linting workflow test
PrashantDixit0 Jul 1, 2024
0cba22a
linting workflow test
PrashantDixit0 Jul 1, 2024
652a55d
Merge branch 'lancedb:main' into main
PrashantDixit0 Jul 6, 2024
d9768fc
added blog link
PrashantDixit0 Jul 6, 2024
820f7e8
updated readme
PrashantDixit0 Jul 6, 2024
0942524
lint
PrashantDixit0 Jul 23, 2024
4f216d3
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Jul 26, 2024
a7432b0
update Readme
PrashantDixit0 Jul 26, 2024
1c39614
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Jul 28, 2024
65a3327
data source link
PrashantDixit0 Jul 28, 2024
33aa846
data source link
PrashantDixit0 Jul 28, 2024
453efcc
fixes
PrashantDixit0 Jul 28, 2024
fea0bdf
fix
PrashantDixit0 Jul 28, 2024
36fb46b
object detection with CLIP
PrashantDixit0 Jul 30, 2024
b29ce89
cambrian kaggle link update
PrashantDixit0 Jul 30, 2024
b5da5a2
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Aug 5, 2024
994ed2f
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Aug 15, 2024
0a4f83f
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Aug 16, 2024
c2207bc
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes
PrashantDixit0 Sep 26, 2024
a6dd5c4
archived examples and applications
PrashantDixit0 Sep 29, 2024
9d43474
lint
PrashantDixit0 Sep 29, 2024
454994b
added section filtering
PrashantDixit0 Sep 29, 2024
4b247dc
section with descriptions
PrashantDixit0 Sep 29, 2024
c9f7219
llama3.2 example
PrashantDixit0 Sep 30, 2024
f7f83d5
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes int…
PrashantDixit0 Sep 30, 2024
17bb4c2
remove key
PrashantDixit0 Sep 30, 2024
7c353ee
remove huggingface key
PrashantDixit0 Sep 30, 2024
9833c6a
contextual rag
PrashantDixit0 Oct 8, 2024
0abed6c
lint
PrashantDixit0 Oct 8, 2024
cab2314
link fix
PrashantDixit0 Oct 8, 2024
af6c930
context enrichment window
PrashantDixit0 Oct 16, 2024
f788fd4
lint
PrashantDixit0 Oct 16, 2024
17f51c7
blog link
PrashantDixit0 Oct 16, 2024
9756959
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes int…
PrashantDixit0 Oct 19, 2024
13f6066
Merge branch 'main' of github.com:PrashantDixit0/vectordb-recipes int…
PrashantDixit0 Oct 26, 2024
e00615a
assistant bot with openai swarm
PrashantDixit0 Oct 26, 2024
3aee2db
lint
PrashantDixit0 Oct 26, 2024
607e7df
colab
PrashantDixit0 Oct 29, 2024
4 changes: 3 additions & 1 deletion README.md
@@ -83,7 +83,8 @@ Develop a Retrieval-Augmented Generation (RAG) application using LanceDB for eff
| [Instruct-Multitask](./examples/instruct-multitask) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/instruct-multitask/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/instruct-multitask/main.py) [![LLM](https://img.shields.io/badge/local-llm-green)](#) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/multitask-embedding-with-lancedb-be18ec397543)|
| [Improve RAG with HyDE](/examples/Advance-RAG-with-HyDE/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Advance-RAG-with-HyDE/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/advanced-rag-precise-zero-shot-dense-retrieval-with-hyde-0946c54dfdcb)|
| [Improve RAG with LOTR ](/examples/Advance_RAG_LOTR/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Advance_RAG_LOTR/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/better-rag-with-lotr-lord-of-retriever-23c8336b9a35)|
| [Advanced RAG: Context Enrichment Window](./examples/Advanced_RAG_Context_Enrichment_Window/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Advanced_RAG_Context_Enrichment_Window/Advanced_RAG_Context_Enrichment_Window.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)||
| [Advanced RAG: Context Enrichment Window](./examples/Advanced_RAG_Context_Enrichment_Window/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Advanced_RAG_Context_Enrichment_Window/Advanced_RAG_Context_Enrichment_Window.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/advanced-rag-context-enrichment-window/)|
| [Advanced RAG: Late Chunking](./examples/Advanced_RAG_Late_Chunking/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Advanced_RAG_Late_Chunking/Late_Chunking_(Chunked_Pooling).ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/advanced-rag-context-enrichment-window/)|
| [Advanced RAG: Parent Document Retriever](/examples/parent_document_retriever/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/parent_document_retriever/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/modified-rag-parent-document-bigger-chunk-retriever-62b3d1e79bc6)|
| [Corrective RAG with Langgraph](./tutorials/Corrective-RAG-with_Langgraph/) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/Corrective-RAG-with_Langgraph/CRAG_with_Langgraph.ipynb) [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/implementing-corrective-rag-in-the-easiest-way-2/)|
| [Contextual-Compression-with-RAG](/examples/Contextual-Compression-with-RAG/) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Contextual-Compression-with-RAG/main.ipynb) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/enhance-rag-integrate-contextual-compression-and-filtering-for-precision-a29d4a810301/) |
@@ -146,6 +147,7 @@ Design an AI agents coordination application with LanceDB for efficient vector-b
| --------- | -------------------------- | ----------- |
||||
| [AI email assistant with Composio](/examples/AI-Email-Assistant-with-Composio/) |<a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/AI-Email-Assistant-with-Composio/composio-lance.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)|
| [Assistant Bot with OpenAI Swarm](./examples/assistance-bot-with-swarm/) |[![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/assistance-bot-with-swarm/) [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|
| [AI Trends Searcher with CrewAI](./examples/AI-Trends-with-CrewAI/) |<a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/AI-Trends-with-CrewAI/CrewAI_AI_Trends.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/track-ai-trends-crewai-agents-rag/)|
| [SuperAgent Autogen](/examples/SuperAgent_Autogen) |<a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/SuperAgent_Autogen/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)||
| [AI Agents: Reducing Hallucination](/examples/reducing_hallucinations_ai_agents/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/reducing_hallucinations_ai_agents/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/reducing_hallucinations_ai_agents/main.py) [![JS](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](./examples/reducing_hallucinations_ai_agents/index.js) [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![advanced](https://img.shields.io/badge/advanced-FF3333)](#) |[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/how-to-reduce-hallucinations-from-llm-powered-agents-using-long-term-memory-72f262c3cc1f/)|
5,211 changes: 5,211 additions & 0 deletions examples/Advanced_RAG_Late_Chunking/Late_Chunking_(Chunked_Pooling).ipynb

Large diffs are not rendered by default.

43 changes: 43 additions & 0 deletions examples/assistance-bot-with-swarm/README.md
@@ -0,0 +1,43 @@
# Assistant bot with OpenAI's Swarm

This example builds a customer service bot from two agents: one that talks to users and one that provides help. Each agent has tools for its task. The `run_demo_loop` function starts an interactive demo session.

## Overview

The support bot consists of two agents:

1. **User Interface Agent**: Handles the initial conversation with users and directs them to the help center based on what they need.
2. **Help Center Agent**: Provides detailed help and support using various tools, and retrieves documents from a LanceDB vector database.
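
The hand-off between the two agents can be illustrated without any dependencies. The sketch below is not the Swarm API and not the code in `main.py`: the function names, the keyword-based triage rule, and the in-memory `ARTICLES` list (a stand-in for the LanceDB table, scored by word overlap instead of vector search) are all assumptions made for illustration.

```python
# Dependency-free illustration of the two-agent flow; this is NOT the Swarm API.

# Tiny in-memory stand-in for the LanceDB table. The second entry is made up.
ARTICLES = [
    {"title": "Answers Transition Guide",
     "text": "The Answers endpoint is being removed; switch to embeddings-based search."},
    {"title": "Billing FAQ (made-up example)",
     "text": "How to update a payment method."},
]


def lookup_articles(query, top_k=1):
    """Rank articles by word overlap with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        ARTICLES,
        key=lambda article: len(query_words & set(article["text"].lower().split())),
        reverse=True,
    )
    return [article["title"] for article in ranked[:top_k]]


def help_center_agent(message):
    """Answer a support request by pointing at the best-matching article."""
    best_title = lookup_articles(message)[0]
    return f"Help Center: see '{best_title}'"


def user_interface_agent(message):
    """Triage the message: return the agent to hand off to, or None to stay."""
    text = message.lower()
    needs_help = "?" in text or any(word in text for word in ("help", "how", "error"))
    return help_center_agent if needs_help else None


def run_turn(message):
    """Route one user message through the User Interface Agent."""
    next_agent = user_interface_agent(message)
    if next_agent is None:
        return "User Interface: Hi! Ask a question to reach the help center."
    return next_agent(message)


print(run_turn("hello"))
print(run_turn("How do I replace the Answers endpoint?"))
```

In the real example, the triage decision would presumably be made by the LLM behind the first agent and the lookup by a LanceDB query; the sketch only shows the routing shape.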

## Setup

Before starting the bot, set your OpenAI API key as an environment variable:
```
export OPENAI_API_KEY="sk-yourapikey"
```
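
A missing key typically surfaces only when the first API call fails, so it can help to check for it up front. The following is a minimal sketch of such a pre-flight check; the helper name `require_api_key` is hypothetical and is not part of this example's scripts.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key, or raise a clear error when it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the bot.")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-yourapikey"}))
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.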

1. **Install requirements**

```bash
pip install -r requirements.txt
pip install git+ssh://git@github.com/openai/swarm.git
```

2. **Prepare and ingest the dataset into LanceDB**

Prepare the OpenAI dataset in JSON format and ingest it into a LanceDB table:
```bash
python3 dataset_prep.py
```

3. **Run the bot**

You are now ready to run the assistant bot with Swarm:
```bash
python3 main.py
```

*Note: You can swap in your own dataset and adapt the ingestion pipeline to build agents around it.*
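
When adapting the ingestion step to another dataset, the records need the same shape as the sample data shipped with this example: one JSON object per line with `text`, `title`, `article_id`, and `url` fields. The sketch below shows one way to load and validate such records before ingestion. It is an assumption about what a preparation step might look like, not the contents of `dataset_prep.py`, and the `load_articles` helper is hypothetical.

```python
import json

# Field names taken from the sample record in this example's data file.
REQUIRED_FIELDS = ("text", "title", "article_id", "url")


def load_articles(lines):
    """Parse JSON Lines records and keep only complete, non-empty articles."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if all(record.get(field) for field in REQUIRED_FIELDS):
            rows.append({field: record[field] for field in REQUIRED_FIELDS})
    return rows


sample = [
    '{"text": "Introduction ...", "title": "Answers Transition Guide", '
    '"article_id": "6233728", '
    '"url": "https://help.openai.com/en/articles/6233728-answers-transition-guide"}',
    "",
    '{"title": "Record without text"}',
]
rows = load_articles(sample)
print(rows[0]["title"])
```

Rows in this form (a list of dictionaries) can then be embedded and written to a LanceDB table by the ingestion pipeline.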

@@ -0,0 +1 @@
{"text": "Introduction\n============\n\n\n\u200bSince releasing the Answers endpoint in beta last year, we\u2019ve developed new methods that achieve better results for this task. As a result, we\u2019ll be removing the Answers endpoint from our documentation and removing access to this endpoint on December 3, 2022 for all organizations. New accounts created after June 3rd will not have access to this endpoint.\n\n\n\nWe strongly encourage developers to switch over to newer techniques which produce better results, outlined below.\n\n\n\nCurrent documentation\n---------------------\n\n\n<https://beta.openai.com/docs/guides/answers> \n\n\n<https://beta.openai.com/docs/api-reference/answers>\n\n\n\nOptions\n=======\n\n\nAs a quick review, here are the high level steps of the current Answers endpoint:\n\n\n\n\n![](https://openai.intercom-attachments-7.com/i/o/524217540/51eda23e171f33f1b9d5acff/rM6ZVI3XZ2CpxcEStmG5mFy6ATBCskmX2g3_GPmeY3FicvrWfJCuFOtzsnbkpMQe-TQ6hi5j1BV9cFo7bCDcsz8VWxFfeOnC1Gb4QNaeVYtJq4Qtg76SBOLLk-jgHUA8mWZ0QgOuV636UgcvMA)All of these options are also outlined [here](https://github.com/openai/openai-cookbook/tree/main/transition_guides_for_deprecated_API_endpoints)\n\n\n\nOption 1: Transition to Embeddings-based search (recommended)\n-------------------------------------------------------------\n\n\nWe believe that most use cases will be better served by moving the underlying search system to use a vector-based embedding search. The major reason for this is that our current system used a bigram filter to narrow down the scope of candidates whereas our embeddings system has much more contextual awareness. Also, in general, using embeddings will be considerably lower cost in the long run. 
If you\u2019re not familiar with this, you can learn more by visiting our [guide to embeddings](https://beta.openai.com/docs/guides/embeddings/use-cases).\n\n\n\nIf you\u2019re using a small dataset (<10,000 documents), consider using the techniques described in that guide to find the best documents to construct a prompt similar to [this](#h_89196129b2). Then, you can just submit that prompt to our [Completions](https://beta.openai.com/docs/api-reference/completions) endpoint.\n\n\n\nIf you have a larger dataset, consider using a vector search engine like [Pinecone](https://share.streamlit.io/pinecone-io/playground/beyond_search_openai/src/server.py) or [Weaviate](https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/text2vec-openai.html) to power that search.\n\n\n\nOption 2: Reimplement existing functionality\n--------------------------------------------\n\n\nIf you\u2019d like to recreate the functionality of the Answers endpoint, here\u2019s how we did it. There is also a [script](https://github.com/openai/openai-cookbook/blob/main/transition_guides_for_deprecated_API_endpoints/answers_functionality_example.py) that replicates most of this functionality.\n\n\n\nAt a high level, there are two main ways you can use the answers endpoint: you can source the data from an uploaded file or send it in with the request.\n\n\n\n**If you\u2019re using the document parameter**\n------------------------------------------\n\n\nThere\u2019s only one step if you provide the documents in the Answers API call.\n\n\n\nHere\u2019s roughly the steps we used: \n\n\n* Construct the prompt [with this format.](#h_89196129b2)\n* Gather all of the provided documents. 
If they fit in the prompt, just use all of them.\n* Do an [OpenAI search](https://beta.openai.com/docs/api-reference/searches) (note that this is also being deprecated and has a [transition guide](https://app.intercom.com/a/apps/dgkjq2bp/articles/articles/6272952/show)) where the documents are the user provided documents and the query is the query from above. Rank the documents by score.\n* In order of score, attempt to add Elastic search documents until you run out of space in the context.\n* Request a completion with the provided parameters (logit\\_bias, n, stop, etc)\n\n\nThroughout all of this, you\u2019ll need to check that the prompt\u2019s length doesn\u2019t exceed [the model's token limit](https://beta.openai.com/docs/engines/gpt-3). To assess the number of tokens present in a prompt, we recommend <https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast>. \n\n\n\nIf you're using the file parameter\n----------------------------------\n\n\nStep 1: upload a jsonl file\n\n\n\nBehind the scenes, we upload new files meant for answers to an Elastic search cluster. 
Each line of the jsonl is then submitted as a document.\n\n\n\nIf you uploaded the file with the purpose \u201canswers,\u201d we additionally split the documents on newlines and upload each of those chunks as separate documents to ensure that we can search across and reference the highest number of relevant text sections in the file.\n\n\n\nEach line requires a \u201ctext\u201d field and an optional \u201cmetadata\u201d field.\n\n\n\nThese are the Elastic search settings and mappings for our index:\n\n\n\n[Elastic searching mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html): \n\n\n\n```\n{ \n \"properties\": { \n \"document\": {\"type\": \"text\", \"analyzer\": \"standard_bigram_analyzer\"}, -> the \u201ctext\u201d field \n \"metadata\": {\"type\": \"object\", \"enabled\": False}, -> the \u201cmetadata\u201d field \n } \n}\n```\n\n\n[Elastic search analyzer](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html):\n\n\n\n```\n{ \n \"analysis\": { \n \"analyzer\": { \n \"standard_bigram_analyzer\": { \n \"type\": \"custom\", \n \"tokenizer\": \"standard\", \n \"filter\": [\"lowercase\", \"english_stop\", \"shingle\"], \n } \n }, \n \"filter\": {\"english_stop\": {\"type\": \"stop\", \"stopwords\": \"_english_\"}}, \n } \n}\n```\n\n\nAfter that, we performed [standard Elastic search search calls](https://elasticsearch-py.readthedocs.io/en/v8.2.0/api.html#elasticsearch.Elasticsearch.search) and used `max\\_rerank` to determine the number of documents to return from Elastic search.\n\n\n\nStep 2: Search\n\n\nHere\u2019s roughly the steps we used. Our end goal is to create a [Completions](https://beta.openai.com/docs/api-reference/completions) request [with this format](#h_89196129b2). It will look very similar to [Documents](#h_cb1d8a8d3f)\n\n\n\nFrom there, our steps are: \n\n\n* Start with the `experimental\\_alternative\\_question` or, if that's not provided, what\u2019s in the `question` field. 
Call that the query.\n* Query Elastic search for `max\\_rerank` documents with query as the search param.\n* Take those documents and do an [OpenAI search](https://beta.openai.com/docs/api-reference/searches) on them where the entries from Elastic search are the docs, and the query is the query that you used above. Use the score from the search to rank the documents.\n* In order of score, attempt to add Elastic search documents until you run out of space in the prompt.\n* Request an OpenAI completion with the provided parameters (logit\\_bias, n, stop, etc). Return that answer to the user.\n\n\nCompletion Prompt\n-----------------\n\n\n\n```\n=== \nContext: #{{ provided examples_context }} \n=== \nQ: example 1 question \nA: example 1 answer \n--- \nQ: example 2 question \nA: example 2 answer \n(and so on for all examples provided in the request) \n=== \nContext: #{{ what we return from Elasticsearch }} \n=== \nQ: #{{ user provided question }} \nA:\n```\n", "title": "Answers Transition Guide", "article_id": "6233728", "url": "https://help.openai.com/en/articles/6233728-answers-transition-guide"}