templates/rag-chroma-multi-modal-multi-vector/.gitignore
@@ -0,0 +1 @@
docs/img_*.jpg
templates/rag-chroma-multi-modal-multi-vector/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 LangChain, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
templates/rag-chroma-multi-modal-multi-vector/README.md
@@ -0,0 +1,106 @@
# rag-chroma-multi-modal-multi-vector

Presentations (slide decks, etc.) contain visual content that challenges conventional RAG.

Multi-modal LLMs unlock new ways to build apps over visual content like presentations.

This template performs multi-modal RAG using Chroma with the multi-vector retriever (see [blog](https://blog.langchain.dev/multi-modal-rag-template/)):

* Extract the slides as images
* Use GPT-4V to summarize each image
* Embed the image summaries with a link to the original images
* Retrieve relevant images based on similarity between the image summaries and the user input
* Finally, pass those images to GPT-4V for answer synthesis
## Storage

We will use Upstash to store the images.

Simply login [here](https://upstash.com/) and create a database.

This will give you:

* UPSTASH_URL
* UPSTASH_TOKEN

Set these as environment variables so that `ingest.py` can connect to Upstash, as shown below.
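For example, in your shell (the placeholder values come from your Upstash console):

```shell
export UPSTASH_URL=<your-upstash-url>
export UPSTASH_TOKEN=<your-upstash-token>
```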
We will use Chroma to store and index the image summaries, which will be created locally in the template directory.
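A serving chain can later reload this local index. A minimal sketch, assuming the same collection name, persist directory, and embedding function that `ingest.py` uses:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Reload the persisted image-summary index created by ingest.py
vectorstore = Chroma(
    collection_name="image_summaries",
    persist_directory="chroma_db_multi_modal",
    embedding_function=OpenAIEmbeddings(),
)
```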
## Input

Supply a slide deck as a PDF in the `/docs` directory.

Create your vectorstore (Chroma) and populate Upstash with:

```shell
poetry install
python ingest.py
```
## LLM

The app will retrieve images based on similarity between the user input and the embedded image summaries, and pass the retrieved images to GPT-4V for answer synthesis.
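The serving chain itself is not part of this commit. As a rough sketch of that flow, assuming the retriever built by `ingest.py` below and a hypothetical `answer_about_slides` helper:

```python
from langchain.chat_models import ChatOpenAI
from langchain.schema.messages import HumanMessage


def answer_about_slides(retriever, question: str) -> str:
    # The retriever matches the question against the embedded image
    # summaries, but returns the raw base64-encoded slides from Upstash
    docs = retriever.get_relevant_documents(question)
    img_base64 = docs[0].page_content

    # Pass the retrieved slide image plus the question to GPT-4V
    chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=1024)
    msg = chat.invoke(
        [
            HumanMessage(
                content=[
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"},
                    },
                ]
            )
        ]
    )
    return msg.content
```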
## Environment Setup

Set the `OPENAI_API_KEY` environment variable to access the OpenAI GPT-4V model.
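For example:

```shell
export OPENAI_API_KEY=<your-openai-api-key>
```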
## Usage

To use this package, you should first have the LangChain CLI installed:

```shell
pip install -U langchain-cli
```

To create a new LangChain project and install this as the only package, you can do:

```shell
langchain app new my-app --package rag-chroma-multi-modal-multi-vector
```

If you want to add this to an existing project, you can just run:

```shell
langchain app add rag-chroma-multi-modal-multi-vector
```

And add the following code to your `server.py` file:

```python
from rag_chroma_multi_modal_multi_vector import chain as rag_chroma_multi_modal_chain_mv

add_routes(app, rag_chroma_multi_modal_chain_mv, path="/rag-chroma-multi-modal-multi-vector")
```

(Optional) Let's now configure LangSmith.
LangSmith will help us trace, monitor and debug LangChain applications.
LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
If you don't have access, you can skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```

If you are inside this directory, then you can spin up a LangServe instance directly by running:

```shell
langchain serve
```

This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)

We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
We can access the playground at [http://127.0.0.1:8000/rag-chroma-multi-modal-multi-vector/playground](http://127.0.0.1:8000/rag-chroma-multi-modal-multi-vector/playground)

We can access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/rag-chroma-multi-modal-multi-vector")
```
templates/rag-chroma-multi-modal-multi-vector/ingest.py
@@ -0,0 +1,207 @@
import base64
import os
import uuid
from io import BytesIO
from pathlib import Path

import pypdfium2 as pdfium
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema.document import Document
from langchain.schema.messages import HumanMessage
from langchain.storage import UpstashRedisByteStore
from langchain.vectorstores import Chroma
from PIL import Image


def image_summarize(img_base64, prompt):
    """
    Make image summary
    :param img_base64: Base64 encoded string for image
    :param prompt: Text prompt for summarization
    :return: Image summary
    """
    chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=1024)

    msg = chat.invoke(
        [
            HumanMessage(
                content=[
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"},
                    },
                ]
            )
        ]
    )
    return msg.content


def generate_img_summaries(img_base64_list):
    """
    Generate summaries for images
    :param img_base64_list: Base64 encoded images
    :return: List of image summaries and processed images
    """

    # Store image summaries
    image_summaries = []
    processed_images = []

    # Prompt
    prompt = """You are an assistant tasked with summarizing images for retrieval. \
    These summaries will be embedded and used to retrieve the raw image. \
    Give a concise summary of the image that is well optimized for retrieval."""

    # Apply summarization to images, skipping any image the API rejects
    for i, base64_image in enumerate(img_base64_list):
        try:
            image_summaries.append(image_summarize(base64_image, prompt))
            processed_images.append(base64_image)
        except Exception:
            print(f"BadRequestError with image {i+1}")

    return image_summaries, processed_images


def get_images_from_pdf(pdf_path, img_dump_path):
    """
    Extract images from each page of a PDF document and save as JPEG files.
    :param pdf_path: A string representing the path to the PDF file.
    :param img_dump_path: A string representing the path to dump images.
    """
    pdf = pdfium.PdfDocument(pdf_path)
    n_pages = len(pdf)
    pil_images = []
    for page_number in range(n_pages):
        page = pdf.get_page(page_number)
        bitmap = page.render(scale=1, rotation=0, crop=(0, 0, 0, 0))
        pil_image = bitmap.to_pil()
        pil_image.save(f"{img_dump_path}/img_{page_number + 1}.jpg", format="JPEG")
        pil_images.append(pil_image)
    return pil_images


def resize_base64_image(base64_string, size=(128, 128)):
    """
    Resize an image encoded as a Base64 string
    :param base64_string: Base64 string
    :param size: Image size
    :return: Re-sized Base64 string
    """
    # Decode the Base64 string
    img_data = base64.b64decode(base64_string)
    img = Image.open(BytesIO(img_data))

    # Resize the image
    resized_img = img.resize(size, Image.LANCZOS)

    # Save the resized image to a bytes buffer
    buffered = BytesIO()
    resized_img.save(buffered, format=img.format)

    # Encode the resized image to Base64
    return base64.b64encode(buffered.getvalue()).decode("utf-8")


def convert_to_base64(pil_image):
    """
    Convert PIL images to Base64 encoded strings
    :param pil_image: PIL image
    :return: Re-sized Base64 string
    """
    buffered = BytesIO()
    pil_image.save(buffered, format="JPEG")  # You can change the format if needed
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    img_str = resize_base64_image(img_str, size=(960, 540))
    return img_str


def create_multi_vector_retriever(vectorstore, image_summaries, images):
    """
    Create retriever that indexes summaries, but returns raw images or texts
    :param vectorstore: Vectorstore to store embedded image summaries
    :param image_summaries: Image summaries
    :param images: Base64 encoded images
    :return: Retriever
    """

    # Initialize the storage layer for images, using the Upstash
    # credentials from the environment (see the README's Storage section)
    UPSTASH_URL = os.environ["UPSTASH_URL"]
    UPSTASH_TOKEN = os.environ["UPSTASH_TOKEN"]
    store = UpstashRedisByteStore(url=UPSTASH_URL, token=UPSTASH_TOKEN)
    id_key = "doc_id"

    # Create the multi-vector retriever
    retriever = MultiVectorRetriever(
        vectorstore=vectorstore,
        byte_store=store,
        id_key=id_key,
    )

    # Helper function to add documents to the vectorstore and docstore
    def add_documents(retriever, doc_summaries, doc_contents):
        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
        summary_docs = [
            Document(page_content=s, metadata={id_key: doc_ids[i]})
            for i, s in enumerate(doc_summaries)
        ]
        retriever.vectorstore.add_documents(summary_docs)
        retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

    add_documents(retriever, image_summaries, images)

    return retriever


# Load PDF
doc_path = Path(__file__).parent / "docs/DDOG_Q3_earnings_deck.pdf"
img_dump_path = Path(__file__).parent / "docs/"
rel_doc_path = doc_path.relative_to(Path.cwd())
rel_img_dump_path = img_dump_path.relative_to(Path.cwd())
print("Extract slides as images")
pil_images = get_images_from_pdf(rel_doc_path, rel_img_dump_path)

# Convert to b64
images_base_64 = [convert_to_base64(i) for i in pil_images]

# Image summaries
print("Generate image summaries")
image_summaries, images_base_64_processed = generate_img_summaries(images_base_64)

# The vectorstore to use to index the image summaries
vectorstore_mvr = Chroma(
    collection_name="image_summaries",
    persist_directory=str(Path(__file__).parent / "chroma_db_multi_modal"),
    embedding_function=OpenAIEmbeddings(),
)

# Create documents from the base64-encoded images
images_base_64_processed_documents = [
    Document(page_content=i) for i in images_base_64_processed
]

# Create retriever
retriever_multi_vector_img = create_multi_vector_retriever(
    vectorstore_mvr,
    image_summaries,
    images_base_64_processed_documents,
)