From 2b591f98726ed0d883236dd0550201b95203eebb Mon Sep 17 00:00:00 2001 From: m-newhauser <35735816+m-newhauser@users.noreply.github.com> Date: Thu, 19 Dec 2024 14:57:40 -0600 Subject: [PATCH] docs: add Weaviate RAG recipe notebook (#451) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> --- docs/examples/rag_weaviate.ipynb | 732 +++++++++++++++++++++++++++++++ mkdocs.yml | 1 + 2 files changed, 733 insertions(+) create mode 100644 docs/examples/rag_weaviate.ipynb diff --git a/docs/examples/rag_weaviate.ipynb b/docs/examples/rag_weaviate.ipynb new file mode 100644 index 00000000..7f897d9e --- /dev/null +++ b/docs/examples/rag_weaviate.ipynb @@ -0,0 +1,732 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Ag9kcX2B_atc" + }, + "source": [ + "# Performing RAG over PDFs with Weaviate and Docling\n", + "## A recipe 🧑🍳 🐥 💚\n", + "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_weaviate.ipynb)\n", + "\n", + "This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://ds4sd.github.io/docling/).\n", + "\n", + "In this notebook, we accomplish the following:\n", + "* Parse the top machine learning papers on [arXiv](https://arxiv.org/) using Docling\n", + "* Perform hierarchical chunking of the documents using Docling\n", + "* Generate text embeddings with OpenAI\n", + "* Perform RAG using [Weaviate](https://weaviate.io/developers/weaviate/search/generative)\n", + "\n", + "To run this notebook, you'll need:\n", + "* An [OpenAI API key](https://platform.openai.com/docs/quickstart)\n", + "* Access to a GPU\n", + "\n", + "Note: For best results, please use **GPU acceleration** to run this notebook. Here are two options:\n", + "1. **Locally on a MacBook with an Apple Silicon chip.** Converting all documents in the notebook takes ~2 minutes on a MacBook M2 thanks to Docling's use of the MPS accelerator.\n", + "2. **On Google Colab.** Converting all documents in the notebook takes ~8 minutes on a Google Colab T4 GPU." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4YgT7tpXCUl0" + }, + "source": [ + "### Install Docling and Weaviate client\n", + "\n", + "Note: If Colab prompts you to restart the session after running the cell below, click \"restart\" and proceed with running the rest of the notebook." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": true, + "id": "u076oUSF_YUG" + }, + "outputs": [], + "source": [ + "%%capture\n", + "%pip install docling~=\"2.7.0\"\n", + "%pip install -U weaviate-client~=\"4.9.4\"\n", + "%pip install rich\n", + "%pip install torch\n", + "\n", + "import warnings\n", + "\n", + "warnings.filterwarnings(\"ignore\")\n", + "\n", + "import logging\n", + "\n", + "# Suppress Weaviate client logs\n", + "logging.getLogger(\"weaviate\").setLevel(logging.ERROR)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2q2F9RUmR8Wj" + }, + "source": [ + "## 🐥 Part 1: Docling\n", + "\n", + "Part of what makes Docling so remarkable is the fact that it can run on commodity hardware. This means that this notebook can be run on a local machine with GPU acceleration. If you're using a MacBook with an Apple Silicon chip, Docling integrates seamlessly with Metal Performance Shaders (MPS).
MPS provides out-of-the-box GPU acceleration for macOS, integrating seamlessly with PyTorch and TensorFlow, delivering energy-efficient performance on Apple Silicon, and offering broad compatibility with all Metal-supported GPUs.\n", + "\n", + "The code below checks whether a GPU is available, either via CUDA or MPS." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MPS GPU is enabled.\n" + ] + } + ], + "source": [ + "import torch\n", + "\n", + "# Check if GPU or MPS is available\n", + "if torch.cuda.is_available():\n", + " device = torch.device(\"cuda\")\n", + " print(f\"CUDA GPU is enabled: {torch.cuda.get_device_name(0)}\")\n", + "elif torch.backends.mps.is_available():\n", + " device = torch.device(\"mps\")\n", + " print(\"MPS GPU is enabled.\")\n", + "else:\n", + " raise EnvironmentError(\n", + " \"No GPU or MPS device found. Please check your environment and ensure GPU or MPS support is configured.\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wHTsy4a8JFPl" + }, + "source": [ + "Here, we've collected 10 influential machine learning papers published as PDFs on arXiv. Because Docling does not yet have title extraction for PDFs, we manually add the titles in a corresponding list.\n", + "\n", + "Note: Converting all 10 papers should take around 8 minutes with a T4 GPU." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "Vy5SMPiGDMy-" + }, + "outputs": [], + "source": [ + "# Influential machine learning papers\n", + "source_urls = [\n", + " \"https://arxiv.org/pdf/1706.03762\",\n", + " \"https://arxiv.org/pdf/1810.04805\",\n", + " \"https://arxiv.org/pdf/1406.2661\",\n", + " \"https://arxiv.org/pdf/1409.0473\",\n", + " \"https://arxiv.org/pdf/1412.6980\",\n", + " \"https://arxiv.org/pdf/1312.6114\",\n", + " \"https://arxiv.org/pdf/1312.5602\",\n", + " \"https://arxiv.org/pdf/1512.03385\",\n", + " \"https://arxiv.org/pdf/1409.3215\",\n", + " \"https://arxiv.org/pdf/1301.3781\",\n", + "]\n", + "\n", + "# And their corresponding titles (because Docling doesn't have title extraction yet!)\n", + "source_titles = [\n", + " \"Attention Is All You Need\",\n", + " \"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding\",\n", + " \"Generative Adversarial Nets\",\n", + " \"Neural Machine Translation by Jointly Learning to Align and Translate\",\n", + " \"Adam: A Method for Stochastic Optimization\",\n", + " \"Auto-Encoding Variational Bayes\",\n", + " \"Playing Atari with Deep Reinforcement Learning\",\n", + " \"Deep Residual Learning for Image Recognition\",\n", + " \"Sequence to Sequence Learning with Neural Networks\",\n", + " \"Efficient Estimation of Word Representations in Vector Space\",\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5fi8wzHrCoLa" + }, + "source": [ + "### Convert PDFs to Docling documents\n", + "\n", + "Here we use Docling's `.convert_all()` to parse a batch of PDFs. The result is a list of Docling documents that we can use for text extraction.\n", + "\n", + "Note: Please ignore the `ERR#` message; it refers to a single table that could not be fully parsed and does not affect the rest of the conversion."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 67, + "referenced_widgets": [ + "6d049f786a2f4ad7857a6cf2d95b5ba2", + "db2a7b9f549e4f0fb1ff3fce655d76a2", + "630967a2db4c4714b4c15d1358a0fcae", + "b3da9595ab7c4995a00e506e7b5202e3", + "243ecaf36ee24cafbd1c33d148f2ca78", + "5b7e22df1b464ca894126736e6f72207", + "02f6af5993bb4a6a9dbca77952f675d2", + "dea323b3de0e43118f338842c94ac065", + "bd198d2c0c4c4933a6e6544908d0d846", + "febd5c498e4f4f5dbde8dec3cd935502", + "ab4f282c0d37451092c60e6566e8e945" + ] + }, + "id": "Sr44xGR1PNSc", + "outputId": "b5cca9ee-d7c0-4c8f-c18a-0ac4787984e9" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 84072.91it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ERR#: COULD NOT CONVERT TO RS THIS TABLE TO COMPUTE SPANS\n" + ] + } + ], + "source": [ + "from docling.document_converter import DocumentConverter\n", + "\n", + "# Instantiate the doc converter\n", + "doc_converter = DocumentConverter()\n", + "\n", + "# Directly pass the list of file URLs or streams to `convert_all`\n", + "conv_results_iter = doc_converter.convert_all(source_urls)\n", + "\n", + "# Iterate over the generator to get a list of Docling documents\n", + "docs = [result.document for result in conv_results_iter]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xHun_P-OCtKd" + }, + "source": [ + "### Post-process extracted document data\n", + "#### Perform hierarchical chunking on documents\n", + "\n", + "We use Docling's `HierarchicalChunker()` to perform hierarchy-aware chunking of our list of documents. This is meant to preserve some of the structure and relationships within the document, which enables more accurate and relevant retrieval in our RAG pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "L17ju9xibuIo" + }, + "outputs": [], + "source": [ + "from docling_core.transforms.chunker import HierarchicalChunker\n", + "\n", + "# Initialize lists for texts and titles\n", + "texts, titles = [], []\n", + "\n", + "chunker = HierarchicalChunker()\n", + "\n", + "# Process each document in the list\n", + "for doc, title in zip(docs, source_titles): # Pair each document with its title\n", + " chunks = list(\n", + " chunker.chunk(doc)\n", + " ) # Perform hierarchical chunking and get text from chunks\n", + " for chunk in chunks:\n", + " texts.append(chunk.text)\n", + " titles.append(title)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "khbU9R1li2Kj" + }, + "source": [ + "Because we're splitting the documents into chunks, we'll concatenate the article title to the beginning of each chunk for additional context."
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "HNwYV9P57OwF" + }, + "outputs": [], + "source": [ + "# Concatenate title and text\n", + "for i in range(len(texts)):\n", + " texts[i] = f\"{titles[i]} {texts[i]}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uhLlCpQODaT3" + }, + "source": [ + "## 💚 Part 2: Weaviate\n", + "### Create and configure an embedded Weaviate collection" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ho7xYQTZK5Wk" + }, + "source": [ + "We'll be using the OpenAI API both to generate the text embeddings and as the generative model in our RAG pipeline. The code below fetches your API key based on whether you're running this notebook in Google Colab or as a regular Jupyter notebook. All you need to do is set `openai_api_key_var` to the name of your environment variable or Colab secret that holds the API key.\n", + "\n", + "If you're running this notebook in Google Colab, make sure you [add](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75) your API key as a secret." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "PD53jOT4roj2" + }, + "outputs": [], + "source": [ + "# OpenAI API key variable name\n", + "openai_api_key_var = \"OPENAI_API_KEY\" # Replace with the name of your secret/env var\n", + "\n", + "# Fetch OpenAI API key\n", + "try:\n", + " # If running in Colab, fetch API key from Secrets\n", + " import google.colab\n", + " from google.colab import userdata\n", + "\n", + " openai_api_key = userdata.get(openai_api_key_var)\n", + " if not openai_api_key:\n", + " raise ValueError(f\"Secret '{openai_api_key_var}' not found in Colab secrets.\")\n", + "except ImportError:\n", + " # If not running in Colab, fetch API key from environment variable\n", + " import os\n", + "\n", + " openai_api_key = os.getenv(openai_api_key_var)\n", + " if not openai_api_key:\n", + " raise EnvironmentError(\n", + " f\"Environment variable '{openai_api_key_var}' is not set. \"\n", + " \"Please define it before running this script.\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8G5jZSh6ti3e" + }, + "source": [ + "[Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded) allows you to spin up a Weaviate instance directly from your application code, without having to use a Docker container. If you're interested in other deployment methods, like using Docker Compose or Kubernetes, check out this [page](https://weaviate.io/developers/weaviate/installation) in the Weaviate docs."
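, + "\n", + "For instance, swapping Embedded Weaviate for a managed Weaviate Cloud cluster would only change the connection call. Here's a minimal sketch, assuming hypothetical `WEAVIATE_URL` and `WEAVIATE_API_KEY` environment variables for your cluster:\n", + "\n", + "```python\n", + "import os\n", + "\n", + "import weaviate\n", + "from weaviate.classes.init import Auth\n", + "\n", + "# Sketch only: connect to a managed cluster instead of the embedded instance below\n", + "client = weaviate.connect_to_weaviate_cloud(\n", + " cluster_url=os.environ[\"WEAVIATE_URL\"],\n", + " auth_credentials=Auth.api_key(os.environ[\"WEAVIATE_API_KEY\"]),\n", + " headers={\"X-OpenAI-Api-Key\": openai_api_key},\n", + ")\n", + "```"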
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hFUBEZiJUMic", + "outputId": "0b6534c9-66c9-4a47-9754-103bcc030019" + }, + "outputs": [], + "source": [ + "import weaviate\n", + "\n", + "# Connect to Weaviate embedded\n", + "client = weaviate.connect_to_embedded(headers={\"X-OpenAI-Api-Key\": openai_api_key})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4nu9qM75hrsd" + }, + "outputs": [], + "source": [ + "import weaviate.classes.config as wc\n", + "\n", + "# Define the collection name\n", + "collection_name = \"docling\"\n", + "\n", + "# Delete the collection if it already exists\n", + "if client.collections.exists(collection_name):\n", + " client.collections.delete(collection_name)\n", + "\n", + "# Create the collection\n", + "collection = client.collections.create(\n", + " name=collection_name,\n", + " vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(\n", + " model=\"text-embedding-3-large\", # Specify your embedding model here\n", + " ),\n", + " # Enable the OpenAI generative model for RAG\n", + " generative_config=wc.Configure.Generative.openai(\n", + " model=\"gpt-4o\" # Specify your generative model for RAG here\n", + " ),\n", + " # Define the metadata properties\n", + " properties=[\n", + " wc.Property(name=\"text\", data_type=wc.DataType.TEXT),\n", + " wc.Property(name=\"title\", data_type=wc.DataType.TEXT, skip_vectorization=True),\n", + " ],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RgMcZDB9Dzfs" + }, + "source": [ + "### Wrangle data into an acceptable format for Weaviate\n", + "\n", + "Transform our data from lists into a list of dictionaries for insertion into our Weaviate collection." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "kttDgwZEsIJQ" + }, + "outputs": [], + "source": [ + "# Initialize the data object\n", + "data = []\n", + "\n", + "# Create a dictionary for each row by iterating through the corresponding lists\n", + "for text, title in zip(texts, titles):\n", + " data_point = {\n", + " \"text\": text,\n", + " \"title\": title,\n", + " }\n", + " data.append(data_point)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-4amqRaoD5g0" + }, + "source": [ + "### Insert data into Weaviate and generate embeddings\n", + "\n", + "Embeddings will be generated upon insertion into our Weaviate collection." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "g8VCYnhbaxcz", + "outputId": "cc900e56-9fb6-4d4e-ab18-ebd12b1f4201" + }, + "outputs": [], + "source": [ + "# Insert text chunks and metadata into vector DB collection\n", + "response = collection.data.insert_many(data)\n", + "\n", + "if response.has_errors:\n", + " print(response.errors)\n", + "else:\n", + " print(\"Insert complete.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KI01PxjuD_XR" + }, + "source": [ + "### Query the data\n", + "\n", + "Here, we perform a simple similarity search to return the embedded chunks that are most similar to our search query."
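, + "\n", + "Weaviate also supports keyword (BM25) and hybrid search out of the box. As a rough sketch, a hybrid variant of the query below could look like this, where `alpha` balances keyword and vector scores:\n", + "\n", + "```python\n", + "# Sketch only: hybrid search combines BM25 and vector similarity\n", + "response = collection.query.hybrid(\n", + " query=\"bert\",\n", + " alpha=0.5, # 0 = pure keyword search, 1 = pure vector search\n", + " limit=2,\n", + ")\n", + "```"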
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "zbz6nWJc5CSj", + "outputId": "16aced21-4496-4c91-cc12-d5c9ac983351" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'text': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding A distinctive feature of BERT is its unified architecture across different tasks. There is mini-', 'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'}\n", + "0.6578550338745117\n", + "{'text': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding We introduce a new language representation model called BERT , which stands for B idirectional E ncoder R epresentations from T ransformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.', 'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'}\n", + "0.6696287989616394\n" + ] + } + ], + "source": [ + "from weaviate.classes.query import MetadataQuery\n", + "\n", + "response = collection.query.near_text(\n", + " query=\"bert\",\n", + " limit=2,\n", + " return_metadata=MetadataQuery(distance=True),\n", + " return_properties=[\"text\", \"title\"],\n", + ")\n", + "\n", + "for o in response.objects:\n", + " print(o.properties)\n", + " print(o.metadata.distance)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "elo32iMnEC18" + }, + "source": [ + "### Perform RAG on parsed articles\n", + "\n", + "Weaviate's `generate` module allows you to perform RAG over your embedded data without having to use a separate framework.\n", + "\n", + "We specify a prompt that includes the field we want to search through in the database (in this case it's `text`), a query that includes our search term, and the number of retrieved results to use in the generation." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 233 + }, + "id": "7r2LMSX9bO4y", + "outputId": "84639adf-7783-4d43-94d9-711fb313a168" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
╭──────────────────────────────────────────────────── Prompt ─────────────────────────────────────────────────────╮\n", + "│ Explain how bert works, using only the retrieved context. │\n", + "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n", + "\n" + ], + "text/plain": [ + "\u001b[1;31m╭─\u001b[0m\u001b[1;31m───────────────────────────────────────────────────\u001b[0m\u001b[1;31m Prompt \u001b[0m\u001b[1;31m────────────────────────────────────────────────────\u001b[0m\u001b[1;31m─╮\u001b[0m\n", + "\u001b[1;31m│\u001b[0m Explain how bert works, using only the retrieved context. \u001b[1;31m│\u001b[0m\n", + "\u001b[1;31m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
╭─────────────────────────────────────────────── Generated Content ───────────────────────────────────────────────╮\n", + "│ BERT, which stands for Bidirectional Encoder Representations from Transformers, is a language representation │\n", + "│ model designed to pretrain deep bidirectional representations from unlabeled text. It conditions on both left │\n", + "│ and right context in all layers, unlike traditional left-to-right or right-to-left language models. This │\n", + "│ pre-training involves two unsupervised tasks. The pre-trained BERT model can then be fine-tuned with just one │\n", + "│ additional output layer to create state-of-the-art models for various tasks, such as question answering and │\n", + "│ language inference, without needing substantial task-specific architecture modifications. A distinctive feature │\n", + "│ of BERT is its unified architecture across different tasks. │\n", + "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n", + "\n" + ], + "text/plain": [ + "\u001b[1;32m╭─\u001b[0m\u001b[1;32m──────────────────────────────────────────────\u001b[0m\u001b[1;32m Generated Content \u001b[0m\u001b[1;32m──────────────────────────────────────────────\u001b[0m\u001b[1;32m─╮\u001b[0m\n", + "\u001b[1;32m│\u001b[0m BERT, which stands for Bidirectional Encoder Representations from Transformers, is a language representation \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m model designed to pretrain deep bidirectional representations from unlabeled text. It conditions on both left \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m and right context in all layers, unlike traditional left-to-right or right-to-left language models. This \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m pre-training involves two unsupervised tasks. The pre-trained BERT model can then be fine-tuned with just one \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m additional output layer to create state-of-the-art models for various tasks, such as question answering and \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m language inference, without needing substantial task-specific architecture modifications. A distinctive feature \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m of BERT is its unified architecture across different tasks. 
\u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from rich.console import Console\n", + "from rich.panel import Panel\n", + "\n", + "# Create a prompt where context from the Weaviate collection will be injected\n", + "prompt = \"Explain how {text} works, using only the retrieved context.\"\n", + "query = \"bert\"\n", + "\n", + "response = collection.generate.near_text(\n", + " query=query, limit=3, grouped_task=prompt, return_properties=[\"text\", \"title\"]\n", + ")\n", + "\n", + "# Prettify the output using Rich\n", + "console = Console()\n", + "\n", + "console.print(\n", + " Panel(f\"{prompt}\".replace(\"{text}\", query), title=\"Prompt\", border_style=\"bold red\")\n", + ")\n", + "console.print(\n", + " Panel(response.generated, title=\"Generated Content\", border_style=\"bold green\")\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 233 + }, + "id": "Dtju3oCiDOdD", + "outputId": "2f0f0cf8-0305-40cc-8409-07036c101938" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
╭──────────────────────────────────────────────────── Prompt ─────────────────────────────────────────────────────╮\n", + "│ Explain how a generative adversarial net works, using only the retrieved context. │\n", + "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n", + "\n" + ], + "text/plain": [ + "\u001b[1;31m╭─\u001b[0m\u001b[1;31m───────────────────────────────────────────────────\u001b[0m\u001b[1;31m Prompt \u001b[0m\u001b[1;31m────────────────────────────────────────────────────\u001b[0m\u001b[1;31m─╮\u001b[0m\n", + "\u001b[1;31m│\u001b[0m Explain how a generative adversarial net works, using only the retrieved context. \u001b[1;31m│\u001b[0m\n", + "\u001b[1;31m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
╭─────────────────────────────────────────────── Generated Content ───────────────────────────────────────────────╮\n", + "│ Generative Adversarial Nets (GANs) operate within an adversarial framework where two models are trained │\n", + "│ simultaneously: a generative model (G) and a discriminative model (D). The generative model aims to capture the │\n", + "│ data distribution and generate samples that mimic real data, while the discriminative model's task is to │\n", + "│ distinguish between samples from the real data and those generated by G. This setup is akin to a game where the │\n", + "│ generative model acts like counterfeiters trying to produce indistinguishable fake currency, and the │\n", + "│ discriminative model acts like the police trying to detect these counterfeits. │\n", + "│ │\n", + "│ The training process involves a minimax two-player game where G tries to maximize the probability of D making a │\n", + "│ mistake, while D tries to minimize it. When both models are defined by multilayer perceptrons, they can be │\n", + "│ trained using backpropagation without the need for Markov chains or approximate inference networks. The │\n", + "│ ultimate goal is for G to perfectly replicate the training data distribution, making D's output equal to 1/2 │\n", + "│ everywhere, indicating it cannot distinguish between real and generated data. This framework allows for │\n", + "│ specific training algorithms and optimization techniques, such as backpropagation and dropout, to be │\n", + "│ effectively utilized. │\n", + "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n", + "\n" + ], + "text/plain": [ + "\u001b[1;32m╭─\u001b[0m\u001b[1;32m──────────────────────────────────────────────\u001b[0m\u001b[1;32m Generated Content \u001b[0m\u001b[1;32m──────────────────────────────────────────────\u001b[0m\u001b[1;32m─╮\u001b[0m\n", + "\u001b[1;32m│\u001b[0m Generative Adversarial Nets (GANs) operate within an adversarial framework where two models are trained \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m simultaneously: a generative model (G) and a discriminative model (D). The generative model aims to capture the \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m data distribution and generate samples that mimic real data, while the discriminative model's task is to \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m distinguish between samples from the real data and those generated by G. This setup is akin to a game where the \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m generative model acts like counterfeiters trying to produce indistinguishable fake currency, and the \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m discriminative model acts like the police trying to detect these counterfeits. \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m The training process involves a minimax two-player game where G tries to maximize the probability of D making a \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m mistake, while D tries to minimize it. When both models are defined by multilayer perceptrons, they can be \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m trained using backpropagation without the need for Markov chains or approximate inference networks. 
The \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m ultimate goal is for G to perfectly replicate the training data distribution, making D's output equal to 1/2 \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m everywhere, indicating it cannot distinguish between real and generated data. This framework allows for \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m specific training algorithms and optimization techniques, such as backpropagation and dropout, to be \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m│\u001b[0m effectively utilized. \u001b[1;32m│\u001b[0m\n", + "\u001b[1;32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create a prompt where context from the Weaviate collection will be injected\n", + "prompt = \"Explain how {text} works, using only the retrieved context.\"\n", + "query = \"a generative adversarial net\"\n", + "\n", + "response = collection.generate.near_text(\n", + " query=query, limit=3, grouped_task=prompt, return_properties=[\"text\", \"title\"]\n", + ")\n", + "\n", + "# Prettify the output using Rich\n", + "console = Console()\n", + "\n", + "console.print(\n", + " Panel(f\"{prompt}\".replace(\"{text}\", query), title=\"Prompt\", border_style=\"bold red\")\n", + ")\n", + "console.print(\n", + " Panel(response.generated, title=\"Generated Content\", border_style=\"bold green\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7tGz49nfUegG" + }, + "source": [ + "We can see that our RAG pipeline performs relatively well for simple queries, especially given the small size of the dataset. Scaling this method for converting a larger sample of PDFs would require more compute (GPUs) and a more advanced deployment of Weaviate (like Docker, Kubernetes, or Weaviate Cloud). For more information on available Weaviate configurations, check out the [documentation](https://weaviate.io/developers/weaviate/starter-guides/which-weaviate)." + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.10" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/mkdocs.yml b/mkdocs.yml index b673ab41..0428693c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -82,6 +82,7 @@ nav: - "RAG with Haystack": examples/rag_haystack.ipynb - "RAG with LlamaIndex 🦙": examples/rag_llamaindex.ipynb - "RAG with LangChain 🦜🔗": examples/rag_langchain.ipynb + - "RAG with Weaviate": examples/rag_weaviate.ipynb - "Hybrid RAG with Qdrant": examples/hybrid_rag_qdrant.ipynb - Integrations: - Integrations: integrations/index.md