Skip to content
forked from Cinnamon/kotaemon

An open-source RAG-based tool for chatting with your documents.

License

Notifications You must be signed in to change notification settings

rndaorg/kotaemon

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kotaemon

An open-source clean & customizable RAG UI for chatting with your documents. Built with both end users and developers in mind.

Preview

Live Demo | Source Code

User Guide | Developer Guide | Feedback

Python 3.10+ Code style: black docker pull ghcr.io/cinnamon/kotaemon:latest download Featured|HelloGitHub

Cinnamon%2Fkotaemon | Trendshift

Introduction

This project serves as a functional RAG UI for both end users who want to do QA on their documents and developers who want to build their own RAG pipeline.

  • For end users:
    • A clean & minimalistic UI for RAG-based QA.
    • Supports LLM API providers (OpenAI, AzureOpenAI, Cohere, etc) and local LLMs (via ollama and llama-cpp-python).
    • Easy installation scripts.
  • For developers:
    • A framework for building your own RAG-based document QA pipeline.
    • Customize and see your RAG pipeline in action with the provided UI (built with Gradio ).
    • If you use Gradio for development, check out our theme here: kotaemon-gradio-theme.
+----------------------------------------------------------------------------+
| End users: Those who use apps built with `kotaemon`.                       |
| (You use an app like the one in the demo above)                            |
|     +----------------------------------------------------------------+     |
|     | Developers: Those who built with `kotaemon`.                   |     |
|     | (You have `import kotaemon` somewhere in your project)         |     |
|     |     +----------------------------------------------------+     |     |
|     |     | Contributors: Those who make `kotaemon` better.    |     |     |
|     |     | (You make PR to this repo)                         |     |     |
|     |     +----------------------------------------------------+     |     |
|     +----------------------------------------------------------------+     |
+----------------------------------------------------------------------------+

This repository is under active development. Feedback, issues, and PRs are highly appreciated.

Key Features

  • Host your own document QA (RAG) web-UI. Support multi-user login, organize your files in private / public collections, collaborate and share your favorite chat with others.

  • Organize your LLM & Embedding models. Support both local LLMs & popular API providers (OpenAI, Azure, Ollama, Groq).

  • Hybrid RAG pipeline. Sane default RAG pipeline with hybrid (full-text & vector) retriever + re-ranking to ensure best retrieval quality.

  • Multi-modal QA support. Perform Question Answering on multiple documents with figures & tables support. Support multi-modal document parsing (selectable options on UI).

  • Advance citations with document preview. By default the system will provide detailed citations to ensure the correctness of LLM answers. View your citations (incl. relevant score) directly in the in-browser PDF viewer with highlights. Warning when retrieval pipeline return low relevant articles.

  • Support complex reasoning methods. Use question decomposition to answer your complex / multi-hop question. Support agent-based reasoning with ReAct, ReWOO and other agents.

  • Configurable settings UI. You can adjust most important aspects of retrieval & generation process on the UI (incl. prompts).

  • Extensible. Being built on Gradio, you are free to customize / add any UI elements as you like. Also, we aim to support multiple strategies for document indexing & retrieval. GraphRAG indexing pipeline is provided as an example.

Preview

Installation

For end users

This document is intended for developers. If you just want to install and use the app as it is, please follow the non-technical User Guide. Use the most recent release .zip to include latest features and bug-fixes.

For developers

System requirements

  1. Python >=3.10
  2. (optional) Docker
If you would like to process files other than .pdf, .html, .mhtml, and .xlsx documents

You will need to install the system dependencies of unstructured. The installations vary by operating system, so please go to the link and follow the instructions there.

With Docker (recommended)

We support both lite & full version of Docker images. With full, the extra packages of unstructured will be installed as well, it can support additional file types (.doc, .docx, ...) but the cost is larger docker image size. For most users, the lite image should work well in most cases.

  • To use the lite version.
docker run \
-e GRADIO_SERVER_NAME=0.0.0.0 \
-e GRADIO_SERVER_PORT=7860 \
-p 7860:7860 -it --rm \
ghcr.io/cinnamon/kotaemon:main-lite
  • To use the full version.
docker run \
-e GRADIO_SERVER_NAME=0.0.0.0 \
-e GRADIO_SERVER_PORT=7860 \
-p 7860:7860 -it --rm \
ghcr.io/cinnamon/kotaemon:main-full

Currently, two platforms: linux/amd64 and linux/arm64 (for newer Mac) are provided & tested. User can specify the platform by passing --platform in the docker run command. For example:

# To run docker with platform linux/arm64
docker run \
-e GRADIO_SERVER_NAME=0.0.0.0 \
-e GRADIO_SERVER_PORT=7860 \
-p 7860:7860 -it --rm \
--platform linux/arm64 \
ghcr.io/cinnamon/kotaemon:main-lite

If everything is set up fine, navigate to http://localhost:7860/ to access the web UI.

We use GHCR to store docker images, all images can be found here.

Without Docker

  • Clone and install required packages on a fresh python environment.
# optional (setup env)
conda create -n kotaemon python=3.10
conda activate kotaemon

# clone this repo
git clone https://github.com/Cinnamon/kotaemon
cd kotaemon

pip install -e "libs/kotaemon[all]"
pip install -e "libs/ktem"
  • Create a .env file in the root of this project. Use .env.example as a template

The .env file is there to serve use cases where users want to pre-config the models before starting up the app (e.g. deploy the app on HF hub). The file will only be used to populate the db once upon the first run, it will no longer be used in consequent runs.

  • (Optional) To enable in-browser PDF_JS viewer, download PDF_JS_DIST and extract it to libs/ktem/ktem/assets/prebuilt

pdf-setup

  • Start the web server:
python app.py

The app will be automatically launched in your browser.

Default username / password are: admin / admin. You can setup additional users directly on the UI.

Chat tab

  • Check the Resources tab and LLMs and Embeddings and ensure that your api_key value is set correctly from your .env. file. If it is not set, you can set it here.

Setup GraphRAG

Note

Currently GraphRAG feature only works with OpenAI or Ollama API.

  • [If you are not using Docker installation], install GraphRAG with pip install graphrag future
  • To use GraphRAG retriever feature, make sure to set GRAPHRAG_API_KEY environment variables (or in the .env file).
  • To use GraphRAG with local models (Ollama), set USE_CUSTOMIZED_GRAPHRAG_SETTING=true and tweak your settings in settings.yaml.example.

Setup local models (for local / private RAG)

See Local model setup.

Customize your application

By default, all application data are stored in ./ktem_app_data folder. You can backup or copy this folder to move your installation to a new machine.

For advance users or specific use-cases, you can customize those files:

  • flowsettings.py
  • .env

flowsettings.py

This file contains the configuration of your application. You can use the example here as the starting point.

Notable settings
# setup your preferred document store (with full-text search capabilities)
KH_DOCSTORE=(Elasticsearch | LanceDB | SimpleFileDocumentStore)

# setup your preferred vectorstore (for vector-based search)
KH_VECTORSTORE=(ChromaDB | LanceDB | InMemory | Qdrant)

# Enable / disable multimodal QA
KH_REASONINGS_USE_MULTIMODAL=True

# Setup your new reasoning pipeline or modify existing one.
KH_REASONINGS = [
    "ktem.reasoning.simple.FullQAPipeline",
    "ktem.reasoning.simple.FullDecomposeQAPipeline",
    "ktem.reasoning.react.ReactAgentPipeline",
    "ktem.reasoning.rewoo.RewooAgentPipeline",
]
)

.env

This file provides another way to configure your models and credentials.

Configure model via the .env file

Alternatively, you can configure the models via the .env file with the information needed to connect to the LLMs. This file is located in the folder of the application. If you don't see it, you can create one.

Currently, the following providers are supported:

OpenAI

In the .env file, set the OPENAI_API_KEY variable with your OpenAI API key in order to enable access to OpenAI's models. There are other variables that can be modified, please feel free to edit them to fit your case. Otherwise, the default parameter should work for most people.

OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=<your OpenAI API key here>
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002

Azure OpenAI

For OpenAI models via Azure platform, you need to provide your Azure endpoint and API key. Your might also need to provide your developments' name for the chat model and the embedding model depending on how you set up Azure development.

AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002

Local models

Using ollama OpenAI compatible server

Install ollama and start the application.

Pull your model (e.g):

ollama pull llama3.1:8b
ollama pull nomic-embed-text

Set the model names on web UI and make it as default.

Models

Using GGUF with llama-cpp-python

You can search and download a LLM to be ran locally from the Hugging Face Hub. Currently, these model formats are supported:

  • GGUF

You should choose a model whose size is less than your device's memory and should leave about 2 GB. For example, if you have 16 GB of RAM in total, of which 12 GB is available, then you should choose a model that takes up at most 10 GB of RAM. Bigger models tend to give better generation but also take more processing time.

Here are some recommendations and their size in memory:

Add a new LlamaCpp model with the provided model name on the web uI.

Adding your own RAG pipeline

Custom reasoning pipeline

First, check the default pipeline implementation in here. You can make quick adjustment to how the default QA pipeline work.

Next, if you feel comfortable adding new pipeline, add new .py implementation in libs/ktem/ktem/reasoning/ and later include it in flowssettings to enable it on the UI.

Custom indexing pipeline

Check sample implementation in libs/ktem/ktem/index/file/graph

(more instruction WIP).

Developer guide

Please refer to the Developer Guide for more details.

Star History

Star History Chart

About

An open-source RAG-based tool for chatting with your documents.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 90.9%
  • HTML 3.6%
  • Shell 2.7%
  • Batchfile 1.6%
  • JavaScript 0.5%
  • CSS 0.4%
  • Other 0.3%