QA-with-RAG is a free, containerised question-answering framework that lets you ask questions about your documents and get answers grounded in their content.
This app uses a method called retrieval-augmented generation (RAG) to retrieve information that is relevant to your question from your uploaded documents. It then uses a large language model (LLM) to answer the question with the retrieved context.
The current implementation uses the following components:
- Language Models: Google Gemma2 (2B) and Microsoft Phi 3 Mini (3.8B)
- Embedding Model: all-MiniLM-L6-v2
- Vector Database: FAISS
- Frontend: Streamlit
- Containerisation: Docker
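At query time, the retrieval step embeds the question and searches the vector database for the nearest document chunks. The sketch below illustrates the idea only: it uses a toy bag-of-words "embedding" and brute-force cosine similarity in place of all-MiniLM-L6-v2 and FAISS, so it runs without any model download.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for all-MiniLM-L6-v2."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question.
    FAISS performs this nearest-neighbour search efficiently at scale."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Poetry manages Python dependencies via pyproject.toml.",
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Streamlit turns Python scripts into shareable web apps.",
]
top = retrieve("Which library performs similarity search?", chunks, k=1)
```

The retrieved chunks are then prepended to the LLM prompt as context for answering the question.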
- Install Poetry on your machine.
- Create a virtual environment and install the dependencies specified in the `pyproject.toml` file by running `poetry install`.
- Run the project using the command `docker compose up`.
  Note: The first time you run this, it might take a while to build the image and download the embedding model.
- The UI (as shown in the snapshot in the Preview section) should open in your default browser on port `8501`.
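For orientation, a Compose file for such a setup typically looks like the sketch below. The service name and build context here are illustrative assumptions, not the repository's actual file.

```yaml
# Hypothetical sketch; the real service names, build context,
# and volumes are defined in the repository's docker-compose file.
services:
  qa-with-rag:
    build: .
    ports:
      - "8501:8501"   # Streamlit's default port
```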
I. Select the directory containing your PDF file(s).
II. Type your question.
III. Choose a language model for final inference.
IV. Choose the number of chunks to retrieve from the database. A higher number gives the model more context, which can yield richer answers but may also add noise and latency.
V. Run and enjoy!
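The chunks retrieved above come from splitting each document into pieces before indexing. A minimal sketch of fixed-size chunking with overlap (the actual splitter and parameters used by this project may differ):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap,
    so a sentence cut at one boundary still appears whole in a neighbour."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks

# 500 characters with a 150-character step -> starts at 0, 150, 300, 450
chunks = chunk_text("a" * 500, chunk_size=200, overlap=50)
```

Each chunk is embedded and stored in the vector index; the UI setting controls how many of them are handed to the model per question.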