Before starting to install the LLM Twin project, make sure you have installed the following dependencies on your local system:
Tool | Version | Purpose | Installation Link |
---|---|---|---|
Python | 3.11 | Runtime environment | Download |
Poetry | ≥1.8.4 | Package management | Install Guide |
GNU Make | ≥3.81 | Build automation | Install Guide |
Docker | ≥27.0.3 | Containerization | Install Guide |
AWS CLI | ≥2.18.5 | Cloud management | Install Guide |
The code also uses and depends on the following cloud services. For now, you don't have to do anything. We will guide you in the installation and deployment sections on how to use them:
Service | Purpose | Cost | Mandatory Credentials |
---|---|---|---|
HuggingFace | Model registry | Free | HUGGINGFACE_ACCESS_TOKEN |
Comet ML | Experiment tracker | Free tier available | COMET_API_KEY COMET_WORKSPACE |
Opik | Prompt monitoring & LLM Evaluation | Free tier available | COMET_API_KEY COMET_WORKSPACE |
OpenAI API | LLM API for recommender system | Pay-per-use | OPENAI_API_KEY |
MongoDB | NoSQL database | Free tier available | - |
Qdrant | Vector database | Free tier available | - |
AWS | Compute and storage | Pay-per-use | AWS_ARN_ROLE AWS_REGION AWS_ACCESS_KEY AWS_SECRET_KEY |
Important
Also, you must manually ask for access on Hugging Face for the gated LLM meta-llama/Llama-3.1-8B
we use throughout the course. It's free. Do it here.
We will use GNU Make
to install and run our application.
To see all our supported commands, run the following:
make help
All the sensitive credentials are placed in a .env
file that will always sit at the root of your directory, at the same level with the .env.example
file.
Go to the root of the repository and copy our .env.example
file as follows:
cp .env.example .env
Now fill it with your credentials, following the suggestions from the next section.
Now, let's understand how to fill in all the essential variables within the .env
file to get you started. The following are the mandatory settings we must complete when working locally:
To authenticate to OpenAI's API, you must fill out the OPENAI_API_KEY
env var with an authentication token.
OPENAI_API_KEY=your_api_key_here
→ Check out this tutorial to learn how to provide one from OpenAI.
To authenticate to Hugging Face, you must fill out the HUGGINGFACE_ACCESS_TOKEN
env var with an authentication token.
HUGGINGFACE_ACCESS_TOKEN=your_token_here
→ Check out this tutorial to learn how to provide one from Hugging Face.
To authenticate to Comet ML (required only during training) and Opik (uses the same Comet credentials), you must fill out the COMET_API_KEY
and COMET_WORKSPACE
env vars with your authentication token and workspace name.
COMET_API_KEY=your_api_key_here
COMET_WORKSPACE=your_workspace_name_here
→ Check out this tutorial to learn how to get the Comet ML variables from above. You can also access Opik's dashboard using 🔗this link.
Required only for fine-tuning and inference, which we will show how to set up later in the document.
Optional, only if you want to use Qdrant cloud. Otherwise, you can complete the course using the local version of Qdrant.
You can create a Python virtual environment and install all the necessary dependencies using Poetry, by running:
make install
Important
You need Python 3.11 installed! You can either install it globally or install pyenv to manage multiple Python dependencies. The .python-version
file will signal to pyenv
what Python version it needs to use in this particular project.
After installing the dependencies into the Poetry virtual environment, you can activate your virtual environment into your current CLI by running:
poetry shell
We support running the entire data infrastructure (crawling, CDC, MongoDB, and Qdrant) through Docker. Thus, with a few commands you can quickly populate the data warehouse and vector DB with relevant data to test out the RAG, training, and inference parts of the course.
You can start all the required Docker containers, by running:
make local-start
It will take a while to run until all the Docker images are pulled or built.
Behind the scenes it will build and run all the Docker images defined in the docker-compose.yml file.
Caution
For MongoDB
to work with multiple replicas (as we use it in our Docker setup) on macOS
or Linux
systems, you have to add the following lines of code to /etc/hosts
:
127.0.0.1 mongo1
127.0.0.1 mongo2
127.0.0.1 mongo3
From what we know, on Windows
, it works out-of-the-box
. For more details, check out this article
Warning
For arm
users (e.g., M macOS devices
), go to your Docker desktop application and enable Use Rosetta for x86_64/amd64 emulation on Apple Silicon
from the Settings. There is a checkbox you have to check.
Otherwise, your Docker containers will crash.
Run the following Make
command to tear down all your docker containers:
make local-stop
Now that we have configured our credentials, local environment and Docker infrastructure let's look at how to run an end-to-end flow of the LLM Twin course.
Important
Note that we won't go into the details of the system here. To fully understand it, check out our free lessons, which explains everything step-by-step: LLM Twin articles series.
Trigger the crawler to collect data and add it to the MongoDB:
make local-test-medium
# or make local-test-github
You should get a response with a 200
status code, as follows:
{"statusCode": 200, "body": "Link processed successfully"}
After running the command, this will happen:
- it will crawl a Medium/GitHub link
- process and add the data to MongoDB
- the CDC component will be triggered, which will populate the RabbitMQ with the event
- the RAG feature pipeline will read the event from RabbitMQ, process it for RAG, and add it to the Qdrant vector DB
You can check the logs from the crawler Docker image, by running:
docker logs llm-twin-data-crawlers
You should see something similar to:
{"level":"INFO","location":"extract:53","message":"Finished scrapping custom article: https://medium.com/decodingml/an-end-to-end-framework-for-production-ready-llm-systems-by-building-your-llm-twin-2cc6bb01141f","timestamp":"2024-12-25 17:13:33,630+0000","service":"llm-twin-course/crawler"}
The previous step actually called both the crawling and RAG feature engineering pipeline. But now, let's check that everything worked as expected.
Thus, let's check that the feature pipeline works and the vector DB is successfully populated.
To do so, check the logs of the llm-twin-feature-pipeline
Docker container by running:
docker logs llm-twin-feature-pipeline
You should see something similar to:
2024-12-25 16:53:45 [info ] Cleaned content chunked successfully. cls=data_logic.dispatchers data_type=repositories num=955
2024-12-25 16:53:45 [info ] Chunk embedded successfully. cls=data_logic.dispatchers data_type=repositories embedding_len=384
2024-12-25 16:53:45 [info ] Chunk embedded successfully. cls=data_logic.dispatchers data_type=repositories embedding_len=384
Also, you can check the logs of the CDC and RabbitMQ Docker containers, by running:
docker logs llm-twin-data-cdc # CDC service
docker logs llm-twin-mq # RabbitMQ
You should see logs reflecting the cleaning, chunking, and embedding operations (without any errors, of course).
To check that the Qdrant vector DB
is populated successfully, go to its dashboard by typing in your browser: localhost:6333/dashboard. There, you should see the repositories or article collections created and populated, similar to the image below:
Note
If using the cloud version of Qdrant, go to your Qdrant account and cluster to see the same thing as in the local dashboard.
To populate MongoDB and VectorDB with ~50 links, run the following command (but first make sure that everything runs smooth):
make local-ingest-data
Note: To crawl different links, you can change the data/links.txt
file.
Now that our Qdrant vector DB is populated with our data let's test out the RAG retrieval module to see that everything works fine by retrieving some items using a dummy query:
make local-test-retriever
Important
Before running this command, check Qdrant's dashboard to ensure your vector DB is populated with data. Otherwise, the retriever will not have any data to work with.
The last step before fine-tuning is to generate an instruct dataset and track it as an artifact with Comet ML. To do so, run:
make local-generate-instruct-dataset
Now go to Comet ML, then to your workspace, and open the Artifacts
tab. There, you should find three artifacts as follows:
articles-instruct-dataset
posts-instruct-dataset
repositories-instruct-dataset
To download the instruct dataset from Comet, run:
make download-instruct-dataset
Note
We also publicly provide our own artifacts that you can use to fine-tune your LLMs in case you got stuck or don't want to generate them yourself:
For your AWS set-up to work correctly, you need the AWS CLI installed on your local machine and properly configured with an admin user (or a user with enough permissions to create new SageMaker, ECR and S3 resources; using an admin user will make everything more straightforward).
With the same configuration used to set up your AWS CLI, also fill in the following environment variables from your .env
file:
AWS_REGION=eu-central-1
AWS_ACCESS_KEY=str
AWS_SECRET_KEY=str
AWS credentials are typically stored in ~/.aws/credentials
. You can view this file directly using cat
or similar commands:
cat ~/.aws/credentials
The next step is to create an IAM execution role used by AWS SageMaker to access other AWS resources. This is standard practice when working with SageMaker.
To automatically create it, run:
make create-sagemaker-execution-role
The script will generate a file found at src/sagemaker_execution_role.json
. Open it, copy the value under the RoleArn
key from the JSON file, and fill in the following env var from your .env
file:
AWS_ARN_ROLE=str
To conclude, by the end of this section you should have filled correctly the following environment variables in your .env
file:
AWS_ARN_ROLE=str
AWS_REGION=eu-central-1
AWS_ACCESS_KEY=str
AWS_SECRET_KEY=str
Now, we can move on to the fine-tunine and inference pipelines, which use AWS SagaMaker.
Important
Note that we use ml.g5.2xlarge
EC2 instances to run AWS SageMaker (which are already set in our settings), which cost ~$2 / hour
(depending on your region). Our tests will take only ~2-3 hours at maximum. Thus, this won't get expensive. Just run our clean-up resources scripts after you finish testing our app: make delete-inference-pipeline-deployment
First, go to our base model, which we will use for fine-tuning on Hugging Face -> meta-llama/Llama-3.1-8B and get access to use it. Everything is free.
Next, after setting up everything necessary for AWS SageMaker and Llama-3.1-8B, kicking off the training in dummy mode is as easy as running the following (dummy mode will reduce the dataset size and epochs to quickly see that everything works fine, reducing the running time <30 minutes):
make start-training-pipeline-dummy-mode
To kick off the full training, run (which will take ~2-3 hours, using our dataset and preconfigured settings):
make start-training-pipeline
Go to your Hugging Face account, and under the Models section, you will see your {your_name}/LLMTwin-Llama-3.1-8B
model. To run the evaluation and inference using your fine-tuned LLM, add your Hugging Face model ID to your .env
file, such as MODEL_ID=pauliusztin/LLMTwin-Llama-3.1-8B
.
Note
You can check out the deployment progress in the AWS console in the SageMaker dashboard.
Warning
If you get any Service Quotas
errors, you must increase your AWS quotas for ml.g5.2xlarge
instances (which is free). More exactly, you have to go to your AWS account -> Service Quatas -> AWS services -> search SageMaker
-> search ml.g5.2xlarge
, then increase the quotas to 1 for ml.g5.2xlarge for training job usage
(training jobs) and ml.g5.2xlarge for endpoint usage
(inference jobs). More details on changing service quotas are in this article.
After you have finetuned your LLM, you can start the LLM evaluation pipeline by running:
make evaluate-llm
To start the RAG evaluation pipeline, run:
make evaluate-rag
Next, check the evaluation datasets and experiment results in Opik's Dashboard.
After you have finetuned and evaluated your model, the first step is to deploy the LLM to AWS SageMaker as a REST API service:
make deploy-inference-pipeline
Note
You can check out the deployment progress in the AWS console in the SageMaker dashboard.
After the deployment is finished (it will take a few minutes), you can call it with a test prompt by running:
make call-inference-pipeline
After testing the inference pipeline from the CLI, you can start playing around with the LLM Twin from our GUI, by running:
make local-start-ui
Now you can access the GUI at http://localhost:7860 and start asking the LLM Twin to generate some content for you, such as: "Draft a post about RAG systems." as in the example below:
After playing around with the model, you will start collecting and monitoring your prompts which you can visualize again in Opik's Dashboard.
Also, you can kick off the monitoring LLM evaluation pipeline by running:
make evaluate-llm-monitoring
Ultimately, after testing the inference pipeline, you can delete the AWS SageMaker deployment, by running:
make delete-inference-pipeline-deployment
Warning
Clear your AWS resources to avoid any unexpected costs. Running make delete-inference-pipeline-deployment
should clear everything, but we recommend double-checking your AWS SageMaker dashboard manually to ensure everything is shut down in case of edge cases.