This repository contains example handlers and a requirements.txt for hosting a customized Llama model.
Create a conda environment if you don't already have one:
conda create --name hf_inference python=3.10
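Then activate the environment and install the dependencies (this assumes the requirements.txt sits at the repository root):

conda activate hf_inference
pip install -r requirements.txt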
To build the custom inference handler, follow the guide at https://www.philschmid.de/custom-inference-handler
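Following that guide, a custom handler lives in a handler.py that defines an EndpointHandler class. The sketch below is only illustrative; it assumes a plain transformers text-generation pipeline rather than this repository's actual handler, and the model path is a placeholder.

```python
# handler.py -- illustrative sketch of a custom inference handler
# (assumes a text-generation pipeline; adapt to the actual Llama checkpoint)
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the model checkpoint on disk
        self.pipeline = pipeline("text-generation", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # The request body arrives as a dict with an "inputs" key and
        # optional generation "parameters"
        inputs = data.get("inputs", "")
        parameters = data.get("parameters", {})
        return self.pipeline(inputs, **parameters)
```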
Run the local smoke test, pinning it to a single GPU (CUDA_VISIBLE_DEVICES="1" selects the second GPU):

CUDA_VISIBLE_DEVICES="1" python -m test_handler
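This command runs the repository's test_handler module. A minimal sketch of what such a script might look like, assuming the handler.py above and a placeholder prompt and model path:

```python
# test_handler.py -- illustrative local smoke test for the handler
# (the model path and prompt are placeholders, not the repository's actual values)
from handler import EndpointHandler

if __name__ == "__main__":
    # Instantiate the handler against a local model directory
    handler = EndpointHandler(path=".")

    # Simulate the JSON payload the hosted endpoint would receive
    payload = {"inputs": "Hello, Llama!", "parameters": {"max_new_tokens": 32}}
    print(handler(payload))
```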