[2024.10.12] Released the inference code for Opt-Visor.
- Python == 3.10.12
- CUDA Version == 12.4
pip install -r requirements.txt
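Before running inference, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the released code: the helper names `check_python` and `check_cuda` are hypothetical, and it assumes PyTorch is among the installed requirements, which this README does not state. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the system toolkit version.

```python
import sys

# Versions listed in the requirements above.
EXPECTED_PYTHON = (3, 10, 12)
EXPECTED_CUDA = "12.4"


def check_python(version=None):
    """Return True if the given (or running) interpreter is exactly 3.10.12."""
    version = tuple(sys.version_info[:3]) if version is None else tuple(version)
    return version == EXPECTED_PYTHON


def check_cuda():
    """Return the CUDA version PyTorch was built with, or None.

    Assumption: PyTorch is installed via requirements.txt. If it is not
    importable, or it is a CPU-only build, the result is None.
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    found = "%d.%d.%d" % tuple(sys.version_info[:3])
    print("Python matches 3.10.12:", check_python(), "(found %s)" % found)
    print("CUDA (PyTorch build):", check_cuda(), "| expected:", EXPECTED_CUDA)
```

A mismatch does not necessarily mean inference will fail; the listed versions are simply the ones the release was tested with.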
Model Name | Visual Encoder | Language Decoder | # Training Frames | Tokens per Frame |
---|---|---|---|---|
Opt-Visor-120frame-49token-Qwen2-7B | siglip-so400m-patch14-384 | Qwen2-7B | 120 | 49 |
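As a rough illustration of what the table implies, the sketch below computes the visual token budget for one video (frames times tokens per frame) and compares it with the raw patch count of the encoder. The 384-pixel input size and 14-pixel patch size are assumptions read from the encoder name `siglip-so400m-patch14-384`; how Opt-Visor reduces each frame to 49 tokens is not described here, so the code does only the arithmetic. The function names are hypothetical.

```python
def visual_token_budget(num_frames, tokens_per_frame):
    """Total visual tokens passed to the language decoder for one video."""
    return num_frames * tokens_per_frame


def patch_grid(image_size, patch_size):
    """Return (patches per side, total patches) for a square image.

    Uses floor division, so leftover pixels at the border are ignored.
    """
    side = image_size // patch_size
    return side, side * side


# Values from the table above: 120 training frames, 49 tokens per frame.
total_tokens = visual_token_budget(120, 49)

# Assumed from the encoder name: 384x384 input, 14x14 patches.
side, raw_patches = patch_grid(384, 14)

print("visual tokens per video:", total_tokens)    # 5880
print("raw patches per frame:", raw_patches)       # 729 (27 x 27)
print("raw patches for 120 frames:", 120 * raw_patches)  # 87480
```

Under these assumptions, 49 tokens per frame is about 15 times fewer than the 729 raw patches, which is what keeps 120 frames within a few thousand tokens.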
Run the following command to get the model's response to an instruction:
python inference.py \
--model_path /path/to/Opt-Visor \
--gpu_id 0 \
--video_path /path/to/your/video \
--question "Please describe the video in detail."
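To ask the same question about several videos, the command above can be wrapped in a small driver script. This is a sketch under stated assumptions, not part of the release: `build_command` and `run_on_directory` are hypothetical helpers, the list of video suffixes is a guess, and it assumes `inference.py` prints its response to standard output. Only the four flags shown in the command above are used.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: which file suffixes count as videos is not specified by the README.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv", ".mov"}


def build_command(model_path, video_path, question, gpu_id=0):
    """Build the argument list for one inference.py call.

    Only the flags from the README command are used.
    """
    return [
        sys.executable, "inference.py",
        "--model_path", str(model_path),
        "--gpu_id", str(gpu_id),
        "--video_path", str(video_path),
        "--question", question,
    ]


def run_on_directory(model_path, video_dir, question, gpu_id=0):
    """Run inference.py once per video file and collect its stdout.

    Assumption: the script writes the model's response to stdout.
    Returns a dict mapping file name to the captured output.
    """
    results = {}
    for video in sorted(Path(video_dir).iterdir()):
        if video.suffix.lower() not in VIDEO_SUFFIXES:
            continue
        cmd = build_command(model_path, video, question, gpu_id)
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        results[video.name] = proc.stdout.strip()
    return results


# Example call (paths are placeholders, as in the command above):
# run_on_directory("/path/to/Opt-Visor", "/path/to/videos",
#                  "Please describe the video in detail.")
```

Passing the arguments as a list avoids shell quoting problems when the question contains spaces or quotation marks. Each call reloads the model, so this is convenient rather than fast.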
- Release the inference code.
- Release the model.