This repository contains a Streamlit web application designed to stress test Vision-Language Models (VLMs). The application lets users add and compare multiple VLMs on multi-modal inference, facilitating the creation of datasets for question-answering (QA) tasks and beyond.
The VLM Stress Testing Web Application enables users to:
- Upload images and input queries to test different VLMs.
- Compare responses, latencies, and token usage across models.
- Select the best-performing model based on the output and save results to a CSV file.
The tool is useful for evaluating multiple VLMs in a unified interface, enabling insights into model performance for multi-modal question-answering tasks.
- Streamlit: A fast, simple UI for interacting with the VLMs.
- TuneAPI: A proxy API to connect and interact with various VLMs like Llama 3.2, Qwen 2 VL, and GPT 4o.
- Pandas: For managing and saving results to CSV files.
- ColBERT: Used in conjunction with retrieval models to retrieve relevant context (if applicable).
- Llama 3.2: `meta/llama-3.2-90b-vision`
- Qwen 2 VL: `qwen/qwen-2-vl-72b`
- GPT 4o: `openai/gpt-4o`
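The snippet below is a minimal sketch of how these identifiers might be wired into the model selector. The `MODEL_IDS` dictionary and widget labels are illustrative assumptions, not taken from `app.py`.

```python
import streamlit as st

# Hypothetical mapping of display names to the TuneAPI model identifiers
# listed above; the actual app may organize this differently.
MODEL_IDS = {
    "Llama 3.2": "meta/llama-3.2-90b-vision",
    "Qwen 2 VL": "qwen/qwen-2-vl-72b",
    "GPT 4o": "openai/gpt-4o",
}

# Let the user pick the two models to compare.
model_a = st.selectbox("Model A", list(MODEL_IDS.keys()), index=0)
model_b = st.selectbox("Model B", list(MODEL_IDS.keys()), index=1)
```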
- Multi-modal Input: Upload images and ask natural language questions to test the VLMs.
- Model Comparison: Compare multiple VLMs based on response quality, latency, and token usage.
- Dynamic Output: View responses side-by-side and analyze model metrics.
- CSV Logging: Save the best-performing model's results to a CSV file for further analysis.
- Clone the Repository: Clone this repository to your local machine:

  ```bash
  git clone https://github.com/aryankargwal/genai-tutorials.git
  cd genai-tutorials/vlm-comparison
  ```
- Install Dependencies: Install the required dependencies from the `requirements.txt` file:

  ```bash
  pip install -r requirements.txt
  ```
- Set Up API Keys: Export your TuneAPI key to connect to the VLMs:

  ```bash
  export API_KEY="your_api_key_here"
  ```
- Run the Application: Run the Streamlit app to start stress testing VLMs:

  ```bash
  streamlit run app.py
  ```
- Upload Images & Input Questions (a minimal UI sketch follows this list):
  - Upload an image (JPG, JPEG, or PNG).
  - Enter a question for the VLMs to answer.
  - Select two models to compare.
  - View and compare model responses, latencies, and token counts.
- Save Results: Once the responses are generated, select the best-performing model and save the result to a CSV file.
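The sketch below shows roughly how the upload-and-ask flow above could look in Streamlit. It is an assumption-based outline, not the repository's actual `app.py`; the widget labels and variable names are illustrative.

```python
import os

import streamlit as st

# Read the TuneAPI key exported during the setup step above.
api_key = os.environ.get("API_KEY")

st.title("VLM Stress Testing")

# Multi-modal inputs: an image plus a natural-language question.
uploaded_image = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
question = st.text_input("Enter a question for the VLMs to answer")

if uploaded_image is not None and question:
    st.image(uploaded_image, caption="Input image")
    # The querying, comparison, and logging steps described below run here.
```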
- Image Upload & Encoding: Users upload an image, which is encoded to base64 for model input.
- Model Querying: The app queries selected models with the image and question. Each model processes the image and generates a response.
- Latency Tracking: The app measures and displays the latency for each model's response.
- Token Count: The app calculates and shows the token count for each model's generated output.
- Result Logging: After selecting the best model, the app saves the responses, latencies, and token counts to a CSV file for further analysis.
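As a rough illustration of this pipeline, the sketch below encodes the uploaded image, queries one model, and tracks latency plus an approximate token count. The endpoint URL, header format, and payload shape are placeholders that assume an OpenAI-style chat completions API; consult `app.py` or the TuneAPI documentation for the exact request format.

```python
import base64
import os
import time

import requests


def encode_image(uploaded_file) -> str:
    """Encode the uploaded image to base64 so it can be embedded in the request."""
    return base64.b64encode(uploaded_file.getvalue()).decode("utf-8")


def query_model(model_id: str, image_b64: str, question: str):
    """Query one model with the image and question, returning answer, latency, tokens."""
    payload = {
        "model": model_id,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }
    # Header name/format is an assumption; check the TuneAPI docs for the real scheme.
    headers = {"Authorization": os.environ["API_KEY"]}

    start = time.perf_counter()
    response = requests.post(
        "https://example-vlm-proxy/chat/completions",  # placeholder URL, not the real endpoint
        json=payload, headers=headers, timeout=120,
    )
    latency = time.perf_counter() - start

    answer = response.json()["choices"][0]["message"]["content"]
    # Rough token count via whitespace splitting; the app may instead use the
    # provider's reported usage figures.
    token_count = len(answer.split())
    return answer, latency, token_count
```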
Users can perform inference using image inputs combined with text-based questions to test how well VLMs handle multi-modal reasoning.
Compare two models side-by-side in terms of:
- Response Quality: Generated answer to the user-provided question.
- Latency: Time taken by each model to generate a response.
- Token Count: Number of tokens generated by each model, useful for understanding efficiency.
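Reusing the hypothetical `MODEL_IDS`, `encode_image`, and `query_model` names from the sketches above, a side-by-side comparison could be rendered along these lines; the layout and metric labels are assumptions, not the app's exact code.

```python
import streamlit as st

# Query both selected models with the same image and question.
image_b64 = encode_image(uploaded_image)
answer_a, latency_a, tokens_a = query_model(MODEL_IDS[model_a], image_b64, question)
answer_b, latency_b, tokens_b = query_model(MODEL_IDS[model_b], image_b64, question)

# Show the two models side by side with their responses and metrics.
col_a, col_b = st.columns(2)
for col, name, answer, latency, tokens in [
    (col_a, model_a, answer_a, latency_a, tokens_a),
    (col_b, model_b, answer_b, latency_b, tokens_b),
]:
    with col:
        st.subheader(name)
        st.write(answer)
        st.metric("Latency (s)", f"{latency:.2f}")
        st.metric("Token count", tokens)
```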
Users can log their model comparison results, including the selected best model, to a CSV file with the following information:
- Image path
- Question
- Responses from both models
- Latency and token count for each model
- The selected best model
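A minimal logging helper along these lines could append each comparison to a CSV with pandas; the column names below are assumptions based on the fields listed above, not the app's exact schema.

```python
import os

import pandas as pd


def log_result(csv_path, image_path, question, answer_a, answer_b,
               latency_a, latency_b, tokens_a, tokens_b, best_model):
    """Append one comparison row to the results CSV."""
    row = pd.DataFrame([{
        "image_path": image_path,
        "question": question,
        "response_model_a": answer_a,
        "response_model_b": answer_b,
        "latency_model_a": latency_a,
        "latency_model_b": latency_b,
        "tokens_model_a": tokens_a,
        "tokens_model_b": tokens_b,
        "best_model": best_model,
    }])
    # Append to the CSV, writing the header only when the file does not yet exist.
    row.to_csv(csv_path, mode="a", header=not os.path.exists(csv_path), index=False)
```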
- Enhanced Dataset Creation: Expand support to automatically generate datasets from the saved model outputs.
- Fine-Tuning Scripts: Add scripts for fine-tuning models based on user data or custom datasets.
- Additional VLM Support: Include more Vision-Language Models to extend comparison options.
This project is licensed under the Apache 2.0 License. See the full license here.