VLM Stress Testing Web Application

This repository contains a Streamlit web application designed to stress test Vision-Language Models (VLMs). The application allows users to add and compare multiple VLMs for multi-modal inference, facilitating the creation of datasets for question-answering (QA) tasks and beyond.

Overview

The VLM Stress Testing Web Application enables users to:

  • Upload images and input queries to test different VLMs.
  • Compare responses, latencies, and token usage across models.
  • Select the best-performing model based on the output and save results to a CSV file.

The tool provides a unified interface for evaluating multiple VLMs, offering insight into model performance on multi-modal question-answering tasks.

Tools Used:

  • Streamlit: A fast, simple UI for interacting with the VLMs.
  • TuneAPI: A proxy API used to connect to and interact with VLMs such as Llama 3.2, Qwen 2 VL, and GPT-4o.
  • Pandas: For managing and saving results to CSV files.
  • ColBERT: A retrieval model used to fetch relevant context for the VLMs, where applicable.

Models Available for Testing:

  1. Llama 3.2: meta/llama-3.2-90b-vision
  2. Qwen 2 VL: qwen/qwen-2-vl-72b
  3. GPT-4o: openai/gpt-4o
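
In code, these choices might be kept as a simple mapping from display name to TuneAPI model ID; the sketch below shows one possible layout, not necessarily how app.py stores them:

    # Display names mapped to the TuneAPI model identifiers listed above.
    MODELS = {
        "Llama 3.2": "meta/llama-3.2-90b-vision",
        "Qwen 2 VL": "qwen/qwen-2-vl-72b",
        "GPT-4o": "openai/gpt-4o",
    }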

Features:

  • Multi-modal Input: Upload images and ask natural-language questions to test the VLMs.
  • Model Comparison: Compare multiple VLMs based on response quality, latency, and token usage.
  • Dynamic Output: View responses side-by-side and analyze model metrics.
  • CSV Logging: Save the best-performing model's results to a CSV file for further analysis.

Steps to Run

  1. Clone the Repository
     Clone this repository to your local machine:

    git clone https://github.com/aryankargwal/genai-tutorials.git
    cd genai-tutorials/vlm-comparison
  2. Install Dependencies
     Install the required dependencies from the requirements.txt file:

    pip install -r requirements.txt
  3. Set Up API Keys
     Export your TuneAPI key to connect to the VLMs:

    export API_KEY="your_api_key_here"
  4. Run the Application
     Run the Streamlit app to start stress testing VLMs:

    streamlit run app.py
  5. Upload Images & Input Questions

    • Upload an image (JPG, JPEG, or PNG).
    • Enter a question for the VLMs to answer.
    • Select two models to compare.
    • View and compare model responses, latencies, and token counts.
  6. Save Results
     Once the responses are generated, select the best-performing model and save the result to a CSV file.

How It Works

  • Image Upload & Encoding: Users upload an image, which is encoded to base64 for model input.
  • Model Querying: The app queries selected models with the image and question. Each model processes the image and generates a response.
  • Latency Tracking: The app measures and displays the latency for each model's response.
  • Token Count: The app calculates and shows the token count for each model's generated output.
  • Result Logging: After selecting the best model, the app saves the responses, latencies, and token counts to a CSV file for further analysis.
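
The sketch below ties these steps together: it encodes an image to base64, sends it with a question to one model using an OpenAI-style chat-completions payload, and times the call. The proxy URL, the payload shape, the authorization header, and the whitespace-based token count are assumptions for illustration and are not taken verbatim from app.py.

    import base64
    import os
    import time

    import requests

    # Assumed OpenAI-compatible proxy endpoint; adjust to the TuneAPI URL you use.
    API_URL = "https://proxy.tune.app/chat/completions"

    def encode_image(image_path: str) -> str:
        """Read an image file and return it as a base64 string."""
        with open(image_path, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")

    def query_vlm(model_id: str, question: str, image_b64: str):
        """Send the image + question to one model; return (answer, latency_s, token_count)."""
        payload = {
            "model": model_id,  # e.g. "qwen/qwen-2-vl-72b"
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }],
        }
        headers = {"Authorization": os.environ["API_KEY"],
                   "Content-Type": "application/json"}

        start = time.time()
        response = requests.post(API_URL, json=payload, headers=headers, timeout=120)
        latency = time.time() - start

        answer = response.json()["choices"][0]["message"]["content"]
        token_count = len(answer.split())  # rough proxy; the API's usage field is more exact
        return answer, latency, token_count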

Detailed Features

1. Multi-modal Inference:

Users can perform inference using image inputs combined with text-based questions to test how well VLMs handle multi-modal reasoning.

2. Model Response Comparison:

Compare two models side-by-side in terms of:

  • Response Quality: Generated answer to the user-provided question.
  • Latency: Time taken by each model to generate a response.
  • Token Count: Number of tokens generated by each model, useful for understanding efficiency.
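
A minimal Streamlit layout for that side-by-side view might look like the sketch below; the results dictionary and the radio-button selector are illustrative placeholders, reusing the shape returned by the query sketch above.

    import streamlit as st

    # Hypothetical results for the two selected models; in the app these would
    # come from the querying step described earlier.
    results = {
        "meta/llama-3.2-90b-vision": {"answer": "...", "latency": 2.4, "tokens": 57},
        "qwen/qwen-2-vl-72b": {"answer": "...", "latency": 3.1, "tokens": 64},
    }

    col_a, col_b = st.columns(2)
    for col, (model_id, res) in zip((col_a, col_b), results.items()):
        with col:
            st.subheader(model_id)
            st.write(res["answer"])
            st.metric("Latency (s)", f"{res['latency']:.2f}")
            st.metric("Token count", res["tokens"])

    best_model = st.radio("Select the best-performing model", list(results.keys()))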

3. Saving Results:

Users can log their model comparison results, including the selected best model, to a CSV file with the following information:

  • Image path
  • Question
  • Responses from both models
  • Latency and token count for each model
  • The selected best model
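
A possible shape for that logging step with pandas is sketched below; the column names and the results dictionary follow the illustrative examples above rather than the app's exact schema.

    import os

    import pandas as pd

    def log_result(csv_path, image_path, question, results, best_model):
        """Append one comparison run to a CSV file, creating it if needed."""
        row = {"image_path": image_path, "question": question, "best_model": best_model}
        for model_id, res in results.items():
            row[f"{model_id}_response"] = res["answer"]
            row[f"{model_id}_latency_s"] = res["latency"]
            row[f"{model_id}_tokens"] = res["tokens"]

        pd.DataFrame([row]).to_csv(
            csv_path,
            mode="a",
            header=not os.path.exists(csv_path),  # write the header only once
            index=False,
        )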

Future Work

  • Enhanced Dataset Creation: Expand support to automatically generate datasets from the saved model outputs.
  • Fine-Tuning Scripts: Add scripts for fine-tuning models based on user data or custom datasets.
  • Additional VLM Support: Include more Vision-Language Models to extend comparison options.

License

This project is licensed under the Apache 2.0 License; see the full license text in the repository.