Instruction-tuning dataset generation inspired by LLaVA-Instruct-158k, via any LLM. The goals:
- Become independent of LLaVA-Instruct-158k, which cannot be used commercially.
- Add more datasets, such as Open Images, to overcome the limited object class range of COCO.
- Add modalities other than images.
I currently plan to create the following datasets:
- An equivalent of LLaVA-Instruct-158k on the COCO dataset, using Llama 2 70B and Mixtral 8x7B.
- A more powerful instruction dataset on Open Images V7, including localized narratives, bounding boxes with metadata, image-level labels, and object relationships.
- Additional data sources on top of COCO and Open Images.
These improved datasets will help multimodal LLM architectures like LLaVA, which require pretraining, and even more so architectures like LaVIN, which use only an instruction-tuning step.
Roadmap:
- Improve inference speed of the OpenAI API with parallel requests (see the sketch after this list).
- Open Images V7 support: captions, boxes.
- Open Images V7 support for positive and negative image-level labels in the dataset.
- Fully process with Mistral 7B for a first commercially usable version for LLaVA training.
- Add a token count estimation tool for cost estimation.
- Add LICENSE information to generated files.
- Add support for motion data instruction dataset creation (e.g. from HumanML3D).
- Fully process with Llama 2 for a first commercially usable version for LLaVA training.
- Improve inference speed of Hugging Face models with batching.
- Improve inference speed of llama.cpp models with batching.
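Parallel requests against the OpenAI API are one of the roadmap items above; here is a minimal sketch of what that could look like with the openai Python client and asyncio. This is illustrative only, not this repo's implementation; the model name and prompts are placeholders.

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def generate_one(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def generate_many(prompts: list[str]) -> list[str]:
    semaphore = asyncio.Semaphore(8)  # cap concurrency to stay under rate limits

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await generate_one(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

if __name__ == "__main__":
    answers = asyncio.run(generate_many(["Describe image 1 ...", "Describe image 2 ..."]))
```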
To generate a multimodal instruction dataset:
- Pick a dataset.
- Pick or set up a prompt config.
Available datasets are:

Source.COCO2014 and Source.COCO2017
- COCO has been used to generate the LLaVA-Instruct-158k dataset.
- Provides the following data for instruction dataset generation:
- captions: 5 sentences by different annotators describing the image
- object bounding boxes in the format (see the sketch below)
  category_name: [min_x, min_y, max_x, max_y]
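For illustration, a minimal sketch of how such a plain-text context could be assembled from COCO-style annotations; build_coco_context is a hypothetical name, not part of this repo:

```python
# Hypothetical helper (illustration only, not this repo's code): flatten
# COCO-style captions and boxes into the plain-text context handed to the LLM,
# using the "category_name: [min_x, min_y, max_x, max_y]" format shown above.
def build_coco_context(captions, boxes):
    lines = list(captions)  # the caption sentences from different annotators
    for category_name, (min_x, min_y, max_x, max_y) in boxes:
        lines.append(f"{category_name}: [{min_x:.2f}, {min_y:.2f}, {max_x:.2f}, {max_y:.2f}]")
    return "\n".join(lines)

print(build_coco_context(
    ["A person walks a dog across a park lawn."],
    [("person", (0.12, 0.30, 0.45, 0.88)), ("dog", (0.40, 0.60, 0.62, 0.92))],
))
```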
Source.OPENIMAGESV7
- Provides the following data for instruction dataset generation:
- captions: narratives from voice recordings of annotators describing the image in one or more sentences.
- object bounding boxes in the format (see the sketch below)
  category_name: [min_x, min_y, max_x, max_y] [confidence, is_occluded, is_truncated, is_group_of, is_depiction, is_inside]
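And a matching hedged sketch for the richer Open Images box line; format_openimages_box is likewise hypothetical, and the flag values are the 0/1 booleans shipped with the Open Images annotations:

```python
# Hypothetical helper (illustration only): render one Open Images V7 box in the
# format above, appending confidence and the five annotation flags.
def format_openimages_box(category_name, box, confidence, flags):
    min_x, min_y, max_x, max_y = box
    # flags = (is_occluded, is_truncated, is_group_of, is_depiction, is_inside)
    values = ", ".join(str(v) for v in (confidence, *flags))
    return f"{category_name}: [{min_x:.2f}, {min_y:.2f}, {max_x:.2f}, {max_y:.2f}] [{values}]"

print(format_openimages_box("Dog", (0.10, 0.25, 0.55, 0.90), 1, (0, 0, 0, 0, 0)))
# Dog: [0.10, 0.25, 0.55, 0.90] [1, 0, 0, 0, 0, 0]
```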
Example invocations:

python generate.py COCO2014 --model_source huggingface --model meta-llama/Llama-2-7b-chat-hf
python generate.py COCO2014 --model_source huggingface --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt_config prompt_config_llava_smallcontext.yaml
python generate.py COCO2014 --model_source llama.cpp --model ./PATH/TO/MODEL.gguf
python generate.py COCO2014 --model_source openai --model gpt-3.5-turbo
python generate.py COCO2014 --model_source openai --model mymodel --openai_base_url BASE_URL
python generate.py OPENIMAGESV7 --prompt_config prompt_config_openimagesv7.yaml ...
- Hugging Face chat models: only models that ship a chat template in their tokenizer_config.json are supported (see the sketch below).
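A minimal sketch of what that constraint means in practice, using the transformers chat-template API; this mirrors the library API, not this repo's internals:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # a chat model from the examples above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# apply_chat_template only works if tokenizer_config.json defines a chat_template,
# which is exactly the constraint noted above.
messages = [{"role": "user", "content": "Write one question about an image of a dog in a park."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```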