Welcome to the experimental Stable Diffusion inference repository. This repository tests and evaluates the model as a new benchmark candidate for MLPerf inference.
To set up the environment, build and launch the container using the following commands:
docker build . -t sd_mlperf_inference
docker run --rm -it --gpus=all -v ${PWD}:/workspace sd_mlperf_inference bash
Note: The subsequent commands are assumed to be executed within this container.
This repository leverages the coco-2014 validation set for image generation and for computing FID and CLIP scores. COCO (Common Objects in Context) is a diverse dataset used for object detection, segmentation, and captioning tasks. Its validation set contains over 40,000 images with more than 200,000 labels.
For benchmarking purposes, we utilize a random subset of {TBD} images and their associated labels, determined by a preset seed of {TBD}. As the focus is on image generation from labels and subsequent score calculation, downloading the entire COCO dataset is unnecessary. The files required for the benchmark are already part of the repository:
captions.tsv: Contains processed coco2014 validation annotations with 40,504 prompts and their respective IDs. Essential for image generation and CLIP scoring.
val2014.npz: Consists of precomputed statistics of the coco2014 validation set. Utilized for FID scoring.
For details on file generation, refer to Appendix A.
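Both files can be inspected before running the benchmark. Below is a minimal sketch; the TSV column names and the npz key names ("mu"/"sigma", the convention used by common FID tooling) are assumptions, so verify them against the actual files:

```python
import numpy as np
import pandas as pd

# Caption prompts: one row per prompt with its COCO image id.
captions = pd.read_csv("captions.tsv", sep="\t")
print(len(captions))         # expected: 40,504 prompts
print(captions.columns)      # check the actual column names here

# Precomputed FID statistics: mean and covariance of Inception-v3
# activations over the coco-2014 validation images.
stats = np.load("val2014.npz")
print(stats.files)           # assumed keys: ['mu', 'sigma']
print(stats["mu"].shape)     # typically (2048,)
print(stats["sigma"].shape)  # typically (2048, 2048)
```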
Execute the main.py script to generate images:
python main.py \
--model-id xl \ # xl for SD-XL, xlr for SD-XL + Refiner
--guidance 8.0 \
--precision fp16 \ # fp16, bf16 and fp32
--scheduler ddim \
--steps 50
For additional execution options:
python main.py --help
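Under the hood, the generation step amounts to running an SD-XL diffusers pipeline over the prompts in captions.tsv. Here is a minimal sketch of one equivalent generation call, assuming the stabilityai/stable-diffusion-xl-base-1.0 checkpoint and the diffusers/torch packages; main.py's actual wiring may differ:

```python
import os
import torch
from diffusers import StableDiffusionXLPipeline, DDIMScheduler

# Equivalent of --model-id xl --precision fp16 --scheduler ddim.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed SD-XL checkpoint
    torch_dtype=torch.float16,                   # --precision fp16
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # --scheduler ddim
pipe = pipe.to("cuda")

# One prompt from the validation captions; main.py iterates over the whole subset.
os.makedirs("output", exist_ok=True)
image = pipe(
    prompt="A black Honda motorcycle parked in front of a garage.",
    guidance_scale=8.0,        # --guidance 8.0
    num_inference_steps=50,    # --steps 50
).images[0]
image.save("output/0.png")
```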
Compute the FID score of the generated images against the precomputed ground truth statistics:
python fid/fid_score.py \
--batch-size 1 \ # batch size for the inception network. keep it at 1.
--subset-size 35000 \ # validation subset size; omit to score the full dataset
--shuffle-seed 2023 \ # the seed used for the random subset selection
./val2014.npz \ # ground truth (coco 2014 validation) statistics
./output # folder with the generated images
For more options:
python fid/fid_score.py --help
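FID is the Fréchet distance between Gaussian fits of Inception-v3 activations for the two image sets. The following is a minimal sketch of the final distance computation from two statistics files, assuming the "mu"/"sigma" key convention noted earlier; generated_stats.npz is a hypothetical stats file computed over ./output:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    # FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if not np.isfinite(covmean).all():
        # Numerical stabilization commonly used by FID implementations.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean, _ = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset), disp=False)
    covmean = covmean.real
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)

ref = np.load("val2014.npz")          # ground truth statistics
gen = np.load("generated_stats.npz")  # hypothetical stats over ./output
print(frechet_distance(ref["mu"], ref["sigma"], gen["mu"], gen["sigma"]))
```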
Compute the CLIP score of the generated images against their prompts:
python clip/clip_score.py \
--subset-size 35000 \ # validation subset size; omit to score the full dataset
--shuffle-seed 2023 \ # the seed used for the random subset selection
--tsv-file captions.tsv \ # captions file
--image-folder ./output \ # folder with the generated images
--device cuda # device on which the CLIP model runs (cpu or cuda)
For more options:
python clip/clip_score.py --help
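The CLIP score of one image is the cosine similarity between the CLIP embeddings of the image and of its prompt, conventionally scaled by 100 and clamped at 0; the benchmark score averages this over the subset. Here is a minimal sketch using the transformers library; the openai/clip-vit-base-patch32 checkpoint is an assumption, and clip/clip_score.py may use a different CLIP variant:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image_path: str, caption: str) -> float:
    # Cosine similarity between image and text embeddings, scaled by 100
    # and clamped at 0 (the usual CLIP-score convention).
    inputs = processor(text=[caption], images=Image.open(image_path),
                       return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return max(0.0, 100.0 * (img @ txt.T).item())

print(clip_score("output/0.png", "A black Honda motorcycle parked in front of a garage."))
```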
Appendix A: To create the captions.tsv and val2014.npz files:
- Download the coco2014 validation set:
scripts/coco-2014-validation-download.sh
- Process the downloaded annotations (provided in JSON format; see the sketch after this list):
python process-coco-annotations.py \
--input-captions-file {PATH_TO_COCO_ANNOTATIONS_FILE} \ # Input annotations file
--output-tsv-file captions.tsv \ # Output annotations
--allow-duplicate-images # Pick one prompt per image
- Generate ground truth statistics:
python fid/fid_score.py \
--batch-size 1 \ # inception network batch size
--save-stats {COCO_2014_IMAGES} \ # Input folder with coco images
val2014.npz # Output file
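For reference, the annotation-processing step consumes the standard COCO captions JSON (annotations/captions_val2014.json), in which each annotation record carries an image_id, an annotation id, and a caption. Below is a minimal sketch of the transformation; the output column layout and the semantics of --allow-duplicate-images are assumptions, so consult process-coco-annotations.py for the authoritative behavior:

```python
import json
import pandas as pd

# Standard COCO captions layout: each image carries roughly five captions.
with open("annotations/captions_val2014.json") as f:
    coco = json.load(f)

rows = [(a["image_id"], a["id"], a["caption"]) for a in coco["annotations"]]
df = pd.DataFrame(rows, columns=["image_id", "id", "caption"])

# Reduce to one prompt per image (assumed to mirror the script's default;
# --allow-duplicate-images presumably keeps all prompts instead).
df = df.drop_duplicates(subset="image_id")
df.to_csv("captions.tsv", sep="\t", index=False)
```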