```shell
python run_gradio.py
```
- The demo also defaults to the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use `-m dev`.
- By default, the Gemma-2B model is loaded as a safety checker. To disable this feature and save GPU memory, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, only the INT4 DiT is loaded. Use `-p int4 bf16` to add a BF16 DiT for side-by-side comparison, or `-p bf16` to load only the BF16 model.
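For instance, a sketch combining the flags above to run the FLUX.1-dev demo with reduced GPU memory usage:

```shell
# FLUX.1-dev demo with the W4A16 text encoder and without the safety checker
python run_gradio.py -m dev --use-qencoder --no-safety-checker
```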
We provide a script, `generate.py`, that generates an image from a text prompt directly from the command line, similar to the demo. Simply run:

```shell
python generate.py --prompt "Your Text Prompt"
```
- The generated image will be saved as `output.png` by default. You can specify a different path using the `-o` or `--output-path` option.
- The script defaults to using the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use `-m dev`.
- By default, the script uses our INT4 model. To use the BF16 model instead, specify `-p bf16`.
- You can specify `--use-qencoder` to use our W4A16 text encoder.
- You can adjust the number of inference steps and guidance scale with `-t` and `-g`, respectively. For the FLUX.1-schnell model, the defaults are 4 steps and a guidance scale of 0; for the FLUX.1-dev model, the defaults are 50 steps and a guidance scale of 3.5.
- When using the FLUX.1-dev model, you also have the option to load a LoRA adapter with `--lora-name`. Available choices are `None`, `Anime`, `GHIBSKY Illustration`, `Realism`, `Children Sketch`, and `Yarn Art`, with the default set to `None`. You can also specify the LoRA weight with `--lora-weight`, which defaults to 1. An example run is sketched below.
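For example, a FLUX.1-dev run with a LoRA adapter (the prompt and output filename below are placeholders):

```shell
# FLUX.1-dev with the Yarn Art LoRA at its default weight of 1,
# using the dev defaults of 50 steps and guidance scale 3.5
python generate.py -m dev --prompt "a knitted sailboat on the sea" \
    --lora-name "Yarn Art" --lora-weight 1 -o yarn_sailboat.png
```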
To measure the latency of our INT4 models, use the following command:
```shell
python latency.py
```
- The script defaults to the INT4 FLUX.1-schnell model. To switch to FLUX.1-dev, use the `-m dev` option. For BF16 precision, add `-p bf16`.
- Adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively.
  - For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
  - For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
- By default, the script measures the end-to-end latency for generating a single image. To measure the latency of a single DiT forward step instead, use the `--mode step` flag.
- Specify the number of warmup and test runs using `--warmup_times` and `--test_times`. The defaults are 2 warmup runs and 10 test runs.
Below are the steps to reproduce the quality metrics reported in our paper. First, install a few additional packages needed for the image quality metrics:
```shell
pip install clean-fid torchmetrics image-reward clip datasets
```
Then generate images using both the original BF16 model and our INT4 model on the MJHQ and DCI datasets:
```shell
python evaluate.py -p int4
python evaluate.py -p bf16
```
- The commands above will generate images from FLUX.1-schnell on both datasets. Use `-m dev` to switch to FLUX.1-dev, or specify a single dataset with `-d MJHQ` or `-d DCI`.
- By default, generated images are saved to `results/$MODEL/$PRECISION`. Customize the output path using the `-o` option if desired.
- You can also adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively.
  - For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
  - For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
- To accelerate the generation process, you can distribute the workload across multiple GPUs. For instance, if you have $N$ GPUs, on GPU $i$ ($0 \le i < N$) you can add the options `--chunk-start $i --chunk-step $N`. This setup ensures each GPU handles a distinct portion of the workload, as sketched below.
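A minimal sketch of this setup, assuming the script runs on the default CUDA device so that `CUDA_VISIBLE_DEVICES` can pin each worker to one GPU:

```shell
# Shard INT4 image generation across N GPUs; each process handles a
# distinct chunk of the workload via --chunk-start / --chunk-step.
N=4
for i in $(seq 0 $((N - 1))); do
    CUDA_VISIBLE_DEVICES=$i python evaluate.py -p int4 --chunk-start $i --chunk-step $N &
done
wait  # block until all chunks have finished
```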
Finally, you can compute the metrics for the images with:
```shell
python get_metrics.py results/schnell/int4 results/schnell/bf16
```
Remember to replace the example paths with the actual paths to your image folders.
Notes:
- The script will calculate quality metrics (CLIP IQA, CLIP Score, Image Reward, FID) only for the first folder specified. Ensure the INT4 results folder is listed first.
- Similarity Metrics: If a second folder path is not provided, similarity metrics (LPIPS, PSNR, SSIM) will be skipped.
- Output File: Metric results are saved in `metrics.json` by default. Use `-o` to specify a custom output file if needed.
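For example (the output filename below is a placeholder):

```shell
python get_metrics.py results/schnell/int4 results/schnell/bf16 -o int4_metrics.json
```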