# Nunchaku INT4 FLUX.1 Models

## Text-to-Image Gradio Demo

```shell
python run_gradio.py
```

  • The demo defaults to the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use -m dev.
  • By default, the Gemma-2B model is loaded as a safety checker. To disable this feature and save GPU memory, use --no-safety-checker.
  • To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying --use-qencoder.
  • By default, only the INT4 DiT is loaded. Use -p int4 bf16 to add a BF16 DiT for side-by-side comparison, or -p bf16 to load only the BF16 model.
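Combining these flags, one possible low-memory invocation of the demo (illustrative, not a recommended configuration) might look like:

```shell
# FLUX.1-dev with the W4A16 text encoder, only the INT4 DiT loaded,
# and the Gemma-2B safety checker disabled to reduce GPU memory usage.
python run_gradio.py -m dev --use-qencoder --no-safety-checker
```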

## Command Line Inference

We provide a script, generate.py, that generates an image from a text prompt directly from the command line, similar to the demo. Simply run:

```shell
python generate.py --prompt "Your Text Prompt"
```
  • The generated image will be saved as output.png by default. You can specify a different path using the -o or --output-path options.

  • The script defaults to using the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use -m dev.

  • By default, the script uses our INT4 model. To use the BF16 model instead, specify -p bf16.

  • You can specify --use-qencoder to use our W4A16 text encoder.

  • You can adjust the number of inference steps and guidance scale with -t and -g, respectively. For the FLUX.1-schnell model, the defaults are 4 steps and a guidance scale of 0; for the FLUX.1-dev model, the defaults are 50 steps and a guidance scale of 3.5.

  • When using the FLUX.1-dev model, you also have the option to load a LoRA adapter with --lora-name. Available choices are None, Anime, GHIBSKY Illustration, Realism, Children Sketch, and Yarn Art, with the default set to None. You can also specify the LoRA weight with --lora-weight, which defaults to 1.
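Putting these options together, a sketch of a full FLUX.1-dev run (the prompt, LoRA choice, weight, and output name below are illustrative):

```shell
# 50 steps and guidance 3.5 (the FLUX.1-dev defaults, shown explicitly),
# with the GHIBSKY Illustration LoRA applied at weight 0.8.
python generate.py -m dev --prompt "a cabin by a mountain lake" \
    -t 50 -g 3.5 --lora-name "GHIBSKY Illustration" --lora-weight 0.8 \
    -o cabin.png
```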

## Latency Benchmark

To measure the latency of our INT4 models, use the following command:

```shell
python latency.py
```
  • The script defaults to the INT4 FLUX.1-schnell model. To switch to FLUX.1-dev, use the -m dev option. For BF16 precision, add -p bf16.
  • Adjust the number of inference steps and the guidance scale using -t and -g, respectively.
    • For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
    • For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
  • By default, the script measures the end-to-end latency for generating a single image. To measure the latency of a single DiT forward step instead, use the --mode step flag.
  • Specify the number of warmup and test runs using --warmup_times and --test_times. The defaults are 2 warmup runs and 10 test runs.
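For instance, to measure per-step DiT latency for the INT4 FLUX.1-dev model with more runs than the defaults (the run counts here are illustrative):

```shell
# Single DiT forward-step latency, averaged over 20 test runs
# after 5 warmup runs.
python latency.py -m dev --mode step --warmup_times 5 --test_times 20
```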

## Quality Results

Below are the steps to reproduce the quality metrics reported in our paper. First, install the additional packages required for the image-quality metrics:

```shell
pip install clean-fid torchmetrics image-reward clip datasets
```

Then generate images using both the original BF16 model and our INT4 model on the MJHQ and DCI datasets:

```shell
python evaluate.py -p int4
python evaluate.py -p bf16
```
  • The commands above will generate images from FLUX.1-schnell on both datasets. Use -m dev to switch to FLUX.1-dev, or specify a single dataset with -d MJHQ or -d DCI.
  • By default, generated images are saved to results/$MODEL/$PRECISION. Customize the output path using the -o option if desired.
  • You can also adjust the number of inference steps and the guidance scale using -t and -g, respectively.
    • For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
    • For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
  • To accelerate generation, you can distribute the workload across multiple GPUs: with $N$ GPUs, add the options --chunk-start $i --chunk-step $N on GPU $i$ (where $0 \le i < N$), so that each GPU handles a distinct portion of the workload (see the sketch below).
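A minimal sketch of this multi-GPU split for 4 GPUs, assuming each process can be pinned to its GPU with CUDA_VISIBLE_DEVICES (the script's own GPU-selection behavior is not documented here):

```shell
# GPU i generates chunks i, i+N, i+2N, ... of the INT4 results.
N=4
for i in $(seq 0 $((N - 1))); do
  CUDA_VISIBLE_DEVICES=$i python evaluate.py -p int4 \
    --chunk-start $i --chunk-step $N &
done
wait  # block until all background jobs finish
```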

Finally, compute the metrics for the generated images with:

```shell
python get_metrics.py results/schnell/int4 results/schnell/bf16
```

Remember to replace the example paths with the actual paths to your image folders.

Notes:

  • Quality Metrics: The script calculates quality metrics (CLIP IQA, CLIP Score, Image Reward, FID) only for the first folder specified, so make sure the INT4 results folder is listed first.
  • Similarity Metrics: If a second folder path is not provided, similarity metrics (LPIPS, PSNR, SSIM) will be skipped.
  • Output File: Metric results are saved in metrics.json by default. Use -o to specify a custom output file if needed.
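For example, to compute both the quality and similarity metrics for the sample paths above and write them to a custom file (the output filename is illustrative):

```shell
# Quality metrics are computed for the first folder (INT4); the second
# folder (BF16) serves as the reference for LPIPS/PSNR/SSIM.
python get_metrics.py results/schnell/int4 results/schnell/bf16 -o metrics_schnell.json
```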