add zh translation for sd3-5 #147

Open: wants to merge 2 commits into main

10 changes: 10 additions & 0 deletions zh/_blog.yml
@@ -2074,3 +2074,13 @@
  - gradio
  - spaces
  - open-source

- local: sd3-5
  title: "🧨 Diffusers welcomes Stable Diffusion 3.5 Large"
  author: diffusers
  thumbnail: /blog/assets/sd3-5/thumbnail.png
  date: October 22, 2024
  tags:
  - diffusers
  - guide
  - sd3-5
228 changes: 228 additions & 0 deletions zh/sd3-5.md
@@ -0,0 +1,228 @@
---
title: 🧨 Diffusers welcomes Stable Diffusion 3.5 Large
thumbnail: /blog/assets/sd3-5/thumbnail.png
authors:
- user: YiYiXu
- user: a-r-r-o-w
- user: dn6
- user: sayakpaul
- user: linoyts
- user: multimodalart
- user: OzzyGT
- user: ariG23498
translators:
- user: hugging-hoi2022
---

# 🧨 Diffusers welcomes Stable Diffusion 3.5 Large

Stable Diffusion 3.5 is an improved version of [Stable Diffusion 3](https://huggingface.co/blog/sd3). It is now available on the Hugging Face Hub and runs directly with 🧨 Diffusers.

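This release comes with [two checkpoints](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838):

- A large (8B) model
- A timestep-distilled version of the large model that generates images in just a few inference steps

This post shows how to use Stable Diffusion 3.5 (SD3.5) with Diffusers, covering both inference and training.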

## Table of contents

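- [Architectural changes](#architectural-changes)
- [Using SD3.5 with Diffusers](#using-sd35-with-diffusers)
- [Performing inference with quantization](#performing-inference-with-quantization)
- [Training LoRAs with SD3.5 Large with quantization](#training-loras-with-sd35-large-with-quantization)
- [Using single-file loading with the SD3.5 Transformer](#using-single-file-loading-with-the-sd35-transformer)
- [Important links](#important-links)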

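## Architectural changes

The transformer in SD3.5 Large is largely the same as the one in SD3 Medium, with the following changes:

- QK normalization: [QK normalization](https://research.google/blog/scaling-vision-transformers-to-22-billion-parameters/) is now standard practice for training large transformer models, and SD3.5 Large uses it as well.
- Dual attention layers: the MMDiT blocks in SD3.5 Large use two attention layers for each modality stream, where earlier versions use a single one.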
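
The following minimal pure-Python sketch illustrates the first change. It runs single-head attention with and without QK normalization. It assumes RMS normalization without a learned scale, which is one common choice. The helper names (`rms_norm`, `softmax`, `attention`) are hypothetical, and this is not the Diffusers implementation. After RMS normalization, each query and key has a Euclidean norm just under √d. As a result, every scaled logit stays below √d in absolute value, however large the raw activations grow.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Rescale `vec` so its root-mean-square value is ~1 (no learned scale)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, qk_norm=True):
    """Single-head scaled dot-product attention on lists of vectors.

    Returns (outputs, logits). With qk_norm=True, queries and keys are
    RMS-normalized first, so |logit| < sqrt(d) for head dimension d.
    """
    d = len(queries[0])
    if qk_norm:
        queries = [rms_norm(q) for q in queries]
        keys = [rms_norm(k) for k in keys]
    outputs, logits = [], []
    for q in queries:
        row = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(row)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
        logits.append(row)
    return outputs, logits


# Large-magnitude activations: raw logits explode, normalized ones stay small.
q = [[100.0, -50.0, 30.0, 10.0]]
k = [[80.0, 20.0, -40.0, 5.0], [-60.0, 90.0, 10.0, -20.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
print(attention(q, k, v, qk_norm=False)[1])
print(attention(q, k, v, qk_norm=True)[1])
```

In a real transformer the normalization runs per head on the projected queries and keys. The toy version only shows why bounded logits keep the softmax from saturating.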

The text encoders, the image VAE, and the noise scheduler are unchanged from SD3 Medium. For more details on SD3, see [the paper](https://arxiv.org/abs/2403.03206).

## Using SD3.5 with Diffusers

First, make sure you have the latest version of Diffusers installed:

```bash
pip install -U diffusers
```

The model is gated. Before you can use it, go to the [Stable Diffusion 3.5 Large page on Hugging Face](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), fill in the form, and accept the terms. Then log in to your Hugging Face account so you can access the model:

```bash
huggingface-cli login
```

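The following snippet downloads the 8B-parameter SD3.5 Large model in `torch.bfloat16` precision. This matches the format of the original checkpoint published by Stability AI, and it is the recommended precision for inference.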

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")
```
![hello_world_cat](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hello_world_cat.png)

The release also includes a **timestep-distilled** model. It needs no classifier-free guidance at inference time and generates images in just a few steps (typically 4 to 8).

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    num_inference_steps=4,
    height=1024,
    width=1024,
    guidance_scale=1.0,
).images[0]

image.save("sd3_hello_world.png")
```
![hello_world_cat_2](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hello_world_cat_2.png)
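
The turbo example passes `guidance_scale=1.0` for a specific reason. Classifier-free guidance combines an unconditional noise prediction and a conditional one as `uncond + s * (cond - uncond)`. The pure-Python sketch below shows that at `s = 1` this formula reduces to the conditional prediction alone, so the unconditional branch is not needed. The function names are hypothetical. In Diffusers, the SD3 pipeline enables classifier-free guidance only when `guidance_scale > 1`, which is how the distilled model skips it. The second helper counts noise predictions per image under that rule.

```python
def apply_cfg(uncond, cond, guidance_scale):
    """Classifier-free guidance, element-wise: uncond + s * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


def noise_predictions_per_image(num_inference_steps, guidance_scale):
    """Two predictions per step (conditional + unconditional) when s > 1, else one."""
    per_step = 2 if guidance_scale > 1 else 1
    return num_inference_steps * per_step


uncond = [0.1, -0.2, 0.3]
cond = [0.4, 0.0, -0.1]

# s = 1 reproduces the conditional prediction (up to float rounding).
print(apply_cfg(uncond, cond, 1.0))
# s > 1 pushes the result further away from the unconditional prediction.
print(apply_cfg(uncond, cond, 4.5))

# Settings from the two examples above: SD3.5 Large vs. SD3.5 Large Turbo.
print(noise_predictions_per_image(40, 4.5), noise_predictions_per_image(4, 1.0))
```

Diffusers batches the conditional and unconditional inputs into a single forward pass. "Predictions" here therefore means predictions computed, not separate transformer calls.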

The memory optimizations described in the [SD3 blog post](https://huggingface.co/blog/zh/sd3) and the [official Diffusers documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) also work with SD3.5. SD3.5 Large is much larger than SD3 Medium, so these optimizations matter even more when running it on consumer hardware.
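
A back-of-the-envelope estimate shows why memory is a concern. The sketch below counts only the memory needed to store the transformer weights, with the model rounded to 8B parameters. The helper names are hypothetical. It ignores activations, the three text encoders, the VAE, and the per-block constants that 4-bit formats keep, so actual usage is higher.

```python
# Bytes used per weight for each storage format.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "nf4": 0.5,  # 4 bits per weight; per-block quantization constants ignored
}


def weight_memory_gb(num_params, dtype):
    """Memory (GB, 1 GB = 1e9 bytes) needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 8e9  # SD3.5 Large transformer, rounded to 8B parameters

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(NUM_PARAMS, dtype):5.1f} GB")
```

That works out to roughly 32 GB, 16 GB, and 4 GB. The `bfloat16` weights alone already exceed the memory of many consumer GPUs.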

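## Performing inference with quantization

Diffusers natively supports quantization with [`bitsandbytes`](https://github.com/bitsandbytes-foundation/bitsandbytes), which reduces memory usage even further.

First, install the required libraries: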

```bash
pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
```

Next, load the transformer in [NF4 precision](https://huggingface.co/blog/4bit-transformers-bitsandbytes):

```python
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
import torch

model_id = "stabilityai/stable-diffusion-3.5-large"
# Store the transformer weights in 4-bit NF4; compute in bfloat16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```
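
Conceptually, 4-bit quantization stores each block of weights as small integer codes plus one floating-point scale derived from the block's absolute maximum. `bnb_4bit_compute_dtype` sets the precision the weights are dequantized to for computation. The sketch below illustrates blockwise absmax quantization using a simplified *uniform* symmetric grid of 15 levels. Real NF4 differs: it uses 16 non-uniform levels matched to a normal distribution, and `bitsandbytes` implements everything in optimized kernels. The block size of 64 and the helper names are illustrative assumptions.

```python
import random


def quantize_blockwise(weights, block_size=64, levels=7):
    """Absmax-quantize `weights` block by block.

    Each weight becomes an integer code in [-levels, levels] (15 values, so it
    fits in 4 bits); each block keeps one float scale = absmax / levels.
    """
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # guard all-zero blocks
        scale = absmax / levels
        scales.append(scale)
        codes.extend(round(w / scale) for w in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=64):
    """Recover approximate float weights from codes and per-block scales."""
    return [code * scales[i // block_size] for i, code in enumerate(codes)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # toy "layer"
codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)

print(len(codes), "codes,", len(scales), "scales")
print("max abs error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

Because every code is rounded to the nearest level, each restored weight is off by at most half of its block's scale.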

Now we can run inference:

```python
from diffusers import StableDiffusion3Pipeline

pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16,
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipeline(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")
```
![happy_hippo](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hippo.png)

For the other options available in `BitsAndBytesConfig`, see the [official documentation](https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes).

You can also load a transformer that was already quantized with the same `nf4_config`. This is especially useful on machines with little RAM. See [this Colab Notebook](https://colab.research.google.com/drive/1nK5hOCPY3RoGi0yqddscGdKvo1r-rHqE?usp=sharing) for a complete example.

## Training LoRAs with SD3.5 Large with quantization

With `bitsandbytes` and `peft`, you can fine-tune large models like SD3.5 Large on consumer GPUs with 24 GB of VRAM. Our existing [SD3 training script](https://huggingface.co/blog/zh/sd3#%E4%BD%BF%E7%94%A8-dreambooth-%E5%92%8C-lora-%E8%BF%9B%E8%A1%8C%E5%BE%AE%E8%B0%83) can train LoRAs with the following command:

```bash
accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3.5-large" \
  --dataset_name="Norod78/Yarn-art-style" \
  --output_dir="yart_art_sd3-5_lora" \
  --mixed_precision="bf16" \
  --instance_prompt="Frog, yarn art style" \
  --caption_column="text" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=4e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=700 \
  --rank=16 \
  --seed="0" \
  --push_to_hub
```
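
`--rank=16` sets the rank `r` of the low-rank update that LoRA learns. The pure-Python sketch below shows what this means for a single linear layer. The frozen weight `W` is augmented with `(alpha / r) * B @ A`, and only `A` and `B` are trained. The sketch follows the usual LoRA convention of initializing `B` to zero, so training starts exactly from the base model. The helper names, toy sizes, and the hypothetical 2048×2048 layer in the parameter count are illustrative assumptions, not values taken from the training script.

```python
import random


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrices given as lists of rows: (n x k) @ (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_forward(x, W, A, B, alpha, r):
    """y = x W^T + (alpha / r) * x A^T B^T.

    W: (d_out, d_in) frozen base weight.
    A: (r, d_in) and B: (d_out, r) are the trainable LoRA factors.
    """
    base = matmul(x, transpose(W))
    update = matmul(matmul(x, transpose(A)), transpose(B))
    scale = alpha / r
    return [[bv + scale * uv for bv, uv in zip(brow, urow)]
            for brow, urow in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    return r * (d_in + d_out)


random.seed(0)
d_in, d_out, r = 8, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [[random.gauss(0, 1) for _ in range(d_in)]]

# With B = 0 the LoRA layer matches the frozen layer exactly.
print(lora_forward(x, W, A, B, alpha=r, r=r) == matmul(x, transpose(W)))

# Rank-16 adapter vs. full fine-tuning of a hypothetical 2048 x 2048 layer.
print(lora_param_count(2048, 2048, 16), 2048 * 2048)
```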

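Adding quantization to training requires a few changes:

- Initialize the script's `transformer` with a quantization config, or load an already-quantized checkpoint directly.
- Prepare the model with `prepare_model_for_kbit_training()` from `peft`.
- Leave the rest of the script unchanged (`peft` has strong support for `bitsandbytes`).

See [this gist](https://gist.github.com/sayakpaul/05afd428bc089b47af7c016e42004527) for a complete example.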

## Using single-file loading with the SD3.5 Transformer

You can also initialize the SD3.5 transformer from the original checkpoint files published by Stability AI, using the `from_single_file` method:

```python
import torch
from diffusers import SD3Transformer2DModel, StableDiffusion3Pipeline

# Load the transformer from the original single-file checkpoint in the SD3.5 Large repo.
transformer = SD3Transformer2DModel.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
image = pipe("a cat holding a sign that says hello world").images[0]
image.save("sd35.png")
```

## Important links

- SD3.5 Large [collection](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838) on the Hugging Face Hub
- Diffusers [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) for SD3.5
- [Colab Notebook](https://colab.research.google.com/drive/1nK5hOCPY3RoGi0yqddscGdKvo1r-rHqE?usp=sharing) for running SD3.5 inference with quantization
- LoRA [training code](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sd3.md)
- Stable Diffusion 3 [paper](https://arxiv.org/abs/2403.03206)
- Stable Diffusion 3 [blog post (Chinese)](https://huggingface.co/blog/zh/sd3)

_Acknowledgements: thanks to [Daniel Frank](https://www.pexels.com/@fr3nks/) for the cover image, and to [Pedro Cuenca](https://huggingface.co/pcuenq) and [Tom Aarsen](https://huggingface.co/tomaarsen) for reviewing this post._