huggingface-cn · Hoi2022 · Oct 25, 2024 · Oct 25, 2024
diff --git a/zh/_blog.yml b/zh/_blog.yml
@@ -2074,3 +2074,13 @@
     - gradio
     - spaces
     - open-source
+
+- local: sd3-5
+  title: "欢迎 Stable Diffusion 3.5 Large 加入 🧨 Diffusers"
+  author: diffusers
+  thumbnail: /blog/assets/sd3-5/thumbnail.png
+  date: October 22, 2024
+  tags:
+    - diffusers
+    - guide
+    - sd3-5
diff --git a/zh/sd3-5.md b/zh/sd3-5.md
@@ -0,0 +1,228 @@
+---
+title: 欢迎 Stable Diffusion 3.5 Large 加入 🧨 Diffusers
+thumbnail: /blog/assets/sd3-5/thumbnail.png
+authors:
+- user: YiYiXu
+- user: a-r-r-o-w
+- user: dn6
+- user: sayakpaul
+- user: linoyts
+- user: multimodalart
+- user: OzzyGT
+- user: ariG23498
+translators:
+- user: hugging-hoi2022
+---
+
+# 欢迎 Stable Diffusion 3.5 Large 加入 🧨 Diffusers
+
+作为 [Stable Diffusion 3](https://huggingface.co/blog/sd3) 的改进版本，Stable Diffusion 3.5 如今已在 Hugging Face Hub 中可用，并可以直接使用 🧨 Diffusers 中的代码运行。
+
+本次发布包含 [两套模型参数](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838):
+
+- 一个大型的模型（large，8B）
+- 该模型经过时间步蒸馏的版本，仅需几步推理即可生成图片
+
+在本文中，我们将介绍如何在 Diffusers 中使用 Stable Diffusion 3.5（SD3.5），涵盖推理和训练两方面内容。
+
+## 目录
+
+- [模型结构改进](#模型结构改进)
+- [在 Diffusers 中使用 SD3.5](#在 Diffusers 中使用 SD3.5)
+- [在推理过程中使用量化策略](#在推理过程中使用量化策略)
+- [在 SD3.5-large 上使用量化策略训练 LoRA](#在 SD3.5-large 上使用量化策略训练 LoRA)
+- [使用 single-file 方法加载 SD3.5 的 Transformer 模型](#使用 single-file 方法加载 SD3.5 的 Transformer 模型)
+- [重要链接](#重要链接)
+
+## 模型结构改进
+
+对于 SD3.5-large 使用的 transformer 模型，其结构基本和 SD3-medium 里的相同，但有以下更改：
+
+- QK normalization：对于训练大型的 Transformer 模型，使用 [QK normalization](https://research.google/blog/scaling-vision-transformers-to-22-billion-parameters/) 已经成为标准做法，所以 SD3.5-large 也不例外。
+- 双注意力层：在 MMDiT 结构中，文本和图像两个模态都在使用同一个注意力层；而 SD3.5-large 则使用了两个注意力层。
+
+除此之外，文本编码器（text encoder）、图像的变分自编码器（VAE）以及噪声调度器（noise scheduler）均和 SD3-medium 保持一致。如果对 SD3 感兴趣，可以参考 [这篇论文](https://arxiv.org/abs/2403.03206)。
+
+## 在 Diffusers 中使用 SD3.5
+
+首先你需要确保安装的 Diffusers 是最新版本：
+
+```bash
+pip install -U diffusers
+```
+
+由于模型存在访问限制，你还需要到 [Hugging Face 上 Stable Diffusion 3.5 Large 的页面](https://huggingface.co/stabilityai/stable-diffusion-3.5-large)填写表格并同意相关条款。完成后你还需要登陆账号，才能访问到模型。使用如下方法登陆 Hugging Face 账号：
+
+```bash
+huggingface-cli login
+```
+
+下列代码将下载 SD3.5 的 8B 模型。下载的模型使用 `torch.bfloat16` 精度，这是 Stability AI 的原版格式，也推荐使用该精度进行推理。
+
+```python
+import torch
+from diffusers import StableDiffusion3Pipeline
+
+pipe = StableDiffusion3Pipeline.from_pretrained(
+	"stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
+).to("cuda")
+
+image = pipe(
+    prompt="a photo of a cat holding a sign that says hello world",
+    negative_prompt="",
+    num_inference_steps=40,
+    height=1024,
+    width=1024,
+    guidance_scale=4.5,
+).images[0]
+
+image.save("sd3_hello_world.png")
+```
+![hello_world_cat](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hello_world_cat.png)
+
+本次发布也包含了一个 **“时间步蒸馏”** 的模型，该模型推理时无需 classifier-free guidance，可在短短几步推理内生成图片（通常是 4 到 8 步）。
+
+```python
+import torch
+from diffusers import StableDiffusion3Pipeline
+
+pipe = StableDiffusion3Pipeline.from_pretrained(
+	"stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16
+).to("cuda")
+
+image = pipe(
+    prompt="a photo of a cat holding a sign that says hello world",
+    num_inference_steps=4,
+    height=1024,
+    width=1024,
+    guidance_scale=1.0,
+).images[0]
+
+image.save("sd3_hello_world.png")
+```
+![hello_world_cat_2](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hello_world_cat_2.png)
+
+此外，在 [SD3 博客](https://huggingface.co/blog/zh/sd3) 和 [官方 Diffusers 文档](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) 中出现过的优化策略在 SD3.5 中都可使用。 这些策略都对推理时显存优化做了大量工作。由于SD3.5-large 是一个比 SD3-medium 大得多的模型，显存优化对于消费级场景下的使用显得尤为重要。
+
+## 在推理过程中使用量化策略
+
+Diffusers 原生支持使用 [`bitsandbytes`](https://github.com/bitsandbytes-foundation/bitsandbytes) 进行量化，这可以进一步降低显存使用。 
+
+首先，我们需要安装必要的库：
+
+```bash
+pip install -Uq git+https://github.com/huggingface/transformers@main
+pip install -Uq bitsandbytes
+```
+
+接下来加载[“NF4”精度](https://huggingface.co/blog/4bit-transformers-bitsandbytes) 的模型：
+
+```python
+from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
+import torch
+
+model_id = "stabilityai/stable-diffusion-3.5-large"
+nf4_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16
+)
+model_nf4 = SD3Transformer2DModel.from_pretrained(
+    model_id,
+    subfolder="transformer",
+    quantization_config=nf4_config,
+    torch_dtype=torch.bfloat16
+)
+```
+
+然后我们就能进行推理了：
+
+```python
+from diffusers import StableDiffusion3Pipeline
+
+pipeline = StableDiffusion3Pipeline.from_pretrained(
+    model_id, 
+    transformer=model_nf4,
+    torch_dtype=torch.bfloat16
+)
+pipeline.enable_model_cpu_offload()
+
+prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree.  As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
+image = pipeline(
+    prompt=prompt,
+    negative_prompt="",
+    num_inference_steps=28,
+    guidance_scale=4.5,
+    max_sequence_length=512,
+).images[0]
+image.save("whimsical.png")
+```
+![happy_hippo](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hippo.png)
+
+如果你想调节 `BitsAndBytesConfig` 中其它配置，你可以在[这里](https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes)参考官方文档。 
+
+直接载入相同 `nf4_config` 配置的已量化模型也是可以的，这对 RAM 较低的机器来说非常实用，读者可以在[这里的 Colab Notebook](https://colab.research.google.com/drive/1nK5hOCPY3RoGi0yqddscGdKvo1r-rHqE?usp=sharing) 来获取完整示例。
+
+## 在 SD3.5-large 上使用量化策略训练 LoRA
+借助 `bitsandbytes` 和 `peft`，我们可以在消费级显卡（24GB 显存）上微调像 SD3.5 这样的大模型。我们提供的 [SD3 训练脚本](https://huggingface.co/blog/zh/sd3#%E4%BD%BF%E7%94%A8-dreambooth-%E5%92%8C-lora-%E8%BF%9B%E8%A1%8C%E5%BE%AE%E8%B0%83)可以在这里用来训练 LoRA，使用如下命令即可：
+
+```bash
+accelerate launch train_dreambooth_lora_sd3.py \
+  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3.5-large"  \
+  --dataset_name="Norod78/Yarn-art-style" \
+  --output_dir="yart_art_sd3-5_lora" \
+  --mixed_precision="bf16" \
+  --instance_prompt="Frog, yarn art style" \
+  --caption_column="text"\
+  --resolution=768 \
+  --train_batch_size=1 \
+  --gradient_accumulation_steps=1 \
+  --learning_rate=4e-4 \
+  --report_to="wandb" \
+  --lr_scheduler="constant" \
+  --lr_warmup_steps=0 \
+  --max_train_steps=700 \
+  --rank=16 \
+  --seed="0" \
+  --push_to_hub
+```
+
+但如果想在训练中加入量化，还需要调整一些地方，这包括以下几个大概方向：
+
+- 在初始化代码中的 `transformer` 时，加上量化配置，或者直接加载量化过的模型。
+- 然后使用 `peft` 中的 `prepare_model_for_kbit_training()` 函数对模型进行准备操作。
+- 其它步骤和原代码保持一致即可（感谢 `peft` 对 `bitsandbytes` 的强力支持）。
+
+读者可参考 [这里](https://gist.github.com/sayakpaul/05afd428bc089b47af7c016e42004527) 的完整示例。
+
+## 使用 single-file 方法加载 SD3.5 的 Transformer 模型
+
+Stable Diffusion 3.5 的 transformer 模型还可以使用 Stability AI 发布的原生参数文件来进行初始化 。 这里需要使用 `from_single_file` 方法：
+
+```python
+import torch
+from diffusers import SD3Transformer2DModel, StableDiffusion3Pipeline
+
+transformer = SD3Transformer2DModel.from_single_file(
+    "https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo/blob/main/sd3.5_large.safetensors",
+    torch_dtype=torch.bfloat16,
+)
+pipe = StableDiffusion3Pipeline.from_pretrained(
+    "stabilityai/stable-diffusion-3.5-large",
+    transformer=transformer,
+    torch_dtype=torch.bfloat16,
+)
+pipe.enable_model_cpu_offload()
+image = pipe("a cat holding a sign that says hello world").images[0]
+image.save("sd35.png") 
+```
+
+### 重要链接
+- SD3.5-large 在 Hugging Face Hub 上的[模型集合](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838)
+- Diffusers 中 SD3.5 的 [官方文档](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) 
+- 用来运行 SD3.5 量化推理的 [Colab Notebook](https://colab.research.google.com/drive/1nK5hOCPY3RoGi0yqddscGdKvo1r-rHqE?usp=sharing)
+- LoRA [训练代码](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sd3.md)
+- Stable Diffusion 3 [官方论文](https://arxiv.org/abs/2403.03206)
+- Stable Diffusion 3 [中文博客](https://huggingface.co/blog/zh/sd3)
+
+_声明：感谢 [Daniel Frank](https://www.pexels.com/@fr3nks/) 为本博客提供了封面图，感谢 [Pedro Cuenca](https://huggingface.co/pcuenq) 和 [Tom Aarsen](https://huggingface.co/tomaarsen) 对本文的审校。_