AssertionError: You can't use same Accelerator() instance with multiple models when using DeepSpeed #241

Open
dizhenx opened this issue Jul 10, 2023 · 0 comments
Labels
bug Something isn't working

Comments

dizhenx commented Jul 10, 2023

Describe the bug

With the same configuration, training works fine on an 8 GB GPU, but when I run the script to fine-tune the text encoder together with the UNet, the error below occurs.
The config file is as follows:
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_config:
  dynamo_backend: FX2TRT
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
The error message is as follows:
You can't use same Accelerator() instance with multiple models when using DeepSpeed
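
For context, here is a minimal standalone sketch of the call pattern that trips this assertion. The toy Linear modules and variable names are placeholders standing in for the UNet and text encoder that train_dreambooth.py prepares; this is only an illustration, not the script itself.

import torch
from accelerate import Accelerator

# Launched via `accelerate launch` with the DeepSpeed config shown above.
accelerator = Accelerator()

model_a = torch.nn.Linear(8, 8)  # stands in for the UNet
model_b = torch.nn.Linear(8, 8)  # stands in for the text encoder
optimizer = torch.optim.AdamW(
    list(model_a.parameters()) + list(model_b.parameters()), lr=2e-6
)

# With distributed_type: DEEPSPEED, passing two nn.Module instances to a single
# prepare() call raises the AssertionError reported in this issue.
model_a, model_b, optimizer = accelerator.prepare(model_a, model_b, optimizer)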

Reproduction

accelerate launch --config_file /mnt/data/huggingface/accelerate/default_config1.yaml train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_checkpointing \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=20

Logs

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/data/creative/diffusers/examples/dreambooth/train_dreambooth.py:869 in <module>             │
│                                                                                                  │
│   866                                                                                            │
│   867 if __name__ == "__main__":                                                                 │
│   868 │   args = parse_args()                                                                    │
│ ❱ 869 │   main(args)                                                                             │
│   870                                                                                            │
│                                                                                                  │
│ /mnt/data/creative/diffusers/examples/dreambooth/train_dreambooth.py:684 in main                 │
│                                                                                                  │
│   681 │   )                                                                                      │
│   682 │                                                                                          │
│   683 │   if args.train_text_encoder:                                                            │
│ ❱ 684 │   │   unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prep   │
│   685 │   │   │   unet, text_encoder, optimizer, train_dataloader, lr_scheduler                  │
│   686 │   │   )                                                                                  │
│   687 │   else:                                                                                  │
│                                                                                                  │
│ /mnt/data/creative/miniconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/accelerator. │
│ py:1148 in prepare                                                                               │
│                                                                                                  │
│   1145 │   │   │   │   if isinstance(obj, torch.nn.Module):                                      │
│   1146 │   │   │   │   │   model_count += 1                                                      │
│   1147 │   │   │   if model_count > 1:                                                           │
│ ❱ 1148 │   │   │   │   raise AssertionError(                                                     │
│   1149 │   │   │   │   │   "You can't use same `Accelerator()` instance with multiple models wh  │
│   1150 │   │   │   │   )                                                                         │
│   1151                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: You can't use same `Accelerator()` instance with multiple models when using DeepSpeed
[10:57:19] ERROR    failed (exitcode: 1) local_rank: 0 (pid: 1146) of binary: /mnt/data/creative/miniconda3/envs/diffusers/bin/python3.9
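
One direction I am considering, sketched under the assumption that only a single model may go through prepare() under DeepSpeed, is to wrap the UNet and text encoder in one nn.Module. The CombinedModel wrapper below is illustrative and not a confirmed fix; the surrounding variables (unet, text_encoder, optimizer, train_dataloader, lr_scheduler, accelerator) are assumed to be the ones created earlier in train_dreambooth.py.

import torch

class CombinedModel(torch.nn.Module):
    # Illustrative wrapper so that accelerator.prepare() only sees one model.
    def __init__(self, unet, text_encoder):
        super().__init__()
        self.unet = unet
        self.text_encoder = text_encoder

combined = CombinedModel(unet, text_encoder)
combined, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    combined, optimizer, train_dataloader, lr_scheduler
)
# The training loop would then reach the submodules through the prepared wrapper,
# e.g. accelerator.unwrap_model(combined).unet, instead of preparing the UNet and
# text encoder as two separate models.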

System Info

  • diffusers version: 0.15.0.dev0
  • Platform: Linux-3.10.0-1160.92.1.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.9.2
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Huggingface_hub version: 0.16.2
  • Transformers version: 4.27.1
  • Accelerate version: 0.20.3
  • xFormers version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: yes