
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #23

Open
brian6091 opened this issue Dec 29, 2022 · 4 comments

brian6091 commented Dec 29, 2022

Running with LoRA restricted to the text_encoder, with no unet training, produces the error in the title.
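
The failure surfaces at the accelerator.backward(loss) call in finetune.py (see the tracebacks below). As a sketch (not code from the repo), a pre-backward guard makes the symptom easier to spot than the RuntimeError from autograd:

```python
# Hypothetical guard before the backward call; variable names follow the
# tracebacks below, the actual loop in finetune.py may differ.
if loss.grad_fn is None or not loss.requires_grad:
    raise RuntimeError(
        f"loss is detached from the graph (is_leaf={loss.is_leaf}, "
        f"requires_grad={loss.requires_grad}); no trained parameter feeds into it"
    )
accelerator.backward(loss)
```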

brian6091 commented Dec 29, 2022

Manually setting loss.requires_grad=True (fp16 mixed precision) reveals this:

Steps: 0% 0/1000 [00:00<?, ?it/s]
Before
loss= tensor(0.6998, device='cuda:0')
loss.requires_grad= False
leaf= True
grad_fn= None
After
loss= tensor(0.6998, device='cuda:0', requires_grad=True)
loss.requires_grad= True
Traceback (most recent call last):
File "/content/Dreambooth/finetune.py", line 697, in
main(args)
File "/content/Dreambooth/finetune.py", line 619, in main
optimizer.step()
File "/usr/local/lib/python3.8/dist-packages/accelerate/optimizer.py", line 134, in step
self.scaler.step(self.optimizer, closure)
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 339, in step
assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.

However, in fp32 (mixed_precision="no") it runs, but loss.requires_grad is False on each iteration:

Steps: 10% 98/1000 [02:23<21:06, 1.40s/it, GPU=9038, Loss/pred=0.1, Loss/prior=0.0045, Loss/total=0.105, lr/text=4.9e-5]
Before
loss= tensor(0.4947, device='cuda:0')
loss.requires_grad= False
leaf= True
grad_fn= None
After
loss= tensor(0.4947, device='cuda:0', requires_grad=True)
loss.requires_grad= True
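
So the fp32 run only looks healthy: without a scaler, accelerator.backward() presumably falls through to a plain loss.backward(), and calling backward() on a leaf that was forced to require grad is legal but produces no parameter gradients, so nothing actually trains. In isolation:

```python
import torch

w = torch.nn.Parameter(torch.randn(3))  # stands in for a model parameter
loss = torch.tensor(0.5)                # detached leaf, like the logged loss
loss.requires_grad_(True)
loss.backward()                         # no error once the GradScaler is out of the way
print(loss.grad)                        # tensor(1.) -- only the loss itself gets a grad
print(w.grad)                           # None -- the model never receives gradients
```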

brian6091 commented:

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: [embeddings]
train_text_submodule: [token_embedding]

lora_unet_layer: null
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: true

Steps: 0% 48/10000 [01:14<3:21:28, 1.21s/it, GPU=10718, Loss/pred=0.368, Loss/prior=0.0126, Loss/total=0.19, lr/token=2.4e-7, lr/unet=4.8e-7]
Before
loss= tensor(0.3415, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f2c7db1fe80>) grad(None)
After
loss= tensor(0.1707, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f2c7db1fe80>) grad(None)
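
(The Before/After state lines in these dumps come from a small debug helper roughly like the following; a sketch, the actual code may differ:)

```python
def dump_loss_state(loss, tag):
    # Print the autograd state of the loss in the format used above.
    print(tag)
    print("loss=", loss)
    print(f"is_leaf({loss.is_leaf}) requires_grad({loss.requires_grad}) "
          f"retains_grad({loss.retains_grad}) grad_fn({loss.grad_fn}) grad({loss.grad})")
```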

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: [embeddings]
train_text_submodule: [token_embedding]

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: true
separate_token_embedding: true

Steps: 0% 6/10000 [00:12<4:09:57, 1.50s/it, GPU=10666, Loss/pred=0.152, Loss/prior=0.212, Loss/total=0.182, lr/token=3e-8, lr/unet=6e-8]
Before
loss= tensor(0.3792, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7fcb2e93b7f0>) grad(None)
After
loss= tensor(0.1896, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7fcb2e93b7f0>) grad(None)

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: [embeddings]
train_text_submodule: [token_embedding]

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: true
separate_token_embedding: false

Steps: 0% 2/10000 [00:07<9:09:35, 3.30s/it, GPU=10664, Loss/pred=0.252, Loss/prior=0.184, Loss/total=0.218, lr/token=2e-8, lr/unet=1e-7]
Before
loss= tensor(0.2853, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f3c2179d460>) grad(None)
After
loss= tensor(0.1426, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f3c2179d460>) grad(None)

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: null
train_text_submodule: null

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: true
separate_token_embedding: false

Steps: 0% 4/10000 [00:09<4:38:40, 1.67s/it, GPU=10284, Loss/pred=0.335, Loss/prior=0.0151, Loss/total=0.175, lr/unet=2e-7]
Before
loss= tensor(0.0186, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f4021a96c70>) grad(None)
After
loss= tensor(0.0093, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f4021a96c70>) grad(None)

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: null
train_text_submodule: null

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false

Steps: 0% 7/10000 [00:13<3:37:29, 1.31s/it, GPU=10210, Loss/pred=0.406, Loss/prior=0.01, Loss/total=0.208, lr/unet=4e-7]
Before
loss= tensor(0.1137, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f0c698ca3a0>) grad(None)
After
loss= tensor(0.0569, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f0c698ca3a0>) grad(None)

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: null
train_text_submodule: null

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false

Steps: 0% 2/10000 [00:07<8:40:02, 3.12s/it, GPU=10244, Loss/pred=0.214, Loss/prior=0.219, Loss/total=0.217, lr/unet=1e-7]
Before
loss= tensor(0.3070, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f61ed950250>) grad(None)
After
loss= tensor(0.1535, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f61ed950250>) grad(None)

train_unet_module_or_class: null
train_unet_submodule: null

train_text_module_or_class: [CLIPAttention]
train_text_submodule: [k_proj, q_proj, v_proj, out_proj]

lora_unet_layer: null
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false

Steps: 0% 0/10000 [00:00<?, ?it/s]
Before
loss= tensor(0.7092, device='cuda:0')
is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
After
loss= tensor(0.3546, device='cuda:0')
is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
Traceback (most recent call last):
File "/content/Dreambooth/finetune.py", line 701, in
main(args)
File "/content/Dreambooth/finetune.py", line 618, in main
accelerator.backward(loss)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

train_unet_module_or_class: null
train_unet_submodule: null

train_text_module_or_class: [CLIPAttention]
train_text_submodule: [k_proj, q_proj, v_proj, out_proj]

lora_unet_layer: null
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: [Linear]
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false

Steps: 0% 0/10000 [00:00<?, ?it/s]
Before
loss= tensor(0.6998, device='cuda:0')
is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
After
loss= tensor(0.3499, device='cuda:0')
is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
Traceback (most recent call last):
File "/content/Dreambooth/finetune.py", line 701, in
main(args)
File "/content/Dreambooth/finetune.py", line 618, in main
accelerator.backward(loss)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

train_unet_module_or_class: null
train_unet_submodule: null

train_text_module_or_class: [CLIPAttention]
train_text_submodule: [k_proj, q_proj, v_proj, out_proj]

lora_unet_layer: null
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: [Linear]
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false
gradient_checkpointing: false

Steps: 0% 10/10000 [00:13<2:25:40, 1.14it/s, GPU=9436, Loss/pred=0.0642, Loss/prior=0.0144, Loss/total=0.0393, lr/text=1e-7]
Before
loss= tensor(0.0557, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7fa1936883d0>) grad(None)
After
loss= tensor(0.0279, device='cuda:0', grad_fn=)

brian6091 commented:

So it took a little time, but I traced this to setting gradient_checkpointing: True, and specifically to enabling it for the text_encoder. The text_encoder's gradient-checkpointing method has a different name than the Unet's, since it appears to come from the Transformers library rather than diffusers. Either way, it does something that turns the loss into a leaf tensor with no grad_fn (loss.is_leaf becomes True)?
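
One plausible mechanism (an assumption, not verified against the Transformers internals): reentrant activation checkpointing runs the wrapped forward under no_grad and only hooks gradients back up through tensor inputs that themselves require grad. When nothing upstream of the checkpointed text_encoder blocks requires grad (frozen embeddings, integer token ids), the checkpointed output comes back detached even though the LoRA parameters inside do require grad, and the loss ends up a leaf with no grad_fn. The same symptom appears in isolation:

```python
import torch
from torch.utils.checkpoint import checkpoint

lin = torch.nn.Linear(4, 4)        # stands in for a trainable (LoRA) layer
x = torch.randn(1, 4)              # input that does not require grad, like frozen hidden states
y = checkpoint(lin, x)             # reentrant checkpointing (the default here)
print(y.requires_grad, y.grad_fn)  # False None -> same state the loss ends up in
```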

For now, I disable gradient_checkpointing for the text_encoder.
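
Roughly (a sketch; the actual flag handling in finetune.py may differ):

```python
if args.gradient_checkpointing:
    unet.enable_gradient_checkpointing()          # diffusers method name
    # The Transformers text encoder uses a differently named method,
    # text_encoder.gradient_checkpointing_enable(); left disabled for now
    # because enabling it detaches the loss when only the text side trains.
```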

brian6091 added the bug label on Jan 3, 2023