
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #23

Open
brian6091 opened this issue Dec 29, 2022 · 4 comments

brian6091 commented Dec 29, 2022

Running with LoRA restricted to the text_encoder, with no unet training, produces the error in the title.
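
The failure surfaces at the accelerator.backward(loss) call in finetune.py (see the tracebacks below). As a sketch (not code from the repo), a pre-backward guard makes the symptom easier to spot than the RuntimeError from autograd:

```python
# Hypothetical guard before the backward call; variable names follow the
# tracebacks below, the actual loop in finetune.py may differ.
if loss.grad_fn is None or not loss.requires_grad:
    raise RuntimeError(
        f"loss is detached from the graph (is_leaf={loss.is_leaf}, "
        f"requires_grad={loss.requires_grad}); no trained parameter feeds into it"
    )
accelerator.backward(loss)
```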

brian6091 commented Dec 29, 2022

Manually setting loss.requires_grad=True (fp16 mixed precision) reveals this:

Steps: 0% 0/1000 [00:00<?, ?it/s]
Before
loss= tensor(0.6998, device='cuda:0')
loss.requires_grad= False
leaf= True
grad_fn= None
After
loss= tensor(0.6998, device='cuda:0', requires_grad=True)
loss.requires_grad= True
Traceback (most recent call last):
File "/content/Dreambooth/finetune.py", line 697, in
main(args)
File "/content/Dreambooth/finetune.py", line 619, in main
optimizer.step()
File "/usr/local/lib/python3.8/dist-packages/accelerate/optimizer.py", line 134, in step
self.scaler.step(self.optimizer, closure)
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 339, in step
assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.

However, in fp32 (mixed_precision="no") it runs, but loss.requires_grad is False on each iteration:

Steps: 10% 98/1000 [02:23<21:06, 1.40s/it, GPU=9038, Loss/pred=0.1, Loss/prior=0.0045, Loss/total=0.105, lr/text=4.9e-5]
Before
loss= tensor(0.4947, device='cuda:0')
loss.requires_grad= False
leaf= True
grad_fn= None
After
loss= tensor(0.4947, device='cuda:0', requires_grad=True)
loss.requires_grad= True
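
So the fp32 run only looks healthy: without a scaler, accelerator.backward() presumably falls through to a plain loss.backward(), and calling backward() on a leaf that was forced to require grad is legal but produces no parameter gradients, so nothing actually trains. In isolation:

```python
import torch

w = torch.nn.Parameter(torch.randn(3))  # stands in for a model parameter
loss = torch.tensor(0.5)                # detached leaf, like the logged loss
loss.requires_grad_(True)
loss.backward()                         # no error once the GradScaler is out of the way
print(loss.grad)                        # tensor(1.) -- only the loss itself gets a grad
print(w.grad)                           # None -- the model never receives gradients
```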

brian6091 commented:

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: [embeddings]
train_text_submodule: [token_embedding]

lora_unet_layer: null
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: true

Steps: 0% 48/10000 [01:14<3:21:28, 1.21s/it, GPU=10718, Loss/pred=0.368, Loss/prior=0.0126, Loss/total=0.19, lr/token=2.4e-7, lr/unet=4.8e-7]
Before
loss= tensor(0.3415, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f2c7db1fe80>) grad(None)
After
loss= tensor(0.1707, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f2c7db1fe80>) grad(None)
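
(The Before/After state lines in these dumps come from a small debug helper roughly like the following; a sketch, the actual code may differ:)

```python
def dump_loss_state(loss, tag):
    # Print the autograd state of the loss in the format used above.
    print(tag)
    print("loss=", loss)
    print(f"is_leaf({loss.is_leaf}) requires_grad({loss.requires_grad}) "
          f"retains_grad({loss.retains_grad}) grad_fn({loss.grad_fn}) grad({loss.grad})")
```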

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: [embeddings]
train_text_submodule: [token_embedding]

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: true
separate_token_embedding: true

Steps: 0% 6/10000 [00:12<4:09:57, 1.50s/it, GPU=10666, Loss/pred=0.152, Loss/prior=0.212, Loss/total=0.182, lr/token=3e-8, lr/unet=6e-8]
Before
loss= tensor(0.3792, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7fcb2e93b7f0>) grad(None)
After
loss= tensor(0.1896, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7fcb2e93b7f0>) grad(None)

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: [embeddings]
train_text_submodule: [token_embedding]

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: true
separate_token_embedding: false

Steps: 0% 2/10000 [00:07<9:09:35, 3.30s/it, GPU=10664, Loss/pred=0.252, Loss/prior=0.184, Loss/total=0.218, lr/token=2e-8, lr/unet=1e-7]
Before
loss= tensor(0.2853, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f3c2179d460>) grad(None)
After
loss= tensor(0.1426, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f3c2179d460>) grad(None)

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: null
train_text_submodule: null

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: true
separate_token_embedding: false

Steps: 0% 4/10000 [00:09<4:38:40, 1.67s/it, GPU=10284, Loss/pred=0.335, Loss/prior=0.0151, Loss/total=0.175, lr/unet=2e-7]
Before
loss= tensor(0.0186, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f4021a96c70>) grad(None)
After
loss= tensor(0.0093, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f4021a96c70>) grad(None)

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: null
train_text_submodule: null

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false

Steps: 0% 7/10000 [00:13<3:37:29, 1.31s/it, GPU=10210, Loss/pred=0.406, Loss/prior=0.01, Loss/total=0.208, lr/unet=4e-7]
Before
loss= tensor(0.1137, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f0c698ca3a0>) grad(None)
After
loss= tensor(0.0569, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f0c698ca3a0>) grad(None)

train_unet_module_or_class: [attn2]
train_unet_submodule: [to_k, to_v]

train_text_module_or_class: null
train_text_submodule: null

lora_unet_layer: [Linear]
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false

Steps: 0% 2/10000 [00:07<8:40:02, 3.12s/it, GPU=10244, Loss/pred=0.214, Loss/prior=0.219, Loss/total=0.217, lr/unet=1e-7]
Before
loss= tensor(0.3070, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f61ed950250>) grad(None)
After
loss= tensor(0.1535, device='cuda:0', grad_fn=<DivBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f61ed950250>) grad(None)

train_unet_module_or_class: null
train_unet_submodule: null

train_text_module_or_class: [CLIPAttention]
train_text_submodule: [k_proj, q_proj, v_proj, out_proj]

lora_unet_layer: null
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: null
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false

Steps: 0% 0/10000 [00:00<?, ?it/s]
Before
loss= tensor(0.7092, device='cuda:0')
is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
After
loss= tensor(0.3546, device='cuda:0')
is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
Traceback (most recent call last):
File "/content/Dreambooth/finetune.py", line 701, in
main(args)
File "/content/Dreambooth/finetune.py", line 618, in main
accelerator.backward(loss)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

train_unet_module_or_class: null
train_unet_submodule: null

train_text_module_or_class: [CLIPAttention]
train_text_submodule: [k_proj, q_proj, v_proj, out_proj]

lora_unet_layer: null
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: [Linear]
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false

Steps: 0% 0/10000 [00:00<?, ?it/s]
Before
loss= tensor(0.6998, device='cuda:0')
is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
After
loss= tensor(0.3499, device='cuda:0')
is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
Traceback (most recent call last):
File "/content/Dreambooth/finetune.py", line 701, in
main(args)
File "/content/Dreambooth/finetune.py", line 618, in main
accelerator.backward(loss)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

train_unet_module_or_class: null
train_unet_submodule: null

train_text_module_or_class: [CLIPAttention]
train_text_submodule: [k_proj, q_proj, v_proj, out_proj]

lora_unet_layer: null
lora_unet_train_off_target: null
lora_unet_rank: 4
lora_unet_alpha: 4.0

lora_text_layer: [Linear]
lora_text_train_off_target: null
lora_text_rank: 4
lora_text_alpha: 4.0

add_instance_token: false
separate_token_embedding: false
gradient_checkpointing: false

Steps: 0% 10/10000 [00:13<2:25:40, 1.14it/s, GPU=9436, Loss/pred=0.0642, Loss/prior=0.0144, Loss/total=0.0393, lr/text=1e-7]
Before
loss= tensor(0.0557, device='cuda:0', grad_fn=<AddBackward0>)
is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7fa1936883d0>) grad(None)
After
loss= tensor(0.0279, device='cuda:0', grad_fn=)

brian6091 commented:

So it took a little time, but I traced this to setting gradient_checkpointing: True, and specifically to enabling it for the text_encoder. The text_encoder's gradient-checkpointing method has a different name than the Unet's, since it appears to come from the Transformers library rather than diffusers. Either way, it does something that turns the loss into a leaf tensor with no grad_fn (loss.is_leaf becomes True)?
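
One plausible mechanism (an assumption, not verified against the Transformers internals): reentrant activation checkpointing runs the wrapped forward under no_grad and only hooks gradients back up through tensor inputs that themselves require grad. When nothing upstream of the checkpointed text_encoder blocks requires grad (frozen embeddings, integer token ids), the checkpointed output comes back detached even though the LoRA parameters inside do require grad, and the loss ends up a leaf with no grad_fn. The same symptom appears in isolation:

```python
import torch
from torch.utils.checkpoint import checkpoint

lin = torch.nn.Linear(4, 4)        # stands in for a trainable (LoRA) layer
x = torch.randn(1, 4)              # input that does not require grad, like frozen hidden states
y = checkpoint(lin, x)             # reentrant checkpointing (the default here)
print(y.requires_grad, y.grad_fn)  # False None -> same state the loss ends up in
```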

For now, I disable gradient_checkpointing for the text_encoder.
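
Roughly (a sketch; the actual flag handling in finetune.py may differ):

```python
if args.gradient_checkpointing:
    unet.enable_gradient_checkpointing()          # diffusers method name
    # The Transformers text encoder uses a differently named method,
    # text_encoder.gradient_checkpointing_enable(); left disabled for now
    # because enabling it detaches the loss when only the text side trains.
```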

brian6091 added the bug label on Jan 3, 2023