Hi! I'm trying to train CoCa using the pretrained RoBERTa weights (has the causal masking issue #445 been addressed?), but I am running into an error with the attention mask sizes. Any help would be greatly appreciated :).
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "src/training/main.py", line 508, in <module>
main(sys.argv[1:])
File "src/training/main.py", line 436, in main
train_one_epoch(model, data, loss, epoch, optimizer, scaler, scheduler, dist_model, args, tb_writer=writer)
File "src/training/train.py", line 101, in train_one_epoch
model_out = model(images, texts)
... (omitted for brevity)
File ".venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File ".venv/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1241, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File ".venv/lib/python3.10/site-packages/torch/nn/functional.py", line 5354, in multi_head_attention_forward
raise RuntimeError(f"The shape of the 2D attn_mask is {attn_mask.shape}, but should be {correct_2d_size}.")
RuntimeError: The shape of the 2D attn_mask is torch.Size([76, 76]), but should be (77, 77).
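For reference, the same shape check can be reproduced outside of open_clip. This is only an illustrative sketch with made-up sizes (not library code), showing how a causal mask built for 76 positions collides with a 77-token sequence:

import torch
import torch.nn as nn

# Illustrative only: a causal mask sized for 76 positions, fed together with a
# 77-token sequence, trips the same check inside F.multi_head_attention_forward.
mask_len, seq_len = 76, 77
attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(2, seq_len, 512)
causal_mask = torch.triu(torch.full((mask_len, mask_len), float("-inf")), diagonal=1)
attn(x, x, x, attn_mask=causal_mask)  # RuntimeError: ... is torch.Size([76, 76]), but should be (77, 77).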
Inspecting the error, I tried changing the multimodal context length to 77, but that only yields a different error.
@sandeepmukh
I think there are a few things wrong here... first, update to the main branch.
Then, I think something like this is needed in CocaModel to replace the current vocab_size logic between the text and multimodal text towers:
if getattr(text_cfg, "hf_model_name", None) is not None:
    # HF text tower (e.g. RoBERTa): take the vocab size from the instantiated
    # text encoder, falling back to the config value if it isn't exposed
    vocab_size = getattr(self.text, "vocab_size", text_cfg.vocab_size)
else:
    # built-in text tower: use the vocab size from the config
    vocab_size = text_cfg.vocab_size
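The point of that branch is that an HF tower like roberta-base does not share the default CLIP BPE vocabulary, so the vocab size has to follow the HF side. A quick standalone sanity check (hypothetical snippet, not part of open_clip):

from transformers import AutoTokenizer

# roberta-base has a 50265-token vocabulary, larger than the default CLIP BPE
# vocab of 49408, so the resolved vocab_size must come from the HF side to
# cover every RoBERTa token id.
tok = AutoTokenizer.from_pretrained("roberta-base")
print(len(tok))  # 50265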
Also, the context_len used by the tokenizer is sourced from text_cfg by default, so text_cfg and multimodal_cfg should have the same context_len values in the config (I think) to work best, but I'm not 100% sure there.
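Following that suggestion, here is a sketch of the relevant config fields (written as a Python dict mirroring open_clip's JSON config keys; the values are placeholders, not the shipped coca_roberta config) that keeps the two context lengths in sync:

# Hypothetical config fragment: keep text_cfg and multimodal_cfg context lengths
# identical so the causal mask built on the multimodal side matches the tokenized
# sequence length.
model_cfg = {
    "embed_dim": 512,
    "text_cfg": {
        "hf_model_name": "roberta-base",
        "hf_tokenizer_name": "roberta-base",
        "context_length": 76,   # length the tokenizer pads/truncates to
        "output_tokens": True,
    },
    "multimodal_cfg": {
        "context_length": 76,   # same value, so the attn_mask matches the sequence
        "width": 512,
        "heads": 8,
        "layers": 12,
    },
}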