
RuntimeError: Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or logical_not() operator instead. #33

Open
Maaaj opened this issue Nov 18, 2021 · 9 comments

@Maaaj

Maaaj commented Nov 18, 2021

Hello,
Please guide me through this issue: I am unable to run sequencer-train.sh due to the following errors.

mj@ubuntu:~/Desktop/AEC/chai-master/src $ ./sequencer-train.sh
sequencer-train.sh start
Starting data preprocessing
Please backup existing pt files: /home/mj/Desktop/AEC/chai-master/results/Golden/final.train*.pt, to avoid overwriting them!
Starting training
[2021-11-17 20:34:22,616 INFO] * vocabulary size. source = 1004; target = 1004
[2021-11-17 20:34:22,616 INFO] Building model...
[2021-11-17 20:34:22,645 INFO] NMTModel(
(encoder): RNNEncoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(1004, 256, padding_idx=1)
)
)
)
(rnn): LSTM(256, 128, num_layers=2, dropout=0.3, bidirectional=True)
(bridge): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
)
)
(decoder): InputFeedRNNDecoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(1004, 256, padding_idx=1)
)
)
)
(dropout): Dropout(p=0.3, inplace=False)
(rnn): StackedLSTM(
(dropout): Dropout(p=0.3, inplace=False)
(layers): ModuleList(
(0): LSTMCell(512, 256)
(1): LSTMCell(256, 256)
)
)
(attn): GlobalAttention(
(linear_in): Linear(in_features=256, out_features=256, bias=False)
(linear_out): Linear(in_features=512, out_features=256, bias=False)
)
)
(generator): CopyGenerator(
(linear): Linear(in_features=256, out_features=1004, bias=True)
(linear_copy): Linear(in_features=256, out_features=1, bias=True)
)
)
[2021-11-17 20:34:22,645 INFO] encoder: 1179136
[2021-11-17 20:34:22,645 INFO] decoder: 2026733
[2021-11-17 20:34:22,645 INFO] * number of parameters: 3205869
[2021-11-17 20:34:22,646 INFO] Starting training on CPU, could be very slow
[2021-11-17 20:34:22,646 INFO] Start training...
[2021-11-17 20:34:32,331 INFO] Loading dataset from /home/mj/Desktop/AEC/chai-master/results/Golden/final.train.0.pt, number of examples: 33469
/home/mj/.local/lib/python3.8/site-packages/torchtext/data/field.py:359: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
var = torch.tensor(arr, dtype=self.dtype, device=device)
Traceback (most recent call last):
File "train.py", line 120, in
main(opt)
File "train.py", line 53, in main
single_main(opt, -1)
File "/home/mj/Desktop/AEC/chai-master/src/lib/OpenNMT-py/onmt/train_single.py", line 154, in main
trainer.train(train_iter, valid_iter, opt.train_steps, opt.valid_steps)
File "/home/mj/Desktop/AEC/chai-master/src/lib/OpenNMT-py/onmt/trainer.py", line 172, in train
self._gradient_accumulation(
File "/home/mj/Desktop/AEC/chai-master/src/lib/OpenNMT-py/onmt/trainer.py", line 280, in _gradient_accumulation
self.model(src, tgt, src_lengths)
File "/home/mj/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mj/Desktop/AEC/chai-master/src/lib/OpenNMT-py/onmt/models/model.py", line 45, in forward
dec_out, attns = self.decoder(tgt, memory_bank,
File "/home/mj/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mj/Desktop/AEC/chai-master/src/lib/OpenNMT-py/onmt/decoders/decoder.py", line 160, in forward
dec_state, dec_outs, attns = self._run_forward_pass(
File "/home/mj/Desktop/AEC/chai-master/src/lib/OpenNMT-py/onmt/decoders/decoder.py", line 336, in _run_forward_pass
decoder_output, p_attn = self.attn(
File "/home/mj/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in call_impl
return forward_call(*input, **kwargs)
File "/home/mj/Desktop/AEC/chai-master/src/lib/OpenNMT-py/onmt/modules/global_attention.py", line 183, in forward
align.masked_fill_(1 - mask, -float('inf'))
File "/home/mj/.local/lib/python3.8/site-packages/torch/_tensor.py", line 30, in wrapped
return f(*args, **kwargs)
File "/home/mj/.local/lib/python3.8/site-packages/torch/_tensor.py", line 548, in rsub
return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or logical_not() operator instead.
sequencer-train.sh done
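
(For reference, the failing operation can be reproduced in isolation. A minimal sketch, assuming a recent PyTorch where masks are bool tensors:)

```python
import torch

mask = torch.tensor([True, False, True])  # a boolean mask

# On PyTorch >= 1.2 this line raises the RuntimeError above:
#   inverted = 1 - mask

# Boolean masks must be inverted with ~ or logical_not() instead:
inverted = ~mask                     # tensor([False,  True, False])
inverted = torch.logical_not(mask)   # equivalent
```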

@Maaaj
Author

Maaaj commented Nov 23, 2021

I am having this issue while training:

UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
var = torch.tensor(arr, dtype=self.dtype, device=device)

and another

RuntimeError: Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or logical_not() operator instead

@hungkien05

hungkien05 commented Nov 27, 2021

Same problem over here when trying to run the predict script.
Edit:
I got this error because I didn't use git-lfs when cloning the whole repo; I had only used git-lfs to get the model.pt file.

@Maaaj
Author

Maaaj commented Nov 28, 2021

Same problem over here when trying to run the predict script. Edit: I got this error because I didn't use git-lfs when cloning the whole repo; I had only used git-lfs to get the model.pt file.

Can you please share the command for cloning the whole repo with git-lfs? I used git-lfs clone https://github.com/KTH/chai.git, but I am still getting the same error while training.

@chenzimin
Collaborator

Thanks all for the reports; I will look into this issue.

@hungkien05

Same problem over here when trying to run the predict script. Edit: I got this error because I didn't use git-lfs when cloning the whole repo; I had only used git-lfs to get the model.pt file.

Can you please share the command for cloning the whole repo with git-lfs? I used git-lfs clone https://github.com/KTH/chai.git, but I am still getting the same error while training.

You need to install git-lfs first. It is used the same way as the normal git command; just change "git" to "git-lfs". Here is my command:
git-lfs clone https://github.com/KTH/chai.git

@SamraMehboob

I am also facing the same error. If anyone has managed to solve it, please share.
Thanks.

@SamraMehboob

I resolved this issue.
You need to edit /chai-master/src/lib/OpenNMT-py/onmt/modules/global_attention.py.

Change line 183 to align.masked_fill_(~mask, -float('inf'))
Thanks.
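
For anyone applying this by hand, the edit looks roughly like this (a sketch of the one relevant line; masked_fill_ is the in-place variant already used by the original code):

```python
# src/lib/OpenNMT-py/onmt/modules/global_attention.py, line 183

# Before -- fails on PyTorch >= 1.2, where `-` is not defined for bool tensors:
# align.masked_fill_(1 - mask, -float('inf'))

# After -- invert the boolean mask with ~ instead of arithmetic:
align.masked_fill_(~mask, -float('inf'))
```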

@chenzimin
Collaborator

This indeed seems like a PyTorch version issue. One way to solve it is to change 1 - BOOL to ~BOOL; another is to downgrade torch to version 1.1.0.
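
The version dependence is easy to see in isolation (a sketch; on torch 1.1.0 masks were uint8 tensors, for which subtraction is ordinary arithmetic):

```python
import torch

uint8_mask = torch.tensor([1, 0, 1], dtype=torch.uint8)  # pre-1.2 style mask
print(1 - uint8_mask)  # tensor([0, 1, 0], dtype=torch.uint8): valid on any version

bool_mask = uint8_mask.bool()  # modern boolean mask
# print(1 - bool_mask)         # RuntimeError on PyTorch >= 1.2
print(~bool_mask)              # tensor([False,  True, False]): the supported inversion
```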

@XiaoyaWang-gh

Find the offending file and change "1 - mask" to "~mask".
