how to run the model on mps device? #25

Open
vinbrule opened this issue Nov 17, 2022 · 5 comments

how to run the model on mps device?

mkardas commented Dec 9, 2022

Hi @vinbrule, can you try something like this:

from galai import gal
model = gal.load_model("base", num_gpus=0)
model.model.to("mps")
model.generate("The Transformer architecture [START_REF]")

?

cerkut commented Dec 14, 2022

Good suggestion, but unfortunately it does not work, due to PyTorch bug pytorch/pytorch#77764:
File "/opt/homebrew/Caskroom/miniforge/base/envs/galai/lib/python3.9/site-packages/transformers/models/opt/modeling_opt.py", line 113, in forward positions = (torch.cumsum(attention_mask, dim=1).type_as(attention_mask) * attention_mask).long() - 1 NotImplementedError: The operator 'aten::cumsum.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variablePYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

mkardas commented Dec 14, 2022

Thanks @cerkut for testing it and for the stack trace. It seems that the cumsum.out operator is implemented in the PyTorch nightly builds (pytorch/pytorch#88319).
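
As a side note, a quick check along these lines (standard PyTorch API, nothing galai-specific) shows which PyTorch version is installed and whether MPS is usable at all:

import torch

print(torch.__version__)                  # e.g. 2.0.0.devYYYYMMDD for a nightly build
print(torch.backends.mps.is_built())      # True if this build was compiled with MPS support
print(torch.backends.mps.is_available())  # True if an MPS device can actually be used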

mkardas commented Dec 14, 2022

Also, did you try the PYTORCH_ENABLE_MPS_FALLBACK=1 suggestion? The fallback for the cumsum operator shouldn't cause any significant slowdown in this case.
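
For reference, a minimal sketch of that suggestion (using the import form that worked below, and assuming the variable is set before anything imports torch) could look like:

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # enable CPU fallback for ops MPS doesn't implement yet

import galai as gal

model = gal.load_model("base", num_gpus=0)  # load on CPU first
model.model.to("mps")                       # then move the weights to the MPS device
model.generate("The Transformer architecture [START_REF]")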

cerkut commented Dec 15, 2022

Thanks, almost working now.
I created an environment with pytorch-nightly (2.0.0.dev20221215), installed galai with pip from the GitHub repository (git+https), and tried:

import galai as gal # note it is not from galai import gal
model = gal.load_model("base", num_gpus=0)

At this point, when I run the model on CPU, I get the perfect answer:
model.generate("The Transformer architecture [START_REF]")

'The Transformer architecture [START_REF] Attention is All you Need, Vaswani[END_REF] is a popular choice for sequence-to-sequence models. It consists of a stack of encoder and decoder layers, each of which is composed of a multi-head self-attention mechanism and a feed-forward network. The encoder is used to encode the'

But on MPS:
model.model.to("mps")  # prints the full model
I get
'The Transformer architecture [START_REF] following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following following'
This might be a torch-nightly bug, but the CPU indeed seems faster than MPS for inference, so I'll stick to CPU.
