Can't run steps of dynamics with NNPOps TorchForce #28

Open
dominicrufa opened this issue Sep 10, 2021 · 3 comments
Labels
help wanted Extra attention is needed

Comments

@dominicrufa

When I try to run MD on a System containing a TorchForce (with the NNPOps symmetry functions equipped as described here), I see strange behavior. I can create a Context from the System and retrieve a State with a potential energy, but when I run a single step of dynamics, I get the following error:

Traceback (most recent call last):
  File "/lila/home/rufad/github/qmlify/qmlify/openmm_torch/notebooks/yield_dynamics.py", line 119, in <module>
    ml_int.step(1)
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/simtk/openmm/openmm.py", line 7036, in step
    return _openmm.CustomIntegrator_step(self, steps)
simtk.openmm.OpenMMException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torchani/nn.py", line 95, in forward
    if torch.gt((torch.size(midx))[0], 0):
      input_ = torch.index_select(aev0, 0, midx)
      _29 = torch.flatten((_22).forward(input_, ), 0, -1)
                           ~~~~~~~~~~~~ <--- HERE
      _30 = torch.masked_scatter_(output, mask, _29)
    else:
  File "code/__torch__/torch/nn/modules/container.py", line 22, in forward
    _5 = getattr(self, "5")
    _6 = getattr(self, "6")
    input0 = (_0).forward(input, )
              ~~~~~~~~~~~ <--- HERE
    input1 = (_1).forward(input0, )
    input2 = (_2).forward(input1, )
  File "code/__torch__/torch/nn/modules/linear.py", line 13, in forward
    input: Tensor) -> Tensor:
    _0 = __torch__.torch.nn.functional.linear
    return _0(input, self.weight, self.bias, )
           ~~ <--- HERE
  File "code/__torch__/torch/nn/functional.py", line 4, in linear
    weight: Tensor,
    bias: Optional[Tensor]=None) -> Tensor:
  return torch.linear(input, weight, bias)
         ~~~~~~~~~~~~ <--- HERE
def celu(input: Tensor,
    alpha: float=1.,

Traceback of TorchScript, original code (most recent call last):
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torchani/nn.py", line 68, in forward
            if midx.shape[0] > 0:
                input_ = aev.index_select(0, midx)
                output.masked_scatter_(mask, m(input_).flatten())
                                             ~ <--- HERE
        output = output.view_as(species)
        return SpeciesEnergies(species, torch.sum(output, dim=1))
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
    def forward(self, input):
        for module in self:
            input = module(input)
                    ~~~~~~ <--- HERE
        return input
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 94, in forward
    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)
               ~~~~~~~~ <--- HERE
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/functional.py", line 1753, in linear
    if has_torch_function_variadic(input, weight):
        return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
    return torch._C._nn.linear(input, weight, bias)
           ~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

On the other hand, if I do not equip the NNPOps ANI symmetry functions, the error does not occur. I didn't notice any examples or pytests in this repo that equip a TorchForce with ANISymmetryFunctions, so I'm not sure whether this interoperability has been tested yet. If it has, would it be possible to add a pytest or example? I'm also not sure whether this should go into the openmm-torch repo instead; I posted it here because the functionality I want to use involves NNPOps. I'd be happy to help troubleshoot if needed.

@peastman
Member

This seems to be a common error. The following issue has a lot of discussion from people who have encountered it.

NVIDIA/apex#580

Here's one where the problem was fixed by upgrading to PyTorch 1.9.

allenai/allennlp#5064

In this one it was fixed by upgrading to CUDA 11.2.

https://stackoverflow.com/questions/66600362/runtimeerror-cuda-error-cublas-status-execution-failed-when-calling-cublassge
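Since the fixes reported in those threads came from upgrading PyTorch or CUDA, it may help to record exactly which versions are installed before and after trying them. The following is only a minimal sketch: the helper name `report_versions` and the distribution names `torch`, `torchani`, and `openmm` are assumptions and may need adjusting for a given environment.

```python
import importlib.metadata
import importlib.util


def report_versions(packages=("torch", "torchani", "openmm")):
    """Return {name: version or None} for each distribution, plus torch's CUDA build."""
    report = {}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Not installed, or installed under a different distribution name.
            report[name] = None

    # torch.version.cuda is the CUDA version PyTorch was built against
    # (None for CPU-only builds). Only import torch if it is available.
    report["torch_cuda"] = None
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_cuda"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Pasting this output into the issue alongside the traceback would make it easier to tell whether an upgrade actually changed the environment.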

There are many other pages discussing the same error. Often it seems related to inconsistencies in the shapes or dtypes of tensors.
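One way to rule out the shape and dtype explanation is to check the tensors going into the failing linear layer before the matrix multiply runs. The sketch below is illustrative only: `check_linear_inputs` is a hypothetical helper, not part of PyTorch, TorchANI, or NNPOps. It is duck-typed, so it accepts anything that exposes `.shape` and `.dtype`, such as torch tensors.

```python
def check_linear_inputs(input, weight, bias=None):
    """Return a list of problems found; an empty list means no mismatch was detected."""
    problems = []
    in_shape, w_shape = tuple(input.shape), tuple(weight.shape)

    # A linear layer computes input @ weight.T (+ bias), so the last input
    # dimension must equal the second dimension of the weight matrix.
    if len(w_shape) != 2:
        problems.append(f"weight must be 2-D, got shape {w_shape}")
    elif not in_shape or in_shape[-1] != w_shape[1]:
        problems.append(f"input shape {in_shape} does not match weight shape {w_shape}")

    if input.dtype != weight.dtype:
        problems.append(f"dtype mismatch: input {input.dtype} vs weight {weight.dtype}")

    if bias is not None:
        # The bias needs one entry per output feature (first weight dimension).
        if len(w_shape) == 2 and tuple(bias.shape) != (w_shape[0],):
            problems.append(f"bias shape {tuple(bias.shape)} does not match weight shape {w_shape}")
        if bias.dtype != weight.dtype:
            problems.append(f"dtype mismatch: bias {bias.dtype} vs weight {weight.dtype}")

    return problems
```

This would have to run on the eager (unscripted) model, for example from a forward pre-hook on the `Linear` modules, since the failure in the traceback occurs inside the serialized TorchScript code.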

@dominicrufa
Author

I noticed these, too. Will give these solutions a try and get back. Thanks for the sleuthing.

@jchodera jchodera changed the title porting NNPOPS to TorchForce Can't run steps of dynamics with NNPOps to TorchForce Sep 17, 2021
@jchodera jchodera changed the title Can't run steps of dynamics with NNPOps to TorchForce Can't run steps of dynamics with NNPOps TorchForce Sep 17, 2021
@raimis raimis added the help wanted Extra attention is needed label May 24, 2022
@jchodera
Member

@dominicrufa: Is this still an active issue?
