Can't run steps of dynamics with NNPOps `TorchForce` #28

dominicrufa · 2021-09-10T14:47:44Z

While trying to run MD on a System equipped with a TorchForce (the TorchForce uses the NNPOps symmetry functions, as described here), I am observing strange behavior. I can create a Context with the System and retrieve a State containing the potential energy, but when I run a single step of dynamics, I get the following error:

Traceback (most recent call last):
  File "/lila/home/rufad/github/qmlify/qmlify/openmm_torch/notebooks/yield_dynamics.py", line 119, in <module>
    ml_int.step(1)
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/simtk/openmm/openmm.py", line 7036, in step
    return _openmm.CustomIntegrator_step(self, steps)
simtk.openmm.OpenMMException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torchani/nn.py", line 95, in forward
    if torch.gt((torch.size(midx))[0], 0):
      input_ = torch.index_select(aev0, 0, midx)
      _29 = torch.flatten((_22).forward(input_, ), 0, -1)
                           ~~~~~~~~~~~~ <--- HERE
      _30 = torch.masked_scatter_(output, mask, _29)
    else:
  File "code/__torch__/torch/nn/modules/container.py", line 22, in forward
    _5 = getattr(self, "5")
    _6 = getattr(self, "6")
    input0 = (_0).forward(input, )
              ~~~~~~~~~~~ <--- HERE
    input1 = (_1).forward(input0, )
    input2 = (_2).forward(input1, )
  File "code/__torch__/torch/nn/modules/linear.py", line 13, in forward
    input: Tensor) -> Tensor:
    _0 = __torch__.torch.nn.functional.linear
    return _0(input, self.weight, self.bias, )
           ~~ <--- HERE
  File "code/__torch__/torch/nn/functional.py", line 4, in linear
    weight: Tensor,
    bias: Optional[Tensor]=None) -> Tensor:
  return torch.linear(input, weight, bias)
         ~~~~~~~~~~~~ <--- HERE
def celu(input: Tensor,
    alpha: float=1.,

Traceback of TorchScript, original code (most recent call last):
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torchani/nn.py", line 68, in forward
            if midx.shape[0] > 0:
                input_ = aev.index_select(0, midx)
                output.masked_scatter_(mask, m(input_).flatten())
                                             ~ <--- HERE
        output = output.view_as(species)
        return SpeciesEnergies(species, torch.sum(output, dim=1))
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
    def forward(self, input):
        for module in self:
            input = module(input)
                    ~~~~~~ <--- HERE
        return input
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 94, in forward
    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)
               ~~~~~~~~ <--- HERE
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/torch/nn/functional.py", line 1753, in linear
    if has_torch_function_variadic(input, weight):
        return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
    return torch._C._nn.linear(input, weight, bias)
           ~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

On the other hand, if I do not use the NNPOps ANI symmetry functions, this error does not occur. I didn't notice any examples or pytests in this repo that equip a TorchForce with ANISymmetryFunctions, so I'm not sure whether this interoperability has been tested yet. If it has, would it be possible to add a pytest or example? I'm also not sure whether this issue belongs in the openmm-torch repo instead (since the functionality I am trying to use relies on NNPOps). I'd be happy to help troubleshoot if needed.


peastman · 2021-09-10T21:00:33Z

This seems to be a common error. This issue has a lot of discussion from people encountering it:

NVIDIA/apex#580

Here's one where the problem was fixed by upgrading to PyTorch 1.9.

allenai/allennlp#5064

In this one it was fixed by upgrading to CUDA 11.2.

https://stackoverflow.com/questions/66600362/runtimeerror-cuda-error-cublas-status-execution-failed-when-calling-cublassge
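Two of the linked fixes are version upgrades (PyTorch 1.9 and CUDA 11.2). Before trying them, it may help to confirm what the environment actually has installed. The following is a minimal sketch; the helpers `parse_version` and `meets_minimum` are hypothetical names, and the thresholds come only from the linked threads, so meeting them is not guaranteed to fix this error.

```python
import re
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn strings like '1.8.1+cu102' or '11.2' into (1, 8, 1) or (11, 2)."""
    # Drop a local build suffix such as '+cu102' before parsing.
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric segments such as 'dev20210910'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: Optional[str], minimum: Tuple[int, ...]) -> bool:
    """Return True if `version` is at least `minimum`.

    A version of None (for example, torch.version.cuda on a CPU-only build)
    is treated as not meeting the minimum.
    """
    if version is None:
        return False
    parsed = parse_version(version)
    # Pad both tuples with zeros so (11, 2) compares equal to (11, 2, 0).
    width = max(len(parsed), len(minimum))
    padded = parsed + (0,) * (width - len(parsed))
    target = tuple(minimum) + (0,) * (width - len(minimum))
    return padded >= target
```

In the failing environment this could be called as `meets_minimum(torch.__version__, (1, 9))` and `meets_minimum(torch.version.cuda, (11, 2))`.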

There are many other pages discussing the same error. Often it seems related to inconsistencies in the shapes or dtypes of tensors.
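Since the traceback fails inside `torch.linear`, one way to rule out the shape/dtype explanation is to compare the metadata of the AEV input against the first layer's `weight` and `bias`. The sketch below is a hypothetical, framework-free checker (`TensorSpec` and `linear_problems` are illustrative names, not part of PyTorch, TorchANI, or NNPOps) that lists mismatches in shape, dtype, or device for a linear layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for a tensor's metadata."""
    shape: Tuple[int, ...]
    dtype: str
    device: str


def linear_problems(inp: TensorSpec, weight: TensorSpec,
                    bias: Optional[TensorSpec] = None) -> List[str]:
    """List mismatches that would make linear(inp, weight, bias) suspect.

    An nn.Linear layer stores weight as (out_features, in_features) and bias
    as (out_features,); the input's last dimension should equal in_features,
    and all operands should share one dtype and one device.
    """
    problems: List[str] = []
    if len(weight.shape) != 2:
        problems.append(f"weight must be 2-D, got shape {weight.shape}")
        return problems
    out_features, in_features = weight.shape
    last_dim = inp.shape[-1] if inp.shape else None
    if last_dim != in_features:
        problems.append(f"input last dim {last_dim} != in_features {in_features}")
    tensors: Dict[str, TensorSpec] = {"input": inp, "weight": weight}
    if bias is not None:
        tensors["bias"] = bias
        if bias.shape != (out_features,):
            problems.append(f"bias shape {bias.shape} != ({out_features},)")
    # Every operand should agree on dtype and on device.
    dtypes = {name: t.dtype for name, t in tensors.items()}
    if len(set(dtypes.values())) > 1:
        problems.append(f"dtype mismatch: {dtypes}")
    devices = {name: t.device for name, t in tensors.items()}
    if len(set(devices.values())) > 1:
        problems.append(f"device mismatch: {devices}")
    return problems
```

With real tensors, each spec could be built as `TensorSpec(tuple(t.shape), str(t.dtype), str(t.device))`. An empty result does not prove the inputs are fine for cuBLAS; it only removes the most common mismatches from the list of suspects.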

dominicrufa · 2021-09-10T21:07:38Z

I noticed these too. I'll give these solutions a try and report back. Thanks for the sleuthing.

jchodera · 2022-07-12T17:11:43Z

@dominicrufa : Is this still an active issue?

jchodera changed the title ~~porting NNPOPS to TorchForce~~ Can't run steps of dynamics with NNPOps to TorchForce Sep 17, 2021

jchodera changed the title ~~Can't run steps of dynamics with NNPOps to TorchForce~~ Can't run steps of dynamics with NNPOps TorchForce Sep 17, 2021

raimis added the help wanted Extra attention is needed label May 24, 2022
