
[BUG] An error during an MC simulation in LAMMPS #4207

Open
Zch102xjtumse opened this issue Oct 12, 2024 · 2 comments


Bug summary

Hello everyone. I encountered an error when using a fine-tuned DPA2 model in a LAMMPS MC simulation. The error message is below; I don't know what caused it. I'd appreciate any help with this.

DeePMD-kit Version

DeePMD-kit v3.0.0b4

Backend and its version

PyTorch v2.0.0.post200-gc263bd43e8e

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

ERROR on proc 2: DeePMD-kit C API Error: DeePMD-kit Error: DeePMD-kit PyTorch backend error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/deepmd/pt/model/model/ener_model.py", line 56, in forward_lower
comm_dict: Optional[Dict[str, Tensor]]=None) -> Dict[str, Tensor]:
_5 = (self).need_sorted_nlist_for_lower()
model_ret = (self).forward_common_lower(extended_coord, extended_atype, nlist, mapping, fparam, aparam, do_atomic_virial, comm_dict, _5, )
~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_6 = (self).get_fitting_net()
model_predict = annotate(Dict[str, Tensor], {})
File "code/torch/deepmd/pt/model/model/ener_model.py", line 213, in forward_common_lower
cc_ext, _36, fp, ap, input_prec, = _35
atomic_model = self.atomic_model
atomic_ret = (atomic_model).forward_common_atomic(cc_ext, extended_atype, nlist0, mapping, fp, ap, comm_dict, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_37 = (self).atomic_output_def()
training = self.training
File "code/torch/deepmd/pt/model/atomic_model/energy_atomic_model.py", line 50, in forward_common_atomic
ext_atom_mask = (self).make_atom_mask(extended_atype, )
_3 = torch.where(ext_atom_mask, extended_atype, 0)
ret_dict = (self).forward_atomic(extended_coord, _3, nlist, mapping, fparam, aparam, comm_dict, )
~~~~~~~~~~~~~~~~~~~~ <--- HERE
ret_dict0 = (self).apply_out_stat(ret_dict, atype, )
_4 = torch.slice(torch.slice(ext_atom_mask), 1, None, nloc)
File "code/torch/deepmd/pt/model/atomic_model/energy_atomic_model.py", line 93, in forward_atomic
pass
descriptor = self.descriptor
_16 = (descriptor).forward(extended_coord, extended_atype, nlist, mapping, comm_dict, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
descriptor0, rot_mat, g2, h2, sw, = _16
fitting_net = self.fitting_net
File "code/torch/deepmd/pt/model/descriptor/dpa2.py", line 84, in forward
repformers3 = self.repformers
_17 = nlist_dict[_1(_16, (repformers3).get_nsel(), )]
_18 = (repformers1).forward(_17, extended_coord, extended_atype, g11, mapping0, comm_dict0, )
~~~~~~~~~~~~~~~~~~~~ <--- HERE
g12, g2, h2, rot_mat, sw, = _18
concat_output_tebd = self.concat_output_tebd
File "code/torch/deepmd/pt/model/descriptor/repformers.py", line 226, in forward
_32 = torch.tensor(nloc)
_33 = torch.tensor(torch.sub(nall, nloc))
ret = ops.deepmd.border_op(_25, _26, _27, _28, _29, g10, _31, _32, _33)
~~~~~~~~~~~~~~~~~~~~ <--- HERE
g1_ext, comm_dict6, mapping6 = torch.unsqueeze(ret[0], 0), comm_dict7, mapping2
_34 = (_00).forward(g1_ext, g23, h2, nlist0, nlist_mask, sw1, )

Traceback of TorchScript, original code (most recent call last):
File "/home/zhaochenhao/soft/deepmd3.0b3/lib/python3.10/site-packages/deepmd/pt/model/model/ener_model.py", line 109, in forward_lower
comm_dict: Optional[Dict[str, torch.Tensor]] = None,
):
model_ret = self.forward_common_lower(
~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
extended_coord,
extended_atype,
File "/home/zhaochenhao/soft/deepmd3.0b3/lib/python3.10/site-packages/deepmd/pt/model/model/make_model.py", line 261, in forward_common_lower
)
del extended_coord, fparam, aparam
atomic_ret = self.atomic_model.forward_common_atomic(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
cc_ext,
extended_atype,
File "/home/zhaochenhao/soft/deepmd3.0b3/lib/python3.10/site-packages/deepmd/pt/model/atomic_model/base_atomic_model.py", line 241, in forward_common_atomic

    ext_atom_mask = self.make_atom_mask(extended_atype)
    ret_dict = self.forward_atomic(
               ~~~~~~~~~~~~~~~~~~~ <--- HERE
        extended_coord,
        torch.where(ext_atom_mask, extended_atype, 0),

File "/home/zhaochenhao/soft/deepmd3.0b3/lib/python3.10/site-packages/deepmd/pt/model/atomic_model/dp_atomic_model.py", line 189, in forward_atomic
if self.do_grad_r() or self.do_grad_c():
extended_coord.requires_grad_(True)
descriptor, rot_mat, g2, h2, sw = self.descriptor(
~~~~~~~~~~~~~~~ <--- HERE
extended_coord,
extended_atype,
File "/home/zhaochenhao/soft/deepmd3.0b3/lib/python3.10/site-packages/deepmd/pt/model/descriptor/dpa2.py", line 652, in forward
g1 = g1_ext
# repformer
g1, g2, h2, rot_mat, sw = self.repformers(
~~~~~~~~~~~~~~~ <--- HERE
nlist_dict[
get_multiple_nlist_key(
File "/home/zhaochenhao/soft/deepmd3.0b3/lib/python3.10/site-packages/deepmd/pt/model/descriptor/repformers.py", line 480, in forward
assert "recv_num" in comm_dict
assert "communicator" in comm_dict
ret = torch.ops.deepmd.border_op(
~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
comm_dict["send_list"],
comm_dict["send_proc"],
RuntimeError: Trying to create tensor with negative dimension -1873441304: [-1873441304]
(/home/conda/feedstock_root/build_artifacts/deepmd-kit_1722057353391/work/source/lmp/pair_deepmd.cpp:586)
Last command: run 150000

MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

Steps to Reproduce

The LAMMPS input file is as follows:
label i
variable i loop 2
variable ts equal 0+300*$i
variable ta equal 0+300*$i
shell mkdir dpav1-${ta}
units metal
boundary p p p
atom_style atomic
timestep 0.001
read_data min.data
pair_style deepmd ../dpav1.pth
pair_coeff * * x x x
compute 1 all temp
compute Ek all ke/atom
compute Ep all pe/atom
compute_modify 1 dynamic yes
thermo_style custom step dt time temp ke pe etotal press lx ly lz vol
thermo 100
dump 1 all custom 5000 dpav1-${ta}/dumpthermo.atom.* id type x y z c_Ek c_Ep
velocity all create ${ts} 82765577 rot yes dist gaussian
fix r2 all npt temp ${ta} ${ta} 0.1 iso 0.0 0.0 1.0
fix mc4 all atom/swap 20 5 82765577 ${ts} types 1 2
fix mc5 all atom/swap 20 5 82765577 ${ts} types 1 3
fix mc6 all atom/swap 20 5 82765577 ${ts} types 2 3
run 100000
min_style cg
minimize 1.0e-6 1.0e-7 10000 10000

clear
next i
jump SELF i
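
As a side check (not part of the reported input), the model can also be evaluated outside LAMMPS with the DeePMD-kit Python inference API, to confirm that the .pth file itself loads and runs; the sketch below uses placeholder coordinates, a placeholder cell, and a hypothetical type assignment rather than the real contents of min.data:

import numpy as np
from deepmd.infer import DeepPot

dp = DeepPot("../dpav1.pth")                  # same model file used by pair_style deepmd
coords = np.random.rand(1, 108 * 3)           # 108 atoms (count reported later in this thread), flattened xyz; placeholder values
cell = (10.0 * np.eye(3)).reshape(1, 9)       # placeholder 10 Angstrom cubic box
atom_types = [0] * 36 + [1] * 36 + [2] * 36   # hypothetical assignment over the 3 types swapped in the input

e, f, v = dp.eval(coords, cell, atom_types)   # energy, force, virial
print("energy:", e)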

Further Information, Files, and Links

No response

njzjz (Member) commented Oct 13, 2024

How many atoms are there? It looks like an integer overflow bug. Could you provide files to reproduce the bug?
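
For illustration only (not DeePMD-kit code): a negative size such as -1873441304 is what a count looks like after it overflows 32-bit signed arithmetic or is read from corrupted memory, and PyTorch then rejects it as a tensor dimension. A minimal Python sketch of that failure mode, assuming NumPy and PyTorch are installed:

import numpy as np
import torch

# Adding 1 to INT32_MAX in 32-bit arithmetic wraps to a large negative value
# (NumPy prints an overflow warning but still returns the wrapped result).
n = np.int32(2_147_483_647) + np.int32(1)
print(int(n))  # -2147483648

# Passing such a value as a tensor size yields the same kind of error
# reported above from ops.deepmd.border_op.
try:
    torch.empty(int(n))
except RuntimeError as err:
    print(err)  # Trying to create tensor with negative dimension -2147483648 ...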

Zch102xjtumse (Author) commented:

> How many atoms are there? It looks like an integer overflow bug. Could you provide files to reproduce the bug?

108 atoms. Here are the files: file.zip
