-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat pt: Support multi structures input for single frame #3566
base: devel
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
Signed-off-by: Chenqqian Zhang <[email protected]>
deepmd/utils/data.py
Outdated
if self.multistru: | ||
size = frames["coord"].shape[1] | ||
rng = np.random.default_rng() | ||
sample_idx = rng.integers(low=0, high=size, size=1)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really random?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the random generator in deepmd.utils.rndom
. The generator here cannot be controlled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have change it to
from deepmd.utils import random as dp_random
sample_idx = dp_random.choice(range(num_stru), size=1)[0]
But I find that when num_stru = 11, the sample_idx
is 1,4,8,10 all the time. I hope that the value is randomly selected from range(0--10)
. How can I solve it? @njzjz
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## devel #3566 +/- ##
==========================================
- Coverage 77.49% 77.47% -0.02%
==========================================
Files 432 432
Lines 37164 37185 +21
Branches 1620 1620
==========================================
+ Hits 28801 28810 +9
- Misses 7495 7507 +12
Partials 868 868 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be documented? Otherwise no one knows how to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add doc to deepmd/doc
. Should I add doc to other places?
deepmd/utils/data.py
Outdated
if self.multistru: | ||
size = frames["coord"].shape[1] | ||
rng = np.random.default_rng() | ||
sample_idx = rng.integers(low=0, high=size, size=1)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the random generator in deepmd.utils.rndom
. The generator here cannot be controlled.
elif data.size == nframes * natoms * ndof_: | ||
if output_natoms_for_type_sel: | ||
pass | ||
if key == "coord" and self.multistru: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we assume aparam and other properties are the same for all conformations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought of a situation where there are different other properties(aparam fparam) of all conformations. The reason I add this feature is that when one only have molecules in SMILES format, one may generate several possible conformations using tools such as rdkit. Do you have more better ideas?
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not find the the feature is necessary. if different structures have different labels, they should not be placed in one dataset, because the dp model cannot fit to diverging labels.
@wanghan-iapcm Different 3D conformations of a molecule have exactly the same label, and each epoch takes one of all 3D conformations of this molecule. Because molecule may only has SMILES, the exact 3D sturcture is unknown. |
I do not find any reason why different 3D configurations cannot be place in one batch. |
Support multi structures input for one frame:
background:
When we predict the properties of small molecules, sometimes multiple 3D conformations are generated from the SMILES format structure, but their targets are the same. In each epoch we would like to pick only one of the 3D configurations as input.
Impletation:
When the file "multistru" is present in the data folder (like nopbc), "coord.npy" can be written in the format (nframe * struc_num * natom * 3), where struc_num is the number of 3D conformations for each structure.