
fix(cc): fix message passing when nghost is 0 and send list is empty #4237

Merged
9 commits, merged into deepmodeling:devel on Oct 23, 2024

Conversation

CaRoLZhangxy
Collaborator

@CaRoLZhangxy commented Oct 22, 2024

Fixes the errors reported in the following PRs:
#4220
#4209
#4144

Summary by CodeRabbit

  • New Features

    • Enhanced message passing logic in the computation process for improved efficiency.
    • Added new test functions to evaluate DeepMD model performance under various conditions.
  • Bug Fixes

    • Improved error handling and assertions in test cases to ensure robustness.
  • Refactor

    • Streamlined tensor operations in the communication process to enhance clarity and reduce unnecessary computations.
    • Removed outdated test cases related to neighbor list handling in the DeepPot class.

Contributor

coderabbitai bot commented Oct 22, 2024

📝 Walkthrough

The changes in this pull request primarily focus on the DeepPotPT class in the DeepPotPT.cc file, where the compute method has been simplified by removing checks for ghost atoms and modifying how communication data is prepared. The test suite for the LAMMPS DeepMD integration has been enhanced with new tests and improved error handling. Additionally, the Border class in comm.cc has been updated to optimize tensor operations related to MPI communication by conditionally initializing tensors based on the number of elements to send or receive.

Changes

  • source/api_cc/src/DeepPotPT.cc — Simplified the compute method by removing checks for nghost; updated the method signature by removing the nghost and lmp_list parameters.
  • source/lmp/tests/test_lammps_dpa_pt_nopbc.py — Added new test functions for DeepMD model performance; modified existing tests for parameterization and error handling; enhanced assertions and data handling.
  • source/op/pt/comm.cc — Updated the forward_t and backward_t methods in the Border class to conditionally initialize tensors based on nsend and nswap; improved clarity of tensor operations.
  • source/api_cc/tests/test_deeppot_dpa_pt.cc — Removed the test cases cpu_lmp_nlist and cpu_lmp_nlist_atomic from TestInferDeepPotDpaPtNopbc.

Possibly related PRs

Suggested reviewers

  • njzjz
  • wanghan-iapcm
  • iProzd

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
source/op/pt/comm.cc (1)

255-268: Efficient optimization for backward pass

The changes effectively optimize the backward pass by conditionally allocating tensors only when nswap != 0. This improvement aligns with the PR objective and enhances memory efficiency.

Consider adding a brief comment explaining the purpose of max_recvnum for improved code clarity:

// Allocate receive buffer large enough for the maximum number of atoms received in any swap
int max_recvnum = sendnum_tensor.max().item<int>();
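For illustration only, a minimal self-contained sketch of that conditional-allocation pattern; the names nswap, sendnum_tensor, and feature_dim are placeholders, not the actual identifiers in comm.cc:

#include <torch/torch.h>
#include <cstdint>

// Sketch: only materialize the receive buffer when at least one swap occurs.
// Placeholder names; this is not the upstream comm.cc code.
torch::Tensor make_recv_buffer(int nswap,
                               const torch::Tensor& sendnum_tensor,
                               std::int64_t feature_dim) {
  torch::Tensor recv_buffer;  // left undefined when there is nothing to receive
  if (nswap != 0) {
    // Allocate a receive buffer large enough for the maximum number of atoms
    // received in any swap.
    int max_recvnum = sendnum_tensor.max().item<int>();
    recv_buffer = torch::empty({max_recvnum, feature_dim}, torch::kFloat64);
  }
  return recv_buffer;
}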
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b4701da and 90d0648.

📒 Files selected for processing (3)
  • source/api_cc/src/DeepPotPT.cc (2 hunks)
  • source/lmp/tests/test_lammps_dpa_pt_nopbc.py (0 hunks)
  • source/op/pt/comm.cc (2 hunks)
💤 Files with no reviewable changes (1)
  • source/lmp/tests/test_lammps_dpa_pt_nopbc.py
🧰 Additional context used
🔇 Additional comments (4)
source/op/pt/comm.cc (2)

123-131: Excellent optimization for empty send lists

The changes effectively address the PR objective by introducing a conditional check for nsend != 0. This optimization prevents unnecessary tensor operations when there's nothing to send, improving efficiency and avoiding potential issues with empty tensors.
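For readers unfamiliar with the pattern, a hedged sketch of such a guard follows; the identifiers are illustrative and not the actual comm.cc code:

#include <torch/torch.h>

// Sketch: skip the gather entirely when the send list is empty, so no
// zero-sized tensor operation is issued. Placeholder names throughout.
torch::Tensor gather_rows_to_send(int nsend,
                                  const torch::Tensor& features,
                                  const torch::Tensor& sendlist_tensor) {
  torch::Tensor send_buffer;
  if (nsend != 0) {
    // Select only the rows that actually need to be communicated.
    send_buffer = features.index_select(0, sendlist_tensor);
  }
  return send_buffer;
}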


Line range hint 1-431: Summary: Effective optimization of message passing

The changes in this file successfully address the PR objective of fixing message passing when the send list is empty. The optimizations in both the forward and backward passes improve efficiency by avoiding unnecessary operations and tensor allocations. These changes enhance the overall performance and robustness of the Border class, particularly in edge cases with empty send lists or no swaps.

source/api_cc/src/DeepPotPT.cc (2)

Line range hint 307-332: Verify all call sites for the updated compute method signature

The compute method signature has been modified to remove the nghost and lmp_list parameters. This change suggests that the method no longer relies on external neighbor list information. While this simplification might improve the interface, it's crucial to ensure that all call sites in the codebase have been updated accordingly.

Please run the following verification to check for any remaining calls with the old signature:

#!/bin/bash
# Search for calls to compute with the old signature
rg "compute\s*\([^)]*nghost" source/
rg "compute\s*\([^)]*lmp_list" source/

If any results are found, they will need to be updated to match the new signature. Additionally, consider updating the documentation to reflect this change in the method's interface.

Also applies to: 431-456


Line range hint 171-221: Verify the impact of simplified message passing logic

The changes to the message passing logic have simplified the conditions, which aligns with the PR objective. However, this simplification might have some implications:

  1. The removal of the nghost > 0 check could lead to unnecessary operations when there are no ghost atoms.
  2. The behavior in edge cases (e.g., when nghost is 0 and the send list is empty) might have changed.

To ensure these changes don't introduce any regressions, please run the following verification:

Additionally, consider adding more test cases to cover these edge scenarios if they don't already exist.

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
source/api_cc/src/DeepPotPT.cc (1)

Line range hint 171-189: LGTM! Consider adding a comment for clarity.

The changes effectively address the PR objective by simplifying the message passing condition and handling communication data more robustly. This should resolve issues when nghost is 0 and the send list is empty.

Consider adding a brief comment explaining the purpose of the communicator_tensor initialization, especially the reason for the empty tensor when lmp_list.world == 0. This would improve code readability and maintainability.
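One possible shape of that comment plus the guarded initialization, sketched under the assumption that lmp_list.world is either null (serial run) or points at a 64-bit communicator handle; the struct below is a stand-in, not the real neighbor-list type:

#include <torch/torch.h>

// Stand-in for the real neighbor-list type; only the `world` field matters here.
struct CommInfoSketch {
  const void* world = nullptr;  // MPI communicator handle, or null in serial runs
};

torch::Tensor make_communicator_tensor(const CommInfoSketch& lmp_list) {
  if (lmp_list.world == nullptr) {
    // No communicator attached (e.g. a serial run): hand the model a placeholder
    // tensor so it can detect this case instead of dereferencing a null pointer.
    return torch::empty({1}, torch::kInt64);
  }
  // Zero-copy view over the 64-bit handle; from_blob does not take ownership.
  return torch::from_blob(const_cast<void*>(lmp_list.world), {1}, torch::kInt64);
}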

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 90d0648 and d8541da.

📒 Files selected for processing (1)
  • source/api_cc/src/DeepPotPT.cc (3 hunks)
🧰 Additional context used
🔇 Additional comments (2)
source/api_cc/src/DeepPotPT.cc (2)

Line range hint 1-577: Overall changes align well with PR objectives.

The modifications in this file are focused on the compute method, particularly on simplifying the message passing logic and tensor operations. These changes directly address the PR objective of fixing message passing when nghost is 0 and the send list is empty.

The targeted nature of the changes, along with the unchanged error handling and type mapping, suggests a careful and focused approach to resolving the specific issues mentioned in the PR description.


225-225: Verify the initialization of mapping_tensor.

The simplification of the condition for calling different run_method variants is consistent with the earlier changes. However, mapping_tensor is being used here without a visible initialization in the provided code segment.

Please ensure that mapping_tensor is properly initialized before this line. If it's initialized elsewhere in the file, consider moving the initialization closer to its usage or adding a comment to clarify where it's set up.
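If it helps, a hedged sketch of building the tensor right at the call site; the vector name `mapping` and the {1, N} shape are assumptions for illustration, not the file's actual code:

#include <torch/torch.h>
#include <cstdint>
#include <vector>

// Sketch: construct mapping_tensor next to where it is consumed so readers see
// its provenance. `mapping` stands in for the per-atom map prepared earlier.
torch::Tensor make_mapping_tensor(std::vector<std::int64_t>& mapping,
                                  const torch::Device& device) {
  // Zero-copy view over the host vector, then moved to the model's device.
  return torch::from_blob(mapping.data(),
                          {1, static_cast<std::int64_t>(mapping.size())},
                          torch::kInt64)
      .to(device);
}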

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
source/api_cc/src/DeepPotPT.cc (2)

Confirmed: const_cast Usage in DeepPotPT.cc

The use of const_cast in source/api_cc/src/DeepPotPT.cc lines 180-189 is confirmed and may lead to undefined behavior. Please refactor to eliminate the need for const_cast.

🔗 Analysis chain

Line range hint 171-186: Review the new tensor operations for MPI communication

The new tensor operations for MPI communication data look good and should improve performance by using zero-copy tensor creation with torch::from_blob. However, there's a potential issue with the use of const_cast for lmp_list.world.

The use of const_cast in line 184-185 could lead to undefined behavior if the underlying data is modified. Consider refactoring to avoid the need for const_cast. For example:

-      communicator_tensor = torch::from_blob(const_cast<void*>(lmp_list.world),
-                                             {1}, torch::kInt64);
+      communicator_tensor = torch::from_blob(const_cast<std::int64_t*>(static_cast<const std::int64_t*>(lmp_list.world)),
+                                             {1}, torch::kInt64);

This change makes the cast more explicit and type-safe. However, it's crucial to ensure that lmp_list.world is indeed of type std::int64_t*.

To ensure that this change doesn't introduce any issues, please run the following script to check for any other instances of const_cast in the codebase:

This will help identify any other potential issues related to const-correctness in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other instances of const_cast
rg "const_cast" --type cpp

Length of output: 570
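Following up on the const_cast suggestion above, another option (sketched under the assumption that lmp_list.world points at a value that fits in 64 bits, which is not verified against the real headers) is to copy the handle into an owning tensor, removing both the const_cast and any lifetime coupling to lmp_list:

#include <torch/torch.h>
#include <cstdint>

// Hypothetical alternative: copy the communicator handle instead of aliasing it.
torch::Tensor copy_communicator_handle(const void* world) {
  const std::int64_t handle =
      *static_cast<const std::int64_t*>(world);  // read-only access, no const_cast
  return torch::tensor({handle}, torch::dtype(torch::kInt64));  // owning copy
}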


Remaining nghost References Detected

The search found multiple instances of nghost across various files in the codebase. These remnants indicate that the removal of ghost atom checks was incomplete and may lead to inconsistencies.

  • source/op/pt/comm.cc
  • source/op/tf/pairwise.cc
  • source/lmp/pair_deepmd.cpp
  • source/lmp/fix_dplr.cpp
  • source/lmp/pppm_dplr.cpp
  • source/lmp/compute_deeptensor_atom.cpp
  • source/api_c/tests/test_select_by_type.cc
  • source/api_c/tests/test_dipolecharge.cc
  • source/api_cc/src/common.cc
  • source/api_cc/src/DeepTensorTF.cc
  • source/api_cc/tests/test_select_by_type.cc
  • source/api_cc/src/DeepPot.cc
  • source/api_cc/src/DataModifierTF.cc
  • source/api_cc/src/DataModifier.cc
  • source/api_cc/src/DeepTensor.cc
  • source/api_c/include/deepmd.hpp
  • source/api_c/src/c_api.cc
  • source/api_cc/include/commonTF.h
  • source/api_cc/include/common.h
  • source/api_cc/include/DeepTensorTF.h
  • source/api_cc/include/DeepPotTF.h
  • source/api_cc/include/DeepPotPT.h
  • source/api_cc/include/DeepPot.h
  • source/api_cc/include/DataModifierTF.h
  • source/api_cc/include/DeepTensor.h
  • source/api_cc/include/DataModifier.h

Please address these remaining references to ensure the removal is thorough and the system behaves as expected.

🔗 Analysis chain

Line range hint 171-186: Verify the impact of removing ghost atom checks

The removal of nghost checks simplifies the code, but it may affect the system's behavior, especially in multi-node simulations or when dealing with boundary conditions.

Please confirm that this change doesn't introduce any issues in the following scenarios:

  1. Single-node simulations with periodic boundary conditions
  2. Multi-node simulations with domain decomposition

To assist in verification, you can run the following script to check for any remaining references to nghost in the codebase:

This will help ensure that the removal of ghost atom checks is consistent across the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for remaining references to nghost
rg "nghost" --type cpp

Length of output: 43957

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d8541da and 389b914.

📒 Files selected for processing (2)
  • source/api_cc/src/DeepPotPT.cc (3 hunks)
  • source/api_cc/tests/test_deeppot_dpa_pt.cc (0 hunks)
💤 Files with no reviewable changes (1)
  • source/api_cc/tests/test_deeppot_dpa_pt.cc
🧰 Additional context used
🔇 Additional comments (1)
source/api_cc/src/DeepPotPT.cc (1)

Line range hint 221-230: Review changes in the module's forward method call

The modification to the forward method call, which now includes a check for do_message_passing, is a good improvement. It allows for conditional execution of message passing in the neural network model, which can potentially improve performance when message passing is not required.

The change aligns well with the removal of ghost atom checks, as it moves the communication handling to a higher level within the model.

To ensure that this change is consistently applied and doesn't introduce any issues, please run the following script:

#!/bin/bash
# Search for other instances of forward_lower method calls
rg "forward_lower" --type cpp

# Check for any remaining references to ghost atoms in forward calls
rg "forward.*ghost" --type cpp

This will help verify that the changes are consistent across the codebase and that there are no remaining references to ghost atoms in forward method calls that might have been missed.


codecov bot commented Oct 22, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 5 lines in your changes missing coverage. Please review.

Project coverage is 84.55%. Comparing base (b4701da) to head (e3f9433).
Report is 5 commits behind head on devel.

Files with missing lines (patch % / lines):
  • source/op/pt/comm.cc — 85.71%, 0 missing and 4 partials ⚠️
  • source/api_cc/src/DeepPotPT.cc — 85.71%, 0 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4237      +/-   ##
==========================================
- Coverage   84.55%   84.55%   -0.01%     
==========================================
  Files         537      537              
  Lines       51237    51244       +7     
  Branches     3047     3051       +4     
==========================================
+ Hits        43324    43327       +3     
- Misses       6965     6967       +2     
- Partials      948      950       +2     


Member

@njzjz left a comment

This PR seems to remove #4144, the minimal DPA-2 support for the old interface? I do need it to work for AmberTools.

@njzjz linked an issue Oct 22, 2024 that may be closed by this pull request
Contributor

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
source/api_cc/src/DeepPotPT.cc (1)

Line range hint 225-231: Refactor the ternary operator for improved readability

Using a ternary operator with a complex multi-line function call can reduce code readability and maintainability. It can make the code harder to understand and debug.

Consider refactoring this code segment to use an if-else statement for better clarity.

Apply this diff to refactor the code:

-c10::Dict<c10::IValue, c10::IValue> outputs =
-    (do_message_passing == 1)
-        ? module
-              .run_method("forward_lower", coord_wrapped_Tensor, atype_Tensor,
-                          firstneigh_tensor, mapping_tensor, fparam_tensor,
-                          aparam_tensor, do_atom_virial_tensor, comm_dict)
-              .toGenericDict()
-        : module
-              .run_method("forward_lower", coord_wrapped_Tensor, atype_Tensor,
-                          firstneigh_tensor, mapping_tensor, fparam_tensor,
-                          aparam_tensor, do_atom_virial_tensor)
-              .toGenericDict();
+c10::Dict<c10::IValue, c10::IValue> outputs;
+if (do_message_passing == 1) {
+    outputs = module
+        .run_method("forward_lower", coord_wrapped_Tensor, atype_Tensor,
+                    firstneigh_tensor, mapping_tensor, fparam_tensor,
+                    aparam_tensor, do_atom_virial_tensor, comm_dict)
+        .toGenericDict();
+} else {
+    outputs = module
+        .run_method("forward_lower", coord_wrapped_Tensor, atype_Tensor,
+                    firstneigh_tensor, mapping_tensor, fparam_tensor,
+                    aparam_tensor, do_atom_virial_tensor)
+        .toGenericDict();
+}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 389b914 and 410da82.

📒 Files selected for processing (2)
  • source/api_cc/src/DeepPotPT.cc (3 hunks)
  • source/op/pt/comm.cc (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/op/pt/comm.cc
🧰 Additional context used
🔇 Additional comments (1)
source/api_cc/src/DeepPotPT.cc (1)

Line range hint 171-190: Validate the safety of casting lmp_list.world

The use of const_cast<void*>(lmp_list.world) when creating communicator_tensor may lead to undefined behavior if lmp_list.world is not a valid pointer or if it is nullptr. This could potentially cause segmentation faults or other runtime errors.

Please ensure that lmp_list.world is a valid and appropriately typed pointer before casting. Consider adding a check to verify its validity or using safer casting mechanisms.

@iProzd added this pull request to the merge queue Oct 23, 2024
Merged via the queue into deepmodeling:devel with commit 18026eb Oct 23, 2024
60 checks passed
Development

Successfully merging this pull request may close these issues.

[BUG] DPA2 Lammps on nopbc systems causes torchscript error
4 participants