Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of Interchange.from_smirnoff on polymers #1122

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mattwthompson
Copy link
Member

Description

Provide a brief description of the PR's purpose here.

Checklist

  • Add tests
  • Lint
  • Update docstrings

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.62%. Comparing base (fcf4399) to head (5d34566).

Additional details and impacted files

@mattwthompson
Copy link
Member Author

Ultimately I wasn't able to find a major bottleneck (i.e. something $O(n^2)$ accidentally snuck into things) with the amount of time I had to look. Still, I found a few things that were unnecessarily slow and could be fixes quickly.

I'm seeing performance gains across the board in calling ForceField.create_interchange on systems with large (> 100 heavy atoms) molecule(s).

For a polymer-ish compound of increasing length, seeing closer to single-digit differences at larger molecule sizes:

image

Source: https://gist.github.com/mattwthompson/f64b4ba936147492b1b43db5b28f3e55

And on a large protein (run this in Jupyter):

%%timeit
ForceField(
    "ff14sb_off_impropers_0.0.3.offxml",
).create_interchange(
    Topology.from_pdb(
        "../proteinbenchmark/proteinbenchmark/data/pdbs/hewl-1E8L-model-1.pdb",
    ),
)

# fcf439975b2a5283228b4d10c55d63c360820d90: 25.6 s ± 2.23 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 5d34566af0bdd34392915a4bb39a7f0d00cb3c4e: 22.6 s ± 1.21 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

@mattwthompson mattwthompson marked this pull request as ready for review December 10, 2024 15:01
@mattwthompson
Copy link
Member Author

This drops the runtime of sage_ff14sb.create_interchange(top) in the toolkit showcase from ~75 to ~65 seconds

@timbernat
Copy link
Contributor

timbernat commented Dec 13, 2024

@mattwthompson Did my own quick benchmark with a profiler to get some more details on what's causing a bottleneck, code and results linked. For reproducibility, I ran the linked script with interchange v0.4.0 and toolkit v0.16.7. I ran with 1,200 repeat units but changed the repeat unit identity from "CO" to "CCO" (i.e. to PEG, as [CO]n doesn't correspond to any real polymer to the best of my knowledge).

Moving down the call stack, looks like the highest-cost primitive functions are:

Make of that what you will, I haven't dug thru the source code deeply enough to know why those calls in particular dominate runtime, but I suspect this'll at least help direct effort towards the key 20% of bugs.

I also have access to ~1,400 diverse polymer chemistries which I've run thru Interchange as part of an unrelated collaboration; the individual chains are only a few hundred atoms, but are packed into 10k atom melts before porting thru Interchange. If it would be of use, I can get back with Interchange output runtimes for these chemistries sometime next week after making some tweaks to my pipeline (viz incorporating profilers into a Signac-based workflow). Hope I could be of some help!

@mattwthompson
Copy link
Member Author

@timbernat this PR aims to streamline the bottleneck which I believe is the cause of this poor performance. Could you install against this branch (pip install git+https://github.com/openforcefield/openff-interchange.git@cache-parameter-lookups) and compare timings for a few polymers?

@timbernat
Copy link
Contributor

@mattwthompson Apologies for the long turnaround, see here for updated profile times (same code as prior but with PR-version of Interchange). Runtimes are pretty similar, with components from _nonbonded now dominating much of the runtime.

Sorry I couldn't be of more help on this front yet, I'll be consolidating some concurrent projects following the holidays which should hopefully give me some more time on the side. Let me know if I can contribute anything else here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants