Very large molecules consume huge amounts of memory #101
Comments
My first thought is usually resonance structures, but I'll do a proper profile of what's going on. Thanks for the report!
Ok... I think I've got this consistently down to ~110 seconds with and without DGL, ~1.9 GiB with OpenEye, and ~450 MiB with RDKit, for reasons I describe in openforcefield/openff-toolkit#1855. It's still not linear scaling, and undoubtedly there's further optimization that could be done -- most of the additional time is spent in the resonance enumeration, which still makes a lot of use of
THAT'S AWESOME!!! 110 seconds is EXTREMELY usable. This is going to make the NCAA vignette such a hard flex.
GitHub was down for a little while, but I just made a release that will hopefully show up on conda soon!
I tried to charge a 5177-atom (349-residue) protein with NAGL. It took about fifty minutes before my kernel killed it because it had consumed all 64 GB of memory on my machine. I charged it with the OpenFF Toolkit's `Molecule.assign_partial_charges()` method.

My understanding is that NAGL should be roughly linear-ish in time and space, and I've successfully charged a ~10-residue peptide in about 250 ms (which was awesome), which means ~50x more atoms led to ~12000x the time. I wonder if there's any low-hanging fruit that could be picked to optimize NAGL for large molecules?
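As a rough sanity check on the "~50x atoms, ~12000x time" claim, the apparent scaling exponent can be back-of-the-enveloped. This is a sketch, not from the report: the small peptide's atom count (~100) is assumed, and the 50-minute figure is a lower bound since the kernel killed the job.

```python
import math

# Small case: ~10-residue peptide, ~250 ms (atom count of ~100 is assumed)
small_atoms, small_time = 100, 0.25
# Large case: 5177-atom protein, killed at ~50 minutes (lower bound on runtime)
large_atoms, large_time = 5177, 50 * 60.0

atom_ratio = large_atoms / small_atoms
time_ratio = large_time / small_time

# If time ~ n**k, then k = log(time_ratio) / log(atom_ratio)
exponent = math.log(time_ratio) / math.log(atom_ratio)
print(f"~{atom_ratio:.0f}x atoms -> ~{time_ratio:.0f}x time, apparent O(n^{exponent:.1f})")
```

Under those assumptions the observed behavior looks closer to quadratic-to-cubic than linear, which is consistent with a combinatorial step (like tautomer/resonance enumeration) dominating.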
Jeff thought it might be related to the combinatorics of computing tautomers: the Toolkit recently changed from producing only the top ~10 tautomers by default to asking for as many as the backend can give.
Code to reproduce the behavior
(SMILES is my favorite format for sharing proteins)
Current environment

- Output of `python -V`: 3.10.14
- Output of `pip list`: (omitted)
- Output of `conda list` / `micromamba list`: (omitted)