Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refined mapping of perturbable regions for AMBER input files #382

Closed
annamherz opened this issue Jan 14, 2025 · 15 comments · Fixed by #385
Closed

Refined mapping of perturbable regions for AMBER input files #382

annamherz opened this issue Jan 14, 2025 · 15 comments · Fixed by #385

Comments

@annamherz
Copy link
Contributor

Hello! When mapping molecules to each other for AFE simulations, AMBER experiences issues when running simulations where a perturbable part of the molecule separates two shared regions of the molecule. This causes the following error message:
image
All of these simulations were tested at 4 fs with HMR applied, but the error also persisted at 2 fs.

An example of such a mapping is (lig_2n~lig_2o):
image

But also in another case, where ring breaking was permitted so that the mapping was able to proceed:(lig_CHEMBL3265002~lig_CHEMBL3265004):
image

In all of these cases, there is a shared atom separated from the rest of the molecule. In some cases, additionally pruning the atoms using the prune_crossing_constraints is able to overcome this (lig_2n~lig_2o):
image

but there are other cases where the atoms remain in the common atoms and the crash then still occurs (lig_CHEMBL3265002~lig_CHEMBL3265004):
image

Here are sample files for both: example_ligands.zip

I think a potential solution could be to check that all atoms that are in the common core are connected for the AMBER files, and if they are not connected to the largest common core, to remove them from the shared mapping (in the cfg file) and designate them as perturbable?

@lohedges
Copy link
Contributor

Thanks for this, I should be able to take a look tomorrow. It should certainly be something that can be identified via use of the Sire connectivity object. Is this a situation that should already be handled by prune_crossing_constraints, i.e. an edge case that slips through that pruning function, or is it something unique that would require it's own pruning function. I'll also check to make sure that the code in core matches that original Exscientia implementation. If it does, then I'm surprised that they've not come across (or mentioned) this before. (I have seen it before, but can't remember what system it was for.) Perhaps they have a internal workaround.

@lohedges
Copy link
Contributor

Oh, and when I saw this before I believe part of the issue was due to naming, rather than the mapping itself, i.e. it thought that there was something to constrain that shouldn't be constrained. I'll check this when I take a look tomorrow.

@annamherz
Copy link
Contributor Author

annamherz commented Jan 15, 2025

The prune_crossing_constraints should prune so that there are no constrained bonds between a common core and a softcore atom. This is why for the lig_2n~lig_2o it then removes the H from the mapping. For lig_CHEMBL3265002~lig_CHEMBL3265004, I think it should also remove the H from the mapping as those H are next to perturbing atoms, but I'm not sure if it does not carry this out as ring breaking is permitted in that case?

@lohedges
Copy link
Contributor

The pruning is done at the mapping stage, so knows nothing about ring breaking. I'll try to figure out why it is removed in one case, but not in the other.

@lohedges
Copy link
Contributor

Just to say that using the Exs AMBER mapping options, i.e. prune_crossing_constraints=True and prune_perturbed_constraints=True I can merge the molecules directly, i.e. there is no need for forcing or allowing ring breaking. Will see if I get the same issue as you when I come to run anything.

@lohedges
Copy link
Contributor

Do you have any more details about your setup/protocol since I'm not seeing this issue locally. This is my test script:

import BioSimSpace as BSS

exe = "/home/lester/.conda/envs/pmemd/bin/pmemd.cuda"

# Load.
print("Loading molecules...")
mol0 = BSS.IO.readMolecules("lig_CHEMBL3265002.sdf")[0]
mol1 = BSS.IO.readMolecules("lig_CHEMBL3265004.sdf")[0]

# Parameterise.
print("Parameterising molecules...")
mol0 = BSS.Parameters.gaff(mol0).getMolecule()
mol1 = BSS.Parameters.gaff(mol1).getMolecule()

# Map.
print("Mapping molecules...")
mapping = BSS.Align.matchAtoms(
    mol0, mol1, prune_crossing_constraints=True, prune_perturbed_constraints=True
)

# Align.
print("Aligning molecules...")
mol0 = BSS.Align.rmsdAlign(mol0, mol1, mapping)

# Merge.
print("Merging molecules...")
merged = BSS.Align.merge(mol0, mol1, mapping)

# Solvate.
print("Solvating molecules...")
solvated = BSS.Solvent.tip3p(merged, box=3 * [5 * BSS.Units.Length.nanometer])

# Minimise.
print("Minimising molecules...")
process = BSS.Process.Amber(solvated, BSS.Protocol.FreeEnergyMinimisation(), exe=exe)
process.start()
process.wait()
assert not process.isError()

# Equilibrate.
print("Equilibrating molecules...")
process = BSS.Process.Amber(
    process.getSystem(), BSS.Protocol.FreeEnergyEquilibration(runtime="50ps"), exe=exe
)
process.start()
process.wait()
assert not process.isError()

# Run.
print("Running production...")
process = BSS.Process.Amber(
    process.getSystem(), BSS.Protocol.FreeEnergyProduction(runtime="50ps"), exe=exe
)
process.start()
process.wait()
assert not process.isError()

print("Success!")

I've also tried with HMR and a larger time step. Also no issues.

From my local environment it appears that I'm using pmemd version 22.

@annamherz
Copy link
Contributor Author

Hello! Thank you for checking. I ran the above script and also experienced no issues, and then I also tried it with the previously equilibrated ligands and also had no issues. Comparing the files I had previously, I've realised it is not necessarily an issue in the BSS mapping (although manually adjusting this is how I fixed the error), but more so a different atommask in the amber.cfg . The different atommask happens when just generating the inputs:

BSS.FreeEnergy.Relative(
                    solvated,
                    BSS.Protocol.FreeEnergyProduction(runtime="50ps"),
                    engine="AMBER",
                    work_dir="prod_test",
                    ignore_warnings=True,
                    explicit_dummies=True,
                )

It is specifically the explicit_dummies=True that causes different atommask. When set to explicit_dummies=False, the atommask is as with the script above. I can replicate the error in the above script by using:

process = BSS.Process.Amber(
    process.getSystem(), BSS.Protocol.FreeEnergyProduction(runtime="50ps"), exe=exe,
    work_dir="prod_test",
    explicit_dummies=True,
)

I thought that explicit dummy atoms were required for AMBER, but I guess that seems to be creating issues here?

@lohedges
Copy link
Contributor

Ah, interesting, I had forgotten about the explicit_dummies option. I just say, I'm not 100% sure when or why this is needed. The fact that the default option (which was in the original implementation) works makes me think it's only needed in certain cases. I'll see if I can ask anyone on our Slack channel and will report back.

@annamherz
Copy link
Contributor Author

annamherz commented Jan 16, 2025

As far as I understand, AMBER topology requires explicit dummy atoms as that is how the atommasks work. However looking at the code in BSS, I think I may have misunderstood the description of it as when I follow it, it's used in _squash.py ~L50 as:

 # Generate a "system" from the molecule at lambda = 0 and another copy at lambda = 1.
    if explicit_dummies:
        mol0 = molecule.copy()._toRegularMolecule(
            is_lambda1=False, convert_amber_dummies=True, generate_intrascale=True
        )
        mol1 = molecule.copy()._toRegularMolecule(
            is_lambda1=True, convert_amber_dummies=True, generate_intrascale=True
        )
    else:
        mol0 = _removeDummies(molecule, False)
        mol1 = _removeDummies(molecule, True)
    system = (mol0 + mol1).toSystem()

and then in the _toRegularMolecule() function as:

"""
        convert_amber_dummies : bool
            Whether to convert dummies to the correct AMBER formatting for
            non-FEP simulations. This will replace the "du" ambertype
            and "Xx" element with the properties from the other end state.
"""

So by default it is converting the amber molecule as required, and then the explicit_dummies would then just be if it is a non-FEP run? So I guess I've been setting it wrong 😅

@lohedges
Copy link
Contributor

Yes, I agree that this is confusing, so I'm not sure what the use case is. The squash function is never triggered unless running a FreeEnergy protocol, so it shouldn't have any handling for the non-FEP case. This is handled by the base Process class here, which converts all perturbable molecules to regular ones at the chosen end states, making the AMBER dummies actual elements, as is done in the squash function above.

I'm thinking that this must have been a misunderstanding on the Exs part, since the option seems completely redundant to me. As such, do you think it makes sense if I just remove it?

@annamherz
Copy link
Contributor Author

I think not remove it, as AMBER still requires two seperate topologies for each ligand in the input files, so I think removing the explicit_dummies argument is also incorrect as then all dummies are removed. I'm currently doing some tests using convert_amber_dummies=False in the _squash_molecule function:

 # Generate a "system" from the molecule at lambda = 0 and another copy at lambda = 1.
    if explicit_dummies:
        mol0 = molecule.copy()._toRegularMolecule(
            is_lambda1=False, convert_amber_dummies=False, generate_intrascale=True
        )
        mol1 = molecule.copy()._toRegularMolecule(
            is_lambda1=True, convert_amber_dummies=False, generate_intrascale=True
        )
    else:
        mol0 = _removeDummies(molecule, False)
        mol1 = _removeDummies(molecule, True)
    system = (mol0 + mol1).toSystem()

As this is what is triggered in the FEP case, and then in the normal amber process, it should then still convert them to non-dummy atoms if it is not the FreeEnergyMixin .

This is handled by the base Process class here, which converts all perturbable molecules to regular ones at the chosen end states,

as that is handled by the part you mentioned. Although actually I see what you mean and I guess the squash could be changed to not have the explicit_dummies option and always have e.g. _toRegularMolecule(is_lambda1=True, convert_amber_dummies=False, generate_intrascale=True) ?

I'll have a close look at the generated files I'm testing and if they make sense 😄

@lohedges
Copy link
Contributor

I see what you mean and I guess the squash could be changed to not have the explicit_dummies option and always have e.g. _toRegularMolecule(is_lambda1=True, convert_amber_dummies=False, generate_intrascale=True) ?

Yes, that's what I was thinking. Let me know if this is a valid way of doing things and I'll update.

@annamherz
Copy link
Contributor Author

Hello! So I've done some tests and turns out convert_amber_dummies=True is essential, as otherwise the runs crash. I think looking again at the function description this actually makes sense as it states will replace the "du" ambertype and in the FEP simulations for AMBER there are no du atomtypes, these are all the regular atomtype and specified as 'dummy atoms' by the mask. So I think the function description is misleading in the sense that the conversion from du to regular atom types is not just for non-FEP simulations?

@lohedges
Copy link
Contributor

Yes, that makes sense. I added the convert_amber_dummies on Exs request, but the function was intended to be used for non-FEP simulations using a perturbable system, e.g. what you do for the other engines by extracting the end states. As you say, AMBER needs non du types. I guess they just used this internally for FEP too, where it is apparently needed, so the documentation is not referring to that use case. I can update the docs accordingly.

I guess I'm now confused as to what the correct set of options are, i.e. is there always a correct value for explicit_dummies that should be used for FEP simulations. (It's clearly not needed for non-FEP.)

@annamherz
Copy link
Contributor Author

After some more extensive testing, the MAE/RMSE statistics for >10 perturbations with explicit_dummies=True and explicit_dummies=False are the same, however the False option is more robust (less likely to crash at 4 fs, no hydrogen bond issues). As such keeping it set to this as a default fo False for AMBER in the Process and also Relative seems like the best way to proceed.

Potentially this can also be a hidden option, as it would then not be recommended for any of the engines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants