Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Espaloma performance #19

Open
stefdoerr opened this issue Mar 2, 2023 · 13 comments
Open

Espaloma performance #19

stefdoerr opened this issue Mar 2, 2023 · 13 comments

Comments

@stefdoerr
Copy link

Hi all and thanks for this repo!
I tried to evaluate on my favorite toy (Benzamidine) and I didn't get very good performance and was curious if you had any comment on it.
image

First of all the charges look very different from AM1-BCC charges (init here is Espaloma, fit is AM1-BCC)

=== Molecule MOL, 18 atoms, 18 bonds ===
Name (init)  Name (new)  Element  Atomtype  Charge (init)  Charge (fit)  Chiral centers
          C          C1        C   (C1) za         -0.254        -0.262                
          C          C2        C   (C2) zb         -0.164        -0.082                
          C          C3        C        C3         -0.162        -0.116                
          C          C4        C        C4         -0.188        -0.058                
          C          C5        C        C5         -0.162        -0.116                
          C          C6        C   (C6) zb         -0.164        -0.082                
          C          C7        C   (C7) zc          0.300         0.522                
          N          N1        N   (N1) zd         -0.362        -0.469                
          N          N2        N   (N2) ze         -0.363        -0.453                
          H          H1        H        H1          0.273         0.149                
          H          H2        H        H2          0.251         0.170                
          H          H3        H        H3          0.258         0.170                
          H          H4        H        H4          0.251         0.170                
          H          H5        H        H5          0.273         0.149                
          H          H6        H   (H6) zf          0.363         0.327                
          H          H7        H   (H7) zf          0.363         0.327                
          H          H8        H   (H8) zg          0.245         0.327                
          H          H9        H   (H9) zg          0.245         0.327 

Beyond that, if I use Espaloma charges for torsion scans of the C-C-C-N torsion I get very weird torsion profiles compared to AM1-BCC. Left: AM1-BCC, Right: Espaloma. Focus on the green curve which is the initial MM evaluation with Sage FF and AM1-BCC and Espaloma respectively. I find it concerning that it considers the barrier at 0 degrees the highest instead of the one at 90 degrees which is also shown by xTB to be the highest barrier.
image

@stefdoerr
Copy link
Author

stefdoerr commented Mar 2, 2023

This is the code I used to obtain the parameters:

off_mol = OFFMolecule.from_file("./BEN.sdf")

from espaloma_charge.openff_wrapper import EspalomaChargeToolkitWrapper

toolkit_registry = EspalomaChargeToolkitWrapper()
# Espaloma doesn't like conformations so remove it and add it back afterwards
conf = off_mol.conformers[0].copy()
del off_mol.conformers[0]
off_mol.assign_partial_charges(
    "espaloma-am1bcc", toolkit_registry=toolkit_registry
)
off_mol.add_conformer(conf)  # Add conformer back

top = off_mol.to_topology()
ff = ForceField("Sage")
mol2 = Interchange.from_smirnoff(
    force_field=ff, topology=top, charge_from_molecules=[off_mol]
)
mol2.to_prmtop(tmpprm)
mol2.to_pdb(tmppdb)
prm = parmed.amber.AmberParameterSet.from_structure(parmed.load_file(tmpprm))

By the way, also the warning Espaloma gives (for which I'm deleting here the conformer) seems a bit excessive. It feels to me like it could just silently ignore the existence of conformers in the molecule.

IncorrectNumConformersWarning: Molecule 'Molecule with name 'BEN_pH7_4.sdf' and SMILES '[H][c]1[c]([H])[c]([H])[c]([C]([N]([H])[H])=[N+]([H])[H])[c]([H])[c]1[H]'' has 1 conformers, but charge method 'espaloma-am1bcc' expects exactly 0

@jchodera
Copy link
Member

jchodera commented Mar 2, 2023

Thanks for the detailed analysis, @stefdoerr! Could you also attach a ZIP file containing the BEN.sdf you used?

By the way, also the warning Espaloma gives (for which I'm deleting here the conformer) seems a bit excessive. It feels to me like it could just silently ignore the existence of conformers in the molecule.

Yeah, that's weird---we'll look into where this warning is generated. We should just ignore conformers if present.

@stefdoerr
Copy link
Author

Sure, here is the SDF file. Thanks!
BEN_pH7_4.zip

@jchodera
Copy link
Member

jchodera commented Mar 2, 2023

@stefdoerr : Hm, the charge RMS here is 0.098, which is not extraordinarily large, but it is in the tail of the EspalomaCharge for a molecule of this charge (left) and part of a second hump for molecules of this size (right; see red asterisk). The data shows that AmberTools is worse on average for molecules of this charge and size.
Uploading image.png…

In the Supporting Information, we explore some other evaluation metrics (some fast, some slow) that we could integrate into the loss function beyond just charge deviation L2 loss. The surprising impact of the modest charge deviations on the torsion profile suggests we should consider conformer electrostatic energies as well (cc @yuanqing-wang).

@stefdoerr
Copy link
Author

For some reason I cannot see the image. I assume it's this one?
image

To clarify, in our AM1-BCC method we first assign AM1-BCC charges to minimize the initial conformation of the molecule and then re-assign AM1-BCC charges on that minimized conformer.
After that we perform dihedral scans where we minimize the conformer at each (restrained) angle with MM (here Sage and AM1-BCC/Espaloma).

This has me thinking though if RMSE is the right method for evaluation. I assume the spatial distribution of the charges on the molecule might be more important like it is for dipole moments.

@stefdoerr
Copy link
Author

stefdoerr commented Mar 2, 2023

Also I find the last lines interesting:

          H          H6        H   (H6) zf          0.363         0.327                
          H          H7        H   (H7) zf          0.363         0.327                
          H          H8        H   (H8) zg          0.245         0.327                
          H          H9        H   (H9) zg          0.245         0.327 

Espaloma gives different charges to the hydrogens attached to the charged nitrogen versus the non-charged while AM1 gives the same charge
image

This is interesting because as far as I know the double bond is resonant (sorry my chemistry is rather bad but from what I've been told, that electron is delocalized and moves between the two nitrogens). So I would believe that AM1 is right in assigning same charges to all 4 hydrogens.

@jchodera
Copy link
Member

jchodera commented Mar 2, 2023

Here's what I was trying to post (from a plane!):
Uploading image.png…

To clarify, in our AM1-BCC method we first assign AM1-BCC charges to minimize the initial conformation of the molecule and then re-assign AM1-BCC charges on that minimized conformer.

Oh! Do you have some examples of before/after minimization for single-snapshot AM1-BCC charge assignment? How are you assigning AM1-BCC charges normally in this process?

AmberTools sqm supposedly does a short AM1 minimization to give relatively conformation-independent charges, though starting with a different conformer will end up with a different minimum, so there can be surprising differences between how conformers were generated.

OpenEye uses the "ELF10" method to try to eliminate conformer dependence, and ends up averaging over 10 conformations selected to have minimal intramolecular hydrogen bonding interactions.

After that we perform dihedral scans where we minimize the conformer at each (restrained) angle with MM (here Sage and AM1-BCC/Espaloma).

Restrained? Or is the torsion constrained?

This has me thinking though if RMSE is the right method for evaluation. I assume the spatial distribution of the charges on the molecule might be more important like it is for dipole moments.

What do you think of the SI figures that example ESP on molecular surfaces near the molecule? We've been thinking of using these ESPs in addition to charge deviation, but I do wonder if we should add intramolecular electrostatic energies as well.

@jchodera
Copy link
Member

jchodera commented Mar 2, 2023

benzamidine-plot

@stefdoerr
Copy link
Author

stefdoerr commented Mar 3, 2023

Restrained? Or is the torsion constrained?

Sorry I meant just the torsion constrained.

Oh! Do you have some examples of before/after minimization for single-snapshot AM1-BCC charge assignment? How are you assigning AM1-BCC charges normally in this process?

I checked, apparently it doesn't matter at all to AM1-BCC so you are right, it must be doing a minimization itself.

In any case I have the feeling that the issue here is the resonance. Can you check if OpenEye gives same charges to all nitrogen Hs like AM1-BCC?

@jchodera
Copy link
Member

jchodera commented Mar 3, 2023

In any case I have the feeling that the issue here is the resonance. Can you check if OpenEye gives same charges to all nitrogen Hs like AM1-BCC?

Good point---we used the OEAM1BCCELF10Charges to generate training data, which lacks options to set the symmetrization---we should check if it does or does not symmetrize chemically equivalent atoms.

Other options, such as OEAM1BCCCharges, do have this option, but can also lead to pathological cases because they omit the ELF10 step.

@stefdoerr
Copy link
Author

Hi, I just tested pure Espaloma (not espaloma_charge) version 0.3.1 and it gave symmetrical charges for my molecule. Does this mean this repo is deprecated? I have not seen here any significant development since January.

Does Espaloma use the same method as espaloma_charge for the charge estimation? Is it trained on AM1-BCC charges?

@jchodera
Copy link
Member

jchodera commented Aug 2, 2023

We are in the process of porting all the fixes back to espaloma_charge and retraining it on a superset of the espaloma 0.3.1 dataset! Apologies for the delay here---we were tied up with the CADD GRC until recently. Thanks again for the great issue report that led us to important fixes!

@mikemhenry
Copy link
Contributor

@stefdoerr Thank you for this very detailed report! We are getting ready to release an updated version!

@mikemhenry mikemhenry mentioned this issue Apr 25, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants