Explicit valence for atom # 18 N, 4, is greater than permitted #1984

Open
emainas opened this issue Dec 12, 2024 · 9 comments

Comments


emainas commented Dec 12, 2024

Hi everyone, I am simply doing ligand_off_molecule = Molecule.from_file("molecule.sdf", file_format='sdf')
where my molecule has a positive charge on nitrogen 19 (18 if you start counting from 0), and I do not know how to specify it properly. I attach my molecule.sdf file. I added the line "M CHG 1 19 1" at the end, but it does not work. Any ideas?

Author

emainas commented Dec 12, 2024

I am not sure my file was attached, so I am pasting it here again:

new
  -ISIS-            3D

 78 81  0  0  0  0  0  0  0  0  0
   44.4530   84.8580   23.7330 N   0  0  0  0  0
   44.3400   84.1110   24.4020 H   0  0  0  0  0
   45.1030   84.6780   22.5520 C   0  0  0  0  0
   45.0740   86.0120   21.9230 C   0  0  0  0  0
   44.4070   86.9160   22.7410 C   0  0  0  0  0
   44.0190   86.1480   23.9010 C   0  0  0  0  0
   45.7680   86.2120   20.5950 C   0  0  0  0  0
   46.5160   85.4330   20.4060 H   0  0  0  0  0
   46.2930   87.1740   20.5680 H   0  0  0  0  0
   45.0470   86.1880   19.7700 H   0  0  0  0  0
   45.5960   83.6350   22.1480 O   0  0  0  0  0
   44.1750   88.3500   22.5040 C   0  0  0  0  0
   44.0220   88.9660   23.3930 H   0  0  0  0  0
   44.1190   88.9460   21.2890 C   0  0  0  0  0
   43.9360   90.0180   21.2030 H   0  0  0  0  0
   44.2260   88.3900   20.3600 H   0  0  0  0  0
   43.3340   86.6150   24.9990 C   0  0  0  0  0
   43.0790   87.6680   24.9400 H   0  0  0  0  0
   43.0070   84.6780   26.4310 N   0  0  0  0  0
   43.3680   83.9770   25.8000 H   0  0  0  0  0
   42.8860   85.9810   26.2410 C   0  0  0  0  0
   42.3100   86.6850   27.3430 C   0  0  0  0  0
   42.0740   85.7470   28.3110 C   0  0  0  0  0
   42.5140   84.5180   27.7130 C   0  0  0  0  0
   42.0530   88.1740   27.3690 C   0  0  0  0  0
   41.4470   88.4810   26.5080 H   0  0  0  0  0
   42.9950   88.7340   27.3390 H   0  0  0  0  0
   41.5110   88.4820   28.2700 H   0  0  0  0  0
   42.4830   83.2880   28.3400 C   0  0  0  0  0
   42.1460   83.3110   29.3740 H   0  0  0  0  0
   42.7350   81.5480   26.5380 N   0  0  0  0  0
   42.4470   82.1650   25.7940 H   0  0  0  0  0
   42.8170   81.9440   27.8690 C   0  0  0  0  0
   43.2220   80.8330   28.5650 C   0  0  0  0  0
   43.3940   79.7630   27.6420 C   0  0  0  0  0
   43.0740   80.2070   26.3820 C   0  0  0  0  0
   43.8330   78.3560   27.9880 C   0  0  0  0  0
   44.1580   78.2600   29.0290 H   0  0  0  0  0
   44.6750   78.0440   27.3580 H   0  0  0  0  0
   43.0150   77.6430   27.8310 H   0  0  0  0  0
   43.0460   79.4490   25.1280 C   0  0  0  0  0
   43.3810   78.4240   25.2680 H   0  0  0  0  0
   42.2470   80.9250   23.3250 N   0  0  0  0  0
   42.0660   81.7680   23.8440 H   0  0  0  0  0
   42.7120   79.7320   23.8200 C   0  0  0  0  0
   42.7780   78.7990   22.7240 C   0  0  0  0  0
   42.3490   79.4830   21.5930 C   0  0  0  0  0
   42.0020   80.8730   21.9860 C   0  0  0  0  0
   43.2230   77.3550   22.8140 C   0  0  0  0  0
   42.9530   76.7790   21.9230 H   0  0  0  0  0
   42.7560   76.8500   23.6670 H   0  0  0  0  0
   44.3100   77.2900   22.9380 H   0  0  0  0  0
   41.5900   81.8050   21.3110 O   0  0  0  0  0
   42.2470   78.9630   20.2180 C   0  0  0  0  0
   42.7130   77.9890   20.0650 H   0  0  0  0  0
   41.6460   79.5560   19.1630 C   0  0  0  0  0
   41.6280   79.0780   18.1820 H   0  0  0  0  0
   41.1460   80.5220   19.2260 H   0  0  0  0  0
   43.4670   80.7580   30.0570 C   0  0  0  0  0
   42.3260   80.0500   30.8310 C   0  0  0  0  0
   41.0140   80.8430   30.7690 C   0  0  0  0  0
   40.8530   81.7740   31.7420 O   0  0  0  0  0
   40.1900   80.7240   29.8780 O   0  0  0  0  0
   43.6180   81.7620   30.4720 H   0  0  0  0  0
   44.4070   80.2270   30.2490 H   0  0  0  0  0
   42.6570   79.9520   31.8710 H   0  0  0  0  0
   42.1640   79.0570   30.3970 H   0  0  0  0  0
   41.6320   81.6800   32.3500 H   0  0  0  0  0
   41.4780   85.9580   29.6870 C   0  0  0  0  0
   39.9640   85.6330   29.7220 C   0  0  0  0  0
   39.3500   85.9600   31.0890 C   0  0  0  0  0
   39.4030   85.2100   32.0490 O   0  0  0  0  0
   38.7750   87.1830   31.1990 O   0  0  0  0  0
   42.0010   85.3430   30.4290 H   0  0  0  0  0
   41.6350   86.9940   30.0110 H   0  0  0  0  0
   39.4800   86.2220   28.9340 H   0  0  0  0  0
   39.8310   84.5640   29.5210 H   0  0  0  0  0
   38.8610   87.6120   30.3090 H   0  0  0  0  0
  1  2  1  0  0  0
  1  3  1  0  0  0
  1  6  1  0  0  0
  3  4  1  0  0  0
  3 11  2  0  0  0
  4  5  2  0  0  0
  4  7  1  0  0  0
  5  6  1  0  0  0
  5 12  1  0  0  0
  6 17  2  0  0  0
  7  8  1  0  0  0
  7  9  1  0  0  0
  7 10  1  0  0  0
 12 13  1  0  0  0
 12 14  2  0  0  0
 14 15  1  0  0  0
 14 16  1  0  0  0
 17 18  1  0  0  0
 17 21  1  0  0  0
 19 20  1  0  0  0
 19 21  2  0  0  0
 19 24  1  0  0  0
 21 22  1  0  0  0
 22 23  2  0  0  0
 22 25  1  0  0  0
 23 24  1  0  0  0
 23 69  1  0  0  0
 24 29  2  0  0  0
 25 26  1  0  0  0
 25 27  1  0  0  0
 25 28  1  0  0  0
 29 30  1  0  0  0
 29 33  1  0  0  0
 31 32  1  0  0  0
 31 33  1  0  0  0
 31 36  1  0  0  0
 33 34  2  0  0  0
 34 35  1  0  0  0
 34 59  1  0  0  0
 35 36  2  0  0  0
 35 37  1  0  0  0
 36 41  1  0  0  0
 37 38  1  0  0  0
 37 39  1  0  0  0
 37 40  1  0  0  0
 41 42  1  0  0  0
 41 45  2  0  0  0
 43 44  1  0  0  0
 43 45  1  0  0  0
 43 48  1  0  0  0
 45 46  1  0  0  0
 46 47  2  0  0  0
 46 49  1  0  0  0
 47 48  1  0  0  0
 47 54  1  0  0  0
 48 53  2  0  0  0
 49 50  1  0  0  0
 49 51  1  0  0  0
 49 52  1  0  0  0
 54 55  1  0  0  0
 54 56  2  0  0  0
 56 57  1  0  0  0
 56 58  1  0  0  0
 59 60  1  0  0  0
 59 64  1  0  0  0
 59 65  1  0  0  0
 60 61  1  0  0  0
 60 66  1  0  0  0
 60 67  1  0  0  0
 61 62  1  0  0  0
 61 63  2  0  0  0
 62 68  1  0  0  0
 69 70  1  0  0  0
 69 74  1  0  0  0
 69 75  1  0  0  0
 70 71  1  0  0  0
 70 76  1  0  0  0
 70 77  1  0  0  0
 71 72  2  0  0  0
 71 73  1  0  0  0
 73 78  1  0  0  0
M  CHG  1   19   1
M  END

Member

j-wags commented Dec 16, 2024

Thanks for writing in - I've reproduced the error and am working to get to the bottom of it.

Member

j-wags commented Dec 18, 2024

Still looking into this. It has something to do with this nitrogen being the only one in the molecule with an explicit double bond.


Member

j-wags commented Dec 18, 2024

Looking at the canonical structure of biliverdin (which this either is, or is very similar to), it appears that only three of the pyrrole groups should be protonated. I don't have an editor close at hand that could modify this, but if you do @emainas, could you try deprotonating the nitrogen with the explicit double bond and see if that clears it up?

Author

emainas commented Dec 18, 2024

Yes, it works without the proton, but I want to study different protonation states, one of which is the structure with all 4 rings protonated. Why would it not work, though? All the valences look good (unless I am missing something?).
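
For reference, the explicit valence of 4 in the error message can be reproduced by hand from the bond block of the posted file. The sketch below is only an illustration: it copies the bond lines around atom 19 from the SDF above and sums their bond orders, and the helper name `explicit_valence` is made up for this example. A neutral nitrogen is normally limited to a valence of 3, so a total of 4 is only acceptable if the +1 formal charge is actually picked up when the file is read.

```python
# Bond-block lines copied from the posted SDF (the bonds around atoms 19 and 21).
# In a V2000 bond line, columns 1-3 are the first atom, 4-6 the second atom,
# and 7-9 the bond order. Atom indices are 1-based.
BOND_LINES = [
    " 17 21  1  0  0  0",
    " 19 20  1  0  0  0",
    " 19 21  2  0  0  0",
    " 19 24  1  0  0  0",
    " 21 22  1  0  0  0",
]


def explicit_valence(bond_lines, atom_index):
    """Sum the bond orders of every bond that touches atom_index (1-based)."""
    total = 0
    for line in bond_lines:
        first, second, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if atom_index in (first, second):
            total += order
    return total


valence_n19 = explicit_valence(BOND_LINES, 19)
print(valence_n19)  # 4: one N-H bond, one N=C double bond, one N-C bond
```

So the structure itself is consistent with a protonated, positively charged nitrogen; whether it loads depends on the charge being read correctly.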

@mattwthompson
Member

(Is protomer synonymous with protonation state? I forget a lot of cheminformatics jargon, disregard if I'm confused)

The toolkit provides Molecule.enumerate_protomers, which could serve as a programmatic bridge between a molecule's less exotic forms and its protonation states, each of which can be round-tripped to disk and to other representations:

>>> from openff.toolkit import Molecule
>>> bill = Molecule.from_inchi("InChI=1S/C33H34N4O6/c1-7-20-19(6)32(42)37-27(20)14-25-18(5)23(10-12-31(40)41)29(35-25)15-28-22(9-11-30(38)39)17(4)24(34-28)13-26-16(3)21(8-2)33(43)36-26/h7-8,13-15,35H,1-2,9-12H2,3-6H3,(H,36,43)(H,37,42)(H,38,39)(H,40,41)/b26-13-,27-14-,28-15-")
>>> for index, protomer in enumerate(bill.enumerate_protomers()):
...     protomer.to_file(f"protomer{index}.sdf", file_format="sdf")
...     Molecule.from_file(f"protomer{index}.sdf", file_format="sdf").n_atoms
...
75
76
76
77

Member

j-wags commented Dec 19, 2024

Thanks for checking on the deprotonated form, @emainas. My big-conjugated-molecules knowledge is a little rusty, but I'm inclined to think (4/10) that the fully protonated form is chemically valid. Could you share how you made the structure in this report? Namely, was the Kekulé structure hand-drawn, or interpreted by some tool? This will help me debug.

Matt - I think the issue is with RDKit file I/O specifically. Since your code block runs enumerate_protomers successfully, it must be using OpenEye for that (and therefore also for file I/O).

Author

emainas commented Dec 19, 2024

@mattwthompson I suspect that these are the protomers of the propionic acid tails. Biliverdin is technically a tetrapyrrolic diacid because of its two tails, so I guess your code generated the following atom counts:

  1. doubly deprotonated - 75
  2. proton on the left tail only - 76
  3. proton on the right tail only - 76
  4. proton on both tails - 77

I believe that all 4 of these protomers share the same version of the ring system, in which 3 out of 4 rings are protonated. Nevertheless, the snippet you attached comes in super handy for things I work on, so thanks for that.
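
As a quick sanity check on those numbers, the atom counts can be derived from the formula layer of the InChI in the snippet (C33H34N4O6). This is only a sketch: `atom_count` is a hypothetical helper, and assigning 0, 1, or 2 lost protons to the carboxylic acid tails follows the reading given above rather than anything confirmed by the toolkit output.

```python
import re


def atom_count(formula):
    """Count atoms in a simple formula such as 'C33H34N4O6' (no brackets or charges)."""
    total = 0
    for _element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += int(digits) if digits else 1
    return total


# Formula layer of the InChI used in the snippet above (the neutral molecule).
neutral = atom_count("C33H34N4O6")

# Assumed interpretation: each protomer differs only in how many of the two
# acid protons have been removed (2, 1, 1, or 0).
counts = sorted(neutral - lost for lost in (2, 1, 1, 0))
print(neutral)  # 77
print(counts)   # [75, 76, 76, 77]
```

The SDF posted at the top of this thread declares 78 atoms in its counts line, which is one more than the neutral formula and matches a single extra proton on the ring nitrogen.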

@j-wags I have a PDB file (I don't remember where I got it initially; I think from some biliprotein in the Protein Data Bank) that I then edited manually to get the nomenclature correct. Next, I opened it with PyMOL and added the +1 charge on the 4th (normally deprotonated) ring. I attach the PDB file I had before and after the PyMOL proton addition. I might have created a monster of an SDF file with all this editing and converting. See the attached PDB file; the extra proton I added with PyMOL is:
HETATM 79 HB TPS 1 43.368 83.977 25.800 0.00 0.00 H
twohydro.pdb.zip

Question: what is the proper way to generate the SDF file? With Amber, only mol2 files are needed for ligands, not SDF.

Member

j-wags commented Dec 20, 2024

I went down so many rabbit holes, and I think I found the issue in the funniest possible place.

In the second-to-last line of the original file, you have M CHG 1 19 1. Could you remove one space between the first 1 and the 19? That gets it to load for me. (It's due to the use of fixed-width columns; see the page marked 49 in the spec for details.)
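
To make the column issue concrete, here is a minimal sketch that builds the charge line with fixed-width fields instead of typing the spaces by hand. It assumes the V2000 layout `M  CHGnn8 aaa vvv ...`: a 3-character entry count, then, for each entry, a space, a 3-character atom number, a space, and a 3-character charge. The helper names `format_chg_line` and `chg_fields` are made up for this example, and the sketch makes no claim about how any particular parser reacts to a misaligned line.

```python
def format_chg_line(charges):
    """Build a V2000 'M  CHG' line from (atom_index, charge) pairs (1-based indices)."""
    if not 1 <= len(charges) <= 8:
        raise ValueError("an M  CHG line holds between 1 and 8 entries")
    line = f"M  CHG{len(charges):>3}"
    for atom_index, charge in charges:
        # Each entry occupies exactly 8 columns: ' aaa vvv'.
        line += f" {atom_index:>3} {charge:>3}"
    return line


def chg_fields(line):
    """Return the raw atom and charge fields of the first entry, sliced by column."""
    return line[10:13], line[14:17]


fixed = format_chg_line([(19, 1)])
posted = "M  CHG  1   19   1"  # the line as it appears in the posted SDF

print(repr(fixed))         # 'M  CHG  1  19   1'
print(chg_fields(fixed))   # (' 19', '  1')
print(chg_fields(posted))  # ('  1', '   ')
```

With the extra space, the 19 slides one column to the right, so the atom-number field no longer contains 19 and the charge field is blank when read by position.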
