AssertionError #43

Open
orubaba opened this issue Jul 17, 2022 · 12 comments

@orubaba

orubaba commented Jul 17, 2022

Hi gurus,
Please, I need your help. I am trying to run get-vocab.py on my small dataset (around 100 entries), but I keep getting the error shown below.
Is there a way around this? The error points to mol_graph.py, line 82:
"assert n - m <= 1 #must be connected"
image

@max-unfried

I'm having the exact same issue - any idea why this is?

@NiharikaVadlamudi

@wengong-jin Can you please suggest a solution for this? Thanks.

@max-unfried

Anything that you could suggest - still hung up on it?

@AdamIzdebski

+1

@marshallcase

Can you post a picture of your molecule / encoded graph? In my experience, you get n - m > 1 when there are atoms that aren't connected to the rest of the molecule. This could be because the cluster generation process in find_clusters() wasn't built for a particular combination of rings / bonds.
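
To see why n - m > 1 points to a disconnected graph: a graph with n nodes and c connected components has at least n - c edges, so n - m <= c, and n - m > 1 forces c >= 2. Below is a minimal pure-Python sketch of that reasoning. The helper names (`count_components`, `passes_assert`) and the toy edge lists are hypothetical illustrations, not code from hgraph; only the `n - m <= 1` condition comes from the assertion quoted in this thread.

```python
def count_components(n, edges):
    """Count connected components of an undirected graph on nodes 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb      # merge the two pieces
            components -= 1
    return components


def passes_assert(n, edges):
    """Mirror of the quoted check `n - m <= 1`, with m = number of edges."""
    return n - len(edges) <= 1


# Hypothetical toy graphs on 4 nodes.
connected = [(0, 1), (1, 2), (2, 3)]     # one path through all nodes
split = [(0, 1), (2, 3)]                 # two separate pieces
ring_plus_stray = [(0, 1), (1, 2), (2, 0)]  # a triangle and an isolated node 3
```

Note that the check is necessary but not sufficient for connectivity: `ring_plus_stray` satisfies n - m <= 1 even though node 3 is isolated, so a molecule can be disconnected without tripping the assertion.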

@JonathanBroadbent

JonathanBroadbent commented Sep 27, 2023

Hi, reopening this as I am currently experiencing the same problem.
It says the issue occurs in tree_decomp line 82

Has anybody found a fix yet?

@orubaba
Author

orubaba commented Sep 28, 2023 via email

@JonathanBroadbent

JonathanBroadbent commented Sep 28, 2023 via email

@orubaba
Author

orubaba commented Sep 29, 2023 via email

@JonathanBroadbent

JonathanBroadbent commented Sep 29, 2023 via email

@orubaba
Author

orubaba commented Sep 30, 2023

import sys
import argparse
from hgraph import *
from rdkit import Chem
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API

def process(data):
    # Collect vocabulary entries from every molecule in one batch of SMILES strings.
    vocab = set()
    for line in data:
        s = line.strip("\r\n ")
        hmol = MolGraph(s)
        for node, attr in hmol.mol_tree.nodes(data=True):
            smiles = attr['smiles']
            vocab.add(attr['label'])
            for i, s in attr['inter_label']:
                vocab.add((smiles, s))
    return vocab

if __name__ == "__main__":

    parser = argparse.ArgumentParser()
    parser.add_argument('--ncpu', type=int, default=1)
    args = parser.parse_args()

    # Read SMILES from stdin (first two columns of each line) and drop duplicates.
    data = [mol for line in sys.stdin for mol in line.split()[:2]]
    data = list(set(data))

    # Split the data into roughly one batch per worker.
    batch_size = len(data) // args.ncpu + 1
    batches = [data[i : i + batch_size] for i in range(0, len(data), batch_size)]

    with Pool(args.ncpu) as pool:
        vocab_list = pool.map(process, batches, chunksize=1)
        vocab = [(x, y) for vocab in vocab_list for x, y in vocab]
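
For anyone trying to work out which input lines trigger the assertion, a small wrapper that catches the error per line can help. This is only a sketch under assumptions: `find_failures` and `fake_build` are hypothetical names, and `fake_build` is a stand-in that rejects SMILES containing "." (the SMILES separator for disconnected fragments) so the example runs without hgraph. With hgraph installed, one would presumably pass `MolGraph` as `build` instead, assuming the AssertionError is raised while constructing it.

```python
def find_failures(lines, build):
    """Try build(s) on each stripped line; return (ok, failed).

    failed holds (line, error message) pairs for lines that raised AssertionError.
    """
    ok, failed = [], []
    for line in lines:
        s = line.strip("\r\n ")
        try:
            build(s)
        except AssertionError as err:
            failed.append((s, str(err)))
        else:
            ok.append(s)
    return ok, failed


def fake_build(smiles):
    # Hypothetical stand-in for MolGraph: fails on multi-fragment SMILES.
    assert "." not in smiles, "must be connected"


ok, failed = find_failures(["CCO\n", "[Na+].[Cl-]\n", "c1ccccc1\n"], fake_build)
```

The `failed` list can then be posted here or inspected by hand, while the script continues with the lines in `ok`.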

@orubaba
Author

orubaba commented Sep 30, 2023

Interestingly, I removed everything from line 34 downwards and it ran well (I don't know why that solved the problem for me). Sorry you couldn't get the image; I was replying from my email. Let's see if it works. By the way, ChatGPT was really helpful in clearing the errors.
