dbcan_utils CGC_substrate_abund and dbcan_utils CGC_abund error #179
Comments
It seems the issue happens when the script reads the file "cgc_standard.out". Could you share this file here so that I can debug the code? Jinfang |
Hi Jinfang, the cgc_standard.out looks like this: During the prediction of CGCs, the manual said I need to provide my own gff file, so I modified the gff file from the Prodigal output, which change : I don't know if this is wrong. |
Yes, your modification of the gff file is correct. You got the output file "cgc_standard.out", which you could not have gotten otherwise. |
The code also looks normal, so what is happening? Could you check the input file again and look for another "Gene Start" string anywhere other than the first line? If that still does not work, can you send me all the input files? I will debug on my PC. |
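The check suggested above can be sketched in a few lines of Python. This is only an illustration: the function name `find_repeated_headers` and the sample rows are hypothetical and were not taken from a real cgc_standard.out file.

```python
from typing import List


def find_repeated_headers(lines: List[str], marker: str = "Gene Start") -> List[int]:
    """Return 1-based line numbers, other than line 1, that contain the marker."""
    return [n for n, line in enumerate(lines, start=1) if n > 1 and marker in line]


# Made-up sample: a header, one data row, a repeated header, one data row.
sample = [
    "CGC#\tGene Type\tContig ID\tProtein ID\tGene Start\tGene Stop",
    "CGC1\tCAZyme\tk141_1\tk141_1_3\t100\t900",
    "CGC#\tGene Type\tContig ID\tProtein ID\tGene Start\tGene Stop",
    "CGC2\tTC\tk141_2\tk141_2_7\t50\t650",
]
print(find_repeated_headers(sample))  # prints [3]
```

For a real file, the lines could be read with `open("cgc_standard.out").read().splitlines()` and passed to the same function.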
Hi, I have found a similar issue. |
Hello, have you successfully solved this problem? I ran into the same issue. |
Not yet, really.
I used MetaEuk for gene prediction. Could this be the problem?
On Mon, 1 Jul 2024, 11:11 powerby66 ***@***.***> wrote:
… Hi, I have found a similar issue. I'm trying to follow the tutorial on raw reads. I have shotgun sequencing data. I reached this point in the tutorial: P13. dbcan_utils to calculate the abundance of CAZyme families, subfamilies, CGCs, and substrates (I skipped P12 because I don't need a particular region; is that correct?). When I run this command:
dbcan_utils fam_abund -bt IS1_EF.depth.txt -i ../subs/IS1_ef.dbCAN -a TPM
I get this error: You are estimating the abundance of CAZyme! Reads are single end! Total read count: 156453394! Can not find read count information for CAZyme: k141_10018_1.
In the directory IS3_ef.dbCAN I have all the 17 files. Can you help me?
Hello, have you successfully solved this problem? I ran into the same issue.
|
Hi, guys. We fixed this bug in the updated version of dbCAN several months ago. If you still use the older version, please follow these steps: |
Thank you, i have another question. |
Paola,
We do not recommend using run_dbcan for CGC prediction and CGC-based abundance profiling. The reason is that the CGC/PUL concept does not exist in eukaryotes. The gff generated from metaeuk contains exons which will be wrongly treated as separate CDS/genes in run_dbcan. But, you can still use run_dbcan for CAZyme predictions and CAZyme-based abundance profiling, as no gff file will be used.
Yanbin
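One rough way to see whether a gff file has the multi-exon structure described above is to count feature types in column 3. The sketch below relies only on the generic tab-separated GFF layout; the function name `count_feature_types` and the sample lines (including the attribute column) are made up for illustration and are not real MetaEuk output.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Count how often each feature type (GFF column 3) occurs."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


# Made-up example: one gene split into two CDS segments (two exons).
sample = [
    "##gff-version 3",
    "k141_1\tMetaEuk\tgene\t100\t900\t.\t+\t.\tID=g1",
    "k141_1\tMetaEuk\tCDS\t100\t300\t.\t+\t0\tParent=g1",
    "k141_1\tMetaEuk\tCDS\t500\t900\t.\t+\t0\tParent=g1",
]
counts = count_feature_types(sample)
print(counts["gene"], counts["CDS"])  # prints 1 2
```

If there are clearly more CDS rows than gene rows, some genes span several exons, and a tool that treats every CDS row as a separate gene would overcount them.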
________________________________
From: Paola88 ***@***.***>
Sent: Monday, July 1, 2024 9:24 AM
Thank you, I have another question.
I used MetaEuk for gene prediction, so I have to generate file.ffn with bedtools. Which file is better to use: the MetaEuk output or the prodigal.gff files generated during substrate prediction? Thank you.
Paola
|
Thank you for your answer.
If I understand correctly, I have to do steps p5 and after p9 in the tutorial. Is that right?
|
Sorry for the question, but if I don't generate the .ffn files, how can I estimate abundance? If I understand correctly, I need the depth file.
Thank you
|
That's right. For CAZyme-based abundance profiling, you only need to predict CAZymes (provide your own faa in p5), and you need ffn in p8 and p11. Any processes using contigs can be skipped.
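For readers unfamiliar with the `-a TPM` option used in the commands in this thread, here is a minimal sketch of the standard TPM normalization (reads per unit length, rescaled so the values sum to one million). Whether dbcan_utils computes exactly this is an assumption, not something confirmed in this thread; the function name `tpm` and the gene names and numbers are made up.

```python
from typing import Dict


def tpm(read_counts: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Standard TPM: per-gene read rate (reads / length), scaled to sum to 1e6."""
    rates = {gene: read_counts[gene] / lengths[gene] for gene in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Made-up numbers: geneB has twice the read rate of geneA.
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 1500}
values = tpm(counts, lengths)
print(round(values["geneA"]), round(values["geneB"]))  # prints 333333 666667
```

The gene lengths are the reason a nucleotide sequence file (ffn) and a depth file are both needed: one provides the length per gene, the other the mapped read count per gene.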
|
Hi,
I tried what you suggested, but I am writing again because it does not work.
These are my steps:
I have shotgun metagenomic data from wine fermentation, and I did the gene prediction with MetaEuk.
I did steps p5, p8 (after removing duplicates), p10 and p11.
At step 13 I get this new error:
dbcan_utils fam_abund -bt GC1_D2.depth.txt -i /home/pdigianv/ita_gre/CAZyme/GC1_D2.CAZyme -a TPM
You are estimating the abundance of CAZyme!
Reads are single end!
Total reads count: 549495!
Can not find read count information for CAZyme: AA1.aln|k141_836|-|195|8.216e-54|1|149518|151017|151017[151017]:149518[149518]:1500[1500]
This happens even though I modified utils.py as suggested.
Can you help me?
Paola Di Gianvito, PhD
Tecnologo della ricerca, DISAFA, University of Turin
Agricultural Microbiology and Food Technology Sector
Corso Enotria 2/C, Ampelion
12051 Alba - Cuneo - ITALY
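The message "Can not find read count information" suggests that an ID in the CAZyme annotation has no matching entry in the depth file. A quick way to list such IDs is a set comparison, sketched below. It assumes, without confirmation from this thread, that both files are tab-separated with the sequence ID in the first column; the helper names and the shortened sample IDs are hypothetical.

```python
from typing import Iterable, List


def first_column(lines: Iterable[str]) -> List[str]:
    """Return the first tab-separated field of each non-empty, non-comment line."""
    ids = []
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            ids.append(line.split("\t")[0])
    return ids


def missing_ids(annotation_ids: Iterable[str], depth_ids: Iterable[str]) -> List[str]:
    """List annotation IDs that have no entry among the depth-file IDs."""
    known = set(depth_ids)
    return [gene_id for gene_id in annotation_ids if gene_id not in known]


# Made-up rows: the first annotation ID has no exact match in the depth rows.
annotation = ["AA1.aln|k141_836|-|195\tAA1", "k141_10018_1\tGH13"]
depth = ["k141_10018_1\t42", "k141_836\t7"]
print(missing_ids(first_column(annotation), first_column(depth)))
# prints ['AA1.aln|k141_836|-|195']
```

If many IDs are reported, comparing a few of them by eye against the depth file may show whether the two files name the same sequences differently.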
|
Good morning, can you help me? |
May I ask if the GFF and FFN annotation files predicted by Prokka for fungi and protozoa are reliable? |
Report
Hi, I have encountered issues with the estimation of CGC substrate abundance and CGC abundance.
I followed all the steps from the manual and everything ran smoothly, including dbcan_utils fam_abund and dbcan_utils fam_substrate_abund. However, when I ran dbcan_utils CGC_substrate_abund and dbcan_utils CGC_abund, the following error was raised:
You are estimating the abundance of CGC/CGC substrate!
Reads are single end!
Total reads count: 218847!
Traceback (most recent call last):
  File "/home/cdd/anaconda3/envs/dbcan/bin/dbcan_utils", line 10, in <module>
    sys.exit(main())
  File "/home/cdd/anaconda3/envs/dbcan/lib/python3.8/site-packages/dbcan/utils/utils.py", line 621, in main
    PUL_abundance(args)
  File "/home/cdd/anaconda3/envs/dbcan/lib/python3.8/site-packages/dbcan/utils/utils.py", line 492, in PUL_abundance
    PUL_abund = CAZyme_Abundance_estimate(paras)
  File "/home/cdd/anaconda3/envs/dbcan/lib/python3.8/site-packages/dbcan/utils/utils.py", line 254, in __init__
    seqid2dbcan_annotation,cgcid2cgc_standard = Read_cgc_standard_out(parameters.PUL_annotation)
  File "/home/cdd/anaconda3/envs/dbcan/lib/python3.8/site-packages/dbcan/utils/utils.py", line 203, in Read_cgc_standard_out
    tmp_record = cgc_standard_line(line.rstrip().split("\t"))
  File "/home/cdd/anaconda3/envs/dbcan/lib/python3.8/site-packages/dbcan/utils/utils.py", line 191, in __init__
    self.gene_start = int(lines[4])
ValueError: invalid literal for int() with base 10: 'Gene Start'
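The traceback shows `int(lines[4])` receiving the header text 'Gene Start', which means a header row was parsed as data. The sketch below shows one way a reader could tolerate such rows by skipping any line whose fifth field is not an integer. It is not the actual dbCAN fix; the function name `parse_rows_skipping_headers` and the sample rows are hypothetical, and only the position of the gene start (index 4) is taken from the traceback.

```python
from typing import Dict, Iterable, List


def parse_rows_skipping_headers(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Split tab-separated rows, skipping rows whose fifth field is not an integer."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # too short to hold a gene start
        try:
            gene_start = int(fields[4])
        except ValueError:
            continue  # header row such as '... Gene Start ...'
        records.append({"fields": fields, "gene_start": gene_start})
    return records


# Made-up sample with a header that appears twice.
sample = [
    "CGC#\tGene Type\tContig ID\tProtein ID\tGene Start\tGene Stop",
    "CGC1\tCAZyme\tk141_1\tk141_1_3\t100\t900",
    "CGC#\tGene Type\tContig ID\tProtein ID\tGene Start\tGene Stop",
    "CGC2\tTC\tk141_2\tk141_2_7\t50\t650",
]
rows = parse_rows_skipping_headers(sample)
print([r["gene_start"] for r in rows])  # prints [100, 50]
```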
Version information
No response