issue with split_on_adapter output #47
Comments
Okay, so there was a smaller truncated pod5 written by the device (a software crash occurred during this run) that I included in the above sample data. I thought maybe that was the issue, so I omitted it and made a new FASTQ. I repeated everything I did above but got an extremely similar error: [E::aux_parse] Incomplete aux field The offending line is: 125659e2-2885-4568-b285-f3304db0097e 16 chr4 68049456 60 16S119M1I32M1I93M1I52M3I130M1I70M1D5M3D70M41S * 0 0 CACCATTGTGATTTTACTTCTGTTTCCTTAGGTATAGTTGAATATGGTTTTAAGTAGGTTGTTGTCTTAATCATAAAACTGGTTTGGTCCAACAATATACTTTGCAGAACACCATGTAAGTTTAGCTTACTTCACTTTTTTTTTTTTTTTTGGCATCCAATTGATGTGAATATAAGATCTTGCAATTCTTATTATTCAATTACAGAAATTAAGCATAATTTACTTACTTAGGGCTTTGAGGATTCTTGGTCTGATTTAACCTGAAGTTTCTAGTTTAATATCTAAAATATTGGTGAGATTGGCAGGAATAGGGCTGAAGAGGTAATAATGAATGGTGGTAGGAGGAGGATCTGTTTTATGTTTTAAAATTACAATTTTAATGTTATTAGTTGTTGGGAACATCATTAGTGATGTGGAATAAATTAGATGTATATAAAAGTACAGGTTGTAGAAGAGCTATGATAGCAATGAATGTAAGGATAATGAGGCTATTATTTTTTGTTATTTCTTGAATCATAGTCATTGTAAAAATCCTGTTAGCAGGGGCAGTTCTCCTAAGGATAGTAGAATAACAAGGATTATAGATGTTAGTGAACAATGTTTGAGTTGTTCAAGGTATAGGCATAACACCAGAG %&&'(%%&(('''),0667>CE:9<:;2176798&&&'(.1299<>ED@?@A>;;;<;;;;==??CA==<=@@b;;:2385::953332;>AA@D:::?@,,..005558763333559>:6:::<;;:;322;=?CFDEJGDCBB::64777989@@<644(((/(75;@==?<;<;<4++**+,---...0D::::?@bef@90.-,.97:72.--/.16779;89:D=:78::=?@@??577345569:@=6+((,>?@fc@=>?@FDDBA>>?@JNIFEEC<;::::;=C=:====AA@=<<;<=5*('(3:=>>CDCC=<<<<<<;===@?=9889898<<=BD>?888;CIBBBIFHF????JKOFD6=?;77-,,,.//0756612888==<;++++89<:;8=6688@C<8833336ISJCCGE--.=::;;<))(+,+)&(++-/<==>=>?AB>334*)+*-/117677897767;>>>DEEL988;?>?A@<8877('''3)'&('&#$'+++,--/4>>>=@=8777:==>=>=;92108==>*)))3++,++++-;<?@acc??@@CBA512/'''((##$%'%$$&$%&%$%%))'%$%$%&$&'&)&$###$"#" NM:i:17 ms:i:1060 AS:i:1056 nn:i:0 tp:A:P cm:i:80 s1:i:477 s2:i:0 de:f:0.0225 rl:i:15 None I would note that the unsplit FASTQ produced no errors whatsoever after and looked perfect in IGV (other than all the duplex reads I'm trying to separate). |
Hi @itslittman, thanks for the question. To me it looks like the trailing Cheers |
@ollenordesjo The first offending line doesn't have a None at the end and does have a '_1' in the readname - does that one have anything else that looks fishy or could the modified base tags be causing an issue? Should I run split with the 'debug_output' option? Noah |
@ollenordesjo I pulled the line of the unsplit read from the mapped SAM file (the one I generated before realizing there were a lot of concatenated reads): 125659e2-2885-4568-b285-f3304db0097e 16 chr4 68049456 60 16S119M1I32M1I93M1I52M3I130M1I70M1D5M3D70M41S * 0 0 CACCATTGTGATTTTACTTCTGTTTCCTTAGGTATAGTTGAATATGGTTTTAAGTAGGTTGTTGTCTTAATCATAAAACTGGTTTGGTCCAACAATATACTTTGCAGAACACCATGTAAGTTTAGCTTACTTCACTTTTTTTTTTTTTTTTGGCATCCAATTGATGTGAATATAAGATCTTGCAATTCTTATTATTCAATTACAGAAATTAAGCATAATTTACTTACTTAGGGCTTTGAGGATTCTTGGTCTGATTTAACCTGAAGTTTCTAGTTTAATATCTAAAATATTGGTGAGATTGGCAGGAATAGGGCTGAAGAGGTAATAATGAATGGTGGTAGGAGGAGGATCTGTTTTATGTTTTAAAATTACAATTTTAATGTTATTAGTTGTTGGGAACATCATTAGTGATGTGGAATAAATTAGATGTATATAAAAGTACAGGTTGTAGAAGAGCTATGATAGCAATGAATGTAAGGATAATGAGGCTATTATTTTTTGTTATTTCTTGAATCATAGTCATTGTAAAAATCCTGTTAGCAGGGGCAGTTCTCCTAAGGATAGTAGAATAACAAGGATTATAGATGTTAGTGAACAATGTTTGAGTTGTTCAAGGTATAGGCATAACACCAGAG %&&'(%%&(('''),0667>CE:9<:;2176798&&&'(.1299<>ED@?@A>;;;<;;;;==??CA==<=@@b;;:2385::953332;>AA@D:::?@,,..005558763333559>:6:::<;;:;322;=?CFDEJGDCBB::64777989@@<644(((/(75;@==?<;<;<4++**+,---...0D::::?@bef@90.-,.97:72.--/.16779;89:D=:78::=?@@??577345569:@=6+((,>?@fc@=>?@FDDBA>>?@JNIFEEC<;::::;=C=:====AA@=<<;<=5*('(3:=>>CDCC=<<<<<<;===@?=9889898<<=BD>?888;CIBBBIFHF????JKOFD6=?;77-,,,.//0756612888==<;++++89<:;8=6688@C<8833336ISJCCGE--.=::;;<))(+,+)&(++-/<==>=>?AB>334*)+*-/117677897767;>>>DEEL988;?>?A@<8877('''3)'&('&#$'+++,--/4>>>=@=8777:==>=>=;92108==>*)))3++,++++-;<?@acc??@@CBA512/'''((##$%'%$$&$%&%$%%))'%$%$%&$&'&)&$###$"#" NM:i:17 ms:i:1060 AS:i:1056 nn:i:0 tp:A:P cm:i:80 s1:i:477 s2:i:0 de:f:0.0225 rl:i:15 qs:i:12 du:f:1.24775 ns:i:4991 ts:i:10 mx:i:2 ch:i:8 st:Z:2023-04-12T23:10:51.660+00:00 rn:i:12959 f5:Z:FAW60540_pass_a6ad0ce3_11d41050_402.pod5 sm:f:92.951 sd:f:27.937 sv:Z:quantile MM:Z:C+h?;C+m?; ML:B:C |
Hi @itslittman, Yes, it does indeed look like some tags have been removed: Having a closer look, it looks like adding the span of bases from which the read was extracted may have caused a problem: https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/split_on_adapter.py#L177 We add the If you either remove this data from the files, and then proceed with your steps 4 and 5 from your original comment, I think you might get it working. In that case, I'll put a TODO to either move or remove this additional metadata in the fastq comments. Cheers! |
@ollenordesjo how do I remove this data? |
@ollenordesjo also how are other people getting good results without doing this if the start/end comment is automatically written? Noah |
Hi @itslittman, sorry for the late reply. The easiest way to remove it from files you already have is probably this (which will remove -> from the SAM file)
I think it's likely that people have not hit this issue if they haven't passed the tags from the usam through to mapping. Would it be possible to share any flags you used for minimap2 in the workflow you shared below? I'm guessing you're using -y, but it would be great to know the exact process so we're not missing any details when writing the tests.
|
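As a rough illustration of the kind of clean-up discussed here, the following Python sketch strips a trailing span token from SAM alignment lines. It is not the command from this thread. The function names (`strip_span_suffix`, `clean_sam`) and the regular expression are assumptions, based only on the offending lines quoted in this issue, where the record ends in ` 0->268` or ` None`.

```python
import re

# Assumption: the extra token is the last whitespace-separated item on the
# line and looks like "<start>-><end>" or the literal "None", as in the
# offending lines quoted in this issue.
_SPAN_SUFFIX = re.compile(r"[ \t]+(?:\d+->\d+|None)[ \t]*$")


def strip_span_suffix(line: str) -> str:
    """Return the line without a trailing span token, keeping its newline."""
    body = line.rstrip("\r\n")
    newline = line[len(body):]
    return _SPAN_SUFFIX.sub("", body) + newline


def clean_sam(src_path: str, dst_path: str) -> int:
    """Copy a SAM file, stripping span tokens from alignment lines only.

    Header lines (starting with "@") are copied unchanged.
    Returns the number of alignment lines that were modified.
    """
    changed = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.startswith("@"):
                cleaned = strip_span_suffix(line)
                changed += cleaned != line
                line = cleaned
            dst.write(line)
    return changed
```

Writing to a new file rather than editing in place keeps the original SAM available for comparison, which seems safer while the exact cause is still being narrowed down.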
@ollenordesjo Okay, thanks, I'll try that. The only other flag I'm using with minimap2 is -Y, so that everything is soft clipped (no hard clipping). That makes it easier for me to BLAT stuff (I'm dealing with fusion genes in leukemia samples). Noah |
@ollenordesjo the sed command did not work though - same error. Edit: I ran split with the non-modified-basecalling files output during the original run and everything converted to BAM fine. So it seems like the -y flag is to blame. Yet removing the comments the way you suggested didn't work - could it be MM and ML tags themselves that are causing the problem somehow? Even though they don't cause a problem when reads are not split? Noah |
@ollenordesjo could my Samtools version be an issue? |
Hi @itslittman, sorry for not responding earlier. I'm not sure whether the samtools version could be the issue. Which version have you tried? If you're able to upload the SAM that is showing the issue, I'm happy to see if I can spot the problem. |
@ollenordesjo It's all good! I'm using Samtools version 1.16.1 and htslib 1.16, which are from 2022 so pretty recent. How would I give you access to the SAM? It's like 30GB so even if I wanted to post it right on here I don't think I could. Would a dropbox link work? |
Yep, samtools version shouldn't be the problem then. If you send an email to me on olle[email protected] I can send you a link where you can upload it |
@ollenordesjo it's saying undeliverable to that address |
Ah, so sorry! It's olle [dot] [email protected] |
Hi @itslittman, received the sam and can confirm that it's possible to fix the offending lines with the following:
Can you let me know if this fixes the problem? Thanks! |
I successfully ran split_on_adapter with the --allow-multiple-splits option and it split a bunch of reads for me. When converting the new mapped SAM file to BAM format, it wrote about 570 MB successfully and then failed with this error:
[E::aux_parse] B aux field type not followed by ','
[W::sam_read1_sam] Parse error at line 413764
samtools view: error reading file "sample_name.mapped.sam"
My workflow:
This is the offending line:
789cdf07-b995-461e-84fa-bf4ef4c62f1e_1 16 chr1 151237868 60 157M1D3M1D73M35S * 0 0 ATTCCCAAAATTGCTGAGTAGTGGCAATTTTAGATTCTCTTTGGTGGAATCAGAGTGGAAGAGGTAGGCAAGAAGATTTGGAGAAAACTAGATTATAATACATACTGTAGAGAGTTCCTGGGGTTAGAGGAAGGATCTCATTTTCTCCTGTTTTTTTATGATTTTTTTCTCTTTTTGTTTTCTTGATCACTTATTATCTGACCTTCTGGTTTATGGAGGATGAGGCAGTTATGAGCAATATGATGGAACCAGGTACTAACATAAACAG @@>;>9:::A?=-,**///44564455E7200,,.02056;<>A==><=>==;;88@;?=<;;<;;<=BB@<<<=CB@=844..--,--55;<=D<<;<>=?>>>>@acs>>?>;9:=GHEBB42<==@?<;;<?CBCEHKJ@=:::;;<=>?<70.--.<;<DGJ8HB=ACB>;97732(&&&(<;<:9;<>9568899<@=D>===?CBA76<;;;<@>><<99:90//)()+.-'&&%%&&+'''&%%'%##$$'&%%&$##" NM:i:2 ms:i:454 AS:i:454 nn:i:0 tp:A:P cm:i:40 s1:i:216 s2:i:0 de:f:0.0085 rl:i:15 NM:i:4 ms:i:1108 AS:i:1108 nn:i:0 tp:A:P cm:i:84 s1:i:487 s2:i:0 de:f:0.007 rl:i:114 qs:i:15 du:f:1.43725 ns:i:5749 ts:i:10 mx:i:2 ch:i:349 st:Z:2023-04-14T03:12:29.59+00:00 rn:i:25470 f5:Z:FAW60540_pass_9382fa45_a155afd1_502.pod5 sm:f:91.767 sd:f:26.724 sv:Z:quantile MM:Z:C+h?;C+m?; ML:B:C 0->268
Any ideas?
Noah
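Since samtools reports only a line number, a small checker can help show which optional field on a line is malformed. The sketch below tests each field after the eleventh column against the `TAG:TYPE:VALUE` patterns given in the SAM specification. The function names are hypothetical, and the check is only an approximation of what htslib accepts. It assumes the span (for example `0->268`) was appended with a space rather than a tab, so that it ends up glued onto the last tag.

```python
import re
from typing import Iterator, List, Tuple

# Value patterns per optional-field type, following the SAM specification.
_NUM = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
_VALUE = {
    "A": re.compile(r"[!-~]"),
    "i": re.compile(r"[-+]?[0-9]+"),
    "f": re.compile(_NUM),
    "Z": re.compile(r"[ !-~]*"),
    "H": re.compile(r"(?:[0-9A-F]{2})*"),
    "B": re.compile(r"[cCsSiIf](?:," + _NUM + r")*"),
}
_TAG = re.compile(r"[A-Za-z][A-Za-z0-9]")


def bad_aux_fields(sam_line: str) -> List[str]:
    """Return the optional fields of one alignment line that look malformed."""
    fields = sam_line.rstrip("\r\n").split("\t")
    bad = []
    for field in fields[11:]:  # columns 1-11 are the mandatory fields
        parts = field.split(":", 2)
        if len(parts) != 3 or not _TAG.fullmatch(parts[0]):
            bad.append(field)
            continue
        pattern = _VALUE.get(parts[1])
        if pattern is None or not pattern.fullmatch(parts[2]):
            bad.append(field)
    return bad


def find_bad_lines(path: str) -> Iterator[Tuple[int, List[str]]]:
    """Yield (1-based line number, malformed fields) for a SAM file."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("@"):  # skip header lines
                continue
            bad = bad_aux_fields(line)
            if bad:
                yield number, bad
```

Running `find_bad_lines("sample_name.mapped.sam")` and printing the first few results should show whether the problem is limited to the appended span or also involves other tags such as `MM` and `ML`.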