getfasta -s not working #140

Open
malj390 opened this issue Dec 5, 2018 · 2 comments

malj390 commented Dec 5, 2018

Hello everyone,

I have a test.bed file with this structure:

chr4	74445406	74446534	AREG1	+
chr4	74446782	74449047	AREG2	+
chr20	57228229	57266628	BMP71	-
chr20	57202475	57228421	BMP72	-

I would like to extract these regions as FASTA, so I am running bedtools getfasta like this:

bedtools getfasta -name -s -fullHeader -fi hg38.fa -bed test.bed -fo test.fasta

-s should return the sequence for the proper strand, but it doesn't.

bedtools doesn't allow the start and end to be swapped (end before start), so I list the smaller position first and keep the strand sign, expecting the reverse complement for the genes on the negative strand.

Does anyone know what I am doing wrong, or how to get the reverse complement according to the strand?

I know how to compute the reverse complement in Python and could solve it that way, but I don't want to add unnecessary steps to the code. I would rather use the function that bedtools already provides.

Thank you,
Miguel
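
For reference, the Python fallback mentioned in the post could look roughly like the sketch below. It is not part of bedtools; the function names `reverse_complement` and `apply_strand` are hypothetical, and it assumes plain DNA sequences containing only A, C, G, T and N in either case.

```python
# Complement table for DNA, covering upper- and lower-case bases
# (soft-masked FASTA files use lower case for repeats). N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_strand(seq: str, strand: str) -> str:
    """Return seq unchanged for '+', reverse-complemented for '-'."""
    return reverse_complement(seq) if strand == "-" else seq
```

This would be applied to the forward-strand sequence that getfasta returns without -s, using the strand taken from the BED file.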

caadr commented Apr 25, 2019

I was also running into this problem.

The sequence seems to get reverse-complemented correctly with a 6-column BED file. Your example has no 'score' column, so the strand sits in column 5 instead of column 6, where the BED format puts it.
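
A minimal sketch of that conversion, assuming tab-separated lines laid out as chrom, start, end, name, strand (as in test.bed above). The helper name `to_bed6` and the placeholder score of 1 are illustrative choices, not anything bedtools prescribes.

```python
def to_bed6(line: str, score: str = "1") -> str:
    """Insert a placeholder score so the strand moves from column 5 to column 6.

    Assumes a tab-separated line laid out as chrom, start, end, name, strand.
    Lines that do not match that layout are returned unchanged
    (apart from the stripped line ending).
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 5 and fields[4] in {"+", "-"}:
        fields.insert(4, score)
    return "\t".join(fields)


example = "chr20\t57228229\t57266628\tBMP71\t-"
print(to_bed6(example))  # columns: chr20, 57228229, 57266628, BMP71, 1, -
```

Running every line of the file through this helper and writing the result to a new BED file should give getfasta -s the six columns it needs.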

Still, even with 6-column files, when -name and -s are used together, getfasta appends (-) or (+) to the name. That is not the expected behaviour (there should be no change to the name?) and differs from what is shown in the docs:

https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html#s-forcing-the-extracted-sequence-to-reflect-the-requested-strand
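
If the plain name is needed, the suffix could be removed after the fact. The sketch below assumes each header line ends in exactly `(+)` or `(-)`, as described above; other bedtools versions or option combinations may format headers differently, and `strip_strand_suffix` is a hypothetical helper, not a bedtools feature.

```python
import re

# Matches a trailing "(+)" or "(-)" at the very end of a header line.
_STRAND_SUFFIX = re.compile(r"\([+-]\)$")


def strip_strand_suffix(fasta_line: str) -> str:
    """Drop a trailing (+)/(-) from FASTA header lines.

    Sequence lines are returned as-is; the line ending is stripped
    from every line so the caller can re-join with "\n".
    """
    line = fasta_line.rstrip("\r\n")
    if line.startswith(">"):
        return _STRAND_SUFFIX.sub("", line)
    return line
```

For example, `strip_strand_suffix(">BMP71(-)")` gives `">BMP71"`, while sequence lines pass through untouched.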

@maximus-sci

I had this problem, and the cause turned out to be my BED file.

First, I'd recommend inserting a "score" column with a 1 on every row.
Next, make sure the line endings are Unix-style (just \n, not \r\n). With Windows line endings, bedtools will not read the strand correctly.
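
A rough way to check for and fix the line endings from Python is sketched below. The function names are hypothetical, and the file paths passed to `fix_bed_file` are whatever your own files are called. With \r\n endings the last field of a line would read as "-\r" rather than "-", which is presumably why the strand is not recognised.

```python
from pathlib import Path


def has_windows_newlines(data: bytes) -> bool:
    """Return True if the raw file content contains any \\r\\n line ending."""
    return b"\r\n" in data


def to_unix_newlines(data: bytes) -> bytes:
    """Replace every \\r\\n with \\n, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")


def fix_bed_file(src: str, dst: str) -> bool:
    """Write a copy of src with Unix line endings to dst.

    Returns True if src contained Windows line endings.
    """
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(to_unix_newlines(raw))
    return has_windows_newlines(raw)
```

Reading the file as bytes avoids Python's own newline translation, so the check sees the file exactly as bedtools would.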
