HW6 2018
Save each of the two scripts as a PLAIN TEXT file and email them to [email protected]. Do not use Microsoft Word. PLAIN TEXT.
Due by midnight on October 24th for full credit. 20% off if submitted up to midnight Oct 31st.
Write a script that reads in a file of sequences. The input file is input_seqs.txt, which has one sequence per line; you can hard-code this file name in the script. The script should iterate over the lines in the file, find the 0-based index where the adapter starts in each sequence (-1 if the adapter is not found), and write each index on its own line to a new file called indices.txt.
Adapter sequence: "ATCTCGTATGCCGTCTTCTGCTTG"
For example, I might use this file, named input_seqs.txt, to test:
AATCTCGTATGCCGTCTTCTGCTTGTTTTTTTTTT
AAATCTCGTATGCCGTCTTCTGCTTGGGGGGGGGGGGG
AAAAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAA
AAAAAAAAAATCTCGTATGCCGTCTTCTGCTTGCCCCCCCCCCCC
AAAAAATCTCGTATGCCGTCTTCTGCTTGATATATATATATATAT
AAAA
For full credit, I would run your code, which would create a file called indices.txt with the following content:
1
2
4
9
5
-1
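The expected values above match what Python's built-in str.find returns: the 0-based position of the first match, or -1 when there is no match. The following is only a rough sketch of one possible approach, not the required solution; the helper names adapter_index and write_indices are made up for illustration.

```python
# Adapter sequence taken from the assignment text.
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"


def adapter_index(seq, adapter=ADAPTER):
    """Return the 0-based index where adapter starts in seq, or -1 if absent.

    Hypothetical helper: strip() removes the trailing newline that each
    line read from a file carries.
    """
    return seq.strip().find(adapter)


def write_indices(in_path="input_seqs.txt", out_path="indices.txt"):
    """Write one adapter index per input line to out_path (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(f"{adapter_index(line)}\n")
```

Calling write_indices() in a directory that contains input_seqs.txt would create indices.txt with one index per line.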
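Write a function that takes a DNA sequence and an adapter sequence and returns everything after the adapter sequence. The function must return the result rather than only print it, because the assertions below compare its return value. Use this as the base of the script: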
def trim_adapter(dna, adapter):
    # your code goes here
assert trim_adapter('ACGGGTTT', 'GGG') == 'TTT'
assert trim_adapter('ACGGGTTT', 'GG') == 'GTTT'
assert trim_adapter('ACGGGTTT', 'CGGG') == 'TTT'
assert trim_adapter('AATCTCGTATGCCGTCTTCTGCTTGTTTTTTTTTT', 'ATCTCGTATGCCGTCTTCTGCTTG') == 'TTTTTTTTTT'
assert trim_adapter('AATCTCGTATGCCGTCTTCTGCTTGTTTTTTTTTT', '') == 'AATCTCGTATGCCGTCTTCTGCTTGTTTTTTTTTT'
In the simplest implementation, you will get unexpected behaviour when the adapter isn't found. If you can make the function pass this assertion as well, you earn 2 extra-credit points:
assert trim_adapter('AAA','CCC') == 'AAA'
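To see where the unexpected behaviour can come from, here is a small sketch that assumes the simplest implementation slices from str.find plus the adapter length. The name naive_trim is hypothetical and is not part of the assignment template.

```python
def naive_trim(dna, adapter):
    """Hypothetical naive version: slice right after the first adapter match."""
    start = dna.find(adapter)  # -1 when the adapter does not occur in dna
    # When start is -1, the slice begins at len(adapter) - 1 instead of
    # leaving the sequence untouched.
    return dna[start + len(adapter):]


print(naive_trim("ACGGGTTT", "GGG"))  # TTT
print(naive_trim("AAA", "CCC"))       # A
```

In the second call, find returns -1, so the slice starts at index 2 and only the last base is kept. Checking for -1 before slicing is one way to handle this case.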