A partitioned FASTA file includes one or more # characters in each sequence. This means that the sequence of each taxon in the file will have exactly the same number of # characters.
Each # breaks the dynamic character into two separate characters, which will be aligned and optimized independently. Because the dynamic characters are all in the same file, they will default to being in the same block for network optimizations.
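The splitting rule can be sketched as a small pure function. This is only an illustration: `partitions` is a hypothetical helper, not something the existing parser provides. Because every # splits one character into two, a sequence with k # characters yields k + 1 dynamic characters.

```haskell
-- Hypothetical helper (not part of the existing parser): split one sequence
-- on '#'. A sequence with k '#' characters yields k + 1 partitions, and empty
-- chunks are kept, so "#GAT" gives ["", "GAT"].
partitions :: String -> [String]
partitions s = case break (== '#') s of
  (chunk, [])       -> [chunk]                  -- no '#' left: last partition
  (chunk, _ : rest) -> chunk : partitions rest  -- drop the '#' and continue
```

With this sketch, `partitions "ACCT#GATT#CATTAG"` evaluates to `["ACCT","GATT","CATTAG"]` and `partitions "#GAT#CATAG"` evaluates to `["","GAT","CATAG"]`. Keeping the empty strings leaves the decision about empty partitions to a later pass instead of silently discarding them.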
For example, the following FASTA file:
> Alpha
ACCT#GATT#CATTAG
> Bravo
CCT#GAT#CATAG
> Charlie
ACC#ATTT#CATTAG
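Would return the following Map String [String]:
Alpha   -> ["ACCT", "GATT", "CATTAG"]
Bravo   -> ["CCT", "GAT", "CATAG"]
Charlie -> ["ACC", "ATTT", "CATTAG"]
This represents each taxon having three dynamic characters in a single block.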
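A minimal sketch of the post-parsing step that produces this Map String [String], assuming (hypothetically) that the existing parser returns a Map String String from taxon name to raw sequence with the # characters still in place. The names `splitPartitions` and `example` are illustrative only.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Assumption: the existing FASTA parser yields a Map from taxon name to the
-- raw sequence, with the '#' characters still in place.
-- splitPartitions (a hypothetical name) splits every sequence on '#', so each
-- taxon maps to its list of dynamic characters, all in the same block.
splitPartitions :: Map String String -> Map String [String]
splitPartitions = Map.map partitions
  where
    -- Keep empty chunks so empty partitions stay visible to later checks.
    partitions s = case break (== '#') s of
      (chunk, [])       -> [chunk]
      (chunk, _ : rest) -> chunk : partitions rest

-- The example file from this issue, as the assumed parser output.
example :: Map String String
example = Map.fromList
  [ ("Alpha",   "ACCT#GATT#CATTAG")
  , ("Bravo",   "CCT#GAT#CATAG")
  , ("Charlie", "ACC#ATTT#CATTAG")
  ]
```

Applying `splitPartitions` to `example` gives each of the three taxa a list of three dynamic characters, matching the mapping shown above.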
We should also decide whether we will allow empty partitions in a FASTA file.
For example, would the following file be allowed?
In the above we can see that Bravo, Charlie, and Delta all have empty partitions.
Implementation:
The FASTA parser already exists in a very usable state and accepts # characters, though it does not currently interpret them in the special way described above.
We should perform a post-parsing pass over the FASTA data. If any sequence has one or more # characters, we will require that all sequences contain the same number of # characters, and raise a parse error if they do not.
We should make the parse error as human-readable as possible. For example, if all but one sequence had four # characters and the other sequence had a different number, the parse error should focus the user's attention on only the outlier sequence.
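One possible way to make the error focus on the outlier is to treat the most common # count as the expected one and report only the taxa that disagree with it. The sketch below assumes the same hypothetical Map String String parser output; `validatePartitions` and the message wording are illustrative, not the project's actual API or error format.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)
import Data.List (intercalate, sortOn)
import Data.Ord (Down (..))

-- Hypothetical post-parsing check; assumes the parser yields a Map from
-- taxon name to raw sequence (the real parser's types may differ).
-- Rule from this issue: if any sequence contains '#', all sequences must
-- contain the same number of '#', otherwise report a parse error.
validatePartitions :: Map String String -> Either String (Map String String)
validatePartitions seqs
  | all (== 0) (Map.elems counts) = Right seqs  -- no partitions at all
  | null outliers                 = Right seqs  -- every count agrees
  | otherwise                     = Left message
  where
    counts = Map.map (length . filter (== '#')) seqs
    -- How many taxa share each '#' count.
    tally = Map.fromListWith (+) [ (c, 1 :: Int) | c <- Map.elems counts ]
    -- Treat the most common count as the expected one; on a tie, sortOn is
    -- stable over ascending keys, so the smaller count wins.
    expected = case sortOn (Down . snd) (Map.toList tally) of
      ((c, _) : _) -> c
      []           -> 0
    -- Only the taxa that disagree with the expected count are reported.
    outliers = Map.toList (Map.filter (/= expected) counts)
    message =
      "Partition mismatch: most sequences contain " ++ show expected
        ++ " '#' characters, but "
        ++ intercalate ", " [ name ++ " has " ++ show c | (name, c) <- outliers ]
        ++ "."
```

For a file where Alpha and Charlie each contain two # characters and Bravo contains three, this returns `Left "Partition mismatch: most sequences contain 2 '#' characters, but Bravo has 3."`. When there is no clear majority (for example, every taxon has a different count), this sketch still picks one count as "expected", so a real implementation may want a different message for that case.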