Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implied alignment implementation #64

Closed
recursion-ninja opened this issue Feb 16, 2018 · 6 comments
Closed

Implied alignment implementation #64

recursion-ninja opened this issue Feb 16, 2018 · 6 comments
Assignees
Milestone

Comments

@recursion-ninja
Copy link
Collaborator

Implement implied alignment algorithm. It's already implement with the old data types. Some type changes are required and simplifying the logic for a post order & pre-order traversal.

@recursion-ninja recursion-ninja added this to the Version 1.0 milestone Feb 16, 2018
@recursion-ninja recursion-ninja self-assigned this Feb 16, 2018
@recursion-ninja
Copy link
Collaborator Author

recursion-ninja commented May 17, 2019

We'll need to encode an AlignmentContext value in each of the DynamicCharacter streams whch replace the current DynamicCharacterElement values.

Something like this:

data  AlignmentContext
    = Align  x y z
    | Delete x   z
    | Insert   y z

However, this representation will need to be "bit-packed" into the dynamic character stream so that it can be represented by a BitMatrix. This can be done by using 2 prefix bits to uniquely identify which context the element is from, and then 3x the number of bits for a single DynamicCharacterElement to store x, y, & z. Note that in the Delete and Insert value representations, the y and z values will be represented by a "gap" character, respectively.

This representation will take up three times the memory for each dynamic character stream. However, by using this representation we can avoid performing any alignment on the pre-order pass of the tree and instead use a stream processing function to generate the final states. Empirical evidence shows that this results in an asymptotic speed up, resulting in several orders of magnitude less work.

Note: We should be able coerce the output from the C code's FFI to this format by zipping over the three aligned sequences and interpreting their results accordingly.

@recursion-ninja
Copy link
Collaborator Author

recursion-ninja commented Feb 26, 2020

Note: Turns out that we cannot coerce the output from the C code's FFI to this format. There are alignments which return sequences from which it cannot not be inferred which output elements correspond to which input elements.

The implied alignment pre-order code has been added. See this commit: fd49df4.

This will be merged into master after more work has been done optimizing the dynamic character post-order code. The hope is that the optimized Haskell code will run comparable to the C code, and the issue of coercing the data across the FFI will be side stepped.

@wardwheeler
Copy link
Collaborator

wardwheeler commented Feb 26, 2020 via email

@recursion-ninja
Copy link
Collaborator Author

recursion-ninja commented Apr 4, 2020

I came up with a solution to converting the C results to our Haskell data type. I had to modify the backtrace code and the get_medians code on the C side.

In the backtrace section, instead of adding a gap_char on the traceback when a leftward or upward arrow was encountered, we instead added a '\0' value (no bits set). Since gap_char has one bit set at the gap character index and '\0' has no bits set, we can distinguish between "input gaps" and "output gaps" across the FFI.

The get_medians section had to be modified as well, to not look for gap_char in the alignments but instead look for '\0' and then correctly generate the median for sigma(x, -) when (x, '\0') was encountered.

The C code is now compatible with the Haskell data-types and the C pairwise alignment returns the same results as the Haskell pairwise alignments.

@wardwheeler
Copy link
Collaborator

wardwheeler commented Apr 4, 2020 via email

@recursion-ninja
Copy link
Collaborator Author

Closing as the implied alignment algorithm has been added (with a defect #166).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants