Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment with Haskell-side, hashable memoized function(s) #100

Open
recursion-ninja opened this issue Nov 15, 2018 · 8 comments
Open

Experiment with Haskell-side, hashable memoized function(s) #100

recursion-ninja opened this issue Nov 15, 2018 · 8 comments

Comments

@recursion-ninja
Copy link
Collaborator

recursion-ninja commented Nov 15, 2018

Experiment with ST, TVars, and HashTable libraries to implement a Haskell-side memoization library for Hashable hashable structures. See if this is more efficient than the FFI binding to C++.

We want a type signature that looks like this:
(Eq a, Hashable a) => (a -> b) -> a -> b

@recursion-ninja recursion-ninja added this to the Version 0.2.0 milestone Nov 15, 2018
@recursion-ninja recursion-ninja self-assigned this Nov 15, 2018
@recursion-ninja recursion-ninja changed the title Expeiment with Haskell-side, hashable memoized function(s) Experiment with Haskell-side, hashable memoized function(s) Nov 17, 2018
@recursion-ninja
Copy link
Collaborator Author

See this commit 9271b8e where I removed the old, unsafe IO version.

@recursion-ninja
Copy link
Collaborator Author

There is a branch haskell-function-memoize in which I added an initial HashTable, ST, and TVar implementation. It currently doesn't compile because of a type error involving the s state type in the ST monad. I'm hoping @Boarders can help me look at this and see if it can be rectified.

@recursion-ninja
Copy link
Collaborator Author

I switched to an IO based implementation. It kinda seems to work... in a finicky way. I'll try shoving this in our codebase and measure the performance against the FFI-based memoized TCM.

@recursion-ninja
Copy link
Collaborator Author

The experiment was inconclusive. There is a fold over the bit vectors (DynamicCharacterElement) to perform the pairwise or threeway Sankoff algorithm for the cost and median that takes up all the time when using the native Haskell HashTable embedded in IO and a TVar. Optimizing the fold over the bit vector to allocate far less memory will be required for this to be viable.

We are de-prioritizing this fold optimization for the time being. This should be revisited at a later date. See the overlap, overlap2, & overlap3 functions in 971d8cf.

@recursion-ninja
Copy link
Collaborator Author

recursion-ninja commented Jun 14, 2019

After completing #141, we can use the benchmarks to quantify if this Haskell memoized TCM is better for usage with string-alignment median lookups.

@recursion-ninja
Copy link
Collaborator Author

We have moderate confidence that the memoize function is working as intended and is thread safe.

However, the memoized function is not being retained and reapplied between string alignment functions. The consequence is that the memoization work is discarded between each string alignment.

There may be a technique to remedy this. The hypothesis is that because we define the pairwise "overlap" function though a Getter that special cases a MetricRepresentation value, it re-creates the memoized function on each call to the getter even though the input and output are the same.
We are considering embedding the memoized function into the MetricRepresentation like so:

data DynamicCharacterMetadataDec c                                                                       
   = DynamicCharacterMetadataDec                                                                         
   { optimalTraversalFoci        :: !(Maybe TraversalFoci)                                               
   , structuralRepresentationTCM :: !(Either                                                             
                                        (DenseTransitionCostMatrix, MetricRepresentation ())             
                                        (MetricRepresentation (c -> c -> (c,Word))) -- ^ Changes here                       
                                     )                                                                   
   , metadata                    :: {-# UNPACK #-} !DiscreteCharacterMetadataDec                         
   } deriving (Generic, NFData) 

Because this would involve having a function in the "metadata sequence," we could no longer utilize compact regions. This is a potentially large downside.

The upside is that, if successful, we would not require a C++ FFI binding, could simplify the Exportable type-class, and further modularize the sub-library build architecture.

This hypothesis would constitute a fairly decent effort to test.

@recursion-ninja
Copy link
Collaborator Author

We should also look into using concurrent-hashtable.

@recursion-ninja
Copy link
Collaborator Author

recursion-ninja commented Feb 29, 2020

I have added the memoized functions to the metadata and excised the usage of compact region from the code. However, this has broken the SAVE and LOAD commands, due to these using compact regions to generate the save states.

We are switching to Binary instances from the binary package to substitute the functionality. Unfortunately this requires writing a lot of Binary instances for almost every data-type in our codebase (either explicitly or through a deriving mechanism). It constitutes to a fair amount of mechanical, straightforward work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants