The R
data structure should not have the methos that perform the rewrite
operations. There should be a .rewrite
method that will be used as a visitor,
and then that can rewrite the children. The children can then be passed up to
the parent at which point the parent can control how it is rewritten. So in the
case of intersect, if any of the children are rewritten as failed, then that
becomes a failed object itself.
- the evaluation of intersect should be fairly simple as it only has to evaluate all of its children and then determine if it still exists, it is basically just a not interesting operation.
- the union and the partition as well as the memo tables are the same structure
basically. There should track which ground values have been assigned to a
variable, and then construct those branches.
- this doesn’t quite capture the ability to delay the checking of the domains such that it is lazy. This means that the story is not quite complete? I suppose that is basically just a question about how far it wants to go with trying to ground out the memos. In the case that something is fully ground, then we can handle that directly.
- this just means that there needs to be some policy that is attached to the R’s partition object such that it can determine if it will allow for non-ground expressions to be memoized. If it doesn’t then it has to push those to an agenda and then keep running those expressions. This just has to always memoize terminal states, but anything else should be fine to choose between.
- then there is questions about how forward chaining would work? Or how would this add new expressions to a parttion. I suppose that this is dok with this adding a new expression to the table and then that having to be further forward chained.
- in the case of a cycle, I suppose that there needs to be some explicit unk
slot for expressions that is is unable to find in the memoized table? In
which case, if it doesn’t find something that is in the table, it ends up
making a query against the unk slot and memoizing the result. That unk slot
could contain another partition that is internally going to be managing
whatever expressions it has to deal with.
- this would basically be that we want to first guess null, in the case that it cycles around and finds itself, but then we want to guess 1
- We would like to track where rewrites are performed, which would mean that we are keeping around the old version of stuff? In that case, we could match against the older version of something? If we rewrite something away, then we want to eliminate the entire branch of something.
- Priority union partitions, in that if there is not a match, then this has to go up to the higher level. This high level can then be memoized based off what is returned down to a child union.
- the partitions should have some optimizations such that they can quickly dispatch to whichever branch contains the relevant values.
- In the case of unioned, partitions, we are going to want to compress multiple levels of partitions such that we don’t have long chains of unions that we are having to handle. That is assuming that we can handle the partition, otherwise then I suppose that we are just keeping the partition around??
- would be nice if there was some dispatching over outer functor of variables. That would let us use the same structure for a dyna base where we are looking up entries using the outer functor as the identifier.
- There should be something similar to what is in dyna-phi with the memo table structure. This would be something like there is a dense array where we are mapping integers to different partitions. I suppose that there also would need to be something that is the unset case, and then there might also be stronger backoffs with these handling things that are not arrays?
- these storage classes, would need to eventually be capable of supporting matrices such that we can identify the matmul cases I suppose.
- dispatching over different disjoint branches
- efficient dispatch. Some trie over the different keys, and maybe something that could dispatch over the functor of a variable?
- constructing the union iterators which ensure that we can iterate the domain
of a variable with checking the multiple branches at the same time.
- in the case that there is some fallback, then this should only be “enabled”
in the case of the fallback being empty. (Meaning that the memo table
default is null), otherwise, then we would have to merge the results of the
partition with the table. Which is really just the iterating case when
there is some overrides.
- this would have to be moded or something, so that we know if there was something that overwrote the memo, otherwise we are going to not know if there is something that overwrote the different values.
- in the case that there is some fallback, then this should only be “enabled”
in the case of the fallback being empty. (Meaning that the memo table
default is null), otherwise, then we would have to merge the results of the
partition with the table. Which is really just the iterating case when
there is some overrides.
- updates / changes of writes into the memo table
- subscribed downstream operations
- these subscribe operations need to ensure that they don’t accidental do duplication in the case that it references the same upstream table multiple times. I think that should be done by just adding something into the R-expr, and then we won’t have to worry about that happening again? This would just be some constraint that is like not equal between the tuple of variables, in the case that some of the variables are bound, it could eliminate itself early, though that would violate the thing where the mode it runs in is only dependent on the values and not having to check later
- subscribed downstream operations
- A fallback pointer to something in the case that it can’t find it explicitly
memoized.
- So if something is memoized (either with
terminal(0)
or otherwise), then it would be fine, this is really the “did not find” case.- the default on this would simply be just to have this fallback to
terminal(0)
, as that indicates that there is nothing left to do. - We should always be able to memoize the values that are returned from the
fallback?
- however what if there is something that is processing guesses on the entry? I suppose in that case, these would have that the guess null is Terminal, and thus non contributing.
- the default on this would simply be just to have this fallback to
- This is already needed in the case of unk defaults, where we are allowed to memoize anything across the call boundary. These tables are not allowed to be used for iteration. Even fallback to earlier expressions would be a problem? Those would have to handle the overwrites over invalid values (this is basically the stuff that comes from lipics)
- So if something is memoized (either with
- Assumption tracking
- in the case that some value changes, then it would depend what it, and then some way to notify those operations
- Memoization policy
- Are we allowed to memoize non ground expressions, and if not, how do we handle those expressions. This would be something like push the things that we are not allowed to memoize to the agenda to be processed later.
- so there is some policy which is just memoize everything that is null, then something that
- partition dispatching over the outer functor
- partition dispatching over potentially overlapping heads (a list that it has to check all)
- a partition where it is something like the most specific wins
- the memoization table requires some notion of modes, in that variables that were not bound in the query need to have their full domain already known. In the case that we have some backoff where we are allowed to handle the different grounding levels.
- unique partition
- Something where there is a unique branch that will be selected, so if there
isn’t a unique branch, then that would be an error. This would be helpful
in the case of things like branches over
+/2
, as those would want to select a unique operator to implement the operation.- to make the dispatch over operators work well, that would require some type inference about what is going to be used to call a particular location.
- Something where there is a unique branch that will be selected, so if there
isn’t a unique branch, then that would be an error. This would be helpful
in the case of things like branches over
- Needs to be able to scan the memos and identify which have a different result from compute. In the case that there are some ground values, then we need to determine which expressions are contained. I suppose that if we have the fully free version, then that version is not allowed to perform queries against its elements, and it can’t union as it doesn’t
- Assumptions are going to be represented as both that there is something that is currently assigned a value, and then that there is something that is currently null. In the case that something is currently null, this means that there was a read of a table that touch an entry that is not currently present. We are going to have to identify which key was performing the read and then track that R-expr as a forward pointer down stream.
- This is that there is some extension on the program such that we are going to
have to correct the guesses on the agenda.
- in the case of cycles, the priories are reading from a prior version of the memo table, and thus
- Would like if there is only a single memo table for something like fib. The
idea that there are layers and these layers are the additional rewrites on the
new memo table.
- so we are going to take the fib rule and then partition it into smaller
expressions. We then want to insert these into the table. The rule
fib(X) = fib(X - 1) + fib(X - 2)
, should become something where there are delayed checks that are getting placed on the agenda. So we are going to want to compute all of theX
that are currently assigned some value, and then put this on the agenda. In this case, we are attaching these to the memo tables that it reads from. (which just happens to be itself). We want to do that attach operation before we write any of the values into the memo table.- this means that the memo tables, are
- so we are going to take the fib rule and then partition it into smaller
expressions. We then want to insert these into the table. The rule
- There are some things that we do not want to memoize, for example, in the case
of a null default, we are saying that there is no more computation required,
and thus if we fail to look something up, then that means its multiplicity is
zero. The agenda is running to try and fix something up such that it only
memoizing ground expressions.
- for example, suppose that we want go guess that
fib(X) = 17
, for allX
. Then we are going to create a memoized entry like{ <fib(X), 17>@1 }
. When matching up this guess, if we allow for anything to be set into the memo table, then we are could get{ <fib(X), 17>@fib(X-1) + fib(X-2) == 17 }
which is now consistent, though we are going to have perform the computation for further fib to figure out which have the value 17. So in this case, this isn’t necessarily that useful
- for example, suppose that we want go guess that
- In the case that we are memoizing the result of R-exprs, then we are going to have a delta be that we decrement one memoized result and increase another. This
- in the case that something changes that is a null memo, then it is possible that something downstream needs to be also changed. Tracking this can be represented as an R-expr in that this will identify which variables are downstream.
- delta vs just notifications
- in the case that we don’t go through any additional aggregators, then we should be able to just add the update. partitions should have some way of combining results? I suppose that if there are two things with the same key/value pairs, then we are just going to have a terminal state with a higher multiplicity. If there are
- if there is some block which is tracking the representation of null in the
memo table that we have to find first, then it would need to be able to
determine that it wouldn’t look back to something that is at a higher level.
- while the representation is commutative and can be combined, the representation as being able to be compiled and be able to identify what values are getting looked up seems off.
- To be able to work well, this is going to have to know what modes are
supported by a memo table.
- Q1: does it make sense for this to support multiple modes for the memo table. Or does this just become something degenerate where it just supporting more than it needs. In the case that there is something that is not memoized, then it has to fall through. ATM this is using the absence of something memoized to identifiy cases where it must fall through. If there are two non-overlapping query modes, then this gets into the complication that this needs to handle having some overrides in the case that something in is partially memoized
- Can we delay some of the memoization but not others
- basically want to say that some of the keys are unk while others are null. Thinking that this could represent something like the key that tracks what sentence is getting parse is unk, while the parse chart itself is null.
- one way in which this could be done is by having some memo table of memo tables, but this becomes complicated when dealing with what the returned shapes are? The memo table will need to be able to determine the shape of the returned values. I think that this is something that could be done using only the mode?
- inference of unground types, as well as static tracking of what ground types
would be present (which constraints have already executed)
- in the case that we know the type of some variable, then we should be able
to replace the
*X
operator directly with the call to the method. There is also the construction/destruction of named values that would be something that should be considered
- in the case that we know the type of some variable, then we should be able
to replace the
- elimination of construction terms that are not used. So this would be that the result variable is not attached to anything. In which case we would like to delete that variable and the build operation from the R-expr
- this requires that there is some in and out variables or that we mark things as semi-det, such that if two of the variables are the same, then the last would also be the same.
+
- this is required to emulate prolog where we are doing non-ground unification
- These should be something that we allowed to be defined in dyna itself. This means that we are going to want to
https://www.ravenbrook.com/project/mps/ – garbage collection
https://networkx.github.io/ – for pattern matching against different rewrites, this should be sub-graph isomorphism to identify where a pattern occurs, we can then add in additional inferred constraints or replace part of the computation.
https://github.com/efficient/libcuckoo https://github.com/martinus/robin-hood-hashing
There should be some support for loading in packages which already written in dyna.
pkg_resources.iter_entry_points
lets a python package get a list of other packages that are annotated (during their install) with special parameters.- can also
distutils.setup_keywords
to define a custom keyword for setup.py which would invoke some custom handler code when a dyna package is installed.
- can also
- If something is known to be safe in some mode and doesn’t come back with delayed
constraints, then it should be willing to keep inlining it? But in that case,
it would potentially run the interpreter backwards forever. So it would need to
identify these cases, and then be willing to do prolog style unconsolidated
results. So those cases would have to be detected.
- if the heads are known to not be unifiable, then it could know that it can do prolog style safely. this would let it do peano without having extra consolidation.
- cuts in prolog can make the system more efficient if there are no free variables that are present in the arguments to an expression. As in the case that an expression is fully ground, it would at best only prove the same expression again. When there are free variables, then it could end up proving different states which are true.
- static analysis, is useful for checking that the program is correctly typed and won’t perform any “stupid” errors. Once the program has been determined to be correctly typed, it doesn’t actually need to use this information for running, and instead we can take an JIT style approach based around the expressions that we actually see in the program. This has the advantage of being a potentially tighter analysis?
- things that we would like to JIT
- the types of a primitive, is it an int or a float
- if something is akin to a matrix (meaning that the axis are small integers and the contained value is some floating point)
- that we can use a more efficient execution strategy, eg identifying that something could call a matrix subroutine. Though that might require that we perform static analysis? Though identifying that something is a matrix would potentially be something that we would have to handle at runtime?
- non deterministic compilation? There could be many different ways in which something is compiled, though this would require searching over different strategies to identify what is the best approach.
- can the memo tables know the shape of what is contained in them. This would
allow it to have a more efficient storage story. The return results could all
be of some form in which case it would just have to identify the form and what
the ground values are.
- the bag with any values contained in it makes it difficult to perform static analysis
- with aggregation, we are required to collect all of the contributions to a final value, this also means that there are differences with what the final result should look like.
- because we are forward chaining with delayed constraints
Memoization goes early, as we lead with memoizing code optimizations can go later, so the most basic execution story can go up front. the interpreter story with basic left right would be something that we should lead with. then there is something later that is going on for the compiler.
- Atm the compiler can not support unexpected returned results. It would be
nice if there was some way in which it could transfer to the interpreter in
the case that it got some unexpected result. This would make it closer to a
tracing JIT as it could just guess what the returned result would be. To
handle this case, it would have to know what the structure of the R-expr would
look like at some point in time, and then be able to handle that the R-expr?
for the newly replaced expression, it would then have to identify where in the
tree that expression would be represented.
- I suppose that this could track which R-expr has been run, and then represent the entire tree. The complication would be if there was some attempt at returning a compact representation? Most things do not return unexpected results, so it would probably be ok. later being able to update the compiled code would be nice, but the current design requires that it predict what is coming back
- only external calls and memos could possibly return something that is not expected. In these cases, it could just throw an exception, and convert itself back to an interpreted R-expr. Otherwise this is going to have issues where a memoized value could be difficult to predict.
- if an expression could be backed off to the interpreter at any point in time, even for builtins, then it would
Can represent this using an additional argument at the start of a sequence.
- there needs to be some way of identifying which arguments are getting passed from the context. That should be handled by the parser, thoug that would require that
- notes about how this can be implemented in the docs/ir/dyna-core file