Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing type checks for point free compositions. #384

Closed
JAForbes opened this issue Apr 15, 2017 · 6 comments
Closed

Optimizing type checks for point free compositions. #384

JAForbes opened this issue Apr 15, 2017 · 6 comments

Comments

@JAForbes
Copy link
Member

In light of fantasy land's move to prefixing, we'll probably find ourselves using pipe more frequently.

S.pipe gives us a lot of information about types and code paths quite early. In #383 I'm positing that we may be able to get type errors before applying the value. Here I'd like to discuss potentially skipping type checks for all but the first sub-function application in a pipeline when a value is applied.

If we can assume the behaviour described in #383 is implemented, then we could theoretically assume that if an exception hasn't thrown yet, that the pipeline is valid, and the only invalid case remaining is invalid input.

Again, this may not be feasible, but seems like a huge win for performance if it is possible. A lot of us will be composing point free S functions in pipelines, it is the common case. So optimising that case seems valuable to me. Also both suggestions shouldn't require any changes to sanctuary-def itself. It's simply a sanctuary implementation concern.

I'd appreciate any critiques / thoughts / suggestions. 😃

@Avaq
Copy link
Member

Avaq commented Apr 17, 2017

then we could theoretically assume that if an exception hasn't thrown yet, that the pipeline is valid

That's not entirely true, as there is no way in JavaScript to conclusively determine the output type of a function without calling it with every possible input. This means we have to be prepared for invalid output (and therefore input to the next function) anywhere in the pipeline. The only way to catch these errors is by checking the output of every function, and so their run-time type-checking has to stay enabled. You might think that it's double to check the output of f and the input to g in pipe([f, g]) if we can determine beforehand that a is fully contained in c in f :: a -> b; g :: c -> d. That may prove difficult though, as I see now way to determine whether all members of a type are also members of another type without calling both predicates with every possible input.

Even in the expression: compose(String -> Number, String -> Number), it is easy for us to see that this is not going to work, but difficult for the computer to know whether Number is not contained in String. Maybe via sampling, eg:

$.String.test($.Number.sample);

This will work in most cases. The sample could be 42, for example, which StringRep.test will return false for, immediately excluding the possibility of all members of Number also being String. It will not prevent a function from being partial in its signature though. So the only application I can see for this is "fast failure", eg: You know immediately that pipe([S.toString, S.negate]) is not going to work.

When it comes to performance though, this is not going to improve matters. In fact, it will make matters worse for two reasons:

  1. Functions have to carry around their type information, eg: S.negate.meta.
  2. We are doing more checks, not less ;)

@JAForbes
Copy link
Member Author

@Avaq You're correct that we can't know that a String isn't a Number without special code paths. But you're solution isn't the only solution, so its hard to definitively dismiss the idea without further exploring the space.

For instance. What if sanctuary-def did know that a Number isn't a String. Just because in the general case we want to support custom user defined types, and just because those types rely on a test function that isn't analysable without execution doesn't mean we're out of options.

One approach could be to have a hard coded index of possible type overlaps that could ship with sanctuary. Not every type need be included, but many could be.

const typeRelationships = {
  Number: { FiniteNumber: true, NegativeNumber: true, String: false, etc... }
  ,String: { Number: false, FiniteNumber: false, etc... }
  ,FiniteNumber: { ... }
  , ... etc
}

If we've explicitly said "a string can't be a number" by checking typeRelationships.String.Number === false, then we can make this pipe optimisation (and possibly other optimisations).

The above could be generated on initialization from a more compressed format if checkTypes is true. Its not elegant, but it is fast. It doesn't handle every case, but it handles common cases.

Another option could be to add a new property to types that state the name of known parent types. Initially it could be an internal mechanism only, eventually we could add it to the public API.

So when you define a type, you could say what other domains this type falls into. You could mark the type as closed for extension, so that any type that doesn't exist in this list would be detectable as a type error in our pipe case.

Functions have to carry around their type information, eg: S.negate.meta.

First, not necessarily, it could live on a WeakMap, and could only be defined if checkTypes is true. (For older environments it could be an object where the key is the function name). Secondly, I don't see this as a potential performance bottleneck in Sanctuary's case.

To illustrate how large the possibility space is: if we're worried about RAM, we could assign a bit flag for every function and type that ships with sanctuary and use bit operations to determine relationships.
I don't think this is necessary, but it is easy to do.

So as a proof of concept

const bNumber = 0b001
const bFiniteNumber = 0b010 | bNumber
const bString = 0b100


// is a String a Number
(bString & bNumber) === bString

I'm kind of being silly here, but I'm just trying to show that there are many paths to take. Saying what I'm proposing is impossible or slow by coming up with one potential solution out of an infinitude, isn't entirely fair. ;)

I just want to reiterate: yes this won't work for all types and all functions. And in those cases we just carry on with the normal type checking algorithm. But it will work for enough types and functions that I think it's worth at least exploring.

@Avaq
Copy link
Member

Avaq commented Apr 18, 2017

its hard to definitively dismiss the idea without further exploring the space

I'm merely presenting the problems I see with this kind of optimization. I would not want an optimization to take away functionality, and I think there's a chance that functionality which may seem unnecessary is actually helpful in certain cases.

What if sanctuary-def did know that a Number isn't a String[,] then we can make this pipe optimisation

We could even take both suggested approaches to giving Sanctuary this information. A mapping types known to overlap, plus the sampling mechanism for user-defined types which in the common case will immediately exclude each other. However, this still doesn't prevent functions from "misbehaving" (being partial in their signature). I can easily define a function like:

const isSmallNumber = def('isSmallNumber', {}, [Number, Boolean], (x) => {
  if(x < 4294967296) return true;
})

When this function is applied to a number greater than 4294967296, the function will return undefined, due to a silly mistake. If pipe were to disable type-checking of functions in the pipe-line, this may lead to subtle bugs where functions are being applied to unexpected input and due to JavaScript's "everything works"-design are coerced to the expected output type. I think Sanctuary's type checking was designed to catch -and inform about- exactly these kinds of errors.

when you define a type, you could say what other domains this type falls into

This plays into a suggestion I made here: sanctuary-js/sanctuary-def#128 (comment)

Functions have to carry around their type information, eg: S.negate.meta.

First, not necessarily, it could live on a WeakMap

That's a cool idea! Though I'm more worried about the issue of partial functions.

@JAForbes
Copy link
Member Author

I'm merely presenting the problems I see with this kind of optimization. I would not want an optimization to take away functionality, and I think there's a chance that functionality which may seem unnecessary is actually helpful in certain cases.

@Avaq I'm grateful for your criticism. My apologies, I was taking your criticism as an abrupt authoritative ~"this is not possible", as opposed to "I see problems".

I don't see how we'd lose any functionality, could you elaborate on that?

the function will return undefined, due to a silly mistake

Barring the fact, eslint's consistent-return would pick that up. Sure, we could define misbehaving functions, but sanctuary could know its own functions are sane.

So if I wrote const isSmallNumber = S.lt(4294967296) instead, sanctuary should be able to know that isSmallNumber is guaranteed to return a Boolean, because the body of the function ships with sanctuary.

My common case is, using basically sanctuary's api, and not much else. And it seems odd to assume that sanctuary functions, and functions derived via sanctuary functions are unsafe.

I recently was using K(null), where I then passed a huge data structure to K(null) and it crashed the browser.

@davidchambers proposed we could identify type variables that don't form constraints, and change them to Any. Which would solve the problem. But I think the problem is broader.

Sanctuary owns K, it knows K is type safe. I can't make it unsafe, because I can't access the body of the function. So K and functions like K could benefit from skipping checks entirely in many (not all) cases.

Now if I wrote pipe( [ toUpper, add(1) ]) sanctuary really can know that will fail before we apply any values. And if we could rely on this behaviour for pipelines of sanctuary functions, we could skip checking for inner functions in a composition like this:

// Ignoring this function's behaviour is redundant
// instead just focus on its type signature being valid
pipe([ toUpper, toLower, toUpper, toLower, toUpper, toLower ])

Sanctuary could know (via the relationship graph I showed earlier) that these functions (provided by sanctuary) are consistent. And because we did not throw an error at definition time, and we need not check the input and output of each individual step at invocation time.

Now, if there was a custom function, we could just check at the boundaries of that function.

If you squint, getting type errors at composition time, is a lot like getting type errors at compile time. In some styles of programming, you'd get type errors the moment you load your program, instead of waiting for data to flow through it. Perhaps we could exploit that even further and have type checking modes of execution, where we just evaluate compositions and nothing else.

What we'd need is a method of detecting when a function shipped with sanctuary, versus a function that was defined by the user. I can think of a number of reliable ways to do this.

We'd also need a graph of pre-computed incompatibilities within the env. We could do that a number of ways as I demonstrated earlier.

When would this really help? In user interfaces, where the same function runs many times on often primitive data sets, like values from user inputs. If we could skip checking, or provide earlier checking I think it's worth the exploration cost.

@davidchambers
Copy link
Member

This is an interesting proposal. There are many Sanctuary-related tasks and features which I consider more important, though, so realistically any such change will need to come from another contributor. I'm happy to provide guidance and feedback to anyone interested in exploring these ideas.

@davidchambers
Copy link
Member

I'm going to close this issue as it is not actionable, but you are all welcome to post additional comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants