Skip to content

How to represent and do type derivations in BIR

Charles Zhang edited this page Sep 17, 2020 · 8 revisions

Following some ideas of Python and previous experience writing type inference in HIR, I have the following suggestions for how to get Python-like compile-time type inference effects that enables compile time type checking, notes, and fine grain policy control.

The IR types that participate in type information will be linear-datum and lexical-variable, but in different ways. Let's focus on linear-datum first because this participates in local type propagation and is the basis for passing type information around leveraging CL standard library function semantics. (This is also what is currently highly lacking in HIR).

Linear-datum should have two slots that carry CTYPEs (compiler representations of full lisp types that have a client independent interface), namely

ASSERTED-TYPE and DERIVED-TYPE .

Basically, the asserted-type slot is type information asserted and supplied by the user, for example through the use of THE. Therefore (the fixnum <e>) will simply annotate <e>'s linear datum with the fixnum ctype.

Derived type is slightly different. This is what the compiler PROVES about the type of a linear-datum, using the asserted and derived types of its inputs. For example, let

<g> = (+ (the (mod 32) <e>) (the (mod 64) <q>)). Thus with suitable information about the + known function, for example a type deriver given by a deftransform, the compiler derives that the linear datum <g> has type (mod 96) with the type information on the linear datums <e> and <q>. Now this is good, since now BIR and later IRs can use this derived information about <g> stored in the derived-type slot. You can also detect type conflicts at compile time in the following way:

Suppose we have the same expression <g> but it is wrapped in (the char <g>). The compiler has proven that <g> has type (mod 96) but now we are asserting that it has type char. This is a compile-time type conflict! The compiler can emit a note about how the asserted type differs from the derived type, like in Python.

You can also see how policy plays a role. With derived types, you never need to emit a type check because this is the type PROVEN by the compiler, even on extremely high safety. However, for linear-datums where the asserted type is narrower than the derived type, (more precisely, cases where (type-intersect DERIVED-TYPE (type-complement ASSERTED-TYPE)) is BOTTOM), on high safety a runtime type check must be emitted for that operation corresponding to the linear-datum.

So for linear datums, type derivation is rather straightforward, and you get a lot of bang for buck with type derivation functions for known functions like +, - etc.

What about lexical variables?

Lexical variables are interesting. Lexical variables should not get any type slots attached to them, since we can assert different types at different control points for variables. For example, you could have

(defun f (x)
    (if (read)
        (locally (declare (type integer x))
          ...)
        (locally (declare (type char x))
          ...)))

Especially since you can have SETQ on a lexical variable, a slot for type information for the lexical variable doesn't make much sense. This is why lexical variable type information needs to participate in global flow analysis. This is what Python calls constraint propagation, which is pushing the types of lexical variables around, and what I'll call global type derivation, as opposed to local type derivation on expression temporaries represented by linear-datum. You have to do global flow analysis to figure out the type information of a lexical variable at the relevant control points. This is alleviated somewhat with SSA variables (those lexical variables without setq) because you don't need to keep track of kill sets. Python punts on SETQ'd lexical variables but we can do better if we (optionally) convert to SSA.

So what should (the fixnum <var x>) and (declare (fixnum <var x>)) do?

In BIR lexical variable references are represented like this <linear-datum e> = (REF <var x>). Therefore when we declare the type of a lexical variable to be FIXNUM, this should be treated as saying the linear-datums of all REFs in the scope of that declaration should have an ASSERTED-TYPE of fixnum. Similarly, (the fixnum <var x>) just asserts that the linear datum of the (REF <var x>) instruction should have type fixnum. And the role of global flow analysis for lexical variable types is to supply the derived-type slot of the linear-datum for the REF instruction.

This pretty much covers everything I think.