Skip to content

The lexical variable representational hierarchy

Charles Zhang edited this page Sep 29, 2020 · 1 revision

Lexical variables actually can be compiled down in many different ways. The complexity of the lexical variable is associated with how much runtime cost each representation takes. For example, we need a cell in the worst case to represent a lexical variable like in the following function: (in the case of flat closures)

(lambda (x)
  (lambda ()
    (print x)
    (setq x (read))

Here we see that since X is an upward funarg (i.e. a variable with indefinite extent), the environment containing it must be allocated on the heap. In addition, a value cell must encapsulate X because it is a mutable value.

However, not all lexical variables need such an expensive treatment. We attempt to classify the basic properties of lexical variables and their properties relevant for representation here, as well as the best way to treat optimization on these variables, and a bit on how these interact with closure environment representation more broadly.

Basic properties

We probably don’t need slots for these properties, and only an accessor, since they can be figured out in a straightforward manner from the IR cheaply.

Mutability

First and foremost, a variable has the property of being mutable, or not. A variable written to by SETQ is mutable. This is a basic property not derived from other properties of the variable. This property is affected mainly by optimizations which delete SETQs. An immutable variable can never become mutable by the process of optimization. Hence, we have a small lattice:

MUTABLE -> IMMUTABLE (via setq deletion)

Most variables in Lisp actually start out immutable.

Closed Over

A variable is closed over if it has a reference or setq in a different function (environment). Variables cannot get closed over during optimization, but can become local through deletion of non-local references or SETQs, either by making the references or SETQs local or deleting them altogether. Therefore we have the one-way transition:

CLOSED OVER -> LOCAL (non-local reference or SETQ deletion)

Representation

The above properties are the basics needed to figure out representation of the lexical variable at runtime. Given the above, we could have the following representations:

SSA value

Immutable + local gives the cheapest representation of a lexical variable: as an immutable value.

Local variable

Mutable + local means a mutable location must be allocated in the backend.

A variety of representations for non-local variables

There are now a variety of different ways to allocate the storage of a variable that has to do with its extent and the extent of its allocators. Under a flat closure representation however, no matter how the environment is allocated, an indirection is needed for a mutable closed over variable. The structure representing this indirection may be stack allocated or heap allocated.

Once the compiler has figured out these two properties, the details of how exactly the closure environments will be represented or allocated is a different issue covered in a separate page.