Engine: rework global name representation #1199

Open, wants to merge 2 commits into base: main

@W95Psp W95Psp commented Dec 18, 2024

This PR implements #1163.
It is expected to address most of the issues in #1135.

This PR is about:

  • the internal representation of the global concrete identifiers in the engine
  • the interface for manipulating those identifiers in the rest of the engine

Motivation

The previous design for global identifiers assumed that every identifier came from Rust. This was true at the time.

Since then, we have moved away from that. The engine now creates names in the following situations:

  • pre- and postconditions on traits (those don't exist in Rust, so we need to create identifiers)
  • cast operations for enums with explicit representations (those are primitive in Rust and thus have no identifier)
  • cross-module bundles of mutually recursive items (we now create new modules, containing only new identifiers)

The assumption that all identifiers come from Rust is thus now completely broken, hence the need to move away from the current design.

A second motivation is that Rust identifiers came without any "kind" information: in the engine, we were not able to state whether a given identifier was, say, a struct or a function.
This was a big problem when printing names in the backends: the name of, e.g., a type cannot be printed with the same logic as, e.g., a constant.

Previous design

A concrete identifier in hax was basically a tuple containing:

  • a slightly modified raw Rust DefId
  • metadata that gave a hint about the kind of item we were dealing with

In the previous design, we incrementally added hooks into that raw Rust DefId representation so that we could alter the identifiers. This gave us a way to create new identifiers.

Printing names

Rust has very flexible namespacing.
Modules are just one way of doing namespacing: item declaration is allowed in expression bodies, thus items can be nested arbitrarily.

It is possible e.g. to define a module within an anonymous const within a method implementation.
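
As a minimal sketch of that kind of nesting (the names Widget, answer, nested and helper are illustrative and do not come from hax):

```rust
struct Widget;

impl Widget {
    // A method implementation...
    fn answer(&self) -> u8 {
        // ...containing an anonymous const...
        const _: () = {
            // ...containing a module, which in turn contains a function.
            mod nested {
                pub const fn helper() -> u8 {
                    42
                }
            }
            // Items declared in a block are in scope for the rest of that block.
            assert!(nested::helper() == 42);
        };
        42
    }
}
```

Here `helper` lives under a method, an anonymous const and a module at once, which is the sort of chain a name printer has to cope with.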

The logic for printing names was very complicated and hard to maintain or fix.

New design

Frontend

#1198 made the frontend output a definition kind along with any Rust definition identifier. Before that, #1054 added parent information for each Rust definition identifier.

Together, those two PRs make the frontend output, for each definition identifier: (1) the full chain of identifiers up to the crate root and (2) a precise definition kind, telling us exactly what the definition is.

Engine

Representation

Now the engine has a representation for concrete identifiers in three layers:

  • raw Rust identifiers: the DefId type, generated from Rust to OCaml, defined by the frontend
  • Explicit_def_id: wraps a raw Rust identifier, adding metadata to disambiguate types and constructors (see the documentation of this module for more details)
  • Concrete_ident: an Explicit_def_id that can be moved to a fresh module name and/or have a hygienic suffix

Concrete_ident is a type that describes a possible need for freshness: the underlying explicit def ids are never touched. Below the Concrete_ident layer, every identifier comes from Rust.

The freshness of concrete identifiers is guaranteed lazily: when one requests the rendering of a concrete identifier, the engine produces a stable but fresh name.
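
The three layers and the lazy rendering can be sketched roughly as follows. This is a toy model written in Rust, not the engine's OCaml code: the field names, the `Renderer` type and the `_1`, `_2` clash suffixes are all assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Layer 1: a raw Rust identifier (a crate and a path), as produced by the frontend.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct DefId {
    krate: String,
    path: Vec<String>,
}

/// Layer 2: a raw identifier plus a flag telling a type apart from its constructor.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ExplicitDefId {
    is_constructor: bool,
    def_id: DefId,
}

/// Layer 3: an explicit identifier that may be moved to a fresh module
/// and/or carry a suffix. The wrapped identifier is never modified.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ConcreteIdent {
    def_id: ExplicitDefId,
    moved_to: Option<Vec<String>>,
    suffix: Option<String>,
}

/// Hands out names lazily: nothing is computed until `render` is called.
#[derive(Default)]
struct Renderer {
    assigned: HashMap<ConcreteIdent, String>,
    used: HashSet<String>,
}

impl Renderer {
    fn render(&mut self, id: &ConcreteIdent) -> String {
        // Stable: an identifier that was already rendered keeps its name.
        if let Some(name) = self.assigned.get(id) {
            return name.clone();
        }
        let raw = &id.def_id.def_id;
        // Module part: either the fresh module, or the original crate and parents.
        let mut chunks: Vec<String> = match &id.moved_to {
            Some(module) => module.clone(),
            None => {
                let parents = raw.path.len().saturating_sub(1);
                std::iter::once(raw.krate.clone())
                    .chain(raw.path.iter().take(parents).cloned())
                    .collect()
            }
        };
        // Item part: the last chunk of the original path, if any.
        chunks.extend(raw.path.last().cloned());
        let mut base = chunks.join("::");
        if let Some(suffix) = &id.suffix {
            base.push('_');
            base.push_str(suffix);
        }
        // Fresh: append a counter until the name is unused.
        let mut name = base.clone();
        let mut n = 1;
        while self.used.contains(&name) {
            name = format!("{base}_{n}");
            n += 1;
        }
        self.used.insert(name.clone());
        self.assigned.insert(id.clone(), name.clone());
        name
    }
}
```

In this sketch, two functions `a::f` and `b::f` that are both moved to the same bundle module would clash on their base name; the second one rendered gets a numeric suffix, and rendering the first one again returns the name it was originally given.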

View

Working with raw Rust identifiers is difficult, especially for rendering identifiers as strings in the backends.

Rust represents identifiers as a crate and a path. Each chunk of the path of a Rust identifier is roughly one level of nesting in Rust. The path lacks information about definition kinds.

There are two kinds of nesting for items.

  • Comfort: e.g. the user decides to embed a struct within a function to work with it locally.
  • Relational: e.g. an associated method has to be under a trait, or a field has to be under a constructor.
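
A small Rust illustration of the two kinds of nesting (all names here are made up for the example):

```rust
// Comfort nesting: `Point` is declared inside `norm1` only because the
// author wants it local; it could live at module level unchanged.
fn norm1(x: i32, y: i32) -> i32 {
    struct Point {
        x: i32,
        y: i32,
    }
    let p = Point { x, y };
    p.x.abs() + p.y.abs()
}

// Relational nesting: `area` only makes sense under the trait `Shape`,
// and the field `side` only makes sense under the struct `Square`.
trait Shape {
    fn area(&self) -> u32;
}

struct Square {
    side: u32,
}

impl Shape for Square {
    fn area(&self) -> u32 {
        self.side * self.side
    }
}
```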

Instead of working on one flat path, the view transforms each path into a list of smaller relational paths. For instance, consider the following piece of code:

mod a {
    impl MyTrait for MyType {
        fn assoc_fn() {
            struct LocalStruct {
                field: u8,
            }
        }
    }
}

Here, the raw Rust definition identifier of the field of LocalStruct is roughly my_crate::a::<Impl 0>::assoc_fn::LocalStruct::field.

The view for that field looks like:

{
   path: ["my_crate"; "a"],
   name_path: [
        `AssociatedItem ("assoc_fn", `Impl 0);
        `Field ("field", `Constructor ("LocalStruct", `Struct "LocalStruct"))
   ]
}

Such a hierarchical path approach makes printing names much easier under the following constraints:

  • ensuring names do not clash
  • making sure the Rust namespaces map to the correct namespaces in the backends (e.g. Rust puts constructors in the same namespace as functions, which is not true for F*)
  • items generally cannot be nested in the backends, thus we need to fake nesting.
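
To give an idea of why the view helps, here is a rough Rust model of the view shown above, with a toy printer that fakes nesting by flattening. The `Chunk` and `View` types, the `__` separator and the `Mk` constructor prefix are assumptions made up for this sketch; they are not the engine's actual types or naming scheme.

```rust
/// One relational chunk of a view (hypothetical model of the view above).
#[allow(dead_code)]
enum Chunk {
    Impl(u32),
    AssociatedItem { name: String, parent: Box<Chunk> },
    Struct(String),
    Constructor { name: String, parent: Box<Chunk> },
    Field { name: String, parent: Box<Chunk> },
}

struct View {
    /// Module path, starting with the crate.
    path: Vec<String>,
    /// Relational chunks, outermost first.
    name_path: Vec<Chunk>,
}

fn render_chunk(chunk: &Chunk) -> String {
    match chunk {
        Chunk::Impl(n) => format!("impl_{n}"),
        Chunk::Struct(name) => name.clone(),
        // A constructor gets a prefix so that it cannot clash with its type
        // in a backend where both live in the same namespace.
        Chunk::Constructor { name, .. } => format!("Mk{name}"),
        // Relational nesting is kept by prefixing with the parent.
        Chunk::AssociatedItem { name, parent } | Chunk::Field { name, parent } => {
            format!("{}__{}", render_chunk(parent), name)
        }
    }
}

/// Fake nesting: the module path stays a real path, while every relational
/// chunk is flattened into a single item name.
fn render(view: &View) -> String {
    let module = view.path.join(".");
    let name = view
        .name_path
        .iter()
        .map(render_chunk)
        .collect::<Vec<_>>()
        .join("__");
    format!("{module}.{name}")
}
```

Because each chunk carries its kind, the printer can pick a per-kind rule (here, the constructor prefix) without guessing from the raw path.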

Progress

Review status

@karthikbhargavan and @maximebuyse, you can already take a look at all the modified OCaml modules in the subdirectory engine/lib/concrete_idents.

@W95Psp W95Psp force-pushed the rework-name-repr branch 4 times, most recently from 33ea9a2 to e0a3dc3 Compare January 15, 2025 15:18
@W95Psp W95Psp force-pushed the rework-name-repr branch 3 times, most recently from 3b1c176 to 1fca9c3 Compare January 15, 2025 19:46
@W95Psp W95Psp marked this pull request as ready for review January 16, 2025 07:12
]}

Here, the Rust raw definition identifier of [LocalStruct] is roughly
[a::my_crate::<Impl 0>::assoc_fn::LocalStruct::field].
@W95Psp W95Psp Jan 16, 2025

todo: fix the path, my_crate comes first

(match List.last_exn (Explicit_def_id.to_def_id did).path with
| { data = GlobalAsm; disambiguator } -> into_d did disambiguator
| _ -> broken_invariant "last path chunk to be GlobalAsm" did)
| TyAlias | TyParam | ConstParam | InlineConst | LifetimeParam | Closure
Contributor

This crashed on a TyAlias in one of my tests. I don't have a simple reproducer for now
