RFC: stable memory block names for globals #220

jeremie-koenig · 2018-02-05T21:52:16Z

These changes introduce a more structured namespace for memory blocks: a block id can either be a global ident (new), or a dynamically allocated positive (as before). Using this approach, we can ensure that global environments always assign the same block name to a global identifier id, so that for all ge1, ge2 we have:

Genv.find_symbol ge1 id = Some b1 ->
Genv.find_symbol ge2 id = Some b2 ->
b1 = b2.

Rationale

In all work around compositional semantics, we need to ensure that the memory blocks for globals are named consistently across modules. For instance,

In CompCertX, we force programs to follow a predetermined structure so that the same identifiers will always be mapped to the same block ids.
Compositional CompCert assumes that the global environments of linked modules are consistent, arguing that we can always add external definitions to ensure that's the case.

Our changes should make it possible to eliminate these hacks/assumptions.

Status

We have updated all of CompCert v3.2 to use this new naming scheme.

We plan to use these changes in our own ongoing work on compositional compilation, and we may tweak them further over the coming weeks. However, since these changes have a broader potential applicability and we're hoping for them to be merged into CompCert at some point, we want to collect some feedback as we go forward with that plan.

Notes:

The new namespace is specified as an abstract Module Type, making it easy to explore richer namespace structures in the future.
In Initializers.v, it is no longer necessary for constval/transl_init to abuse Vptr to store the global identifier of referenced variables, since it is now possible to use a pointer value that makes sense.
Since global pointers are preserved by compilation, it is now possible to use val directly in events, as demonstrated by our globmem-eventval branch, which defines eventval := val.
With the uniform block naming convention, it should be possible to define a linker for global environments corresponding to the existing linker for programs, and to relate new notions of semantic composition to the already implemented syntactic composition.

Known remaining issue:

Our implementation of BMap using EMap may be problematic for the performance of the interpreter. This can be easily fixed with some more work.
Since block names for globals are predetermined, it might be advantageous to replace find_symbol with a simpler has_symbol boolean predicate (for example), or remove it entirely.

This redefines the [block] type to a richer namespace where block identifier can be of two disjoint kinds: blocks associated to globals, and dynamic block ids allocated at runtime. This is work in progress.

Then use it to finish updating Memory.v

…e and introduce blomega tactic

…Update the interpreter

xavierleroy · 2018-02-10T16:31:54Z

Thanks for the proposal. I haven't had time yet to study the diff but will do soon. @andrew-appel, @jhjourdan, @robbertkrebbers, any opinions on this proposal?

andrew-appel · 2018-02-10T19:13:00Z

We will discuss this in the VST group and get back to you.

andrew-appel · 2018-02-15T19:23:22Z

Today we (@andrew-appel, @lennartberinger, @scuellar, @jeremie-koenig, @pwilke) discussed this by phone. We agree that the proposal addresses part of the problem, but does not address the problem of consistent (across compilation units) mapping of ASCII strings ("names") to AST.ident. We believe that very possibly this pull request could be a useful part of the solution to that larger problem, but we are still studying and discussing it.

jeremie-koenig · 2018-02-15T23:03:48Z

One thing @pwilke and I did not think to mention this during the call: in this pull request, the ASCII names <-> AST.ident mapping does in fact show up, because the interpreter needs to be able to print out pointers:

In BlockNames.v we use an axiom ident_to_string: ident -> string to implement a Block.to_string : block -> string function;
In extraction.v we realize this axiom as (fun i -> Camlcoq.coqstring_of_camlstring (Camlcoq.extern_atom i));
In Interp.ml we use Block.to_string in the definition of format_value.

This allows us to print out a global pointer &globvar[12] as <glob:globvar+48>, and a dynamic one &localvar as <dyn:123+0> (say). Previously both would have been printed out using numerical block-ids. We could move the definition of ident_to_string to a more central place if we want to make that a more general feature or use it in other ways.

This does not really address the problem mentioned by @andrew-appel, which I understand as: What if I want to use strings that never went through the parser, therefore are not in the map? This happens anytime you use CompCert as a library rather than a self-contained compiler: in VST, when you use the output of clightgen; likewise in CertiKOS, our "source-ish" code is Clight/Asm ASTs in .v files.

VST/CertiKOS use things like:

Definition foo : ident := 1%positive.
Definition bar : ident := 2%positive.
...
Definition myast : function := ... foo ... bar ...

Instead, we would want to use something like:

Definition foo : ident := ident_of_string "foo".
Definition bar : ident := ident_of_string "bar",

or perhaps even declare ident_of_string as a coercion and use strings directly when defining myast. This way we would never again need to mess around with potentially inconsistent tables of names <-> ident mappings in VST and CertiKOS, and there would never be a problem putting together modules that have been developped independently.

I believe we can accomplish this in the following way:

In the Coq part of CompCert, we axiomatize ident_to_string and string_to_ident as a predefined, but unspecified bijection between strings and ident's. Such a mapping does exists, but any particular one will be inefficient in general (the strings we do happen to use in a given case will correpond to sparsely distributed, large numbers).

But since we're free to use any such mapping, in the ML part, we can simply implement string_to_ident using a stateful process that allocates small numbers on-demand (as is already done). The key thing here is that although string_to_ident will be effectful, this can never be observed from the outside. In effect, we progressively "discover" the specific mapping we want to use.

Using this approach, we don't need to deal with the statefulness of the table in Coq, but still get the benefits. I believe implementing this can be done independently of our changes to the block namespace.

jeremie-koenig · 2018-02-16T04:36:51Z

Here's a proof-of-concept CertiKOS@85d2b87

We can use "MAP with Definition" for the same effect.

In particular, use the name [AST.string_of_ident] and declare it in the same places, so that a merge would be straightforward.

jeremie-koenig · 2018-02-20T00:59:48Z

At this point I thought I'd write some kind of "diff walk-through" to facilitate any digging people would want to do.

The main changes occur in:

common/AST.v declares the string_of_ident parameter (realized together with string_of_positive with Caml code in extraction.v). This is needed in BlockNames.v to print out block names as strings.
common/BlockNames.v is a new file which defines the interface and implementation of the new block namespace. It includes an empirical eauto-based blomega tactic that can discharge most of the inequalities about blocks that xomega used to handle.
common/Values.v now defines block := Block.t, rather than positive.
common/Memtype.v is modified to reflect changes to the memory model interface:
- There is a new invariant that Block.init <= nextblock m.
- There is a new operation alloc_at used for allocating global blocks by name.
- inject_neutral no longer needs a threshold (Block.init is always what we want).
common/Memory.v implements those changes.
- Note that memory states now use BMap rather than PMap. In the most recent version BMap is implemented using IMap through an embedding of the new block type into positive, meaning the performance implications for the interpreter should be minimal.
common/Globalenvs.v is somewhat simplified:
- genv_next is no longer necessary (the constant Block.init now plays this role)
- genv_defs is now indexed by AST.ident values rather than block
- the remaining maps and invariants in Genv.t are no longer necessary (they can be simple definitions and lemmas based on genv_defs and block operations); only genv_public remains
- init_mem is modified to use alloc_at

Less remarkable changes are:

common/Maps.v: make sure the MAP interface has everything needed of BMap, and make sure EMap and IMap implement it.
cfrontend/Initializers(proof).v has one hack pleasantly removed, whereby ident values would be stored as block in pointers (now stored as Block.glob i).
driver/Interp.ml now uses block_compare in state comparisons, and Block.to_string to print out pointers.

The rest of the patch consists of replacing positive operators, predicates, lemmas, and tactics by their new block equivalents --- a set of numerous but fairly trivial changes. The ~550 new lines introduced by the patch mainly are BlockNames.v and the new Mem.alloc_at.

We can get linker errors for addresses of the form "symbol + offset" where "symbol" is in the small data area and "offset" is large enough to overflow the relative displacement from the SDA base register. To avoid this, this commit enriches `C2C.atom_is_small_data`, which is the implementation of `Asm.symbol_is_small_data` in the PPC port, with a check that the offset is within the bounds of the symbol. If it is not, `Asm.symbol_is_small_data` returns `false` and Asmgen produces an absolute addressing instead of a SDA-relative addressing. To implement the check, we record the sizes of symbols in the atom table, just like we already record their alignments.

jeremie-koenig and others added 19 commits January 30, 2018 17:06

Add set2 to the general MAP interface

de3c418

Improved namespace for memory block ids

f93ca78

This redefines the [block] type to a richer namespace where block identifier can be of two disjoint kinds: blocks associated to globals, and dynamic block ids allocated at runtime. This is work in progress.

BlockNames: Add a Block.le relation

6274b59

Then use it to finish updating Memory.v

BlockNames: update some of Globalenvs.v

4fd3c3f

BlockNames: Progress on Globalenvs

1284fb5

BlockNames: finished Globalenvs and Memory

4ec83f0

BlockNames: update Events.v

baf0ba3

BlockNames: reintroduce find_symbol

4696267

BlockNames: add a Decidable instance Block.eq

f69d799

BlockNames: update Separation.v

e53faa6

BlockNames: update ValueDomain and ValueAnalysis

4ef2be8

BlockNames: Update Inlining + Unusedglob

82dde56

BlockNames: update Deadcode and Stacking

ca0631f

Add hints for Block.*

8a204da

BlockNames: encapsulate hints for Block.lt/le into a specific databas…

616ec9e

…e and introduce blomega tactic

BlockNames: update frontend

cb38d63

BlockNames: add to_string and block_compare operations to BlockType. …

2f152ed

…Update the interpreter

BlockNames: minor changes to minimize diff with CompCert.

a5974d3

BlockNames: cosmetic improvements to Initializers.v

1a4dd43

jeremie-koenig added 3 commits February 16, 2018 15:23

Make sure IMap satisfies the MAP interface

8ab2cd3

BlockNames: use an efficient IMap to implement BMap

e268e60

Merge remote-tracking branch 'absint/master' into globmem

2f18434

jeremie-koenig mentioned this pull request Feb 17, 2018

Make the string-ident maps available from Coq code #222

Closed

jeremie-koenig added 2 commits February 19, 2018 13:37

BlockNames: get rid of the BMapType interface

f9a2754

We can use "MAP with Definition" for the same effect.

Cosmetic improvements to BlockNames.v

74bb7e0

Better compatibility with pull request AbsInt#222

7a0da56

In particular, use the name [AST.string_of_ident] and declare it in the same places, so that a merge would be straightforward.

Merge remote-tracking branch 'absint/master' into globmem

024b3bb

alxest mentioned this pull request Aug 6, 2019

Consistent string-to-ident mapping across compilation units #311

Closed

xavierleroy force-pushed the master branch from aa95f89 to 0aeff47 Compare November 7, 2020 14:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: stable memory block names for globals #220

RFC: stable memory block names for globals #220

jeremie-koenig commented Feb 5, 2018

xavierleroy commented Feb 10, 2018

andrew-appel commented Feb 10, 2018

andrew-appel commented Feb 15, 2018

jeremie-koenig commented Feb 15, 2018 •

edited

Loading

jeremie-koenig commented Feb 16, 2018

jeremie-koenig commented Feb 20, 2018

RFC: stable memory block names for globals #220

Are you sure you want to change the base?

RFC: stable memory block names for globals #220

Conversation

jeremie-koenig commented Feb 5, 2018

Rationale

Status

xavierleroy commented Feb 10, 2018

andrew-appel commented Feb 10, 2018

andrew-appel commented Feb 15, 2018

jeremie-koenig commented Feb 15, 2018 • edited Loading

jeremie-koenig commented Feb 16, 2018

jeremie-koenig commented Feb 20, 2018

jeremie-koenig commented Feb 15, 2018 •

edited

Loading