Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

devicetree: use stable identifiers for LLEXT-exported DT object names #77799

Merged
merged 3 commits into from
Jan 22, 2025

Conversation

pillo79
Copy link
Collaborator

@pillo79 pillo79 commented Aug 30, 2024

This is the candidate implementation for RFC #83800.

@pillo79 pillo79 added area: Devicetree RFC Request For Comments: want input from the community area: llext Linkable Loadable Extensions labels Aug 30, 2024
@pillo79
Copy link
Collaborator Author

pillo79 commented Aug 30, 2024

Compliance check found a leftover continue introduced by commit d25e5c2, 3 years ago.
It seems not many people dare to venture here! 🙂

@pillo79 pillo79 force-pushed the pr-stable-dt-hash branch 3 times, most recently from c0a7a54 to 067dd65 Compare September 17, 2024 07:22
@pillo79 pillo79 force-pushed the pr-stable-dt-hash branch 3 times, most recently from f8b8c3f to 42656dd Compare September 25, 2024 12:45
@pillo79
Copy link
Collaborator Author

pillo79 commented Sep 25, 2024

v2:

Copy link
Collaborator

@mathieuchopstm mathieuchopstm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just stumbled across this PR, and I must say it's a neat idea!
A few thoughts:

Comment on lines 94 to 100
def _compute_hash(path: str) -> str:
# Calculates the hash associated with the node's full path.
hasher = hashlib.sha256()
hasher.update(path.encode())
return hasher.hexdigest()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts:

  • we could save space (~33%) by base64-encoding the hash's raw bytes
    • Limitation: the resulting hash must be a valid C identifier
      • base64 uses +, / and = which are illegal...
      • base32 saves only ~20% but doesn't have this problem
  • appending a known-unique value (e.g., node address) to the hash string would guarantee that collisions can never happen
    • not sure this is worth the trouble with how unlikely SHA-256 collisions are, though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the idea!

base64 uses +, / and = which are illegal...

True, but we don't really need an invertible transformation, just a hash, so we can cheat a little:

  • = is only used in padding, so we can drop it.
  • Replacing both + and / with the legal symbol _ would make clashes a little more probable, but still vanishingly small: ChatGPT said with a convincing formula that it's 1.23 times more probable than a direct SHA256 collision (5.5E-78...).

At the same time, though, the identifier would become "only" 43 chars. Looks like a nice tradeoff. 🎉

.name = STRINGIFY(x), .addr = (const void *)&x, \
/* LLEXT-enabled application: export symbols */
#define Z_EXPORT_SYMBOL_NAMED(sym_ident, sym_name) \
static const STRUCT_SECTION_ITERABLE(llext_const_symbol, sym_ident ## _sym) = { \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why it's been done this way (sym_name is a string), but wouldn't using sym_ident to form the llext_const_symbol's name prevent exporting the same symbol twice under different names? (at least if both are in the same translation unit)

I think taking the sym_name in token form and STRINGIFY()ing it in the macro would be acceptable, if that can help solve this issue.

Same remark for SLID version of the macro.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you have to export the same symbol twice with two names? One should be enough for everybody! 🤭

... seriously, this PR wanted to transparently rename symbols when appropriate, so I wasn't thinking about the use case you mentioned (there's always one EXPORT_SYMBOL for each identifier).
It makes sense though, I will have to think about using the sym_name - I remember there was a problem that made me use strings and not tokens, but I'm not sure what that was.

Copy link
Collaborator Author

@pillo79 pillo79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @mathieuchopstm . Had so many open PRs these days I lost track of this one! 🤦‍♂️ I need to finish it and push it on.

Comment on lines 94 to 100
def _compute_hash(path: str) -> str:
# Calculates the hash associated with the node's full path.
hasher = hashlib.sha256()
hasher.update(path.encode())
return hasher.hexdigest()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the idea!

base64 uses +, / and = which are illegal...

True, but we don't really need an invertible transformation, just a hash, so we can cheat a little:

  • = is only used in padding, so we can drop it.
  • Replacing both + and / with the legal symbol _ would make clashes a little more probable, but still vanishingly small: ChatGPT said with a convincing formula that it's 1.23 times more probable than a direct SHA256 collision (5.5E-78...).

At the same time, though, the identifier would become "only" 43 chars. Looks like a nice tradeoff. 🎉

.name = STRINGIFY(x), .addr = (const void *)&x, \
/* LLEXT-enabled application: export symbols */
#define Z_EXPORT_SYMBOL_NAMED(sym_ident, sym_name) \
static const STRUCT_SECTION_ITERABLE(llext_const_symbol, sym_ident ## _sym) = { \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you have to export the same symbol twice with two names? One should be enough for everybody! 🤭

... seriously, this PR wanted to transparently rename symbols when appropriate, so I wasn't thinking about the use case you mentioned (there's always one EXPORT_SYMBOL for each identifier).
It makes sense though, I will have to think about using the sym_name - I remember there was a problem that made me use strings and not tokens, but I'm not sure what that was.

Copy link

github-actions bot commented Jan 7, 2025

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jan 7, 2025
@pillo79 pillo79 removed the Stale label Jan 7, 2025
@pillo79 pillo79 force-pushed the pr-stable-dt-hash branch 4 times, most recently from 220641b to 55b07ef Compare January 9, 2025 11:44
@pillo79
Copy link
Collaborator Author

pillo79 commented Jan 9, 2025

v3, almost ready for public review:

  • rebased to current Zephyr
  • applied both suggestions from @mathieuchopstm:
    • switched from plain hex to base64 encoding to shorten the generated identifiers
    • by naming the llext symbols according to the desired alias, the same identifier can now have any number of export aliases

@rruuaanng
Copy link
Collaborator

It seems that #83748 is the main work. Can we close this PR and create an issue for the RFC?

@pillo79
Copy link
Collaborator Author

pillo79 commented Jan 10, 2025

It seems that #83748 is the main work.

In fact it's the opposite - this is the main work and #83748 is only the first commit of this series.
I wanted to split the EDT and C/LLEXT changes to make it easier to review and discuss upon, but alas, ended up creating more confusion! 🤦‍♂️

I have now submitted the RFC as issue #83800.
Will shortly update the code here addressing all the feedback I already got.

Add a new "hash" attribute to all Devicetree EDT nodes. The hash is
calculated on the full path of the node; this means that its value
remains stable across rebuilds.
The hash is checked for uniqueness among nodes in the same EDT.

This computed token is then added to `devicetree_generated.h` and made
accessible to Zephyr code via a new DT_NODE_HASH(node_id) macro.

Signed-off-by: Luca Burelli <[email protected]>
Add a new set of macros that allow customizing the symbol name when
exporting symbols. This is useful when the symbol name that extensions
need to look up is different from the identifier used in the base image.

Signed-off-by: Luca Burelli <[email protected]>
This new option allows to export devices using identifiers generated
from the hash of the devicetree node path, instead of the device's
ordinal number. Identifiers generated this way are stable across
rebuilds.

Add new test cases to test this new option.

Signed-off-by: Luca Burelli <[email protected]>
@pillo79 pillo79 changed the title RFC: devicetree: use stable path-based hashes in DT object names devicetree: use stable path-based hashes in DT object names Jan 10, 2025
@pillo79
Copy link
Collaborator Author

pillo79 commented Jan 10, 2025

v4, incorporating the feedback received in #83748:

  • renamed the DT_HASH macro to DT_NODE_HASH
  • used a more direct solution to obtain identifier-compatible base64 encodings
  • moved the hash test suite from inside edtlib to the Twister-based tests/lib/devicetree

@pillo79 pillo79 changed the title devicetree: use stable path-based hashes in DT object names devicetree: use stable identifiers for LLEXT-exported DT object names Jan 10, 2025
@pillo79
Copy link
Collaborator Author

pillo79 commented Jan 20, 2025

No further feedback received on the RFC or this PR so marking this as open for full review.

Copy link
Collaborator

@rruuaanng rruuaanng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for my limited familiarity with llext. So, I will leave this task to others. Thank you for your work on this.

@kartben kartben merged commit b0dbbb7 into zephyrproject-rtos:main Jan 22, 2025
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Device Model area: Devicetree area: llext Linkable Loadable Extensions RFC Request For Comments: want input from the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants