devicetree: use stable identifiers for LLEXT-exported DT object names #77799
Compliance check found a leftover
c0a7a54 to 067dd65
f8b8c3f to 42656dd
v2:
Just stumbled across this PR, and I must say it's a neat idea!
A few thoughts:
    def _compute_hash(path: str) -> str:
        # Calculates the hash associated with the node's full path.
        hasher = hashlib.sha256()
        hasher.update(path.encode())
        return hasher.hexdigest()
A few thoughts:
- we could save space (~33%) by base64-encoding the hash's raw bytes
  - Limitation: the resulting hash must be a valid C identifier
    - base64 uses `+`, `/` and `=` which are illegal...
    - base32 saves only ~20% but doesn't have this problem
- appending a known-unique value (e.g., node address) to the hash string would guarantee that collisions can never happen
  - not sure this is worth the trouble with how unlikely SHA-256 collisions are, though
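To make the size figures above concrete, here is a minimal sketch that encodes one SHA-256 digest three ways and reports the length and any characters that are not legal in a C identifier. The node path is a hypothetical example, padding (`=`) is stripped before measuring, and none of this is taken from the PR's actual code.

```python
import base64
import hashlib
import string

# Hypothetical node path, used only for illustration.
path = "/soc/uart@40002000"
digest = hashlib.sha256(path.encode()).digest()  # 32 raw bytes

# Characters allowed inside a C identifier (ignoring the leading-digit rule).
C_IDENT_CHARS = set(string.ascii_letters + string.digits + "_")

# Same digest, three textual encodings; '=' padding is dropped.
encodings = {
    "hex": digest.hex(),
    "base32": base64.b32encode(digest).decode().rstrip("="),
    "base64": base64.b64encode(digest).decode().rstrip("="),
}

for name, text in encodings.items():
    illegal = sorted(set(text) - C_IDENT_CHARS)
    print(f"{name:7s} {len(text):2d} chars, illegal in C identifiers: {illegal}")
```

The lengths are fixed by the digest size: 64 characters for hex, 52 for base32 and 43 for base64, which matches the rough 20% and 33% savings mentioned above.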
Thanks for the idea!

> base64 uses `+`, `/` and `=` which are illegal...

True, but we don't really need an invertible transformation, just a hash, so we can cheat a little:
- `=` is only used in padding, so we can drop it.
- Replacing both `+` and `/` with the legal symbol `_` would make clashes a little more probable, but still vanishingly small: ChatGPT said with a convincing formula that it's 1.23 times more probable than a direct SHA256 collision (5.5E-78...).

At the same time, though, the identifier would become "only" 43 chars. Looks like a nice tradeoff. 🎉
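The "cheat" described above could look roughly like the following sketch. The function name `compute_hash_b64`, the node path and the `dt_hash_` prefix are all hypothetical and not part of the PR; the prefix is only there because a base64 token may start with a digit, which a C identifier cannot.

```python
import base64
import hashlib

def compute_hash_b64(path: str) -> str:
    """Return a 43-character token derived from the SHA-256 of `path`."""
    digest = hashlib.sha256(path.encode()).digest()
    token = base64.b64encode(digest).decode()
    # '=' only ever appears as trailing padding, so it can be dropped.
    token = token.rstrip("=")
    # Map both non-identifier symbols to '_'. This is not invertible,
    # which is acceptable because the value is only used as a hash.
    return token.replace("+", "_").replace("/", "_")

# Hypothetical node path and identifier prefix, for illustration only.
token = compute_hash_b64("/soc/uart@40002000")
ident = "dt_hash_" + token
print(len(token), ident)
```

Mapping two symbols onto one shrinks the alphabet from 64 to 63 characters, which is where the slightly higher clash probability comes from.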
include/zephyr/llext/symbol.h

        .name = STRINGIFY(x), .addr = (const void *)&x, \
    /* LLEXT-enabled application: export symbols */
    #define Z_EXPORT_SYMBOL_NAMED(sym_ident, sym_name) \
        static const STRUCT_SECTION_ITERABLE(llext_const_symbol, sym_ident ## _sym) = { \
I see why it's been done this way (`sym_name` is a string), but wouldn't using `sym_ident` to form the `llext_const_symbol`'s name prevent exporting the same symbol twice under different names? (at least if both are in the same translation unit)

I think taking the `sym_name` in token form and `STRINGIFY()`ing it in the macro would be acceptable, if that can help solve this issue.

Same remark for SLID version of the macro.
Why would you have to export the same symbol twice with two names? One should be enough for everybody! 🤭

... seriously, this PR wanted to transparently rename symbols when appropriate, so I wasn't thinking about the use case you mentioned (there's always one `EXPORT_SYMBOL` for each identifier).

It makes sense though, I will have to think about using the `sym_name` - I remember there was a problem that made me use strings and not tokens, but I'm not sure what that was.
Thanks for the feedback @mathieuchopstm. I had so many open PRs these days that I lost track of this one! 🤦‍♂️ I need to finish it and push it on.
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed; otherwise this pull request will automatically be closed in 14 days. Note that you can always re-open a closed pull request at any time.
220641b to 55b07ef
v3, almost ready for public review:
It seems that #83748 is the main work. Can we close this PR and create an issue for the RFC?

In fact it's the opposite - this is the main work and #83748 is only the first commit of this series. I have now submitted the RFC as issue #83800.
55b07ef to 1533e22
Add a new "hash" attribute to all Devicetree EDT nodes. The hash is calculated on the full path of the node; this means that its value remains stable across rebuilds. The hash is checked for uniqueness among nodes in the same EDT. This computed token is then added to `devicetree_generated.h` and made accessible to Zephyr code via a new `DT_NODE_HASH(node_id)` macro.
Signed-off-by: Luca Burelli <[email protected]>

Add a new set of macros that allow customizing the symbol name when exporting symbols. This is useful when the symbol name that extensions need to look up is different from the identifier used in the base image.
Signed-off-by: Luca Burelli <[email protected]>

This new option allows exporting devices using identifiers generated from the hash of the devicetree node path, instead of the device's ordinal number. Identifiers generated this way are stable across rebuilds. Add new test cases to test this new option.
Signed-off-by: Luca Burelli <[email protected]>
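
The first commit message says the hash is derived from the node's full path and checked for uniqueness within one EDT. A minimal sketch of that idea follows; `assign_hashes` and the example paths are hypothetical and this is not the edtlib implementation, only an illustration of why the values stay stable across rebuilds and how a clash could be detected.

```python
import hashlib

def compute_hash(path: str) -> str:
    # Hash of the node's full path; depends on nothing else,
    # so the same path always yields the same value.
    return hashlib.sha256(path.encode()).hexdigest()

def assign_hashes(paths):
    """Map each node path to its hash, raising if two paths collide."""
    seen = {}    # hash -> path that produced it
    result = {}  # path -> hash
    for path in paths:
        h = compute_hash(path)
        if h in seen and seen[h] != path:
            raise ValueError(f"hash collision between {seen[h]!r} and {path!r}")
        seen[h] = path
        result[path] = h
    return result

# Hypothetical node paths, for illustration only.
paths = ["/", "/soc", "/soc/uart@40002000"]
hashes = assign_hashes(paths)

# Recomputing gives identical values, unlike ordinals, which can shift
# when nodes are added to or removed from the tree.
assert hashes == assign_hashes(paths)
```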
1533e22 to 3607387
v4, incorporating the feedback received in #83748:
No further feedback received on the RFC or this PR so marking this as open for full review.
Apologies for my limited familiarity with llext; I will leave this task to others. Thank you for your work on this.
This is the candidate implementation for RFC #83800.