-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using ULT stack to store RPC lineage #300
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #300 +/- ##
==========================================
+ Coverage 57.93% 57.97% +0.04%
==========================================
Files 70 71 +1
Lines 10196 10202 +6
Branches 1341 1339 -2
==========================================
+ Hits 5907 5915 +8
+ Misses 3453 3451 -2
Partials 836 836 ☔ View full report in Codecov by Sentry. |
|
||
static inline int margo_lineage_set(hg_id_t current_rpc_id) { | ||
__MARGO_LINEAGE_COMMON; | ||
memcpy(stackend - 8 - sizeof(hg_id_t), magic, 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor stylistic comment. It looks odd for one sizeof() to use the type and the other sizeof() to use the variable (visually it makes you think that they are different values). Maybe just use the same style all the way through? I usually prefer the latter but the former might make more sense here since erase() doesn't have a variable name to use.
Nifty solution to stash the information in the stack :) I would not have thought of that. I would have a mild preference for fixing the key functions in ABT if it were straightforward (since they are intuitive for this use case), but this is probably the more practical performance solution. I'll run some tests with a profiler and report back. |
Unfortunately, based on an initial hpctoolkit profile this looks like it may actually be more costly than the ABT key based implementation: This is part of the "bottom up" view from hpcviewer. It shows Interestingly I think I'm getting more detail on my current hpctoolkit runs than I was before. I'm going to try a clean build without this PR and see if I can find more information about costs in the original implementation. If it indicates something straightforward in the key path that we could improve then maybe we consider that, otherwise this may just be an overhead that we need to pay for now and move on to some higher-value issues. |
It's the |
FWIW, I am not able to resolve anything further below So maybe false alarm about seeing more detail in hpctoolkit this time. If the get_attr() is explicitly using posix_memalign() rather than a simple malloc that could be the culprit in the previous path; I don't know if it has a specific reason for creating aligned memory. |
Hrm. The default key table size is tunable in Argobots via |
Setting |
@carns This PR is an alternative to disabling
ABT_keys
for lineage tracking. It tries to solve the performance problem by using the end of a ULT's stack to write the current RPC ID. I'm also writing a magic number before it so we can check if the RPC ID has been set (rather than getting garbage), and these values are wiped at the end of the ULT so if the stack is reused, we don't risk having valid information there.While we rarely use lineage information for RPCs themselves (i.e. knowing the parent of an RPC), I use lineage information all the time when it comes to tracking performance of bulk transfers, so I'd like to try fixing the performance problem rather than making lineage tracking optional and off by default. Could you give this PR a try with hpctoolkit?