-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lockup: Print tasks which have been on-cpu for too long #109
base: main
Are you sure you want to change the base?
Conversation
richl9
commented
Sep 30, 2024
- add helper for scanning potential lockups on cpus and dumping their info
- add prio in runq dump
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks interesting. I think my biggest question is whether the number of seconds on CPU is actually a reliable indicator of some sort of lockup, and whether 0 is a good default for that parameter.
drgn_tools/lockup.py
Outdated
""" | ||
Corelens Module for lockup | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text appears in the corelens -L
output. It should be only one line, and useful to tell users what it does. EG:
""" | |
Corelens Module for lockup | |
""" | |
"""Print tasks which have been on-cpu for too long""" |
drgn_tools/lockup.py
Outdated
"--time", | ||
"-t", | ||
type=float, | ||
default=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default of 0 seems unhelpful, since that prints any running task, right? Can we specify a more useful default?
Does --time
specify the amount of time that a task has been on-CPU without the opportunity to schedule? Or is it just for the current time slice? If there's a low load, it would be totally reasonable for a task to be on-CPU for a long time, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Forgot to change it back after testing. Was thinking about default = 2. What do you think? And --time is for the current time slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable, we can always adjust it in the future.
"-t", | ||
type=float, | ||
default=0, | ||
help="list all the processes that have been running more than <time> seconds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify this: is it the total amount of seconds the task has been running? OR just since the last schedule? Or just since the last cond_resched()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got them time from task_lastrun2now(), so it's the duration from task last run timestamp to now.
drgn_tools/runq.py
Outdated
f' COMMAND: "{comm}"', | ||
f" RUNTIME: {timestamp_str(run_time)}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this string after a comma makes it a second argument to print()
.
The rest of these strings are specified without commas, which instead performs string concatenation. Could you please get rid of the comma between these two lines?
drgn_tools/runq.py
Outdated
""" | ||
Print tasks which are in the RT and CFS runqueues on each CPU | ||
Print tasks which are in the RT and CFS runqueues on each CPU. Specify min_run_time_seconds to x to only print | ||
processes running more than x seconds. Set skip_swapper = True to not include swapper process in the ouput. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no skip_swapper
arg here. Either add it or delete the sentence.
Orabug: 37187006 Signed-off-by: Richard Li <[email protected]>
f49356e
to
ffac9ff
Compare