Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

X86-64 kernel backtrace #4

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

deepfire
Copy link
Contributor

This optionally provides a backtrace mechanism on x86-64:

  • governed by Config.stack_trace
  • looks like this:
FATAL: stack smashed.
    call trace (rbp: fffffe0003a45c80, stack: fffffe0003a45dc0-fffffe0003a41dc0, text: fffffe00030c3000-fffffe00031f43da):
fffffe00031ddaf7 __stack_chk_fail + 0x17
fffffe00031ce5e4 dump_idt + 0x294
fffffe00031ced1e gdb_arch_read_byte + 0x15e
         31cf841 arch_init + 0x5b1
kernel 0 PANIC! finally reached __stack_chk_fail()

@barrelfish-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@@ -367,7 +371,7 @@ config_arrakismon = False
defines :: [RuleToken]
defines = [ Str ("-D" ++ d) | d <- [
if microbenchmarks then "CONFIG_MICROBENCHMARKS" else "",
if trace then "CONFIG_TRACE" else "",
if stack_trace then "CONFIG_KERNEL_STACK_TRACE" else "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't delete trace/CONFIG_TRACE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antiguru, I have welded in a patch that rectifies this travesty, sorry!

void debug_vaddr_identify(lvaddr_t pml4, lvaddr_t vaddr);

void debug_sort_dynsyms (struct Elf64_Sym *dynsyms, int n);
void debug_setup_stackwalker (uint64_t stack_top, uint64_t stack_bottom, uint64_t text_start, uint64_t text_end, struct Elf64_Sym *dynsyms, char *dynstr, int nsyms);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use more specific types instead of uint64_t, for example lpaddr_t for local physical addresses or lvaddr_t for virtual addresses. We might not do it correctly all the time but at least it gives a hint on how that address could be accessed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this in additional patches..

@@ -62,6 +62,20 @@

#ifndef __ASSEMBLER__

static inline uint64_t __attribute__((always_inline)) rdrip(void)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should have a return type of lvaddr_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this in additional patches..

@@ -0,0 +1,266 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative of copying newlib's qsort.c here would be to use the one in /lib/newlib/newlib/libc/search/qsort.c directly. I'm undecided what makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, do tell if you come to a decision!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought there was no precedent to this kind of external inclusion of out-of-kernel-directory sources, but kernel/Hakefile references ../usr/drivers/cpuboot/init_ap_x86_64.S.

Not sure if this constitutes a significant point..

Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending us the pull request. It seems like a valuable addition to the kernel. I added some comments in-line where I see potential room for improvement.
Could the code be unified for all architectures, especially ARMv8?

@simu
Copy link
Member

simu commented Jan 11, 2017

@barrelfish-bot ok to test

deepfire and others added 15 commits January 12, 2017 17:15
..doesn't work due to:  NixOS/nixpkgs#21374

Signed-off-by: Kosyrev Serge <[email protected]>
Signed-off-by: Kosyrev Serge <[email protected]>
…bols missing from .dynsyms

Signed-off-by: Kosyrev Serge <[email protected]>
@deepfire
Copy link
Contributor Author

An extra note ought to be made -- perhaps even documented.

The backtrace mechanism currently depends on the .dynsyms section, which only carries globally-visible symbols, which naturally excludes functions declared as static.
This is somewhat painful, and so had to be worked around by making the stack analysis more precise -- that is, it now considers inter-symbol gaps, and reports when a stack return address points to such.

A more advanced solution would be to promote a more complete symbol section, in a format amenable to runtime parsing, to be marked as ALLOC and LOAD.

Since, apparently (I'd be happy to be wrong!), ld is unable to mark .debug_* sections as ALLOC and LOAD, an objcopy pass is required -- but that is more invasive than I could venture for without reaching for consensus on whether the thing is needed at all.

So, please, tell me what all of this makes you think..

@deepfire
Copy link
Contributor Author

On unification. The code can be generalised, indeed. It would take one more pass, though..

@deepfire
Copy link
Contributor Author

Do I get the interpretation of the test suite right -- namely that the boot on physical hardware has failed?
I see the following:

relinking /home/netos/tftpboot/menu.lst.10.110.4.15 to harness/nos5_harness/menu.lst
DEPRECATED reboot
executing /home/netos/tools/bin/rackpower -r nos5
Host key verification failed.
Warning: rackpower -r nos5 failed
DEPRECATED get_output
[0:00:00] [Enter `^Ec?' for help]
[0:00:00] [no, [email protected] is attached]
[0:06:00] [Error: boot timed out, retrying...]
DEPRECATED reboot
executing /home/netos/tools/bin/rackpower -r nos5
Host key verification failed.
Warning: rackpower -r nos5 failed
[0:12:00] [Error: boot timed out, retrying...]
DEPRECATED reboot
executing /home/netos/tools/bin/rackpower -r nos5
Host key verification failed.
Warning: rackpower -r nos5 failed
[0:18:00] [Error: boot timed out, retry limit reached]
timeout encountered in collect_data
harness: cleanup test
DEPRECATED get_tftp_dir
DEPRECATED shutdown
executing /home/netos/tools/bin/rackpower -d nos5
Host key verification failed.
Warning: rackpower -d nos5 failed
DEPRECATED unlock
quitting console process (12659)
removing /home/netos/tftpboot/harness/nos5_harness
Error: Timeout while running test (attempting to continue)
magic string "root (nd)" or "Kernel starting at address" not found, assuming no garbage in output

0/1 tests passed
Failed tests:
 * perfmontest

@simu
Copy link
Member

simu commented Jan 13, 2017

@deepfire Looks like the machine didn't come up, this does not say anything about the code under test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants