kernel bug #2

Open
mingkuang opened this issue Apr 27, 2015 · 2 comments


@mingkuang

We use a tool like Buildroot to build a Linux system (kernel 3.12.20, x86_64 GNU/Linux), and we sometimes get the following:

[ 7861.249280] ------------[ cut here ]------------
[ 7861.253919] kernel BUG at /drivers/tw6869/TW68-core.c:493!
[ 7861.264003] invalid opcode: 0000 [#1] SMP
[ 7861.268157] Modules linked in: tw68(o) r8168(o)
[ 7861.272773] CPU: 0 PID: 6534 Comm: IfSystemStatus Tainted: GO 3.12.20 #2
[ 7861.280429] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M, BIOS L0.19 02/03/2015
[ 7861.289561] task: ffff8801e465bd80 ti: ffff88007b3d0000 task.ti: ffff88007b3d0000
[ 7861.297046] RIP: 0010:[] [] TW68_buffer_next+0x84/0x90 [tw68]
[ 7861.306038] RSP: 0018:ffff88023fc03e40 EFLAGS: 00010286
[ 7861.311354] RAX: 0000000000000000 RBX: ffff8802343f4c08 RCX: 0000000000000001
[ 7861.318491] RDX: 0000000000000001 RSI: ffff8802343f4c08 RDI: ffff8802343f4000
[ 7861.325630] RBP: ffff88023fc03e48 R08: 0000000000000000 R09: 0000000000000003
[ 7861.332767] R10: 0000000000000130 R11: 0000000000000004 R12: ffff8802343f4c08
[ 7861.339904] R13: 00000000000000ff R14: ffff8802343f4000 R15: 0000000000000001
[ 7861.347041] FS: 00007f8bfeb0c700(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
[ 7861.355133] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7861.360883] CR2: 000000001d84ff98 CR3: 000000006e162000 CR4: 00000000001007f0
[ 7861.368019] Stack:
[ 7861.370039] ffff8802343f4000 ffff88023fc03e68 ffffffffa0073a71 0000000000000001
[ 7861.377543] 0000000000000001 ffff88023fc03ea8 ffffffffa006e073 0000009b7b3d1fd8
[ 7861.385043] ffff8802343f55b8 0000000000000000 0000000000000013 ffffffff820050b0
[ 7861.392544] Call Trace:
[ 7861.395000] <IRQ>
[ 7861.396935] [] TW68_irq_video_done+0xb1/0x160 [tw68]
[ 7861.403789] [] video_tasklet+0x73/0x110 [tw68]
[ 7861.409887] [] tasklet_action+0x5e/0x100
[ 7861.415468] [] __do_softirq+0xf0/0x240
[ 7861.420872] [] call_softirq+0x1c/0x30
[ 7861.426189] [] do_softirq+0x35/0x70
[ 7861.431331] [] irq_exit+0x95/0xa0
[ 7861.436300] [] do_IRQ+0x51/0xc0
[ 7861.441097] [] common_interrupt+0x6a/0x6a
[ 7861.446756] <EOI>
[ 7861.448692] [] ? finish_task_switch+0x48/0xa0
[ 7861.454928] [] __schedule+0x35f/0x7a0
[ 7861.460245] [] schedule+0x24/0x70
[ 7861.465213] [] do_nanosleep+0xbd/0x120
[ 7861.470616] [] hrtimer_nanosleep+0x99/0x150
[ 7861.476452] [] ? hrtimer_get_res+0x40/0x40
[ 7861.482203] [] ? do_nanosleep+0x69/0x120
[ 7861.487780] [] SyS_nanosleep+0x5e/0x80
[ 7861.493184] [] system_call_fastpath+0x1a/0x1f
[ 7861.499192] Code: e8 42 59 fe e0 48 8b 15 eb b4 0b e2 48 8d 7b 30 48 8d 34 10 e8 4e f1 fe e0 5b 5d c3 0f 1f 00 48 8d 7e 30 e8 1f e8 fe e0 5b 5d c3 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d 87 80 09 00 00 48
[ 7861.519728] RIP [] TW68_buffer_next+0x84/0x90 [tw68]
[ 7861.526371] RSP
[ 7861.535111] ---[ end trace 63f56627bf9e64ae ]---
[ 7861.539789] Kernel panic - not syncing: Fatal exception in interrupt
[ 7861.546202] drm_kms_helper: panic occurred, switching back to text console
[ 0.906246] [drm:i915_stolen_to_physical] ERROR conflict detected with stolen region: [0x9f000000 - 0xbf000000]
/init start @ Thu Apr 16 05:52:28 UTC 2015

We checked line 493 of TW68-core.c, shown below, but we still do not understand what is happening. Any ideas? It does not happen every time.

void TW68_buffer_next(struct TW68_dev *dev, struct TW68_dmaqueue *q)
{
    ......
    BUG_ON(NULL != q->curr);    /* line 493 */

@igorizyumin
Owner

Thanks for reporting this. Please try the latest commit. The TW68_buffer_next function contains a BUG_ON that fires when the q->curr pointer is not NULL. However, the TW68_irq_video_done function called it outside the block that checks whether this value is NULL. I have no idea why; it is clearly a bug and will crash every time that code path is taken. I moved the call inside the if block, so please try the latest commit and see whether it works now.
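To make the change described above concrete, here is a minimal user-space sketch of the pattern. It is not the driver's actual code: the struct layouts, the shortened function names, and the `frame_complete` flag are hypothetical stand-ins, and the exact condition the driver checks is not shown in this thread. `assert` plays the role of the kernel's `BUG_ON`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's buffer and queue types. */
struct buf {
    struct buf *next;
    int done;
};

struct dmaqueue {
    struct buf *curr;    /* buffer currently owned by DMA, or NULL */
    struct buf *queued;  /* buffers waiting to be started */
};

/* Same invariant as BUG_ON(NULL != q->curr): the previous buffer
 * must have been retired before the next one is started. */
static void buffer_next(struct dmaqueue *q)
{
    assert(q->curr == NULL);
    if (q->queued != NULL) {
        q->curr = q->queued;
        q->queued = q->curr->next;
        q->curr->next = NULL;
    }
}

/* Completion handler. frame_complete is an assumed condition. */
static void irq_video_done(struct dmaqueue *q, int frame_complete)
{
    if (q->curr != NULL && frame_complete) {
        q->curr->done = 1;
        q->curr = NULL;   /* retire the current buffer */
        buffer_next(q);   /* safe here: curr was just cleared */
    }
    /* Calling buffer_next(q) at this point instead would trip the
     * assertion whenever the block above was skipped while curr
     * was still set. */
}
```

In this sketch, an interrupt that arrives before the frame is complete leaves `q->curr` untouched and returns without reaching the assertion. With the call placed after the `if` block, the same interrupt would abort, which would match a crash that only shows up occasionally.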

Interestingly enough, I've been using this driver 24/7 on my system for over a year now with zero crashes. Must be a hardware configuration thing (perhaps the board shares an IRQ with something else?).

@mingkuang
Author

Thank you for the reply.

> Interestingly enough, I've been using this driver 24/7 on my system for over a year now with zero crashes. Must be a hardware configuration thing (perhaps the board shares an IRQ with something else?).

I'm not sure it's hardware-related. I rebooted 10000 times and saw this message fewer than 50 times. I'll test more. Thank you very much.
