
[BUG] pause resume PID **** had non-zero exit status when checking multiple-pause/resume #8770

Closed
keqiaozhang opened this issue Jan 23, 2024 · 4 comments
Labels: bug (Something isn't working as expected), LNL (Applies to Lunar Lake platform), MTL (Applies to Meteor Lake platform), P2 (Critical bugs or normal features)


Describe the bug
This issue happens when checking the multiple-pause/resume case: the console log shows "pause resume PID **** had non-zero exit status", but there is no error in the kernel log or mtrace. The reproduction rate is low, about 20%. This issue can be reproduced on both MTL and LNL NOCODEC platforms.

Inner test ID:
LNLM_RVP_NOCODEC: 37146
MTLP_RVP_NOCODEC: 37016

To Reproduce
~/sof-test/test-case/multiple-pause-resume.sh -r 50

Reproduction Rate
20%

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Topology: sof-lnl-nocodec.tplg and sof-mtl-nocodec.tplg
  3. Platform: LNL-NOCODEC and MTL-NOCODEC

Screenshots or console output

=== PAUSE ===                                                            
(50/50) pcm'Port1' cmd'aplay' id'9': Wait for 32 ms before resume

2024-01-21 23:46:26 UTC [REMOTE_INFO] pipeline: Deepbuffer Port0 with aplay
2024-01-21 23:46:26 UTC [REMOTE_INFO] pipeline: Port1 with aplay
2024-01-21 23:46:26 UTC [REMOTE_INFO] Check expect exit status
declare -- cmd="journalctl_cmd --since=@1705880315"
2024-01-21 23:46:26 UTC [REMOTE_ERROR] pause resume PID 65117 had non-zero exit status
2024-01-21 23:46:26 UTC [REMOTE_INFO] Starting func_exit_handler(1)

Attachments: dmesg.txt, mtrace.txt

@keqiaozhang keqiaozhang added bug Something isn't working as expected P2 Critical bugs or normal features MTL Applies to Meteor Lake platform LNL Applies to Lunar Lake platform labels Jan 23, 2024
@kv2019i commented Feb 2, 2024

@serhiy-katsyuba-intel @abonislawski @lyakh @teburd I think I found the reason why the DMA stops moving: the GEN bit is not set for some reason. When the problem occurs, I can see that bit 26 of DGCS is not set (I added debug output that calls the existing intel_adsp_hda_dbg() macro on error):

[16060.576496] <inf> ipc: ipc_cmd: rx   : 0x13020004|0x0
[16060.576601] <inf> copier: copier_prepare: comp:2 0x20004 copier_prepare()
[16060.576613] <inf> pipe: pipeline_trigger: pipe:2 0x0 pipe trigger cmd 7
[16060.576630] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0109a80 0x20210U priority 0 flags 0x0
WARNING: GEN bit not set after enable!
ZEPHYR_BASE/drivers/dma/dma_intel_adsp_hda.c:232 hda not move(1:0x0x72840), dgcs: 0x80810000, dgbba 0x109c80, dgbs 768, dgbrp 0, dgbwp 0, dgbsp 0, dgmbs 768, dgbllpi 0x0, dglpibi 0x0
[16060.578693] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x20004 no bytes to copy, available samples: 0, free_samples: 192

So "dgcs: 0x80810000" is wrong and explains why data is not moving and we get flood of "no bytes to copy" messages. Without GEN set, the DMA cannot write to DSP memory.

It seems that with multiple cores using the host DMA engine, at least some writes to DGCS are not immediately reflected in the read-back values. Based on this, I made a few fix attempts, pushed at https://github.com/kv2019i/zephyr/commits/202402-hdadma-gen-bit-fix-attempts, but these don't completely solve the problem.

Even after adding a 10 ms busy loop to poll for the GEN bit, it is still read back as 0, so something is blocking it from being set.
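
A minimal sketch of the kind of poll described above, assuming a memory-mapped DGCS at a hypothetical address dgcs_addr and Zephyr's sys_read32()/sys_write32() and k_busy_wait(); this illustrates the debugging attempt, not the actual fix:

    #include <stdbool.h>
    #include <zephyr/kernel.h>
    #include <zephyr/sys/sys_io.h>

    #define DGCS_GEN (1u << 26) /* gateway enable bit, per the discussion above */

    /* Set GEN, then poll up to ~10 ms for it to read back as set.
     * dgcs_addr is a hypothetical MMIO address of the DGCS register. */
    static bool hda_enable_and_wait_gen(mem_addr_t dgcs_addr)
    {
        sys_write32(sys_read32(dgcs_addr) | DGCS_GEN, dgcs_addr);

        for (int i = 0; i < 1000; i++) {      /* 1000 * 10 us = 10 ms */
            if (sys_read32(dgcs_addr) & DGCS_GEN) {
                return true;                  /* enable took effect */
            }
            k_busy_wait(10);                  /* busy-wait 10 us */
        }
        return false; /* in the failing case GEN never reads back as set */
    }

In the failing case such a helper would return false even after the full 10 ms, which is what points to something outside the driver blocking the enable.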

@kv2019i commented Mar 6, 2024

This will be fixed by zephyrproject-rtos/zephyr#69480. Assigning to @serhiy-katsyuba-intel for follow-up as you are the fix author. Let's not close this SOF issue until the fix is integrated into SOF main.

@kv2019i commented Apr 2, 2024

Closing as fix merged. Please reopen if seen again.
