Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kpb: Fix draining task and LL conflict #8936

Merged
merged 1 commit into from
Mar 22, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 28 additions & 1 deletion src/audio/kpb.c
Original file line number Diff line number Diff line change
Expand Up @@ -1742,13 +1742,24 @@ static enum task_state kpb_draining_task(void *arg)
uint64_t current_time;
size_t period_bytes = 0;
size_t period_bytes_limit = draining_data->pb_limit;
size_t period_copy_start = sof_cycle_get_64();
size_t period_copy_start;
size_t time_taken;
size_t *rt_stream_update = &draining_data->buffered_while_draining;
struct comp_data *kpb = comp_get_drvdata(draining_data->dev);
bool sync_mode_on = draining_data->sync_mode_on;
bool pm_is_active;

/*
* WORKAROUND: The code below accesses KPB sink buffer and calls comp_copy() on
* component connected to sink. EDF task thread has preemptible low priority and
* so can be preempted by LL thread. This could result in broken state of sink buffer
* or component connected to sink.
* Hence k_sched_lock() is used temporary to block LL from preempting EDF task thread.
*/
#ifdef __ZEPHYR__
k_sched_lock();
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this break the concept of LL? The draining part could take quite a while for a large buffer, and even with this cycle of lock/unlock the entire premise of LL executing within 1ms is broken.

That's a priority inversion in my book where a low-priority task prevents a high-priority/low-latency one from being executed...

Why can't we fix the broken state of the sink instead ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On each loop iteration draining task basically just calls comp_copy() on component connected to KPB sink. So each loop iteration should take much less then 1 ms.
If timer interrupt increased semaphore on which LL thread waits, immediately at k_yield() scheduler will switch to executing LL thread since LL thread has higher priority then EDF task thread. Or am I overlooking something?

Looks like original draining task code was written in pre-Zephyr era. And probably the code works or worked without Zephyr just fine. However, the code seems were never properly ported to Zephyr. Original draining task implementation has multiple problems:

  • there is no sleep/yield in loop iteration and so task blocks IPC thread for a long time (several seconds in our test) and waists power (CPU cannot reach idle between loop iterations).
  • LL can preempt draining task and call comp_copy() for same component or access/modify same buffer/stream.

This fix is an attempt to fix CI failure, not to do proper reimplementation of draining task. Without the fix we have random failures on KPB test. We have to remove test temporary just not to block all the work on SOF.

The test fails with draining task runs forever because KPB sink audio_stream has broken state: avail == free == 0. Such state normally cannot happen as those values are always calculated as avail = size - free and free = size - avail. avail and free simultaneously equal to 0 can only happen when LL preempts draining task in unlucky moment.

The fix passes the test. It has no intent to re-implement draining task properly: I'm not sure if KPB is used in any production case other then demo. The intent is to fix CI and return back KPB test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like you said no one is using KPB in any firmware based on Zephyr (post 2.2).

I am not getting the idea of 'fixing CI' which consists in re-testing a broken solution based on a workaround impacting other usages.

Is KPB is so badly broken, it needs to be disabled and redone, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's value in keeping kpb tests. This exercizes a certain type of flow (running an EDF task) in CI that we don't have many others in CI and if we remove the test, this code will bitrot very quickly. The workaround here is local to kpb, so I see no harm outside kpb.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i "The workaround here is local to kpb" -> not really, it would impact any other compound use case with e.g. speaker protection running.

If KPB is a test component, them let's mark it as non-default and opt-in for CI purposes. This shouldn't go in any releases IMHO.

comp_cl_info(&comp_kpb, "kpb_draining_task(), start.");

pm_is_active = pm_runtime_is_active(PM_RUNTIME_DSP, PLATFORM_PRIMARY_CORE_ID);
Expand All @@ -1760,8 +1771,19 @@ static enum task_state kpb_draining_task(void *arg)
kpb_change_state(kpb, KPB_STATE_DRAINING);

draining_time_start = sof_cycle_get_64();
period_copy_start = draining_time_start;

while (drain_req > 0) {
/*
* Draining task usually runs for quite a lot of time (could be few seconds).
* LL should not be blocked for such a long time.
*/
#ifdef __ZEPHYR__
k_sched_unlock();
k_yield();
k_sched_lock();
#endif

/* Have we received reset request? */
if (kpb->state == KPB_STATE_RESETTING) {
kpb_change_state(kpb, KPB_STATE_RESET_FINISHING);
Expand Down Expand Up @@ -1868,6 +1890,11 @@ static enum task_state kpb_draining_task(void *arg)
comp_cl_info(&comp_kpb, "KPB: kpb_draining_task(), done. %u drained in > %u ms",
drained, UINT_MAX);

/* Restore original EDF thread priority */
#ifdef __ZEPHYR__
k_sched_unlock();
#endif

return SOF_TASK_STATE_COMPLETED;
}

Expand Down
Loading