Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: ipc4/Intel: Improve pcm delay reporting #4851

Conversation

ujfalusi
Copy link
Collaborator

@ujfalusi ujfalusi commented Mar 7, 2024

Hi,
this PR is a new implementation of the closed #4791, providing (I hope) more robust support.

main highlights:

  • fix for pause/resume and xrun handling: re-read the start offset from firmware
  • callback for reading host accessible LLP has been renamed and
  • new host DMA position query callback added
  • the delay calculation itself is fixed by using the up-to-date host position, wrapping the LLP with the host boundary and to special case the time while the first frame is still within the DSP processing pipeline

With these patches and an improved audio conformance test (to take the delay into account) we are passing the test quite nicely on hw:0,0, hw:0,31 and DSPless/legacy mode (the later passed before, but indicates that the improved conformance test is not regressing):

command line for the test runs:
./alsa_conformance_test/alsa_conformance_test -P hw:0,X --iterations=20 -B4800 -p4800-d2

hw:0,0

----------RUN RESULT----------
number of recorders: 20
number of points: 1214559
step average: 1.496892
step min: 1
step max: 7
rate average: 47999.996679
rate min: 47999.991852
rate max: 48000.001621
rate error average: 0.290328
rate error min: 0.288493
rate error max: 0.291660
number of underrun: 0
number of overrun: 0

hw:0,31 (100ms deep buffer)

----------RUN RESULT----------
number of recorders: 20
number of points: 582554
step average: 3.001928
step min: 1
step max: 9
rate average: 47999.988525
rate min: 47999.967958
rate max: 48000.006057
rate error average: 0.101782
rate error min: 0.083003
rate error max: 0.115703
number of underrun: 0
number of overrun: 0

Note on the steps for the DB case: it is max 9 despite the fact that the DMA jumps 4800 first then 4608 frames at each burst. This is because the test now takes the delay into account which 'corrects' this jumping.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First 9 patches ok, one question on patch 10, please see inline.

sound/soc/sof/ipc4-pcm.c Outdated Show resolved Hide resolved
sound/soc/sof/ipc4-pcm.c Outdated Show resolved Hide resolved
@ujfalusi ujfalusi force-pushed the peter/sof/pr/ipc4_delay_reporting_batch_01 branch from 63a5bad to 3425a7b Compare March 7, 2024 11:44
@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Mar 7, 2024

Changes since v1:

  • use the correct boundary (time_info->boundary) when the pointers wrap compared to each other.
  • fix misspelled word in comment

sound/soc/sof/sof-priv.h Outdated Show resolved Hide resolved
sound/soc/sof/ipc4-pcm.c Show resolved Hide resolved
sound/soc/sof/ipc4-pcm.c Show resolved Hide resolved
kv2019i
kv2019i previously approved these changes Mar 7, 2024
@ujfalusi ujfalusi force-pushed the peter/sof/pr/ipc4_delay_reporting_batch_01 branch from 3425a7b to 53d764f Compare March 8, 2024 14:11
@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Mar 8, 2024

Changes since v4:

  • the leftover 'break;' is removed
  • marked the get_host_byte_counter callback as optional
  • reworded the last commit message

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other sets of comments.

sound/soc/sof/ipc4-pcm.c Outdated Show resolved Hide resolved
sound/soc/sof/ipc4-pcm.c Outdated Show resolved Hide resolved
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more

sound/soc/sof/ipc4-pcm.c Outdated Show resolved Hide resolved
@ujfalusi
Copy link
Collaborator Author

Changes since v5:

  • Introduce sof_ipc_pcm_ops.pointer callback (@plbossart FYI)
  • Move the delay calculation to the new pointer callback and use the host byte counter to calculate the pointer value
  • In the delay callback only return the pre-calculated delay value
  • Reword the comments and rename the variables to indicate that they are counters

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this new version @ujfalusi !

I added a couple of comments but this is mostly cosmetic.

sound/soc/sof/ipc4-pcm.c Show resolved Hide resolved
sound/soc/sof/ipc4-pcm.c Show resolved Hide resolved
sound/soc/sof/ipc4-pcm.c Outdated Show resolved Hide resolved
sound/soc/sof/ipc4-pcm.c Show resolved Hide resolved
sound/soc/sof/ipc4-pcm.c Show resolved Hide resolved
@ujfalusi ujfalusi force-pushed the peter/sof/pr/ipc4_delay_reporting_batch_01 branch from 4e39e9d to c1d7dc9 Compare March 14, 2024 14:18
@ujfalusi
Copy link
Collaborator Author

Changes since v6:

  • commit message and comments added to clarify some details
  • the delay in time_info is initialized to 0, so no magic needed in delay() callback - also documented in comments

@ujfalusi
Copy link
Collaborator Author

@plbossart, @kv2019i, @ranj063, we only have one major remaining issue with the delay: pause/resume or xrun or stop/start w/o stream close on HDA link (when the host register is used for the LLP):
The LLP is not reset to 0 while the host counter is. I could not find a way to reset the LLP back to 0 (not even with playing with RSM bit), so I ended up with these commit:
ujfalusi@8d0b8ba
ujfalusi@b2decbf

with these + plus the constraint things look pretty good on SDW and HDA links and with various media players.

@plbossart
Copy link
Member

@plbossart, @kv2019i, @ranj063, we only have one major remaining issue with the delay: pause/resume or xrun or stop/start w/o stream close on HDA link (when the host register is used for the LLP): The LLP is not reset to 0 while the host counter is.

for playback doesn't the start_offset take care of the issue? In theory the LLP can keep going and the firmware would compensate, no?

@ujfalusi
Copy link
Collaborator Author

@plbossart, @kv2019i, @ranj063, we only have one major remaining issue with the delay: pause/resume or xrun or stop/start w/o stream close on HDA link (when the host register is used for the LLP): The LLP is not reset to 0 while the host counter is.

for playback doesn't the start_offset take care of the issue? In theory the LLP can keep going and the firmware would compensate, no?

Well, in theory yes, the code is there.
In reality the link DMA is always stopped in firmware and the start_offset is always calculated assuming that the link counter is starting from 0 (there is never a COMP_TRIGGER_RELEASE, only COMP_TRIGGER_START). The non HDA LLP is indeed reset to 0 but for some reason the HDA LLP (host accessible) is not. Firmware has no access to this LLP so it needs to be compensated on host side.

@ujfalusi ujfalusi force-pushed the peter/sof/pr/ipc4_delay_reporting_batch_01 branch from c1d7dc9 to 8ec25fa Compare March 15, 2024 07:57
@ujfalusi
Copy link
Collaborator Author

Changes since v7:

@ujfalusi ujfalusi linked an issue Mar 15, 2024 that may be closed by this pull request
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this on MTL system with SDW codecs and works now great with seeks (tested multiple applications), both regular PCMs and deepbuffer configuration. Pauses continue be a problem, easily triggered with mplayer/mpv.

@ujfalusi
Copy link
Collaborator Author

Tested this on MTL system with SDW codecs and works now great with seeks (tested multiple applications), both regular PCMs and deepbuffer configuration. Pauses continue be a problem, easily triggered with mplayer/mpv.

Yes, it can be also tested with aplay -i, basically the firmware starts to report drifting and incorrect stream_start_offset and stream_end_offset values and that blows up the kernel reporting.
So far I was not able to find any way to make sense of the situation, It cannot be fixed in kernel.

A) SNDRV_PCM_INFO_PAUSE
B) delay reporting

We cannot have A & B at the moment, it is either A or B.
PAUSE causes t he delay to be broken if used.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, only a comment on the last patch @ujfalusi

snd_pcm_hw_constraint_minmax(substream->runtime,
SNDRV_PCM_HW_PARAM_BUFFER_TIME,
spcm->stream[substream->stream].dsp_buffer_time_ms * USEC_PER_MSEC * 2,
UINT_MAX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default

sps->dsp_buffer_time_ms = SOF_IPC4_MIN_DMA_BUFFER_SIZE;

so this forces the ALSA buffer to be a minimum of 4ms.

Also this is a bit ironic that the race happens for large buffers, IIRC the issue was an xrun with small periods with mplayer? If this is only for large buffers, could we enforce this constraint when

spcm->stream[substream->stream].dsp_buffer_time_ms > SOF_IPC4_MIN_DMA_BUFFER_SIZE;

Bear with me, reviewing on Friday afternoon...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xrun happens with large buffers because application rarely if ever took small (< 4ms) buffers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in practice could we not make this constraint conditional, e.g.

if ( spcm->stream[substream->stream].dsp_buffer_time_ms > SOF_IPC4_MIN_DMA_BUFFER_SIZE) {
    snd_pcm_hw_constraint_minmax(substream->runtime,
			SNDRV_PCM_HW_PARAM_BUFFER_TIME,
			spcm->stream[substream->stream].dsp_buffer_time_ms * USEC_PER_MSEC * 2,
			UINT_MAX);
}

Copy link
Collaborator Author

@ujfalusi ujfalusi Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in practice could we not make this constraint conditional, e.g.

if ( spcm->stream[substream->stream].dsp_buffer_time_ms > SOF_IPC4_MIN_DMA_BUFFER_SIZE) {
    snd_pcm_hw_constraint_minmax(substream->runtime,
			SNDRV_PCM_HW_PARAM_BUFFER_TIME,
			spcm->stream[substream->stream].dsp_buffer_time_ms * USEC_PER_MSEC * 2,
			UINT_MAX);
}

No, the ALSA buffer must be constrained at all times, if the PCM uses the default 2ms buffer then the first DMA burst will copy 2ms data, the buffer should be double of that.

While on capture the ALSA buffer size should be at least 2ms, the DMA copies 1ms chunks from DSP to host, this I left out, but I can add.

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 16, 2024

SOFCI TEST

@ujfalusi ujfalusi force-pushed the peter/sof/pr/ipc4_delay_reporting_batch_01 branch from 8ec25fa to a0fb868 Compare March 19, 2024 08:46
ujfalusi added 16 commits March 19, 2024 10:46
The dsp_max_burst_size_in_ms can be used to save the length of the maximum
burst size in ms the host DMA will use.

Platform code can place constraint using this to avoid user space
requesting too small ALSA buffer which will result xruns.

Signed-off-by: Peter Ujfalusi <[email protected]>
When setting up the pcm widget, save the DSP buffer size (in ms) for
platform code to place a constraint on playback.

On playback the DMA will fill the buffer on start and if the period
size is smaller it will immediately overrun.

On capture the DMA will move data in 1ms bursts.

Signed-off-by: Peter Ujfalusi <[email protected]>
…traint

If the PCM have the dsp_max_burst_size_in_ms set then place a constraint
to limit the minimum buffer time to avoid xruns caused by DMA bursts
spinning on the ALSA buffer.

Signed-off-by: Peter Ujfalusi <[email protected]>
…ink Position)

Rename the function to hda_dsp_get_stream_llp and add a brief function
description.

Signed-off-by: Peter Ujfalusi <[email protected]>
…orting

For delay calculation we need two information:
Number of bytes transferred between the DSP and host memory (ALSA buffer)
Number of frames transferred between the DSP and external device
(link/codec/DMIC/etc).

The reason for the different units (bytes vs frames) on host and dai side
is that the format on the dai side is decided by the firmware and might
not be the same as on the host side, thus the expectation is that the
counter reflects the number of frames.
The kernel know the host side format and in there we have access to the
DMA position which is in bytes.

In a simplified way, the DSP caused delay is the difference between the
two counters.

The existing get_stream_position callback is defined to retrieve the frame
counter on the DAI side but it's name is too generic to be intuitive and
makes it hard to define a callback for the host side.

This patch introduces a new set of callbacks to replace the
get_stream_position and define the host side equivalent:
get_dai_frame_counter
get_host_byte_counter

Subsequent patches will remove the old callback.

Signed-off-by: Peter Ujfalusi <[email protected]>
Add implementation for reading the LDP (Linear DMA Position) to be used as
get_host_byte_counter().
The LDP is counting the number of bytes moved between the DSP and host
memory.

Set the get_dai_frame_counter to hda_dsp_get_stream_llp, which is counting
the frames on the link side of the DSP.

Signed-off-by: Peter Ujfalusi <[email protected]>
…pcm_delay

Switch to the new callback to retrieve the DAI (link) frame counter.

Signed-off-by: Peter Ujfalusi <[email protected]>
…callback

The get_stream_position has been replaced by get_dai_frame_counter, it
should not be set to allow it to be dropped from core code.

Signed-off-by: Peter Ujfalusi <[email protected]>
The get_stream_position has been replaced by get_dai_frame_counter and all
related code can be dropped form the core.

Signed-off-by: Peter Ujfalusi <[email protected]>
…ocally

The sof_ipc4_timestamp_info is only used by ipc4-pcm.c internally, it
should not be in a generic header implying that it might be used elsewhere.

Signed-off-by: Peter Ujfalusi <[email protected]>
…igger

The SNDRV_PCM_TRIGGER_PAUSE_PUSH does not need to be a separate case, it
can be handled along with STOP and SUSPEND

Signed-off-by: Peter Ujfalusi <[email protected]>
When the final state is SOF_IPC4_PIPE_PAUSED, it is possible that the
stream will be restarted (resume or start) in which case we need to update
the offset from the firmware.

Signed-off-by: Peter Ujfalusi <[email protected]>
The IPC specific pointer callback can be used when additional or custom
handling is needed during the pointer calculation, like executing a delay
calculation at the same time to minimize drift between the reported pointer
and the calculated delay.

Signed-off-by: Peter Ujfalusi <[email protected]>
This patch improves the delay calculation by relying on the
LLP (Linear Link Position) on the DAI side and the
LDP (Linear Data Pointer) on the host side. The LDP provides the same DMA
position as LPIB, but with a linear count instead of a position in the
ALSA ring buffer. The LDP values are provided in bytes and must be
converted to frames. The difference in units means that the host counter
will wrap earlier than the LLP. We need to wrap the LLP at the same
boundary as the host counter.

The ASoC framework relies on separate pointer and delay callback.
Measurement errors can be reduced by processing all the counter values in
the pointer callback. The delay value is stored, and will be reported to
higher levels in the delay callback.

For playback, the firmware provides a stream_start offset to handle
mixing/pause usages, where the DAI might have started earlier than the
PCM device. The delay calculation must be special-cased when the link
counter has not reached the start offset value, i.e. no valid audio has
left the DSP.

Signed-off-by: Peter Ujfalusi <[email protected]>
The pplcllpl/u can be used to save the Link Connection Linear Link
Position register value to be used for compensation of the LLP register
value in case the counter is not reset (after pause/resume or
stop/start without closing the stream).

The LLP can be used along with PPHCLDP to calculate delay caused by the DSP
processing for HDA links.

Signed-off-by: Peter Ujfalusi <[email protected]>
During pause/reset or stop/start the LLP counter is not reset, which will
result broken delay reporting.

Read the LLP value on STOP/PAUSE trigger and use it in LLP reading to
normalize the LLP from the register.

Signed-off-by: Peter Ujfalusi <[email protected]>
@ujfalusi
Copy link
Collaborator Author

Changes since v8:

  • Moved the constraint patches earlier
  • renamed the dsp_buffer_time_ms to dsp_max_burst_size_in_ms and set it to 1ms for capture

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ujfalusi , looks good now.

@plbossart
Copy link
Member

Feel free to merge @ujfalusi, I can't recall if we need to disable pause as well or if your last patch corrects the problem.

@ujfalusi
Copy link
Collaborator Author

Feel free to merge @ujfalusi, I can't recall if we need to disable pause as well or if your last patch corrects the problem.

I would try to fix the firmware or at least make it less broken. It would be a major pain to deal with the disable and then re-enabling of the pause.

@ujfalusi ujfalusi merged commit f80b8f3 into thesofproject:topic/sof-dev Mar 20, 2024
10 of 14 checks passed
@ujfalusi ujfalusi deleted the peter/sof/pr/ipc4_delay_reporting_batch_01 branch December 13, 2024 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

snd_pcm_delay() reporting unexpected values
4 participants