Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audio: MDRC: Restructure Multiband DRC for more effective memory alloc… #9195

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ShriramShastry
Copy link
Contributor

@ShriramShastry ShriramShastry commented Jun 5, 2024

This check-in enhances memory management in the Multiband Dynamic
Range Control (MDRC) component.

Key Changes:

  1. Streamlined memory allocation and initialization of crossover,
    emphasis, and de-emphasis coefficients directly in
    multiband_drc_init_coef.
  2. Updated multiband_drc_init to eliminate the allocation of any
    obsolete blocks.
  3. Adjusted multiband_drc_free to no longer check and free any
    now-nonexistent blocks.
  4. Simplified overall memory management by removing unnecessary
    layers of indirection.

Performance Improvements:

  • Reduced memory allocation overhead.
  • Enhanced data locality, improving cache efficiency.
  • Mitigated heap fragmentation, thereby reducing the chances of memory
    leaks.

@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch 2 times, most recently from d6d6854 to 012e192 Compare June 5, 2024 04:33
@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch 2 times, most recently from a8a3a75 to aee6da8 Compare June 5, 2024 05:44

struct sof_eq_iir_biquad *coefficients_block =
rzalloc(SOF_MEM_ZONE_RUNTIME, 0, SOF_MEM_CAPS_RAM,
total_coefficients_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now this seems to be allocated twice: first time inside multiband_drc_init() and a second time here.

Copy link
Contributor Author

@ShriramShastry ShriramShastry Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback regarding the memory allocation within the multiband DRC module

The allocation of memory for coefficients_block in multiband_drc_init_coef() is separate from the allocation of multiband_drc_comp_data in multiband_drc_init(). The latter is the component data for the multiband
DRC module, and it is allocated only once during module initialization.

I 've cross checked again.

The multiband_drc_comp_data allocation is for storing the component's state and is done once upon initialization (multiband_drc_init()).
The coefficients_block allocation is specifically for filter coefficients and occurs later during the module's preparation (multiband_drc_init_coef()).
The allocation for coefficients_block is not done in multiband_drc_init() and hence there is no duplication.
It looks like all allocations and initializations are following the correct sequence and avoiding double allocation for the same data structures. If there are any further concerns or points that require clarification, please let me know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShriramShastry Maybe I didn't explain it well. See line 132 below that you're removing

crossover = config->crossover_coef;
- that's where previously crossover was coming from. Now you added an allocation for it. But I don't see where you removed that original data block? So the original array seems to still be there and you add a new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! I'II make the adjustments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ! Please have a look.

@ShriramShastry ShriramShastry marked this pull request as ready for review June 5, 2024 14:18
@ShriramShastry ShriramShastry requested a review from a team as a code owner June 5, 2024 14:18
* deemphasis is situated after the emphasis coefficients.
* This ensures all filter coefficients are stored contiguously.
*/
crossover = coefficients_block;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a modification to the datatypes rather than just packing them into a single buffer and crossing fingers no one screws up the math in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review and comments. The changes to the allocation and pointer setup are intended to enhance memory efficiency and cache performance for the MDRC component.

Here’s a summary of the changes and the rationale behind them:

Memory Allocation Consolidation: By allocating a single block of memory for crossover, emphasis, and deemphasis filter coefficients, we avoid separate heap allocations which can incur additional overhead and fragmentation.

size_t total_coefficients_size = sizeof(struct sof_eq_iir_biquad) *
                                 num_bands * nch * 3;

struct sof_eq_iir_biquad *coefficients_block =
    rzalloc(SOF_MEM_ZONE_RUNTIME, 0, SOF_MEM_CAPS_RAM,
            total_coefficients_size);

if (!coefficients_block) {
    comp_err(dev, "multiband_drc_init_coef(), failed to allocate memory for coefficients");
    return -ENOMEM;
}

Pointer Assignment: After the allocation, pointers for crossover, emphasis, and deemphasis filters are pointed to the appropriate locations within the single allocated block.

crossover = coefficients_block;
emphasis = coefficients_block + num_bands * nch;
deemphasis = emphasis + num_bands * nch;

This design assures that the data used concurrently during processing is also kept in memory, which will assist with cache usage when processing audio data.

Error Handling: A clear error handling path (goto err;) ensures that the allocated memory block is freed in the event of an error occurring after the allocation, preventing memory leaks.

if (ret < 0) {
    comp_err(dev, "multiband_drc_init_coef(), could not assign coeffs to ch %d", ch);
    rfree(coefficients_block);
    goto err;
}

The check-in ensure both accuracy and safety, the pointer arithmetic is based on the size, count, and channel configurations of the coefficients.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the motivation, but the code you have written here (by nature) is much more fragile than if you simply made a toplevel struct which contained everything and just allocated that directly.

Pointer math is hard enough to debug, pointer math on an embedded device is much worse. Lets not add any more than we absolutely should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that I'm organizing coefficients in contiguous memory, should multiband_drc_init_coef() be designed to handle separate coefficients instantiation for each channel and band?

If our architecture allows for different filters per channel/band, this could mean dynamically revising the processing loop in multiband_drc_init_coef() to fetch and apply the correct filter coefficients for each specific band.

Could you shed some light on whether unique filters per channel/band are an aspect of our system's design? If so, are there any particular considerations or adjustments to the coefficient fetch and allocation logic within multiband_drc_init_coef() that I should be aware of to implement this correctly?

/** 
 *static int multiband_drc_init_coef(struct processing_module *mod, int16_t nch, uint32_t rate)
 *{
 *	struct comp_dev *dev = mod->dev;
 *	struct multiband_drc_comp_data *cd = module_get_private_data(mod);
 *	struct sof_eq_iir_biquad *crossover;
 *	struct sof_eq_iir_biquad *emphasis;
 *	struct sof_eq_iir_biquad *deemphasis;
 *	struct sof_multiband_drc_config *config = cd->config;
 *	struct multiband_drc_state *state = &cd->state;
 *	uint32_t sample_bytes = get_sample_bytes(cd->source_format);
 *	int i, ch, ret, num_bands;
 *
 *	if (!config) {
 *		comp_err(dev, "multiband_drc_init_coef(), no config is set");
 *		return -EINVAL;
 *	}
 *
 *	num_bands = config->num_bands;
 *
 *	// Sanity checks 
 *	if (nch > PLATFORM_MAX_CHANNELS) {
 *		comp_err(dev,
 *			 "multiband_drc_init_coef(), invalid channels count(%i)", nch);
 *		return -EINVAL;
 *	}
 *	if (config->num_bands > SOF_MULTIBAND_DRC_MAX_BANDS) {
 *		comp_err(dev, "multiband_drc_init_coef(), invalid bands count(%i)",
 *			 config->num_bands);
 *		return -EINVAL;
 *	}
 *
 *	comp_info(dev, "multiband_drc_init_coef(), initializing %i-way crossover",
 *		  config->num_bands);
 *
 *	// Crossover: collect the coef array and assign it to every channel 
 *	crossover = config->crossover_coef;
 *	for (ch = 0; ch < nch; ch++) {
 *		ret = crossover_init_coef_ch(crossover, &state->crossover[ch],
 *					     config->num_bands);
 *		// Free all previously allocated blocks in case of an error 
 *		if (ret < 0) {
 *			comp_err(dev,
 *				 "multiband_drc_init_coef(), could not assign coeffs to ch %d", ch);
 *			goto err;
 *		}
 *	}
 *
 *	comp_info(dev, "multiband_drc_init_coef(), initializing emphasis_eq");
 *
 *	// Emphasis: collect the coef array and assign it to every channel 
 *	emphasis = config->emp_coef;
 *	for (ch = 0; ch < nch; ch++) {
 *		ret = multiband_drc_eq_init_coef_ch(emphasis, &state->emphasis[ch]);
 *		// Free all previously allocated blocks in case of an error 
 *		if (ret < 0) {
 *			comp_err(dev, "multiband_drc_init_coef(), could not assign coeffs to ch %d",
 *				 ch);
 *			goto err;
 *		}
 *	}
 *
 *	comp_info(dev, "multiband_drc_init_coef(), initializing deemphasis_eq");
 *
 *	// Deemphasis: collect the coef array and assign it to every channel 
 *	deemphasis = config->deemp_coef;
 *	for (ch = 0; ch < nch; ch++) {
 *		ret = multiband_drc_eq_init_coef_ch(deemphasis, &state->deemphasis[ch]);
 *		// Free all previously allocated blocks in case of an error 
 *		if (ret < 0) {
 *			comp_err(dev, "multiband_drc_init_coef(), could not assign coeffs to ch %d",
 *				 ch);
 *			goto err;
 *		}
 *	}
 *
 *	// Allocate all DRC pre-delay buffers and set delay time with band number 
 *	for (i = 0; i < num_bands; i++) {
 *		comp_info(dev, "multiband_drc_init_coef(), initializing drc band %d", i);
 *
 *		ret = drc_init_pre_delay_buffers(&state->drc[i], (size_t)sample_bytes, (int)nch);
 *		if (ret < 0) {
 *			comp_err(dev,
 *				 "multiband_drc_init_coef(), could not init pre delay buffers");
 *			goto err;
 *		}
 *
 *		ret = drc_set_pre_delay_time(&state->drc[i],
 *					     cd->config->drc_coef[i].pre_delay_time, rate);
 *		if (ret < 0) {
 *			comp_err(dev, "multiband_drc_init_coef(), could not set pre delay time");
 *			goto err;
 *		}
 *	}
 *
 *	return 0;
 *
 *err:
 *	multiband_drc_reset_state(state);
 *	return ret;
 *} 
 */

static int multiband_drc_setup(struct processing_module *mod, int16_t channels, uint32_t rate)
{
	struct multiband_drc_comp_data *cd = module_get_private_data(mod);
	int ret;

	/* Reset any previous state */
	multiband_drc_reset_state(&cd->state);

	/* Setup Crossover, Emphasis EQ, Deemphasis EQ, and DRC */
	ret = multiband_drc_init_coef(mod, channels, rate);

}

Currently, the coefficients for each of the emphasis/de-emphasis filters and crossovers per channel are initialised, but it appears that the multiband DRC assumes a configured number of filters applied uniformly across channels; however, if individual adjustments/adapt to filters per channel/band are required, the code should fetch and set up the coefficients accordingly within the multiband_drc_setup and related functions.

Your expertise in this area will help to clarify the best course of action and ensure that our implementation remains efficient and in line with our system's architectural goals.

Copy link
Contributor Author

@ShriramShastry ShriramShastry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if you have any specific concerns or suggestions for further improvements. Your feedback is extremely helpful, and I am willing to make any necessary changes to this patch..

Thank you for review.

* deemphasis is situated after the emphasis coefficients.
* This ensures all filter coefficients are stored contiguously.
*/
crossover = coefficients_block;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review and comments. The changes to the allocation and pointer setup are intended to enhance memory efficiency and cache performance for the MDRC component.

Here’s a summary of the changes and the rationale behind them:

Memory Allocation Consolidation: By allocating a single block of memory for crossover, emphasis, and deemphasis filter coefficients, we avoid separate heap allocations which can incur additional overhead and fragmentation.

size_t total_coefficients_size = sizeof(struct sof_eq_iir_biquad) *
                                 num_bands * nch * 3;

struct sof_eq_iir_biquad *coefficients_block =
    rzalloc(SOF_MEM_ZONE_RUNTIME, 0, SOF_MEM_CAPS_RAM,
            total_coefficients_size);

if (!coefficients_block) {
    comp_err(dev, "multiband_drc_init_coef(), failed to allocate memory for coefficients");
    return -ENOMEM;
}

Pointer Assignment: After the allocation, pointers for crossover, emphasis, and deemphasis filters are pointed to the appropriate locations within the single allocated block.

crossover = coefficients_block;
emphasis = coefficients_block + num_bands * nch;
deemphasis = emphasis + num_bands * nch;

This design assures that the data used concurrently during processing is also kept in memory, which will assist with cache usage when processing audio data.

Error Handling: A clear error handling path (goto err;) ensures that the allocated memory block is freed in the event of an error occurring after the allocation, preventing memory leaks.

if (ret < 0) {
    comp_err(dev, "multiband_drc_init_coef(), could not assign coeffs to ch %d", ch);
    rfree(coefficients_block);
    goto err;
}

The check-in ensure both accuracy and safety, the pointer arithmetic is based on the size, count, and channel configurations of the coefficients.

Copy link
Contributor Author

@ShriramShastry ShriramShastry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing the code, I 'm in need of your/Google guidance on Coefficient allocation logic for Channel/Band-Specific Filters in multiband_drc_init_coef().

Because the task requires us to properly allocate and assign emphasis/de-emphasis filter coefficients within a shared memory block.

However, I need to know if our architecture requires these filters to be unique per channel/band, which will affect our allocation strategy and coefficient fetch logic.

* deemphasis is situated after the emphasis coefficients.
* This ensures all filter coefficients are stored contiguously.
*/
crossover = coefficients_block;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that I'm organizing coefficients in contiguous memory, should multiband_drc_init_coef() be designed to handle separate coefficients instantiation for each channel and band?

If our architecture allows for different filters per channel/band, this could mean dynamically revising the processing loop in multiband_drc_init_coef() to fetch and apply the correct filter coefficients for each specific band.

Could you shed some light on whether unique filters per channel/band are an aspect of our system's design? If so, are there any particular considerations or adjustments to the coefficient fetch and allocation logic within multiband_drc_init_coef() that I should be aware of to implement this correctly?

/** 
 *static int multiband_drc_init_coef(struct processing_module *mod, int16_t nch, uint32_t rate)
 *{
 *	struct comp_dev *dev = mod->dev;
 *	struct multiband_drc_comp_data *cd = module_get_private_data(mod);
 *	struct sof_eq_iir_biquad *crossover;
 *	struct sof_eq_iir_biquad *emphasis;
 *	struct sof_eq_iir_biquad *deemphasis;
 *	struct sof_multiband_drc_config *config = cd->config;
 *	struct multiband_drc_state *state = &cd->state;
 *	uint32_t sample_bytes = get_sample_bytes(cd->source_format);
 *	int i, ch, ret, num_bands;
 *
 *	if (!config) {
 *		comp_err(dev, "multiband_drc_init_coef(), no config is set");
 *		return -EINVAL;
 *	}
 *
 *	num_bands = config->num_bands;
 *
 *	// Sanity checks 
 *	if (nch > PLATFORM_MAX_CHANNELS) {
 *		comp_err(dev,
 *			 "multiband_drc_init_coef(), invalid channels count(%i)", nch);
 *		return -EINVAL;
 *	}
 *	if (config->num_bands > SOF_MULTIBAND_DRC_MAX_BANDS) {
 *		comp_err(dev, "multiband_drc_init_coef(), invalid bands count(%i)",
 *			 config->num_bands);
 *		return -EINVAL;
 *	}
 *
 *	comp_info(dev, "multiband_drc_init_coef(), initializing %i-way crossover",
 *		  config->num_bands);
 *
 *	// Crossover: collect the coef array and assign it to every channel 
 *	crossover = config->crossover_coef;
 *	for (ch = 0; ch < nch; ch++) {
 *		ret = crossover_init_coef_ch(crossover, &state->crossover[ch],
 *					     config->num_bands);
 *		// Free all previously allocated blocks in case of an error 
 *		if (ret < 0) {
 *			comp_err(dev,
 *				 "multiband_drc_init_coef(), could not assign coeffs to ch %d", ch);
 *			goto err;
 *		}
 *	}
 *
 *	comp_info(dev, "multiband_drc_init_coef(), initializing emphasis_eq");
 *
 *	// Emphasis: collect the coef array and assign it to every channel 
 *	emphasis = config->emp_coef;
 *	for (ch = 0; ch < nch; ch++) {
 *		ret = multiband_drc_eq_init_coef_ch(emphasis, &state->emphasis[ch]);
 *		// Free all previously allocated blocks in case of an error 
 *		if (ret < 0) {
 *			comp_err(dev, "multiband_drc_init_coef(), could not assign coeffs to ch %d",
 *				 ch);
 *			goto err;
 *		}
 *	}
 *
 *	comp_info(dev, "multiband_drc_init_coef(), initializing deemphasis_eq");
 *
 *	// Deemphasis: collect the coef array and assign it to every channel 
 *	deemphasis = config->deemp_coef;
 *	for (ch = 0; ch < nch; ch++) {
 *		ret = multiband_drc_eq_init_coef_ch(deemphasis, &state->deemphasis[ch]);
 *		// Free all previously allocated blocks in case of an error 
 *		if (ret < 0) {
 *			comp_err(dev, "multiband_drc_init_coef(), could not assign coeffs to ch %d",
 *				 ch);
 *			goto err;
 *		}
 *	}
 *
 *	// Allocate all DRC pre-delay buffers and set delay time with band number 
 *	for (i = 0; i < num_bands; i++) {
 *		comp_info(dev, "multiband_drc_init_coef(), initializing drc band %d", i);
 *
 *		ret = drc_init_pre_delay_buffers(&state->drc[i], (size_t)sample_bytes, (int)nch);
 *		if (ret < 0) {
 *			comp_err(dev,
 *				 "multiband_drc_init_coef(), could not init pre delay buffers");
 *			goto err;
 *		}
 *
 *		ret = drc_set_pre_delay_time(&state->drc[i],
 *					     cd->config->drc_coef[i].pre_delay_time, rate);
 *		if (ret < 0) {
 *			comp_err(dev, "multiband_drc_init_coef(), could not set pre delay time");
 *			goto err;
 *		}
 *	}
 *
 *	return 0;
 *
 *err:
 *	multiband_drc_reset_state(state);
 *	return ret;
 *} 
 */

static int multiband_drc_setup(struct processing_module *mod, int16_t channels, uint32_t rate)
{
	struct multiband_drc_comp_data *cd = module_get_private_data(mod);
	int ret;

	/* Reset any previous state */
	multiband_drc_reset_state(&cd->state);

	/* Setup Crossover, Emphasis EQ, Deemphasis EQ, and DRC */
	ret = multiband_drc_init_coef(mod, channels, rate);

}

Currently, the coefficients for each of the emphasis/de-emphasis filters and crossovers per channel are initialised, but it appears that the multiband DRC assumes a configured number of filters applied uniformly across channels; however, if individual adjustments/adapt to filters per channel/band are required, the code should fetch and set up the coefficients accordingly within the multiband_drc_setup and related functions.

Your expertise in this area will help to clarify the best course of action and ensure that our implementation remains efficient and in line with our system's architectural goals.

@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch 2 times, most recently from 3b6a9cd to b5919ce Compare June 10, 2024 12:36
struct sof_multiband_drc_config *config = cd->config;
struct multiband_drc_state *state = &cd->state;
uint32_t sample_bytes = get_sample_bytes(cd->source_format);
int i, ch, ret, num_bands;
bool alloc_success = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this, you can just check if coefficients_block == NULL in the error path.

struct sof_eq_iir_biquad crossover[SOF_MULTIBAND_DRC_MAX_BANDS * PLATFORM_MAX_CHANNELS];
struct sof_eq_iir_biquad emphasis[SOF_MULTIBAND_DRC_MAX_BANDS * PLATFORM_MAX_CHANNELS];
struct sof_eq_iir_biquad deemphasis[SOF_MULTIBAND_DRC_MAX_BANDS * PLATFORM_MAX_CHANNELS];
} __packed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't packing cause a lot of slow reads/writes because things might not be aligned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While check-in I was getting the error if I do __attribute__((packed)); instead of "} __packed; so I drop the attribute.

"

No codespell typos will be found - file '/usr/share/codespell/dictionary.txt': No such file or directory
WARNING: Prefer __packed over __attribute__((packed))
#10: FILE: src/audio/multiband_drc/multiband_drc.h:28:
+} __attribute__((packed));

total: 0 errors, 1 warnings, 8 lines checked

Understood. We're using __packed to save memory, but could you confirm if the potential misalignment outweighs the memory gain in our context? And whether our target platform can handle unaligned accesses efficiently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShriramShastry we mainly using packing for any data shared with host, but in this case isn't this data provate to FW only ? If so, we can drop the packed and allow compiler to best organise the data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShriramShastry please read this

The odds here are you saving at most 10s of bytes and paying back way worse in code size and access times

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, packed isn't needed. It should only be used where data format is important and must be preserved, e.g. when sending data over networks, or between the host and the DSP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, removed packed

struct multiband_drc_comp_data *cd = rzalloc(SOF_MEM_ZONE_RUNTIME, 0,
SOF_MEM_CAPS_RAM, sizeof(*cd));
if (!cd) {
comp_err(dev, "multiband_drc_init(), allocation for multiband_drc_comp_data failed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify if the removal of the NULL check after rzalloc is due to guaranteed error handling within the allocator, or are we following a specific coding standard that omits such checks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change is unneeded (i.e. the trace), the check is still very much needed


/* Allocation for coefficients_block */
if (!cd->coefficients_block) {
cd->coefficients_block = rballoc(0, SOF_MEM_CAPS_RAM,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just allocate this upfront directly into cd so you don't have to constantly check this.

Copy link
Contributor Author

@ShriramShastry ShriramShastry Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide more detail on how to allocate coefficients_block directly with cd upfront, given that we typically need to check and allocate each separately in standard C?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do it at the module creation step so you don't have to check if the memory exists when you are parsing params

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch from b5919ce to c86a1f7 Compare June 11, 2024 08:14
Copy link
Contributor Author

@ShriramShastry ShriramShastry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocation of coefficients_block is currently conditional on whether it is NULL. Would you prefer that this allocation be performed unconditionally at the time of cd allocation, ensuring that coefficients_block is never NULL and eliminating all subsequent NULL checks on it?

@lgirdwood
Copy link
Member

The allocation of coefficients_block is currently conditional on whether it is NULL. Would you prefer that this allocation be performed unconditionally at the time of cd allocation, ensuring that coefficients_block is never NULL and eliminating all subsequent NULL checks on it?

I would allocate it when needed if it relies on IPC configuration data for allocation size, if its always constant size you could allocate per instance when pipeline is created.

@cujomalainey
Copy link
Contributor

The allocation of coefficients_block is currently conditional on whether it is NULL. Would you prefer that this allocation be performed unconditionally at the time of cd allocation, ensuring that coefficients_block is never NULL and eliminating all subsequent NULL checks on it?

I would allocate it when needed if it relies on IPC configuration data for allocation size, if its always constant size you could allocate per instance when pipeline is created.

+1, I think the only thing that would possibly vary is the crossover bands and that is fairly small

@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch from c86a1f7 to f770cd2 Compare June 14, 2024 06:33
@cujomalainey cujomalainey requested a review from johnylin76 June 17, 2024 19:44
struct multiband_drc_comp_data *cd = rzalloc(SOF_MEM_ZONE_RUNTIME, 0,
SOF_MEM_CAPS_RAM, sizeof(*cd));
if (!cd) {
comp_err(dev, "multiband_drc_init(), allocation for multiband_drc_comp_data failed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change is unneeded (i.e. the trace), the check is still very much needed

struct sof_eq_iir_biquad crossover[SOF_MULTIBAND_DRC_MAX_BANDS * PLATFORM_MAX_CHANNELS];
struct sof_eq_iir_biquad emphasis[SOF_MULTIBAND_DRC_MAX_BANDS * PLATFORM_MAX_CHANNELS];
struct sof_eq_iir_biquad deemphasis[SOF_MULTIBAND_DRC_MAX_BANDS * PLATFORM_MAX_CHANNELS];
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the more I read this code the more I think this change is incorrect. Look at the struct directly below. This is a copy of that struct which is already directly embedded in the component data. How does this benefit anything when the data was already allocated in place in a single block?

Copy link
Contributor Author

@ShriramShastry ShriramShastry Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! Please take a look at the latest changes. The patch saves 54 MCPS in TGL i.e. from 188 to 134

@singalsu
Copy link
Collaborator

singalsu commented Jun 18, 2024

This PR version has an issue with TGL HiFi3 build. If I set with topology sof-hda-benchmark-drc_multiband32.tplg the control amixer cset name='Analog Playback MULTIBAND_DRC enable' on the playback becomes silent. Sound can be heard with setting amixer cset name='Analog Playback MULTIBAND_DRC enable' off. The current DRC versin does not follow the switch control in runtime, the control needs to be applied when streaming is stopped (or stop the stream to get impact of previously applied control).

I have no idea if MTL HiFi4 has the issue, I don't have a suitable device to check own FW builds.

@cujomalainey
Copy link
Contributor

This PR version has an issue with TGL HiFi3 build. If I set with topology sof-hda-benchmark-drc_multiband32.tplg the control amixer cset name='Analog Playback MULTIBAND_DRC enable' on the playback becomes silent. Sound can be heard with setting amixer cset name='Analog Playback MULTIBAND_DRC enable' off. The current DRC versin does not follow the switch control in runtime, the control needs to be applied when streaming is stopped (or stop the stream to get impact of previously applied control).

I have no idea if MTL HiFi4 has the issue, I don't have a suitable device to check own FW builds.

@singalsu no that is by design for our code that the on/off switch is applied while the stream is open.

@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch from f770cd2 to 9ce82c1 Compare June 24, 2024 13:08
@lyakh
Copy link
Collaborator

lyakh commented Aug 12, 2024

Thank you very much for the reviews. As for specific improvements, I'm still not sure whether this actually improves anything It would be helpful to understand what specific concerns exist; the patch reduces TGL by 54 MCPS, from 188 to 134. ~28% savings.

that's the information I was looking for, yes. Maybe it was already provided above, sorry, difficult to find, it has become rather long. So, you're saying that just by rearranging initialisation code to regroup buffers improves run-time (i.e. during copying) performance on TGL by 28%?.. That's the complete .copy() function? Amazing.

I'm glad to clarify the differences that contribute to the performance improvements:

Memory Allocation Optimization:

Original: Separate loops for allocating crossover, emphasis, and de-emphasis coefficients per channel. Modified: A single loop handles allocation and initialization for all coefficients per channel, enhancing data locality and cache efficiency. This is achieved by grouping these allocations by channel (crossover, emphasis, deemphasis), which is more cache-friendly during runtime processing.

Initialization Enhancement:

Original: Filter coefficients were initialized in separate routines. Modified: Consolidated initialization routines (multiband_drc_init_coef) streamline the setup process, reducing computational overhead and simplifying the code structure.

Improved Clean-up Procedures:

Original: Clean-up procedures were less detailed. Modified: Detailed clean-up steps (multiband_drc_free) ensure thorough resource deallocation, preventing memory leaks and enhancing long-term performance stability.

Error Handling:

Original: Error handling was less comprehensive during initialization. Modified: Rigorous error handling during initialization to ensure stability and prevent resource leaks in case of failures.

These changes collectively lead to a 28% reduction in MCPS during the .copy() function execution on TGL platforms by improving cache performance and reducing initialization overhead.

Sorry, I don't follow which of these improvements could bring you a 28% run-time performance improvement. The fact that buffers are now grouped by channel? That might improve runtime a bit, but very unlikely 28%.

Re: improved error handling - I only see a check for cd which I'm not sure is needed, don't think that function would be called if cd allocation failed. If anything, this should be split into multiple commits - one commit per improvement, then you can check which specific commit / change makes the claimed performance difference, and would be good to tell us how exactly you measure that difference - between which points.

src/audio/multiband_drc/multiband_drc.c Show resolved Hide resolved
if (!config) {
comp_err(dev, "multiband_drc_init_coef(), no config is set");
return -EINVAL;
}

num_bands = config->num_bands;

/* Sanity checks */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping on revertion

Copy link
Collaborator

@singalsu singalsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's merge conflict to fix.

I tried this with UPX i11, topology sof-hda-efx-mbdrc-generic-4ch.tplg. The code runs if processing is not enabled.

If I enable processing with these commands there is a FW crash:

sof-ctl -c name='Post Mixer Analog Playback MBDRC bytes' -s ctl4/multiband_drc/default.txt
amixer cset name='Post Mixer Analog Playback MBDRC switch' on

The the kernel log shows this:

elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: DSP panic!
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW is built with XCC toolchain
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: DSP Firmware Oops
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: Exception Cause: InstrPIFDataErrorCause, PIF data error during instruction fetch
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: EXCCAUSE 0x0000000c EXCVADDR 0x00000000 PS       0x00060f20 SAR     0x00000003
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: EPC1     0x00000000 EPC2     0x00000000 EPC3     0x00000000 EPC4    0x00000000
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: EPS2     0x00000000 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000000
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: stack dump from 0x00000000
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: AR registers:
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x0: be044cd2 be0a0b50 be0bbd40 00000000
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x10: be0a0be0 be0a0c00 be0bbdc0 be0a0b60
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x20: be0444e3 be0a0b30 00000000 be0a0b50
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x30: be0444e3 be0a0b30 00000000 be0a0b50
elo 14 11:38:19 ekstremisti kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------

The first AR register address seems to be in multiband_drc_process_emp_crossover() at code lines:

			band_buf_drc_src = buf_drc_src;
			band_buf_drc_sink = buf_drc_sink;
			for (band = 0; band < nband; ++band) {
be044cd2:	5c2192               	l32i	a9, a1, 0x170
be044cd5:	572162               	l32i	a6, a1, 0x15c
be044cd8:	070c                	movi.n	a7, 0
be044cda:	3419a6               	blti	a9, 1, be044d12 <multiband_drc_s32_default+0x142>
be044cdd:	50c122               	addi	a2, a1, 80
be044ce0:	7fc152               	addi	a5, a1, 127
be044ce3:	51c552               	addi	a5, a5, 81


/* Crossover: collect the coef array and assign it to every channel */
/* Initialize constants for shared coefficients */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original comment is in my opinion a lot more informative.

ret = multiband_drc_eq_init_coef_ch(emphasis, &state->emphasis[ch]);
/* Free all previously allocated blocks in case of an error */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd not delete so eagerly comments by original author. Please do the functional changes and try to keep original comments. Add another cosmetic commit to improve and clean up the comments if you feel so.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShriramShastry can you make this 3 commits, one for each numbered topic in the commit message. This will make it faster to review and merge. Thanks !

@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch 4 times, most recently from fb011df to 828eff1 Compare August 18, 2024 20:59
Copy link
Contributor Author

@ShriramShastry ShriramShastry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have made new changes.
Pending work. CI Testbench still needs attention. I'II work further.
Can you please take a look.

@lgirdwood lgirdwood added this to the v2.11 milestone Aug 21, 2024
src/audio/multiband_drc/multiband_drc.c Show resolved Hide resolved
}
multiband_drc_reset_state(&cd->state);

/* Initialize to enabled is a workaround for IPC4 kernel version 6.6 and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this moved?

return 0;

cd_fail:
cd_model_fail:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again this isn't needed. cd is initialized to all NULL and the function is smart enough to check to see if NULL is passed in. So this split logic just adds more paths for no reason.


total_size += (crossover_delay_size % __alignof__(uint64_t) != 0) ?
__alignof__(uint64_t) -
(crossover_delay_size % __alignof__(uint64_t)) : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming an initial aligned address, won't the total size per channel be the same? Then you can just multiple instead of looping.

void *ptrs[3 * nch * 3 + num_bands];
size_t sizes[3 * nch * 3 + num_bands];

for (ch = 0; ch < nch; ++ch) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than doing this convoluted offset loop, why not simply store your offsets for each field, then do base_addr + offset and store it directly to the field in the struct.

void *base_addr = allocate_contiguous_memory(dev, total_size,
ptrs, sizes, 3 * nch + num_bands);

if (!base_addr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any modifications to the free logic in this commit which has me really concerned this is creating a massive double free

@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch from 828eff1 to 37306fe Compare August 24, 2024 08:06
Copy link
Contributor Author

@ShriramShastry ShriramShastry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing the code. I have update the code. Please take a look.

src/audio/multiband_drc/multiband_drc.c Show resolved Hide resolved
- Enhance memory allocation by using contiguous memory blocks.
- Ensure proper memory alignment using __alignof__ for sub-structures.
- Implement alignment checks with uint8_t* for precise byte-wise calcs.
- Add detailed comments for improved readability and maintainability.
- Ensure each sub-block starts at a properly aligned address to minimize
  performance issues due to misaligned memory accesses.
- Streamline size calculations for emphasis, crossover, and DRC delay
  blocks.

Advantages:
- Optimizes memory management and ensures efficient access to structures.
- Provides robust performance by aligning memory allocations correctly.
- Prevents potential crashes or bugs caused by improper memory handling.
- Improves code clarity and structure, making future maintenance easier.

These changes result in a more efficient and robust initialization
process within the multiband_drc_init_coef function.

Signed-off-by: Shriram Shastry <[email protected]>
- Improved documentation for the multiband_drc_init function,
  specifying its purpose, parameters, and initialization steps.
- Clearly defined all initialization steps, including memory
  allocations and configuration checks.
- Simplified error handling with a single cleanup path, relying
  on existing functions to handle NULL pointers.
- Ensured proper initialization sequence for all components.

Signed-off-by: Shriram Shastry <[email protected]>
- Added comprehensive documentation for multiband_drc_free function,
  detailing its purpose, parameters, and the cleanup process.
- Introduced validation to check if component data (`cd`) is not NULL
  before freeing associated resources.
- Ensured all allocated resources are properly freed, and nullified
  the private module data pointer upon cleanup.
- Provided clear log messages indicating the start and successful
  completion of the free operation.
- Included necessary comments to clarify code intent and improve
  maintainability.

Signed-off-by: Shriram Shastry <[email protected]>
@ShriramShastry ShriramShastry force-pushed the improve_multiband_drc_memory_optimization branch from 37306fe to ddbdb17 Compare August 24, 2024 12:17
Copy link
Collaborator

@singalsu singalsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Git main is OK. Can't check the performance impact of this PR.

With this PR the FW crashes in playback to device hw:0,0 with default settings for topology sof-hda-efx-mbdrc-generic-4ch.tplg. Crash happens with both on and off setting of control name='Post Mixer Analog Playback MBDRC switch'.

@singalsu
Copy link
Collaborator

You can also see a valgrind reported error with scripts/host-testbench.sh run. Please try that yourself.

@@ -118,8 +141,8 @@ static int multiband_drc_init_coef(struct processing_module *mod, int16_t nch, u
return -EINVAL;
}
if (config->num_bands > SOF_MULTIBAND_DRC_MAX_BANDS) {
comp_err(dev, "multiband_drc_init_coef(), invalid bands count(%i)",
config->num_bands);
comp_err(dev,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one commit, for one change please. See the scale of my commits here as a reference. https://github.com/thesofproject/sof/pull/9383/commits

* \param[in] nch Number of channels to process.
* \param[in] rate Sample rate of the audio stream.
*
* \return 0 on success, error code otherwise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a separate commit

size_t deemp_offset = crossover_offset + crossover_delay_size;

/* Align the emp_offset to the required alignment boundary */
emp_offset = ((uintptr_t)(base_addr + emp_offset) + alignment - 1) &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ALIGN_UP

total_size += sizeof(struct drc_state) * num_bands * nch;

/* Allocate base memory */
uint8_t *base_addr = (uint8_t *)rballoc(0, SOF_MEM_CAPS_RAM, total_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to make sure this is aligned?

goto err;
}

ret = drc_set_pre_delay_time(&state->drc[i],
cd->config->drc_coef[i].pre_delay_time, rate);
ret = drc_set_pre_delay_time(&state->drc[i], cd->config->drc_coef[i].pre_delay_time,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is all formatting and should be done in a different commit or should be dropped

@@ -278,6 +278,21 @@ static int multiband_drc_setup(struct processing_module *mod, int16_t channels,
* End of Multiband DRC setup code. Next the standard component methods.
*/

/**
* @brief Initialize Multiband Dynamic Range Control (DRC) component.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documentation is its own commit, do not mix this with functional changes.

* control. The new kernel sends the IPC4 switch control and sets
* this to the desired state before prepare.
*/
multiband_drc_process_enable(&cd->process_enabled);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this moved?

Also your commit message is out of date, please update it

src/audio/multiband_drc/multiband_drc.c Show resolved Hide resolved
@kv2019i
Copy link
Collaborator

kv2019i commented Sep 6, 2024

Release reminder - one week to v2.11-rc1.

@kv2019i
Copy link
Collaborator

kv2019i commented Sep 13, 2024

FYI @ShriramShastry , moving to v2.12

@kv2019i kv2019i modified the milestones: v2.11, v2.12 Sep 13, 2024
@kv2019i
Copy link
Collaborator

kv2019i commented Dec 13, 2024

Feature cutoff for v2.12, moving this to v2.13.

@kv2019i kv2019i modified the milestones: v2.12, v2.13 Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants