-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dram data storage: fast_get()/_put() #9765
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Were you able to check MCPS impact before/after. I guess it should not impact?
No there was no difference: 44.1kHz to 48kHz SRC coefs in SRAM 44.1kHz to 48kHz SRC coefs in DRAM 44.1kHz to 48kHz SRC coefs copied from DRAM to SRAM The full measurements log is here: |
Thanks, that's very good news! Actually I'm surprised that that load without copy to SRAM isn't higher. |
It is. There was three tests, coefs in SRAM all the time (16.1% load), coefs in DRAM no copying (23.1% load), and coefs in DRAM but copied to SRAM before use (16.1% load). An for comparison, same pipelines without need for SRC 5.9% load. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quite good! Just a couple of simple mostly cosmetic changes
Kconfig.sof
Outdated
module data from DRAM to SRAM when the data is needed and freeing | ||
the SRAM when the data is not needed anymore. If multiple module | ||
instances need the same chunk the same copy is used with reference | ||
counting. Source is src/lib/gast-get.c. The option should be selected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo "fast-get.c"
src/include/sof/lib/fast-get.h
Outdated
|
||
#include <stddef.h> | ||
|
||
void *fast_get(const void * const dram_ptr, size_t size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must return a const
pointer too
src/include/sof/lib/fast-get.h
Outdated
#include <stddef.h> | ||
|
||
void *fast_get(const void * const dram_ptr, size_t size); | ||
void fast_put(void *sram_ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const void *
src/lib/fast-get.c
Outdated
/* for cmocka environment */ | ||
#ifndef EXPORT_SYMBOL | ||
#define EXPORT_SYMBOL(a) | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmh, it is already defined in posix/include/rtos/symbol.h - isn't that enough? Need to
#include <rtos/symbol.h>
then it should work
src/lib/fast-get.c
Outdated
const void *dram_ptr; | ||
void *sram_ptr; | ||
size_t size; | ||
uint32_t refcount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just unsigned int
?
{ | ||
void *ret; | ||
|
||
(void)state; /* unused */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does ARG_UNUSED()
work in cmocka?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not defined in cmocka environment. Its coming from Zephyr side, and there its defined as
#define ARG_UNUSED(x) (void)(x)
I could in make some similar hack header for cmocka code, but open coded void cast is already used in several places cmocka tests.
copy[0][i] = fast_get(testdata[i], sizeof(testdata[0])); | ||
|
||
for (i = 0; i < ARRAY_SIZE(copy[0]); i++) | ||
copy[1][i] = fast_get(testdata[i], sizeof(testdata[0])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also test unmatched allocation - with a wrong size. Also you could add a test for too large an allocation - don't think malloc()
will succeed a 1GiB allocation even if it's available, IIRC there was a limit. But maybe I'm wrong. This can be an incremental PR, this test set is already quite good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add one more test. But I do not think the allocation limit should be enforced by fast-get-code.
src/audio/src/src_common.c
Outdated
@@ -593,6 +594,60 @@ int src_param_set(struct comp_dev *dev, struct comp_data *cd) | |||
return 0; | |||
} | |||
|
|||
static void debug_print(struct comp_dev *dev, const char *name, const struct src_stage *s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left-over
src/audio/src/src_common.c
Outdated
stage_dst1 = rmalloc(SOF_MEM_ZONE_RUNTIME, 0, SOF_MEM_CAPS_RAM, | ||
2 * sizeof(*stage_dst1)); | ||
if (!stage_dst1) { | ||
comp_err(dev, "src_allocate_copy_stages(): failed allocate stages"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't use function name in logging, Zephyr does it already
const struct src_stage *stage_src1, | ||
const struct src_stage *stage_src2) | ||
{ | ||
struct src_stage *stage_dst1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just use an array as struct src_stage *stage_dst
and then stage_dst[0].coefs
etc. but up to you too, just an idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't bother with the change, not yet anyway. I have a hunch that there will be more rounds coming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put the ifdef FAST_GET at the function level i.e. if FAST_GET=n then src_allocate_copy_stages()
does what it does today, if =y then it does the fast get logic
@jsarha can you list any dependency PRs that need to be merged first here. Thanks ! |
With this data can be assigned to a separate read-only section, which then will be kept in DRAM without copying it to SRAM on module instantiation. Signed-off-by: Guennadi Liakhovetski <[email protected]>
src coefficients take a lot of space, keep them in DRAM to only copy used sets in SRAM. Signed-off-by: Guennadi Liakhovetski <[email protected]>
@lgirdwood I think all the dependencies have been merged now. It just needs to switch to using a newer version of my commits from #9721 |
@lyakh just some conflicts now. |
Implement fast_get() and fast_put() functions. The purpose of these functions is to maintain shared SRAM copies of data stored in DRAM. fast_get() First checks if there is already an SRAM copy of the same DRAM chunk. If there isn't reserve an SRAM chunk of the same size and copy the contents there, store the both pointers, size and reference count to in an internal data structure, and return the SRAM pointer. If there is, return the pointer to the existing SRAM copy and increase the reference count. fast_put() Look up the internal data record based on the SRAM address and decrement reference count. Free the SRAM chunk and the data record if reference count reaches zero, Signed-off-by: Jyri Sarha <[email protected]>
And simple module tests for fast_get and fast_put() implemented in src/lib/fast-get/fast-get.c. Signed-off-by: Jyri Sarha <[email protected]>
The SRC coefficients are loaded to DRAM and commit copies the coefficients to SRAM when they are needed. The copying is done using fast_get() and the copy is released with fast_put() when its not needed anymore. Signed-off-by: Jyri Sarha <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review comment applied and commits rebased on top of the latest #9721
{ | ||
void *ret; | ||
|
||
(void)state; /* unused */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not defined in cmocka environment. Its coming from Zephyr side, and there its defined as
#define ARG_UNUSED(x) (void)(x)
I could in make some similar hack header for cmocka code, but open coded void cast is already used in several places cmocka tests.
copy[0][i] = fast_get(testdata[i], sizeof(testdata[0])); | ||
|
||
for (i = 0; i < ARRAY_SIZE(copy[0]); i++) | ||
copy[1][i] = fast_get(testdata[i], sizeof(testdata[0])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add one more test. But I do not think the allocation limit should be enforced by fast-get-code.
const struct src_stage *stage_src1, | ||
const struct src_stage *stage_src2) | ||
{ | ||
struct src_stage *stage_dst1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't bother with the change, not yet anyway. I have a hunch that there will be more rounds coming.
1c74172
to
d7f2279
Compare
After the rebase something has happened to the performance of the case where coefficients are left in DRAM and not copied to SRAM. The 44.1kHz playback load of that case has dropped from 23.1% to 18.8%. @lyakh , any idea where this is coming? This result is from MTL so it may not be important, but it is still interesting. Also, now the build fails if SRC is not built as module. |
@lyakh , @singalsu , @lrgirdwo This is the current situation with fast_get()/_put(), and its usage in SRC module.