Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to force I2S routines into IRAM #9115

Open
6 tasks done
kleinesfilmroellchen opened this issue Mar 30, 2024 · 3 comments
Open
6 tasks done

Add option to force I2S routines into IRAM #9115

kleinesfilmroellchen opened this issue Mar 30, 2024 · 3 comments

Comments

@kleinesfilmroellchen
Copy link

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git). (The relevant code is identical between master and latest release)
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it. (There is none.)
  • I have filled out all fields below.

Platform

  • Hardware: ESP8266EX (Wemos/Lolin D1 mini)
  • Core Version: 3.1.2
  • Development Env: Arduino IDE
  • Operating System: Windows

Settings in IDE

  • Module: Wemos D1 mini r2
  • Flash Mode: dio
  • Flash Size: 4MB, no FS
  • MMU: 16KB/48KB with shared second heap
  • lwip Variant: v2 Lower Memory
  • Flash Frequency: 80Mhz
  • CPU Frequency: 160MHz
  • Upload Using: SERIAL
  • Upload Speed: 921600

Problem Description

In situations where a very high CPU load is present, and the I2S engine is constantly running to play audio, it is important that I2S can run as fast as possible to not slow down other parts of the system. If memory load is additionally significantly restrained, xtensa-gcc can sometimes decide to move code in and out of IRAM for no apparent reason. For me, this happened when several unrelated functions increased in size significantly - suddenly the I2S routines weren't in IRAM anymore, or something else was going on with un-inlining, I am not quite certain. (Some bloaty information that may provide further clues is here.) This slowed down the high-load procedure significantly, and

The fix for this problem was simply to force all virtual functions of I2SClass into IRAM, like so:

  // from Stream
  virtual int IRAM_ATTR available();
  virtual int IRAM_ATTR read(); // Blocking, will wait for incoming data
  virtual int IRAM_ATTR peek(); // Blocking, will wait for incoming data   
  virtual void IRAM_ATTR flush();

  // from Print (see notes on write() methods below)
  virtual size_t IRAM_ATTR write(uint8_t);
  virtual size_t IRAM_ATTR write(const uint8_t *buffer, size_t size);
  virtual int IRAM_ATTR availableForWrite();

Despite my initial expectations, this made xtensa-gcc happy again and the code returned to previous performance.

To avoid having to edit base libraries for this to work, I'd prefer if there was the option to enable these attributes via a define, for those who need it. This would mean no functional changes to everyone who doesn't need it, providing any user full control over IRAM code.

Note: I would like to not receive recommendations for "better" hardware specific to my project. Not only is there no direct replacement for the D1 mini given my peripherals, but I am operating far away enough from the ESP8266's performance and memory limit that it's manageable without weird tricks (not that it's comfortable either, but if you want comfortable programming you become a web developer). I am asking for an option that allows me to optimize my code, and I consider carefully moving functions in and out of IRAM to be a proper optimization technique on architectures like the ESP8266. The inner demoscene kid in me is also having some fun here.

MCVE

This exact commit; note that it requires specific hardware which is laid out in the repository. It doesn't really get more minimal than this, since the issue only presents itself when the CPU load is extremely high.

Debug Messages

(None. I can't even enable the serial console here.)

@mcspr
Copy link
Collaborator

mcspr commented Mar 31, 2024

If memory load is additionally significantly restrained, xtensa-gcc can sometimes decide to move code in and out of IRAM for no apparent reason. For me, this happened when several unrelated functions increased in size significantly - suddenly the I2S routines weren't in IRAM anymore

Can you point to the specific instance of that happening? Bloaty output does not really explain what got moved where, or what functions are expected to be inline that are not.

I2S implementation is already in .cpp and not in header. afaik, we don't have lto enabled to change it into sometimes-inline or am I missing something?
Neither implementations have IRAM_ATTR near functions, so everything i2s_... / I2SClass::... stays on flash by default.

@kleinesfilmroellchen
Copy link
Author

Can you point to the specific instance of that happening? Bloaty output does not really explain what got moved where, or what functions are expected to be inline that are not.

I may not be properly discerning the difference in the binaries here, it's fairly opaque and I'm having a hard time telling where which functions live even when looking at load addresses and segment ranges. I can provide both binaries if required (but have to scrub WiFi passwords from them first) if you want to do your own analysis.

I2S implementation is already in .cpp and not in header. afaik, we don't have lto enabled to change it into sometimes-inline or am I missing something? Neither implementations have IRAM_ATTR near functions, so everything i2s_... / I2SClass::... stays on flash by default.

compile_commands.json says -fno-inline-functions for most files, but not all core files. Excuse me for not looking through all 400 commands :)

@mcspr
Copy link
Collaborator

mcspr commented Apr 2, 2024

compile_commands.json says -fno-inline-functions for most files, but not all core files. Excuse me for not looking through all 400 commands :)

...why look there instead of build definitions? 🙃

For Arduino IDE / arduino-cli

Arduino/platform.txt

Lines 78 to 96 in 685f2c9

compiler.libc.path={runtime.platform.path}/tools/sdk/libc/xtensa-lx106-elf
compiler.cpreprocessor.flags=-D__ets__ -DICACHE_FLASH -U__STRICT_ANSI__ -D_GNU_SOURCE -DESP8266 {build.debug_optim} {build.opt.flags} "-I{compiler.sdk.path}/include" "-I{compiler.sdk.path}/{build.lwip_include}" "-I{compiler.libc.path}/include" "-I{build.path}/core"
# support precompiled libraries in IDE v1.8.6+
compiler.libraries.ldflags=
compiler.c.cmd=xtensa-lx106-elf-gcc
compiler.c.flags=-c "{compiler.warning_flags}-cflags" -std=gnu17 {build.stacksmash_flags} -g -free -fipa-pta -Werror=return-type -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mtext-section-literals -falign-functions=4 -MMD -ffunction-sections -fdata-sections {build.exception_flags} {build.sslflags} {build.mmuflags} {build.non32xferflags} {build.iramfloat}
compiler.S.cmd=xtensa-lx106-elf-gcc
compiler.S.flags=-c -g -x assembler-with-cpp -MMD -mlongcalls "-I{runtime.tools.xtensa-lx106-elf-gcc.path}/include/"
compiler.c.elf.flags=-g "{compiler.warning_flags}-cflags" -nostdlib -Wl,--no-check-sections -u app_entry {build.float} -Wl,-static "-L{compiler.sdk.path}/lib" "-L{compiler.sdk.path}/lib/{build.sdk}" "-L{build.path}" "-L{compiler.libc.path}/lib" "-Tlocal.eagle.flash.ld" -Wl,--gc-sections -Wl,-wrap,system_restart_local -Wl,-wrap,spi_flash_read
compiler.c.elf.cmd=xtensa-lx106-elf-gcc
compiler.c.elf.libs=-lhal -lphy -lpp -lnet80211 {build.lwip_lib} -lwpa -lcrypto -lmain -lwps -lbearssl -lespnow -lsmartconfig -lairkiss -lwpa2 {build.stdcpp_lib} -lm -lc -lgcc
compiler.cpp.cmd=xtensa-lx106-elf-g++
compiler.cpp.flags=-c "{compiler.warning_flags}-cppflags" {build.stacksmash_flags} -g -free -fipa-pta -Werror=return-type -mlongcalls -mtext-section-literals -fno-rtti -falign-functions=4 {build.stdcpp_level} -MMD -ffunction-sections -fdata-sections {build.exception_flags} {build.sslflags} {build.mmuflags} {build.non32xferflags} {build.iramfloat}

For PlatformIO

env.Append(
ASFLAGS=[
"-mlongcalls",
"-mtext-section-literals",
],
ASPPFLAGS=[
"-x", "assembler-with-cpp",
],
# General options that are passed to the C compiler (C only; not C++)
CFLAGS=[
"-std=gnu17",
"-Wpointer-arith",
"-Wno-implicit-function-declaration",
"-Wl,-EL",
"-fno-inline-functions",
"-nostdlib"
],
# General options that are passed to the C and C++ compilers
CCFLAGS=[
"-Os", # optimize for size
"-mlongcalls",
"-mtext-section-literals",
"-falign-functions=4",
"-U__STRICT_ANSI__",
"-ffunction-sections",
"-fdata-sections",
"-Wall",
"-Werror=return-type",
"-free",
"-fipa-pta"
],
# General options that are passed to the C++ compiler
CXXFLAGS=[
"-fno-rtti",
"-std=gnu++17"
],
# General user options passed to the linker
LINKFLAGS=[
"-Os",
"-nostdlib",
"-Wl,--no-check-sections",
"-Wl,-static",
"-Wl,--gc-sections",
"-Wl,-wrap,system_restart_local",
"-Wl,-wrap,spi_flash_read",
"-u", "app_entry",
"-u", "_printf_float",
"-u", "_scanf_float",
"-u", "_DebugExceptionVector",
"-u", "_DoubleExceptionVector",
"-u", "_KernelExceptionVector",
"-u", "_NMIExceptionVector",
"-u", "_UserExceptionVector"
],
# A platform independent specification of C preprocessor definitions as either:
# - -DFLAG as "FLAG"
# - -DFLAG=VALUE as ("FLAG", "VALUE")
CPPDEFINES=[
("F_CPU", "$BOARD_F_CPU"),
"__ets__",
"ICACHE_FLASH",
"_GNU_SOURCE",
("ARDUINO", 10805),
("ARDUINO_BOARD", '\\"PLATFORMIO_%s\\"' % env.BoardConfig().id.upper()),
("ARDUINO_BOARD_ID", '\\"%s\\"' % env.BoardConfig().id),
"FLASHMODE_${BOARD_FLASH_MODE.upper()}",
"LWIP_OPEN_SRC"
],
# The list of directories that the C preprocessor will search for include directories
CPPPATH=[
join(FRAMEWORK_DIR, "tools", "sdk", "include"),
join(FRAMEWORK_DIR, "cores", env.BoardConfig().get("build.core")),
join(platform.get_package_dir("toolchain-xtensa"), "include")
],
# The list of directories that will be searched for libraries
LIBPATH=[
join("$BUILD_DIR", "ld"), # eagle.app.v6.common.ld
join(FRAMEWORK_DIR, "tools", "sdk", "lib"),
join(FRAMEWORK_DIR, "tools", "sdk", "ld")
],
# LIBS is set at the bottom of the builder script
# where we know about all system libraries to be included
LIBSOURCE_DIRS=[
join(FRAMEWORK_DIR, "libraries")
],
BUILDERS=dict(
ElfToBin=Builder(
action=env.VerboseAction(" ".join([
'"$PYTHONEXE"',
'"%s"' % join(FRAMEWORK_DIR, "tools", "elf2bin.py"),
"--eboot", '"%s"' % join(
FRAMEWORK_DIR, "bootloaders", "eboot", "eboot.elf"),
"--app", "$SOURCE",
"--flash_mode", "$BOARD_FLASH_MODE",
"--flash_freq", "${__get_board_f_flash(__env__)}",
"--flash_size", "${__get_flash_size(__env__)}",
"--path", '"%s"' % join(
platform.get_package_dir("toolchain-xtensa"), "bin"),
"--out", "$TARGET"
] + gzip_switch), "Building $TARGET"),
suffix=".bin"
)
)
)

-fno-inline-functions only for .c files and does not apply to i2s code

I may not be properly discerning the difference in the binaries here, it's fairly opaque and I'm having a hard time telling where which functions live even when looking at load addresses and segment ranges. I can provide both binaries if required (but have to scrub WiFi passwords from them first) if you want to do your own analysis.

Yet again, I would not know where to look as I still not quite sure what is the actual issue. Suppose IRAM for I2S can be changed as suggested above, but class change only applies to the class and not the i2s_... implementation funcs it calls.

iirc manually pre-caching hot path of the app might be the solution
#6628
#6559
(ref. https://github.com/esp8266/Arduino/blob/master/cores/esp8266/core_esp8266_spi_utils.cpp /PRECACHE/, /precache/, etc.)

Or, rewriting .ld script to forcibly move certain funcs to always be in IRAM. Mind the MMU option, though, as your app already severely limits IRAM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants