Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heap: Improve debug caller address reporting #8720

Open
wants to merge 39 commits into
base: master
Choose a base branch
from

Conversation

mhightower83
Copy link
Contributor

@mhightower83 mhightower83 commented Nov 21, 2022

PR Status: ready

Heap: Improve debug caller address reporting to indicate the address in the sketch or library.

Lite refactoring of heap.cpp and expanded comments

  • Lite refactoring of DEBUG_ESP_OOM printing
  • Overall refactoring to make it easier to follow.
  • Changes to support atomic save of OOM debug info
  • Debug support for C++ "delete" operator - support reporting caller address

Overall avoid identifying the wrapper code as the source of the Heap call. Each outer wrapper passes the caller's address down to the next level.

Improves "caller" address reporting for OOM and Poison events. Previously, some OOM and Poison events reported info pointing to the thick heap wrapper rather than back to the caller. Update abi.cpp, heap.cpp and umm_local.c (umm_poison) to support caller parameters.

Enhanced DEBUG_ESP_OOM to track OOM events when C++ Exceptions: "enabled" is selected. Report more interesting higher-level code addresses as the caller instead of GCC C++ module.

Added DEBUG_ESP_WITHINISR to provide notification for ISRs using C++ "new"/"delete" operators or LIBC calls using _malloc_r, ...

Enhanced poison neighbor reporting to distinguish between allocation vs. neighbor failing poison check.

Support for specifying both UMM_POISON_CHECK and UMM_POISON_CHECK_LITE at the same time did not work as expected. UMM_POISON_CHECK_LITE dominated in the build. Fix: Keep things simple and prevent combining UMM_POISON_CHECK and UMM_POISON_CHECK_LITE. Defines for UMM_POISON_CHECK, UMM_POISON_CHECK_LITE, and UMM_POISON_NONE are checked for mutual exclusivity.

For sketches with extremely limited resources, added UMM_POISON_NONE to prevent UMM_POISON_CHECK_LITE from being added to debug builds.

Moved port related changes for UMM_POISON... from umm_malloc_cfg.h to umm_malloc_cfgport.h.

Added support for aligned allocations. To preserve smaller build sizes, aligned allocations default to disable. When required, supply build option -DUMM_ENABLE_MEMALIGN=1 .

If you see the linker error message: ".../gcc-gnu/libstdc++-v3/libsupc++/new_opa.cc:86: undefined reference to 'memalign'" your Arduino Sketch is using an operation that requires an aligned allocation. Then, add the line -DUMM_ENABLE_MEMALIGN=1 to the build options.

Heap Changes

Enhancements to the UMM_MALLOC library

  • Support for default 8-byte aligned data addresses. (previously was 4-byte)
  • To accommodate any libraries or Sketches that may have workarounds for the old 4-byte alignment, a deprecated build option -DUMM_LEGACY_ALIGN_4BYTE=1 was added.
  • Support the memalign() function needed for the C++17 "new" aligned operator. Enable with build option -DUMM_ENABLE_MEMALIGN=1.
  • Build options -DUMM_LEGACY_ALIGN_4BYTE=1 and -DUMM_ENABLE_MEMALIGN=1 are mutually exclusive.
  • Allocations from memalign() internally appear and are handled like any other UMM_MALLOC memory allocation.
  • The existing free() function handles releasing memalign() memory allocations.
  • Function realloc() should not be called for aligned memory allocations. It can break the alignment. At worst, the alignment falls back to sizeof(umm_block), 8 bytes.
  • The UMM_POISON build option was extended to support memalign().

Modules heap.cpp and abi.cpp updated for build option -DUMM_ENABLE_MEMALIGN=1. C++ "new" operator overloads were updated to support the alignment argument.

(Keep this description updated to reflect the current PR )

lite refactoring of DEBUG_ESP_OOM printing
Overall refactoring to make easier to follow.
Added ABI changes includes atomic save of OOM debug info
Added debug support for C++ "delete" operator

Avoid identifying the wrapper code as the source of the Heap call.
Each outer wrapper passes the caller's address down to the next level.

Improves "caller" address reporting for OOM and Poison events.
Previously, some OOM and Poison events reported info pointing to the thick heap
wrapper rather than back to the caller. Update abi.cpp, heap.cpp and umm_local.c
(umm_poison) to support caller parameter.
sketch or library.

Enhanced DEBUG_ESP_OOM to track OOM events when
C++ Exceptions: "enabled" is selected. Report more interesting
higher level code address as caller instead of GCC C++ module.

Enhanced poison neighbor reporting to distinguish between allocation
vs. neighbor failing poison check.

For draft version, added debug macro HEAP_DEBUG_PROBE_PSFLC_CB
to aid in evaluating caller address results.
Added DEBUG_ESP_WITHINISR to provide notification for ISRs using C++
'new'/'delete' operators or LIBC calls using _malloc_r, ...

Moved block of port related changes for UMM_POISON... from `umm_malloc_cfg.h`
to `umm_malloc_cfgport.h`.

Support for specifying both UMM_POISON_CHECK and UMM_POISON_CHECK_LITE at the
same time did not work as expected. UMM_POISON_CHECK_LITE dominated in the
build. Fix: Keep things simple prevent combining UMM_POISON_CHECK and
UMM_POISON_CHECK_LITE. Defines for UMM_POISON_CHECK, UMM_POISON_CHECK_LITE,
and UMM_POISON_NONE are checked for mutual exclusivity.

For sketches with extremely limited resources, added UMM_POISON_NONE to
prevent UMM_POISON_CHECK_LITE from being added to debug builds.

Make UMM_POISON_CHECK_LITE fail messages more specific about the location. At
free distinguishes between neighbor and allocation. And, identify the caller's
address.
@d-a-v d-a-v added alpha included in alpha release merge-conflict PR has a merge conflict that needs manual correction labels Dec 16, 2022
@d-a-v d-a-v removed the merge-conflict PR has a merge conflict that needs manual correction label Dec 19, 2022
@mhightower83 mhightower83 marked this pull request as ready for review April 18, 2023 20:26
@mhightower83
Copy link
Contributor Author

Let me know if you prefer I factor out experimental heap_cb.h.

@mcspr
Copy link
Collaborator

mcspr commented Apr 28, 2023

Let me know if you prefer I factor out experimental heap_cb.h

If it is still a draft, probably makes sense to leave it out of the PR to reduce loc

Updated umm_poison move from umm_malloc_cfg.h to umm_malloc_cfgport.h
in postmortem and printing OOM from Heap wrappers.

Fixed crash within postmortem when OOM file is NULL

Fixed printing OOM file name when not in ISR.
removed experimental debug extension as requested
added forgotten OOM count to the postmortem report
@mcspr mcspr self-requested a review August 30, 2024 17:10
Copy link
Collaborator

@mcspr mcspr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this description to reflect the current PR state: not ready

Still true?

cores/esp8266/abi.cpp Outdated Show resolved Hide resolved
cores/esp8266/heap.cpp Outdated Show resolved Hide resolved
ets_printf_P(PSTR("\nlast failed alloc call: %08X(%d)@%S:%d\n"),
(uint32_t)umm_last_fail_alloc_addr, umm_last_fail_alloc_size,
umm_last_fail_alloc_file, umm_last_fail_alloc_line);
ets_printf_P(PSTR("\nlast failed alloc call: 0x%08X(%d), File: %S:%d\n"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%s not %S?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%S, umm_last_fail_alloc_file can be a PROGMEM string
At least some SDK file name strings are in .irom.text.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No special case in newlib here? Implementation thinks of both 's' and 'S' as exactly same
https://github.com/earlephilhower/newlib-xtensa/blob/ebc967552ce827f21fc579fd8c437037c1b472ab/newlib/libc/stdio/nano-vfprintf_i.c#L217

printf(3) says it is '%ls', which is wide-char ('wchar_t') and non-applicable here

Synonym for ls. Don't use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have always been confused about this duality. Some print functions give an error (or is it a warning?) about %S and PROGMEM strings while other functions don't, like Serial.printf_P.

%S is used throughout core_esp_8266_postmortem.cpp for handling the printing of PROGMEM strings.

The comment for the code you pointed out reads to me that Arduino intends to have %S for PROGMEM strings.

case 'S': // TODO: Verify cap-S under Arduino is "PROGMEM char*", not wchar_t

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6280e98 / #5376
requested in #4223 (...which I would guess gets dragged down from AVR Core?)
merged in the original newlib 3.x.x code via igrr/newlib-xtensa#5

Add "%S" (capital-S) format that I've been told, but cannot verify, is used in Arduino to specify a PROGMEM string parameter in printfs, as an alias for "%s" since plain "%s" can now handle PROGMEM.

as-is both '%s' and even '%.*s' work seamlessly, no need for extra care
but it is true that it is used across the file, probably better served in a separate PR. at the very least for consistency

cores/esp8266/heap.cpp Outdated Show resolved Hide resolved
Comment on lines 149 to 152
#define UMM_MALLOC_FL(s,f,l,c) umm_poison_malloc(s)
#define UMM_CALLOC_FL(n,s,f,l,c) umm_poison_calloc(n,s)
#define UMM_REALLOC_FL(p,s,f,l,c) umm_poison_realloc_flc(p,s,f,l,c)
#define UMM_FREE_FL(p,f,l,c) umm_poison_free_flc(p,f,l,c)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to use struct à la source_location to the reduce number of args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! That would be convenient.

Thoughts and Concerns:

  1. Can source_location be used from .c modules?
  2. Will function names be stored in PROGMEM or DRAM?
  3. Their example shows a longer (full definition) string than we have used for the function name, which could cause build size issues for debug builds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I am not able to build with #include <source_location>

Detecting libraries used...
/home/mhightow/Arduino/hardware/esp8266com/esp8266/tools/xtensa-lx106-elf/bin/xtensa-lx106-elf-g++ -D__ets__ -DICACHE_FLASH -U__STRICT_ANSI__ -D_GNU_SOURCE -DESP8266 -Os @/tmp/arduino_build_311930/core/build.opt -I/home/mhightow/Arduino/hardware/esp8266com/esp8266/tools/sdk/include -I/home/mhightow/Arduino/hardware/esp8266com/esp8266/tools/sdk/lwip2/include -I/home/mhightow/Arduino/hardware/esp8266com/esp8266/tools/sdk/libc/xtensa-lx106-elf/include -I/tmp/arduino_build_311930/core -c @/home/mhightow/Arduino/hardware/esp8266com/esp8266/tools/warnings/none-cppflags -g -free -fipa-pta -Werror=return-type -mlongcalls -mtext-section-literals -fno-rtti -falign-functions=4 -std=gnu++17 -ffunction-sections -fdata-sections -fexceptions -DMMU_IRAM_SIZE=0xC000 -DMMU_ICACHE_SIZE=0x4000 -DFP_IN_IROM -w -x c++ -E -CC -DNONOSDK22x_190703=1 -DF_CPU=80000000L -DLWIP_OPEN_SRC -DTCP_MSS=536 -DLWIP_FEATURES=1 -DLWIP_IPV6=0 -DDEBUG_ESP_PORT=Serial -DARDUINO=10819 -DARDUINO_ESP8266_NODEMCU_ESP12E -DARDUINO_ARCH_ESP8266 "-DARDUINO_BOARD=\"ESP8266_NODEMCU_ESP12E\"" "-DARDUINO_BOARD_ID=\"nodemcuv2\"" -DLED_BUILTIN=2 -DFLASHMODE_DIO -I/home/mhightow/Arduino/hardware/esp8266com/esp8266/cores/esp8266 -I/home/mhightow/Arduino/hardware/esp8266com/esp8266/variants/nodemcu /tmp/arduino_build_311930/sketch/SrcLoc.ino.cpp -o /dev/null
SrcLoc:3:10: fatal error: source_location: No such file or directoryAlternatives for source_location: []

    3 | #include <source_location>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.

I also noticed that path ... esp8266com/esp8266/tools/sdk/libc/xtensa-lx106-elf/include does not exist on my machine, even after rerunning ./get.py

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not available in C, as it would need default-arg constructed by the caller.
Not available in -std=c++17, part of the -std=c++2a

What I meant was copying the behaviour, at least in structure which is then passed to the func. May be significantly larger in size, on second though :/
i.e. static struct foo { .file = ..., .lineno = ...}; foo.caller = __builtin__(); func(..., &foo);

Plus, we do have a hack around __FILE__ which is not applied to source_location to put strings in progmem as default .rodata is still in ram

cores/esp8266/heap.cpp Outdated Show resolved Hide resolved
cores/esp8266/heap.cpp Outdated Show resolved Hide resolved
cores/esp8266/heap.cpp Outdated Show resolved Hide resolved
cores/esp8266/heap.cpp Show resolved Hide resolved

extern "C" void __cxa_pure_virtual(void) __attribute__ ((__noreturn__));
extern "C" void __cxa_deleted_virtual(void) __attribute__ ((__noreturn__));

#if defined(__cpp_exceptions) && (defined(DEBUG_ESP_OOM) || defined(DEBUG_ESP_PORT) || defined(DEBUG_ESP_WITHINISR))
Copy link
Collaborator

@mcspr mcspr Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't 'new_handler' be independent of debugging?
since the idea is to allow plain new to sometimes avoid throwing (regardless of whether it builds w/ -fexceptions or without). exception-less builds obviously just call abort for would-be exception
ref new_op.cc and new_opnt.cc, nothrow variant just wraps the other in try catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't 'new_handler' be independent of debugging?

It has been a while since I did this, I think the increased build size for non-debug builds was a consideration.
It would add 136 bytes of IROM to the non-debug build so that it always builds with our revised new_handler.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have to recheck too, then. Debug=enabled without exceptions are not handled though?

Plus there are more missing overloads from -std=c++17
https://en.cppreference.com/w/cpp/memory/new/operator_new

Copy link
Contributor Author

@mhightower83 mhightower83 Sep 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus there are more missing overloads from -std=c++17

The current master does not support "new" align operations. Compiler reports:
/workdir/repo/gcc-gnu/libstdc++-v3/libsupc++/new_opa.cc:86: undefined reference tomemalign'`

No hits on a quick issue check for memalign.

The library umm_malloc needs an enhancement to support an allocate-aligned option. Also, things will get complicated when incorporating the poison check feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've worked through adding an alignment option to the umm_malloc library. In the process, I have addressed an unreported alignment issue. umm_malloc allocated data addresses always landed on an 8-byte aligned - 4-byte.
Now I need to finish working through all the build options in abi.cpp and heap.cpp.

Copy link
Collaborator

@mcspr mcspr Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edit: comment thread from above)

Can this be a config error? GCC / newlib / hal / etc.

Reading cppreference and some of the internal GCC stuff from '-faligned-new=X', 'Fundamental alignment' and max_align_t seem to come from 'long double' and 'long long' which are afaik 4-byte ones.

Currently, it does indeed show 8-byte

constexpr auto a = alignof(long long); // 8
constexpr auto b = alignof(long double); // 8

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edit: comment thread from above)

Can this be a config error? GCC / newlib / hal / etc.

Reading cppreference and some of the internal GCC stuff from '-faligned-new=X', 'Fundamental alignment' and max_align_t seem to come from 'long double' and 'long long' which are afaik 4-byte ones.

Currently, it does indeed show 8-byte

constexpr auto a = alignof(long long); // 8
constexpr auto b = alignof(long double); // 8

...which is sort-of expected, as run-of-the-mill xtensa isa pdf does specify alignment req for long double and long long as 8. compiler here works according to spec, stack and heap alignment would stay consistent vs. current approach

no mention / explicit alignment things mentioned in the machine config, besides the BIGGEST_ALIGNMENT of 128 (bits)
https://github.com/gcc-mirror/gcc/tree/master/gcc/config/xtensa /ALIGN/

esp32 variants adhere to a similar standard. however, esp8266 rtos does use 4-byte heap alignment
(which I would assume works just fine for the most part)

Cleanup remnants from backport
Fixed postmortem build issue around optional oom count function
Made corrections for build macro DEBUG_ESP_WITHINISR
Enhancements to the UMM_MALLOC library
 * Support for 8-byte aligned data addresses.
 * To accommodate any libraries or Sketches that may have workarounds for the
   old 4-byte alignment, a deprecated build option "-DUMM_LEGACY_ALIGN_4BYTE=1"
   was added.
 * Support the memalign() function needed for the C++17 "new" aligned operator.
   Enable with build option "-DUMM_ENABLE_MEMALIGN=1."
 * Build options "-DUMM_LEGACY_ALIGN_4BYTE=1" and "-DUMM_ENABLE_MEMALIGN=1"
   are mutually exclusive.
 * Allocations from memalign() internally appear and are handled like any other
   UMM_MALLOC memory allocation.
 * The existing free() function handles releasing memalign() memory allocations.
 * Function realloc() should not be called for aligned memory allocations. It
   can break the alignment. At worst, the alignment falls back to
   sizeof(umm_block), 8 bytes.
 * The UMM_POISON build option was extended to support memalign().

Modules heap.cpp and abi.cpp updated for build option "-DUMM_ENABLE_MEMALIGN=1".
C++ "new" operator overloads are updated to support the alignment argument.
…alloc_cfgport.h

Corrected last umm_block placement and heap_end

Added build define DEV_DEBUG_ABI_CPP. Its intended use is for module code
maintenance. Use DEV_DEBUG_ABI_CPP when debugging the new/delete overload
wrappers in abi.cpp and heap.cpp.
Comment on lines 978 to 999
#if UMM_ENABLE_MEMALIGN

/*
* Ensure alignment is power of 2; however, we allow zero.
*/

if (0u != (alignment & (alignment - 1u))) {
STATS__ALIGNMENT_ERROR(id_malloc, alignment);
return ptr;
}

/*
* Allocations default to 8-byte alignment same as __STDCPP_DEFAULT_NEW_ALIGNMENT__
* No need to execute extra alignment logic for small alignments.
*/

static_assert(__STDCPP_DEFAULT_NEW_ALIGNMENT__ == 8u);
static_assert(__STDCPP_DEFAULT_NEW_ALIGNMENT__ == sizeof(umm_block));
if (alignment <= sizeof(umm_block)) {
alignment = 0u; // Use implementation default, 8.
}
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note of the -faligned-new=X - https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html#index-faligned-new

Shouldn't this default to 4-byte instead or alignof(umm block)? But, not sure I follow the size requirement for the block (...including the 4-byte adjustment happening below when initializing).

Or is this solving internal issue of UMM (i.e. 4-byte -> 8-byte) incorrectly handling some of allocation cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I add -faligned-new=4 or other non-8 values to the build the GCC compiler complains with:

cc1: warning: command-line option '-faligned-new=4' is valid for C++/ObjC++ but not for C

It will also complain with 16.

I will remove the static_asserts. That will allow the use of -faligned-new=

umm_malloc allocates in blocks of 8 bytes.

Assume the total Heap range is in multiples of 8 bytes and starts at an 8-byte aligned address boundary.
It costs a total of 8 bytes of space (4 bytes at each end of the address range) for resulting data allocation addresses to be 8-byte aligned.

The umm_block structure is sized as two 4-byte fields.
The first holds two 16-bit block addresses to the previous and next umm_block.
The 2nd 4-byte field holds two 16-bit block addresses for the previous and next free umm_blocks.
Take the case of a minimum-sized allocation ocuping 1 umm_block, 8-bytes.
The 2nd 4-byte field becomes the start of the data address when allocated.

Once a block is allocated, the Heap overhead is 4 bytes leaving 4 bytes for data space. If we make sure the heap starts at an address that is 4-byte aligned but not 8-byte aligned, then the resulting data address will be 8-byte aligned. And, this will hold for all allocations.

ets_printf_P(PSTR("\nlast failed alloc call: %08X(%d)@%S:%d\n"),
(uint32_t)umm_last_fail_alloc_addr, umm_last_fail_alloc_size,
umm_last_fail_alloc_file, umm_last_fail_alloc_line);
ets_printf_P(PSTR("\nlast failed alloc call: 0x%08X(%d), File: %S:%d\n"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No special case in newlib here? Implementation thinks of both 's' and 'S' as exactly same
https://github.com/earlephilhower/newlib-xtensa/blob/ebc967552ce827f21fc579fd8c437037c1b472ab/newlib/libc/stdio/nano-vfprintf_i.c#L217

printf(3) says it is '%ls', which is wide-char ('wchar_t') and non-applicable here

Synonym for ls. Don't use.


extern "C" void __cxa_pure_virtual(void) __attribute__ ((__noreturn__));
extern "C" void __cxa_deleted_virtual(void) __attribute__ ((__noreturn__));

#if defined(__cpp_exceptions) && (defined(DEBUG_ESP_OOM) || defined(DEBUG_ESP_PORT) || defined(DEBUG_ESP_WITHINISR))
Copy link
Collaborator

@mcspr mcspr Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edit: comment thread from above)

Can this be a config error? GCC / newlib / hal / etc.

Reading cppreference and some of the internal GCC stuff from '-faligned-new=X', 'Fundamental alignment' and max_align_t seem to come from 'long double' and 'long long' which are afaik 4-byte ones.

Currently, it does indeed show 8-byte

constexpr auto a = alignof(long long); // 8
constexpr auto b = alignof(long double); // 8

Cleanup '//D' comments
Improvements to new/delete debug logging
Removed '//D' block comments

Removed redundant title sub-string from DEV_DEBUG_ABI_CPP.

Moved some operation delete's with respect to build macros
such that they were built when needed.
to esp8266/examples. I am not sure this is the best locations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alpha included in alpha release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants