I'm trying to enable saving coredumps to flash. I'm running on an nRF52840 with NRF SDK 1.6.1.
I have defined the following:
label = "coredump_partition";
reg = <0x000ff000 DT_SIZE_K(4)>;
};
00> coredump erase
00> Stored coredump erased.
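As a quick sanity check of the partition above, here is a standalone sketch of the address arithmetic. It assumes the nRF52840's 1 MB of internal flash with 4 KB erase pages, and the macro and helper names are hypothetical. With these values, 0xff000 + 4 KB ends exactly at 0x100000, so the partition occupies the last flash page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the devicetree snippet; DT_SIZE_K(4) is 4 KiB. */
#define COREDUMP_PART_OFFSET 0x000ff000u
#define COREDUMP_PART_SIZE   (4u * 1024u)

/* Assumption: nRF52840 has 1 MiB of internal flash erased in 4 KiB pages. */
#define FLASH_TOTAL_SIZE (1024u * 1024u)
#define FLASH_PAGE_SIZE  (4u * 1024u)

/* Hypothetical helper: first address past the end of the partition. */
static uint32_t part_end(uint32_t offset, uint32_t size)
{
    return offset + size;
}

/* Hypothetical helper: true if the partition lies inside flash and is page aligned. */
static bool part_is_valid(uint32_t offset, uint32_t size)
{
    assert(size > 0u);
    return part_end(offset, size) <= FLASH_TOTAL_SIZE &&
           offset % FLASH_PAGE_SIZE == 0u &&
           size % FLASH_PAGE_SIZE == 0u;
}
```

With this offset, an 8 KB partition would run past the end of flash (0xff000 + 0x2000 = 0x101000), so growing the partition also means moving its start address down, for example to 0xfe000.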
Hi,
Do you only see this issue in samples that use Bluetooth?
Could you try setting these configs? (taken from this .config file)
CONFIG_LOG=y
CONFIG_LOG_MODE_MINIMAL=y
CONFIG_DEBUG_COREDUMP=y
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
CONFIG_MP_NUM_CPUS=1
CONFIG_FLASH=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM=n
Our application does use Bluetooth. I don't have a sample application without Bluetooth that I can try on our board, or even on the DK.
We already had most of these set; the only difference was LOG_MODE. I changed LOG_MODE to minimal, but that made no difference. Still getting:
00> =====> Crashing system... (this is expected crash)
00> E: r0/a1: 0x00000003 r1/a2: 0x20013680 r2/a3: 0x00000040
00> E: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x0002532b
00> E: xpsr: 0x41000000
00> E: Faulting instruction address (r15/pc): 0x00042352
00> E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
00> E: Current thread: 0x200031b0 (shell_rtt)
// This is where it's trying to dump the core to flash but is failing
00> E: timeout
00> E: Cannot start coredump!
The error code is ETIMEDOUT (116).
Interestingly, if I disable Bluetooth:
OK, the CRC error was caused by my partition size being 4 KB, the same as the stack size of the shell thread (the thread I was triggering the crash from). As a result, an incomplete coredump was written to flash. In that case the CRC is not calculated correctly; it is always off by 0x86. This should probably be fixed, even when an incomplete coredump is saved.
Once I increased the partition size to 8 KB, I was able to read the dump successfully using coredump print.
So we just need to figure out how to make this work with BT enabled.
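The sizing problem above can be sketched as a simple fit check. This is a simplified model, not Zephyr's implementation: it assumes the dump is a fixed overhead (headers, registers, thread struct) plus the faulting thread's stack, which is roughly what CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN captures. The function names and the 256-byte overhead used in the test are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: total dump size is a fixed overhead plus the
 * faulting thread's stack. The overhead is a parameter because the real
 * value depends on the Zephyr version and configuration.
 */
static size_t coredump_required_size(size_t overhead, size_t stack_size)
{
    return overhead + stack_size;
}

/* True if the whole dump fits in the partition without truncation. */
static bool coredump_fits(size_t partition_size, size_t overhead, size_t stack_size)
{
    assert(partition_size > 0u);
    return coredump_required_size(overhead, stack_size) <= partition_size;
}
```

With a 4 KB stack, any nonzero overhead pushes the dump past a 4 KB partition, while an 8 KB partition leaves headroom, which matches the behavior described above.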
ST-Kon said: coredump_flash_backend_start() adds an offset for the header but does not adjust the size by the same amount
There was a fix for this recently (included in NCS v1.9.x); see these two commits:
https://github.com/nrfconnect/sdk-zephyr/commit/1dc74d70f265ab753b9ef3f00450426bc2ccf296
https://github.com/nrfconnect/sdk-zephyr/commit/d5520f2e59b7d6d65bb8f4b6e848df1d46e95836
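To illustrate the quoted description, here is a simplified model. It is not the actual coredump_flash_backend_start() code, and the struct, function names, and 0x40-byte header size are hypothetical. Moving the start past the header without shrinking the size makes the writable region end past the partition; the corrected pattern shrinks the size by the same amount.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the flash region the backend writes dump data to. */
struct write_region {
    uint32_t start; /* first byte written after the header */
    uint32_t size;  /* bytes available for dump data */
};

/* Pattern from the quote: start moves past the header, size is unchanged. */
static struct write_region region_buggy(uint32_t part_off, uint32_t part_size,
                                        uint32_t hdr_size)
{
    struct write_region r = { part_off + hdr_size, part_size };
    return r;
}

/* Corrected pattern: size shrinks by the same amount the start moved. */
static struct write_region region_fixed(uint32_t part_off, uint32_t part_size,
                                        uint32_t hdr_size)
{
    assert(hdr_size <= part_size);
    struct write_region r = { part_off + hdr_size, part_size - hdr_size };
    return r;
}

/* First address past the end of the writable region. */
static uint32_t region_end(struct write_region r)
{
    return r.start + r.size;
}
```

For the 4 KB partition at 0xff000, the buggy region ends at 0x100040, 0x40 bytes past the partition (and, on a 1 MB part, past the end of flash), while the corrected region ends exactly at 0x100000.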
It seems that stream_flash_init() is what returns the error code. Could you print the error code and post it here? For example, change the line to something like this to print the value:
I traced the issue down to nrf_flash_sync_exe() in flash_sync_mpsl.c. The error happens when that function tries to obtain the semaphore (with my slight modification to print the status):
After more digging, it appears that the issue is with how the nRF code uses k_sem_take(). According to the Zephyr docs (https://docs.zephyrproject.org/3.0.0/reference/kernel/synchronization/semaphores.html), if k_sem_take() is called from an ISR, it must use K_NO_WAIT as the timeout.
However, the NRF SDK uses K_FOREVER or actual timeout values in various places involved in the coredump. Because the coredump is taken in ISR context, those timeout values are not valid. An assert gets triggered, or, if asserts are disabled, I imagine Zephyr is unhappy, which is why we fail to take the semaphore.
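The rule from the Zephyr docs can be shown with a host-side model. All names here are hypothetical stand-ins, not Zephyr APIs. In real Zephyr code, the context-aware pattern would look like `k_sem_take(&sem, k_is_in_isr() ? K_NO_WAIT : K_FOREVER)`. Note that avoiding the invalid wait does not by itself make the flash operation succeed: if nothing gives the semaphore, an ISR-context caller simply gets a "busy" result back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side stand-ins for K_NO_WAIT and K_FOREVER. */
#define MODEL_NO_WAIT 0
#define MODEL_FOREVER (-1)

/* Hypothetical counting semaphore; it only tracks the count. */
struct model_sem {
    unsigned int count;
};

/*
 * Returns 0 if the semaphore was taken, -EBUSY if it was unavailable and no
 * wait was requested, -EAGAIN if the caller would have had to wait (this
 * model does not simulate waiting), and -EINVAL if a wait is requested from
 * ISR context. Zephyr's k_sem_take() asserts in that last case instead.
 */
static int model_sem_take(struct model_sem *sem, bool in_isr, int timeout)
{
    assert(sem != NULL);
    if (in_isr && timeout != MODEL_NO_WAIT) {
        return -EINVAL;
    }
    if (sem->count > 0u) {
        sem->count--;
        return 0;
    }
    return (timeout == MODEL_NO_WAIT) ? -EBUSY : -EAGAIN;
}

/* Context-aware caller: never requests a wait when running in an ISR. */
static int take_for_context(struct model_sem *sem, bool in_isr)
{
    return model_sem_take(sem, in_isr, in_isr ? MODEL_NO_WAIT : MODEL_FOREVER);
}
```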
Any update on this? Is there a plan to support coredump to flash with BT enabled?
Yes, user15146, has there been any follow-up?
Hi,
I'm checking with the developers.
Hi,
I got some feedback from the team. We have not heard of anyone using Zephyr's built-in coredump functionality on Nordic ICs (or Arm in general), though it does seem to be supported.
Most nRF Connect customers use Memfault instead. Would that be an option?
https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/libraries/others/memfault_ncs.html
(Otherwise we will investigate using Zephyr's coredump)
Memfault's integration supports storing coredumps in flash:
CONFIG_MEMFAULT_NCS_INTERNAL_FLASH_BACKED_COREDUMP
Note that Memfault has its own coredump handling; it does not use the upstream Zephyr implementation.
Thanks, user15146. We will go with Memfault.