Saving coredump to flash

I'm trying to enable saving a coredump to flash. I'm running on an nRF52840 with NRF SDK 1.6.1.

I have defined the following:

CONFIG_DEBUG_COREDUMP=y
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN=y

I set up a partition using pm_static.yml:

coredump_partition:
  address: 0xff000
  size: 0x1000
  region: flash_primary

and added the entry in the dts file under &flash0:

coredump_partition: partition@ff000 {
  label = "coredump_partition";
  reg = <0x000ff000 DT_SIZE_K(4)>;
};

I then added a crash shell command that triggers a crash using either k_oops(); or __asm__ volatile("udf #0" : : : );. In either case, I verified that a coredump is generated when using the logging backend.

However, when I tried to use BACKEND_FLASH_PARTITION, it first complained about not being able to find an MPSL timeslot. There were no instructions regarding MPSL, but I checked and saw that the default setting has 0 timeslots, so I set it to 1 timeslot. Now I get the following error when it tries to erase the flash partition in preparation for the coredump:

<err> flash_sync_mpsl: timeout
<err> coredump: Cannot start coredump!

I did verify that I can successfully erase the stored coredump using coredump erase:

< coredump erase
00> coredump erase
00> Stored coredump erased.

There seems to be some issue with accessing flash during a coredump. There are no examples specific to nrf52 anywhere.
  • Hi,

    Do you only see this issue on samples that use Bluetooth?

    Could you try setting these configs? (taken from this .config file)

    CONFIG_LOG=y
    CONFIG_LOG_MODE_MINIMAL=y
    CONFIG_DEBUG_COREDUMP=y
    CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
    CONFIG_MP_NUM_CPUS=1
    CONFIG_FLASH=y
    CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN=y
    CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM=n

  • Our application does use Bluetooth. I don't have a sample application without Bluetooth that I can try on our board or even the DK board.

    We already had most of these set. The only difference was LOG_MODE. I updated LOG_MODE to be minimal but that did not make any difference. Still getting:

    00> =====> Crashing system... (this is expected crash)
    00> E: r0/a1: 0x00000003 r1/a2: 0x20013680 r2/a3: 0x00000040
    00> E: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x0002532b
    00> E: xpsr: 0x41000000
    00> E: Faulting instruction address (r15/pc): 0x00042352
    00> E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    00> E: Current thread: 0x200031b0 (shell_rtt)

    // This is where it's trying to dump the core to flash but is failing
    00> E: timeout
    00> E: Cannot start coredump!

  • ST-Kon said:
    00> E: Cannot start coredump!

    I see this line is being printed here: https://github.com/nrfconnect/sdk-zephyr/blob/v2.7.99-ncs1/subsys/debug/coredump/coredump_backend_flash_partition.c#L339

    It seems that stream_flash_init() is returning an error code. Could you print the error code and post it here? For example, change the line to something like this to print the value:

    LOG_ERR("Cannot start coredump! return-value: %d",ret);

  • The code is ETIMEDOUT 116.

    Interestingly, if I disable Bluetooth:

    CONFIG_BT=n
    CONFIG_BT_PERIPHERAL=n
    CONFIG_BT_CTLR=n

    Then I no longer see the timeout errors. Instead, it complains about an invalid parameter. Tracing that, I found the cause is that I put the coredump partition at the end of flash:

    coredump_partition:
      address: 0xff000
      size: 0x1000
      region: flash_primary

    coredump_flash_backend_start() adds an offset for the header but does not reduce the size by the same amount. This causes stream_flash_init() to fail with an invalid parameter error because the start address (which now includes the offset) plus the partition size is past the end of flash.

    offset = backend_ctx.flash_area->fa_off;
    offset += ROUND_UP(sizeof(struct flash_hdr_t),
                       FLASH_WRITE_SIZE);

    After reducing the size by the offset (backend_ctx.flash_area->fa_size - 0x10), I do see the coredump being written to flash. However, printing the coredump fails because it fails the CRC check. I'm currently trying to figure out why the CRC in the header doesn't match the CRC computed on read.

    In any case, we need to figure out why there is a timeout if BT is enabled. A bug also needs to be filed to fix the size adjustment for the header offset.
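
To make the arithmetic behind the invalid parameter error concrete, here is a minimal standalone sketch. It is not the Zephyr code: range_fits() is a hypothetical stand-in for whatever bounds check rejects the stream, FLASH_TOTAL_SIZE assumes the 1 MB flash of the nRF52840, and 0x10 is the rounded-up header size used in the workaround above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define FLASH_TOTAL_SIZE  0x100000u  /* nRF52840: 1 MB of flash */
#define PARTITION_OFFSET  0xff000u   /* from pm_static.yml */
#define PARTITION_SIZE    0x1000u    /* from pm_static.yml */
#define HEADER_OFFSET     0x10u      /* rounded-up header size used above */

/* Hypothetical stand-in for a bounds check: does [start, start + size)
 * lie entirely inside the flash device? */
bool range_fits(uint32_t start, uint32_t size)
{
    return size <= FLASH_TOTAL_SIZE && start <= FLASH_TOTAL_SIZE - size;
}

/* Unpatched: start is advanced past the header, size is still the
 * full partition, so the range ends 0x10 bytes past the end of flash. */
bool unpatched_range_fits(void)
{
    return range_fits(PARTITION_OFFSET + HEADER_OFFSET, PARTITION_SIZE);
}

/* Patched: size is reduced by the same amount the start was advanced. */
bool patched_range_fits(void)
{
    return range_fits(PARTITION_OFFSET + HEADER_OFFSET,
                      PARTITION_SIZE - HEADER_OFFSET);
}
```

With these numbers the unpatched range ends at 0x100010, which is past 0x100000, while the patched range ends exactly at the end of flash. A partition that is not the last one in flash would pass the same check and could hide the problem.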
  • OK, the CRC error was caused by my partition size being 4k while the shell thread stack (where I was triggering the crash from) was also 4k. This caused an incomplete coredump to be written to flash. In that case the CRC is not calculated correctly; it's always off by 0x86. This should probably be fixed, even if an incomplete coredump is saved.

    Once I increased the partition to 8k, I was able to read the coredump successfully using coredump print.

    So we just need to figure out how to make this work with BT enabled.
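
As a rough sanity check for the partition sizing issue above, the sketch below tests a necessary (not sufficient) condition: the crashing thread's stack alone has to fit in what is left of the partition after the header. The function names are hypothetical, the 0x10 header offset is the value seen earlier in this thread, and a real dump also carries register and thread data that this does not account for.

```c
#include <stdbool.h>
#include <stdint.h>

#define COREDUMP_HEADER_OFFSET 0x10u  /* value seen earlier in this thread */

/* Bytes left for dump data once the header has been reserved. */
uint32_t coredump_usable_bytes(uint32_t partition_size)
{
    return partition_size > COREDUMP_HEADER_OFFSET
               ? partition_size - COREDUMP_HEADER_OFFSET
               : 0u;
}

/* Necessary condition only: the stack by itself must fit. Other dump
 * contents (registers, thread struct) need additional room. */
bool stack_can_fit(uint32_t partition_size, uint32_t stack_size)
{
    return stack_size <= coredump_usable_bytes(partition_size);
}
```

For a 4k stack, stack_can_fit(0x1000, 4096) is false and stack_can_fit(0x2000, 4096) is true, which matches the behavior seen with the 4k and 8k partitions.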
