HARD FAULT during xTaskResumeAll after ending a DFU session and disabling the SoftDevice

Hi all,

I am working on an nRF52840 chip with nRF5 SDK 17.0.2.

Our application runs smoothly, and we want to add the capability of upgrading another nRF52840 chip using the Nordic DFU service.

To do that, we stop our application's use of the RADIO, suspend most of our FreeRTOS tasks, and then call vTaskSuspendAll() to suspend the scheduler.

Then we enable the SoftDevice (as part of ble_stack_init()) and send the image to the remote nRF52840.

This works well.

When the DFU process finishes, we call nrf_sdh_disable_request() and wait until we know that the SoftDevice is disabled.

Then we resume our tasks and resume the scheduler by calling xTaskResumeAll().

The problem is that we get the following hard fault:

<error> hardfault: HARD FAULT at 0x00029350
<error> hardfault: R0: 0x00000A85 R1: 0x08F38168 R2: 0x00684088 R3: 0x0000000B
<error> hardfault: R12: 0x2000FE40 LR: 0x0002AEB7 PSR: 0x21000200
<error> hardfault: Cause: Data bus error (return address in the stack frame is not related to the instruction that caused the error).
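The cause text above describes an imprecise data bus error (the IMPRECISERR bit in the BusFault part of the Cortex-M4 CFSR register). The write buffer lets the core keep running after a faulting store, so the reported PC (0x00029350) and LR (0x0002AEB7) are not guaranteed to point at the access that failed. A minimal sketch of decoding those bits, assuming the standard ARMv7-M bit positions; the macro and function names are hypothetical and not part of the nRF5 SDK:

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M BusFault Status Register bits, as they appear inside CFSR (bits 8-15). */
#define CFSR_IBUSERR     (1u << 8)   /* Bus fault on instruction fetch */
#define CFSR_PRECISERR   (1u << 9)   /* Precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* Imprecise data bus error: stacked PC is not the culprit */
#define CFSR_UNSTKERR    (1u << 11)  /* Bus fault while unstacking on exception return */
#define CFSR_STKERR      (1u << 12)  /* Bus fault while stacking on exception entry */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting data address */

/* True when the stacked PC can be trusted to point at the faulting data access. */
bool bus_fault_is_precise(uint32_t cfsr)
{
    return (cfsr & CFSR_PRECISERR) != 0u && (cfsr & CFSR_IMPRECISERR) == 0u;
}

/* True when BFAR can be read to obtain the faulting data address. */
bool bus_fault_address_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0u;
}
```

While debugging, setting DISDEFWBUF in the auxiliary control register (with CMSIS: SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk; early in main()) disables write buffering so that such bus faults become precise and the reported address points at the offending store. It slows the CPU down, so it is meant for debugging only.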

The call stack is:

(call stack screenshot not available)

What am I doing wrong?

Thanks in advance for any assistance,

Rafalino

  • This is either memory corruption or an attempt to remove a timer that has already been removed.
    For the first suspicion, you can enable the stack overflow check to see whether any stack overflows are happening.

    For the second, you need to add more logs to find out whether the last command is an attempt to delete an already deleted timer.
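    Printing from the timer task changes timing, and with deferred logging the last entries may never reach the output before the hard fault. One alternative, sketched here under the assumption that you can call a hook at the same place as your log prints, is a small RAM ring buffer of the most recent timer commands that you inspect in the debugger after the fault. timer_trace_record() and the struct are hypothetical helpers, not FreeRTOS APIs:

```c
#include <stdint.h>

#define TIMER_TRACE_LEN 16u  /* Power of two keeps the wrap-around cheap */

/* One entry per timer command seen by prvProcessReceivedCommands(). */
typedef struct
{
    int32_t   message_id;  /* xMessage.xMessageID */
    uintptr_t timer;       /* Raw pxTimer value, stored as an integer and never dereferenced */
} timer_trace_entry_t;

/* volatile so the compiler keeps every write and the debugger sees the latest values. */
static volatile timer_trace_entry_t timer_trace[TIMER_TRACE_LEN];
static volatile uint32_t timer_trace_count;

/* Only the timer task calls this, so a single writer needs no locking. */
void timer_trace_record(int32_t message_id, const void *timer)
{
    uint32_t slot = timer_trace_count % TIMER_TRACE_LEN;
    timer_trace[slot].message_id = message_id;
    timer_trace[slot].timer      = (uintptr_t)timer;
    timer_trace_count++;
}

/* Returns the n-th most recent entry (0 = newest); meaningful for n < min(count, LEN). */
timer_trace_entry_t timer_trace_get(uint32_t n)
{
    uint32_t slot = (timer_trace_count - 1u - n) % TIMER_TRACE_LEN;
    timer_trace_entry_t e = { timer_trace[slot].message_id, timer_trace[slot].timer };
    return e;
}
```

    A call such as timer_trace_record(xMessage.xMessageID, pxTimer); placed where the print currently is would leave the history in timer_trace, ready to be read in a watch window once the hard fault handler is hit.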


  • Hello Susheel,

    I have added xMessageID to the prints in prvProcessReceivedCommands(), just before ( void ) uxListRemove( &( pxTimer->xTimerListItem ) );

    app: xMessage.xMessageID = 6, pxTimer=0x0x2000CE98 , pxTimer->pvTimerID=0x0x2002D2D8
    app: xMessage.xMessageID = 8, pxTimer=0x0x00000A81 , pxTimer->pvTimerID=0x0x1E200000

    The last two messages come from an ISR (message IDs 6 and 8).

    The problematic message is: #define tmrCOMMAND_STOP_FROM_ISR ( ( BaseType_t ) 8 )

    Is it possible that the SoftDevice has a timer ISR that triggers it, even though the SoftDevice is already disabled at that stage?

    Thanks in advance,

    Rafalino 
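    The numeric IDs in the log correspond to the tmrCOMMAND_* constants in FreeRTOS timers.h (the values below are those of FreeRTOS V10.x; check them against the copy shipped with your SDK). A small lookup makes the log self-describing. timer_cmd_name() and timer_cmd_from_isr() are hypothetical helpers; the FROM_ISR commands are the ones queued by the ...FromISR() API variants such as xTimerStopFromISR():

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the tmrCOMMAND_* values in FreeRTOS V10.x timers.h (verify against your copy). */
const char *timer_cmd_name(int32_t id)
{
    switch (id)
    {
        case -2: return "EXECUTE_CALLBACK_FROM_ISR";
        case -1: return "EXECUTE_CALLBACK";
        case 0:  return "START_DONT_TRACE";
        case 1:  return "START";
        case 2:  return "RESET";
        case 3:  return "STOP";
        case 4:  return "CHANGE_PERIOD";
        case 5:  return "DELETE";
        case 6:  return "START_FROM_ISR";
        case 7:  return "RESET_FROM_ISR";
        case 8:  return "STOP_FROM_ISR";
        case 9:  return "CHANGE_PERIOD_FROM_ISR";
        default: return "UNKNOWN";
    }
}

/* For timer commands (id >= 0): tmrFIRST_FROM_ISR_COMMAND is 6, so ids >= 6 came from an ISR.
 * The negative pended-function IDs are not covered by this check. */
bool timer_cmd_from_isr(int32_t id)
{
    return id >= 6;
}
```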

  • Rafalino said:
    app: xMessage.xMessageID = 8, pxTimer=0x0x00000A81 , pxTimer->pvTimerID=0x0x1E200000

    This address is not a valid RAM location, so I strongly suspect memory corruption. Since this is happening in the timer task context, I assume that your timer task stack size is too small.

    For testing, try increasing configTIMER_TASK_STACK_DEPTH in your FreeRTOSConfig.h file to a larger value.

    Also, as I suggested in my previous reply, enable stack overflow checks.

    Aryan said:
    You can enable stack overflow check to see if there are any stack overflows
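    The "not a valid RAM location" reasoning can be made mechanical. A minimal sketch, assuming the nRF52840 memory map (256 KB of data RAM starting at 0x20000000) and that timer control blocks live in RAM and are word aligned because they contain pointers; looks_like_ram_object() is a hypothetical helper. The pxTimer value 0x00000A81 from the log fails both the alignment and the range test:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* nRF52840 data RAM: 256 KB starting at 0x20000000. */
#define NRF52840_RAM_START 0x20000000u
#define NRF52840_RAM_END   0x20040000u  /* One past the last RAM byte */

/* True if [addr, addr + size) lies entirely in data RAM and addr is 4-byte aligned. */
bool looks_like_ram_object(uintptr_t addr, size_t size)
{
    if ((addr & 0x3u) != 0u)
    {
        return false;  /* Structures holding pointers are word aligned */
    }
    if (addr < NRF52840_RAM_START || addr >= NRF52840_RAM_END)
    {
        return false;  /* Flash, peripherals, or an unmapped address */
    }
    return size <= (size_t)(NRF52840_RAM_END - addr);
}
```

    Called just before uxListRemove() as looks_like_ram_object((uintptr_t)pxTimer, sizeof(*pxTimer)), a false result is a convenient place for a breakpoint.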

     

  • Hello Susheel,

    I increased configTIMER_TASK_STACK_DEPTH from 80 to 256, but I still get the same hard fault.

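    I then set

    #define configCHECK_FOR_STACK_OVERFLOW 1

    and added

    void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
    {
        ( void ) xTask;
        taskDISABLE_INTERRUPTS();          /* Stop further code from running on a corrupted stack */
        NRF_LOG_ERROR("%s: %s", __func__, pcTaskName);
        NRF_LOG_FINAL_FLUSH();             /* Deferred log entries are otherwise never printed */
        for (;;)
        {
            /* Halt here; inspect pcTaskName in the debugger */
        }
    }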

    But it didn't jump into the above hook, and I got the same hard fault.

    What are we missing?
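    A possible reason the hook stays silent: with configCHECK_FOR_STACK_OVERFLOW set to 1, FreeRTOS only checks the saved stack pointer when a task is switched out, so an overflow that has already unwound by then is missed. With the value 2, the kernel checks at each switch-out that the last 16 bytes of the task's stack still hold the 0xA5 fill pattern written at task creation. Both methods only cover task stacks; on the Cortex-M ports interrupt handlers run on the main stack (MSP), which neither method checks. The sketch below models the fill-pattern measurement on a plain array so the idea can be tested off-target; stack_unused_bytes() is a hypothetical name, and it assumes a descending stack whose lowest address is the overflow end, as on Cortex-M:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xA5u  /* Value FreeRTOS writes into new task stacks */

/* Counts untouched fill bytes from the low (overflow) end of a descending stack.
 * A result near 0 means the task came close to, or ran past, the end of its stack. */
size_t stack_unused_bytes(const uint8_t *stack_low, size_t stack_size)
{
    size_t unused = 0;
    while (unused < stack_size && stack_low[unused] == STACK_FILL_BYTE)
    {
        unused++;
    }
    return unused;
}
```

    On target, the equivalent query is uxTaskGetStackHighWaterMark(NULL) from inside a task (enabled with INCLUDE_uxTaskGetStackHighWaterMark set to 1); it reports the minimum remaining space in words, not bytes.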

  • Then it is possible that we are looking in the wrong direction with memory corruption. If this is not memory corruption caused by a stack overflow, then the application is somehow passing a wrong timer ID.

    You can add a small code snippet like the one below in prvProcessReceivedCommands(), just before uxListRemove():

    if (pxTimer->pvTimerID == (void *)0x1E200000)   /* Bad value seen in the log */
    {
        static volatile uint32_t counter = 0;
        counter++;     // <-- Put a breakpoint at this line
    }

    Compile and flash your code, then start it in the debugger. Put a breakpoint on the "counter++" line and run the application in an attempt to trigger the hard fault.

    The debugger should halt at the breakpoint, and the call stack should then let you browse through the functions that led to this breakpoint. Try to understand the context in which this value was passed to pvTimerID.

    Please note that there is a known bug in the libuarte library when it is used with FreeRTOS: the macros used to initialize libuarte instances initialize the app_timer_freertos instances incorrectly (after an incompatible cast). Please check whether you are affected by this.
