HARD FAULT during xTaskResumeAll after ending a DFU session and disabling the Softdevice

Hi all,

I am working on an nRF52840 with SDK 17.0.2.

Our application runs smoothly, and we want to add the capability of upgrading another nRF52840 using the Nordic DFU service.

To do that, we turn off our application's own RADIO usage, suspend most of our FreeRTOS tasks, and then call vTaskSuspendAll() to suspend the scheduler.

Then we enable the Softdevice (as part of ble_stack_init) and send the image to the remote nRF52840.

This works well.

When we finish the DFU process, we call nrf_sdh_disable_request() and wait until we know that the Softdevice is disabled.

Then we resume our tasks and try to resume the scheduler by calling xTaskResumeAll().

The problem is that we get the following hard fault:

<error> hardfault: HARD FAULT at 0x00029350
<error> hardfault: R0: 0x00000A85 R1: 0x08F38168 R2: 0x00684088 R3: 0x0000000B
<error> hardfault: R12: 0x2000FE40 LR: 0x0002AEB7 PSR: 0x21000200
<error> hardfault: Cause: Data bus error (return address in the stack frame is not related to the instruction that caused the error).

The call stack is:

What am I doing wrong?

Thanks in advance for any assistance,

Rafalino

  • Hi Ori, 

    Thanks for your patience.
    It looks to me like pxTimer might be pointing to an invalid timer instance here in uxListRemove.
    Can you add some logs in the timer code to print the pxTimer instances?

    This might not be directly related to DFU, but it might be related to the way you are suspending and resuming all tasks. My best guess right now is that the timer instance, or the list item within the timer, is somehow invalid or corrupted. If that is the case, we need to find out how and when that happened.

  • Hello Susheel,

    I have added prints in timers.c. At the end of xTimerCreate:

    NRF_LOG_INFO(" pxNewTimer=0x%p , pxTimer->pvTimerID=0x%p  ", pxNewTimer, pxNewTimer->pvTimerID);

    and in prvProcessReceivedCommands, just before the call to uxListRemove( &( pxTimer->xTimerListItem ) ):

    NRF_LOG_INFO(" pxTimer=0x%p , pxTimer->pvTimerID=0x%p  ", pxTimer, pxTimer->pvTimerID);

    This is what we see (%p already prints a 0x prefix, hence the doubled 0x0x):

     <info> app: pxNewTimer=0x0x2000CE40 , pxTimer->pvTimerID=0x0x200169C8
     <info> app: pxNewTimer=0x0x2000D0C0 , pxTimer->pvTimerID=0x0x200168F0
     <info> app: pxNewTimer=0x0x2000D0F0 , pxTimer->pvTimerID=0x0x2002D310
     <info> app: pxNewTimer=0x0x2000D120 , pxTimer->pvTimerID=0x0x2002D330


     <info> app: pxTimer=0x0x2000D120 , pxTimer->pvTimerID=0x0x2002D330
     <info> app: pxTimer=0x0x2000D120 , pxTimer->pvTimerID=0x0x2002D330
     <info> app: pxTimer=0x0x2000D0F0 , pxTimer->pvTimerID=0x0x2002D310

    Just before the hard fault:
     <info> app: pxTimer=0x0x00000A81 , pxTimer->pvTimerID=0x0x1E200000

    We see that all the timers are at addresses between 0x2000CE40 and 0x2000D120,

    while the timer that we need to remove is at the weird address 0x00000A81.

    What could cause this corruption?

  • Either this is memory corruption, or it is an attempt to remove a timer that has already been removed.
    You can enable stack overflow check to see if there are any stack overflows happening, which would point to the first cause.

    To check the second cause, add more logs to find out whether the last command is an attempt to delete a timer that was already deleted.
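    One way to test the second hypothesis is a small debug aid that remembers recently deleted timer handles and checks each incoming command against them. All names below are hypothetical (nothing here is part of FreeRTOS), and the hook points are assumptions: dbg_timer_mark_deleted() in the tmrCOMMAND_DELETE branch of prvProcessReceivedCommands, dbg_timer_mark_created() at the end of xTimerCreate, and dbg_timer_was_deleted() just before uxListRemove(). The sketch is not thread-safe; if the hooks run from several tasks, wrap them in taskENTER_CRITICAL()/taskEXIT_CRITICAL().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug aid, not part of FreeRTOS: a small ring buffer of the
 * most recently deleted timer handles. Older entries are overwritten. */
#define DBG_DELETED_TIMER_SLOTS 16u

static const void *s_deleted_timers[DBG_DELETED_TIMER_SLOTS];
static size_t s_next_slot = 0u;

/* Record a handle when the timer task processes a delete command. */
void dbg_timer_mark_deleted(const void *timer)
{
    s_deleted_timers[s_next_slot] = timer;
    s_next_slot = (s_next_slot + 1u) % DBG_DELETED_TIMER_SLOTS;
}

/* Forget a handle when a new timer is created at the same address, because
 * the heap may hand a freed block back out for the new timer. */
void dbg_timer_mark_created(const void *timer)
{
    for (size_t i = 0u; i < DBG_DELETED_TIMER_SLOTS; i++)
    {
        if (s_deleted_timers[i] == timer)
        {
            s_deleted_timers[i] = NULL;
        }
    }
}

/* True if a command refers to a handle that was deleted and not re-created. */
bool dbg_timer_was_deleted(const void *timer)
{
    if (timer == NULL)
    {
        return false;
    }
    for (size_t i = 0u; i < DBG_DELETED_TIMER_SLOTS; i++)
    {
        if (s_deleted_timers[i] == timer)
        {
            return true;
        }
    }
    return false;
}
```

    If dbg_timer_was_deleted() returns true for the command logged just before the fault, the "already removed timer" explanation becomes the more likely one; if it stays false, memory corruption remains the main suspect.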


  • Hello Susheel,

    I have added xMessageID to the prints in prvProcessReceivedCommands, just before the call to uxListRemove( &( pxTimer->xTimerListItem ) ):

    app: xMessage.xMessageID = 6, pxTimer=0x0x2000CE98 , pxTimer->pvTimerID=0x0x2002D2D8
    app: xMessage.xMessageID = 8, pxTimer=0x0x00000A81 , pxTimer->pvTimerID=0x0x1E200000

    The last two messages come from an ISR (message IDs 6 and 8).

    The problematic message is tmrCOMMAND_STOP_FROM_ISR:

    #define tmrCOMMAND_STOP_FROM_ISR ( ( BaseType_t ) 8 )
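    To make logs like these easier to read, the command ID can be printed by name. The helper below is hypothetical; its values mirror the tmrCOMMAND_* defines in the timers.h of FreeRTOS V10.x (assumed to be the version shipped with this SDK), so compare them against your copy. It returns string literals, which is what the deferred NRF_LOG backend needs for %s arguments.

```c
/* Hypothetical helper: map a FreeRTOS software-timer command ID to a name.
 * Values mirror the tmrCOMMAND_* defines in FreeRTOS V10.x timers.h; verify
 * them against the timers.h in your SDK before relying on the output.
 * BaseType_t is a long on the Cortex-M ports, hence the parameter type. */
const char *timer_cmd_name(long id)
{
    switch (id)
    {
        case -2: return "EXECUTE_CALLBACK_FROM_ISR";
        case -1: return "EXECUTE_CALLBACK";
        case 0:  return "START_DONT_TRACE";
        case 1:  return "START";
        case 2:  return "RESET";
        case 3:  return "STOP";
        case 4:  return "CHANGE_PERIOD";
        case 5:  return "DELETE";
        case 6:  return "START_FROM_ISR";   /* first "from ISR" command */
        case 7:  return "RESET_FROM_ISR";
        case 8:  return "STOP_FROM_ISR";
        case 9:  return "CHANGE_PERIOD_FROM_ISR";
        default: return "UNKNOWN";
    }
}
```

    For example, NRF_LOG_INFO("cmd %s, pxTimer=%p", timer_cmd_name(xMessage.xMessageID), pxTimer) would print the two problem lines above as START_FROM_ISR and STOP_FROM_ISR.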

    Is there a chance that the Softdevice has a timer ISR that triggers it, even though the Softdevice is already disabled at that stage?
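    One way to answer this is to record which exception was active when the stop command was queued. A hypothetical approach: in a debug wrapper around xTimerStopFromISR(), read IPSR with the CMSIS intrinsic __get_IPSR() and decode it with a helper like the one below. On Cortex-M4, IPSR bits [8:0] hold the active exception number, and external IRQ n is exception 16 + n. The IRQ numbers for RTC0, RTC1 and RTC2 (11, 17 and 36) come from the nRF52840 interrupt table; the labels assume the Softdevice owns RTC0 and the SDK's FreeRTOS port uses RTC1 for the tick.

```c
#include <stdint.h>

/* Hypothetical debug helper: label the exception that was active when a
 * timer command was queued. Pass the IPSR value captured at that point
 * (CMSIS: __get_IPSR()). IRQ numbers are from the nRF52840 interrupt table. */
const char *isr_source_name(uint32_t ipsr)
{
    uint32_t exception = ipsr & 0x1FFu;   /* IPSR[8:0] = active exception number */

    if (exception == 0u)
    {
        return "thread mode";              /* not called from an ISR at all */
    }
    if (exception < 16u)
    {
        return "system exception";         /* SysTick, PendSV, SVCall, faults */
    }

    switch (exception - 16u)               /* external IRQ n is exception 16 + n */
    {
        case 11u: return "RTC0 (reserved by the Softdevice)";
        case 17u: return "RTC1 (FreeRTOS tick in the SDK port)";
        case 36u: return "RTC2";
        default:  return "other peripheral IRQ";
    }
}
```

    Logging isr_source_name(__get_IPSR()) next to the timer handle at the call site would show whether the bad STOP_FROM_ISR comes from an RTC interrupt or from one of the application's own peripherals.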

    Thanks in advance,

    Rafalino 

  • Rafalino said:
    app: xMessage.xMessageID = 8, pxTimer=0x0x00000A81 , pxTimer->pvTimerID=0x0x1E200000

    This address is not a valid RAM location, so I strongly suspect memory corruption. Since this is happening in the timer task context, I am assuming that your timer task stack is too small.
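    A quick way to catch the bad pointer closer to its origin is a range check against the nRF52840 data RAM (256 KB starting at 0x20000000). The helper name is hypothetical, and where to call it is an assumption: for example, configASSERT( addr_in_nrf52840_ram( ( uintptr_t ) xTimer ) ) at the top of xTimerGenericCommand() in timers.c (which xTimerStopFromISR() expands to in FreeRTOS V10) would stop in the context that queued the bad handle, rather than later in the timer task.

```c
#include <stdbool.h>
#include <stdint.h>

/* nRF52840 data RAM: 256 KB starting at 0x20000000. Adjust for other parts. */
#define NRF52840_RAM_START 0x20000000u
#define NRF52840_RAM_SIZE  0x00040000u

/* Hypothetical helper: true if addr lies inside data RAM, i.e. it could
 * plausibly be a heap-allocated FreeRTOS object such as a timer. */
static inline bool addr_in_nrf52840_ram(uintptr_t addr)
{
    return (addr >= NRF52840_RAM_START) &&
           (addr < NRF52840_RAM_START + NRF52840_RAM_SIZE);
}
```

    With the values from the logs, 0x2000D120 passes the check and 0x00000A81 fails it.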

    Try increasing configTIMER_TASK_STACK_DEPTH in your FreeRTOSConfig.h file to a larger value for testing.

    Also, as I suggested in my previous reply, enable stack overflow checks.

    Aryan said:
    You can enable stack overflow check to see if there are any stack overflows
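    For reference, a FreeRTOSConfig.h fragment for a debug build might look like the one below (replace the existing definitions rather than adding duplicates). Method 1 only checks that the saved stack pointer is still inside the task's stack when the task is switched out; method 2 instead checks that the last 16 bytes of the stack still contain the fill pattern written at task creation, which also catches overflows where the stack pointer has since moved back into range. Both methods run only at context switches and cover task stacks only, not the main stack used by interrupt handlers, which matters here because the bad command is queued from an ISR. The value 256 is only a test value.

```c
/* FreeRTOSConfig.h, debug build only */

/* 2 = at every context switch, check that the last 16 bytes of each task
 * stack still hold the fill pattern (1 only checks the stack pointer).
 * Requires vApplicationStackOverflowHook() to be defined. */
#define configCHECK_FOR_STACK_OVERFLOW    2

/* Timer task stack, in words (StackType_t), temporarily enlarged for testing. */
#define configTIMER_TASK_STACK_DEPTH      ( 256 )
```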

     

  • Hello Susheel,

    I have increased configTIMER_TASK_STACK_DEPTH from 80 to 256 but still got the same hard fault.

    I then set

    #define configCHECK_FOR_STACK_OVERFLOW    1

    and added

    void vApplicationStackOverflowHook( TaskHandle_t pxTask, char *pcTaskName )
    {
        ( void ) pxTask;
        while (1)
        {
            NRF_LOG_INFO(" %s %s", __FUNCTION__, pcTaskName);
        }
    }

    But it never entered the hook above, and I got the same hard fault.

    What are we missing?
