Hard fault from nrf_log_frontend_dequeue()

Hi,

I'm getting a hard fault that I really need some help to debug. My call stack is shown below and includes nrf_log_frontend_dequeue() as well as a custom service function which calls sd_ble_gatts_hvx().

I checked the CFSR upon the fault following this guide and the only error flag set is for the following:

"IACCVIOL - Indicates that an attempt to execute an instruction triggered an MPU or Execute Never (XN) fault."
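
For reference, a minimal sketch of how a raw CFSR value could be decoded into flag names. The bit positions follow the ARMv7-M architecture (the CFSR sits at 0xE000ED28, with IACCVIOL as bit 0). The helper name cfsr_describe and the table layout are made up for illustration and are not part of the nRF5 SDK.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* One CFSR flag: bit mask plus its architectural name. */
typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

/* ARMv7-M CFSR layout: MMFSR in bits 0-7, BFSR in bits 8-15, UFSR in bits 16-31. */
static const cfsr_flag_t cfsr_flags[] = {
    { 1u << 0,  "IACCVIOL"    },
    { 1u << 1,  "DACCVIOL"    },
    { 1u << 3,  "MUNSTKERR"   },
    { 1u << 4,  "MSTKERR"     },
    { 1u << 5,  "MLSPERR"     },
    { 1u << 7,  "MMARVALID"   },
    { 1u << 8,  "IBUSERR"     },
    { 1u << 9,  "PRECISERR"   },
    { 1u << 10, "IMPRECISERR" },
    { 1u << 11, "UNSTKERR"    },
    { 1u << 12, "STKERR"      },
    { 1u << 13, "LSPERR"      },
    { 1u << 15, "BFARVALID"   },
    { 1u << 16, "UNDEFINSTR"  },
    { 1u << 17, "INVSTATE"    },
    { 1u << 18, "INVPC"       },
    { 1u << 19, "NOCP"        },
    { 1u << 24, "UNALIGNED"   },
    { 1u << 25, "DIVBYZERO"   },
};

/* Hypothetical helper: writes the space-separated names of all set flags
 * into out and returns how many flags were written. */
static int cfsr_describe(uint32_t cfsr, char *out, size_t out_len)
{
    int    count = 0;
    size_t used  = 0;

    if (out_len > 0) {
        out[0] = '\0';
    }
    for (size_t i = 0; i < sizeof(cfsr_flags) / sizeof(cfsr_flags[0]); i++) {
        if ((cfsr & cfsr_flags[i].mask) == 0u) {
            continue;
        }
        int n = snprintf(out + used, out_len - used, "%s%s",
                         count ? " " : "", cfsr_flags[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            break; /* output buffer too small, stop here */
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

On target, this could be called from the fault handler as cfsr_describe(SCB->CFSR, buf, sizeof(buf)), where SCB->CFSR is the CMSIS accessor for the register.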

This error occurs in two cases:

(1) immediately upon connection with the peripheral device, but only (and this is the strange part) if the SAADC acquisition time is set to 1us or 5us (e.g. NRF_SAADC_ACQTIME_1US)

(2) rarely/sporadically if SAADC acquisition time is set to anything >5us

One connection is that my custom function ble_sws_meas_send is called by the SAADC callback. But the frequency of SAADC reads is set independently of acquisition time, so I don't know why this would have an effect. 

Thanks in advance!

  • Hi,

    Can you try to enable the HardFault handling library in your application, to see if this provides some more details about where/why the hardfault occurs?

    What happens inside ble_sws_meas_send()?

    (1) immediately upon connection with the peripheral device, but only (and this is the strange part) if the SAADC acquisition time is set to 1us or 5us (e.g. NRF_SAADC_ACQTIME_1US)

    (2) rarely/sporadically if SAADC acquisition time is set to anything >5us

    There are a few errata related to the acquisition time of the SAADC peripheral on nRF52832, but I can't see why any of these would cause a hardfault. What is the sample rate of the SAADC, and how large are the buffers you use?

    Are you seeing the hardfault if you do not enable BLE/connections in your application, or if you do not call ble_sws_meas_send()?

    Best regards,
    Jørgen

  • Here is the output on the debug terminal after enabling the HardFault handling library:

    <info> app: 141, 29979, 29984 [This is an application message indicating completion of 100 ADC measurements]
    <info> app: Connected. [This is an application message indicating connection to the central device]
    <info> app: 142, 29981, 29985 [This is an application message indicating completion of 100 ADC measurements]
    <error> hardfault: HARD FAULT at 0x0C010107
    <error> hardfault: R0: 0x00000000 R1: 0x00000000 R2: 0xE000E100 R3: 0x00000000
    <error> hardfault: R12: 0x200044D2 LR: 0xDC008E01 PSR: 0x0000051E
    <error> hardfault: Cause: The processor has attempted an illegal load of EXC_RETURN to the PC, as a result of an invalid context, or an invalid EXC_RETURN value.
    <error> app: Fatal error

    The ble_sws_meas_send function first encodes the measurement into a buffer for the BLE data packet, and then queues the packet for transmission with sd_ble_gatts_hvx(p_sws->conn_handle, &hvx_params), which is where the error seems to occur, according to the call stack.

    The SAADC sample rate is 2 kHz. It scans two channels and effectively oversamples them. The buffer holds two readings, doubled for double-buffering. The SAADC callback therefore runs at 2 kHz, and after every 100 measurements (for oversampling) ble_sws_meas_send is called to transmit one oversampled reading.
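
    The flow described above can be sketched as follows. This is a simplified, host-runnable illustration and not the actual application code: oversampler_t, oversampler_push, NUM_CHANNELS, and OVERSAMPLE_COUNT are illustrative names, and int16_t stands in for nrf_saadc_value_t. In the real application the SAADC event handler would feed each completed buffer into something like this and call ble_sws_meas_send when a result is ready.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NUM_CHANNELS      2    /* two scanned channels, as described above */
    #define OVERSAMPLE_COUNT  100  /* readings averaged per transmitted value  */

    /* Running sums for the current oversampling window. */
    typedef struct {
        int32_t  sum[NUM_CHANNELS];
        uint32_t count;
    } oversampler_t;

    /* Feed one completed SAADC buffer (one reading per channel).
     * Returns true and fills avg_out once OVERSAMPLE_COUNT buffers have
     * been accumulated; the caller would then queue the BLE notification. */
    static bool oversampler_push(oversampler_t *os,
                                 const int16_t sample[NUM_CHANNELS],
                                 int16_t avg_out[NUM_CHANNELS])
    {
        for (size_t ch = 0; ch < NUM_CHANNELS; ch++) {
            os->sum[ch] += sample[ch];
        }
        os->count++;
        if (os->count < OVERSAMPLE_COUNT) {
            return false;
        }

        /* Window complete: emit the averages and reset for the next window. */
        for (size_t ch = 0; ch < NUM_CHANNELS; ch++) {
            avg_out[ch] = (int16_t)(os->sum[ch] / OVERSAMPLE_COUNT);
            os->sum[ch] = 0;
        }
        os->count = 0;
        return true;
    }
    ```

    At 2 kHz with a window of 100 buffers, this yields one transmitted reading every 50 ms.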

    The hardfault does not occur without BLE connection - before connecting the central device, the ADC measurements run fine with no faults.  

  • It looks like the application is hardfaulting due to an invalid execution address (PC=0x0C010107). Something in your application must have made the application jump to this address.

    Do you use static buffers for the SAADC sampling, or could it be some stack overflow causing the problem?
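
    As a rough aid, the stacked registers from the hardfault dump can be sanity-checked with a sketch like the one below. It assumes the nRF52832 variant with 512 kB of flash starting at 0x00000000 and that all code runs from flash (adjust for other variants or RAM-resident code). The helper names are hypothetical. The only architectural fact used is that bit 24 of the stacked xPSR (the Thumb bit) is always set for code running on a Cortex-M.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define FLASH_START  0x00000000u
    #define FLASH_SIZE   0x00080000u   /* 512 kB: assumption, see text above */
    #define XPSR_T_BIT   (1u << 24)    /* Thumb state bit in the stacked xPSR */

    /* True if the stacked PC points into the assumed flash range. */
    static bool pc_in_flash(uint32_t pc)
    {
        return (pc - FLASH_START) < FLASH_SIZE;
    }

    /* True if the stacked xPSR has the Thumb bit set, as it must on Cortex-M. */
    static bool xpsr_thumb_bit_set(uint32_t xpsr)
    {
        return (xpsr & XPSR_T_BIT) != 0u;
    }

    /* Heuristic only: a frame failing either check is unlikely to be genuine. */
    static bool stacked_frame_plausible(uint32_t pc, uint32_t xpsr)
    {
        return pc_in_flash(pc) && xpsr_thumb_bit_set(xpsr);
    }
    ```

    With the values from the log above (PC 0x0C010107, PSR 0x0000051E) both checks fail, which would be consistent with a corrupted stack frame rather than a deliberate branch. This is only a heuristic and does not identify what overwrote the frame.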

    Can you post the exact code that is running when the hardfault occurs?

  • There is a known bug in nrf_log_frontend_dequeue() which may or may not be an issue in this case; here is a link to the discussion:

    bool nrf_log_frontend_dequeue(void)
    {
        if (buffer_is_empty())
        {
            return false;
        }
        // Note also add atomic flag set before this __DSB() and after this function exits
        // See https://devzone.nordicsemi.com/f/nordic-q-a/39188/nrf_log_frontend_dequeue-must-be-atomically-protected-against-re-entry-from-interrupt-context
        if (nrf_atomic_flag_set_fetch(&m_log_data.log_is_busy))
        {
            return false;
        }
    ...

    nrf_log_frontend_dequeue-must-be-atomically-protected-against-re-entry-from-interrupt-context

    Thanks for the comment, but as I'm using SDK 17.0.2 (>15.2.0) I think I have an unrelated bug.

    I am using a static SAADC double buffer, shown below. Should it not be static? (I did try removing the static keyword and got the same error.)

    static nrf_saadc_value_t     m_fr_buffer_pool[2][CHEM_BUFFER_NUM];

    The buffer is initialized as follows:

    err_code = nrf_drv_saadc_buffer_convert(m_fr_buffer_pool[0], CHEM_BUFFER_NUM);
    APP_ERROR_CHECK(err_code);

    err_code = nrf_drv_saadc_buffer_convert(m_fr_buffer_pool[1], CHEM_BUFFER_NUM);
    APP_ERROR_CHECK(err_code);

    As for the specific code running during the fault: when it breaks, the call stack points to code for 3 unknown functions, the most recent of which is at 0xA60. (This differs from what I described in my original post, but there have been some smaller unrelated changes to the codebase since, and it still faults immediately upon Bluetooth connection as before.)

    4770 bx lr
    4B01 ldr r3, [pc, #4]          <- 0xA60
    681B ldr r3, [r3]
    68DB ldr r3, [r3, #12]
    4718 bx r3
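
    Read literally, the four instructions after the bx lr load a word from a PC-relative literal, dereference it, load the word at offset 12 from the result, and branch there, i.e. an indirect jump through a table. The sketch below models that sequence on the host. The mock memory, read_word, and resolve_branch_target are invented for illustration. For the ldr at 0xA60 the literal would sit at 0xA68 (aligned PC of 0xA64, plus 4). Offset 12 (0x0C) is also the HardFault slot in a Cortex-M vector table, so one possible reading, not confirmed in this thread, is that this is a stub forwarding the fault to a handler in another vector table.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* One word of mock memory: address and the 32-bit value stored there. */
    typedef struct {
        uint32_t addr;
        uint32_t value;
    } mem_word_t;

    /* Look up a word in the mock memory; unmapped addresses read as 0
     * (mock behavior only, real hardware would fault or return bus data). */
    static uint32_t read_word(const mem_word_t *mem, size_t n, uint32_t addr)
    {
        for (size_t i = 0; i < n; i++) {
            if (mem[i].addr == addr) {
                return mem[i].value;
            }
        }
        return 0u;
    }

    /* Mirrors the disassembly; literal_addr is the address of the
     * PC-relative literal used by the first load. */
    static uint32_t resolve_branch_target(const mem_word_t *mem, size_t n,
                                          uint32_t literal_addr)
    {
        uint32_t r3 = read_word(mem, n, literal_addr); /* ldr r3, [pc, #4]  */
        r3 = read_word(mem, n, r3);                    /* ldr r3, [r3]      */
        r3 = read_word(mem, n, r3 + 12u);              /* ldr r3, [r3, #12] */
        return r3;                                     /* bx  r3            */
    }
    ```

    If any word along that chain is corrupted, the final bx lands on whatever value was read, so inspecting the three loads in the debugger shows where the bad target comes from.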

    Thanks.

    Noelle
