Hello,
First, let me say I'm new to the Nordic tools and SDK. I have been lurking on this forum for a while and it has helped solve a lot of my issues in the past. I couldn't find anything like the issue I'm currently having, so I apologize if this has come up before.
I'm working on a custom board that is using the nRF52832 and a couple of sensors on the SPI bus to collect data. I started with the BLE UART example project and successfully integrated the buttonless DFU functionality as well as the NRFX_SPIM module. Additionally I stripped out all references (that I could find) to the dev kit buttons and LEDs.
I have spent the last few days trying to track down an issue with the app_timer but I'm having a hard time pinning it down so I figured I would ask some of you experts to see if you had any insight.
The project uses the app_timer2 module to set up a few timers that periodically request data from the two SPI bus sensors. There are a total of 3 app timer instances:
1 single-shot timer - started at SPI transfer start and stopped in the SPI complete handler (used for tracking timeouts in the SPI transfer)
1 repeating timer (timeout of 10 ms) - used to request data from sensor 1
1 repeating timer (timeout of 1 ms) - used to request data from sensor 2
I use an SPI message queue: when one of the repeating timers fires, the timer IRQ function puts an SPI message in the queue and exits. Separately, each time the SPI IRQ function is called, it pulls the next message from the queue and starts the next SPI transfer.
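The queue pattern described above can be sketched roughly as follows. This is a simplified illustration, not the actual project code: the names spi_msg_t, spi_msg_queue_t, spi_msg_queue_push() and spi_msg_queue_pop() are hypothetical, and the message payload is reduced to a sensor id. The depth of 10 matches the queue size mentioned later in this post.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPI_MSG_QUEUE_DEPTH 10u  /* assumed depth, matching the 10-deep queue in this post */

/* Hypothetical message descriptor; a real one would carry buffers, lengths, etc. */
typedef struct
{
    uint8_t sensor_id;
} spi_msg_t;

/* Fixed-size ring buffer of pending SPI messages. */
typedef struct
{
    spi_msg_t slots[SPI_MSG_QUEUE_DEPTH];
    size_t    head;   /* index of the oldest queued message */
    size_t    count;  /* number of messages currently queued */
} spi_msg_queue_t;

/* Producer side (timer handler). Returns false when the queue is full. */
static bool spi_msg_queue_push(spi_msg_queue_t * q, spi_msg_t msg)
{
    if (q->count == SPI_MSG_QUEUE_DEPTH)
    {
        return false;
    }
    q->slots[(q->head + q->count) % SPI_MSG_QUEUE_DEPTH] = msg;
    q->count++;
    return true;
}

/* Consumer side (SPI complete handler). Returns false when nothing is queued. */
static bool spi_msg_queue_pop(spi_msg_queue_t * q, spi_msg_t * out)
{
    if (q->count == 0)
    {
        return false;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % SPI_MSG_QUEUE_DEPTH;
    q->count--;
    return true;
}
```

On the target, push and pop run in different interrupt contexts, so if those interrupts have different priorities the updates to head and count would need to be protected by a critical section. The sketch leaves that out.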
After setting everything up, the main program loop just sits back and runs the following:
if (NRF_LOG_PROCESS() == false)
{
    nrf_pwr_mgmt_run();
}
For the most part this system works very well. My SPI message queue never really gets much above 3 deep during operation, because the SPI transactions complete quickly compared to the rate at which new SPI messages are put in the queue.

The image above shows one sensor sampling at 1 ms intervals and the other at 10 ms intervals, as expected.
Now this is where the strangeness happens: every so often my SPI message queue fills completely (10 deep). I tracked this down to the app timer instances that periodically add messages to the queue. What seems to happen is that I get no timer IRQ calls for about 1 second. After that second, all the missed calls seem to arrive at once. This completely fills my SPI message buffer, at which point my app hits a breakpoint because the message queue is full.
The time it takes for this condition to occur seems almost random. Sometimes it happens very quickly after boot; other times it can run for 30+ minutes without ever showing the issue (possible race condition?).
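One way to catch the stall at the moment it ends, rather than when the queue overflows, is to measure the gap between consecutive timer callbacks. The following is a minimal sketch under stated assumptions: the names rtc_ticks_elapsed(), gap_monitor_t and gap_monitor_update() are hypothetical (not part of the SDK), and the counter is assumed to be the 24-bit RTC value, which on the target could be read with app_timer_cnt_get().

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_COUNTER_MASK 0x00FFFFFFu  /* the nRF52 RTC counter is 24 bits wide */

/* Ticks elapsed from `from` to `to`, tolerating a single counter wrap. */
static uint32_t rtc_ticks_elapsed(uint32_t to, uint32_t from)
{
    return (to - from) & RTC_COUNTER_MASK;
}

typedef struct
{
    uint32_t last_ticks;     /* counter value at the previous callback */
    uint32_t max_gap_ticks;  /* largest gap observed so far */
    bool     primed;         /* false until the first callback has been seen */
} gap_monitor_t;

/* Call at the top of a repeating timer handler with the current counter value.
 * Returns true when the gap since the previous call exceeds limit_ticks. */
static bool gap_monitor_update(gap_monitor_t * m, uint32_t now_ticks, uint32_t limit_ticks)
{
    if (!m->primed)
    {
        m->last_ticks = now_ticks;
        m->primed     = true;
        return false;
    }

    uint32_t gap = rtc_ticks_elapsed(now_ticks, m->last_ticks);
    m->last_ticks = now_ticks;

    if (gap > m->max_gap_ticks)
    {
        m->max_gap_ticks = gap;
    }
    return gap > limit_ticks;
}
```

For the 1 ms timer, a limit of roughly two periods (about 66 ticks at 32768 Hz) would flag the first callback after a stall, which is a convenient place for a breakpoint or a log line.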

Debug steps I have already looked at:
I have changed APP_TIMER_CONFIG_RTC_FREQUENCY a few times to see if it had an effect on the length of the pause, and it doesn't seem to (always a pause of ~1 second).
static bool rtc_schedule(app_timer_t * p_timer, bool * p_rerun)
{
    ret_code_t ret = NRF_ERROR_TIMEOUT;
    *p_rerun = false;
    /* In case timer got stopped in between, end_val will be very far in the
     * future. RTC will be reconfigured on the next iteration.
     */
    uint64_t end_val = p_timer->end_val;
    int64_t remaining = (int64_t)(end_val - get_now());
    if (remaining > 0) {
        uint32_t cc_val = ((uint32_t)remaining > APP_TIMER_RTC_MAX_VALUE) ?
            (app_timer_cnt_get() + APP_TIMER_RTC_MAX_VALUE) : end_val;
        ret = drv_rtc_windowed_compare_set(&m_rtc_inst, 0, cc_val, APP_TIMER_SAFE_WINDOW);
        NRF_LOG_DEBUG("Setting CC to 0x%08x (err: %d)", cc_val & DRV_RTC_MAX_CNT, ret);
        if (ret == NRF_SUCCESS)
        {
            return true;
        }
    }
    else
    {
        drv_rtc_compare_disable(&m_rtc_inst, 0);
    }
    if (ret == NRF_ERROR_TIMEOUT)
    {
        *p_rerun = timer_expire(p_timer);
    }
    else
    {
        NRF_LOG_ERROR("Unexpected error: %d", ret);
        ASSERT(0);
    }
Additionally, when this condition occurs and I hit my breakpoint, I examined the "remaining" variable inside "rtc_schedule()", and it holds a value very close to the amount of time that I was not getting app_timer interrupts (~1 second).
Another thought I had was that maybe my program was stuck in some other IRQ during that one second, but this didn't seem to be the case. I managed to manually pause the debugger within that 1-second window, when I noticed on the logic analyzer that my SPI traffic had timed out, and the program was simply sitting in the main loop running nrf_pwr_mgmt_run().
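When reading "remaining" in the debugger, it helps to convert between RTC ticks and milliseconds. A minimal sketch, assuming the RTC is driven by the 32768 Hz low-frequency clock and that APP_TIMER_CONFIG_RTC_FREQUENCY acts as a prescaler value (tick rate = 32768 / (value + 1)); the helper names ms_to_ticks() and ticks_to_ms() are hypothetical and not part of the SDK:

```c
#include <stdint.h>

#define LFCLK_HZ 32768u  /* assumed low-frequency clock rate feeding the RTC */

/* Milliseconds to RTC ticks, rounded to nearest.
 * `prescaler` is the assumed meaning of APP_TIMER_CONFIG_RTC_FREQUENCY. */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t prescaler)
{
    uint64_t num = (uint64_t)ms * LFCLK_HZ;
    uint64_t den = 1000u * ((uint64_t)prescaler + 1u);
    return (uint32_t)((num + den / 2u) / den);
}

/* RTC ticks to milliseconds, rounded to nearest. */
static uint32_t ticks_to_ms(uint64_t ticks, uint32_t prescaler)
{
    uint64_t num = ticks * 1000u * ((uint64_t)prescaler + 1u);
    return (uint32_t)((num + LFCLK_HZ / 2u) / LFCLK_HZ);
}
```

Under these assumptions, with a prescaler of 0 the 1 ms timeout rounds to 33 ticks and the 10 ms timeout to 328 ticks, and a "remaining" value near 32768 ticks corresponds to about 1 second.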
I also enabled the profiler and checked app_timer_op_queue_utilization_get(); it never seems to get over 4.
Any help or insight would be greatly appreciated. Thanks in advance!
Project Details:
Custom board using the nRF52832 IC
SDK: v17.1.0
Softdevice: v7.2.0