Peripheral missing disconnect on timeout

When communication fails on the nRF connectivity dongle (driven from Python via pc_ble_driver), write attempts return the error code NRF_ERROR_BUSY. If the Python script is then restarted, a timeout appears to occur in the peripheral: after a delay, the connection changes state from BT_CONN_CONNECTED to BT_CONN_DISCONNECT_COMPLETE.

This does not trigger a callback, so the application does not know to call bt_conn_unref(), and advertising is therefore not resumed.

The system runs Zephyr on nRF Connect SDK 1.8.0 as a peripheral only.

  • So far it has been difficult to strip it down to a working example.
    The basic premise is apparently starting new connections from the Python side while a connection attempt is already in progress, in such a way that the dongle ends up in a "busy" state. When the dongle is reset later, it sometimes leaves the peripheral in an odd state where it doesn't disconnect properly?

    Maybe we can share project access, but it would be some work to make it run without the external hardware that is integrated into the code. In theory a blank NUS example with some high traffic should be all it takes while trying to "pester" it with connections, but the sample does not build for me and I did not have time to dig into it.

  • Kyrre Aalerud said:
    When reset later it sometimes gets the peripheral in an odd state where it doesn't disconnect properly?

    Maybe a Bluetooth sniffer trace of the issue could help give some clues about what is needed to trigger this issue. See: https://www.nordicsemi.com/Products/Development-tools/nRF-Sniffer-for-Bluetooth-LE

    It's a long shot, but maybe the fix in this PR could help: https://github.com/zephyrproject-rtos/zephyr/pull/44194

  • I have not yet managed to capture a sniffer trace, but I did discover an issue related to this.
    The code uses NUS, so there is a callback registered with NUS for send. We allocate buffers and hand them to NUS, but we are not able to free them when the link goes down.

    bt_nus_send() calls bt_gatt_notify_cb() and passes it (in the params structure) a single callback for when the notification is complete. As long as the link is alive when the function is called, it does not fail, and the data is queued. However, there is no callback for failure if the link goes down, and there is then nobody to deliver the TX events to. The next connection is established, but nothing can be sent, and when this new connection times out (the 4-second default is a long time) nothing happens.
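
One possible application-side workaround for the buffer problem described above is to keep a record of every buffer handed to bt_nus_send(), free the oldest one from the "sent" callback, and free whatever is left once the application learns the link is gone. The sketch below is plain C so it can be compiled without Zephyr. All `tx_track_*` names and `TX_TRACK_MAX` are hypothetical, it assumes buffers come from `malloc()` and that notifications complete in the order they were queued, and it leaves out the locking that would be needed if the callbacks run in another thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TX_TRACK_MAX 8 /* hypothetical limit on in-flight notifications */

/* FIFO of buffers handed to the stack that have not completed yet. */
struct tx_track {
    void *slot[TX_TRACK_MAX];
    size_t head;  /* index of the oldest in-flight buffer */
    size_t count; /* number of buffers currently in flight */
};

/* Record a buffer just before passing it to the send function.
 * Returns false if the table is full; the caller still owns the buffer. */
static bool tx_track_push(struct tx_track *t, void *buf)
{
    assert(t != NULL && buf != NULL);
    if (t->count == TX_TRACK_MAX) {
        return false;
    }
    t->slot[(t->head + t->count) % TX_TRACK_MAX] = buf;
    t->count++;
    return true;
}

/* Call from the "sent" callback: free the oldest in-flight buffer.
 * Returns false if nothing was in flight. */
static bool tx_track_complete_oldest(struct tx_track *t)
{
    assert(t != NULL);
    if (t->count == 0) {
        return false;
    }
    free(t->slot[t->head]);
    t->slot[t->head] = NULL;
    t->head = (t->head + 1) % TX_TRACK_MAX;
    t->count--;
    return true;
}

/* Call when the link is known to be down: free every buffer that never
 * got a completion callback. Returns the number of buffers freed. */
static size_t tx_track_flush(struct tx_track *t)
{
    size_t freed = 0;

    while (tx_track_complete_oldest(t)) {
        freed++;
    }
    return freed;
}
```

This only covers the application's own allocations. It does not explain why the stack never reports the disconnect, and `tx_track_flush()` still needs some trigger, such as the disconnected callback when it does fire or an application-level timeout.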
