zb_nwk_neighbor_clear triggers zb_osif_abort upon joining a Zigbee network

Hello,

During my evaluation of nRF52840 I discovered an issue which prevents the device from joining a Zigbee network successfully.

I'm using the out-of-the-box zigbee/light_bulb sample with some logging enabled. When I initiate network joining on a Samsung Smartthings coordinator, a ZBOSS function calls zb_osif_abort and causes a fatal halt.

I'm using nRF Connect v3.7.1.

Some additional information you might find useful:

* I have a coordinator from another vendor, and if I restrict the light bulb to the single channel used by that coordinator, it joins successfully and quickly.

* You can see the call stack on the screenshot below:

* I'm attaching the log messages I see before the device halts.

 

  • Hi,

    Is the Samsung Smartthings coordinator Zigbee 3.0 compatible, or is it a legacy device? Please try calling zb_bdb_set_legacy_device_support(1) on the light bulb after ZBOSS is initialized, for example in ZB_BDB_SIGNAL_DEVICE_FIRST_START or ZB_BDB_SIGNAL_DEVICE_REBOOT. This will enable support for legacy devices.
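
    A minimal sketch of where that call could sit. To keep it compilable without the SDK, the signal type and zb_bdb_set_legacy_device_support are replaced by host-side stand-ins, and enable_legacy_support_on_start is a hypothetical helper name. In firmware, the same switch would go into the application's signal handler and the real ZBOSS definitions would be used instead.

```c
#include <stdint.h>

/* Host-side stand-ins so the sketch compiles without the SDK (assumptions).
 * In firmware, the signal names and the function come from the ZBOSS headers. */
typedef uint8_t zb_uint8_t;

enum app_signal {                       /* placeholder for the ZBOSS signal type */
    ZB_BDB_SIGNAL_DEVICE_FIRST_START,
    ZB_BDB_SIGNAL_DEVICE_REBOOT,
    APP_SIGNAL_OTHER                    /* hypothetical: any other signal */
};

zb_uint8_t legacy_support_state;        /* records the last value passed to the stub */

/* Stub standing in for the real ZBOSS call. */
void zb_bdb_set_legacy_device_support(zb_uint8_t state)
{
    legacy_support_state = state;
}

/* Hypothetical helper, meant to be called from the application's
 * signal handler with the signal it received. */
void enable_legacy_support_on_start(enum app_signal sig)
{
    switch (sig) {
    case ZB_BDB_SIGNAL_DEVICE_FIRST_START:
    case ZB_BDB_SIGNAL_DEVICE_REBOOT:
        /* ZBOSS is initialized by the time either signal arrives. */
        zb_bdb_set_legacy_device_support(1);
        break;
    default:
        break;                          /* leave the setting untouched */
    }
}
```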

    If this does not solve the issue, can you get a sniffer log of this behavior and upload it here as a pcap file?

    Best regards,

    Marte

  • Hi Marte,

    Thank you for your quick feedback and the suggested fix!

    As you suspected, the Samsung Smartthings hub I'm using does not support Zigbee 3.0.

    So I added a zb_bdb_set_legacy_device_support(1) call and, indeed, the hub found the light bulb and it joined the network. But when I tried to remove it from the network, the same halt with an identical call stack was triggered (zb_nlme_network_discovery_confirm -> zb_nwk_neighbor_clear -> zb_osif_abort).

    I'm attaching the sniffer log you asked for. It should cover everything from joining the network, through toggling on/off, to leaving, which ended in the halt.

    The log might be a bit polluted with traffic from other network devices I have connected. If you need a clean one with just the hub and the light bulb example, please let me know and I'll try to produce one.

    Best regards,

    Sergey

    Log

  • Hi,

    Thank you, I am able to see the packets now. I will look into the sniffer log and come back to you later today.

    Best regards,

    Marte

  • Hi,

    I have been unable to figure out what the issue might be yet, but I am still looking and will let you know. I have also asked our Zigbee team internally about the Samsung Smartthings hub.

    Best regards,

    Marte

  • Thank you very much for following up on this case!

    I'm looking forward to hearing from you and your team and I believe you'll get to the bottom of the problem.

    Best regards,

    Sergey

  • I discovered that the system triggers a fatal halt after disconnecting from a network even when I use a hub that supports Zigbee 3.0 instead of the Smartthings hub.

    Here are the steps to reproduce:

    1. Make a project using the plain light_bulb example

    2. Select a channel mask covering the default channels, build, and upload to the DK

    3. Connect the DK to the network (in my case it's provided by a QNECT hub)

    4. Disconnect from the network

    5. Observe the halt

    Attaching the pcap. The capture is clean: it contains only the hub and the nRF52840 DK.

    If I use the channel specific to the QNECT hub (channel 11) instead of the mask, the behavior is, for some reason, slightly different: I still see the halt with an identical stack, but the controller resets after a couple of seconds.
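
    For reference, a small sketch of how the two settings differ as bit masks, assuming the usual convention that bit n selects 2.4 GHz channel n (channels 11 to 26). The helper name channel_mask and the two macros are made up for illustration and are not SDK symbols.

```c
#include <assert.h>
#include <stdint.h>

#define ZIGBEE_FIRST_CHANNEL 11u   /* lowest 2.4 GHz IEEE 802.15.4 channel  */
#define ZIGBEE_LAST_CHANNEL  26u   /* highest 2.4 GHz IEEE 802.15.4 channel */

static_assert(ZIGBEE_LAST_CHANNEL < 32u, "channel mask must fit in 32 bits");

/* Build a mask in which bit n selects channel n.
 * Returns 0 if the range is empty or outside 11..26. */
uint32_t channel_mask(unsigned first, unsigned last)
{
    if (first < ZIGBEE_FIRST_CHANNEL || last > ZIGBEE_LAST_CHANNEL || first > last) {
        return 0;
    }

    uint32_t mask = 0;
    for (unsigned ch = first; ch <= last; ch++) {
        mask |= UINT32_C(1) << ch;
    }
    return mask;
}
```

    With this helper, channel_mask(11, 11) gives 0x00000800 (channel 11 only) and channel_mask(11, 26) gives 0x07FFF800 (all sixteen channels).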

    Here is the log:

    I hope this information might help a bit.

  • Hi Sergey,

    I am not able to reproduce this on my side. In one of your previous cases you used nRF Connect SDK v1.7.0, so that is what I have been testing with. Could you please confirm if this is the SDK version you are using?

    Best regards,

    Marte

  • Hi Marte,

    Yes, indeed, I'm using SDK v1.7.0 and just to make sure I re-installed it completely just yesterday before the test.

    My DK board is PCA10056 2.0.0 2019.17 683526115. 

    Best regards,

    Sergey

  • Hi Sergey,

    Thank you for providing the additional information!

    I have still not been able to reproduce your issue using the network coordinator sample as a coordinator, so I suspect this might be related to the coordinators. Is the QNECT Zigbee hub also a legacy device (not supporting Zigbee 3.0)?

    From looking at the first log, it seems like the device actually left the network and the halt happened after leaving and trying to rejoin, not when trying to join the first time. So the same thing is happening in all three cases: the device leaves the network, tries to rejoin a network without success, and then halts. I have not figured out why the halt happens yet, but I have reported this internally, and our developers are looking into it.

    Best regards,

    Marte

  • Hi Marte,

    Not a problem at all! I'm very interested in solving the issue since I have already invested quite some time into a device based on the nRF52840 and want to finish my project. The ICs Nordic is making are truly awesome. But without solving these basic issues I can't move forward.

    The QNECT Zigbee hub is very new and, although I couldn't find an exact specification, it must use Zigbee 3.0 since the light bulb example connects to the network immediately, without me specifying zb_bdb_set_legacy_device_support(1).

    Indeed, it looks like the device actually leaves the network successfully, tries to rejoin, and then halts. When I switch it off/on, it enters the discovery mode and can join the same network again if I allow that on the hub. Here is the value of g_bdb_ctx at the moment of halt.

    I forgot to mention that I found another oddity in the light_bulb example, and this one is reproducible without any hub. When you flash the firmware (out of the box, no modifications except logs and the SEGGER logging backend enabled) and press IDENTIFY_MODE_BUTTON immediately, zb_bdb_finding_binding_target is called (because dev_ctx.identify_attr.identify_time == 0 (ZB_ZCL_IDENTIFY_IDENTIFY_TIME_DEFAULT_VALUE)) and error code 0x00000004 is returned.

    Best regards,

    Sergey

  • Hi Sergey,

    I was also unable to find a specification for the QNECT hub. However, I see in the sniffer log with the QNECT hub that the Trust Center Link Key is exchanged, so it must be a Zigbee 3.0 device, as legacy devices do not support BDB TC Link Key Exchange. This confirms that the problem is not caused by the coordinators being legacy devices, so that is helpful.

    Sanis said:
    Indeed, it looks like the device actually leaves the network successfully, tries to rejoin, and then halts. When I switch it off/on, it enters the discovery mode and can join the same network again if I allow that on the hub.

    Is the network open when the device tries to rejoin after successfully leaving, or is it closed so the device is unable to join? Does it halt if you set the "rejoin" flag in the leave request, requesting the device to rejoin?

    Sanis said:
    error code 0x00000004 is returned

    Is it the function zb_bdb_finding_binding_target or another Zigbee function that is returning error code 4? If so, this error is RET_BUSY, which indicates that the resource is busy and the request cannot be served. When it comes to commissioning, the function that starts the finding and binding procedure, zb_bdb_finding_binding_initiator, checks whether commissioning is in progress or not and returns RET_BUSY if it is, so this might be why. I will look more into this.
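
    To illustrate the check described above, here is a toy model rather than the ZBOSS implementation. All names prefixed with model_ are invented for this sketch, and the only value taken from this thread is that the logged code 4 corresponds to RET_BUSY.

```c
#include <stdbool.h>

/* Placeholder return codes for the model. The value 4 mirrors the code
 * reported in this thread; the real definitions live in the ZBOSS headers. */
enum model_ret {
    MODEL_RET_OK   = 0,
    MODEL_RET_BUSY = 4
};

/* Minimal state needed to show the behavior. */
struct model_bdb {
    bool commissioning_in_progress;  /* e.g. still steering right after boot */
    bool identifying;                /* true once identify mode has started  */
};

/* Toy version of the check: refuse to start finding and binding while
 * commissioning is still running, otherwise enter identify mode. */
enum model_ret model_finding_binding_target(struct model_bdb *bdb)
{
    if (bdb->commissioning_in_progress) {
        return MODEL_RET_BUSY;
    }
    bdb->identifying = true;
    return MODEL_RET_OK;
}
```

    In this model, pressing the button while commissioning is still running yields the busy code and leaves identify mode off; the same call succeeds once commissioning has finished.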

    Best regards,

    Marte

  • Hi Marte,

    Is the network open when the device tries to rejoin after successfully leaving, or is it closed so the device is unable to join? Does it halt if you set the "rejoin" flag in the leave request, requesting the device to rejoin?

    The network is not open when the device is leaving and halting. I can't change any property of the leave request that originates from the hub since I have no access to its code (neither the Samsung Smartthings one nor the QNECT one). I can change some aspects of how the light bulb example handles disconnection, though. But I'm afraid I need to ask you for more information about what you would like me to change.

    I will also be on a short vacation until the end of this week, but I'm definitely going to continue my attempts afterward. Please let me know if you find anything you think could help me with the matter.

    Best regards,

    Sergey
