Mesh DFU - Free flash space & bank erasure

I have a few questions about how exactly the free flash space is intended to be managed with a banked DFU transfer. We observed some devices encounter problems after a DFU to a new application version based on Mesh SDK v5.0, which introduces a new file to the flash storage in order to persist the replay cache. On startup the application attempts to allocate flash storage space out of what was previously free flash space. The flash manager code contains sanity checks to ensure that it's not overwriting some other piece of application data. In our case it appears that the flash space the application was intending to claim had been previously used for DFU transfer and never erased, causing the flash manager's assertions to fail.

Here are my questions:

1. What code exists to prevent a DFU transfer from overwriting the "Application data" section, as shown in the diagram here? I understand the code to compute the start of the bank, but shouldn't there also be a sanity check somewhere to ensure that a large (potentially malicious) DFU will not exceed the free space and start overwriting the application data? I wasn't able to find anything that would prevent this from happening.

2. Is there code somewhere to erase the bank after the transfer is complete? I found this comment which I wasn't sure how to interpret: by "erase the bank entry" does that mean simply updating the device page? If there's nothing that would erase the bank itself, then I don't understand how the code in the flash manager which tries to ensure that newly-claimed flash areas are "safe" to use is backwards-compatible. Any flash memory previously used for a DFU transfer could in principle contain arbitrary data. If the device is then DFU'd to a version that defines a new mesh file, the flash manager will then try to claim that space but fail its sanity checks.

  • Hi,

    I have forwarded this to our mesh team and will give you an update as soon as possible.

  • Hi,

    It looks like both concerns are correct. These are bugs, but they will most likely not be fixed since the nRF5 SDK for Mesh is in maintenance mode.

    We can only recommend cleaning up the flash area that is required for Mesh
    Configuration before running Mesh after DFU.  
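As a rough sketch of that cleanup, the helper below tests whether a flash region is in the erased state (NOR flash reads back as all 1 bits after an erase). The function name is hypothetical and not part of the nRF5 SDK for Mesh. The application would still have to decide when erasing is safe, for example only on the first boot after a DFU, so that valid mesh configuration is not wiped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK function): returns true if every word in
 * the region reads as erased flash, i.e. 0xFFFFFFFF. */
static bool flash_area_is_erased(const uint32_t *p_area, size_t word_count)
{
    for (size_t i = 0; i < word_count; i++)
    {
        if (p_area[i] != 0xFFFFFFFFu)
        {
            return false; /* leftover data, e.g. from an old DFU bank */
        }
    }
    return true;
}
```

If the check fails for the pages the new mesh file is going to claim, those pages could be erased before the mesh stack is initialised.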

  • Hi,

    An update from the team:

    1. The first issue might be solved in two ways:

    1.1 Modify the DFU command DFU_START_TARGET to add a command field carrying the address of the end of the app's mesh config area. Then modify the bootloader to use this address instead of BOOTLOADERADDR. Pros: the app and bootloader communicate automatically, so the boundary tracks the current sizes of the areas. Cons: it requires fixing both the bootloader and the app DFU parts. Both the bootloader and the app must be uploaded, but the bootloader only once.

    It will require uploading the application first, then the bootloader. Because the command formats are incompatible, this will corrupt some memory in the command buffer, but it should be safe enough.

    1.2 Add a compile-time definition to the bootloader so that it considers BOOTLOADERADDR + NEW_DEFINED_ADDRESS. You then have to keep this definition up to date yourself.
    Pros: it requires fixing only the bootloader. Cons: you have to track the app's mesh config size and keep the definition aligned with it manually. Both the bootloader and the app must be uploaded every time the definition changes.

    2. The second issue requires implementing erasure of the allocated bank once the bootloader has successfully applied the uploaded image.
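Whichever option supplies the boundary for the first issue, the check itself could look roughly like the sketch below. The function and parameter names are hypothetical and do not come from the bootloader source. It assumes the bank grows upward from `bank_start` and must stay below `upper_limit`, the lowest address of the application data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check (not an SDK function): returns true only if an image of
 * image_len bytes written at bank_start ends at or below upper_limit.
 * Written with a subtraction so that a huge image_len cannot wrap around. */
static bool dfu_image_fits(uint32_t bank_start, uint32_t image_len, uint32_t upper_limit)
{
    if (bank_start >= upper_limit)
    {
        return false; /* no room at all */
    }
    return image_len <= (upper_limit - bank_start);
}
```

A transfer for which this returns false would be rejected before any flash is written.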

    The patches should be applied in this sequence:

    1. add_cleaningup_after_succeed: erase flash bank area after dfu succeeded
    2. add_app_data_as_upper_boundary: reject image if it overlaps with application data area
    3. fix_assertion_during_bootbank_erasing: fix assertion if bootloader bank is erased
    4. rebuild application and bootloader
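To illustrate what the cleanup in the first patch has to cover, here is a minimal sketch of erasing every page touched by a bank. It is not taken from the patches. The names, the callback type and the 4096-byte page size (nRF52; nRF51 pages are 1024 bytes) are assumptions, and `bank_start` is assumed to be page-aligned so that no neighbouring data shares the first page.

```c
#include <stdint.h>

#define FLASH_PAGE_SIZE 4096u /* assumption: nRF52 page size */

/* Hypothetical callback that erases the single page starting at page_addr. */
typedef void (*page_erase_fn)(uint32_t page_addr);

/* Number of pages that overlap [bank_start, bank_start + bank_len). */
static uint32_t bank_page_count(uint32_t bank_start, uint32_t bank_len)
{
    if (bank_len == 0u)
    {
        return 0u;
    }
    uint32_t first = bank_start & ~(FLASH_PAGE_SIZE - 1u);
    uint32_t end = (bank_start + bank_len + FLASH_PAGE_SIZE - 1u) & ~(FLASH_PAGE_SIZE - 1u);
    return (end - first) / FLASH_PAGE_SIZE;
}

/* Erases all pages of the bank, one page at a time. */
static void bank_erase(uint32_t bank_start, uint32_t bank_len, page_erase_fn erase_page)
{
    uint32_t first = bank_start & ~(FLASH_PAGE_SIZE - 1u);
    uint32_t count = bank_page_count(bank_start, bank_len);
    for (uint32_t i = 0u; i < count; i++)
    {
        erase_page(first + i * FLASH_PAGE_SIZE);
    }
}
```

The final, partially written page is included on purpose: leftover bytes there are enough to trip the flash manager's checks later.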

    The `DFU start target` command between the application and bootloader has been changed and is not compatible with old versions.

    To upgrade devices running 5.0.0 to the current version, you need to upload the application image first and then the bootloader image (not the other way around).

    Please consider that these are only assumptions after the source code investigation. Every solution might face additional issues that will require separate solving. Additionally, both solutions require good testing since the system `bootloader + app ---> DFU` is quite fragile from a stability point of view.  

  • Please consider that these are only assumptions after the source code investigation. Every solution might face additional issues that will require separate solving. Additionally, both solutions require good testing since the system `bootloader + app ---> DFU` is quite fragile from a stability point of view. 

    Has Nordic done a complete test of these patches all the way through, i.e. upgraded an actual device using the sequence of suggested steps?

  • Hi,

    Yes, one of our developers has tested the patches and they work.
