I have been trying hard to resolve an issue, and I may have just found the cause: a corrupted configuration saved in the RAK3172 flash.
I have developed a product based on the RAK3172, and we currently have around 30 units running in a pilot before we can approve mass production. Those units are going through some tests, and after some time in the field, the RAK3172 module on a few of them stopped working completely.
The symptom is that the RAK3172 module simply doesn’t print anything on the serial port or respond to any AT command. I also verified that those units were not in bootloader mode. The RAK3172 power consumption is around 7 mA, so it’s definitely not in Stop mode.
At this stage, the RAK3172 is basically bricked, and the only way I found to recover the unit was to fully erase the chip and reflash the firmware using ST-Link (SWD). Once this is done, the bricked RAK3172 comes back to life and works normally.
Investigating this problem further in the lab, today I ran a power-cycle test, basically turning the device on and off multiple times at random intervals… and bingo! I was able to “brick” the RAK3172 module again.
Now with a method to “replicate” the potential issue, I did the following:
- Flash the RAK3172 with the latest firmware (3.4.11) over SWD
- Read the entire RAK3172 flash memory back via SWD and store it in a HEX file
- Run my power-cycle test until the RAK3172 is bricked again
- Read the entire RAK3172 flash memory via SWD again and store it in another HEX file
- Compare both HEX files to look for any difference
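For anyone who wants to repeat the comparison step, here is a minimal host-side sketch in C that loads Intel HEX dumps into a flash image and reports the address ranges that differ. It assumes the dumps are standard Intel HEX and that the flash is 256 KB starting at 0x08000000. The function names (`hex_apply_record`, `hex_load`, `image_diff`) are hypothetical helpers of mine, not part of any RAK or ST tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FLASH_BASE_ADDR 0x08000000u     /* assumed start of STM32 flash */
#define FLASH_SIZE      (256u * 1024u)  /* assumed flash size: 256 KB */

/* Convert one hex digit to its value; returns -1 for anything else. */
static int nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert two hex digits at s to a byte; returns -1 on invalid input. */
static int hex_byte(const char *s)
{
    int hi = nibble(s[0]);
    int lo = (hi < 0) ? -1 : nibble(s[1]);
    return (lo < 0) ? -1 : ((hi << 4) | lo);
}

/* Apply one Intel HEX record to img. *upper holds the upper 16 address bits
 * set by the most recent type-04 record.
 * Returns 0 on success, 1 on the end-of-file record, -1 on a malformed record. */
int hex_apply_record(const char *line, uint32_t *upper, uint8_t *img)
{
    uint8_t rec[260];   /* length + 2 address bytes + type + 255 data + checksum */
    uint8_t sum = 0;
    assert(line && upper && img);
    if (line[0] != ':') return -1;
    int len = hex_byte(line + 1);
    if (len < 0) return -1;
    for (int i = 0; i < len + 5; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0) return -1;
        rec[i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
    }
    if (sum != 0) return -1;            /* record checksum mismatch */
    uint32_t off = ((uint32_t)rec[1] << 8) | rec[2];
    switch (rec[3]) {
    case 0x00:                          /* data record */
        for (int i = 0; i < len; i++) {
            uint32_t addr = (*upper << 16) + off + (uint32_t)i;
            if (addr >= FLASH_BASE_ADDR && addr - FLASH_BASE_ADDR < FLASH_SIZE)
                img[addr - FLASH_BASE_ADDR] = rec[4 + i];
        }
        return 0;
    case 0x01:                          /* end of file */
        return 1;
    case 0x04:                          /* extended linear address */
        if (len != 2) return -1;
        *upper = ((uint32_t)rec[4] << 8) | rec[5];
        return 0;
    default:                            /* other record types are ignored */
        return 0;
    }
}

/* Load a HEX file into img (pre-filled with 0xFF, the erased-flash value).
 * Returns 0 on success, -1 on an I/O or parse error. */
int hex_load(const char *path, uint8_t *img)
{
    char line[600];
    uint32_t upper = 0;
    int rc = 0;
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    memset(img, 0xFF, FLASH_SIZE);
    while (rc == 0 && fgets(line, sizeof line, f)) {
        if (line[0] == '\n' || line[0] == '\r') continue;   /* skip blank lines */
        rc = hex_apply_record(line, &upper, img);
    }
    fclose(f);
    return (rc < 0) ? -1 : 0;
}

/* Print each contiguous differing range; returns the number of differing bytes. */
uint32_t image_diff(const uint8_t *a, const uint8_t *b)
{
    uint32_t count = 0;
    for (uint32_t i = 0; i < FLASH_SIZE; i++) {
        if (a[i] == b[i]) continue;
        uint32_t start = i;
        while (i < FLASH_SIZE && a[i] != b[i]) i++;
        printf("0x%08X-0x%08X differ (%u bytes)\n",
               (unsigned)(FLASH_BASE_ADDR + start),
               (unsigned)(FLASH_BASE_ADDR + i - 1), (unsigned)(i - start));
        count += i - start;
    }
    return count;
}
```

Calling `hex_load()` once per dump and then `image_diff()` on the two images would list every differing range, which makes it easy to see whether the changes are confined to one region of the flash.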
What I could see is that the “bricked” RAK3172 module had some different data at the end of the flash memory, which I guess is the flash area reserved for the module configuration, such as EUI, APP KEY, etc.
The configuration area seems to start at 0x0803F000 in the RAK3172’s STM32 memory map.
I have the three HEX dumps, but it doesn’t seem to be possible to attach them here:
- One from the unit after a full erase and flashing with FW v3.4.11
- One from the unit after it was powered up, configured and running
- One from the unit after the restart test, once it got bricked
Instead, I’m attaching the DIFF screenshots, but I’m happy to share the HEX files if you need them for testing or investigation.
My guess for the cause of the issue is that the STM32 on the RAK3172 module gets restarted while a flash save operation is in progress, leaving the configuration area only partially written and therefore invalid. This invalid configuration then causes the RAK3172 firmware to get stuck.
If that’s the case, I would recommend implementing some form of CRC/verification on the flash configuration that the RAK3172 firmware validates before using it. If the CRC is invalid, the firmware simply resets the configuration to defaults to keep the unit alive. A more elegant solution would use two flash configuration areas, where the CRC of a newly written area is validated before the firmware switches over to it, so that one valid copy always survives an interrupted write.
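To make the suggestion concrete, here is a minimal sketch in C of the two-area scheme with a CRC check. It is only an illustration under my own assumptions: the record layout (`cfg_slot_t`), the magic value and all function names are hypothetical and are not taken from the RAK3172 firmware, and the two flash areas are simulated with a plain RAM array instead of real erase/program calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_MAGIC 0x43464731u   /* hypothetical marker; erased flash (0xFF) never matches */

/* Hypothetical configuration record, one copy per flash area. */
typedef struct {
    uint32_t magic;        /* CFG_MAGIC when the slot has been written */
    uint32_t seq;          /* incremented on every save; higher means newer */
    uint8_t  dev_eui[8];
    uint8_t  app_key[16];
    uint32_t crc;          /* CRC-32 over all preceding bytes */
} cfg_slot_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by Ethernet/zlib). */
uint32_t cfg_crc32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* A slot is valid only if the magic matches and the stored CRC is correct. */
static int slot_valid(const cfg_slot_t *s)
{
    return s->magic == CFG_MAGIC &&
           s->crc == cfg_crc32(s, offsetof(cfg_slot_t, crc));
}

/* Return the newest valid slot, or NULL if both slots are corrupt or erased. */
const cfg_slot_t *cfg_pick(const cfg_slot_t slots[2])
{
    int v0 = slot_valid(&slots[0]);
    int v1 = slot_valid(&slots[1]);
    if (v0 && v1)   /* wrap-safe comparison of the sequence counters */
        return ((int32_t)(slots[1].seq - slots[0].seq) > 0) ? &slots[1] : &slots[0];
    if (v0) return &slots[0];
    if (v1) return &slots[1];
    return NULL;
}

/* Save a new configuration into the slot that is NOT currently active, so a
 * reset in the middle of the write can only damage the older copy. */
void cfg_save(cfg_slot_t slots[2], const uint8_t eui[8], const uint8_t key[16])
{
    const cfg_slot_t *cur = cfg_pick(slots);
    int target = (cur == &slots[0]) ? 1 : 0;
    cfg_slot_t n;
    n.magic = CFG_MAGIC;
    n.seq = cur ? cur->seq + 1u : 1u;
    memcpy(n.dev_eui, eui, sizeof n.dev_eui);
    memcpy(n.app_key, key, sizeof n.app_key);
    n.crc = cfg_crc32(&n, offsetof(cfg_slot_t, crc));
    /* Real firmware would erase and program a flash page here. */
    memcpy(&slots[target], &n, sizeof n);
}

/* Load the newest valid configuration into *out.
 * Returns 1 if a stored copy was used, 0 if it fell back to defaults. */
int cfg_load(const cfg_slot_t slots[2], cfg_slot_t *out)
{
    assert(slots && out);
    const cfg_slot_t *cur = cfg_pick(slots);
    if (cur) {
        *out = *cur;
        return 1;
    }
    memset(out, 0, sizeof *out);    /* placeholder factory defaults */
    out->magic = CFG_MAGIC;
    return 0;
}
```

With this kind of layout, a reset during a save would at worst invalidate the slot being written. On the next boot the firmware would fall back to the older valid slot, or to defaults if neither slot passes the check, instead of hanging on corrupt data.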
Unfortunately, I can’t think of a workaround for this problem. Although it is rare, the final product might be restarted at any time. If that coincides with the moment the RAK3172 is saving its configuration, the unit is bricked and cannot be recovered in the field.
Please let me know if there’s something I can do to prevent this from occurring.
Thanks and Regards,