RAK3172 module hangs up

We are using the RAK3172 module (fw version v1.0.4_20220218) in our agrotech solution.
The module runs in the LoRaWAN mode.

What we are seeing is that, after a certain number of days in the field, the RAK3172 module freezes and stops responding. On a logic analyzer we can see the host MCU (nRF52840) sending the AT commands to the RAK3172 module, but we don’t see any response from the RAK module.

In order to get the RAK module back and working, we have to power cycle the whole device.

Is there any known condition or issue that can put the RAK module in this state? Is there any workaround for this problem if it is a known problem?

Welcome back to the forum, @adityakoparkar,

v1.0.4 is the legacy firmware for RAK3172. We are now moving to RUI3.

We can try to figure out what scenario causes the issue, but I suggest you migrate to RUI3 and see whether the same situation occurs. The AT commands are almost the same for both firmware versions. You also have the option to run your own firmware directly on the RAK3172.

Hello Carl,
Thank you for the reply. We will start work to move to RUI3. But it will take some code changes and quite a bit of testing on our side.
In the meantime, can you please help us find out why the RAK module gets into this state?

We also found that when the RAK module gets in this state, we are not able to reset the module using the reset pin (RST pin 22). We need to physically remove the supercap that powers the whole board and only then the RAK module starts functioning fine. So it basically needs a power cycle to recover from this state.
Do you have any recommendations so that we can avoid getting the RAK module in this state?

Hi @adityakoparkar ,

Can you share more info about your device, such as:

  • Network server
  • Regional Band
  • ADR setting
  • Operating condition (always on? sleeping then upload in interval?)
  • Size of payload
  • Pin connections in module
  • Power source, voltage level.

This information will be helpful for troubleshooting.

Please find the answers below.

Network server
A: We use Loriot network server.

Regional Band
A: US915

ADR setting
A: We are not using ADR.

Operating condition (always on? sleeping then upload in interval?)
A: The device wakes up every 15 min. Sends LoRaWAN message and then goes back to sleep.

Size of payload
A: The size of payload is fixed and is not more than 11 bytes.

Pin connections in module
A: Please see the attached schematic. The boot pin is always held low.

Power source, voltage level.
A: The power source is a supercap on the board which feeds into a 3.3 V DC-DC converter. The output of the DC-DC converter provides power to the RAK module.

Thanks for sharing the details @adityakoparkar .

All seems OK. I also discussed this with the team and we didn’t find anything wrong. On how many modules does this occur? With ADR off, what DR do you operate at, and what RSSI/SNR values do you see?

We have a few ideas, though:

  • Unstable power supply (maybe fluctuating?)
  • Device goes to boot mode (seems the pin is connected to the host MCU and with high resistance pull down)

We have seen this problem with 18 modules.
I actually have a few devices back from the field where the RAK module is in this bad state.
We are using the slowest DR. We send the messages with SF 10 here in USA.
The RSSI values for devices vary from -90 to -112. There are multiple devices placed in the field.

  1. Unstable power supply.
    We use a DC-DC converter that provides a stable 3.3V supply to the BLE and RAK modules. When we receive back the devices from the field the BLE module seems to be working fine. The RAK module doesn’t communicate through its UART port. If we manually perform a hard reset on the RST pin of the RAK module, it still doesn’t get the device out of this bad state.
    If there was an unstable power supply problem, the RAK module would have recovered and would have reset. In our logs, we don’t see any symptoms of the device going through a reset.

  2. The boot mode pin as part of initialization is pulled low. That pin is not touched anywhere in the code. We measured the pin voltage on a multimeter on a device that had returned back from the field. The pin was held low.

Question: If the BOOT pin gets pulled high momentarily due to some external factors will that cause the RAK module to get in this state?

What really concerns me is the RAK module doesn’t honor the hard reset on the RST pin. This functionality is baked into the microcontroller. And so no matter the state of the firmware in the RAK module it should reset when the RST pin is pulled low.
Has your team seen this behavior in testing of the RAK3172 module?

Hi @adityakoparkar ,

The issue seems to be present in many modules, so I believe we should be able to replicate it on the bench, unless there is a condition we are not reproducing that only occurs outdoors: temperature fluctuation? Voltage level changes (although it seems stable in your setup)?

The device not responding to RST is very concerning, because there is no intermediary component between the RST pin of the RAK3172 and the STM32WL inside it. The only thing I can think of is that when you try to reset, it goes to boot mode.

As for the BOOT pin going high, it is actually irrelevant once the RAK3172 has started executing the application code (it is already past the bootloader stage). Unless, of course, there is a reset or a fluctuation in supply causing different trigger points.

I am using RUI3 now for most of my RAK3172 work. When I was still using v1.0.4, I can’t remember having an issue where the RAK3172 wouldn’t respond to a hard reset. Whenever I reset it, it ran normally as expected.

I was concerned with the RST pin behavior too.
We can detect this behavior in our application code and issue a reset to the RAK module. Unfortunately, that option is not available since the RST functionality is getting messed up.

Let’s say that the device is getting stuck in the bootloader mode. When I issue the reset it should get out of it and then start working. Is that correct? The RAK firmware should not go back to bootloader mode on reset if the BOOT pin is low.

We have deployed around 175 devices in the field, and every week or so we are getting these failures. We are not able to see any pattern in the failures.

One more information point. We are seeing some really low RSSI values on certain devices. The network connectivity is not the best. We have seen RSSI as low as -115 and -119.
Do you think there is any connection with the RAK module getting in this state with network connectivity?

The module should not go to bootloader mode if the BOOT pin is in low state and with stable power source.

The remark on RSSI might be helpful, especially if the modules that hang are the ones seeing these levels. By the way, RSSI and SNR are the two parameters we need to look at.

I know that you operate at DR0/SF10. This will limit your maximum payload. We have seen before (not on the RAK3172, on a different device) end nodes hanging when operating at this data rate on US915. There seems to be a problem with piggybacking the MAC commands (FOpts) on the payload itself.

At this point, I suggest using DR1 (if possible) to accommodate all possible MAC commands and the payload. There is a chance that the LoRaWAN stack has a conflict when handling bigger MAC commands at the low DR settings (with limited payload). These things are within the LoRaWAN stack itself.
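
To make the DR0 payload concern concrete, here is a minimal sketch of the size budget. It assumes the US915 maximum application payload values (N) as I recall them from the LoRaWAN Regional Parameters, and it assumes that piggybacked MAC commands (FOpts, up to 15 bytes) come out of that same budget. The table name and the function `fits_in_uplink` are made up for illustration and are not part of any RAK or LoRaWAN library; please verify the numbers against the spec revision your network server uses.

```python
# Maximum application payload N (bytes) per uplink data rate on US915.
# ASSUMPTION: values recalled from the LoRaWAN Regional Parameters;
# check them against the revision your stack and network server follow.
US915_MAX_APP_PAYLOAD = {0: 11, 1: 53, 2: 125, 3: 242, 4: 242}

MAX_FOPTS_LEN = 15  # FOptsLen is a 4-bit field in the frame header


def fits_in_uplink(dr: int, app_payload_len: int, fopts_len: int = 0) -> bool:
    """Return True if payload plus piggybacked MAC commands fit at this DR.

    N is defined for an empty FOpts field, so every FOpts byte reduces
    the room left for the application payload by one byte.
    """
    if not 0 <= fopts_len <= MAX_FOPTS_LEN:
        raise ValueError("FOpts length must be between 0 and 15 bytes")
    if app_payload_len < 0:
        raise ValueError("payload length cannot be negative")
    return app_payload_len + fopts_len <= US915_MAX_APP_PAYLOAD[dr]


if __name__ == "__main__":
    # An 11-byte payload at DR0 only fits when no MAC commands are pending.
    print(fits_in_uplink(0, 11))      # no FOpts
    print(fits_in_uplink(0, 11, 2))   # 2 bytes of MAC commands pending
    print(fits_in_uplink(1, 11, 15))  # same payload at DR1, full FOpts
```

Under these assumptions, an 11-byte payload at DR0 leaves no room for any MAC command answer in the same uplink, while DR1 leaves plenty. This only illustrates the size constraint; it does not show that this is what hangs the module.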

Hello Carl,
We just found that the RST pin seems to be working. We had to pull it low for slightly more time.
The RAK module recovered after doing that and rebooted fine.
We still need to get to the bottom of the main issue, though: what causes the RAK module to get into this state?

I will check if we can use the DR1. The reason we use DR0 is because it allows us to send the data at longer distance compared to DR1.

Thanks,
Aditya

Tough problem for sure.
Now that you’ve discovered reset does sort of work, it gets me wondering if a watchdog timer might be able to auto reset?
Would not solve the underlying problem but may buy some time, if not already in place. Of course there are power considerations there as well.
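
A rough sketch of what a host-side watchdog could look like, written in Python only to show the logic (the real thing would live in the host MCU firmware). It counts consecutive AT commands that got no reply and, once a threshold is hit, asks for a reset pulse, holding RST low longer on each attempt that does not bring the module back. The class name, the threshold, and the pulse lengths are all placeholders I made up; the thread only says the pin had to be held low "slightly more time", so the right durations need to be measured on real hardware.

```python
from typing import Optional, Sequence


class ModuleWatchdog:
    """Tracks AT command results and decides when to pulse the reset pin."""

    def __init__(self, max_misses: int = 3,
                 pulse_ms: Sequence[int] = (10, 100, 1000)) -> None:
        # ASSUMPTION: threshold and pulse lengths are illustrative only.
        if max_misses < 1 or not pulse_ms:
            raise ValueError("need max_misses >= 1 and at least one pulse")
        self.max_misses = max_misses
        self.pulse_ms = tuple(pulse_ms)
        self.misses = 0   # consecutive AT commands with no reply
        self.resets = 0   # resets issued since the last good reply

    def record(self, got_reply: bool) -> Optional[int]:
        """Record one AT exchange.

        Returns how long (ms) to hold RST low, or None if no reset is due.
        """
        if got_reply:
            self.misses = 0
            self.resets = 0
            return None
        self.misses += 1
        if self.misses < self.max_misses:
            return None
        # Threshold reached: start a new miss count and escalate the pulse,
        # staying at the longest pulse once the list is exhausted.
        self.misses = 0
        index = min(self.resets, len(self.pulse_ms) - 1)
        self.resets += 1
        return self.pulse_ms[index]
```

The caller would invoke `record()` after every AT command (for example once per 15-minute wake-up) and drive the RST pin low for the returned number of milliseconds. If the longest pulse still does not help, switching the module's supply would be the next step, which this sketch does not cover.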

Hello, I experienced the same issue that @adityakoparkar reported. I was testing the RAK3172 in a condition monitoring solution and it stopped working for no apparent reason. I had to carry out a whole process to get it back into a normal operating mode (bootloader mode).

I just wanted to confirm if someone in the community found the root cause of this problem.

Thanks,

Welcome to the forum @arkdigit

What RUI3 version are you using? It could also be related to this thread: "RAK3172 does not respond to AT commands after 48 days".