@guiff
Thank you for that test, I think that is important information.
Too bad you are controlling the JOIN manually from your host MCU. It would be interesting to know whether only the UART is malfunctioning and the device would still auto-join the network.
For the other reporters of this issue: do any of you have auto-join enabled, and do you see the hanging device joining after a hard reset but just not responding on the UART?
Yesterday, I discovered that another one of my devices running fw v3.5.3 stopped working after 49 days and around 15-18 hours. Taking into account the oscillator stability, every failure at around 49 days could have the same cause.
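For reference, 49 days and about 17 hours is exactly where a 32-bit counter of milliseconds wraps around. Here is a minimal sketch of that arithmetic, assuming a hypothetical 1 ms tick stored in a `uint32_t`; the names `Duration`, `split_ms`, `kWrapMs` and `check_wrap_period` are made up for illustration and are not taken from the firmware.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a hypothetical tick counter that counts milliseconds in a
// 32-bit unsigned integer. This is an illustration, not firmware code.
struct Duration {
    uint64_t days;
    uint64_t hours;
    uint64_t minutes;
    uint64_t seconds;
};

// Split a millisecond count into days, hours, minutes and seconds.
// The sub-second remainder is dropped.
Duration split_ms(uint64_t ms) {
    const uint64_t total_s = ms / 1000;
    Duration d;
    d.days = total_s / 86400;
    d.hours = (total_s % 86400) / 3600;
    d.minutes = (total_s % 3600) / 60;
    d.seconds = total_s % 60;
    return d;
}

// Number of milliseconds a 32-bit counter can count before it wraps to 0.
const uint64_t kWrapMs = 1ULL << 32;  // 4,294,967,296 ms

// Documents the expected result: 49 days, 17 hours, 2 minutes, 47 seconds.
void check_wrap_period() {
    const Duration d = split_ms(kWrapMs);
    assert(d.days == 49 && d.hours == 17 && d.minutes == 2 && d.seconds == 47);
}
```

A device that fails between 49 days 15 hours and 49 days 18 hours of uptime would fit this wrap point once oscillator tolerance is allowed for.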
There is no pin that would tell you the status. The only way to see whether it re-joins after a reset would be to set join to automatic and watch for the join request on the gateway or LNS.
But I understood from your commands that you initiate the join process manually from your host MCU. If the UART does not work, it will never try to join.
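To make the auto-join idea concrete, here is a small sketch of how a host could build the join command with auto-join enabled. It assumes the RUI3 `AT+JOIN` parameter order as I recall it from the AT command manual (join : auto-join : reattempt interval in seconds : number of attempts), so please check the manual for your firmware version. The helper `make_join_command` is hypothetical and not part of RUI3.

```cpp
#include <cassert>
#include <string>

// Hypothetical host-side helper, not part of RUI3.
// Assumed format: AT+JOIN=<join>:<auto-join>:<reattempt interval s>:<attempts>
std::string make_join_command(bool join_now, bool auto_join,
                              unsigned reattempt_interval_s,
                              unsigned attempts) {
    return "AT+JOIN=" + std::to_string(join_now ? 1 : 0) + ":" +
           std::to_string(auto_join ? 1 : 0) + ":" +
           std::to_string(reattempt_interval_s) + ":" +
           std::to_string(attempts);
}

// Documents the expected strings for auto-join on and off.
void check_join_command() {
    assert(make_join_command(true, true, 8, 0) == "AT+JOIN=1:1:8:0");
    assert(make_join_command(true, false, 10, 8) == "AT+JOIN=1:0:10:8");
}
```

With the second field set to 1, the module should start the join on its own after power-up or reset, so a join request showing up on the gateway or LNS would tell you the LoRaWAN side is alive even if the UART stays silent.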
During the last two days, 8 of my devices stopped working after 26-27 days of continuous operation. All of these devices were flashed with RUI3 4.0.0 + custom firmware, which only forces the baud rate to 9600. These devices were powered up within a few hours of each other.
custom fw:
void setup(void)
{
    // Force Serial to 9600 baud
    Serial.begin(9600);
}

void loop(void)
{
    // No need for the loop, kill it
    api.system.scheduler.task.destroy();
}
Another 3 devices with RUI3 3.5.3 (no custom fw) stopped working after 34 days of operation.
It is interesting that another 5 devices with RUI 4.0.0 + custom firmware are still alive, but they transmit every hour or so with low TX power and SF=7, compared to the previously mentioned devices with Txp=16 dBm, SF=12, and a transmit period of approximately 10 minutes.
Today, I checked my devices and all of them seem to operate normally. I use the Helium network, and in the past few days there was a network migration to Solana. Although I read somewhere that it should not affect the actual data transfer, most likely there was a system outage, so I could not see the data transfers.
In reference to the 5 devices that I previously mentioned as working normally (with different settings…), I’d like to note that they are connected to my “data only” gateway, which was likely not affected by this migration.
For everyone who is experiencing this problem, I want to apologize for the inconvenience that this bug is causing all of you.
We found the root cause for the RAK3172 hanging after ~48 days.
We have tested the bug fix and will release patches for all RUI3 versions that have this bug.
The bug appears in the following RUI3 versions:
RUI3 V3.5.3
RUI3 V3.5.4
RUI3 V4.0.0
RUI3 V4.0.1
For users of the standard AT command firmware that have validated their product with one of these RUI3 versions, we will provide patches that will only fix this specific bug and do not change anything else.
For users of the RUI3 BSP, we will release new BSP versions for all affected versions.
Here is the list of the affected RUI3 versions and the new version numbers of the patched firmware and BSP:
That’s great news. We’ll test it as soon as the AT firmware version comes out. For now, we’ve chosen to use version 4.0.1 with logic that resets the RAK3172 every 30 days.
It is a timer overflow (as Dana suggested), but the main problem is that the handler that should capture the overflow exception was not implemented correctly. This led to an infinite loop that even disabled the hardware reset.
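This is not the RUI3 patch itself, but for anyone handling long intervals in their own host or custom firmware code, here is a minimal sketch of an elapsed-time check that survives a 32-bit millisecond counter wrapping. It assumes a hypothetical 1 ms tick in a `uint32_t`; the function names and `kThirtyDaysMs` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Wrap-safe check: unsigned subtraction is done modulo 2^32, so the
// difference stays correct across one wrap of the counter.
bool interval_elapsed(uint32_t now_ms, uint32_t start_ms, uint32_t interval_ms) {
    return static_cast<uint32_t>(now_ms - start_ms) >= interval_ms;
}

// Naive check, shown only for comparison: start_ms + interval_ms can wrap
// to a small number, which makes the comparison true far too early.
bool interval_elapsed_naive(uint32_t now_ms, uint32_t start_ms, uint32_t interval_ms) {
    return now_ms >= static_cast<uint32_t>(start_ms + interval_ms);
}

// 30 days in milliseconds still fits in 32 bits (2,592,000,000 < 2^32).
const uint32_t kThirtyDaysMs = 30u * 24u * 3600u * 1000u;

// Documents the behavior around the wrap point.
void check_interval_logic() {
    const uint32_t start = 0xFFFFFF00u;  // 256 ms before the counter wraps
    // 16 ms after start: the interval of 1000 ms has not elapsed yet.
    assert(!interval_elapsed(0xFFFFFF10u, start, 1000u));
    // 1000 ms after start, counter already wrapped to 744: now elapsed.
    assert(interval_elapsed(744u, start, 1000u));
    // The naive version wrongly reports "elapsed" after only 16 ms.
    assert(interval_elapsed_naive(0xFFFFFF10u, start, 1000u));
}
```

The same subtraction pattern also works for a periodic module reset with an interval such as `kThirtyDaysMs`, as long as the interval is shorter than one full wrap of the counter.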
@beegee, a question:
What would be the steps to go from RUI3 3.5.3 to the RUI3 4.0.3_233 BSP?
I.e., do we need to use STM32CubeProgrammer, do a complete erase, and then flash RAK3172-E_latest_final.hex?
Side question: does flashing a file compiled with the BSP completely update the RAK3172 to that RUI3 version?