RAK3172 does not respond to AT+ commands after 48 days

Hello guys,

We have 40 modules deployed at our clients' sites, running firmware 3.5.3, which was the latest version when we performed the update.

The problem is that they all stop working 48 days after the first message.

Our product uses a primary battery and talks to the module only through AT commands. We ran some tests (battery voltage, battery condition, voltage drop) and everything was as expected, but when the module stops responding, its consumption goes up to 7 mA.

We implemented reset logic for the case where the module does not respond: we pull the reset pin low for 400 ms. It didn’t work. The module only responded to AT commands again after we removed the battery and put it back. I’m very concerned that the reset pin is not working.

As a check, we put a module in boot mode and pulled the reset pin low, and the module left boot mode.
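
For readers who want to reproduce this kind of host-side recovery logic, here is a minimal sketch of an escalating sequence: probe with an AT command, pulse the reset pin, and only then cut power. It is an illustration, not the code used in the product above. The three callables (`probe`, `pulse_reset`, `power_cycle`) are hypothetical hooks the host firmware would have to provide, and `power_cycle` assumes a host-controlled switch on the module supply, which is hardware not described in this thread.

```python
import time
from typing import Callable


def recover_module(
    probe: Callable[[], bool],             # hypothetical: send "AT", return True on "OK"
    pulse_reset: Callable[[float], None],  # hypothetical: hold the reset pin low for N seconds
    power_cycle: Callable[[], None],       # hypothetical: toggle a supply switch (assumed hardware)
    reset_low_s: float = 0.4,              # 400 ms, the pulse width mentioned above
    settle_s: float = 2.0,                 # assumed boot time before probing again
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the step that brought the module back, or 'failed'."""
    if probe():
        return "ok"

    # Step 1: hardware reset through the reset pin.
    pulse_reset(reset_low_s)
    sleep(settle_s)
    if probe():
        return "reset"

    # Step 2: full power cycle, the only thing reported to work in this thread.
    power_cycle()
    sleep(settle_s)
    if probe():
        return "power_cycle"

    return "failed"
```

Logging the returned step on the host would show how often the reset pin alone is enough.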

OK, we understand that we are using a beta release, but is it possible that this issue is also present in version 4.0.1?

Regards

Hello @guiff

What you are experiencing should not happen. I forwarded your report to our RUI3 team to check and am waiting for their response now.

Questions from the R&D team:

  • Are you using LoRaWAN or LoRa P2P?
  • If LoRaWAN, which region?
  • If LoRa P2P, which frequency?
  • What are the other LoRa settings (DR, confirmed messages, TX power, …)?
  • What was the battery level when the devices stopped sending? If you are sending the battery level, what was reported in the last message received?
  • At what interval are you sending LoRa/LoRaWAN packets?

Hi,

If I may join and support this topic: I have probably noticed the same issue as @guiff.
My RAK3172 device with fw v3.5.3 had been running for about 48-49 days when it stopped responding to AT commands. Only a power cycle got it working normally again.

Device parameters:

  • LoRaWAN, EU868 region
  • DR=0 (SF12)
  • Tx power = 16 dBm
  • Tx interval = 10 minutes
  • Confirmed messages - once per 6 hours (every 36th message was confirmed)
  • Battery level - normal (>3.8 V), using a Li-ion battery with an LDO (3.3 V, 250 mA)

Regards

Hi,

Device parameters:

  • LoRaWAN, AU915
  • ADR - ON
  • DR - defined by the network
  • Default module Tx power
  • Confirmed messages - I need to check with our developers
  • Battery level indicates 3.6 V (primary battery)
  • Messages are sent every 1:15 hours

Regards

Thanks for the information, some additional questions from our R&D team.

Can you give me the AT command sequence that you usually use for
(a) initial setup (after the system powers up)
(b) the moment the device becomes active (preparing to send a packet)?

Regarding the RESET not working, can you share the schematics showing the connection between your host MCU and the RAK3172?

Thanks.

In the init process we send the commands below. For each one we first query the current state and change it only if necessary.

AT+NWM=1
AT+NJM=1
AT+CLASS=A
AT+BAND=6
AT+MASK=0001
AT+ADR=1
AT+CFM=0
AT+LPM=1 (this was added when we changed to 4.0.1)

Join - AT+JOIN=1:0:8:0
Send data - AT+SEND=2:12345678
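
The "query first, change only if necessary" init step described above could look roughly like the sketch below. It is an illustration, not the poster's firmware. `send` is a hypothetical function that writes one AT command to the module and returns the raw reply. The `AT+<NAME>=?` query form and the reply layout handled by `parse_value` are assumptions that should be checked against the RUI3 AT command manual for the firmware version in use.

```python
from typing import Callable, Dict, List

# Desired settings, taken from the command list above (AT+LPM left out).
DESIRED: Dict[str, str] = {
    "NWM": "1", "NJM": "1", "CLASS": "A", "BAND": "6",
    "MASK": "0001", "ADR": "1", "CFM": "0",
}


def parse_value(reply: str, name: str) -> str:
    """Extract the value from a query reply.

    Assumption: the first non-empty line that is not "OK" carries the value,
    either bare ("1") or echoed with the command ("AT+NWM=1").
    """
    prefix = f"AT+{name}="
    for line in reply.splitlines():
        line = line.strip()
        if line and line != "OK":
            return line[len(prefix):] if line.startswith(prefix) else line
    return ""


def sync_settings(send: Callable[[str], str],
                  desired: Dict[str, str] = DESIRED) -> List[str]:
    """Query each setting and write it only when it differs; return what changed."""
    changed = []
    for name, value in desired.items():
        current = parse_value(send(f"AT+{name}=?"), name)
        if current != value:
            send(f"AT+{name}={value}")
            changed.append(name)
    return changed
```

Skipping writes that would not change anything keeps the number of commands sent at every wake-up low.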

Reset circuit: a 10k pull-up (R43) and a 100 nF capacitor (C33). The WL_nRST pin goes directly to our MCU.

Thanks

I am experiencing a similar issue. In my case, I compile custom firmware using RUI3, so there are no AT commands involved. The devices stop working after roughly 6 weeks (I am not sure of the exact time). I deployed multiple devices that were flashed at the same time, and they all stopped working at exactly the same moment. These devices were in different locations.

Here are some details:

  • LoRaWAN, US915, OTAA
  • The network is joined once after flashing the device
  • I use AWS IoT Core as the server and WisGate Edge Pro gateways (configured as simple station)
  • Device is powered by two AA batteries

I was using fw v3.5.3 when it occurred. I have since upgraded to v4.0.1 and re-flashed my devices. I am waiting to see whether the new fw resolves this.

Hi @Matejisko , @guiff and @antoine

Thank you for the additional information. The team is working on the problem with the highest priority. But because the problem only occurs after a long time of use, it will be difficult to find the root cause. Please be patient with us.

Hi @beegee,

Thank you for your attention to this matter.

We have 30 products coming back for maintenance with this issue. Before removing the battery, we can run some tests for the R&D team.

Regards

@guiff
Thanks for the offer. I asked the R&D team to check which tests could help.
Do you have access to JLink/SWD to connect to the devices?

This sounds vaguely like something else from the past (an MS Windows issue, perhaps). 48 days * 24 hours * 3600 seconds = 4.1472e+6 seconds, which won’t overflow a uint32_t. But if the count is in milliseconds, it almost exactly would: calculate ln(48 * 24 * 3600 * 1000) / ln(2) and the result is very close to 32 bits.

Is there a timestamp or timer with 1 ms granularity stored in a uint32_t?
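
To make that arithmetic easy to check, here is a short sketch that works out how many bits 48 days of milliseconds needs and how much headroom is left before a 32-bit counter wraps. It only reproduces the calculation from this post and says nothing about where such a counter might live in the firmware.

```python
import math

MS_PER_DAY = 24 * 3600 * 1000

elapsed_ms = 48 * MS_PER_DAY           # 48 days expressed in milliseconds
bits_needed = math.log2(elapsed_ms)    # same as ln(x) / ln(2)
headroom_days = (2**32 - elapsed_ms) / MS_PER_DAY

print(f"48 days            = {elapsed_ms} ms")
print(f"bits needed        = {bits_needed:.4f}")
print(f"days left to 2^32  = {headroom_days:.2f}")
```

The value still fits in 32 bits at day 48, but with less than two days to spare.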

@guiff

Additional question from R&D: How is 3V3 generated? Could it be the 3V3 is not stable?

Can you try to remove R44 (if possible without shutting down the power) and check whether RESET works after that? Inside the RAK3172 the BOOT0 line already has a 10k pull-down, so R44 should not be required.

@beegee,

Yes, we have access to SWD pins.

We use only the primary battery. The schematic shows 3V3 on the module power supply, but that is a small error; it is actually VBAT. We tested for possible power supply problems and did not see any instability. We also checked whether there could be any resistance on the power rails and whether any voltage drop was visible.

I’ll try removing R43 and R44 without removing the battery, and see if the reset works.

@guiff

R43 is required. Remove only R44 please.

@beegee

I removed R44 and did a reset, but it did not work. I held reset for 1 second and that did not work either.
I sent “AT+BOOTSTATUS” but did not receive any response.
The module only responds after I remove the battery and put it back.

Our R&D thinks it has something to do with the RAM: when you remove the battery, the module loses its RAM contents and goes back to normal. Does that make sense to you?

We have a product that has been operating for 39 days, and we believe it will stop on the 48th day. Is there anything we can check in its memory, such as counters?

I’d bet it’s a 32-bit count of milliseconds overflowing, as I mentioned above. It is way too close to be a coincidence: 2^32 ms / 1000 ms per second / 3600 seconds / 24 hours → 49 days, 17 h, 2 m, 47 seconds

Slightly more than 48 days exactly but way too close to ignore.
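
The breakdown above can be verified with a few integer divisions. The sketch below is just that check. The helper name `breakdown` is made up for this example.

```python
def breakdown(ms: int):
    """Split a millisecond count into (days, hours, minutes, seconds, millis)."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, seconds, millis


# Time until a 32-bit millisecond counter wraps back to zero.
print(breakdown(2**32))  # (49, 17, 2, 47, 296)
```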

@danak6jq It’s possible.

Additional information: this issue does not occur in 1.0.4. We have products that have been running for 223 days on that version.

I did another test. I took two products running the same FW, one whose RAK3172 is not responding and one whose RAK3172 is responding. With low power mode not active, I measured the consumption and performed a physical reset on both modules. Both show a very similar variation in consumption during the reset, and the consumption in operation is the same, but one responds normally to AT commands and the other does not respond. So I believe the reset itself worked normally, but it is not able to bring the stuck module back to normal operation.

I thought the same thing. In fact, I manage my timers in ms with uint64_t.
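
For anyone who has to stay with a 32-bit millisecond tick, the sketch below shows the general class of bug suspected in this thread and the usual wrap-safe alternative. It is not taken from RUI3, and nothing here confirms that this is the actual root cause. Python integers do not overflow, so `MASK32` is used to emulate `uint32_t` arithmetic. All names are made up for this example.

```python
MASK32 = 0xFFFFFFFF  # emulate uint32_t wraparound in Python


def naive_expired(now: int, deadline: int) -> bool:
    """Compare against an absolute deadline: breaks once the deadline has wrapped."""
    return now >= deadline


def safe_expired(now: int, start: int, interval: int) -> bool:
    """Compare elapsed time, computed modulo 2^32: stays correct across one wrap."""
    return ((now - start) & MASK32) >= interval


start = 0xFFFFFF00                   # 256 ms before the counter wraps
interval = 1000                      # wait 1000 ms
deadline = (start + interval) & MASK32   # wraps around to 744

now = (start + 10) & MASK32          # only 10 ms later, no wrap yet
print(naive_expired(now, deadline))          # True: fires 990 ms too early
print(safe_expired(now, start, interval))    # False: correct

now = (start + 1000) & MASK32        # 1000 ms later, counter has wrapped
print(safe_expired(now, start, interval))    # True: correct
```

A 64-bit millisecond counter, as mentioned above, avoids the problem in practice because it would take hundreds of millions of years to wrap.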