RAK5010 Reliability issues

Hi,

We are experimenting with the RAK5010 using v3.15 firmware.

The device is being very troublesome/unreliable and I’m not sure where to start troubleshooting this. I have managed to connect it successfully to LTE and send data to our server. I leave this and it can last anywhere from a few hours to a day, periodically sending data (as per the send_interval command).

After that, it simply stops and it appears as if the network connection has been completely broken. Also, the BG96 commands simply become very unresponsive.

As you can see here I have tried to run some simple status commands, and then the scan command, and the scan just hangs totally and never retrieves a result.

As mentioned, scan has previously picked up my network and connected to it correctly - but now it is just totally stopped. How can I debug this?

Hi @gbdsdhtrg ,

It seems the module disconnects from the network. Hmm. Maybe you can share more info:

  1. What is the power source of your RAK5010?
  2. What is the host device sending the AT commands to the RAK5010?
  3. Does the location where the device is deployed have good cellular signal strength? Have you tried using another network?
  4. How many modules show the issue?
  5. When the issue occurs, what is the status of the LEDs?
  6. When you reset the module, does it behave as you expect or still behaves with the same issue?

A small update, after waiting a while:

However, the device is still very unresponsive and unpredictable. I am also not sure what to make of the CREG status. When it has worked previously I have seen it be 1,4.

As you can see I am also getting CME error 3, and I can’t get COPS? reliably to show me the settings.
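For anyone reading these replies later, here is a minimal Python sketch for decoding them. It assumes the reply uses the `+CREG: <n>,<stat>` read form and uses the standard meanings from 3GPP TS 27.007; the function names are made up for illustration, and the CME table only lists a few common codes.

```python
import re

# <stat> values of +CREG as defined in 3GPP TS 27.007
CREG_STAT = {
    0: "not registered, not searching",
    1: "registered, home network",
    2: "not registered, searching",
    3: "registration denied",
    4: "unknown",
    5: "registered, roaming",
}

# A few common +CME ERROR codes from 3GPP TS 27.007 (not a full list)
CME_ERRORS = {
    3: "operation not allowed",
    4: "operation not supported",
    10: "SIM not inserted",
    30: "no network service",
}


def decode_creg(line):
    """Return (n, stat description) for a '+CREG: <n>,<stat>' reply, else None."""
    m = re.search(r"\+CREG:\s*(\d+),(\d+)", line)
    if m is None:
        return None
    n, stat = int(m.group(1)), int(m.group(2))
    return n, CREG_STAT.get(stat, "unrecognised stat %d" % stat)


def decode_cme(line):
    """Return a description for a '+CME ERROR: <code>' reply, else None."""
    m = re.search(r"\+CME ERROR:\s*(\d+)", line)
    if m is None:
        return None
    code = int(m.group(1))
    return CME_ERRORS.get(code, "code %d (not in this table)" % code)


print(decode_creg("+CREG: 1,4"))
print(decode_cme("+CME ERROR: 3"))
```

If the reply really is in the `<n>,<stat>` form, the second number is the registration state and the first is only the unsolicited-report setting.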

Thanks for any help you can give!

  1. Powered by this battery and plugged in via USB https://download.mikroe.com/documents/datasheets/Li-Polymer%20Battery%203.7V%202000mAh.pdf
  2. Not sure if I understand you here, but I’m sending commands via Termite over USB
  3. Very good and reliable signal strength. Our sim card only supports the network but I wouldn’t expect network reliability to be the issue here.
  4. We are having trouble with both our RAK5010’s but the other one is being even more difficult and I don’t have access to it right now
  5. Upon reset the device mainly behaves the same, sluggish and won’t connect

Thanks! Let me know if there are any more commands I can run to show you debug output.

Very strange, because I can get it to work sometimes, and it works perfectly for about a day and then becomes unusable!

Thanks for giving more info @gbdsdhtrg .

The battery you are using is ok. I assume that it is charged well during your test.

I am wondering about the reliability of LTE-M1 coverage in your area. It seems your provider is Telstra. Do you have another LTE-M1 device there that is working ok without any issue? Btw, the 4G LTE signal on your smartphone uses a different backend on the telco side.

On my RAK5010, I am connected via EGPRS using a Hologram SIM (I have no access to LTE-M and NB-IoT at the moment) and all is working fine.

Thanks @carlrowan

Battery is charged.

Coverage can be viewed here:

We are very well within the full LTE IoT coverage area.

As I mentioned, when I have got the device to work, it performs very well for about a day and then disconnects permanently. I then have to run many different commands before it eventually just starts working again.

Please also review the following:
at+set_config=cellular:(AT+CSQ)
+CSQ: 31,99
at+set_config=cellular:(AT+COPS?)
+COPS: 0
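A rough Python sketch for reading these two replies, assuming the standard 3GPP TS 27.007 formats `+CSQ: <rssi>,<ber>` and `+COPS: <mode>[,<format>,<oper>[,<AcT>]]`. The function names are hypothetical, and the dBm mapping only covers the usual 0 to 31 range.

```python
import re


def decode_csq(line):
    """Return (rssi_dbm, ber) from '+CSQ: <rssi>,<ber>'; None marks 'unknown'."""
    m = re.search(r"\+CSQ:\s*(\d+),(\d+)", line)
    if m is None:
        return None
    rssi, ber = int(m.group(1)), int(m.group(2))
    # 0..31 map linearly to -113..-51 dBm; 99 (or anything else) is treated as unknown
    dbm = -113 + 2 * rssi if 0 <= rssi <= 31 else None
    return dbm, (None if ber == 99 else ber)


def decode_cops(line):
    """Return mode, operator and AcT from a '+COPS: ...' reply."""
    m = re.search(r'\+COPS:\s*(\d+)(?:,(\d+),"([^"]*)"(?:,(\d+))?)?', line)
    if m is None:
        return None
    return {
        "mode": int(m.group(1)),      # 0 = automatic operator selection
        "operator": m.group(3),       # None when no operator is selected
        "act": int(m.group(4)) if m.group(4) else None,
    }


print(decode_csq("+CSQ: 31,99"))
print(decode_cops("+COPS: 0"))
```

Read this way, 31 is the strongest reportable signal level (-51 dBm or better) with an unknown bit error rate, while a bare `+COPS: 0` means automatic selection with no operator currently reported, so the radio sees a strong signal but is not attached to a network.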

Note also that the “unreliability” relates not just to the networking abilities: when I turn send_interval off, the device can become very unresponsive and frequently crashes Termite when I run commands.

Thanks again for any ideas you may have.

Hopefully I can give you a little more context here, when I run the following commands in the following order:

As you can see, it is picking up the network just fine and my APN is configured, but when I run AT+COPS? after running AT+COPS=? (for the second time, after the scan), the device becomes totally unresponsive and Termite freezes up.

Then, I reset the device using the on-board buttons. After boot and waiting 10 minutes, I run:

at+scan=cellular

And immediately I get the following:

+CME ERROR: 3
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ

We seem to have had luck doing total factory resets of the devices we are working with, but just want to confirm some things:

  • Is 150 seconds the absolute minimum interval that is supported?
  • Even when set to 150 second interval, the time in-between data can be anywhere from 3-6 minutes regardless of this setting
  • Sometimes on our server we see the following: Device connects up to the TCP server, waits a second or so, and then immediately disconnects without sending any data. Other times it works. What could be causing this?
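To put numbers on the interval question, one option is to log arrival times on the server and compare the gaps with the configured interval. This is only a sketch: `report_gaps` and its `tolerance_s` parameter are made-up names, and it assumes you already record one timestamp per received packet.

```python
from datetime import datetime


def report_gaps(timestamps, expected_s, tolerance_s=30):
    """Return (gap_seconds, is_late) for each pair of consecutive arrivals.

    timestamps  -- datetime objects, one per packet received by the server
    expected_s  -- configured send interval in seconds (e.g. 150)
    tolerance_s -- extra seconds allowed before a gap is flagged as late
    """
    times = sorted(timestamps)
    gaps = []
    for earlier, later in zip(times, times[1:]):
        gap = (later - earlier).total_seconds()
        gaps.append((gap, gap > expected_s + tolerance_s))
    return gaps
```

Logging connections that close without any payload alongside these gaps would show whether the empty connections line up with the long intervals.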

For reference, we are using close-to-stock v3.15. We have made small modifications to point to our server etc. but it is mostly the same as original firmware.

Thanks.

Hi @gbdsdhtrg ,

I have already raised your instability issues with our software team. I will get back to you with any update.

  1. Yes. The default minimum is 150 seconds. This can be configured via custom-built RUI2 firmware.
  2. I am not sure where the delay is coming from. In my test, if I set the interval to 180 seconds, I get the data every 3 minutes with minimal difference.
  3. Still something to look at.

Btw, the RUI3 firmware for RAK5010 is available now but is still at an early stage. This will give you more control over your firmware. In case you are interested, you can check the documentation of AT commands here - AT Command Manual (Cellular) | RAKwireless Documentation Center

We are not yet moving the RAK5010 FW to RUI3 but this will be done soon since RUI2 development is already halted.

Hi @carlrowan ,

Great to hear, thanks, because as-is I cannot see how anyone would be able to use the RAK5010 in a production environment. I’d really like to know where we are going wrong here.

RE #1. I’ve removed the check for “>150 sec” in the firmware, and then run set_interval to a nominal value (10 seconds for example). Despite this, the module only sends a position update at best every 3-5 minutes. Reception is very good where we are.

We currently have 2 RAK5010’s. We flashed them yesterday, and received constant position updates from them with success (slower than we would like but at least it works).

After about 18 hours, both of them just stopped working for no reason at around the same time (they are both still powered with sufficient battery + plugged in). The status lights still sometimes flash. They now no longer connect to the server at all. I have connected with Termite and run some commands, and the device is very slow to respond now.
Eventually, after running a few commands, it just responded:
ERROR
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ

And it stopped responding altogether to any new commands.

We are already using RUI v3,
at+version result is Version: RUI v3.0.0.15