RAK7289V2 lost connection to AWS IoT Core and WisDM but SSH works

Hi,
Our RAK7289V2 gateway runs WisGateOS 2.1.2 and is connected to AWS IoT Core as the network server (NS) and to WisDM for monitoring.
Both the AWS and WisDM connections worked fine while the gateway was connected via Ethernet. They also worked for a short while after we switched it to LTE, powered by a solar panel and the RAK Battery Plus.

After the gateway had been running for one day with 10 sensors communicating through it, it went offline. Both AWS and WisDM show the gateway as offline.
Restarting the gateway brought it back online for a short while before it went offline again. During that short online period we could see that the battery was charging and that there was enough power to run the gateway. We could see this both in the solar battery extension installed on the gateway and in WisDM.

We also installed the OpenVPN extension on the gateway. Even though AWS and WisDM show the gateway as offline, its VPN client is still connected to our OpenVPN server: from my computer, which is connected to the same OpenVPN server, I can both ping the gateway and connect to it via SSH.

Does anyone have any insights into how this can happen and what it is that has broken the gateway connection to AWS and WisDM?
Also, does anyone have advice on how to find the log files via SSH, or how to reboot the gateway via SSH, so I can try to find out what happened? I couldn't find any log files when looking around the file system, at least none that said anything of value.
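A minimal sketch for grabbing diagnostics over SSH before rebooting. It assumes WisGateOS 2 behaves like a stock OpenWrt system, which is not confirmed in this thread. On such a system the syslog lives in a RAM ring buffer that you read with logread rather than in files under /var/log. That would explain why browsing the file system turns up nothing useful. It would also mean the log is lost on reboot, so save it first. The function name collect_diag and the default output path are hypothetical. Each tool is only run if it exists.

```shell
#!/bin/sh
# Hypothetical helper: dump whatever diagnostics are available into one file
# and print the file's path.
# ASSUMPTION: OpenWrt-style tools (logread, dmesg, netstat, ps) may exist;
# any that are missing are silently skipped.
collect_diag() {
    out="${1:-/tmp/diag.txt}"
    {
        echo "=== collected $(date) ==="
        for cmd in "logread" "dmesg" "netstat -tn" "ps w"; do
            # ${cmd%% *} strips the arguments, leaving just the tool name.
            if command -v "${cmd%% *}" >/dev/null 2>&1; then
                echo "=== $cmd ==="
                $cmd 2>&1 || true   # keep going even if one tool fails
            fi
        done
    } > "$out"
    echo "$out"
}

# Example (on the gateway):
#   collect_diag /tmp/diag.txt
# then copy /tmp/diag.txt off with scp before rebooting.
```

On an OpenWrt-style system, rebooting over SSH should then just be the reboot command, though that too is an assumption about WisGateOS.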

We’re also seeing Basics Station disconnecting from AWS on our gateways
(WisGateOS 2.1.1).
With netstat -t we can see only one connection to AWS, which is probably the CUPS connection. The LNS connection is gone (it has been disconnected) and is not re-established automatically by Basics Station.
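To count those connections more strictly than a plain text match on "443", here is a sketch. It counts only ESTABLISHED TCP connections whose remote port is 443. A local port such as 44301, or a leftover connection in TIME_WAIT, is therefore not counted. It assumes, as the post does, that both the CUPS and LNS links go to remote port 443. It also assumes the usual netstat -tn column layout (Proto, Recv-Q, Send-Q, local address, foreign address, state). The function name is hypothetical.

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections whose *remote* port is 443.
# Reads `netstat -tn` output on stdin; header lines are skipped because
# their first field is not "tcp"/"tcp6".
# ASSUMPTION: column 5 = foreign address, column 6 = state.
count_https_upstreams() {
    awk '$1 ~ /^tcp/ && $5 ~ /:443$/ && $6 == "ESTABLISHED" { n++ }
         END { print n + 0 }'
}

# Example (on the gateway):
#   netstat -tn | count_https_upstreams
```

A result of 2 would then mean both CUPS and LNS are up, and 1 would match the situation described above.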
As a workaround, maybe try adding this check line to the crontab with crontab -e:

# check number of (AWS) connections. Kill BasicsStation if it's only 1.
*/5 * * * * if [ `/bin/netstat -t -n | /bin/grep 443 | /usr/bin/wc -l` -eq 1 ]; then /usr/bin/killall station; fi > /dev/null

(made this line up on the spot, testing it now)

Hi,

Thanks for the information.

We see exactly the same behavior in WisDM: when the gateway is reported as offline in WisDM, it is also reported as offline in AWS, and vice versa. So the problem with Basics Station not re-establishing the LNS connection must affect both WisDM and AWS.

I would appreciate it if you could perform the tests you mention with crontab, since I have no idea how to do this on our gateway. It would also be better if I could get a fix for this through a WisGateOS patch or version update.

There must be a lot of people who are experiencing this problem with their gateways.

Best regards

// Jakob

Hi Rafael,

I was just wondering whether you have found out anything more about our gateway and whether you have tested the command you suggested.

The gateway has been reported as offline in WisDM since Saturday 18/3, but it should still be accessible via SSH.

Best regards

// Jakob

The crontab command is in place on several of our gateways, and they are still connected to AWS. So for us, it’s a workable solution for now.
From what I’ve heard, there will be a firmware update that solves the underlying connection problem. There’s a beta version, but we haven’t tested it yet.

Hi Rafael,

If this solution works as you say, could you please guide me through putting it in place on our gateway and on our other gateways?

Could you write down the steps/commands needed to put the crontab command in place?

Best regards

// Jakob

Hi Rafael,

Any news on my request for more information about how to set up the crontab command, or on whether there is a new firmware fix for this that I can use?

Best regards

// Jakob

“crontab -e” will open up the editor. I think the actual file is in /etc/crontabs/root or something.
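To set this up on several gateways over SSH without an interactive editor, you could append the line to the crontab file directly. Below is a sketch of an idempotent "append if missing" helper. It assumes the file is /etc/crontabs/root, the standard OpenWrt location, and that cron must be restarted with /etc/init.d/cron restart after editing the file by hand. Both are OpenWrt conventions, not something confirmed for WisGateOS here. The helper name add_cron_line is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a crontab file only if an identical
# line is not already present, so running it twice is harmless.
add_cron_line() {
    file="$1"
    line="$2"
    touch "$file"
    # -F fixed string, -x whole-line match, -q quiet
    if grep -Fxq -- "$line" "$file"; then
        return 0    # already installed
    fi
    # Avoid gluing the new entry onto an unterminated last line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example (on the gateway; ASSUMES OpenWrt-style paths):
#   add_cron_line /etc/crontabs/root '<the crontab line>'
#   /etc/init.d/cron restart
```

Because the helper checks for an exact match first, the same script can be rerun on every gateway without creating duplicate entries.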
Update: even though the connection stays “connected”, no data is coming in anymore.
So this is not a good solution yet.
The best bet is to update the firmware to v2.1.4. I think that should fix it, but we still have to test that.
A last-resort solution would be to inspect the packet counters and check that actual LoRaWAN packets are arriving (or to do a daily reboot).
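The packet-counter idea could look roughly like the sketch below. It assumes you have some way to read a steadily increasing count of received uplinks on the gateway. Where such a counter lives on WisGateOS is not covered in this thread, so the value is passed in as an argument. The function name packets_flowing, the state file path, and get_uplink_count in the example are all hypothetical. The check compares the current count with the one saved on the previous run. It treats "unchanged" as stalled, and treats any change (including a reset after reboot) as traffic.

```shell
#!/bin/sh
# Hypothetical stall detector for the "inspect the packet counters" idea.
# $1 = state file holding the previous count, $2 = current count.
# Returns 0 if the count changed since the last call (or on the first call),
# 1 if it is unchanged.
packets_flowing() {
    state_file="$1"
    current="$2"
    previous=$(cat "$state_file" 2>/dev/null || true)
    [ -n "$previous" ] || previous=-1   # first run: never report a stall
    echo "$current" > "$state_file"
    # -ne (not -gt) so a counter reset after reboot does not look like a stall
    [ "$current" -ne "$previous" ]
}

# Example cron use (get_uplink_count is HYPOTHETICAL; choose your own action):
#   packets_flowing /tmp/uplinks.last "$(get_uplink_count)" || /usr/bin/killall station
```

The check interval should be longer than the quietest expected gap between sensor uplinks. Otherwise a normal lull would be treated as a failure.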
For robustness, it’d be nice if a watchdog was installed too.

@WWJakob @Rafael Yes, WisGateOS2 2.1.4 has a fix for this behavior. There was an issue with the WebSockets.