RAK7249 Gateway oom-killer Issue

Hi all, we have been developing a LoRaWAN network using RAK7249 gateways, and we are planning to deploy one gateway per location. We installed ChirpStack and the Mosquitto MQTT broker in a VM, and we use the MQTT gateway bridge on the RAK7249 to send data to our MQTT broker.

We observed that one gateway stopped sending uplink frames to the MQTT broker, and restarting the gateway fixed the issue. Once we went through the logs on the SD card, we found that the kernel had invoked the oom-killer, and the gateway had not sent any data to the MQTT broker after that.

Gateway Log:

Sat Dec 14 01:15:48 2019 kern.info quectel-CM[1368]: requestRegistrationState2 MCC: 0, MNC: 0, PS: Detached, DataCap: UNKNOW
Sat Dec 14 01:15:49 2019 kern.info quectel-CM[1368]: requestRegistrationState2 MCC: 413, MNC: 1, PS: Detached, DataCap: UNKNOW
Sat Dec 14 01:15:49 2019 kern.info quectel-CM[1368]: requestRegistrationState2 MCC: 413, MNC: 1, PS: Detached, DataCap: UNKNOW
Sat Dec 14 01:15:53 2019 kern.info quectel-CM[1368]: requestRegistrationState2 MCC: 413, MNC: 1, PS: Detached, DataCap: UNKNOW
Sat Dec 14 01:15:55 2019 daemon.info db[1658]: DataBase Backup
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.020000] loragwbr invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.030000] CPU: 0 PID: 1651 Comm: loragwbr Not tainted 3.18.45 #1
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.040000] Stack : 00000000 00000000 00000000 00000000 803741f2 00000036 00000000 00000000
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.040000] 	  00000001 00000000 802caa50 80326643 00000673 80373420 861955a8 00000000
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.040000] 	  80326730 80326854 00000000 80047748 00000006 800243d4 00000000 00000000
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.040000] 	  802cdfbc 861d5b7c 00000000 00000000 00000000 00000000 00000000 00000000
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.040000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.040000] 	  ...
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.110000] Call Trace:
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.120000] [<800141a8>] show_stack+0x50/0x84
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.130000] [<8006deb0>] dump_header.isra.17+0x4c/0x13c
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.140000] [<8006e218>] oom_kill_process+0xc8/0x3c8
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.150000] [<8006e9b4>] out_of_memory+0x2f0/0x328
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.160000] [<80071bb8>] __alloc_pages_nodemask+0x628/0x68c
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.170000] [<8006d2cc>] filemap_fault+0x24c/0x438
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.180000] [<80087ad4>] __do_fault+0x34/0x9c
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.190000] [<8008aa64>] do_read_fault.isra.99+0x1a4/0x240
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.200000] [<8008b2c4>] handle_mm_fault+0x4d8/0x998
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.210000] [<800188fc>] __do_page_fault+0x168/0x3c8
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.220000] [<80004820>] ret_from_exception+0x0/0x10
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.230000] 
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.230000] Mem-Info:
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.240000] Normal per-cpu:
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.240000] CPU    0: hi:   42, btch:   7 usd:  14
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.250000] active_anon:25681 inactive_anon:35 isolated_anon:0
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.250000]  active_file:176 inactive_file:208 isolated_file:0
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.250000]  unevictable:0 dirty:1 writeback:0 unstable:0
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.250000]  free:366 slab_reclaimable:185 slab_unreclaimable:1386
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.250000]  mapped:12 shmem:4359 pagetables:146 bounce:0
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.250000]  free_cma:0
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.320000] Normal free:1432kB min:1420kB low:1772kB high:2128kB active_anon:102724kB inactive_anon:140kB active_file:704kB inactive_file:832kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:131072kB managed:126336kB mlocked:0kB dirty:4kB writeback:0kB mapped:48kB shmem:17436kB slab_reclaimable:740kB slab_unreclaimable:5544kB kernel_stack:480kB pagetables:584kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaima
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.400000] lowmem_reserve[]: 0 0
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.400000] Normal: 236*4kB (UER) 35*8kB (ER) 1*16kB (R) 3*32kB (R) 1*64kB (R) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1400kB
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.430000] 4743 total pagecache pages
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.440000] 0 pages in swap cache
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.440000] Swap cache stats: add 0, delete 0, find 0/0
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.450000] Free swap  = 0kB
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.460000] Total swap = 0kB
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.470000] 32768 pages RAM
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.470000] 0 pages HighMem/MovableOnly
Sat Dec 14 01:16:23 2019 kern.warn kernel: [425482.480000] 1184 pages reserved
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.490000] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.500000] [  485]     0   485      226       20       4        0             0 ubusd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.520000] [  545]     0   545      192       15       3        0             0 askfirst
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.530000] [ 1195]     0  1195      376       20       4        0             0 syslog.sh
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.550000] [ 1204]     0  1204      302       37       3        0             0 logread
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.570000] [ 1230]     0  1230      264       39       3        0             0 logd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.580000] [ 1239]     0  1239      455       47       5        0             0 rpcd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.600000] [ 1368]     0  1368      777       27       4        0             0 quectel
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.620000] [ 1403]     0  1403      402       57       3        0             0 netifd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.630000] [ 1458]     0  1458      377       21       5        0             0 crond
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.650000] [ 1492]     0  1492      375       19       3        0             0 udhcpc
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.660000] [ 1532]     0  1532      287       20       3        0             0 dropbear
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.680000] [ 1575]     0  1575      622      126       4        0             0 uhttpd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.700000] [ 1590]     0  1590      194       18       5        0             0 rteswplugd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.710000] [ 1598]   200  1598      887       59       4        0             0 mosquitto
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.730000] [ 1628]     0  1628      566       35       4        0             0 lora_pkt_fwd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.750000] [ 1630]     0  1630     5701       96       9        0             0 lora_pkt_fwd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.760000] [ 1650]     0  1650     1271       61       5        0             0 loragwbr
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.780000] [ 1651]     0  1651    21639    20401      25        0             0 loragwbr
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.800000] [ 1656]     0  1656     1271       61       5        0             0 lorasrv
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.810000] [ 1658]     0  1658     1362      124       6        0             0 lorasrv
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.830000] [ 1693]     0  1693      376       21       4        0             0 ntpd
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.850000] [ 1700]     0  1700      203       21       4        0             0 mpstat
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.860000] [ 1735]     0  1735      375       19       5        0             0 udhcpc
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.880000] [ 1878] 65534  1878      247       29       5        0             0 dnsmasq
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.890000] [15157]     0 15157      373       16       3        0             0 sleep
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.910000] [15525]     0 15525     1362      123       5        0             0 lorasrv
Sat Dec 14 01:16:23 2019 kern.info kernel: [425482.930000] [15531]     0 15531     5701       93       9        0             0 lora_pkt_fwd
Sat Dec 14 01:16:23 2019 kern.err kernel: [425482.940000] Out of memory: Kill process 1651 (loragwbr) score 627 or sacrifice child
Sat Dec 14 01:16:23 2019 kern.err kernel: [425482.960000] Killed process 1651 (loragwbr) total-vm:86556kB, anon-rss:81580kB, file-rss:24kB
Sat Dec 14 01:16:23 2019 kern.info quectel-CM[1368]: requestRegistrationState2 MCC: 413, MNC: 1, PS: Detached, DataCap: UNKNOW
Sat Dec 14 01:16:24 2019 daemon.info loraSrv[1650]: child process 1651 exited  due to signal 9
Sat Dec 14 01:16:24 2019 daemon.err mqttEv[1658]: mqttEvDisconnectCB : MqttClient [gwBridgeS Mqtt Client] Disconnected : [1]
Sat Dec 14 01:16:24 2019 daemon.err mqttEv[1658]: mqttEvDisconnectCB : MqttClient [gwBridgeS Mqtt Client] Disconnected : [1]
Sat Dec 14 01:16:24 2019 daemon.err mqttEv[1658]: mqttLoopMiscTimerHandler: mosquitto [[gwBridgeS Mqtt Client]] loop misc ERR: 7
Sat Dec 14 01:16:24 2019 daemon.err mqttEv[1658]: mqttEvDisconnectCB : MqttClient [AppSrv Mqtt Client] Disconnected : [7]
Sat Dec 14 01:16:24 2019 daemon.err mqttEv[1658]: mqttEvRdReady: mosquitto [[AppSrv Mqtt Client]] loop read ERR: 7
Sat Dec 14 01:16:24 2019 daemon.info mqttEv[1658]: mqttEvRegister : Register mosquitto event [gwBridgeS Mqtt Client]
Sat Dec 14 01:16:24 2019 daemon.info gwBridge[1658]: mosqConnectCB: Mqtt Connect Success ...
Sat Dec 14 01:16:25 2019 daemon.info mqttEv[1658]: mqttEvRegister : Register mosquitto event [AppSrv Mqtt Client]
Sat Dec 14 01:16:25 2019 daemon.info appSrv[1658]: appMqttConnectCB: [AppSrv Mqtt Client] Connect Success ...
Sat Dec 14 01:16:27 2019 kern.info quectel-CM[1368]: requestRegistrationState2 MCC: 413, MNC: 1, PS: Detached, DataCap: UNKNOW
Sat Dec 14 01:16:32 2019 kern.info quectel-CM[1368]: requestRegistrationState2 MCC: 413, MNC: 1, PS: Detached, DataCap: UNKNOW
Sat Dec 14 01:16:37 2019 kern.info quectel-CM[1368]: requestRegistrationState2 MCC: 413, MNC: 1, PS: Detached, DataCap: UNKNOW
Sat Dec 14 01:16:37 2019 kern.info quectel-CM[1368]: requestSetOperatingMode(1)
Sat Dec 14 01:16:37 2019 kern.info quectel-CM[1368]: requestSetOperatingMode(0)

Has anyone else faced this issue? Since we don’t have redundancy, we want to make sure that this will not happen in the future. Thanks in advance.
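As a stopgap until the root cause is found, we are considering a small cron watchdog on the gateway that restarts the bridge process before memory runs out. This is only a sketch: the process name `loragwbr` is taken from the log above, but the threshold and the restart mechanism are assumptions that would need adapting to the actual firmware.

```shell
#!/bin/sh
# Hypothetical low-memory watchdog (run from cron, e.g. once a minute).
# Process name loragwbr comes from the OOM log above; the threshold and
# restart mechanism are assumptions, not firmware-documented behavior.

THRESHOLD_KB=8192  # restart when less than ~8 MB is available (assumed value)

# MemAvailable exists on kernels >= 3.14 (this gateway runs 3.18.45);
# fall back to MemFree just in case.
mem_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
[ -n "$mem_kb" ] || mem_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)

if [ "$mem_kb" -lt "$THRESHOLD_KB" ]; then
    logger -t memwatch "low memory (${mem_kb} kB), restarting loragwbr"
    # The log shows loraSrv respawning its child after the kill, so a plain
    # TERM to the bridge process should be enough to recover it.
    kill "$(pidof loragwbr)" 2>/dev/null
fi
```

This does not fix the leak itself, but it would at least keep uplinks flowing without a manual reboot.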

Can anyone help me or point me in the right direction?

Hi. What firmware version is installed on the gateway?

Hi velev,
It was on “V1.1.0054_Release_r190” at the time, but we have since updated the firmware. Have you made any changes related to this in the new firmware? I couldn’t find anything specifically related to it in the release notes.

The strange thing here is that I cannot reproduce this situation. Can you send me a longer log, covering 10–15 minutes for example? You can add it here or send it to [email protected]

Hi velev,
I sent the complete log to your email. Thanks for your time.

@gavin.woods Looking at the logs, it seems the network was down at that time and the system was trying to make a backup. I know there is a feature that, when the LTE network goes down, backs up the packets so they can be sent later. That may be why @velev was not able to reproduce the situation. I hope this gives you some direction toward a fix.
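If that theory is right, the OOM would come from the backlog growing without bound in RAM during a long outage. Purely as an illustration (the real loragwbr buffering is internal to the firmware, and every name below is hypothetical), a size-capped file-backed backlog is the kind of mechanism that avoids this:

```shell
#!/bin/sh
# Hypothetical size-capped store-and-forward backlog. Illustrates bounding
# the queued frames so an LTE outage cannot exhaust memory; none of this
# reflects the actual loragwbr implementation.

BACKLOG=${BACKLOG:-/tmp/uplink_backlog.txt}  # on the gateway this could live on the SD card
MAX_LINES=10000                              # assumed cap on queued frames

queue_frame() {
    printf '%s\n' "$1" >> "$BACKLOG"
    # Drop the oldest entries once the cap is exceeded, keeping the newest.
    if [ "$(wc -l < "$BACKLOG")" -gt "$MAX_LINES" ]; then
        tail -n "$MAX_LINES" "$BACKLOG" > "${BACKLOG}.tmp" && mv "${BACKLOG}.tmp" "$BACKLOG"
    fi
}
```

Spooling to the SD card instead of RAM, with a cap like this, would let the backup feature ride out long outages without triggering the oom-killer.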