RAK4631-DeepSleep-LoRaWan not joining network

Using the example code with serial output enabled (MAX_SAVE commented out), I see the following serial output:

09:09:17.533 -> =====================================
09:09:17.533 -> RAK4631 LoRaWan Deep Sleep Test
09:09:17.533 -> =====================================
09:09:17.566 -> <BRD> SyncWord = 2414
09:09:17.566 -> <BRD> LoRa Task started
09:09:17.566 -> <LMH> OTAA 
09:09:17.566 -> DevEui=<REDACTED>
09:09:17.566 -> DevAdd=00000000
09:09:17.566 -> AppEui=<REDACTED>
09:09:17.566 -> AppKey=<REDACTED>
09:09:17.666 -> <LMH> Selected subband 1
09:09:17.666 -> <LMH> Selected subband 1
09:09:17.666 -> Starting LoRaWan task
09:09:17.666 -> Start network join request
09:10:22.592 -> LoRaWan init success

We can see that it took a full minute to go from Start network join request to LoRaWan init success. But even after init success it still hangs as it has not joined.

Looking at the gateway traffic on the TTNv3 console, I see a join request at 9:10:22, and no retries yet (5 minutes) after that.

On the device side I can see the JoinRequest was received, and a JoinAccept was sent at 9:10:24.

In the above feed we need to ignore the duplicate JoinRequests that were received by other gateways. Only the first one is processed.

What likely happens here is that the JoinAccept was either late, or the Wisblock did not receive it.

What I want to see:

  • The JoinRequest should be sent immediately, not a full minute after startup. As currently we have the board not sleeping for a full minute.
  • The Join should be retried if it failed.

Currently it’s 15 minutes later and the Wisblock still didn’t retry the join, and the serial output is still the same as above.

As a comparison I loaded the LoRaWAN_OTAA_ABP example, using the same keys and OTAA. It joins immediately and successfully directly after startup. This makes me think there is an issue with the DeepSleep example.

RAK4631 is running latest bootloader, and updated BSP and sx126x library (2.0.0).

LoRaWAN_OTAA_ABP output, in which we can see the immediate successful join:

09:25:51.731 -> =====================================
09:25:51.731 -> Welcome to RAK4630 LoRaWan!!!
09:25:51.731 -> Type: OTAA
09:25:51.731 -> Region: EU868
09:25:51.731 -> =====================================
09:25:51.731 -> <LMH> OTAA 
09:25:51.731 -> DevEui=<REDACTED>
09:25:51.731 -> DevAdd=00000000
09:25:51.731 -> AppEui=<REDACTED>
09:25:51.731 -> AppKey=<REDACTED>
09:25:51.798 -> <LMH> Selected subband 1
09:25:51.864 -> <LM> OnRadioTxDone
09:25:51.897 -> <LM> OnRadioTxDone => RX Windows #1 4989 #2 5995
09:25:51.897 -> <LM> OnRadioTxDone => TX was Join Request
09:25:56.942 -> <LM> OnRadioRxDone
09:26:17.949 -> <LM> OnRadioRxDone => FRAME_TYPE_JOIN_ACCEPT
09:26:17.949 -> OTAA Mode, Network Sending frame now...
09:26:17.982 -> lmh_send ok count 1
09:26:19.279 -> <LM> OnRadioTxDone
09:26:19.279 -> <LM> OnRadioTxDone => RX Windows #1 5030 #2 5995
09:26:25.484 -> <LM> OnRadioRxDone
09:26:25.484 -> <LM> OnRadioRxDone => FRAME_TYPE_DATA_(UN)CONFIRMED_DOWN
09:26:25.484 -> <LM> Calling MacMcpsIndication

@jpmeijers
Sorry, the RAK4631-DeepSleep-LoRaWAN example needs an update. But I didn’t find the time yet to do it.

As a quick patch, in lora_handler.cpp comment out the lines 149 to 159:

	// In deep sleep we need to hijack the SX126x IRQ to trigger a wakeup of the nRF52
	// attachInterrupt(PIN_LORA_DIO_1, loraIntHandler, RISING);

	// Start the task that will handle the LoRaWan events
// #ifndef MAX_SAVE
// 	Serial.println("Starting LoRaWan task");
// #endif
	// if (!xTaskCreate(loraTask, "LORA", 2048, NULL, TASK_PRIO_LOW, &loraTaskHandle))
	// {
	// 	return -4;
	// }

Since library version 2.0.0 there is no need to run a background task anymore. The library takes care of this already.

I flashed the original example code from the RAK4631-DeepSleep-LoRaWAN, with the changes above and (of course) adjusting the Region (I am in AS923-3) and the DevEUI and AppEUI.
The RAK4631 connects within seconds to the server and sends data after 2 minutes (the default sleep time in the original code). However, I am not using TTN V3, I am testing with Chirpstack server.

The time between calling Start network join request and LoRaWan init success is ~300 ms.

Log from Arduino Serial Terminal:

16:17:21.396 -> =====================================
16:17:21.396 -> RAK4631 LoRaWan Deep Sleep Test
16:17:21.396 -> =====================================
16:17:21.531 -> <BRD> SyncWord = 2414
16:17:21.531 -> <BRD> LoRa Task started
16:17:21.531 -> <LMH> OTAA 
16:17:21.531 -> DevEui=AC-1F-09-FF-FE-03-EF-06
16:17:21.531 -> DevAdd=00000000
16:17:21.531 -> AppEui=70-B3-D5-7E-D0-02-01-E1
16:17:21.531 -> AppKey=2B-84-E0-B0-9B-68-E5-CB-42-17-6F-E7-53-DC-EE-79
16:17:21.625 -> <LMH> Selected subband 1
16:17:21.625 -> <AS923> Switch to Version 3
16:17:21.625 -> <AS923> CH0 - TX 916.6
16:17:21.625 -> <AS923> CH1 - TX 916.8
16:17:21.625 -> <AS923> CH2 - TX 917.0
16:17:21.625 -> <AS923> CH3 - TX 917.2
16:17:21.625 -> <AS923> CH4 - TX 917.4
16:17:21.625 -> <AS923> CH5 - TX 917.6
16:17:21.625 -> <AS923> CH6 - TX 917.8
16:17:21.625 -> <AS923> CH7 - TX 918.0
16:17:21.625 -> <AS923> RX2 916.6
16:17:21.625 -> <LMH> Using AS923-3
16:17:21.625 -> <LMH> Selected subband 1
16:17:21.625 -> Starting LoRaWan task
16:17:21.625 -> Start network join request
16:17:21.998 -> LoRaWan init success
16:17:27.036 -> <LM> OnRadioTxDone => RX Windows #1 5002 #2 6002
16:17:27.036 -> <LM> OnRadioTxDone => TX was Join Request
16:17:27.036 -> <RADIO> RX window timeout = 3000
16:17:27.456 -> <LM> OnRadioRxDone RSSI -34 SNR 10
16:17:27.456 -> <LM> OnRadioRxDone => FRAME_TYPE_JOIN_ACCEPT
16:17:27.456 -> <AS923> Apply CF list: new channel at Freq = 917000000
16:17:27.456 -> <AS923> Apply CF list: new channel at Freq = 917200000
16:17:27.456 -> <AS923> Apply CF list: new channel at Freq = 917400000
16:17:27.456 -> <AS923> Apply CF list: new channel at Freq = 917600000
16:17:27.456 -> <AS923> Apply CF list: new channel at Freq = 917800000
16:17:27.456 -> OTAA joined and got dev address 01279E1F
16:19:27.965 -> Timer wakeup
16:19:28.011 -> LoRaWan package sent successfully
16:19:29.173 -> <LM> OnRadioTxDone => RX Windows #1 995 #2 2002
16:19:29.173 -> <RADIO> RX window timeout = 3000
16:19:29.360 -> <LM> OnRadioRxDone RSSI -34 SNR 11

Image from gateway log screen:

Join request was sent at 16:17:27.036
Join request was seen on gateway at 16:17:23
Join accept was sent from gateway at 16:17:23
Join accept was received on the node at 16:17:27.456
Please ignore the 4 seconds difference between the PC time and the gateway time.

It doesn’t look like this works for me. I also started with a clean DeepSleep example, changed MAX_SAVE, added the three keys, commented out the code like you suggest.

Still takes a full minute before I see the JoinRequest. JoinAccept is sent by the network but Wisblock doesn’t do anything. It just hangs there. No periodic transmits or Join retries.

Output:

11:09:56.528 -> =====================================
11:09:56.528 -> RAK4631 LoRaWan Deep Sleep Test
11:09:56.528 -> =====================================
11:09:56.560 -> <BRD> SyncWord = 2414
11:09:56.560 -> <BRD> LoRa Task started
11:09:56.560 -> <LMH> OTAA 
11:09:56.560 -> DevEui=<REDACTED>
11:09:56.560 -> DevAdd=00000000
11:09:56.560 -> AppEui=<REDACTED>
11:09:56.560 -> AppKey=<REDACTED>
11:09:56.626 -> <LMH> Selected subband 1
11:09:56.626 -> <LMH> Selected subband 1
11:09:56.626 -> Start network join request
11:11:01.580 -> LoRaWan init success
// no more output after this, and it's now 11:20

Just finished updating the code and testing it again and it works fine on my side.

@carlrowan
I know you can test on EU868, can you try? I cannot find any problems when testing on AS923-3.

Hi @jpmeijers ,

I tested EU868 on both Chirpstack and TTS (TTN V3).

It worked all fine from MAC V1.0.2 up to MAC V1.0.4.

Looking at your TTS console, you have issues on Uplink is a duplicate and DevNonce has already been used.

Can you try to rejoining using different appkey? I am thinking the devnonce is already monitored by the TTS backend.

Hi @carlrowan

I had the same thought, but the duplicate uplink and used nonce are both related to me having multiple gateways around. Some of the gateways have high latency. This causes the same uplink to be seen multiple times by the network. The delayed uplinks are correctly seen as duplicates, and therefore also already used nonces.

  • First JoinRequest is received
  • Join accept sent for this DevNonce
  • Delayed duplicate of JoinRequest received
  • DevNonce already used, ignored

This however do not explain why this wisblock takes a full minute after startup before it starts joining.

I’ll download fresh copies of the BSP and sx126x library and test again.

I tried again today, and I’m seeing the same problem still.

The LoRaWan_OTAA_ABP example joins immediately after startup and then transmits uplinks periodically as expected:

11:42:39.371 -> =====================================
11:42:39.371 -> Welcome to RAK4630 LoRaWan!!!
11:42:39.371 -> Type: OTAA
11:42:39.371 -> Region: EU868
11:42:39.371 -> =====================================
11:42:39.371 -> <LMH> OTAA 
11:42:39.371 -> DevEui=7D-80-DA-86-A8-3D-4D-85
11:42:39.371 -> DevAdd=00000000
11:42:39.371 -> AppEui=7D-80-DA-86-A8-3D-4D-85
11:42:39.371 -> AppKey=7D-80-DA-86-A8-3D-4D-85-7D-80-DA-86-A8-3D-4D-85
11:42:39.428 -> <LMH> Selected subband 1
11:42:39.528 -> <LM> OnRadioTxDone
11:42:39.528 -> <LM> OnRadioTxDone => RX Windows #1 4989 #2 5995
11:42:39.528 -> <LM> OnRadioTxDone => TX was Join Request
11:42:44.599 -> <LM> OnRadioRxDone
11:42:44.599 -> <LM> OnRadioRxDone => FRAME_TYPE_JOIN_ACCEPT
11:42:44.599 -> OTAA Mode, Network Joined!
11:43:05.587 -> Sending frame now...
11:43:05.620 -> lmh_send ok count 1
11:43:06.913 -> <LM> OnRadioTxDone
11:43:06.913 -> <LM> OnRadioTxDone => RX Windows #1 5030 #2 5995
11:43:13.112 -> <LM> OnRadioRxDone
11:43:13.112 -> <LM> OnRadioRxDone => FRAME_TYPE_DATA_(UN)CONFIRMED_DOWN
11:43:13.112 -> <LM> Calling MacMcpsIndication
11:43:25.576 -> Sending frame now...
11:43:25.609 -> lmh_send ok count 2
11:43:25.709 -> <LM> OnRadioTxDone
11:43:25.709 -> <LM> OnRadioTxDone => RX Windows #1 4991 #2 5995
11:43:30.782 -> <LM> OnRadioRxDone
11:43:30.782 -> <LM> OnRadioRxDone => FRAME_TYPE_DATA_(UN)CONFIRMED_DOWN
11:43:30.782 -> <LM> Calling MacMcpsIndication

Note in the above log how it takes 5 seconds for the device to join.

The RAK4631_DeepSleep_LoRaWAN example (with if (!xTaskCreate(loraTask,... commented out) takes a minute before it tries to join, the join seems to fail, and then it gets stuck, never transmitting again:

11:34:51.280 -> =====================================
11:34:51.280 -> RAK4631 LoRaWan Deep Sleep Test
11:34:51.280 -> =====================================
11:34:51.313 -> <BRD> SyncWord = 2414
11:34:51.313 -> <BRD> LoRa Task started
11:34:51.313 -> <LMH> OTAA 
11:34:51.313 -> DevEui=7D-80-DA-86-A8-3D-4D-85
11:34:51.313 -> DevAdd=00000000
11:34:51.313 -> AppEui=7D-80-DA-86-A8-3D-4D-85
11:34:51.313 -> AppKey=7D-80-DA-86-A8-3D-4D-85-7D-80-DA-86-A8-3D-4D-85
11:34:51.379 -> <LMH> Selected subband 1
11:34:51.379 -> <LMH> Selected subband 1
11:34:51.379 -> Start network join request
11:35:55.532 -> LoRaWan init success

Arduino IDE 1.8.13
RAKwireless nRF Boards 0.21.20
SX126x-Arduino 2.0.0

As a sanity check I took a fresh RAK4631 out of a packet, updated the firmware, and wrote the DeepSleep example to it. The behaviour is the same, where a join only happens after 1 minute, and the firmware hangs after printing LoRaWan init success.

New board’s output:

17:19:59.307 -> =====================================
17:19:59.307 -> RAK4631 LoRaWan Deep Sleep Test
17:19:59.307 -> =====================================
17:19:59.354 -> <BRD> SyncWord = 2414
17:19:59.354 -> <BRD> LoRa Task started
17:19:59.354 -> <LMH> OTAA 
17:19:59.354 -> DevEui=54-6A-AE-19-3E-E5-F2-FD
17:19:59.354 -> DevAdd=00000000
17:19:59.354 -> AppEui=54-6A-AE-19-3E-E5-F2-FD
17:19:59.354 -> AppKey=54-6A-AE-19-3E-E5-F2-FD-54-6A-AE-19-3E-E5-F2-FD
17:19:59.421 -> <LMH> Selected subband 1
17:19:59.421 -> <LMH> Selected subband 1
17:19:59.421 -> Start network join request
17:21:04.340 -> LoRaWan init success

TTNv3 non-verbose logs

Because the new board is not in its enclosure, I am able to see the LEDs better. I’m seeing the RED LED pulsing brighter sometimes. This is in the current state after LoRaWan init success is printed.

Hello @jpmeijers @carlrowan
The red LED is the charge indicator, it is not controlled by the MCU, only by the charger chip. It flickers while charging or if no battery is attached.

More important, I found the 1 minute delay.
In lora_handler.cpp look for

static lmh_param_t lora_param_init = {LORAWAN_ADR_OFF, DR_3, LORAWAN_PUBLIC_NETWORK, JOINREQ_NBTRIALS, LORAWAN_DEFAULT_TX_POWER, LORAWAN_DUTYCYCLE_ON};

And change LORAWAN_DUTYCYCLE_ON to LORAWAN_DUTYCYCLE_OFF.
This removes the 1 minute delay for me.

Why on EU868 LORAWAN_DUTYCYCLE_ON triggers a 1 minute delay is something I still have to find out. It looks like the DutyCycle is enforced and the sending is delayed by 1 minute. Looking into it right now.

1 Like

FWIW, I’m undergoing a similar exercise in understanding why someone else’s module based on the Semtech code has similar behaviour issues.

@nmcc My library is based on Semtech’s code as well.
So far I could only find that if DutyCycle is enabled, it delays the first join request by 65 seconds. Still trying to find out why it does it. And it seems to be linked to EU868.

My ‘mess’ is that it joins promptly but then the module concerned silently drops the first uplink if it’s too soon after the join. And then there is another mandatory delay required before I can send the second uplink.

This makes sending a boot/config/status message and then first set of sensor data a bit tricky.

I either want feedback to know it didn’t actually send or to know exactly what delay the firmware can tolerate - but it’s not RAK so it’s not an issue for here as such, more of general concern for those using LoRaMac-node.

Chaning to LORAWAN_DUTYCYLE_OFF indeed makes the JoinRequest being sent much sooner. But the join still doesn’t succeed. One can also see that the LoRaWan init success is not printed to Serial (at least not within 10 minutes).

18:55:08.555 -> =====================================
18:55:08.555 -> RAK4631 LoRaWan Deep Sleep Test
18:55:08.555 -> =====================================
18:55:08.589 -> <BRD> SyncWord = 2414
18:55:08.589 -> <BRD> LoRa Task started
18:55:08.589 -> <LMH> OTAA 
18:55:08.589 -> DevEui=54-6A-AE-19-3E-E5-F2-FD
18:55:08.589 -> DevAdd=00000000
18:55:08.589 -> AppEui=54-6A-AE-19-3E-E5-F2-FD
18:55:08.589 -> AppKey=54-6A-AE-19-3E-E5-F2-FD-54-6A-AE-19-3E-E5-F2-FD
18:55:08.655 -> <LMH> Selected subband 1
18:55:08.655 -> <LMH> Selected subband 1
18:55:08.655 -> Start network join request
// no further output for >10 minutes

To eliminate another variable I am now in a location with only a single gateway connected to TTN. That also didn’t change the behaviour.

DeepSleep vs normal example

The normal example joins immediately, and joins successfully. Only the DeepSleep example has the issue - and also only since the the BSP and sx126x update. Both of these examples use the same lora configuration and library. Why does the deepsleep fail and the normal one not? Has to be the deepsleep code?

Does the DeepSleep example disable the timer used for the LoRaWAN timing? Or maybe change its period?

Normal example output:

19:06:17.486 -> =====================================
19:06:17.486 -> Welcome to RAK4630 LoRaWan!!!
19:06:17.486 -> Type: OTAA
19:06:17.486 -> Region: EU868
19:06:17.486 -> =====================================
19:06:17.519 -> <LMH> OTAA 
19:06:17.519 -> DevEui=54-6A-AE-19-3E-E5-F2-FD
19:06:17.519 -> DevAdd=00000000
19:06:17.519 -> AppEui=54-6A-AE-19-3E-E5-F2-FD
19:06:17.519 -> AppKey=54-6A-AE-19-3E-E5-F2-FD-54-6A-AE-19-3E-E5-F2-FD
19:06:17.585 -> <LMH> Selected subband 1
19:06:17.654 -> <LM> OnRadioTxDone
19:06:17.654 -> <LM> OnRadioTxDone => RX Windows #1 4989 #2 5995
19:06:17.654 -> <LM> OnRadioTxDone => TX was Join Request
19:06:22.731 -> <LM> OnRadioRxDone
19:06:22.731 -> <LM> OnRadioRxDone => FRAME_TYPE_JOIN_ACCEPT
19:06:22.731 -> OTAA Mode, Network Joined!

Hi @jpmeijers ,

Can you send me the sketch that you are using at [email protected]? I’ll change the EUI/Key and try it.

Mine is working ok and not stopping. It does then the 2minute interval uplink. Though I have the same issue on 1minute delay before joining but it is gone now when the dutycyle is set to off.

It tested it in two RAK4631 (using same keys/EUI).

1 Like

Hi @carlrowan

I’ve sent it to you, although it is just the example with the lorawan task commented out, serial debug enabled, and my keys added. But please try it on your side and let me know.

I’ve also included a compiled hex version which you can try. It is possible that we are using different versions of the toolchain, even though mine says it is fully updated.

Hi @jpmeijers ,

I tried your example and it doesn’t work to me. Got the same issue you experienced. Have you tried the modifications suggested above by @beegee ? Those works for me and solve the issue. Sorry I wasn’t able to track the changes and I thought we are aligned in the code.

Here is the code that works for me. Can you try? You can change the EUIs and Key.

Hi @jpmeijers

I looked into the source code you sent to @carlrowan.
There is one line, that I asked you to change, but I think you missed it:

Running the code without this line commented out results in hanging. Commenting it or removing it and it worked for me.

Btw. I updated the WisBlock Repo example 4 days ago.

@beegee ahh yes, I thought I was only supposed to comment out the xTaskCreate lines. Thanks for spotting that mistake!

Repo update: Arduino hasn’t picked up an update yet. I also did not see any changes to GitHub - RAKWireless/RAKwireless-Arduino-BSP-Index: RAKwireless BSP Support for the Arduino Board Manager. Should I manually install the update from Github?

Also commenting out the attachInterrupt in addition to the xTaskCreate indeed solves the problem where the firmware stops after the JoinRequest. That solves the major part of this issue.

The one minute delay when duty cycle is enabled is however still a concern. Also strange that this is only a problem with DeepSleep and not the Normal example.

A strange thing that I see here is that OnRadioRxDone => FRAME_TYPE_JOIN_ACCEPT is only printed before the next TX 2 minutes later, not directly after the JoinRequest.

08:58:24.044 -> =====================================
08:58:24.044 -> RAK4631 LoRaWan Deep Sleep Test
08:58:24.044 -> =====================================
08:58:24.077 -> <BRD> SyncWord = 2414
08:58:24.077 -> <BRD> LoRa Task started
08:58:24.077 -> <LMH> OTAA 
08:58:24.077 -> DevEui=54-6A-AE-19-3E-E5-F2-FD
08:58:24.077 -> DevAdd=00000000
08:58:24.077 -> AppEui=54-6A-AE-19-3E-E5-F2-FD
08:58:24.077 -> AppKey=54-6A-AE-19-3E-E5-F2-FD-54-6A-AE-19-3E-E5-F2-FD
08:58:24.143 -> <LMH> Selected subband 1
08:58:24.143 -> <LMH> Selected subband 1
08:58:24.143 -> Start network join request
08:59:29.095 -> LoRaWan init success
08:59:29.161 -> <LM> OnRadioTxDone
08:59:34.149 -> <LM> OnRadioTxDone => RX Windows #1 4989 #2 5995
08:59:34.149 -> <LM> OnRadioTxDone => TX was Join Request
08:59:34.248 -> <LM> OnRadioRxDone
09:01:34.251 -> <LM> OnRadioRxDone => FRAME_TYPE_JOIN_ACCEPT
09:01:34.251 -> OTAA joined and got dev address 260B0721
09:01:34.748 -> Timer wakeup
09:01:34.748 -> LoRaWan package sent successfully
09:01:34.947 -> <LM> OnRadioTxDone
09:01:39.952 -> <LM> OnRadioTxDone => RX Windows #1 4995 #2 5995
09:01:43.898 -> <RADIO> RadioIrqProcess => IRQ_RX_TX_TIMEOUT
09:01:43.898 -> <LM> OnRadioRxTimeout

The examples in the Arduino BSP will only be updated when I release the next version of the BSP (which is in testing right now). I am talking about our WisBlock examples repo.

This is not strange. Serial is linked to the loop() task. If the loop() task goes into sleep, there is no Serial output until the loop wakes up. This is an FreeRTOS thing and I cannot change it.

A (very bad) work-around is to put a delay(100) after the Serial.println() in the callbacks. This gives the Serial time to send the data before the loop() goes back to sleep.

Again, not strange. The LoRaWAN_OTAA_ABP.ino has LORAWAN_DUTYCYCLE_OFF as default. If you change it to LORAWAN_DUTYCYCLE_ON you see the same delay.