Error callbacks LoRa p2p RAK11720 RUI3

I’m programming a RAK11720 with RUI3 for LoRa P2P. The application uses two RAK11720 devices, both running RUI3.

One device only listens and sends a reception confirmation. That device isn’t having any problems; the problem lies with the device that sends the message.

The sequence is as follows:

  • Sends a LoRa packet.

  • Enters receive mode.

  • Waits for a while. If it doesn’t receive a confirmation, it re-sends the message.

  • Enters receive mode.

  • Waits for a while. If it still doesn’t receive a confirmation, it goes to sleep.

  • When it wakes up, it sends the message again and repeats the process.

If it receives the message, it sends a reception confirmation, goes to sleep, and repeats the process.
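The sequence above can be sketched as a small event-driven state machine. This is only an illustration in plain C++, not code from the actual sketch: the names SenderFsm, SenderState, SenderEvent, and kMaxAttempts are hypothetical, no RUI3 calls are made, and the reception confirmation sent back after an ACK is left out for brevity.

```cpp
#include <cstdint>

// Hypothetical states and events; none of these names come from RUI3.
enum class SenderState : uint8_t { Sleeping, Sending, WaitingForAck };
enum class SenderEvent : uint8_t { WakeUp, TxDone, AckReceived, AckTimeout };

struct SenderFsm {
    static constexpr uint8_t kMaxAttempts = 2;  // first send plus one re-send
    SenderState state = SenderState::Sleeping;
    uint8_t attempts = 0;

    // Advance the machine by one event and return the resulting state.
    SenderState handle(SenderEvent ev) {
        switch (state) {
        case SenderState::Sleeping:
            if (ev == SenderEvent::WakeUp) {
                attempts = 1;                      // first transmission of this cycle
                state = SenderState::Sending;
            }
            break;
        case SenderState::Sending:
            if (ev == SenderEvent::TxDone) {       // would be raised by the send callback
                state = SenderState::WaitingForAck;
            }
            break;
        case SenderState::WaitingForAck:
            if (ev == SenderEvent::AckReceived) {  // would be raised by the receive callback
                state = SenderState::Sleeping;
            } else if (ev == SenderEvent::AckTimeout) {
                if (attempts < kMaxAttempts) {
                    ++attempts;                    // one re-send allowed
                    state = SenderState::Sending;
                } else {
                    state = SenderState::Sleeping; // give up until the next wake-up
                }
            }
            break;
        }
        return state;
    }
};
```

In this model, a TxDone event that never arrives leaves the machine in Sending indefinitely, which matches the symptom described below. A guard timeout on the Sending state would be one way to detect that condition.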

The device correctly completes about three sequences. Each time it sends a message, the send callback registered with api.lora.registerPSendCallback(send_cb); is executed.

But at a certain point, after it goes to sleep and wakes up, it no longer executes any of the callbacks: neither the send callback nor the receive callback registered with api.lora.registerPRecvCallback(recv_cb);

The problem is that I depend on these callbacks executing because I activate crucial flags for my application in the send callback.

I’ve been dealing with this error for about a year, and I was able to identify that the callbacks aren’t executing. I implemented traces to determine this.

I tried re-registering the send and receive callbacks every time the device wakes up. I tried doing it programmatically and even created an AT command that does it. Nothing works; it only works after restarting the device. I really wanted to avoid restarting the processor because battery consumption is critical for my application. I’m currently using RUI3 version 4.2.3 and I’ve also tried previous versions without WDT implemented.

My questions are as follows:

  • I’m considering replacing the RAK11720 with a RAK11160 with STM32 or with a RAK4630. Is it possible that the callback problem occurs specifically with Apollo 3 and not with STM32 or Nordic?

  • My intention is to continue using the RUI3 API since it’s quite quick to implement, so I wanted to ask if these errors only happen with Apollo 3 or if it’s a general problem with RUI3.

  • The other question is whether putting the device to sleep with api.system.sleep.all() and api.system.lpm.set() is what causes the callback bug, whether these are the right instructions to use, and whether there is any difference compared to using api.system.sleep.lora().

  • I can set a guard condition to restart with the following instruction: api.system.reboot(); or restart with the WDT that’s now in the RUI (it didn’t exist before). My question is, which is more energy-efficient?

  • I have the option of implementing an external WDT.

I would appreciate any help and information, because if this is a widespread problem with RUI3, we as a team are considering not using RUI3.

Hi @and-tecnipak ,

To help you best with the issue, although the flow is pretty straightforward, would you be able to send us the code? That way the comparison is apples to apples: same commands, same sequences, and same timing.

I can try it on different cores besides the RAK11720 to see if the behavior is the same. If it is repeatable on our end, that will help in the troubleshooting process and in validating the fix.

I ran a test, changing my low power example to work as you described in your use case:

Sender:
Enable RX with TX enabled
Send packet
Wait with timeout for ACK from receiver

  • if ACK received, disable RX
  • if no ACK received, send packet a second time, do not wait for ACK, disable RX

Receiver:
Enable RX with TX enabled
Wait for packet
If packet received, send an ACK packet back

I am not using any api.system.sleep.xxx calls to control power, except inside loop(), where I call api.system.sleep.all().
Everything is timer and event (callback) driven.
I do not see any problems with the P2P callbacks.

But

The callbacks are coming from the low level P2P handlers. The only thing I do in the callbacks is to set flags and enable timers to start handlers to do stuff when required. Nothing else is in the callbacks.
Trying to change the P2P RX mode or doing lengthy work inside the callbacks causes problems.
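The flag-only pattern described above could look roughly like the following. This is a minimal sketch in plain C++, not the code from the attached test project: on_tx_done, on_rx_done, process_events, and the counters are hypothetical stand-ins, and the real RUI3 callbacks have their own signatures that are not reproduced here.

```cpp
#include <cstdint>

// Flags written in callback context and read from the main loop.
volatile bool tx_done_flag = false;
volatile bool rx_done_flag = false;

// Counters standing in for the real (possibly lengthy) handler work.
uint32_t handled_tx = 0;
uint32_t handled_rx = 0;

// Stand-ins for the send/receive callbacks: record the event and return at once.
void on_tx_done() { tx_done_flag = true; }
void on_rx_done() { rx_done_flag = true; }

// Called from the main loop (or a timer handler), outside callback context.
void process_events() {
    if (tx_done_flag) {
        tx_done_flag = false;  // consume the event before handling it
        ++handled_tx;          // lengthy work and radio mode changes would go here
    }
    if (rx_done_flag) {
        rx_done_flag = false;
        ++handled_rx;
    }
}
```

The point of the split is that anything slow, or anything that changes the radio state, runs from process_events() rather than from inside the callbacks themselves.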

Logs after ~15 minutes:

Here is my test code, if you want to have a look. It is one code base for both receiver and sender. To compile for sender or receiver, change the define on line 14 of the code.

RUI3-P2P-Confirmed.zip (5.5 KB)


Hello, thank you very much for the prompt response.

Below I’m sharing everything related to the callback error. I’ve included an .xlsx file with the applied traces and the meaning of each one; its TANK_003 sheet shows the traces of a failing device.

The device is stuck in a loop, displaying the @ trace. This is because, in order to enter the if(send_result = 0) statement, send_result needs to be updated in the gosleep state. That update, in turn, depends on a flag that is set in the send interrupt: tx_send = 1.

If you look at the traces, the normal flow is T1>jR, where > is the execution of the send callback. In the last cycle before the failure, the trace shows T1hR, indicating that the callback wasn’t executed. However, the device is still sending messages with AT+PSEND; the problem is solely with the callbacks. The send callback normally executes immediately after a message is sent, while the receive callback should execute when a message is received.
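One detail that may be worth checking, assuming the condition appears in the sketch exactly as typed above: in C++, if(send_result = 0) assigns 0 to send_result and then tests the assigned value, so the branch body never runs whatever value send_result held before. The two helper functions below are hypothetical and only illustrate the difference; if the post simply abbreviated a == comparison, this does not apply.

```cpp
// Returns true if the branch body ran. Uses assignment, as typed in the post.
bool branch_with_assignment(int send_result) {
    if ((send_result = 0)) {  // assigns 0, so the condition is always false
        return true;
    }
    return false;
}

// Returns true if the branch body ran. Uses comparison.
bool branch_with_comparison(int send_result) {
    if (send_result == 0) {   // true only when send_result is 0
        return true;
    }
    return false;
}
```

With the assignment form, the body is skipped for every input, including 0; with the comparison form, it runs only when send_result is 0.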

As a workaround, I tried re-registering the send and receive callbacks. The API responds that they are configured correctly, as can be seen in the M+N+ traces, where the + represents the API’s TRUE response.

The traces show that even after re-registering, the failure persists. My question is whether using api.system.lpm.set(1); and api.system.sleep.all(SLEEP_PERIOD); within the loop (that’s how I’ve built my program) is incompatible with the callbacks, or whether I should add a delay when the device wakes up instead of immediately sending a packet.

In the files, I’ve included the .ino file, the .cpp and .h traces, and the library files I edited to add the traces. Some of it is in Spanish because it’s my native language, and there’s also an .xlsx file with the traces from a device that failed.

Finally, as a workaround, we implemented the WDT so that the device restarts every time it glitches. The problem is that restarting drains the battery. Ideally, I’d like to fix it and understand why it’s failing, rather than rely on a temporary fix, although I plan to keep a WDT in place in any case. I’ve already run many tests with various devices, and they all fail in the same way.

Thank you for your attention and help in resolving this issue.

I would appreciate it if you could share an email address so I can send the files privately and edit this comment, because I can’t upload a .zip file like you do.

You can send private messages to Carl or me.

Just click on our avatar, then choose Message.


Hi Beegee.

Were you able to replicate the problem? I’ll be waiting for your updates.

Thanks