How to handle concurrent data transfer errors

  • spec & device
    Module : RAK3172(OTAA)
    Gateway : RAK7289 V2
    Network server : chirpstack v4
    Region: KR920

Hi!
I have a question about a problem we ran into at the pre-mass-production stage.

We conducted a simultaneous transfer test with 10 custom boards equipped with RAK3172.

In every run, 3 of the 10 boards failed while the other 7 operated normally.
The three failing boards either could not join or received the error “+EVT:SEND_CONFIRMED_FAILED(4)” when sending data.

As far as I know, 10 devices can be attached to one channel, but only 7 devices seem to operate normally.

If the cause is something else, I would like to know how to handle the data transfer.

FYI, RSSI and SNR are in good condition.

※ INFO
join : “AT+JOIN=1:0:30:5”
data :


error:

Hello Song,

I don’t think it is about the number of devices. I have 16 devices connected to one application in Chirpstack, but they send at intervals of 2 to 10 minutes (depending on the function of the end node).

Are the failing 3 devices always the same?

Your send interval is very short, I see only ~7 seconds between the transmissions.

Does the behaviour change if you choose a bigger send interval? You might overload the network with 10 devices sending in such short intervals.
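One way to test the overload theory is to spread the uplinks out so the 10 boards never transmit in the same instant. The sketch below is only an illustration for the host MCU or a test script: the function name `jittered_delays` and the 60 s base interval with up to 30 s of random offset are assumptions, not values taken from this thread.

```python
import random


def jittered_delays(base_interval_s, jitter_s, count, rng=None):
    """Return `count` delays: the base interval plus a random offset in [0, jitter_s]."""
    rng = rng or random.Random()
    return [base_interval_s + rng.uniform(0.0, jitter_s) for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical values: 60 s base interval, up to 30 s of random offset, 10 boards.
    for device, delay in enumerate(jittered_delays(60.0, 30.0, 10), start=1):
        print(f"device {device:2d}: next uplink in {delay:5.1f} s")
```

If the three failing boards start working once the uplinks are spread out like this, the problem is more likely collisions than the boards themselves.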

For the join failure, same question, are these always the same 3 devices?
In your test, do you start up all 10 devices at the same time?
Did you try to change the join retry interval from 5 to 10?

Did you enable ListenBeforeTalk with AT+LBT=1 (I think that is required for KR920)?

Did you enable additional channels in Chirpstack or do you work with the three default channels of KR920 only?

Are the failing 3 devices always the same?
→ It’s the same.
However, if the data is transmitted separately only for the three that had the problem, it will be transmitted normally.

Your send interval is very short, I see only ~7 seconds between the transmissions.
Does the behaviour change if you choose a bigger send interval? You might overload the network with 10 devices sending in such short intervals.
→ When the “+EVT:SEND_CONFIRMED_FAILED(4)” error occurs, the device retransmits. The interval between these retries appears to be 6 to 7 seconds.

For the join failure, same question, are these always the same 3 devices?
In your test, do you start up all 10 devices at the same time?
Did you try to change the join retry interval from 5 to 10?
→ It’s pretty much the same. The join retry interval is currently set to 5. Should I change it to 10?

Did you enable ListenBeforeTalk with AT+LBT=1 (I think that is required for KR920)?
→ No, I didn’t set it up. I’ll try it, but I don’t know yet what this function does, so I’ll look into it.
→ Could you explain “AT+LBT”?

Did you enable additional channels in Chirpstack or do you work with the three default channels of KR920 only?
→ No, I’ve added 4 extra channels and I’m using them.

Are all devices running the standard RUI3 AT command firmware or a custom firmware? If it is custom, are there differences in the firmware between the devices?

LBT means that the device first listens for traffic on the air before it starts sending. Alternatively, the device relies on duty-cycle restrictions, which limit the airtime on each available frequency/band. But according to the LoRaWAN specifications, it should always be set to LBT:

If it is always the same three devices, did you try to reflash the devices with the firmware?
If you have STM32CubeProgrammer or a JLink adapter to flash the devices, try to do a chip erase, then flash the firmware and check if there is any change.

The RAK3172 modules are running the firmware they were delivered with.

[2024-05-30 14:01:35.586] RAKwireless RAK3172
[2024-05-30 14:01:35.586] ------------------------------------------------------
[2024-05-30 14:01:35.586] Version: RUI_4.0.6_RAK3172-E
[2024-05-30 14:01:35.586] Current Work Mode: LoRaWAN.

I haven’t changed the firmware because I’m using a separate MCU alongside the RAK3172. Should I change it?

If I set “AT+LBT=1”, does RAK3172 detect RSSI and select channel on its own?

The firmware version should be ok. The latest is V4.1.0, but the changes are more on the API side for custom applications.

I had some experience with RAK3172 where some settings got messed up, but that was because I was changing the device settings a lot between LoRaWAN, LoRa P2P and FSK P2P.
A chip erase and a fresh installation brought the device back to normal operation.

I am not expecting such a problem, as I think you never switched between LoRaWAN and the other modes. But it is worth a try if you have the option to do it.

With LBT active, the device first checks the RSSI level, and if it looks like another device is already sending, it retries sending the packet after a delay.
I am not sure about the delay time, as I have never worked with LBT.
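The listen-then-retry idea described above can be sketched in a few lines. This is a generic illustration, not the RAK3172 firmware's implementation: the function `send_with_lbt`, the -80 dBm threshold, the number of tries and the random backoff window are all assumptions chosen for the example.

```python
import random
import time


def send_with_lbt(read_rssi_dbm, transmit, threshold_dbm=-80.0,
                  max_tries=5, max_backoff_s=1.0, sleep=time.sleep, rng=random):
    """Transmit only when the channel looks free; return the attempt number or None.

    read_rssi_dbm: callable returning the current channel RSSI in dBm (hypothetical hook).
    transmit:      callable that actually sends the packet (hypothetical hook).
    """
    for attempt in range(1, max_tries + 1):
        if read_rssi_dbm() < threshold_dbm:
            # Channel is quiet: send now.
            transmit()
            return attempt
        # Channel is busy: wait a random time before listening again.
        sleep(rng.uniform(0.0, max_backoff_s))
    return None  # gave up, channel stayed busy


if __name__ == "__main__":
    readings = iter([-60.0, -70.0, -95.0])  # busy, busy, free
    result = send_with_lbt(lambda: next(readings), lambda: print("sent"),
                           sleep=lambda s: None)
    print("sent on attempt", result)
```

The random backoff matters: if all devices waited the same fixed time, they would collide again on the retry.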

We enabled LBT and ran the same simultaneous transmission test as before.
It did not resolve the problem.

And I have a question regarding JOIN.
I used “AT+JOIN=1:0:30:5”, but the join attempts failed as shown below.
In this case, what exception handling is needed for the device to operate normally?

In this case, can you check in the LoRaWAN frames whether the Join Accept was actually sent back to the device?

I found that in Chirpstack, if a device tries to join too often, I have to “Flush OTAA device nonces” to get the Join request to work.

Thank you for the good information.
While testing, another question came up.

After joining once with the “AT+JOIN” command, we do not issue “AT+JOIN” again each time.
Instead, we check the join status with the “AT+NJS” command and then transfer data.

However, we confirmed that “AT+NJS” reports the join status as “1” even when the gateway is not working properly.

I’m going to install hundreds of LoRaWAN devices in one area, and I was wondering if it would be appropriate to try JOIN every time with the “AT+JOIN” command rather than checking the status with “AT+NJS”.

It is for vehicle detection in the parking lot, so data can be transmitted at the same time or randomly.

Also, I currently have ADR turned off and the DR value fixed to “0”. Is it right to turn off ADR?

AT+NJS returns 0 if the device didn’t join yet and 1 if it joined once.
It does not say whether the connection to the gateway or to the LNS is working.

Example:
Gateway online and connected to LNS
AT+NJS=? ==> 0
AT+JOIN
+EVT:JOINED
AT+NJS=? ==> 1
Gateway loses connection to LNS
AT+NJS=? ==> 1

The end node cannot know whether there is a gateway in range or if a gateway is connected to a LoRaWAN server.
Without special actions, there is no information on the end node regarding a broken connection.

What can you do?
Regularly send a confirmed message or a link check to verify that the LNS is returning an acknowledgement to the end node.
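On the host MCU side, this check could look roughly like the sketch below: count consecutive failed confirmed uplinks and only trigger a new “AT+JOIN” after several failures in a row, instead of rejoining before every transmission. The class `LinkMonitor` and the limit of 3 failures are assumptions for illustration. “+EVT:SEND_CONFIRMED_FAILED” is the event reported earlier in this thread; the matching success event is assumed here to be “+EVT:SEND_CONFIRMED_OK”.

```python
class LinkMonitor:
    """Track consecutive failed confirmed uplinks from the modem's event output."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures  # hypothetical limit, tune for the deployment
        self.failures = 0

    def feed(self, line):
        """Process one line of modem output; return True when a rejoin looks warranted."""
        line = line.strip()
        if line.startswith("+EVT:SEND_CONFIRMED_OK"):
            # An acknowledgement arrived, so gateway and LNS are reachable.
            self.failures = 0
        elif line.startswith("+EVT:SEND_CONFIRMED_FAILED"):
            self.failures += 1
        return self.failures >= self.max_failures


if __name__ == "__main__":
    monitor = LinkMonitor(max_failures=2)
    for line in ["+EVT:SEND_CONFIRMED_FAILED(4)", "OK", "+EVT:SEND_CONFIRMED_FAILED(4)"]:
        if monitor.feed(line):
            print("link looks down, consider AT+JOIN again")
```

With hundreds of devices in one area, rejoining only after repeated failures keeps join traffic low compared with joining before every uplink.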

Regarding ADR, see the TTN documentation on Adaptive Data Rate:

ADR should be enabled whenever an end device has sufficiently stable RF conditions. This means that it can generally be enabled for static devices. If the static end device can determine that RF conditions are unstable (for example, when a car is parked on top of a parking sensor), ADR should (temporarily) be disabled.
Mobile end devices should be able to detect when they are stationary for longer periods, and enable ADR during those times. End devices decide if ADR should be used or not, not the application or the network.
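To put the fixed DR0 setting into perspective, the sketch below estimates the time on air of one uplink using the standard LoRa airtime formula from the Semtech SX127x datasheets. It assumes 125 kHz bandwidth, coding rate 4/5, an 8-symbol preamble, and that DR0 and DR5 in KR920 correspond to SF12 and SF7. The 23-byte PHY payload (10 application bytes plus 13 bytes of LoRaWAN overhead) is a made-up example, not a value from this thread.

```python
import math


def lora_airtime_s(payload_bytes, sf, bw_hz=125_000, cr=1, preamble=8,
                   explicit_header=True, crc=True):
    """Estimate LoRa time on air in seconds (cr=1 means coding rate 4/5)."""
    t_sym = (2 ** sf) / bw_hz
    # Low data rate optimisation is used when the symbol time exceeds 16 ms.
    low_dr_opt = 1 if t_sym > 0.016 else 0
    ih = 0 if explicit_header else 1
    num = 8 * payload_bytes - 4 * sf + 28 + 16 * (1 if crc else 0) - 20 * ih
    den = 4 * (sf - 2 * low_dr_opt)
    n_payload = 8 + max(math.ceil(num / den) * (cr + 4), 0)
    return (preamble + 4.25 + n_payload) * t_sym


if __name__ == "__main__":
    for name, sf in (("DR0 / SF12", 12), ("DR5 / SF7", 7)):
        print(f"{name}: {lora_airtime_s(23, sf) * 1000:.1f} ms")
```

Under these assumptions the same packet occupies the channel for about 1.48 s at DR0 and about 0.06 s at DR5, so devices fixed at DR0 are far more likely to overlap when many of them transmit at nearly the same time.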

Thank you, I’ll take note of that.