Duplicate messages from RAK4631

I have previously posted this problem on the Semtech forum as I think it’s an issue with the radio driver. I got no reply, so I’m posting it here to see if anyone has any idea what is happening.

At the gateway, I am taking the incoming LoRa messages straight off the Lora_pkt_forwarder socket API.
I am seeing that the 4631 radio is sending some messages more than once in quick succession and at different frequencies. It seems to happen more often at higher spreading factors.

Here is an example from my own debug log:

Received data: 1, 0, 0, 2, af74fc, de8dfb, 1000, 52, 0, 0, 0, 93, 0, 0 Radio settings: SF11BW125, 4/5, 868.1
Received data: 1, 0, 0, 2, af74fc, de8dfb, 1000, 52, 0, 0, 0, 93, 0, 0 Radio settings: SF11BW125, 4/5, 867.1

The field value ‘52’ is the f_cnt.
The raw data from the pkt-forwarder shows the messages are sent very close together.
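For anyone following along, taking messages "straight off the socket API" means parsing the Semtech UDP protocol that the packet forwarder pushes upstream. A minimal sketch of that parsing (field layout per the protocol's v2 PUSH_DATA format; the function name and port usage are mine):

```python
import json
import socket
import struct

PUSH_DATA = 0x00  # packet-type identifier for uplink reports from the gateway

def parse_push_data(datagram: bytes) -> dict:
    """Parse a Semtech UDP-protocol PUSH_DATA datagram into its JSON payload.

    Layout: protocol version (1B), random token (2B), identifier (1B),
    gateway EUI (8B), then a UTF-8 JSON object containing the "rxpk" array.
    """
    version, token, identifier = struct.unpack(">BHB", datagram[:4])
    if identifier != PUSH_DATA:
        raise ValueError(f"not a PUSH_DATA packet: id={identifier:#x}")
    gateway_eui = datagram[4:12].hex()
    payload = json.loads(datagram[12:])
    return {"version": version, "gateway_eui": gateway_eui, **payload}

# Typical use: bind to the port lora_pkt_forwarder pushes to (1700 by default)
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.bind(("0.0.0.0", 1700))
# report = parse_push_data(sock.recv(4096))
```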

Thanks
Alan

Hi Alan,
What code are you using?

Hi,
I have my own code that is calling the radio code directly.
I don’t really expect you to help too much with this as I am not using the system as intended, but I hoped you would have more understanding of how the radio code works than I do.
It would be good if Semtech documented this stuff!

Alan

Please show the raw reports, including the RSSI and the rolling microsecond timestamp.

tmst | number | Internal timestamp of “RX finished” event (32b unsigned)

How far is your node from the gateway? If it’s too close, then it may be overloading the receiver and aliasing as weak phantom signals on additional channels besides the actual one.

What’s telltale is that the timestamps will be impossibly close together, the real signal will be strong, and the phantom signal(s) will be weak.

Sometimes the phantom signals or even the real ones also have CRC errors.

There is however also an obscure LoRaWAN mode setting that causes packets to be sent more than once; the catch there is that they have to be at least the RX2 delay apart.

@Alangward
Right, if you are directly calling the radio code and have your own LoRaMAC stack running, I cannot help much.
The library that we propose for WisBlock Arduino is based on Semtech’s code as well, but it is quite old (2019, MAC 1.0.2). What I know is that there is no packet retransmission in it, and that you get an error if the application tries to send again before the TX-finished callback has run. That callback fires after the RX2 window closes, or after a packet from the LNS is received.

@cstratton
I am eager to learn about the obscure LoRaWAN mode setting that transmits packets more than once. What would be the use of that?

I’ve overlooked it myself in the past, but it is the NbTrans field of the too-many-purposes LinkADRReq downlink MAC command.

In the Redundancy bits, the NbTrans field is the number of transmissions for each uplink
message. This applies only to “unconfirmed” uplink frames. The default value is 1,
corresponding to a single transmission of each frame. The valid range is [1:15]. If NbTrans==0
is received, the end-device should use the default value. This field can be used by the network manager to control the redundancy of the node uplinks to obtain a given Quality of Service. The end-device performs frequency hopping as usual between repeated transmissions; it does however wait after each repetition until the receive windows have expired. Whenever a downlink message is received during the RX1 slot window, it SHALL stop any further retransmission of the same uplink message. For class A devices, a reception in the RX2 slot has the same effect.
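Concretely, NbTrans lives in the low nibble of the LinkADRReq Redundancy byte. A small decoding sketch based on the 4-byte payload layout in the LoRaWAN 1.0.x spec (the layout is the spec’s; the function name is mine):

```python
def decode_link_adr_req(payload: bytes) -> dict:
    """Decode the 4-byte LinkADRReq MAC command payload (LoRaWAN 1.0.x).

    Byte 0: DataRate (bits 7:4) | TXPower (bits 3:0)
    Bytes 1-2: ChMask, little-endian (16 channel-enable bits)
    Byte 3 (Redundancy): RFU (bit 7) | ChMaskCntl (bits 6:4) | NbTrans (bits 3:0)
    """
    if len(payload) != 4:
        raise ValueError("LinkADRReq payload is always 4 bytes")
    nb_trans = payload[3] & 0x0F
    return {
        "data_rate": payload[0] >> 4,
        "tx_power": payload[0] & 0x0F,
        "ch_mask": int.from_bytes(payload[1:3], "little"),
        "ch_mask_cntl": (payload[3] >> 4) & 0x07,
        # NbTrans == 0 means "use the default", i.e. a single transmission
        "nb_trans": nb_trans if nb_trans else 1,
    }
```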

Also, elsewhere in the spec: the uplink frame count does not increment for the repeats, so once one copy is seen the network will ignore the rest.

What’s more relevant here is that the repeats are only sent after waiting out the receive windows. If, as I suspect, the duplicates are arriving on top of each other, it’s actually RF overloading creating phantom “signals” that is to blame.

Off topic now, but if the retransmission NbTrans is in the LinkADRReq, then it would be requested by the gateway, correct? Not initiated by the stack on the node.

Commanded by the network, yes, though I recall reading about people who found packaged-for-purpose LoRaWAN sensors which seemingly defaulted NbTrans to a value larger than 1, in apparent contradiction to what I quoted from the spec.

Anyway, I mentioned it only for completeness - I strongly believe that what’s actually happening is not that, but “duplicates” overlapping in time that are actually caused by RF overloading from being too close to the gateway creating false “signals” on frequencies besides the true one.

That’s why I asked for the rssi and tmst fields from the duplicated packets.

Great discussion @cstratton :100: :+1:

I had this experience before, a few years ago, using ACSiP S76S modules (not RAK), and it was a real mystery to me. But my device and gateway were too close, in the same room.

  1. Aliasing as weak phantom signals: I wish there were a technical write-up on that topic, the mystery of phantom signals when the gateway antenna is too near the device.
  2. Is NbTrans really used in practice? By the sound of it, it would flood the airwaves with so much LoRaWAN traffic that it would be bad for the operation of the network. Also, why would it suddenly be set to something other than the default value?

Just to add my idea @Alangward: in your setup, is ADR on? You could try turning it off just to test. NbTrans might be changed by the network server as part of ADR configuration.

@cstratton is exactly right. Thanks very much.

Both messages arrive in the same rxpk object, 8 microseconds apart, with an rssi of -27 and -88.


{"rxpk":[{"jver":1,"tmst":1711907216,"time":"2022-04-13T07:39:33.191697Z","tmms":1333870792191,"chan":0,"rfch":1,"freq":868.100000,"mid":0,"stat":1,"modu":"LORA","datr":"SF11BW125","codr":"4/5","rssis":-27,"lsnr":11.2,"foff":-36,"rssi":-27,"size":30,"data":"AhCvdPzejfvoAwIAdQUAAAAAAAAAAAAAVQAAAAAA"},{"jver":1,"tmst":1711907224,"time":"2022-04-13T07:39:33.191705Z","tmms":1333870792191,"chan":3,"rfch":0,"freq":867.100000,"mid":1,"stat":1,"modu":"LORA","datr":"SF11BW125","codr":"4/5","rssis":-100,"lsnr":-12.2,"foff":-95,"rssi":-88,"size":30,"data":"AhCvdPzejfvoAwIAdQUAAAAAAAAAAAAAVQAAAAAA"}]}
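For what it’s worth, the duplicate rejection I’m doing amounts to the filter described above: same payload, near-identical tmst, much weaker rssi. A rough sketch (the function name and time window are my own choices, not from any spec):

```python
def drop_phantoms(rxpk: list[dict], tmst_window_us: int = 1000) -> list[dict]:
    """Keep only the strongest copy of packets that share a payload and
    arrive within microseconds of each other - the signature of receiver
    overload rather than a genuine retransmission."""
    keep: dict[str, dict] = {}
    for pkt in rxpk:
        best = keep.get(pkt["data"])
        if best is None:
            keep[pkt["data"]] = pkt
        elif abs(pkt["tmst"] - best["tmst"]) <= tmst_window_us:
            # Overlapping in time: keep whichever copy is stronger
            if pkt["rssi"] > best["rssi"]:
                keep[pkt["data"]] = pkt
        else:
            # Far apart in time: could be a real NbTrans repeat, so keep both
            keep[pkt["data"] + str(pkt["tmst"])] = pkt
    return list(keep.values())
```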

@carlrowan I’m using LoRa, not LoRaWAN, therefore no ADR.

I’ll turn down the power - not that it’s a real problem, as I am rejecting duplicates - I was just worried that if this happened in live use we would be using more radio bandwidth than we should.

I wonder also if this effect might be more likely if the crystal frequency on the transmitting radio were slightly off, so that it was transmitting somewhere between the defined channel frequencies?

Thanks again
Alan

This is definitely a case of overload from transmitting too close to the gateway creating phantom spurious signals. The fact that the packets overlap in time, that one is absurdly strong, and that the other is much weaker is telling.

You typically want the strongest RSSIs to be -40 or lower; -60 is about the strongest you’d see in a typical actual deployment.

All physically possible radio components can be characterized by a maximum signal power level they can handle before they start suffering more than a specified tiny degree of distortion. In an RF system, “distortion” typically shows up as spurious fake signals that didn’t exist on the air, but were invented by the non-linear distortion of a signal chain overdriven with too strong a signal.

In the architecture of a typical LoRaWAN gateway, the antenna input is fed to two front-end radio chips that function as downconverters for receive. For the 868 band, these are typically tuned to 867.5 MHz and 868.5 MHz. Each converts a useful range of about +/- 500 kHz around that input frequency down to an intermediate frequency, which is fed to a different input of the single baseband processing chip. As this is an IQ design, the actual IF frequency is 0 Hz, with signals falling between -500 kHz and +500 kHz.

For your specific example of a real signal at 868.1 MHz and a phantom signal at 867.1, in the typical global_config.json for 868 MHz we can see that these channels are each at a -400 kHz offset from an IF, but from the two different IFs of the two different radio chips. So likely what is happening is that the absurdly strong IF signal from one radio into the baseband chip is leaking over into the IF input from the other radio, making the baseband think that both radios are seeing a signal at this offset from their respective center frequencies. The coupling might be in the baseband chip itself, through free space, in the board power supply, or whatever - but it’s entirely expected when the signal level is so absurdly high.
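You can verify that mapping yourself from the numbers in a stock EU868 global_config.json. The radio centers and per-channel IF offsets below are the typical values described above; the code just does the arithmetic:

```python
# Stock EU868 setup: two front-end radios, each covering +/- 500 kHz
RADIO_CENTER_HZ = {0: 867_500_000, 1: 868_500_000}

# (radio index, IF offset in Hz) per channel, as in a typical global_config.json
CHANNELS = {
    0: (1, -400_000),  # 868.1 MHz - the real signal in this thread
    1: (1, -200_000),  # 868.3 MHz
    2: (1, 0),         # 868.5 MHz
    3: (0, -400_000),  # 867.1 MHz - where the phantom appeared
    4: (0, -200_000),  # 867.3 MHz
}

def channel_freq_hz(chan: int) -> int:
    """Reconstruct a channel's RF frequency from its radio center + IF offset."""
    radio, if_offset = CHANNELS[chan]
    return RADIO_CENTER_HZ[radio] + if_offset

# Channels 0 and 3 sit at the *same* -400 kHz offset, but from the two
# different radios' centers - exactly the IF-leakage pattern described above.
```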

That’s actually not what I expected to find before I traced this particular case through the global_config.json. Rather, what I expected was something like the case in some US configs where there’s a channel defined at +100 kHz from an IF center, and another at +300 kHz. If you imagine a sine wave that is overdriving an amplifier, what will start to happen is that near the peaks, the amplifier will go into “compression”, slightly lowering them. Changing the shape of a sine wave causes harmonic distortion, especially at the 3rd multiple of the actual frequency. So it would seem that a real signal at +100 suffering compression might start to show up as a weak spurious signal at +300 (though the LoRa modulation would be 3 times as wide, still, it might fool the processing). There are also ways that higher multiples can “fold back” from the sample frequency into the passband.
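The compression-creates-a-third-harmonic claim is easy to demonstrate numerically. A toy sketch (pure Python, single-bin DFT; the clipping level and sizes are arbitrary choices of mine):

```python
import math

def tone_power(samples: list[float], freq_bin: int) -> float:
    """Magnitude of one DFT bin, normalized - enough to compare tone strengths."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_bin * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_bin * i / n) for i, s in enumerate(samples))
    return math.hypot(re, im) / n

n, f = 1024, 16
sine = [math.sin(2 * math.pi * f * i / n) for i in range(n)]
# Crude model of amplifier compression: hard-clip the peaks of the sine
clipped = [max(-0.7, min(0.7, s)) for s in sine]

# The clean sine has essentially no energy at 3*f; the clipped one does -
# a spurious tone at three times the real frequency, created by distortion.
```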

Anyway, figuring out the exact mechanism of the distortion isn’t the important point - what matters is simply understanding that, like all radios, gateways have maximum signal levels they can tolerate before they start behaving oddly. Not only can you get spurious signals, you can also get corruption of the real signals. Additionally, even when not creating fake signals, a node a few tens of meters away can still manage to “blank out” all channels on a gateway whenever it transmits, preventing signals from other, more distant nodes from being received, even when those are on distinct frequencies that should not interfere. And a gateway’s moderate ability to receive two signals with different spreading factors on the same frequency at the same time only works when the difference in signal levels is moderate.

Keep the nodes far enough from the gateways not to overload them.

If you have to test closer, replace the node or gateway antenna with an appropriate non-inductive resistor / dummy load (never operate a potential transmitter without an antenna).


Just for completeness, two other points which have nothing to do with what is actually going on here:

  • Although NbTrans is communicated in the LinkADRReq MAC command, that is a multi-purpose command used for far more than ADR. In particular, many network servers have to send it even to non-ADR nodes in order to configure the channel map, or the NbTrans if they want to. Unfortunately, all three things - channel-map enables, data rate and power level, and NbTrans - are merged into a single MAC command, and it’s only possible to send all of them together, even if the server doesn’t want to change, say, the node’s power. But the case described in this thread doesn’t involve by-the-spec LoRaWAN at all, so that’s all irrelevant.

  • Frequency error in the node’s radio is not likely to be an issue here. LoRa is a fairly wideband modulation, so the exact center frequency isn’t all that critical, and the spacing between even adjacent channel frequencies (which these are not) is wider than any reasonably expected frequency error. It would theoretically be possible to have close-in spurious outputs from a transmitter using an upconversion topology, however. If one looks at an FCC or similar test report, these may be visible, as they are regulated. However, regulations typically only require that the spurious outputs be a certain degree weaker than the intended one, and the difference in signal levels seen here probably meets that easily. Spurious outputs (weak enough to meet the regulatory rules) wouldn’t typically show up at a gateway in a deployed system, because - to save battery power and to avoid blanking more distant nodes - the power level of the main signal should be adjusted by the installer or by ADR so that the intended signal is received just strongly enough for confidence, rather than at the absurdly strong overload level seen here.
