RAK4631 Low Power LoRaWAN Example Rx Windows

@Batto

So nothing I can test.

We tested the library against all LoRaWAN regions and with TTN V3, Chirpstack and ResIoT (myself) network servers and there was no problem with the RX2 window parameters. Is there a way to find out what Orange is using for the RX2 window? Are they different from the EU868 definitions in the LoRaWAN specifications? The latest I have (RP-2-1.0.3) says:
The RX2 receive window uses a fixed frequency and data rate. The default parameters are 869.525 MHz / DR0 (SF12, 125 kHz)

When I create an object on the LiveObjects platform, I can specify a profile (typically SF9RX2 or SF12RX2), and I tested a lot of them without success. As I’m testing other platforms and stacks, I know it should work because some of them do (to be honest, I did not succeed on all of them, but I did on TTN).

I tested with my TTNv3 gateway and it was OK, but when I look in the gateway logs, they show that the join accept is sent with SF=7, so I guess the RAK4631 uses RX1 (moreover, I guess TTN sends the join accept on both RX1 and RX2).

My point of view is: LiveObjects uses RX2 only, which would explain why the join fails with it and not with TTN if the RX2 parameters are not applied. I will try to force the library to use DR_0 and 869.525 MHz for RX1 and I will see …

OK, so I did my tests … Nothing. I reverted my modifications … Device joined! So I guess that when I change the device profile on the LiveObjects platform, it is not taken into account immediately.

I will try to switch back to other profiles to be sure …

EDIT : confirmed, it was just a profile issue … For the record : use “Generic_classA_RX2SF12”

Hmm, in fact I may have spoken too quickly … days later, my device doesn’t join. I checked: I don’t know why, but now the LiveObjects gateway doesn’t seem to send the JOIN_ACCEPT on RX1 (same SF as the device used for the JOIN_REQUEST) but on RX2 (so SF12).
As strange as the LiveObjects behaviour is, it seems that the RAK4630 doesn’t receive RX2 messages. When the JOIN_ACCEPT was sent on RX1 it was OK.
Moreover, after speaking with the author of this SX126x-Arduino library, it seems that the Semtech library used as its base is pretty old: Increase SF during join · Issue #49 · beegee-tokyo/SX126x-Arduino · GitHub

I know.

beegee-tokyo ==> that is me.

Still the only SX126x Arduino library.

I can’t find time to update it. The sources are 2 1/2 years old and have worked so far without bigger issues. Yours is the first major problem I have encountered in a year. The Semtech code has changed a lot, and updating to the latest version is not a weekend job.

xD pff I’m completely unfocused …

Yes, I already spent a few hours on it xD I’ve done a very basic integration of the “new” version of Semtech’s library. It compiles, but integrating your board.cpp, loramachelper etc … is the biggest part ^^’
I will keep you posted if I get something working …

Joins are always done with standard LoRaWAN settings. A custom/alternate RX2 setting (such as using SF9 as TTN does in Europe) can only apply afterwards, as the details are communicated in the join accept. It’s only with ABP nodes where you need to manually configure the RX2 settings.
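
To make that rule concrete, here is a minimal sketch of how a node could pick its RX2 settings. The `DeviceState` struct and the function name are hypothetical, made up for illustration and not taken from any library; only the EU868 defaults (869.525 MHz, DR0) come from the regional parameters quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

// EU868 defaults from the regional parameters: RX2 at 869.525 MHz, DR0 (SF12 / 125 kHz).
constexpr uint32_t kEu868DefaultRx2FreqHz = 869525000;
constexpr uint8_t kEu868DefaultRx2Dr = 0;

struct Rx2Params {
    uint32_t frequencyHz;
    uint8_t dataRate;
};

// Hypothetical device state, for illustration only.
struct DeviceState {
    bool joined;           // true once a join accept has been processed
    Rx2Params negotiated;  // RX2 settings delivered by the network in the join accept
};

// Pick the RX2 settings to listen with for the next downlink.
Rx2Params rx2ParamsFor(const DeviceState &dev) {
    if (!dev.joined) {
        // Waiting for a join accept: always the regional defaults.
        return {kEu868DefaultRx2FreqHz, kEu868DefaultRx2Dr};
    }
    assert(dev.negotiated.frequencyHz != 0);
    // After the join: whatever the network communicated (e.g. DR3 = SF9).
    return dev.negotiated;
}
```

With this split, a profile such as "RX2 SF9" only changes what the node listens for after it has joined; the join accept itself is still expected at DR0.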

Any network server can use either of the valid RX windows to send any communication. It is not supposed to use both for the same message.

I agree there is something very strange with this operator, and its documentation is too light on this point. With a generic profile, join accepts are sent with SF12 (which is why I guess it’s using the RX2 window and not RX1 or both RX1 and RX2).

I will try to switch to other device profiles, maybe there is a better one …

Again, a LoRaWAN network server may use either RX1 or RX2 at its momentary whim.

There is no predictability which will be used in any given join attempt. It could use RX2 100 times in a row and then randomly decide to use RX1 instead.

But these are also required to be the standard RX window settings defined for joins; they may not be customized. Customized RX windows are only allowed for normal traffic after join, not the join itself.

So no, there’s nothing odd at all, and no unique configuration needed.

That additionally means there’s no need for a network operator to document their join RX window usage, as it’s all in the LoRaWAN specification.

I don’t completely agree with you: LoRaWAN has evolved over its history, and operators can’t just drop devices because they are impossible to update to a new LoRaWAN version. To use your example: if you coded a device using RX2 SF9 and the standard is now RX2 SF12, it’s not a bad idea to allow this device to continue to use your network. My complaint is that LiveObjects doesn’t document the characteristics of the different profiles. Some of them are standards compliant; others clearly are not, so that “old” devices remain usable.

Anyway, RX1 or RX2, I agree, but here clearly it works when the network answers on RX1 and not on RX2. But for me it’s a rare problem because in all devices, and all networks I tested, RX1 was used. I’m using another long-established LoRaWAN operator in France (Objenious) and I have no problem (but it uses RX1 … maybe if it switches to RX2 I will not be able to join).

I tested for some time with ResIOT. There I can actually select which RX window to use.


Not sure if this applies for join as well.

If I find time, I will try to test.

Very quick test with ResIOT, I did not investigate why it doesn’t work.

When forcing the RX2 window, the RAK4631 could not join.
When setting RX1 and RX2 fallback, the RAK4631 could join immediately.

So my library does not support Join over RX2.

For the third time, customization of RX windows does not apply to join accepts.

It applies only afterwards.

LoRaWAN has actually changed some other details of the join accept, but this is handled by setting the device’s LoRaWAN version when entering its registration details with the network before attempting to join. If a network wants to support the old device, it sends an old format join accept to a device that is registered as having an older LoRaWAN version. If it doesn’t want to support that, it doesn’t allow registering devices with old LoRaWAN versions in its database of devices it is willing to send a join accept to.

But the RX windows used for join accepts have NOT changed. RX2 is SF12 for joins in EU868 just like it always has been, even on networks where post-join RX2 is SF9

But for me it’s a rare problem because in all devices, and all networks I tested, RX1 was used

That’s not a reasonable position to take at all. RX2 is part of the LoRaWAN spec, and plenty of networks use it quite heavily. You cannot derive any meaning from which window you see a network server use when you are looking, it may choose either when you are not looking, and if it’s remotely decent, it will use one when the other is not available, rather than have fixed behavior.

So can you explain that? I got much the same behaviour with TTN.

Explanation stands: you cannot derive any predictive meaning from which window you see a server using now, because it may very well use a different one in a minute, or tomorrow, or next year. On a busy network, you could easily see different behavior on each attempt, because on some the RX1 window will already be in use for traffic to some other node, or that channel is out of airtime, or any other factor that the server authors have included in their algorithm wisely or unwisely, as long as they are in compliance with the LoRaWAN spec.

On your end, you either have a device that complies with the LoRaWAN spec in supporting both windows (and keeping their definitions for join distinct from possibly customized post-join definitions) or you don’t.

Okay, but how does it help us? It’s exactly what I’ve been saying from the beginning: I’m not joining if the network server sends the join accept on RX2, but I am with RX1, and I agree it’s not normal behaviour.

And the subsequent question was: why does it seem like I’m the only one in the world to have this issue? And the answer seems to be “because in a great many cases, RX1 is used by the network instead of RX2”. I was not saying that RX1 is the rule for join accept messages

Not remotely true. I’ve seen plenty of logs where RX2 was used primarily. It’s a dynamic decision made based on factors you can’t see.

I was not saying that RX1 is the rule for join accept messages

Your misplaced focus was saying that, though. RX2 is used commonly enough that if it didn’t work, that would probably have been noticed. Earlier in the thread it was posted that RX2 had been tested.

What would be actually helpful would be if you captured raw packet logs from the gateway of the join request and the resulting join accept. What they should show is either spec-mandated RX1 settings or spec-mandated RX2 settings (SF12), either perfectly valid. No customizations are supposed to be applied yet.

So first, supply logs that show that the network operator is meeting the LoRaWAN spec and sending standard join accepts, even if you or they chose customized settings that would apply after the join.

Then, it would be good to verify that the device firmware is attempting to receive with the spec-mandated standard settings, and not with variant settings that should only be applied after join.

Typically in my projects I do this with debug prints and a GPIO I blip at key times to see on a scope; if accessible, getting the radio SPI and DIO lines on a USB logic analyzer that can capture the whole 5+ second cycle is even better.

My suspicion right now is that there’s a mixup in the node usage, or possibly (but less likely) its code, originating in trying to apply customized normal RX2 settings to join RX2, when in actuality join RX2 is always by the spec and never customized.

Hi @all, I had to set this development aside, but now I’m back xD
I was hoping that in the meantime some improvements would have been made, but my issue still exists.

Problems with LiveObjects are still present. So I started to do some more tests, and here is a summary of the situation:

Apparently, the node doesn’t receive a JOIN_ACCEPT if it’s sent in the RX2 window. But I think I found something about this => with LiveObjects, if the profile is RX2SF9 the device joins, but not with RX2SF12.
Two options here: either the RX2SF9 profile also sends on RX1 (I really doubt this), or there is a problem with RX at the SF12 data rate. The latter option suggests that RX2 is not off, just misconfigured.
I tweaked some parts of the library to force the join to use SF12 => in this case, RX1 and RX2 should both be using SF12, and it’s not working.

I have some limitations:

  • My RAK gateway doesn’t seem to show JOIN_ACCEPT messages that it did not generate itself, so I can’t see what LiveObjects is sending. Any idea on this is welcome :slight_smile:
  • Debug mode is not verbose enough; for example, it doesn’t show which data rate is in use.
  • I tried to find clear (and simple) information about the join process and its parameters. To avoid any misunderstanding, I would like to clarify some points.

EU868

TTN gives almost everything here: Regional Parameters | The Things Network
But it does not cover the data rates and delays used in the join process. Here https://lora-alliance.org/wp-content/uploads/2020/11/lorawan_regional_parameters_v1.0.2_final_1944_1.pdf we have more information, but still nothing about join data rates. And I found this https://lora-alliance.org/wp-content/uploads/2021/11/LW1.0.4_End_Device_Certification_V1.0.pdf => here there is something:

RX1DROffset = 2
RX2DataRate = Any DR except default

So the SF used for RX1 by the gateway is offset by 2 from the one used in the JOIN_REQUEST? Hmm … SF7 + 2 = SF9, so RX1 is configured with SF9; but LiveObjects doesn’t send the JOIN_ACCEPT on RX1, and RX2 is misconfigured (it keeps the RX1 settings) => that could explain why it works with the RX2SF9 profile.
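
For reference, a small sketch of how the EU868 RX1 data rate follows from the uplink data rate and RX1DROffset. The offset is applied to the DR index (not to the SF directly) and the result is floored at DR0. The function names are mine. Note that the EU868 default offset is 0, and a non-zero offset only reaches an OTAA device inside the join accept, so as far as I understand it the values listed above look like certification test settings rather than what a join accept itself is sent with.

```cpp
#include <cassert>
#include <cstdint>

// EU868 RX1 downlink data rate: uplink DR lowered by RX1DROffset, floored at DR0.
uint8_t eu868Rx1DataRate(uint8_t uplinkDr, uint8_t rx1DrOffset) {
    assert(uplinkDr <= 7 && rx1DrOffset <= 5);
    return (uplinkDr > rx1DrOffset)
               ? static_cast<uint8_t>(uplinkDr - rx1DrOffset)
               : static_cast<uint8_t>(0);
}

// Spreading factor for DR0..DR5 (125 kHz LoRa): DR0 = SF12 ... DR5 = SF7.
uint8_t eu868SpreadingFactor(uint8_t dr) {
    assert(dr <= 5);
    return static_cast<uint8_t>(12 - dr);
}
```

So an uplink at DR5 (SF7) with offset 2 would be answered at DR3 (SF9) on RX1, while with the default offset 0 the answer comes back at DR5 (SF7), the same as the uplink.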

I continue my tests …

Oooh yes I was right !!

I added a debug log in RegionEU868.cpp → bool RegionEU868RxConfig(RxConfigParams_t *rxConfig, int8_t *datarate)

	LOG_LIB("EU868", "configuring DR=%d freq=%d, status=%d", dr, frequency, Radio.GetStatus());

	if (Radio.GetStatus() != RF_IDLE)
	{
		LOG_LIB("EU868", "Problem : radio is not idle");
		return false;
	}

And I got :

10:34:57.616 → OnRadioTxDone => RX Windows #1 4991 #2 6030
10:34:57.616 → OnRadioTxDone => TX was Join Request
10:35:02.565 → configuring DR=4 freq=0, status=0
10:35:02.565 → RX window timeout = 3000
10:35:03.598 → configuring DR=0 freq=869525000, status=1
10:35:03.598 → Problem : radio is not idle
10:35:06.531 → RadioIrqProcess => IRQ_RX_TX_TIMEOUT
10:35:06.531 → OnRadioRxTimeout
10:35:06.696 → Join network failed 15 time(s)

My first instinct was right: it seems linked to the original issue of this topic. RX1 is not properly “closed”, which leads to excess current consumption and the RX2 JOIN_ACCEPT issue. But that cannot be the whole story: forcing the JOIN_REQ to SF12 should then have worked.

Another discovery: as this f****ing LiveObjects takes hours to change a profile, I think I missed something when the join process worked. I had also tested another generic profile named “LORA/GenericA.1.0.2b_ETSI_Rx2-SF12”, but I can’t know which profile is really in effect. I thought it was the RX2SF9 profile that was working, but I am pretty sure LORA/GenericA.1.0.2b_ETSI_Rx2-SF12 was what was “really” being used, because when it works it uses the same SF as the JOIN_REQ (like TTN): I see some JOIN_ACCEPTs with SF7 and it works. What a mess …
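
To read the gaps out of a log like the one above without doing the subtraction by hand, a tiny helper is enough. This is only a sketch; the function name is mine and it assumes the "HH:MM:SS.mmm" layout printed by the serial monitor.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Convert an "HH:MM:SS.mmm" log timestamp to milliseconds since midnight.
// Returns -1 if the text does not match that layout.
int64_t timestampToMs(const char *text) {
    assert(text != nullptr);
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(text, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) {
        return -1;
    }
    return ((static_cast<int64_t>(h) * 60 + m) * 60 + s) * 1000 + ms;
}
```

Applied to the lines above: the RX2 configuration comes 1033 ms after the RX1 configuration, and the radio timeout IRQ only arrives 3966 ms after the RX1 configuration, so the radio was still busy with RX1 when RX2 was due.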

So finally, here is the probable scenario:

LiveObjects profile “Generic_classA_RX2SF12” uses only RX2 with SF12
LiveObjects profile “Generic_classA_RX2SF9” uses only RX2 with SF9
LiveObjects profile “LORA/GenericA.1.0.2b_ETSI_Rx2-SF12” uses RX1 with SF accept = SF request (+RX2SF12 ?)
beegee-tokyo/SX126x-Arduino has an issue with RX2, so we can consider that the RX2 window is not working (as the previous @beegee tests show); if exchanges between gateway and node are attempted on RX2, they can’t work.

UPDATE :

Sorry for the multiple posts, but I prefer to give the maximum information in case anyone runs into the same troubles as me xD

I found a workaround, it’s not pretty but even with Generic_classA_RX2SF12 it works ! :slight_smile:

In RegionEU868.cpp → bool RegionEU868RxConfig(RxConfigParams_t *rxConfig, int8_t *datarate) I just commented out the return false:

	if (Radio.GetStatus() != RF_IDLE)
	{
		//return false;
	}

I think this check is just a safeguard against changing radio parameters while the stack is using the radio, but as the stack already manages that, we can be confident that it cannot happen when there is no other issue.
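
A slightly less blunt variant of the same workaround would be to stop a leftover receive before reconfiguring, while still refusing to touch a radio that is transmitting. The sketch below uses a mock radio so that it compiles on its own. `MockRadio`, `Standby()` and the two non-idle state names are stand-ins I chose for illustration; whether the library exposes an equivalent call is an assumption I have not checked, and reading status=1 in my log as "RX running" is also a guess.

```cpp
#include <cassert>

// Stand-in for the library's radio driver so the sketch compiles on its own.
enum RadioState { RF_IDLE = 0, RF_RX_RUNNING, RF_TX_RUNNING };

struct MockRadio {
    RadioState state = RF_IDLE;
    RadioState GetStatus() const { return state; }
    void Standby() { state = RF_IDLE; }  // hypothetical: abort RX and go idle
};

// Returns true when the radio is idle and safe to reconfigure.
// A radio still in RX from the previous window is stopped first;
// a radio that is transmitting is left alone and false is returned.
bool prepareForRxConfig(MockRadio &radio) {
    if (radio.GetStatus() == RF_RX_RUNNING) {
        radio.Standby();
    }
    return radio.GetStatus() == RF_IDLE;
}
```

Like the commented-out return, this only hides the symptom: the RX1 window is still left open longer than it should be.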

Clearly it doesn’t fix the real issue, but now everything seems to work well !

UPDATE2

I think I found the issue !!

So background first :

In order to manage timings, the library uses two mechanisms: one based on MCU timers and the timeout function built into the SX126x. If I detail the join process as it should happen:

  1. The TX JOIN request is sent; RX1TIMER is started for 5000 ms and RX2TIMER for 6000 ms.
  2. RX1TIMER fires and the SX126x is configured to receive. Here the internal SX126x timeout function is activated.
  3. The SX126x internal timeout fires, leading to the STANDBY state.
  4. RX2TIMER fires and the SX126x is configured to receive. Here the internal SX126x timeout function is activated.
  5. The SX126x internal timeout fires, leading to the STANDBY state and triggering a JOIN_FAIL event.

This process is what SHOULD happen, BUT the SX126x timeout is badly configured: the register is set without taking into account that a conversion is needed, leading to a timeout that fires after RX2TIMER. That’s why the library complains that the radio is not in standby mode.

I made this simple correction (I think there is a better way to do it, but for now …):
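
The condition that breaks can be written down as a one-line check. This is only a sketch with a made-up function name; the 4000 ms figure used in the note below is my reading of the earlier log (timeout IRQ roughly 4 s after RX1 was configured), not a value confirmed from the library source.

```cpp
#include <cassert>
#include <cstdint>

// True if a single-shot RX window opened at rx1OpenMs, with a radio-side
// timeout of radioTimeoutMs, has ended by the time RX2 opens at rx2OpenMs.
// All times are in milliseconds after the end of the uplink.
bool rx1ClosedBeforeRx2(uint32_t rx1OpenMs, uint32_t rx2OpenMs,
                        uint32_t radioTimeoutMs) {
    assert(rx2OpenMs > rx1OpenMs);
    return rx1OpenMs + radioTimeoutMs <= rx2OpenMs;
}
```

With the join delays of 5000 ms and 6000 ms, a radio timeout of about 4000 ms gives false (the radio is still receiving when RX2 is due), while anything up to 1000 ms gives true.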

src/radio/sx12x/radio.cpp → void RadioSetRxConfig(RadioModems_t modem, uint32_t bandwidth,
uint32_t datarate, uint8_t coderate,
uint32_t bandwidthAfc, uint16_t preambleLen,
uint16_t symbTimeout, bool fixLen,
uint8_t payloadLen,
bool crcOn, bool freqHopOn, uint8_t hopPeriod,
bool iqInverted, bool rxContinuous)

there is this :

// Timeout Max, Timeout handled directly in SetRx function
RxTimeout = 0xFA0;

I changed it to 700 and it’s joining on RX2 SF12.
Why 700? It’s a little bit arbitrary … The datasheet is not clear on this point: https://www.mouser.com/datasheet/2/761/DS_SX1261-2_V1.1-1307803.pdf → 13.1.5 SetRx
They do not explain for this register what the unit of the duration is. As every other timeout in this datasheet uses 15.625 µs as the conversion factor, I did the same. 700 * 15.625 µs ≈ 11 ms (I think that’s too short compared to the LoRaWAN specs; tell me if you find the exact value :wink: )
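
For anyone who wants to redo the arithmetic, here is the conversion as a small sketch (the helper names are mine). One timeout step of 15.625 µs means 64 steps per millisecond.

```cpp
#include <cassert>
#include <cstdint>

// One SX126x timeout step is 15.625 us, i.e. 64 steps per millisecond.
constexpr uint32_t kStepsPerMs = 64;

// Duration in whole microseconds of a raw timeout value written to the radio.
constexpr uint64_t stepsToMicros(uint32_t steps) {
    return static_cast<uint64_t>(steps) * 1000 / kStepsPerMs;
}

// Raw timeout value for a duration given in milliseconds.
constexpr uint32_t msToSteps(uint32_t ms) {
    return ms * kStepsPerMs;
}
```

If RxTimeout were written to the radio as raw steps, 700 would be about 10.9 ms and the original 0xFA0 (4000) only 62.5 ms, which would not overlap RX2 one second later. In my log, however, the timeout IRQ arrives about 3.97 s after RX1 is configured. That fits better with the value being treated as milliseconds and scaled by 64 somewhere before it reaches the radio (4000 ms = 256000 steps), in which case 700 would mean 700 ms. That scaling is a guess from the log, not something I have verified in the driver.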