RAK4631 Low Power LoRaWAN Example Rx Windows

Explanation stands: you cannot derive any predictive meaning from which window you see a server using now, because it may very well use a different one in a minute, or tomorrow, or next year. On a busy network, you could easily see different behavior on each attempt, because on some attempts the RX1 window will already be in use for traffic to some other node, or that channel is out of airtime, or some other factor applies that the server authors have included in their algorithm, wisely or unwisely, as long as they are in compliance with the LoRaWAN spec.

On your end, you either have a device that complies with the LoRaWAN spec in supporting both windows (and keeping their definitions for join distinct from possibly customized post-join definitions) or you don’t.

Okay, but how does it help us? It’s exactly what I’ve been saying from the beginning: I’m not joining if the network server sends the join accept on RX2, but I am with RX1, and I agree it’s not normal behaviour.

And the subsequent question was: why does it seem like I’m the only one in the world to have this issue? And the answer seems to be “because in a great many cases, RX1 is used by the network instead of RX2”. I was not saying that RX1 is the rule for join accept messages

Not remotely true. I’ve seen plenty of logs where RX2 was used primarily. It’s a dynamic decision made based on factors you can’t see.

I was not saying that RX1 is the rule for join accept messages

Your misplaced focus was saying that, though. RX2 is used commonly enough that if it didn’t work, that would probably have been noticed. Earlier in the thread it was posted that RX2 had been tested.

What would actually be helpful is if you captured raw packet logs from the gateway of the join request and the resulting join accept. What they should show is either spec-mandated RX1 settings or spec-mandated RX2 settings (SF12), either perfectly valid. No customizations are supposed to be applied yet.

So first, supply logs that show that the network operator is meeting the LoRaWAN spec and sending standard join accepts, even if you or they chose customized settings that would apply after the join.

Then, it would be good to verify that the device firmware is attempting to receive with the spec-mandated standard settings, and not with variant settings that should only be applied after join.

Typically in my projects I do this with debug prints and a GPIO I blip at key times to see on a scope, or, if they are accessible, getting the radio SPI and DIO lines on a USB logic analyzer that can capture the whole 5+ second cycle is even better.

My suspicion right now is that there’s a mixup in the node usage, or possibly (but less likely) its code, originating in trying to apply customized normal RX2 settings to join RX2, when in actuality join RX2 is always by the spec and never customized.

Hi @all, I had to set this development aside but now I’m back xD
So I was hoping that in the meantime some improvements would have been made, but my issues still exist.

Problems with LiveObjects are still present. So I started to do some more tests and here is a summary of the situation:

Apparently, the node doesn’t receive a JOIN_ACCEPT if it’s sent in the RX2 window. But I think I found something about this => with LiveObjects, if the profile is RX2SF9 the device joins, but not with RX2SF12.
Two options here: with RX2SF9 the profile sends on RX1 too (I really doubt this) or there is a problem with RX using the SF12 datarate. If we take the latter option, this suggests that RX2 is not off, just misconfigured.
I tweaked some parts of the library to force the join to use SF12 => in this case, RX1 and RX2 should both be using SF12, and it’s not working.

I have some limits :

  • My RAK gateway doesn’t seem to show JOIN_ACCEPT messages if they are not generated by it. So I can’t see what LiveObjects is sending. Any idea on this is welcome :slight_smile:
    * Debug mode is not verbose enough, for example it doesn’t show what datarate it’s using.
  • I tried to find clear (and simple) information about the join process and its parameters. So to avoid any misunderstanding, I would like to clarify some points.

EU868

TTN gives almost everything here: Regional Parameters | The Things Network
But we still do not know the datarates and delays used in the join process. Here https://lora-alliance.org/wp-content/uploads/2020/11/lorawan_regional_parameters_v1.0.2_final_1944_1.pdf we have more information, but still nothing about join datarates. And I found this https://lora-alliance.org/wp-content/uploads/2021/11/LW1.0.4_End_Device_Certification_V1.0.pdf => here there is something:

RX1DROffset = 2
RX2DataRate = Any DR except default

So the SF used for RX1 by the gateway is offset by 2 from the one used in the JOIN_REQUEST? Hmm … SF7 + 2 = SF9, so RX1 is configured with SF9, but LiveObjects doesn’t send a JOIN_ACCEPT on RX1, and RX2 is misconfigured (it keeps the RX1 settings) => it could explain why it works with the RX2SF9 profile.
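The offset arithmetic can be checked with a small sketch. In EU868 the data rate index runs opposite to the spreading factor (DR0 = SF12 up to DR5 = SF7, all at 125 kHz), and the RX1 data rate is the uplink data rate minus RX1DROffset, floored at DR0. The helper names below are made up for illustration and are not functions of the SX126x-Arduino library.

```cpp
#include <algorithm>

// EU868, 125 kHz LoRa data rates only: DR0 = SF12 ... DR5 = SF7.
// Hypothetical helper, not part of SX126x-Arduino.
constexpr int sfFromDr(int dr)
{
    return 12 - dr;
}

// RX1 downlink data rate in EU868:
// uplink DR minus RX1DROffset, never below DR0.
constexpr int rx1Datarate(int uplinkDr, int rx1DrOffset)
{
    return std::max(uplinkDr - rx1DrOffset, 0);
}
```

With a JOIN_REQUEST at SF7 (DR5) and an offset of 2, `rx1Datarate(5, 2)` gives DR3, and `sfFromDr(3)` gives SF9, which matches the "SF7 + 2 = SF9" reasoning. Keep in mind that RX1DROffset = 2 comes from the certification test document quoted above; as far as I know the EU868 default offset that applies during join is 0, so this only illustrates the arithmetic.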

I continue my tests …

Oooh yes I was right !!

I added a debug log in RegionEU868.cpp → bool RegionEU868RxConfig(RxConfigParams_t *rxConfig, int8_t *datarate)

	LOG_LIB("EU868", "configuring DR=%d freq=%d, status=%d", dr, frequency, Radio.GetStatus());

	if (Radio.GetStatus() != RF_IDLE)
	{
		LOG_LIB("EU868", "Problem : radio is not idle");
		return false;
	}

And I got :

10:34:57.616 → OnRadioTxDone => RX Windows #1 4991 #2 6030
10:34:57.616 → OnRadioTxDone => TX was Join Request
10:35:02.565 → configuring DR=4 freq=0, status=0
10:35:02.565 → RX window timeout = 3000
10:35:03.598 → configuring DR=0 freq=869525000, status=1
10:35:03.598 → Problem : radio is not idle
10:35:06.531 → RadioIrqProcess => IRQ_RX_TX_TIMEOUT
10:35:06.531 → OnRadioRxTimeout
10:35:06.696 → Join network failed 15 time(s)

My first instinct was right, it seems linked with the original issue of this topic. RX1 is not properly “closed”, which leads to excess current consumption and the RX2 JOIN_ACCEPT issue. But that can’t be the whole story: forcing JOIN_REQ with SF12 should work.
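To put numbers on the log, here is a minimal sketch that turns the serial timestamps into millisecond differences. It assumes the `HH:MM:SS.mmm` format shown in the log, always with three millisecond digits; the function names are only illustrative.

```cpp
#include <cstdio>
#include <string>

// Convert an "HH:MM:SS.mmm" timestamp (as printed in the log) to milliseconds.
long toMs(const std::string &ts)
{
    int h = 0, m = 0, s = 0, ms = 0;
    std::sscanf(ts.c_str(), "%d:%d:%d.%d", &h, &m, &s, &ms);
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

// Milliseconds elapsed between two log timestamps on the same day.
long elapsedMs(const std::string &from, const std::string &to)
{
    return toMs(to) - toMs(from);
}
```

Applied to the log: RX2 is configured `elapsedMs("10:35:02.565", "10:35:03.598")` = 1033 ms after RX1, but the timeout IRQ only arrives `elapsedMs("10:35:02.565", "10:35:06.531")` = 3966 ms after RX1 was configured. So the radio was still in receive from RX1 when the stack tried to open RX2.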

Another discovery: as this f****ing LiveObjects takes hours to change a profile, I think I missed something when the join process worked. I tested another generic profile named “LORA/GenericA.1.0.2b_ETSI_Rx2-SF12”, but I can’t know which profile is really operational. I thought it was the RX2SF9 profile that was working, but I am pretty sure it was LORA/GenericA.1.0.2b_ETSI_Rx2-SF12 that was “really” being used, because when it works it’s using the same SF as the JOIN_REQ (like TTN), so I see some JOIN_ACCEPT with SF7 and it works. What a mess …

So finally here is the probable scenario:

LiveObjects profile “Generic_classA_RX2SF12” uses only RX2 with SF12
LiveObjects profile “Generic_classA_RX2SF9” uses only RX2 with SF9
LiveObjects profile “LORA/GenericA.1.0.2b_ETSI_Rx2-SF12” uses RX1 with SF accept = SF request (+RX2SF12 ?)
beegee-tokyo/SX126x-Arduino has an issue with RX2, so we can consider that the RX2 window is not working (as previous @beegee tests show), so if exchanges between gateway and node are attempted on RX2, it can’t work.

UPDATE :

Sorry for the multiple posts, but I prefer to give as much information as possible in case anyone runs into the same trouble as me xD

I found a workaround, it’s not pretty but even with Generic_classA_RX2SF12 it works ! :slight_smile:

In RegionEU868.cpp → bool RegionEU868RxConfig(RxConfigParams_t *rxConfig, int8_t *datarate) I just commented out the return false:

	if (Radio.GetStatus() != RF_IDLE)
	{
		//return false;
	}

I think this part is just a safety check to avoid changing radio parameters while the stack is using the radio, but as the stack already manages that, we can be confident that it cannot happen when there is no other issue.

Clearly it doesn’t fix the real issue, but now everything seems to work well !

UPDATE2

I think I found the issue !!

So background first :

In order to manage timings, the library uses two mechanisms: one based on MCU timers, and the timeout function built into the SX126x. If I detail the join process as it should happen:

  1. The TX JOIN request is sent; RX1TIMER is started for 5000 ms and RX2TIMER is started for 6000 ms.
  2. RX1TIMER fires, and the SX126x is configured to receive. Here the internal SX126x timeout function is activated.
  3. The SX126x internal timeout fires, leading to the STANDBY state.
  4. RX2TIMER fires, and the SX126x is configured to receive. Here the internal SX126x timeout function is activated.
  5. The SX126x internal timeout fires, leading to the STANDBY state and triggering a JOIN_FAIL event.

This process is what SHOULD happen, BUT the SX126x timeout is badly configured: the register is set without taking into account that a conversion is needed, leading to a timeout which fires after the RX2 timer. That’s why the library complains that the radio is not in standby mode.
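The condition that breaks can be written down as a tiny model. This is only a sketch of the timing relationship described in the steps above, assuming no packet is received in RX1 and ignoring radio wake-up time; the struct and function are hypothetical and do not exist in the library.

```cpp
// All times are in milliseconds.
// Hypothetical model of the join timing, not library code.
struct JoinTiming
{
    long rx1OpenMs;      // MCU timer that opens RX1, counted from end of TX
    long rx2OpenMs;      // MCU timer that opens RX2, counted from end of TX
    long radioTimeoutMs; // radio's own RX timeout, counted from window open
};

// True if the radio has timed out of RX1 and is back in standby
// by the time the RX2 timer fires.
bool radioIdleWhenRx2Opens(const JoinTiming &t)
{
    return t.rx1OpenMs + t.radioTimeoutMs <= t.rx2OpenMs;
}
```

With the 5000 ms and 6000 ms timers from step 1 and a radio timeout of roughly 4 s (the log shows the timeout IRQ about 3966 ms after RX1 was configured), `radioIdleWhenRx2Opens({5000, 6000, 3966})` is false, which is the "radio is not idle" case. Any radio timeout below the 1000 ms gap between the two windows makes it true.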

I made this simple correction (I think there is a better way to do it, but for now …):

src/radio/sx12x/radio.cpp → void RadioSetRxConfig(RadioModems_t modem, uint32_t bandwidth,
uint32_t datarate, uint8_t coderate,
uint32_t bandwidthAfc, uint16_t preambleLen,
uint16_t symbTimeout, bool fixLen,
uint8_t payloadLen,
bool crcOn, bool freqHopOn, uint8_t hopPeriod,
bool iqInverted, bool rxContinuous)

there is this :

// Timeout Max, Timeout handled directly in SetRx function
RxTimeout = 0xFA0;

I changed it to 700 and it’s joining on RX2 SF12.
Why 700? It’s a little bit arbitrary … The datasheet is not clear on this point: https://www.mouser.com/datasheet/2/761/DS_SX1261-2_V1.1-1307803.pdf → 13.1.5 SetRx
They do not explain what the unit of duration is for this register. As every other timeout in this datasheet uses 15.625 µs as the conversion factor, I did the same. 700 * 15.625 µs ≈ 11 ms (it’s too short I think compared to LoRaWAN specs, tell me if you find the exact value :wink: )
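For reference, the unit arithmetic can be sketched as below, taking the 15.625 µs step as an assumption (it is the factor used for the other timeouts, as said above). The second helper uses the standard LoRa symbol duration, 2^SF / BW, to give a feel for how long a single symbol lasts. Both helper names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Assumed timeout step: 15.625 us per tick.
constexpr double kTickUs = 15.625;

// Convert a raw timeout value in ticks to milliseconds.
constexpr double ticksToMs(uint32_t ticks)
{
    return ticks * kTickUs / 1000.0;
}

// Duration of one LoRa symbol in milliseconds: 2^SF / BW.
double loraSymbolMs(int sf, double bandwidthHz)
{
    assert(sf >= 5 && sf <= 12);
    return static_cast<double>(1u << sf) / bandwidthHz * 1000.0;
}
```

Under that assumption, `ticksToMs(700)` is 10.9375 ms and `ticksToMs(0xFA0)` is 62.5 ms, while `loraSymbolMs(12, 125000.0)` is about 32.8 ms. So 11 ms would be shorter than a single SF12 symbol, which supports the feeling that something is off with the unit. The log may give a hint: the timeout IRQ arrived about 3966 ms after RX1 was configured, which is close to 4000, the decimal value of 0xFA0. That would be consistent with the value being handled as milliseconds somewhere in the driver before it reaches the register, but this is an inference from the log and not something verified in the code here.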