LoRaWAN: Retransmission back-off correction for re-join #9217

hasnainvirk · 2018-12-31T16:38:27Z

Description

Previously, we had been initializing our time base in
LoRaMac::initialize() and if the device was not power-cycled, that
initial time was always being used in the calculation of the elapsed
time. This introduced a bug mentioned in issue #8921. The bug made the
re-join attempt of a device after a week or so to use wrong back-off
because in calculate_backoff() API we pass the elapsed time. If we do
not set the initial time while trying to connect, the elapsed time will
include previous session time as well and the device will think it has
spend tha much time and will apply harsher duty cycle back off.

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@AnttiKauppila
@kjbracey-arm

ciarmcom · 2018-12-31T18:00:26Z

@hasnainvirk, thank you for your changes.
@AnttiKauppila@ARMmbed/mbed-os-wan @ARMmbed/mbed-os-maintainers please review.

Previously, we had been initializing our time base in LoRaMac::initialize() and if the device was not power-cycled, that initial time was always being used in the calculation of the elapsed time. This introduced a bug mentioned in issue ARMmbed#8921. The bug made the re-join attempt of a device after a week or so to use wrong back-off because in calculate_backoff() API we pass the elapsed time. If we do not set the initial time while trying to connect, the elapsed time will include previous session time as well and the device will think it has spend tha much time and will apply harsher duty cycle back off.

mattbrown015 · 2019-01-02T10:54:37Z

Hi @hasnainvirk , Happy New Year! :-)

AFAICT, what you done will start the timer every time a join sequence is started.

But, LoRaWAN spec V1.0.2 says:

Aggregated during the first hour following power-up or reset

(My emphasis.)

It was my assumption that mac_init_time was set once in LoRaMac::initialize because of this. (Although there's nothing to force the app programmer to construct the LoRaMac immediately after power-up.)

Are you sure your change is consistent with the spec author's intentions?

I think what you've done means there's the possibility that a number of end-devices will all start using the minimum back-off for join requests after some system wide event. My interpretation of the spec is that a reduced back-off was required for the re-transmissions based on the time since power-on or reset.

IMHO, what you've done is fine. I think there are pros-and-cons with both approaches and any given approach is likely to work in some situations and not others. Hence analysing it too much is a waste of time but, on the other hand, we want to be compliant with the spec!

cmonr

Nice catch!

cmonr · 2019-01-03T01:45:42Z

@hasnainvirk Will start CI once @mattbrown015's comment is resolved.

hasnainvirk · 2019-01-03T19:47:54Z

@mattbrown015 You are right once again :) 🥇
Revisiting the spec, I feel that the intention is not to say literally 'Retransmission backoff'.
I believe restricting DC after a certain time of operation means that suddenly we lost connectivity after weeks of successful operation so it may be a catastrophic event. In that case, the device should not start with BACKOFF_DC_1_HOUR because that would mean thousands of devices transmitting too fast. It will be wise to apply stricter DC like BACKOFF_DC_10_HOUR or BACKOFF_DC_24_HOUR based on the elapsed time to be on the safe side.
So, the original code was correct. I would propose to close this PR as the original behaviour seems correct to me. What are your thoughts ?

mattbrown015 · 2019-01-03T22:03:09Z

It seems we're stuck trying to interpret exactly what the original spec authors meant. I definitely don't know, I'm relatively new to LoRaWAN and I am on my own here!

Like I said before, I think we can probably come up with scenarios where either approach works best.

I can understand your point that we should be careful not to take 'retransmission' too literally.

I've no experience of large deployments and catastrophic events.

I do know that in a small deployment one device can occasionally, possibly erroneously, decide to re-join in which case applying the long back-off is a real pain. Perhaps there's a temporary object gets in the RF path and a few link checks get missed by 1 device.

Allowing 1 transmission with normal back-off could work in some situations.

Any device deciding to re-join gets 1 go with the normal back-off. In a large deployment there may be a large number of initial transmissions but then the extended back-off will get applied and prevent an RF storm. If it is only one device applying the extended back-off it is a pain but at least it got one go relatively quickly.

One thing that worries me with the extended back-off, it's important the the back-off is randomised. If the 1000 devices all apply the extended back-off and then transmit together we've waited all that time and achieved nothing!

hasnainvirk · 2019-01-03T22:43:34Z

@mattbrown015 Yeah, the back-off is randomized at least for Ack Timeout but it isn't there for Joining process. I have taken a note of this and we will do something about it ASAP.
However, it wouldn't make sense to force calculate_backoff() to restart DC from BACKOFF_DC_1_HOUR because then the stricter versions of DC will never get invoked. Our join algorithm will try at least once on every data rate before giving up. How many times it tries is configurable. Ofcourse application can start joining process again if it wishes to. Keeping this in mind, I would be inclined to introduce randomness in backoff for Join process as you suggested and leave the rest as it is.

By the way, in v1.1 rejoin is a proper feature and there are 3 types of it. There can be network initiated forced rejoin requests and there can be periodic rejoins as well. In v1.0.2 there is no concept of rejoin as such. However, you could have a logic based on LINK_CHECK_REQUEST as you briefed.

hasnainvirk · 2019-01-04T00:12:21Z

Closing this PR in favour of #9251

hasnainvirk mentioned this pull request Dec 31, 2018

LoRaWAN: Retransmissions back-off Applied to First Transmission #8921

Closed

ciarmcom requested review from AnttiKauppila and a team December 31, 2018 18:00

ciarmcom added the needs: review label Dec 31, 2018

hasnainvirk force-pushed the dc_backoff_fix branch from 54851d4 to c101d88 Compare January 1, 2019 19:28

hasnainvirk changed the title ~~Retransmission back-off correction for re-join~~ LoRaWAN: Retransmission back-off correction for re-join Jan 1, 2019

AnttiKauppila approved these changes Jan 2, 2019

View reviewed changes

cmonr added the release-version: 5.11.1 label Jan 3, 2019

cmonr approved these changes Jan 3, 2019

View reviewed changes

cmonr removed the release-version: 5.11.1 label Jan 3, 2019

hasnainvirk closed this Jan 4, 2019

studavekar removed the needs: review label Jan 4, 2019

hasnainvirk deleted the dc_backoff_fix branch January 4, 2019 00:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LoRaWAN: Retransmission back-off correction for re-join #9217

LoRaWAN: Retransmission back-off correction for re-join #9217

Uh oh!

hasnainvirk commented Dec 31, 2018 •

edited

Loading

Uh oh!

ciarmcom commented Dec 31, 2018

Uh oh!

mattbrown015 commented Jan 2, 2019

Uh oh!

cmonr left a comment

Uh oh!

cmonr commented Jan 3, 2019

Uh oh!

hasnainvirk commented Jan 3, 2019

Uh oh!

mattbrown015 commented Jan 3, 2019

Uh oh!

hasnainvirk commented Jan 3, 2019

Uh oh!

hasnainvirk commented Jan 4, 2019

Uh oh!

Uh oh!

LoRaWAN: Retransmission back-off correction for re-join #9217

LoRaWAN: Retransmission back-off correction for re-join #9217

Uh oh!

Conversation

hasnainvirk commented Dec 31, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Pull request type

Reviewers

Uh oh!

ciarmcom commented Dec 31, 2018

Uh oh!

mattbrown015 commented Jan 2, 2019

Uh oh!

cmonr left a comment

Choose a reason for hiding this comment

Uh oh!

cmonr commented Jan 3, 2019

Uh oh!

hasnainvirk commented Jan 3, 2019

Uh oh!

mattbrown015 commented Jan 3, 2019

Uh oh!

hasnainvirk commented Jan 3, 2019

Uh oh!

hasnainvirk commented Jan 4, 2019

Uh oh!

Uh oh!

hasnainvirk commented Dec 31, 2018 •

edited

Loading