Skip to content

LoRaWAN: Retransmission back-off correction for re-join #9217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

hasnainvirk
Copy link
Contributor

@hasnainvirk hasnainvirk commented Dec 31, 2018

Description

Previously, we had been initializing our time base in
LoRaMac::initialize() and if the device was not power-cycled, that
initial time was always being used in the calculation of the elapsed
time. This introduced a bug mentioned in issue #8921. The bug made the
re-join attempt of a device after a week or so to use wrong back-off
because in calculate_backoff() API we pass the elapsed time. If we do
not set the initial time while trying to connect, the elapsed time will
include previous session time as well and the device will think it has
spend tha much time and will apply harsher duty cycle back off.

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@AnttiKauppila
@kjbracey-arm

@ciarmcom
Copy link
Member

@hasnainvirk, thank you for your changes.
@AnttiKauppila@ARMmbed/mbed-os-wan @ARMmbed/mbed-os-maintainers please review.

Previously, we had been initializing our time base in
LoRaMac::initialize() and if the device was not power-cycled, that
initial time was always being used in the calculation of the elapsed
time. This introduced a bug mentioned in issue ARMmbed#8921. The bug made the
re-join attempt of a device after a week or so to use wrong back-off
because in calculate_backoff() API we pass the elapsed time. If we do
not set the initial time while trying to connect, the elapsed time will
include previous session time as well and the device will think it has
spend tha much time and will apply harsher duty cycle back off.
@hasnainvirk hasnainvirk changed the title Retransmission back-off correction for re-join LoRaWAN: Retransmission back-off correction for re-join Jan 1, 2019
@mattbrown015
Copy link
Contributor

Hi @hasnainvirk , Happy New Year! :-)

AFAICT, what you done will start the timer every time a join sequence is started.

But, LoRaWAN spec V1.0.2 says:

Aggregated during the first hour following power-up or reset

(My emphasis.)

It was my assumption that mac_init_time was set once in LoRaMac::initialize because of this. (Although there's nothing to force the app programmer to construct the LoRaMac immediately after power-up.)

Are you sure your change is consistent with the spec author's intentions?

I think what you've done means there's the possibility that a number of end-devices will all start using the minimum back-off for join requests after some system wide event. My interpretation of the spec is that a reduced back-off was required for the re-transmissions based on the time since power-on or reset.

IMHO, what you've done is fine. I think there are pros-and-cons with both approaches and any given approach is likely to work in some situations and not others. Hence analysing it too much is a waste of time but, on the other hand, we want to be compliant with the spec!

Copy link
Contributor

@cmonr cmonr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@cmonr
Copy link
Contributor

cmonr commented Jan 3, 2019

@hasnainvirk Will start CI once @mattbrown015's comment is resolved.

@hasnainvirk
Copy link
Contributor Author

@mattbrown015 You are right once again :) 🥇
Revisiting the spec, I feel that the intention is not to say literally 'Retransmission backoff'.
I believe restricting DC after a certain time of operation means that suddenly we lost connectivity after weeks of successful operation so it may be a catastrophic event. In that case, the device should not start with BACKOFF_DC_1_HOUR because that would mean thousands of devices transmitting too fast. It will be wise to apply stricter DC like BACKOFF_DC_10_HOUR or BACKOFF_DC_24_HOUR based on the elapsed time to be on the safe side.
So, the original code was correct. I would propose to close this PR as the original behaviour seems correct to me. What are your thoughts ?

@mattbrown015
Copy link
Contributor

It seems we're stuck trying to interpret exactly what the original spec authors meant. I definitely don't know, I'm relatively new to LoRaWAN and I am on my own here!

Like I said before, I think we can probably come up with scenarios where either approach works best.

I can understand your point that we should be careful not to take 'retransmission' too literally.

I've no experience of large deployments and catastrophic events.

I do know that in a small deployment one device can occasionally, possibly erroneously, decide to re-join in which case applying the long back-off is a real pain. Perhaps there's a temporary object gets in the RF path and a few link checks get missed by 1 device.

Allowing 1 transmission with normal back-off could work in some situations.

Any device deciding to re-join gets 1 go with the normal back-off. In a large deployment there may be a large number of initial transmissions but then the extended back-off will get applied and prevent an RF storm. If it is only one device applying the extended back-off it is a pain but at least it got one go relatively quickly.

One thing that worries me with the extended back-off, it's important the the back-off is randomised. If the 1000 devices all apply the extended back-off and then transmit together we've waited all that time and achieved nothing!

@hasnainvirk
Copy link
Contributor Author

@mattbrown015 Yeah, the back-off is randomized at least for Ack Timeout but it isn't there for Joining process. I have taken a note of this and we will do something about it ASAP.
However, it wouldn't make sense to force calculate_backoff() to restart DC from BACKOFF_DC_1_HOUR because then the stricter versions of DC will never get invoked. Our join algorithm will try at least once on every data rate before giving up. How many times it tries is configurable. Ofcourse application can start joining process again if it wishes to. Keeping this in mind, I would be inclined to introduce randomness in backoff for Join process as you suggested and leave the rest as it is.

By the way, in v1.1 rejoin is a proper feature and there are 3 types of it. There can be network initiated forced rejoin requests and there can be periodic rejoins as well. In v1.0.2 there is no concept of rejoin as such. However, you could have a logic based on LINK_CHECK_REQUEST as you briefed.

@hasnainvirk
Copy link
Contributor Author

Closing this PR in favour of #9251

@hasnainvirk hasnainvirk closed this Jan 4, 2019
@hasnainvirk hasnainvirk deleted the dc_backoff_fix branch January 4, 2019 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants