Skip to content

Allow firestore to recover quicker when a network change occurs #217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jan 29, 2019

Conversation

rsgowman
Copy link
Member

Eg going from airplane mode to wifi enabled. Previously, firestore would
use an exponential backoff to determine when to attempt a reconnect.
That backoff is now reset and the connections are retried immediately
upon a network change.

@googlebot googlebot added the cla: yes Override cla label Jan 28, 2019
Eg going from airplane mode to wifi enabled. Previously, firestore would
use an exponential backoff to determine when to attempt a reconnect.
That backoff is now reset and the connections are retried immediately
upon a network change.
@rsgowman rsgowman removed their assignment Jan 28, 2019
(Filenames themselves only to make subsequent diff's easier to review in
case github's code review tool doesn't handle this well.)
}

@Override
public void onLost(Network network) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: are onAvailable and onLost guaranteed to never fire twice in a row? (similar question about the pre-N implementation)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect them to be called twice in a row under normal circumstances, at least, not for the same network. But when going into (or coming out of) airplane mode (while in a wifi area) it seems possible for onLost to be called twice; once for wifi and again for cell.

The behaviour we'd take in response is to reset our streams twice (which I think is correct, since cell comes up first, and android will kill cell connections shortly after wifi is re-established.) Worst case: we do a little extra work when the networks are changing.

Does this seem reasonable, or am I missing a case?

@@ -201,6 +203,17 @@ public void onClose(Status status) {
handleWriteStreamClose(status);
}
});

networkReachabilityMonitor.onNetworkReachabilityChange(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. It's pretty different from what iOS does, but my understanding is that the Android implementation has significantly less control over gRPC, so trying to bring this in line with iOS is probably impractical. One difference is that iOS does an error stop while this implementation performs a graceful stop. Graceful stop resets backoff, which is better for reestablishing connection, though it would result in more unsuccessful attempts to connect in case of losing connection.

@var-const var-const assigned rsgowman and unassigned var-const Jan 28, 2019
@wilhuff
Copy link
Contributor

wilhuff commented Jan 28, 2019

Also, don't forget the changelog entry.

- apply renames to spec test
- fix broken test. Previously, the connectivity monitor callback would
  reset the network (enableNetwork(); disableNetwork();) which caused the
  network state to change. Now that we go via the internals, the state
  transitions from UNKNOWN to UNKNOWN, which doesn't trigger the
  *OnlineStateTracker* callback. (The right thing was still happening...
  we just couldn't detect it due to that callback not firing.)
  Overriding the state "fixes" this.
}

@Override
public void onLost(Network network) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect them to be called twice in a row under normal circumstances, at least, not for the same network. But when going into (or coming out of) airplane mode (while in a wifi area) it seems possible for onLost to be called twice; once for wifi and again for cell.

The behaviour we'd take in response is to reset our streams twice (which I think is correct, since cell comes up first, and android will kill cell connections shortly after wifi is re-established.) Worst case: we do a little extra work when the networks are changing.

Does this seem reasonable, or am I missing a case?

waitFor(networkChangeSemaphore);
drain(testQueue);

waitFor(testQueue.enqueue(() -> remoteStore.forceEnableNetwork()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't super wild about this test before, and now I'm liking it even less. It doesn't quite test what we want (though actually testing that the network recovers quickly when coming out of airplane mode is a bit challenging since you can't programatically alter the airplane mode state in api17+.) I may have to rethink this a bit... But in the meantime, if anyone has a shorter-term idea for making this test better, let me know.

@rsgowman
Copy link
Member Author

re changelog: done (thanks)

private final List<Consumer<NetworkStatus>> callbacks = new ArrayList<>();

public AndroidConnectivityMonitor(Context context) {
// This notnull restriction could be eliminated... the pre-N method doesn't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible future refactoring (not for this PR) that would address this would just be to split this into two classes with a shared callback manager helper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: AndroidChannelBuilder resolves this by (a) checking for null before setting the connectivityManager, and then (b) checking if connectivityManager is null prior to using the N+ variant. (I've preserved that check, though it's not technically necessary here. See line 64.)

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@VisibleForTesting
void forceEnableNetwork() {
enableNetwork();
onlineStateTracker.updateState(OnlineState.ONLINE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be better handled by injecting the onlineStateTracker and allowing the test to manipulate it directly. Let's refactor later though.

}
}

private void restartNetwork() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving this closer to the other network-controlling methods like disableNetworkInternal so that it's more obvious that a choice is available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@wilhuff
Copy link
Contributor

wilhuff commented Jan 29, 2019

Oh and even though github isn't flagging this as conflicting, it's probably worth merging master here before squashing because there is a conflict it's not flagging on the CHANGELOG.

(The SQLite blob thing is in Unreleased section in master and should be moved into the 18.0.1 section you've created.)

@var-const var-const removed their assignment Jan 29, 2019
We don't care about this in production, but our integration tests fail
due to 'Too many NetworkRequests filed' because of the way they create
and tear down firestore as a whole.
Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

We do care about this in more than the integration tests (e.g. google3-internal conformance tests), and eventually we need to support Firebase-wide app deletion so this is a worthwhile addition regardless.

@@ -58,21 +59,37 @@ public void addCallback(Consumer<NetworkStatus> callback) {
callbacks.add(callback);
}

@Override
public void shutdown() {
if (unregisterRunnable != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, but we should split these two versions so that we're not manually implementing virtual functions just to keep these two different implementations in the same class.

@rsgowman rsgowman merged commit 1a39d8c into master Jan 29, 2019
@rsgowman rsgowman deleted the rsgowman/reconnect_faster branch January 29, 2019 20:07
@firebase firebase locked and limited conversation to collaborators Oct 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes Override cla size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants