Fixing a Thread Leak in Database Client Code (Related to #29) #30


Merged
merged 2 commits on May 18, 2017

Conversation

hiranya911 (Contributor)

Creating WebSocket reader and writer threads lazily to avoid the thread leak described in #29; handling thread interrupts from the WebSocket receiver thread (not comprehensive due to the blocking nature of IO calls); reducing the default jitter factor to 25%
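The lazy-creation idea above can be sketched roughly as follows. This is an illustrative, self-contained example, not the actual SDK code; the class and method names are hypothetical:

```java
// Hypothetical sketch of lazy thread creation: the worker thread is
// constructed only when a connection actually starts, instead of in the
// enclosing object's constructor. This avoids leaking a thread for
// connections that are constructed but never opened.
public class LazyReader {
    private Thread innerThread; // created on demand, not in the constructor

    public synchronized void start(Runnable readLoop, String threadName) {
        if (innerThread == null) {
            innerThread = new Thread(readLoop, threadName);
            innerThread.setDaemon(true);
            innerThread.start();
        }
    }

    public synchronized boolean isStarted() {
        return innerThread != null;
    }
}
```

The key point is that no thread exists until `start()` is invoked, so objects that are created and discarded without connecting never touch the thread machinery.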


@mikelehen left a comment


Basically looks good!

@@ -139,7 +139,7 @@ public PersistentConnectionImpl(
.withMinDelayAfterFailure(1000)
.withRetryExponent(1.3)
.withMaxDelay(30 * 1000)
- .withJitterFactor(0.7)
+ .withJitterFactor(0.25)


FWIW, I mildly object to changing this since:

  1. I don't see how playing with timing numbers is going to fix it... It's like adding a sleep to fix a race condition... It may work (sometimes), but it's not a good long-term fix. :-)

  2. The goal of the jitter is to "smear" reconnecting clients over a larger period of time so they don't all hit the backend at the same times after a server restarts / has downtime / whatever. So in theory, this change would more than double the load on the server, since we're smearing over 25% of the time interval rather than 70%, and so we should talk to the backend team to make sure they're okay with this change.

But #1 isn't harmful and in practice #2 is likely inconsequential because the number of clients using the admin SDK will be extremely small compared to the number of clients using the mobile / web clients. So backend load isn't really an issue, and so if we want more predictable reconnect intervals, I guess it's fine. :-)
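To make the smearing point concrete, here is a small self-contained sketch (not the SDK's actual retry helper; one common jitter convention is assumed, where the jitter factor f randomizes each delay within a window of size f × delay):

```java
import java.util.Random;

// Illustrative sketch of exponential backoff with a jitter factor.
// The nominal delay grows exponentially per attempt and is capped at
// maxDelay; the jitter factor f then randomizes the actual delay within
// [capped * (1 - f), capped]. With f = 0.7 reconnects are smeared over
// 70% of the interval; with f = 0.25 they cluster much more tightly.
public class JitterSketch {
    static long jitteredDelayMillis(long baseDelayMillis, double retryExponent,
                                    long maxDelayMillis, double jitterFactor,
                                    int attempt, Random random) {
        double nominal = baseDelayMillis * Math.pow(retryExponent, attempt);
        double capped = Math.min(nominal, maxDelayMillis);
        return (long) (capped * (1 - jitterFactor * random.nextDouble()));
    }
}
```

With the parameters from the hunk above (min delay 1000 ms, exponent 1.3, max delay 30 s), dropping the jitter factor from 0.7 to 0.25 shrinks the window each delay is drawn from, which is exactly the load-smearing trade-off described in point 2.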

hiranya911 (Contributor, Author)


It seems both thread creation and termination are very expensive on App Engine. Both involve RPC calls and heavy backend processing. One implication is that threads don't really die when our Runnables return (it seems); some internal cleanup has to happen on the GAE end. Therefore having some breathing room between thread termination and creation helps. In my tests I observed a noticeable difference in error-handling behavior when the reconnects are far apart in time.

However, those tests were conducted using an exponential backoff factor of 2. So perhaps dropping the jitter factor alone may not be enough. I'll revert this change for now. We can revisit it in the future, if we continue to have issues in GAE.


FWIW:

  1. To test your other changes, it might be interesting to set the initial delay to 0 and the backoff factor to 1 temporarily to make sure reconnect timing is no longer a factor (obviously don't leave it deployed like this for an extended period of time :-)).

  2. I'd be more comfortable increasing the initial reconnect time than the backoff factor... and if we decide that GAE is weird enough that we want to minimize the thread creation for reconnect attempts, we could change these settings. But I'd at least accompany it with some comments explaining the rationale for the numbers (i.e. have a specific target for the frequency with which we will tolerate creating / destroying threads).

state = State.CONNECTING;
- getInnerThread().start();
+ start();
}


nit: Since the code style here seems to mix public and private methods, putting helpers close to where they're used, I'd put the start() method directly below here.

hiranya911 (Contributor, Author)


Done

@@ -32,26 +34,20 @@
class WebSocketWriter {

private final Random random = new Random();
private final Thread innerThread;
private final String threadBaseName;
private final int clientId;


Rather than add 2 members for these kinda' arbitrary values, can we just add a threadName member that we still compute in the constructor?
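The suggested refactor, sketched as a standalone example (field and constructor details are illustrative, not the actual WebSocketWriter code):

```java
// Illustrative sketch: compute the full thread name once in the
// constructor and keep a single member, instead of storing
// threadBaseName and clientId separately just to rebuild the name later.
class WriterSketch {
    private final String threadName;

    WriterSketch(String threadBaseName, int clientId) {
        // Naming convention here is hypothetical.
        this.threadName = threadBaseName + "-" + clientId;
    }

    String threadName() {
        return threadName;
    }
}
```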

hiranya911 (Contributor, Author)


Done

innerThread.start();
}

void waitFor() throws InterruptedException {


waitFor... what? waitForThreadExit() maybe?

hiranya911 (Contributor, Author)


Changed to waitForTermination
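Presumably the renamed method just joins the inner thread until it exits; a minimal sketch of that pattern (class and field names are illustrative, not the SDK code):

```java
// Illustrative sketch of waitForTermination(): block the caller until
// the worker thread has exited, propagating interruption to the caller.
class WorkerSketch {
    private final Thread innerThread;

    WorkerSketch(Runnable work) {
        this.innerThread = new Thread(work, "worker-sketch");
    }

    void start() {
        innerThread.start();
    }

    void waitForTermination() throws InterruptedException {
        innerThread.join(); // returns once the thread has terminated
    }
}
```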

@hiranya911 hiranya911 assigned hiranya911 and unassigned mikelehen May 16, 2017
@hiranya911 (Contributor, Author)

I've reverted the change to jitter factor. We can revisit it in the future. Other changes are done as proposed.

@hiranya911 hiranya911 assigned mikelehen and unassigned hiranya911 May 17, 2017
@mikelehen left a comment


Looks good. Thanks!

3 participants