Skip to content

PYTHON-2363 Rate limit new connection creations via maxConnecting #511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 23, 2020

Conversation

ShaneHarvey
Copy link
Member

At most 2 connections can be in the pending state per connection pool.
The pending state covers all the work required to setup a new connection
including TCP, TLS, and MongoDB authentication. For example, if two
threads are currently creating connections, a third thread will wait for
either an existing connection to be checked back into the pool or for
one of the two threads to finish creating a connection.

The change reduces the likelihood of connection storms and improves the
driver's ability to reuse existing connections.

@@ -303,7 +316,7 @@ def __init__(self, max_pool_size=MAX_POOL_SIZE,
wait_queue_multiple=None, ssl_context=None,
ssl_match_hostname=True, socket_keepalive=True,
event_listeners=None, appname=None, driver=None,
compression_settings=None):
compression_settings=None, max_connecting=MAX_CONNECTING):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't configurable by users but I think it still made sense to add max_connecting to PoolOptions.

pymongo/pool.py Outdated
while (self._pending >= self._max_connecting and
not self.sockets):
if self.opts.wait_queue_timeout:
# TODO: What if timeout is <= zero here?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still an open question. I'll have to give it some more thought.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved after talking offline with Prashant. A <=0 timeout is fine, it means the operation really did timeout in the wait queue.

thread.join(10)

self.assertEqual(len(docs), 50)
self.assertLessEqual(len(pool.sockets), 50)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would be more useful if it asserted that we created way less than 50 connections, like this:

        self.assertLessEqual(len(pool.sockets), 20)

But it would make the test more flaky, especially with noauth/nossl where connection creation is much faster. Hmm perhaps I can try changing this to:

if ssl and auth enabled:
        self.assertLessEqual(len(pool.sockets), 20)
else:
        self.assertLessEqual(len(pool.sockets), 50)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Waiting on EVG test results to make sure this change doesn't make the test flakey.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

@@ -58,17 +58,32 @@ to 100. If there are ``maxPoolSize`` connections to a server and all are in
use, the next request to that server will wait until one of the connections
becomes available.

The client instance opens one additional socket per server in your MongoDB
The client instance opens two additional sockets per server in your MongoDB
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is because of the 4.4+ streaming SDAM behavior which requires 2 sockets per server.

@ShaneHarvey
Copy link
Member Author

Rebased to fix the geoSearch test failures (PYTHON-2421).

Copy link
Contributor

@prashantmital prashantmital left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming EVG passes :)

- one of the first two threads finishes creating a connection, or
- an existing connection is checked back into the pool.

Rate limiting concurrent connection creation reduces the likelihood of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also say that users who are concerned about any additional latency this might result in can increase their minPoolSize to reduce the likelihood of having threads wait to create new connections?

Also, should we add a note about the impact this has at application start time when a minPoolSize is set? Presumably pools take longer to initialize in this case though I am not sure what impace this might have on applications.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not suggest increasing minPoolSize. We actually want threads to wait a bit before creating connections because it yields better throughput and latency in most cases (as evidenced by test_maxConnecting).

Also, should we add a note about the impact this has at application start time when a minPoolSize is set? Presumably pools take longer to initialize in this case though I am not sure what impace this might have on applications.

Pools shouldn't take longer because minPoolSize connection creation is (and always was) single threaded. MongoClient only creates a single connection for a single pool at once.

if PY3:
def _cond_wait(condition, deadline):
timeout = deadline - _time() if deadline else None
return condition.wait(timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a comment here that wait() treats negative timeouts same as timeout=0?

thread.join(10)

self.assertEqual(len(docs), 50)
self.assertLessEqual(len(pool.sockets), 50)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

Copy link
Contributor

@prashantmital prashantmital left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it looks like something about how threads work on Python 2 is causing one of the spec tests to fail.

@ShaneHarvey
Copy link
Member Author

Investigating this Python 2 test failure:

 [2020/11/13 20:34:45.813] Failed test: u'threads blocked by maxConnecting check out returned connections'
 [2020/11/13 20:34:45.813] Operations:
 [2020/11/13 20:34:45.813] {u'name': u'checkOut', u'label': u'conn0'}
 [2020/11/13 20:34:45.813] {u'name': u'start', u'target': u'thread1'}
 [2020/11/13 20:34:45.813] {u'name': u'checkOut', u'thread': u'thread1'}
 [2020/11/13 20:34:45.813] {u'name': u'start', u'target': u'thread2'}
 [2020/11/13 20:34:45.813] {u'name': u'checkOut', u'thread': u'thread2'}
 [2020/11/13 20:34:45.813] {u'name': u'start', u'target': u'thread3'}
 [2020/11/13 20:34:45.813] {u'name': u'checkOut', u'thread': u'thread3'}
 [2020/11/13 20:34:45.813] {u'count': 4, u'name': u'waitForEvent', u'event': u'ConnectionCheckOutStarted'}
 [2020/11/13 20:34:45.813] {u'name': u'wait', u'ms': 100}
 [2020/11/13 20:34:45.813] {u'connection': u'conn0', u'name': u'checkIn'}
 [2020/11/13 20:34:45.813] {u'count': 4, u'name': u'waitForEvent', u'event': u'ConnectionCheckedOut'}
 [2020/11/13 20:34:45.813] Threads:
 [2020/11/13 20:34:45.813] {u'thread3': <SpecRunnerThread(thread3, started daemon 139906613507840)>, u'thread2': <SpecRunnerThread(thread2, started daemon 139906133255936)>, u'thread1': <SpecRunnerThread(thread1, started daemon 139906143745792)>}
 [2020/11/13 20:34:45.813] Connections:
 [2020/11/13 20:34:45.813] {u'conn0': SocketInfo(<ssl.SSLSocket object at 0x7f3e99839150>) at 139906839696336}
 [2020/11/13 20:34:45.813] Events:
 [2020/11/13 20:34:45.813] PoolCreatedEvent(('localhost', 27017), {'maxPoolSize': 10, 'waitQueueTimeoutMS': 5000.0})
 [2020/11/13 20:34:45.813] ConnectionCheckOutStartedEvent(('localhost', 27017))
 [2020/11/13 20:34:45.813] ConnectionCreatedEvent(('localhost', 27017), 1)
 [2020/11/13 20:34:45.813] ConnectionReadyEvent(('localhost', 27017), 1)
 [2020/11/13 20:34:45.813] ConnectionCheckedOutEvent(('localhost', 27017), 1)
 [2020/11/13 20:34:45.813] ConnectionCheckOutStartedEvent(('localhost', 27017))
 [2020/11/13 20:34:45.813] ConnectionCreatedEvent(('localhost', 27017), 2)
 [2020/11/13 20:34:45.813] ConnectionCheckOutStartedEvent(('localhost', 27017))
 [2020/11/13 20:34:45.813] ConnectionCreatedEvent(('localhost', 27017), 3)
 [2020/11/13 20:34:45.813] ConnectionReadyEvent(('localhost', 27017), 3)
 [2020/11/13 20:34:45.813] ConnectionCheckedOutEvent(('localhost', 27017), 3)
 [2020/11/13 20:34:45.813] ConnectionReadyEvent(('localhost', 27017), 2)
 [2020/11/13 20:34:45.813] ConnectionCheckedOutEvent(('localhost', 27017), 2)
 [2020/11/13 20:34:45.813] ConnectionCheckOutStartedEvent(('localhost', 27017))
 [2020/11/13 20:34:45.813] ConnectionCreatedEvent(('localhost', 27017), 4)
 [2020/11/13 20:34:45.813] ConnectionCheckedInEvent(('localhost', 27017), 1)
 [2020/11/13 20:34:45.813] ConnectionReadyEvent(('localhost', 27017), 4)
 [2020/11/13 20:34:45.813] ConnectionCheckedOutEvent(('localhost', 27017), 4)
 [2020/11/13 20:34:45.813]   test_cmap_pool_checkout_returned_connection_maxConnecting_threads_blocked_by_maxConnecting_check_out_returned_connections (test_cmap.TestCMAP) ... FAIL (12.421s)
...
 [2020/11/13 20:40:03.560] FAIL [12.421s]: test_cmap_pool_checkout_returned_connection_maxConnecting_threads_blocked_by_maxConnecting_check_out_returned_connections (test_cmap.TestCMAP)
 [2020/11/13 20:40:03.560] ----------------------------------------------------------------------
 [2020/11/13 20:40:03.560] Traceback (most recent call last):
 [2020/11/13 20:40:03.560]   File "/data/mci/c71b1abdc304e836c91992a1aa44d4d1/src/test/__init__.py", line 444, in wrap
 [2020/11/13 20:40:03.560]     return f(*args, **kwargs)
 [2020/11/13 20:40:03.560]   File "/data/mci/c71b1abdc304e836c91992a1aa44d4d1/src/test/test_cmap.py", line 397, in run_scenario
 [2020/11/13 20:40:03.560]     self.run_scenario(scenario_def, test)
 [2020/11/13 20:40:03.560]   File "/data/mci/c71b1abdc304e836c91992a1aa44d4d1/src/test/test_cmap.py", line 238, in run_scenario
 [2020/11/13 20:40:03.560]     self.check_events(test['events'], test['ignore'])
 [2020/11/13 20:40:03.560]   File "/data/mci/c71b1abdc304e836c91992a1aa44d4d1/src/test/test_cmap.py", line 175, in check_events
 [2020/11/13 20:40:03.560]     self.check_event(actual, expected)
 [2020/11/13 20:40:03.560]   File "/data/mci/c71b1abdc304e836c91992a1aa44d4d1/src/test/test_cmap.py", line 163, in check_event
 [2020/11/13 20:40:03.560]     self.check_object(actual, expected)
 [2020/11/13 20:40:03.560]   File "/data/mci/c71b1abdc304e836c91992a1aa44d4d1/src/test/test_cmap.py", line 150, in check_object
 [2020/11/13 20:40:03.560]     self.assertEqual(type(actual), OBJECT_TYPES[expected['type']])
 [2020/11/13 20:40:03.560] AssertionError: <class 'pymongo.monitoring.ConnectionCheckedOutEvent'> != <class 'pymongo.monitoring.ConnectionCheckedInEvent'>

At most 2 connections can be in the pending state per connection pool.
The pending state covers all the work required to setup a new connection
including TCP, TLS, and MongoDB authentication. For example, if two
threads are currently creating connections, a third thread will wait for
either an existing connection to be checked back into the pool or for
one of the two threads to finish creating a connection.

The change reduces the likelihood of connection storms and improves the
driver's ability to reuse existing connections.
… the operation really did timeout in the wait queue
@ShaneHarvey
Copy link
Member Author

Rebased without any changes.

@ShaneHarvey
Copy link
Member Author

I still see a few test failures:

I made a slight change to the maxConnecting timeout behavior but I'm not convinced it will solve the failures. I more so suspect that Python 2 (especially Jython) is failing frequently because threading is worse there and Condition variables are more expensive (On Python 2 they're implemented as spin locks). This creates extra contention in the pool and causes the timeouts seen in the tests.

Since we're dropping Python 2 support soon and this feature isn't required in PyMongo 3.x I think we should just merge the changes as is. @prashantmital any final thoughts?

@ShaneHarvey
Copy link
Member Author

ShaneHarvey commented Nov 23, 2020

@ShaneHarvey ShaneHarvey merged commit d1fd3f7 into mongodb:master Nov 23, 2020
@ShaneHarvey ShaneHarvey deleted the PYTHON-2363-final branch November 23, 2020 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants