[lldb] Added an exponential algorithm for the sleep time in connect_to_debug_monitor() #118222

slydiman · 2024-12-01T17:38:44Z

See #118032 for details.

See llvm#118032 for details.

llvmbot · 2024-12-01T17:39:06Z

@llvm/pr-subscribers-lldb

Author: Dmitry Vasilyev (slydiman)

Changes

See #118032 for details.

Full diff: https://github.com/llvm/llvm-project/pull/118222.diff

1 Files Affected:

(modified) lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py (+1-1)

diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py
index 8c8e4abed0b454..67b07ff4ddd998 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py
@@ -388,7 +388,7 @@ def connect_to_debug_monitor(self, attach_pid=None):
         # We're using a random port algorithm to try not to collide with other ports,
         # and retry a max # times.
         attempts = 0
-        MAX_ATTEMPTS = 20
+        MAX_ATTEMPTS = 30
 
         while attempts < MAX_ATTEMPTS:
             server = self.launch_debug_monitor(attach_pid=attach_pid)

labath · 2024-12-02T09:58:43Z

20 attempts is already more than what I would consider reasonable. Increasing it doesn't sound like a good idea. If the problem is lingering sockets, then I think a better approach is to slow down the rate of socket creation, not churn out more of them.

While this code tries to back off, it doesn't implement the (standard?) exponential backoff algorithm. I think that's important, as without it, it just ends up churning out more and more connections at a relative fast rate (time.sleep(random.randint(1, 5)) gives you an average of 3s, so this may even be faster than "running a test" -- which also generates a new socket. And since this happens a few times, I wouldn't be surprised if that's during the times that the random generator generates a locally-lower-than-average sequence of random numbers).

So, the thing I'd do is use a proper exponential algorithm and reduce the retry count (so that the total wait time ends up being slightly more than what it is now), and also increase the range of ports that we're picking from (I don't know how the 12000 + random.randint(0, 3999) thingy was chosen, but I think we could increase that any issues).

(Of course, a much better solution would be to either let lldb-server pick a free port (and communicate it back to the test somehow) or to reuse the socket multiplexing approach we (you) recently implemented. That may not work for all tests (we still need to test the user-provided-port scenario somehow), but 99% of the tests don't care about the connection method, o if we can solve the problem for the majority of tests, then this alleviates the pressure on the minority that does need it.)

…TS to 16. The sleep time will be 3, 3.6, 4.3, ..., 38, 46. The total sleep time is 262 + 0.5*MAX_CONNECT_ATTEMPTS*MAX_ATTEMPTS = 342. Note the timeout is 600, so the rest time must be enough. Increased the port range (12000..20000).

slydiman · 2024-12-02T11:18:33Z

Updated the patch:
Added an exponential algorithm for the sleep time. Reduced MAX_ATTEMPTS to 16.
The sleep time will be 3, 3.6, 4.3, ..., 38, 46.
The total sleep time is 262 + 0.5 * MAX_CONNECT_ATTEMPTS * MAX_ATTEMPTS = 342.
Note the timeout is 600, so the rest time must be enough.
Increased the port range (12000..20000).

labath

Yeah, that's the general idea, although 16 attempts @ 262s sounds a bit too much (that's like almost three times as long as the first version with 30*3s). I worry that the tests will take too long to time out if they not able to establish a connection due to some other reason (like, imagine a bad patch causes lldb-server to never accept a connection). I'd suggest starting out with 10attempts@77 seconds. That's still higher than the initial ~60 seconds, and I really hope that should be enough since doubling the port range sort of doubles the number of attempts as well (by halving the chance of collision).

JDevlieghere · 2024-12-02T16:54:57Z

Thanks, I suggested something similar in #118032.

API tests llvm-project/lldb/test/API/tools/lldb-server/* with the category `lldb-server` run lldb-server locally on the host with a random port. Sometimes we got errors like https://lab.llvm.org/buildbot/#/builders/195/builds/3145 https://lab.llvm.org/buildbot/#/builders/197/builds/312 The patch llvm/llvm-project#118222 improved the situation but did not fix it completely.

) API tests llvm-project/lldb/test/API/tools/lldb-server/* with the category `lldb-server` run lldb-server locally on the host with a random port. Sometimes we got errors like https://lab.llvm.org/buildbot/#/builders/195/builds/3145 https://lab.llvm.org/buildbot/#/builders/197/builds/312 The patch llvm/llvm-project#118222 improved the situation but did not fix it completely.

[lldb] Increase MAX_ATTEMPTS in connect_to_debug_monitor()

bb68e6d

See llvm#118032 for details.

slydiman added the lldb label Dec 1, 2024

slydiman requested review from DavidSpickett and labath December 1, 2024 17:38

slydiman requested a review from JDevlieghere as a code owner December 1, 2024 17:38

slydiman changed the title ~~[lldb] Increase MAX_ATTEMPTS in connect_to_debug_monitor()~~ [lldb] Added an exponential algorithm for the sleep time in connect_to_debug_monitor() Dec 2, 2024

labath approved these changes Dec 2, 2024

View reviewed changes

Reduced MAX_ATTEMPTS to 10.

40e9a0f

slydiman merged commit a951e76 into llvm:main Dec 2, 2024
7 checks passed

slydiman mentioned this pull request Dec 3, 2024

[lldb][tests] Sockets leaks in API tests with a remote target #118032

Closed

slydiman mentioned this pull request Jan 7, 2025

Disable the test category lldb-server for lldb-remote-* builders llvm/llvm-zorg#353

Merged

slydiman deleted the connect_to_debug_monitor_MAX_ATTEMPTS branch April 18, 2025 15:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb] Added an exponential algorithm for the sleep time in connect_to_debug_monitor() #118222

[lldb] Added an exponential algorithm for the sleep time in connect_to_debug_monitor() #118222

Uh oh!

slydiman commented Dec 1, 2024

Uh oh!

llvmbot commented Dec 1, 2024

Uh oh!

labath commented Dec 2, 2024

Uh oh!

slydiman commented Dec 2, 2024 •

edited

Loading

Uh oh!

labath left a comment

Uh oh!

Uh oh!

JDevlieghere commented Dec 2, 2024

Uh oh!

Uh oh!

[lldb] Added an exponential algorithm for the sleep time in connect_to_debug_monitor() #118222

[lldb] Added an exponential algorithm for the sleep time in connect_to_debug_monitor() #118222

Uh oh!

Conversation

slydiman commented Dec 1, 2024

Uh oh!

llvmbot commented Dec 1, 2024

Uh oh!

labath commented Dec 2, 2024

Uh oh!

slydiman commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

labath left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

JDevlieghere commented Dec 2, 2024

Uh oh!

Uh oh!

slydiman commented Dec 2, 2024 •

edited

Loading