-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[lldb] Fix broken pipe error #127100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lldb] Fix broken pipe error #127100
Conversation
@llvm/pr-subscribers-lldb Author: Georgiy Samoylov (sga-sc) ChangesDuring LLDB testing on slow machines with the remote-linux platform all tests from llgs category fail with python exception List of failed tests:
This patch implements an additional check for the opened socket on lldb-server side and fixes this error. Full diff: https://github.com/llvm/llvm-project/pull/127100.diff 1 Files Affected:
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py
index cbe430c92fa7f..8ee41563da59d 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py
@@ -343,6 +343,22 @@ def get_target_byte_order(self):
target = self.dbg.CreateTarget(inferior_exe_path)
return target.GetByteOrder()
+ def is_port_opened(self):
+ connect_port = self.port
+
+ err, retcode, cmd_output = self.run_platform_command(f"netstat -ltn | grep {connect_port} | grep LISTEN")
+
+ self.assertTrue(
+ err.Success(),
+ "Failed to get opened tcp sockets: %s, retcode: %d"
+ % (err.GetCString(), retcode),
+ )
+
+ if retcode == 0:
+ return True
+ else:
+ return False
+
def launch_debug_monitor(self, attach_pid=None, logfile=None):
if self.reverse_connect:
family, type, proto, _, addr = socket.getaddrinfo(
@@ -397,21 +413,23 @@ def connect_to_debug_monitor(self, attach_pid=None):
# Schedule debug monitor to be shut down during teardown.
logger = self.logger
- connect_attemps = 0
+ connect_attempts = 0
MAX_CONNECT_ATTEMPTS = 10
- while connect_attemps < MAX_CONNECT_ATTEMPTS:
- # Create a socket to talk to the server
- try:
- logger.info("Connect attempt %d", connect_attemps + 1)
- self.sock = self.create_socket()
- self._server = Server(self.sock, server)
- return server
- except _ConnectionRefused as serr:
- # Ignore, and try again.
- pass
+ while connect_attempts < MAX_CONNECT_ATTEMPTS:
+ if self.is_port_opened():
+ # Create a socket to talk to the server
+ try:
+ logger.info("Connect attempt %d", connect_attempts + 1)
+ self.sock = self.create_socket()
+ self._server = Server(self.sock, server)
+ return server
+ except _ConnectionRefused as serr:
+ # Ignore, and try again.
+ pass
+
time.sleep(0.5)
- connect_attemps += 1
+ connect_attempts += 1
# We should close the server here to be safe.
server.terminate()
|
✅ With the latest revision this PR passed the Python code formatter. |
def is_port_opened(self): | ||
connect_port = self.port | ||
|
||
err, retcode, cmd_output = self.run_platform_command(f"netstat -ltn | grep {connect_port} | grep LISTEN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to do this from Python? This adds a hidden dependency on netstat
, which isn't available on all supported platforms (i.e. Windows). Similarly, don't pipe things to grep, do the filtering in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that the command executes on the remote machine, so they only way to access it is via the SBPlatform API. (But I also think all of this shouldn't be necessary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Jonas about dependency on netstat
. But llgs tests themselves are very linux-oriented:
llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py
Line 199 in 8844cd6
# FIXME: This is extremely linux-oriented |
I also agree that filtering should be done using Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may be true, but I don't think its a reason to make things worse, particularly if we don't understand why does this work. I'll also note that the code you point to in a block of code that does not execute on darwin (the other main user of these tests) while this code does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this function
I'm thoroughly confused by this. The idea is that if the port is not yet open, then |
I agree with Jonas and Michał. I think we should take a step back. Can you explain where exactly does this BrokenPipeError come from (an exception backtrace for example)? The code here assumes that we get a "connection refused" error if the port wasn't open. Maybe we need to catch "broken pipe" as well (but first I want to understand how does it get triggered). And if that isn't enough, we can increase the sleep time or change it into an exponential backoff thingy. |
# Create a socket to talk to the server | ||
try: | ||
logger.info("Connect attempt %d", connect_attemps + 1) | ||
self.sock = self.create_socket() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python somehow manages to create a socket when the server-side port is not yet open. Because of this, the first time we try to read data from the socket the code crashes with an exception. For example TestGdbRemoteHostInfo.py
:
4: test_qHostInfo_returns_at_least_one_key_val_pair_llgs (TestGdbRemoteHostInfo.TestGdbRemoteHostInfo.test_qHostInfo_returns_at_least_one_key_val_pair_llgs) ... ERROR
FAIL: LLDB (/home/jorik/work/llvm-project/build/Release/bin/clang-rv64gc) :: test_qHostInfo_returns_at_least_one_key_val_pair_llgs (TestGdbRemoteHostInfo.TestGdbRemoteHostInfo.test_qHostInfo_returns_at_least_one_key_val_pair_llgs)
2025-02-14 12:04:31,718 WARNING failed to send kill packet to debug monitor: <class 'BrokenPipeError'>; ignoring
======================================================================
ERROR: test_qHostInfo_returns_at_least_one_key_val_pair_llgs (TestGdbRemoteHostInfo.TestGdbRemoteHostInfo.test_qHostInfo_returns_at_least_one_key_val_pair_llgs)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py", line 48, in test_method
return attrvalue(self)
^^^^^^^^^^^^^^^
File "/home/jorik/work/llvm-project/lldb/test/API/tools/lldb-server/TestGdbRemoteHostInfo.py", line 122, in test_qHostInfo_returns_at_least_one_key_val_pair
self.get_qHostInfo_response()
File "/home/jorik/work/llvm-project/lldb/test/API/tools/lldb-server/TestGdbRemoteHostInfo.py", line 90, in get_qHostInfo_response
self.do_handshake()
File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py", line 546, in do_handshake
self.assertEqual(server.get_normal_packet(), b"+")
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py", line 943, in get_normal_packet
frame = self.get_raw_normal_packet()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py", line 932, in get_raw_normal_packet
return self._read(self._normal_queue)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py", line 885, in _read
new_bytes = self._sock.recv(4096)
^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting, but doesn't quite explain how could the connection be established without the other side being present. We may need to dig deeper.
One way I can see this happening is if there is some kind of a port forwarder sitting between the test and the server. Are you sure you don't have one of those?
We already have a hackaround for that, but it's only enabled on android. Could you check what happens if you remove the android check (and maybe increase the timeout in the sleep call)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@labath Thank you for your advice! After removing check for android platform tests passed. I execute tests on qemu-system-riscv64
and it's port forwarding behaves probably like android's (judging by passed tests). What should I do with this check for android?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Glad we could sort that out. I think we can remove that android check (and reword the comment to be more generic). I think this is a general property/design bug of port forwarders (they only attempt the outgoing connection after they receive the incoming one, at which point it is too late to abort, so the only thing they really can do is drop the connection). The only reason that check is android-specific is because we didn't need it anywhere else (yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
8844cd6
to
9aab567
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks. (But be sure to wait for Jonas since he clicked the "request changes" button)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thank you for a review, guys, can you merge it, please? I don't have merge rights |
During LLDB testing on slow machines with the remote-linux platform all tests from llgs category fail with python exception `BrokenPipeError`. The main reason of these failures is slow start of lldb-server in gdbserver mode. Due to this desired gdbserver socket does not have time to open by the time the Python script tries to establish a connection. List of failed tests: ``` TestAppleSimulatorOSType.py TestGdbRemoteAttach.py TestGdbRemoteAuxvSupport.py TestGdbRemoteCompletion.py TestGdbRemoteExitCode.py TestGdbRemoteExpeditedRegisters.py TestGdbRemoteHostInfo.py TestGdbRemoteKill.py TestGdbRemoteLaunch.py TestGdbRemoteModuleInfo.py TestGdbRemotePlatformFile.py TestGdbRemoteProcessInfo.py TestGdbRemoteRegisterState.py TestGdbRemoteSaveCore.py TestGdbRemoteSingleStep.py TestGdbRemoteThreadsInStopReply.py TestGdbRemote_qThreadStopInfo.py TestGdbRemote_vCont.py TestLldbGdbServer.py TestNonStop.py TestPtyServer.py TestGdbRemoteAttachWait.py TestGdbRemoteConnection.py TestStubSetSID.py TestGdbRemoteAbort.py TestGdbRemoteSegFault.py TestGdbRemoteLibrariesSvr4Support.py TestGdbRemoteMemoryAllocation.py TestGdbRemoteMemoryTagging.py TestGdbRemoteGPacket.py TestGdbRemoteTargetXmlPacket.py TestGdbRemote_QPassSignals.py TestGdbRemoteThreadName.py TestPartialResume.py TestSignal.py ``` This patch implements an additional check for the opened socket on lldb-server side and fixes this error.
During LLDB testing on slow machines with the remote-linux platform all tests from llgs category fail with python exception `BrokenPipeError`. The main reason of these failures is slow start of lldb-server in gdbserver mode. Due to this desired gdbserver socket does not have time to open by the time the Python script tries to establish a connection. List of failed tests: ``` TestAppleSimulatorOSType.py TestGdbRemoteAttach.py TestGdbRemoteAuxvSupport.py TestGdbRemoteCompletion.py TestGdbRemoteExitCode.py TestGdbRemoteExpeditedRegisters.py TestGdbRemoteHostInfo.py TestGdbRemoteKill.py TestGdbRemoteLaunch.py TestGdbRemoteModuleInfo.py TestGdbRemotePlatformFile.py TestGdbRemoteProcessInfo.py TestGdbRemoteRegisterState.py TestGdbRemoteSaveCore.py TestGdbRemoteSingleStep.py TestGdbRemoteThreadsInStopReply.py TestGdbRemote_qThreadStopInfo.py TestGdbRemote_vCont.py TestLldbGdbServer.py TestNonStop.py TestPtyServer.py TestGdbRemoteAttachWait.py TestGdbRemoteConnection.py TestStubSetSID.py TestGdbRemoteAbort.py TestGdbRemoteSegFault.py TestGdbRemoteLibrariesSvr4Support.py TestGdbRemoteMemoryAllocation.py TestGdbRemoteMemoryTagging.py TestGdbRemoteGPacket.py TestGdbRemoteTargetXmlPacket.py TestGdbRemote_QPassSignals.py TestGdbRemoteThreadName.py TestPartialResume.py TestSignal.py ``` This patch implements an additional check for the opened socket on lldb-server side and fixes this error.
During LLDB testing on slow machines with the remote-linux platform all tests from llgs category fail with python exception
BrokenPipeError
. The main reason of these failures is slow start of lldb-server in gdbserver mode. Due to this desired gdbserver socket does not have time to open by the time the Python script tries to establish a connection.List of failed tests:
This patch implements an additional check for the opened socket on lldb-server side and fixes this error.