Skip to content

[lldb] Fix broken pipe error #127100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -249,14 +249,11 @@ def remove_port_forward():

def _verify_socket(self, sock):
# Normally, when the remote stub is not ready, we will get ECONNREFUSED during the
# connect() attempt. However, due to the way how ADB forwarding works, on android targets
# connect() attempt. However, due to the way how port forwarding can work, on some targets
# the connect() will always be successful, but the connection will be immediately dropped
# if ADB could not connect on the remote side. This function tries to detect this
# if we could not connect on the remote side. This function tries to detect this
# situation, and report it as "connection refused" so that the upper layers attempt the
# connection again.
triple = self.dbg.GetSelectedPlatform().GetTriple()
if not re.match(".*-.*-.*-android", triple):
return # Not android.
can_read, _, _ = select.select([sock], [], [], 0.1)
if sock not in can_read:
return # Data is not available, but the connection is alive.
Expand Down Expand Up @@ -397,21 +394,21 @@ def connect_to_debug_monitor(self, attach_pid=None):
# Schedule debug monitor to be shut down during teardown.
logger = self.logger

connect_attemps = 0
connect_attempts = 0
MAX_CONNECT_ATTEMPTS = 10

while connect_attemps < MAX_CONNECT_ATTEMPTS:
while connect_attempts < MAX_CONNECT_ATTEMPTS:
# Create a socket to talk to the server
try:
logger.info("Connect attempt %d", connect_attemps + 1)
logger.info("Connect attempt %d", connect_attempts + 1)
self.sock = self.create_socket()
Copy link
Contributor Author

@sga-sc sga-sc Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python somehow manages to create a socket when the server-side port is not yet open. Because of this, the first time we try to read data from the socket the code crashes with an exception. For example TestGdbRemoteHostInfo.py:

4: test_qHostInfo_returns_at_least_one_key_val_pair_llgs (TestGdbRemoteHostInfo.TestGdbRemoteHostInfo.test_qHostInfo_returns_at_least_one_key_val_pair_llgs) ... ERROR
FAIL: LLDB (/home/jorik/work/llvm-project/build/Release/bin/clang-rv64gc) :: test_qHostInfo_returns_at_least_one_key_val_pair_llgs (TestGdbRemoteHostInfo.TestGdbRemoteHostInfo.test_qHostInfo_returns_at_least_one_key_val_pair_llgs)
2025-02-14 12:04:31,718 WARNING  failed to send kill packet to debug monitor: <class 'BrokenPipeError'>; ignoring

======================================================================
ERROR: test_qHostInfo_returns_at_least_one_key_val_pair_llgs (TestGdbRemoteHostInfo.TestGdbRemoteHostInfo.test_qHostInfo_returns_at_least_one_key_val_pair_llgs)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py", line 48, in test_method
    return attrvalue(self)
           ^^^^^^^^^^^^^^^
  File "/home/jorik/work/llvm-project/lldb/test/API/tools/lldb-server/TestGdbRemoteHostInfo.py", line 122, in test_qHostInfo_returns_at_least_one_key_val_pair
    self.get_qHostInfo_response()
  File "/home/jorik/work/llvm-project/lldb/test/API/tools/lldb-server/TestGdbRemoteHostInfo.py", line 90, in get_qHostInfo_response
    self.do_handshake()
  File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py", line 546, in do_handshake
    self.assertEqual(server.get_normal_packet(), b"+")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py", line 943, in get_normal_packet
    frame = self.get_raw_normal_packet()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py", line 932, in get_raw_normal_packet
    return self._read(self._normal_queue)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jorik/work/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py", line 885, in _read
    new_bytes = self._sock.recv(4096)
                ^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, but doesn't quite explain how could the connection be established without the other side being present. We may need to dig deeper.

One way I can see this happening is if there is some kind of a port forwarder sitting between the test and the server. Are you sure you don't have one of those?

We already have a hackaround for that, but it's only enabled on android. Could you check what happens if you remove the android check (and maybe increase the timeout in the sleep call)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@labath Thank you for your advice! After removing check for android platform tests passed. I execute tests on qemu-system-riscv64 and it's port forwarding behaves probably like android's (judging by passed tests). What should I do with this check for android?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Glad we could sort that out. I think we can remove that android check (and reword the comment to be more generic). I think this is a general property/design bug of port forwarders (they only attempt the outgoing connection after they receive the incoming one, at which point it is too late to abort, so the only thing they really can do is drop the connection). The only reason that check is android-specific is because we didn't need it anywhere else (yet).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

self._server = Server(self.sock, server)
return server
except _ConnectionRefused as serr:
# Ignore, and try again.
pass
time.sleep(0.5)
connect_attemps += 1
connect_attempts += 1

# We should close the server here to be safe.
server.terminate()
Expand Down