Skip to content

[lldb-dap] Fix TestDap_attach.py flakiness #137278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 28, 2025

Conversation

kusmour
Copy link
Contributor

@kusmour kusmour commented Apr 25, 2025

Summary

This patch makes the request_attach wait for events process and initialized just like request_launch. This ensure the DAP session can move forward somewhat correctly.

Recently TestDap_attach.test_terminate_commands became flaky.
It's hitting:

lldbsuite/test/tools/lldb-dap/dap_server.py", line 350, in send_recv
    raise ValueError(desc)
ValueError: no response for "disconnect"

I took a look at the DAP msg from that test case and noticed:

  • It's not using the regular attaching, instead it's using the attachCommands to launch debug the binary and it will stop at entry.
  • The initialized event returned after the disconnect request. Which means lldb-dap didn't really get ready yet.

NOTE

The dap_server.py is doing things to mimic the VSCode (or other dap clients) but it had some assumptions. For example, it's still missing the configurationDone request and response because it relies on a continue action to trigger the configurationDone request.

Test Plan

./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py

To test the wait_for_events timeout case

events = self.wait_for_events(["process", "initialized", "fake", "event"], 1)
if events:
    raise ValueError(f'no events {",".join(events)} found for within timeout 1')

Observed
image

Also

Looks like some test cases should be re-enabled in 0b8dfb5
But only comments was removed. The skip statements survived the change.

@kusmour kusmour requested a review from JDevlieghere as a code owner April 25, 2025 00:55
@llvmbot llvmbot added the lldb label Apr 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 25, 2025

@llvm/pr-subscribers-lldb

Author: Wanyi (kusmour)

Changes

This patch makes the request_attach wait for events process and initialized just like request_launch. This ensure the DAP session can move forward somewhat correctly.

Recently TestDap_attach.test_terminate_commands became flaky.
It's hitting:

lldbsuite/test/tools/lldb-dap/dap_server.py", line 350, in send_recv
    raise ValueError(desc)
ValueError: no response for "disconnect"

I took a look at the DAP msg from that test case and noticed:

  • It's not using the regular attaching, instead it's using the attachCommands to launch debug the binary and it will stop at entry.
  • The initialized event returned after the disconnect request. Which means lldb-dap didn't really get ready yet.

NOTE: The dap_server.py is doing things to mimic the VSCode (or other dap clients) but it had some assumptions. For example, it's still missing the configurationDone request and response because it relies on a continue action to trigger the configurationDone request.


Full diff: https://github.com/llvm/llvm-project/pull/137278.diff

1 Files Affected:

  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py (+7-1)
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index dadf6b1f8774c..20f5286da3203 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -612,7 +612,13 @@ def request_attach(
         if gdbRemoteHostname is not None:
             args_dict["gdb-remote-hostname"] = gdbRemoteHostname
         command_dict = {"command": "attach", "type": "request", "arguments": args_dict}
-        return self.send_recv(command_dict)
+        response = self.send_recv(command_dict)
+
+        if response["success"]:
+            # Wait for a 'process' and 'initialized' event in any order
+            self.wait_for_event(filter=["process", "initialized"])
+            self.wait_for_event(filter=["process", "initialized"])
+        return response
 
     def request_breakpointLocations(
         self, file_path, line, end_line=None, column=None, end_column=None

@kusmour kusmour force-pushed the lldb-dap-flaky-test-attach branch 2 times, most recently from 956f2af to 2c5ff68 Compare April 25, 2025 02:01
@kusmour kusmour force-pushed the lldb-dap-flaky-test-attach branch from b87f1fe to 0e6d419 Compare April 25, 2025 04:13
@kusmour kusmour force-pushed the lldb-dap-flaky-test-attach branch from 0e6d419 to aa7540d Compare April 25, 2025 18:11
Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I'm not too familiar with the dap protocol, so would be good to get another approver.

events = events[:] # Make a copy to avoid modifying the input
while events:
event = self.wait_for_event(filter=events, timeout=timeout)
events.remove(event["event"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we hit a timeout, wont event be None?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah it looks like its not handling the timeout correctly.

events = events[:] # Make a copy to avoid modifying the input
while events:
event = self.wait_for_event(filter=events, timeout=timeout)
events.remove(event["event"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah it looks like its not handling the timeout correctly.

@DavidSpickett
Copy link
Collaborator

We (Linaro) are seeing the same error message with a different test on our AArch64 Linux bot:

ERROR: test_indexedVariables_with_raw_child_for_synthetics (TestDAP_variables.TestDAP_variables)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2065, in tearDown
    Base.tearDown(self)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1105, in tearDown
    hook()  # try the plain call and hope it works
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py", line 422, in cleanup
    self.dap_server.request_disconnect(terminateDebuggee=True)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py", line 708, in request_disconnect
    return self.send_recv(command_dict)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py", line 338, in send_recv
    raise ValueError(desc)
ValueError: no response for "disconnect"

https://lab.llvm.org/buildbot/#/builders/59/builds/16771

Likely same cause.

kusmour added 2 commits April 28, 2025 14:25
Looks like these 2 test cases should be re-enabled in llvm@0b8dfb5
But only the comments got removed. The skip statement survived the change.
Summary:
Looks like some test cases should be re-enabled in llvm@0b8dfb5
But only comments was removed. The skip statements survived the change.

Test Plan:
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
All tests passed locally on my mac
@kusmour kusmour force-pushed the lldb-dap-flaky-test-attach branch from aa7540d to f5f4ff8 Compare April 28, 2025 21:25
@kusmour kusmour force-pushed the lldb-dap-flaky-test-attach branch from f5f4ff8 to e747925 Compare April 28, 2025 21:56
events = events[:] # Make a copy to avoid modifying the input
while events:
event_dict = self.wait_for_event(filter=events, timeout=timeout)
if event_dict is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want an explicit check that event_dict is None? Or would it be better to have a truth test like

if event_dict:
   ...

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kusmour kusmour merged commit 47de54e into llvm:main Apr 28, 2025
10 checks passed
@kusmour kusmour deleted the lldb-dap-flaky-test-attach branch April 28, 2025 22:33
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
# Summary
This patch makes the `request_attach` wait for events `process` and
`initialized` just like `request_launch`. This ensure the DAP session
can move forward somewhat correctly.

Recently `TestDap_attach.test_terminate_commands` became flaky.
It's hitting:
```
lldbsuite/test/tools/lldb-dap/dap_server.py", line 350, in send_recv
    raise ValueError(desc)
ValueError: no response for "disconnect"
```
I took a look at the DAP msg from that test case and noticed:
- It's not using the regular attaching, instead it's using the
`attachCommands` to launch debug the binary and it will stop at entry.
- The `initialized` event returned after the `disconnect` request. Which
means lldb-dap didn't really get ready yet.

### NOTE
The `dap_server.py` is doing things to mimic the VSCode (or other dap
clients) but it had some assumptions. For example, it's still missing
the `configurationDone` request and response because it relies on a
continue action to trigger the `configurationDone` request.

# Test Plan
```
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
```

To test the `wait_for_events` timeout case
```
events = self.wait_for_events(["process", "initialized", "fake", "event"], 1)
if events:
    raise ValueError(f'no events {",".join(events)} found for within timeout 1')
```
Observed
<img width="696" alt="image"
src="https://github.com/user-attachments/assets/bc97c0ef-d91f-4561-8272-4d36f5f5d4e6"
/>

### Also
Looks like some test cases should be re-enabled in
llvm@0b8dfb5
But only comments was removed. The skip statements survived the change.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
# Summary
This patch makes the `request_attach` wait for events `process` and
`initialized` just like `request_launch`. This ensure the DAP session
can move forward somewhat correctly.

Recently `TestDap_attach.test_terminate_commands` became flaky.
It's hitting:
```
lldbsuite/test/tools/lldb-dap/dap_server.py", line 350, in send_recv
    raise ValueError(desc)
ValueError: no response for "disconnect"
```
I took a look at the DAP msg from that test case and noticed:
- It's not using the regular attaching, instead it's using the
`attachCommands` to launch debug the binary and it will stop at entry.
- The `initialized` event returned after the `disconnect` request. Which
means lldb-dap didn't really get ready yet.

### NOTE
The `dap_server.py` is doing things to mimic the VSCode (or other dap
clients) but it had some assumptions. For example, it's still missing
the `configurationDone` request and response because it relies on a
continue action to trigger the `configurationDone` request.

# Test Plan
```
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
```

To test the `wait_for_events` timeout case
```
events = self.wait_for_events(["process", "initialized", "fake", "event"], 1)
if events:
    raise ValueError(f'no events {",".join(events)} found for within timeout 1')
```
Observed
<img width="696" alt="image"
src="https://github.com/user-attachments/assets/bc97c0ef-d91f-4561-8272-4d36f5f5d4e6"
/>

### Also
Looks like some test cases should be re-enabled in
llvm@0b8dfb5
But only comments was removed. The skip statements survived the change.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
# Summary
This patch makes the `request_attach` wait for events `process` and
`initialized` just like `request_launch`. This ensure the DAP session
can move forward somewhat correctly.

Recently `TestDap_attach.test_terminate_commands` became flaky.
It's hitting:
```
lldbsuite/test/tools/lldb-dap/dap_server.py", line 350, in send_recv
    raise ValueError(desc)
ValueError: no response for "disconnect"
```
I took a look at the DAP msg from that test case and noticed:
- It's not using the regular attaching, instead it's using the
`attachCommands` to launch debug the binary and it will stop at entry.
- The `initialized` event returned after the `disconnect` request. Which
means lldb-dap didn't really get ready yet.

### NOTE
The `dap_server.py` is doing things to mimic the VSCode (or other dap
clients) but it had some assumptions. For example, it's still missing
the `configurationDone` request and response because it relies on a
continue action to trigger the `configurationDone` request.

# Test Plan
```
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
```

To test the `wait_for_events` timeout case
```
events = self.wait_for_events(["process", "initialized", "fake", "event"], 1)
if events:
    raise ValueError(f'no events {",".join(events)} found for within timeout 1')
```
Observed
<img width="696" alt="image"
src="https://github.com/user-attachments/assets/bc97c0ef-d91f-4561-8272-4d36f5f5d4e6"
/>

### Also
Looks like some test cases should be re-enabled in
llvm@0b8dfb5
But only comments was removed. The skip statements survived the change.
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
# Summary
This patch makes the `request_attach` wait for events `process` and
`initialized` just like `request_launch`. This ensure the DAP session
can move forward somewhat correctly.

Recently `TestDap_attach.test_terminate_commands` became flaky.
It's hitting:
```
lldbsuite/test/tools/lldb-dap/dap_server.py", line 350, in send_recv
    raise ValueError(desc)
ValueError: no response for "disconnect"
```
I took a look at the DAP msg from that test case and noticed:
- It's not using the regular attaching, instead it's using the
`attachCommands` to launch debug the binary and it will stop at entry.
- The `initialized` event returned after the `disconnect` request. Which
means lldb-dap didn't really get ready yet.

### NOTE
The `dap_server.py` is doing things to mimic the VSCode (or other dap
clients) but it had some assumptions. For example, it's still missing
the `configurationDone` request and response because it relies on a
continue action to trigger the `configurationDone` request.

# Test Plan
```
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
```

To test the `wait_for_events` timeout case
```
events = self.wait_for_events(["process", "initialized", "fake", "event"], 1)
if events:
    raise ValueError(f'no events {",".join(events)} found for within timeout 1')
```
Observed
<img width="696" alt="image"
src="https://github.com/user-attachments/assets/bc97c0ef-d91f-4561-8272-4d36f5f5d4e6"
/>

### Also
Looks like some test cases should be re-enabled in
llvm@0b8dfb5
But only comments was removed. The skip statements survived the change.
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
# Summary
This patch makes the `request_attach` wait for events `process` and
`initialized` just like `request_launch`. This ensure the DAP session
can move forward somewhat correctly.

Recently `TestDap_attach.test_terminate_commands` became flaky.
It's hitting:
```
lldbsuite/test/tools/lldb-dap/dap_server.py", line 350, in send_recv
    raise ValueError(desc)
ValueError: no response for "disconnect"
```
I took a look at the DAP msg from that test case and noticed:
- It's not using the regular attaching, instead it's using the
`attachCommands` to launch debug the binary and it will stop at entry.
- The `initialized` event returned after the `disconnect` request. Which
means lldb-dap didn't really get ready yet.

### NOTE
The `dap_server.py` is doing things to mimic the VSCode (or other dap
clients) but it had some assumptions. For example, it's still missing
the `configurationDone` request and response because it relies on a
continue action to trigger the `configurationDone` request.

# Test Plan
```
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
./bin/llvm-lit -va /Users/wanyi/llvm-upstream/llvm-project/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
```

To test the `wait_for_events` timeout case
```
events = self.wait_for_events(["process", "initialized", "fake", "event"], 1)
if events:
    raise ValueError(f'no events {",".join(events)} found for within timeout 1')
```
Observed
<img width="696" alt="image"
src="https://github.com/user-attachments/assets/bc97c0ef-d91f-4561-8272-4d36f5f5d4e6"
/>

### Also
Looks like some test cases should be re-enabled in
llvm@0b8dfb5
But only comments was removed. The skip statements survived the change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants