-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb][test] Fix flaky DIL array subscript test by reducing array indexes #141738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lldb Author: David Spickett (DavidSpickett) ChangesThis test has been flaky on Linaro's Windows on Arm bot and I was able to reproduce it within 10 or so runs locally. When it fails it's because we failed to read the value of int_arr[100]. When that happens the memory looks like this:
The first region is the stack and the stack pointer is pointing within that region, as expected. The second region is where we are trying to read int_arr[100] from and this is not mapped because we're trying to read above the start of the stack. Sometimes the test passes I think because ASLR / DYNAMICBASE moves the start of the stack down enough so there is some readable memory at the top. https://learn.microsoft.com/en-us/cpp/build/reference/dynamicbase?view=msvc-170 Note "Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures, /DYNAMICBASE:NO isn't supported for these targets.". Which means on this bot, the layout is definitely being randomised. To fix this, I've setup main to pad its stack frame with 100 integers, then called a new function where we declare all the variables that used to be in main. We have to pad this way because if we pad around int_arr, in the same function, the compiler could decide to move int_arr to the end of the frame, and defeat the padding. By padding main, it doesn't matter how the compiler treats that as long as it makes main's frame larger. We saw this failure on Windows only but I wouldn't be surprised if it can happen on Linux as well. Full diff: https://github.com/llvm/llvm-project/pull/141738.diff 2 Files Affected:
diff --git a/lldb/test/API/commands/frame/var-dil/basics/ArraySubscript/TestFrameVarDILArraySubscript.py b/lldb/test/API/commands/frame/var-dil/basics/ArraySubscript/TestFrameVarDILArraySubscript.py
index 66f4c62761004..740a64f1477ff 100644
--- a/lldb/test/API/commands/frame/var-dil/basics/ArraySubscript/TestFrameVarDILArraySubscript.py
+++ b/lldb/test/API/commands/frame/var-dil/basics/ArraySubscript/TestFrameVarDILArraySubscript.py
@@ -19,8 +19,6 @@ def expect_var_path(self, expr, compare_to_framevar=False, value=None, type=None
self.runCmd("settings set target.experimental.use-DIL true")
self.assertEqual(value_dil.GetValue(), value_frv.GetValue())
- # int_arr[100] sometimes points to above the stack region, fix coming soon.
- @skipIfWindows
def test_subscript(self):
self.build()
lldbutil.run_to_source_breakpoint(
diff --git a/lldb/test/API/commands/frame/var-dil/basics/ArraySubscript/main.cpp b/lldb/test/API/commands/frame/var-dil/basics/ArraySubscript/main.cpp
index 485666ae46c20..5efbefeea684b 100644
--- a/lldb/test/API/commands/frame/var-dil/basics/ArraySubscript/main.cpp
+++ b/lldb/test/API/commands/frame/var-dil/basics/ArraySubscript/main.cpp
@@ -1,6 +1,21 @@
#include <vector>
-int main(int argc, char **argv) {
+// The variables being tested here are not delcared in main for a specific
+// reason. This is because during testing we access some of them out of their
+// bounds, such as int_arr[100]. If we put all these variables in main, the
+// compiler could decide to arrange them in such a way, that when combined with
+// how the stack offset was randomised by the OS, and how much was already used
+// by the libc's startup code, int_arr[100] would point to somewhere beyond the
+// top of the stack. Unmapped memory.
+//
+// To fix this we could pad around int_arr, but the compiler is free to shuffle
+// the order of stack variables. So instead, we pad main's frame so that the
+// stack pointer during fn will be far enough down that int_arr[100] will still
+// point to stack memory. It does not matter what the compiler does with the
+// padding in main as far as ordering, as long as it increases main's stack
+// frame size.
+
+int fn() {
int int_arr[] = {1, 2, 3};
int *int_ptr = int_arr;
int(&int_arr_ref)[3] = int_arr;
@@ -30,3 +45,8 @@ int main(int argc, char **argv) {
return 0; // Set a breakpoint here
}
+
+int main(int argc, char **argv) {
+ int stack_frame_padding[100];
+ return fn();
+}
|
This does not fix the test that is xfailed, where you can't read the first element of a vector. That's something entirely different that I don't plan to look into myself. |
@@ -19,8 +19,6 @@ def expect_var_path(self, expr, compare_to_framevar=False, value=None, type=None | |||
self.runCmd("settings set target.experimental.use-DIL true") | |||
self.assertEqual(value_dil.GetValue(), value_frv.GetValue()) | |||
|
|||
# int_arr[100] sometimes points to above the stack region, fix coming soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does frame var
suffer from the same problem? If not, should DIL handle things differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean it should be able to say:
int_arr[100] = <unavailable>;
Like registers can do?
I'll check what frame var does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh I'm not even sure what DIL is :) So feel free to educate me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the point of this test is to check that the expression really allows you to go beyond the array bounds, so fixing this by ensuring the memory at that address exists is the right thing to do. However, I think the index 100 is unnecessarily big. I think we could just change the test to use indexes 3 and 10 (a "one past" and "reasonably beyond" value), and then you don't need to write the very long comment.
Printing <unavailable>
or similar when this indexing lands you into unaddressable memory would be nice (maybe it happens already -- I don't know), but that's a different test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh but the long comments are the most fun :)
I agree though, I'll find a smaller out of bounds that is consistently within the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does frame var suffer from the same problem?
(lldb) frame var int_arr[200]
(int) int_arr[200] = <read memory from 0xc812500160 failed (0 of 4 bytes read)>
With DIL on:
(lldb) settings set target.experimental.use-DIL true
(lldb) frame var int_arr[200]
(int) int_arr[200] = <read memory from 0xc812500160 failed (0 of 4 bytes read)>
Which I think means both are reporting it the same way, and that way seems fine to me.
…exes This test has been flaky on Linaro's Windows on Arm bot and I was able to reproduce it within 10 or so runs locally. When it fails it's because we failed to read the value of int_arr[100]. When that happens the memory looks like this: ``` [0x0000006bf88fd000-0x0000006bf8900000) rw- <-- sp (0x0000006bf88ffe20) [0x0000006bf8900000-0x0000025fec900000) --- <-- int_arr[100] (0x0000006bf8900070) ``` The first region is the stack and the stack pointer is pointing within that region, as expected. The second region is where we are trying to read int_arr[100] from and this is not mapped because we're trying to read above the start of the stack. Sometimes the test passes I think because ASLR / DYNAMICBASE moves the start of the stack down enough so there is some readable memory at the top. https://learn.microsoft.com/en-us/cpp/build/reference/dynamicbase?view=msvc-170 Note "Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures, /DYNAMICBASE:NO isn't supported for these targets.". Which means on this bot, the layout is definitely being randomised. We don't need to be testing indexes this large. So I've changed the two test indexes to 3 (1 beyond the end) and 10 (a larger distance beyond the end). We know that index 42 always worked on the bot, so 10 should be fine, and did not fail locally.
ad5fddb
to
ce1b2c8
Compare
…exes (llvm#141738) This test has been flaky on Linaro's Windows on Arm bot and I was able to reproduce it within 10 or so runs locally. When it fails it's because we failed to read the value of int_arr[100]. When that happens the memory looks like this: ``` [0x0000006bf88fd000-0x0000006bf8900000) rw- <-- sp (0x0000006bf88ffe20) [0x0000006bf8900000-0x0000025fec900000) --- <-- int_arr[100] (0x0000006bf8900070) ``` The first region is the stack and the stack pointer is pointing within that region, as expected. The second region is where we are trying to read int_arr[100] from and this is not mapped because we're trying to read above the start of the stack. Sometimes the test passes I think because ASLR / DYNAMICBASE moves the start of the stack down enough so there is some readable memory at the top. https://learn.microsoft.com/en-us/cpp/build/reference/dynamicbase?view=msvc-170 Note "Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures, /DYNAMICBASE:NO isn't supported for these targets.". Which means on this bot, the layout is definitely being randomised. We don't need to be testing indexes this large. So I've changed the two test indexes to 3 (1 beyond the end) and 10 (a larger distance beyond the end). We know that index 42 always worked on the bot, so 10 should be fine, and did not fail locally.
This test has been flaky on Linaro's Windows on Arm bot and I was
able to reproduce it within 10 or so runs locally.
When it fails it's because we failed to read the value of int_arr[100].
When that happens the memory looks like this:
The first region is the stack and the stack pointer is pointing
within that region, as expected.
The second region is where we are trying to read int_arr[100] from
and this is not mapped because we're trying to read above the start
of the stack.
Sometimes the test passes I think because ASLR / DYNAMICBASE moves
the start of the stack down enough so there is some readable memory
at the top.
https://learn.microsoft.com/en-us/cpp/build/reference/dynamicbase?view=msvc-170
Note "Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures,
/DYNAMICBASE:NO isn't supported for these targets.". Which means on this bot,
the layout is definitely being randomised.
We don't need to be testing indexes this large. So I've changed the two
test indexes to 3 (1 beyond the end) and 10 (a larger distance beyond the end).
We know that index 42 always worked on the bot, so 10 should be fine, and
did not fail locally.