[lldb] Don't scan more than 10MB of assembly insns #105890

jasonmolenda · 2024-08-23T21:14:24Z

For supported architectures, lldb will do a static scan of the assembly instructions of a function to detect stack/frame pointer changes, register stores and loads, so we can retrieve register values for the caller stack frames. We trust that the function address range reflects the actual function range, but in a stripped binary or other unusual environment, we can end up scanning all of the text as a single "function" which is (1) incorrect and useless, but more importantly (2) slow.

Cap the max size we will profile to 10MB of instructions. There will surely be functions longer than this with no unwind info, and we will miss the final epilogue or mid-function epilogues past the first 10MB, but I think this will be unusual, and the failure more to missing the epilogue is that the user will need to step out an extra time or two as the StackID is not correctly calculated mid-epilogue. I think this is a good tradeoff of behaviors.

rdar://134391577

For supported architectures, lldb will do a static scan of the assembly instructions of a function to detect stack/frame pointer changes, register stores and loads, so we can retrieve register values for the caller stack frames. We trust that the function address range reflects the actual function range, but in a stripped binary or other unusual environment, we can end up scanning all of the text as a single "function" which is (1) incorrect and useless, but more importantly (2) slow. Cap the max size we will profile to 10MB of instructions. There will surely be functions longer than this with no unwind info, and we will miss the final epilogue or mid-function epilogues past the first 10MB, but I think this will be unusual, and the failure more to missing the epilogue is that the user will need to step out an extra time or two as the StackID is not correctly calculated mid-epilogue. I think this is a good tradeoff of behaviors. rdar://134391577

llvmbot · 2024-08-23T21:14:53Z

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

For supported architectures, lldb will do a static scan of the assembly instructions of a function to detect stack/frame pointer changes, register stores and loads, so we can retrieve register values for the caller stack frames. We trust that the function address range reflects the actual function range, but in a stripped binary or other unusual environment, we can end up scanning all of the text as a single "function" which is (1) incorrect and useless, but more importantly (2) slow.

Cap the max size we will profile to 10MB of instructions. There will surely be functions longer than this with no unwind info, and we will miss the final epilogue or mid-function epilogues past the first 10MB, but I think this will be unusual, and the failure more to missing the epilogue is that the user will need to step out an extra time or two as the StackID is not correctly calculated mid-epilogue. I think this is a good tradeoff of behaviors.

rdar://134391577

Full diff: https://github.com/llvm/llvm-project/pull/105890.diff

1 Files Affected:

(modified) lldb/source/Symbol/FuncUnwinders.cpp (+11-1)

diff --git a/lldb/source/Symbol/FuncUnwinders.cpp b/lldb/source/Symbol/FuncUnwinders.cpp
index d67c0a828eb350..228d9a1072deca 100644
--- a/lldb/source/Symbol/FuncUnwinders.cpp
+++ b/lldb/source/Symbol/FuncUnwinders.cpp
@@ -334,12 +334,22 @@ UnwindPlanSP FuncUnwinders::GetAssemblyUnwindPlan(Target &target,
 
   m_tried_unwind_plan_assembly = true;
 
+  // Don't analyze more than 10 megabytes of instructions,
+  // if a function is legitimately larger than that, we'll
+  // miss the epilogue instructions, but guard against a
+  // bogusly large function and analyzing large amounts of
+  // non-instruction data.
+  AddressRange range = m_range;
+  const addr_t func_size =
+      std::min(range.GetByteSize(), (addr_t)1024 * 10 * 10);
+  range.SetByteSize(func_size);
+
   UnwindAssemblySP assembly_profiler_sp(GetUnwindAssemblyProfiler(target));
   if (assembly_profiler_sp) {
     m_unwind_plan_assembly_sp =
         std::make_shared<UnwindPlan>(lldb::eRegisterKindGeneric);
     if (!assembly_profiler_sp->GetNonCallSiteUnwindPlanFromAssembly(
-            m_range, thread, *m_unwind_plan_assembly_sp)) {
+            range, thread, *m_unwind_plan_assembly_sp)) {
       m_unwind_plan_assembly_sp.reset();
     }
   }

jasonmolenda · 2024-08-23T21:18:36Z

This has always been a small perf issue with stripped binaries, but there is an unusual case where it became a big problem: wine, the windows environment, has a symbol at a low address like 0x1000, and it reserves the first 4GB of address space to load 32-bit windows binaries (without lldb's knowledge). lldb has a backtrace where there is a saved pc near the high end of this range, we find the nearest symbol at 0x1000, and proceed to analyze 4GB of instructions which is quite slow.

I don't know how to represent this in a test case, unfortunately.

We do have a similar limit over in UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly where we limit the disassembly to 99999 instructions, this is not an entirely new idea. But there are codepaths that will not go through this method and still try to disassemble any size "function".

For supported architectures, lldb will do a static scan of the assembly instructions of a function to detect stack/frame pointer changes, register stores and loads, so we can retrieve register values for the caller stack frames. We trust that the function address range reflects the actual function range, but in a stripped binary or other unusual environment, we can end up scanning all of the text as a single "function" which is (1) incorrect and useless, but more importantly (2) slow. Cap the max size we will profile to 10MB of instructions. There will surely be functions longer than this with no unwind info, and we will miss the final epilogue or mid-function epilogues past the first 10MB, but I think this will be unusual, and the failure more to missing the epilogue is that the user will need to step out an extra time or two as the StackID is not correctly calculated mid-epilogue. I think this is a good tradeoff of behaviors. rdar://134391577 (cherry picked from commit 3280292)

…ly-instruciton-scan2 [lldb] Don't scan more than 10MB of assembly insns (llvm#105890)

jasonmolenda requested a review from JDevlieghere as a code owner August 23, 2024 21:14

llvmbot added the lldb label Aug 23, 2024

JDevlieghere approved these changes Aug 27, 2024

View reviewed changes

jasonmolenda merged commit 3280292 into llvm:main Aug 27, 2024
9 checks passed

jasonmolenda deleted the limit-assembly-analysis-to-ten-megabytes-max branch August 27, 2024 21:50

jasonmolenda added a commit to swiftlang/llvm-project that referenced this pull request Aug 27, 2024

Merge pull request #9184 from jasonmolenda/cp/r134391577-limit-assemb…

7accbab

…ly-instruciton-scan2 [lldb] Don't scan more than 10MB of assembly insns (llvm#105890)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb] Don't scan more than 10MB of assembly insns #105890

[lldb] Don't scan more than 10MB of assembly insns #105890

Uh oh!

jasonmolenda commented Aug 23, 2024

Uh oh!

llvmbot commented Aug 23, 2024

Uh oh!

jasonmolenda commented Aug 23, 2024

Uh oh!

Uh oh!

Uh oh!

[lldb] Don't scan more than 10MB of assembly insns #105890

[lldb] Don't scan more than 10MB of assembly insns #105890

Uh oh!

Conversation

jasonmolenda commented Aug 23, 2024

Uh oh!

llvmbot commented Aug 23, 2024

Uh oh!

jasonmolenda commented Aug 23, 2024

Uh oh!

Uh oh!

Uh oh!