[lldb] Use consistent CFA before/after prologue of async functions #4806

kastiglione · 2022-06-06T23:13:14Z

Previously, SwiftLanguageRuntime::GetRuntimeUnwindPlan would not generate an async unwind plan when stopped inside the prologue. The reason was that the logic couldn't distinguish between an async function and a sync function called by an async function. This happens because, in a prologue, the register values may make a sync function look like an async function (specifically, the extended frame marker bit being set on the frame pointer).

To determine whether lldb is stopped in an async function, in addition to checking the extended frame marker, it can look for marker nodes in the symbol's demangle tree.
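A minimal sketch of the two-part check described above, not the code from the patch. The marker position (bit 60 of the frame pointer value) and both helper names are assumptions made for illustration, and the boolean `symbol_is_async` stands in for the demangle-tree inspection.

```cpp
#include <cstdint>

// Assumption: the extended frame marker is bit 60 of the frame pointer value.
constexpr std::uint64_t kExtendedFrameMarker = 1ULL << 60;

// True when the marker bit is set on the given frame pointer value.
constexpr bool HasExtendedFrameMarker(std::uint64_t fp) {
  return (fp & kExtendedFrameMarker) != 0;
}

// Hypothetical helper. In the prologue the register values are not
// trustworthy, so the decision rests on the symbol; past the prologue the
// frame marker is used.
constexpr bool LooksLikeAsyncFrame(std::uint64_t fp, bool in_prologue,
                                   bool symbol_is_async) {
  return in_prologue ? symbol_is_async : HasExtendedFrameMarker(fp);
}

// A sync callee stopped in its prologue, with a marked frame pointer left
// over from an async caller, is not treated as async.
static_assert(!LooksLikeAsyncFrame(1ULL << 60, /*in_prologue=*/true,
                                   /*symbol_is_async=*/false),
              "prologue decision must come from the symbol");
```

The point of the split is that the frame marker alone gives a false positive in exactly the case the description mentions: a sync function called by an async function.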

This all seemed fine initially, but then we discovered some logic bugs within thread plans. The logic bugs were caused by CFA values varying at different parts of the function.

Because no async unwind plan is returned during the prologue, the function call gets a standard (thread-based) CFA (Canonical Frame Address) there. The standard CFA is the stack pointer ($sp) value at the call site. However, once execution proceeds past the prologue, lldb returns an async unwind plan for the same function. For an async unwind, the CFA is taken to be the async context passed into the function (x22 on arm64, r14 on x86-64). The problem is that the CFA now varies across the function. From the DWARF standard:

> The algorithm to compute CFA changes as you progress through the prologue and epilogue code. (By definition, the CFA value does not change.)

Between the logic bugs and DWARF, it's best to keep the CFA consistent throughout a function. This change does that by returning an async unwind plan even in the prologue. This makes the unwind plan logic more branch-y and complex than it was. A follow-up change will refactor this code and document it better. For diff readability, those changes will come separately.
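To make the inconsistency concrete, here is a toy model of the CFA choice before and after this change. The struct, its field names, and both functions are hypothetical; the real logic builds lldb unwind plans rather than returning a number.

```cpp
#include <cstdint>

// Toy register snapshot for one stop location (illustrative names only).
struct Regs {
  std::uint64_t sp_at_call_site; // basis of the standard, thread-based CFA
  std::uint64_t async_context;   // x22 on arm64, r14 on x86-64
};

// Old behavior: no async plan in the prologue, so the CFA source switches
// once execution leaves the prologue.
constexpr std::uint64_t CfaBefore(const Regs &r, bool in_prologue) {
  return in_prologue ? r.sp_at_call_site : r.async_context;
}

// New behavior: an async plan is returned everywhere in the function, so the
// CFA comes from the async context regardless of position.
constexpr std::uint64_t CfaAfter(const Regs &r, bool /*in_prologue*/) {
  return r.async_context;
}

// With the old logic the same function reports two different CFAs.
static_assert(CfaBefore(Regs{0x1000, 0x2000}, true) !=
                  CfaBefore(Regs{0x1000, 0x2000}, false),
              "old logic: CFA varies across the function");
static_assert(CfaAfter(Regs{0x1000, 0x2000}, true) ==
                  CfaAfter(Regs{0x1000, 0x2000}, false),
              "new logic: CFA is the same in and after the prologue");
```

In this model, anything that identifies a frame by its CFA sees two different frames under the old logic, which is one way to picture the varying CFA values that the description ties to the thread plan bugs.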

rdar://88142757

kastiglione · 2022-06-06T23:19:37Z

@swift-ci test

adrian-prantl

Nice!

adrian-prantl · 2022-06-06T23:28:22Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

-    }
+    func_start_addr = sc.symbol->GetAddress();
+    prologue_size = sc.symbol->GetPrologueByteSize();
+    mangled_name = sc.symbol->GetMangled().GetMangledName();


Are sc.function and sc.symbol the same type? I wonder if this could be

    if (sc.function)
      func_info = sc.function;
    else if (sc.symbol)
      func_info = sc.symbol;
    Address func_start_addr = func_info->GetAddress();
    ...

They aren't the same type; I did have the same thought. Maybe we can extract a shared interface that we can use in the future.
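One possible shape for such a shared interface, sketched with toy stand-in types. `ToyFunction`, `ToySymbol`, `FuncInfo`, and `MakeFuncInfo` are all hypothetical names; the real lldb `Function` and `Symbol` classes have richer APIs and compute the prologue size differently.

```cpp
#include <cstdint>

// Toy stand-ins; these are NOT lldb_private::Function / lldb_private::Symbol.
struct ToyFunction {
  std::uint64_t start;
  std::uint32_t prologue_size;
};
struct ToySymbol {
  std::uint64_t start;
  std::uint32_t prologue_size;
};

// The small common view that the unwind plan code needs from either type.
struct FuncInfo {
  std::uint64_t start_addr;
  std::uint32_t prologue_size;
};

// One overload per source type, so callers deal with a single result type.
constexpr FuncInfo MakeFuncInfo(const ToyFunction &f) {
  return FuncInfo{f.start, f.prologue_size};
}
constexpr FuncInfo MakeFuncInfo(const ToySymbol &s) {
  return FuncInfo{s.start, s.prologue_size};
}

static_assert(MakeFuncInfo(ToySymbol{0x100, 8}).prologue_size == 8,
              "symbol-backed info exposes the same fields");
```

A value type with overloaded factories avoids forcing a common base class onto two unrelated types; whether that fits lldb's conventions is a separate question.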

For this limited set of calls, Symbol and Function behave similarly, but Function can provide many additional things and Symbol is pretty much just this. (They also have very different ways of determining the prologue byte size.)

adrian-prantl · 2022-06-06T23:29:37Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

+    prologue_size = sc.symbol->GetPrologueByteSize();
+    mangled_name = sc.symbol->GetMangled().GetMangledName();
+  } else {
+    return UnwindPlanSP();


We often just write return {};

I prefer that too; I was following the style of the rest of the function. How about I change these all when I do an NFC refactor and documentation?

Personal preference: I prefer showing the explicitly returned object type, but I know that's not the style in SwiftLanguageRuntime; feel free to change these to be a better fit.

adrian-prantl · 2022-06-06T23:30:50Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

-    return UnwindPlanSP();
+  if (in_prologue) {
+    if (!IsAnySwiftAsyncFunctionSymbol(mangled_name.GetStringRef()))
+      return UnwindPlanSP();


Are any of these error paths log-worthy and would point to future ABI changes, or do they just mean that this isn't an async function?

Good call, I will add logs!

There's the possibility of a proper async function that doesn't use Swift mangling. I noticed some like that. Perhaps people could hand-roll such things.

adrian-prantl · 2022-06-06T23:31:38Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

+  } else {
+    addr_t saved_fp = LLDB_INVALID_ADDRESS;
+    Status error;
+    if (!process_sp->ReadMemory(fp, &saved_fp, 8, error))


That hardcoded "8" makes me nervous :-)

Yes. This is moved code, not new, but I can probably do something about it.

Swift async is only supported on 64-bit targets. At the beginning we call GetAsyncUnwindRegisterNumbers(), which only returns regnums for x86_64 and aarch64; for any other architecture we return early. We also have a redundant llvm_unreachable later in the method if the architecture isn't one of these two.

adrian-prantl · 2022-06-06T23:32:57Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

+  // The debug info for locals reflects this difference, so our unwinding of the
+  // context register needs to reflect it too.
+  bool indirect_context =
+      IsSwiftAsyncAwaitResumePartialFunctionSymbol(mangled_name.GetStringRef());

  UnwindPlan::RowSP row(new UnwindPlan::Row);
  const int32_t ptr_size = 8;


Shouldn't we get this from Target?

We should, this is also pre-existing :)

It's fine to parameterize these, but Swift async is only supported on 64-bit targets. We could have a const int addr_size = 8; at the top and use that instead of 8, or get it from the target's archspec.
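A small sketch of the named-constant option, combined with the architecture guard mentioned earlier in the thread. The enum, struct, and function names are hypothetical; only the register names (x22, r14) and the 64-bit-only restriction come from the discussion.

```cpp
#include <cstdint>

enum class Arch { x86_64, aarch64, other };

// A named constant instead of a bare 8. It is only meaningful because the
// guard below rejects everything except the two supported 64-bit targets.
constexpr std::uint32_t kAsyncPtrSize = 8;

struct AsyncRegInfo {
  bool supported;
  const char *async_ctx_reg; // register carrying the async context
};

// Hypothetical analogue of an "only x86_64 and aarch64" lookup.
constexpr AsyncRegInfo GetAsyncRegInfo(Arch arch) {
  return arch == Arch::aarch64  ? AsyncRegInfo{true, "x22"}
         : arch == Arch::x86_64 ? AsyncRegInfo{true, "r14"}
                                : AsyncRegInfo{false, nullptr};
}

// Number of bytes to read for a saved pointer; 0 means "unsupported, bail".
constexpr std::uint32_t SavedPtrReadSize(Arch arch) {
  return GetAsyncRegInfo(arch).supported ? kAsyncPtrSize : 0;
}

static_assert(SavedPtrReadSize(Arch::other) == 0,
              "unsupported architectures never reach the 8-byte read");
```

Keeping the constant next to the guard documents why 8 is safe, without a round trip to the target for a value that cannot differ on the supported architectures.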

adrian-prantl · 2022-06-06T23:34:14Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

-                      sc.symbol->GetMangled().GetMangledName().GetStringRef())
-                : false;
+  if (in_prologue) {
+    if (indirect_context)


Why is this condition only relevant in the prologue?

In the prologue the way we set up the unwind to find the async context chain is different than mid-function when it's been set up. I expect the real difference is "am I on the first instruction" versus "am I in the middle of the function", but because you can interrupt programs asynchronously, you could be on any instruction. I think that's right. Dave?

@adrian-prantl indirect_context is also relevant below, I will comment there for visibility.

jasonmolenda

LGTM.

jasonmolenda · 2022-06-07T00:03:10Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

-    }
+    func_start_addr = sc.symbol->GetAddress();
+    prologue_size = sc.symbol->GetPrologueByteSize();
+    mangled_name = sc.symbol->GetMangled().GetMangledName();


For this limited set of calls, Symbol and Function behave similarly, but Function can provide many additional things and Symbol is pretty much just this. (They also have very different ways of determining the prologue byte size.)

jasonmolenda · 2022-06-07T00:03:15Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

+    prologue_size = sc.symbol->GetPrologueByteSize();
+    mangled_name = sc.symbol->GetMangled().GetMangledName();
+  } else {
+    return UnwindPlanSP();


Personal preference: I prefer showing the explicitly returned object type, but I know that's not the style in SwiftLanguageRuntime; feel free to change these to be a better fit.

jasonmolenda · 2022-06-07T00:04:06Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

+  } else {
+    addr_t saved_fp = LLDB_INVALID_ADDRESS;
+    Status error;
+    if (!process_sp->ReadMemory(fp, &saved_fp, 8, error))


Swift async is only supported on 64-bit targets. At the beginning we call GetAsyncUnwindRegisterNumbers(), which only returns regnums for x86_64 and aarch64; for any other architecture we return early. We also have a redundant llvm_unreachable later in the method if the architecture isn't one of these two.

jasonmolenda · 2022-06-07T00:06:49Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

+  // The debug info for locals reflects this difference, so our unwinding of the
+  // context register needs to reflect it too.
+  bool indirect_context =
+      IsSwiftAsyncAwaitResumePartialFunctionSymbol(mangled_name.GetStringRef());

  UnwindPlan::RowSP row(new UnwindPlan::Row);
  const int32_t ptr_size = 8;


It's fine to parameterize these, but Swift async is only supported on 64-bit targets. We could have a const int addr_size = 8; at the top and use that instead of 8, or get it from the target's archspec.

jasonmolenda · 2022-06-07T00:11:43Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

-    return UnwindPlanSP();
+  AddressRange prologue_range(func_start_addr, prologue_size);
+  bool in_prologue = (func_start_addr == pc ||
+                      prologue_range.ContainsLoadAddress(pc, &target));


Just curious about this - is it a performance concern, comparing the pc to function start addr, trying to avoid the method call? It seems unnecessary.

No intention on my part; this is a refactor of the existing code, which had both the equality check and the Contains check. Good point, both aren't needed.
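The redundancy can be checked with a toy half-open range. `ToyRange` and the two helpers are hypothetical and only model a `[base, base + size)` containment test; they are not lldb's `AddressRange`.

```cpp
#include <cstdint>

// Toy half-open address range [base, base + size).
struct ToyRange {
  std::uint64_t base;
  std::uint64_t size;
  constexpr bool Contains(std::uint64_t addr) const {
    return addr >= base && addr - base < size;
  }
};

// The form in the diff: equality with the start address OR containment.
constexpr bool InPrologueBoth(ToyRange prologue, std::uint64_t pc) {
  return pc == prologue.base || prologue.Contains(pc);
}

// The simplified form: containment alone.
constexpr bool InPrologueContainsOnly(ToyRange prologue, std::uint64_t pc) {
  return prologue.Contains(pc);
}

// For a non-empty prologue the start address is already inside the range.
static_assert(InPrologueContainsOnly(ToyRange{0x1000, 16}, 0x1000),
              "start address is contained when size > 0");
// In this toy model the two forms differ only for a zero-sized prologue.
static_assert(InPrologueBoth(ToyRange{0x1000, 0}, 0x1000) !=
                  InPrologueContainsOnly(ToyRange{0x1000, 0}, 0x1000),
              "an empty range contains nothing, not even its base");
```

In this model the equality check only matters when the reported prologue size is zero. Whether that case can occur in practice, and which answer is wanted for it, is worth confirming before dropping the check.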

jasonmolenda · 2022-06-07T00:21:37Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

-                      sc.symbol->GetMangled().GetMangledName().GetStringRef())
-                : false;
+  if (in_prologue) {
+    if (indirect_context)


In the prologue the way we set up the unwind to find the async context chain is different than mid-function when it's been set up. I expect the real difference is "am I on the first instruction" versus "am I in the middle of the function", but because you can interrupt programs asynchronously, you could be on any instruction. I think that's right. Dave?

adrian-prantl · 2022-06-07T17:59:33Z

lldb/test/API/lang/swift/async/frame/variable/TestSwiftAsyncFrameVar.py

+        self.assertGreater(a.unsigned, 0)
+        b = frame.FindVariable("b")
+        self.assertTrue(b.IsValid())
+        self.assertEqual(b.unsigned, 0)


I'm not sure if this is actually better, but FYI: swift-lldb has lldbutil.check_variable()

kastiglione · 2022-06-07T18:39:07Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

-                      sc.symbol->GetMangled().GetMangledName().GetStringRef())
-                : false;
+  if (in_prologue) {
+    if (indirect_context)


@adrian-prantl indirect_context is also relevant below, I will comment there for visibility.

kastiglione · 2022-06-07T18:40:18Z

lldb/source/Plugins/LanguageRuntime/Swift/SwiftLanguageRuntime.cpp

  if (indirect_context) {
-    // In a "resume" coroutine, the passed context argument needs to be
-    // dereferenced once to get the context. This is reflected in the debug
-    // info so we need to account for it and report am async register value
-    // that needs to be dereferenced to get to the context.
-    // Note that the size passed for the DWARF expression is the size of the
-    // array minus one. This skips the last deref for this use.
-    assert(expr[expr_size - 1] == llvm::dwarf::DW_OP_deref &&
-           "Should skip a deref");
-    row->SetRegisterLocationToIsDWARFExpression(regnums->async_ctx_regnum, expr,
-                                                expr_size - 1, false);
+    if (in_prologue) {
+      row->SetRegisterLocationToSame(regnums->async_ctx_regnum, false);
+    } else {


@adrian-prantl here's some other logic conditional on indirect_context, and here it's relevant both in and after the prologue.
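As a reading aid, the branch structure visible in this diff excerpt can be modeled as a small decision function. The enum and function are hypothetical. The excerpt is cut off at the `else`, so the assumption here is that the removed DWARF-expression logic (the expression with its final deref skipped) moves into that branch; the non-indirect case is not shown in the excerpt and is left opaque.

```cpp
// How the async context register is described in the unwind row (toy model).
enum class CtxLocation {
  Same,                 // register is reported unchanged
  ExprWithoutLastDeref, // DWARF expression with the final deref skipped
  NotShownInExcerpt     // non-indirect case; not visible in the diff above
};

// Assumed mapping from the two flags to the register location kind.
constexpr CtxLocation AsyncCtxLocation(bool indirect_context,
                                       bool in_prologue) {
  return !indirect_context ? CtxLocation::NotShownInExcerpt
         : in_prologue     ? CtxLocation::Same
                           : CtxLocation::ExprWithoutLastDeref;
}

// indirect_context matters on both sides of the prologue boundary; what
// changes is how the register location is expressed.
static_assert(AsyncCtxLocation(true, true) == CtxLocation::Same,
              "in the prologue the register is reported as-is");
static_assert(AsyncCtxLocation(true, false) ==
                  CtxLocation::ExprWithoutLastDeref,
              "past the prologue the expression skips its last deref");
```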

kastiglione · 2022-06-07T18:40:54Z

@swift-ci test

kastiglione · 2022-06-16T17:30:09Z

This change uncovered a crash, which is why I haven't merged it yet. The crash is fixed by #4838

…4806) Previously, `SwiftLanguageRuntime::GetRuntimeUnwindPlan` would not generate an _async_ unwind plan when stopped inside the prologue. The reason was that the logic couldn't distinguish between an async function and a sync function called by an async function. This happens because – in a prologue, the register values may make it look like an async function (specifically the extended frame marker bit set on the frame pointer). To determine whether lldb is stopped in an async function, in addition to checking the extended frame marker, it can look for marker nodes in the symbol's demangle tree. This all seemed fine initially, but then we discovered some logic bugs within thread plans. The logic bugs were caused by CFA values varying at different parts of the function. By not returning an async unwind plan during the prologue, the effect is that the function call gets a standard (thread based) CFA (Canonical Frame Address). The standard CFA is the stack pointer ($sp) value at the call site. However once execution proceeds past the prologue, for the same function, lldb returns an async unwind plan. For an async unwind, the CFA is taken to be the async context passed into the function (`x22` on arm64, `r14` on x86-64). The problem is that now the CFA varies across the function. From the DWARF standard: > The algorithm to compute CFA changes as you progress through the prologue and epilogue code. (By definition, the CFA value does not change.) Between the logic bugs and DWARF, it's best to keep the CFA consistent throughout a function. This change does that by returning an async unwind plan even in the prologue. This makes the unwind plan logic more branch-y and complex than it was. A follow up change is to refactor this code as well as document it better. For diff readability, those changes will come separately. rdar://88142757 (cherry picked from commit be3823b)

…4806) (#4955) Previously, `SwiftLanguageRuntime::GetRuntimeUnwindPlan` would not generate an _async_ unwind plan when stopped inside the prologue. The reason was that the logic couldn't distinguish between an async function and a sync function called by an async function. This happens because – in a prologue, the register values may make it look like an async function (specifically the extended frame marker bit set on the frame pointer). To determine whether lldb is stopped in an async function, in addition to checking the extended frame marker, it can look for marker nodes in the symbol's demangle tree. This all seemed fine initially, but then we discovered some logic bugs within thread plans. The logic bugs were caused by CFA values varying at different parts of the function. By not returning an async unwind plan during the prologue, the effect is that the function call gets a standard (thread based) CFA (Canonical Frame Address). The standard CFA is the stack pointer ($sp) value at the call site. However once execution proceeds past the prologue, for the same function, lldb returns an async unwind plan. For an async unwind, the CFA is taken to be the async context passed into the function (`x22` on arm64, `r14` on x86-64). The problem is that now the CFA varies across the function. From the DWARF standard: > The algorithm to compute CFA changes as you progress through the prologue and epilogue code. (By definition, the CFA value does not change.) Between the logic bugs and DWARF, it's best to keep the CFA consistent throughout a function. This change does that by returning an async unwind plan even in the prologue. This makes the unwind plan logic more branch-y and complex than it was. A follow up change is to refactor this code as well as document it better. For diff readability, those changes will come separately. rdar://88142757 (cherry picked from PR #4806)

…4806) (#4955) Previously, `SwiftLanguageRuntime::GetRuntimeUnwindPlan` would not generate an _async_ unwind plan when stopped inside the prologue. The reason was that the logic couldn't distinguish between an async function and a sync function called by an async function. This happens because – in a prologue, the register values may make it look like an async function (specifically the extended frame marker bit set on the frame pointer). To determine whether lldb is stopped in an async function, in addition to checking the extended frame marker, it can look for marker nodes in the symbol's demangle tree. This all seemed fine initially, but then we discovered some logic bugs within thread plans. The logic bugs were caused by CFA values varying at different parts of the function. By not returning an async unwind plan during the prologue, the effect is that the function call gets a standard (thread based) CFA (Canonical Frame Address). The standard CFA is the stack pointer ($sp) value at the call site. However once execution proceeds past the prologue, for the same function, lldb returns an async unwind plan. For an async unwind, the CFA is taken to be the async context passed into the function (`x22` on arm64, `r14` on x86-64). The problem is that now the CFA varies across the function. From the DWARF standard: > The algorithm to compute CFA changes as you progress through the prologue and epilogue code. (By definition, the CFA value does not change.) Between the logic bugs and DWARF, it's best to keep the CFA consistent throughout a function. This change does that by returning an async unwind plan even in the prologue. This makes the unwind plan logic more branch-y and complex than it was. A follow up change is to refactor this code as well as document it better. For diff readability, those changes will come separately. rdar://88142757 (cherry picked from PR #4806) (cherry-picked from commit 6f9af03)

kastiglione requested review from adrian-prantl and jasonmolenda June 6, 2022 23:13

[lldb] Use consistent CFA before/after prologue of async functions

f76f320

kastiglione force-pushed the lldb-Use-consistent-CFA-before-after-prologue-of-async-functions branch from c67de06 to f76f320 Compare June 6, 2022 23:14

kastiglione marked this pull request as draft June 6, 2022 23:28

adrian-prantl reviewed Jun 6, 2022

jasonmolenda approved these changes Jun 7, 2022

Update TestSwiftAsyncFrameVar.py

4339836

adrian-prantl approved these changes Jun 7, 2022

kastiglione marked this pull request as ready for review June 7, 2022 18:37

kastiglione commented Jun 7, 2022

kastiglione merged commit be3823b into swift/release/5.7 Jul 6, 2022

kastiglione deleted the lldb-Use-consistent-CFA-before-after-prologue-of-async-functions branch July 6, 2022 18:12

kastiglione mentioned this pull request Jul 7, 2022

[lldb] Use consistent CFA before/after prologue of async functions #4955

Merged

kastiglione mentioned this pull request Sep 22, 2022

[lldb] Implement python __repr__ #5351

Merged
