Skip to content

[lldb][swift] Fix thread plan to step through swift_task_switch #9546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

felipepiovezan
Copy link

@felipepiovezan felipepiovezan commented Nov 7, 2024

This PR removes ThreadPlanStepInAsync and replaces it with ThreadPlanRunToAddressOnAsyncCtx, a thread plan that runs to a specific address on a specific async context.

Please review each commit individually.

This is another attempt at: #9452

This thread plan attempts to identify whether a funclet contains a single line
in the line table. If it does, it considers such a funclet to contain no user
code and then attempts to step through it, reaching the next funclet.

This is problematic, as some funclets _can_ have a single line for legitimate
reasons.

Recent improvements are taking LLDB closer to being able to step from any
funclet to any funclet. As such, if a `thread step-*` command takes us to one
"single-line"-funclet, this should not be problematic. LLDB must know how to
move from this funclet to wherever the next `thread step-*` requires.

The other evidence this plan is not really needed is that no tests fail when it
is removed. It seems that `API/lang/swift/async/stepping/step-in/task-switch`
was originally intended to test this step-through behavior, but the async
function of that test had a **single** funclet (the entry funclet). Stepping
through that would have taken us back to caller, which would have been
incorrect. With or without this patch, we are correctly stopping at the entry
funclet. In fact, the test is even checking we do so:

```
  self.assertEqual(frame.name, "a.f() async -> Swift.Int")
```

Note that the demangled name doesn't contain a number prior to `a.f()`, which is
how non-entry funclets can be identified. With all that in mind, it doesn't seem
like this test was exercising ThreadPlanStepInAsync.

A follow up commit will use one idea from ThreadPlanStepInAsync in order to step
through *swift_task_switch* (as opposed to through funclets) to the target
funclet of swift_task_switch.
uint32_t fp_regnum;
uint32_t pc_regnum;

lldb::RegisterKind GetRegisterKind() const { return lldb::eRegisterKindDWARF; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a doxygen comment, it's unclear what the other options would be?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any of the kinds in lldb::RegisterKind :) LLDB has many different numbering schemes for registers, @jasonmolenda calls them "register domains", so whenever a register number is used, it is important to know which domain it is part of. One day I'd like to see if we can design a different type for each domain.

enum RegisterKind {
  eRegisterKindEHFrame = 0, ///< the register numbers seen in eh_frame
  eRegisterKindDWARF,       ///< the register numbers seen DWARF
  eRegisterKindGeneric, ///< insn ptr reg, stack ptr reg, etc not specific to
                        ///< any particular target
  eRegisterKindProcessPlugin, ///< num used by the process plugin - e.g. by the
                              ///< remote gdb-protocol stub program
  eRegisterKindLLDB,          ///< lldb's internal register numbers
  kNumRegisterKinds

@adrian-prantl
Copy link

Why no test change?

@felipepiovezan
Copy link
Author

Why no test change?

The fact that nothing breaks is a form of testing, but I will ask around to see if there is a good way of testing a "step through" plan in isolation.

}

bool ValidatePlan(Stream *error) override { return (bool)m_step_in_plan_sp; }
bool ValidatePlan(Stream *error) override { return true; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are expecting to create a breakpoint in the constructor, you should probably check here that you have a resolved location for that breakpoint. E.g. in case somebody passed you an unmapped address for destination_addr

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that didn't work, also delete the breakpoint here - you never got pushed, so you won't get popped.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll update to this on the next push:

  bool ValidatePlan(Stream *error) override {
    if (m_funclet_bp->HasResolvedLocations())
      return true;

    // If we failed to resolve any locations, this plan is invalid.
    m_funclet_bp->GetTarget().RemoveBreakpointByID(m_funclet_bp->GetID());
    return false;
  }

return false;
}
/// If this plan said ShouldStop, then its job is complete.
bool MischiefManaged() override { return true; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are using SetPlanComplete to signal that this is done, you might check IsPlanComplete here. That way if you ever have stops that didn't make the plan complete, you won't have to change this part.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, updating to:

  /// If this plan said ShouldStop, then its job is complete.
  bool MischiefManaged() override {
    return IsPlanComplete();
  }

@jimingham
Copy link

jimingham commented Nov 8, 2024

Why no test change?

The fact that nothing breaks is a form of testing, but I will ask around to see if there is a good way of testing a "step through" plan in isolation.

The "step through" plans really don't get tested "in isolation", because they don't ever get invoked that way. And anyway, testing your new plan "in isolation" wouldn't be able to test whether the context pointer and the return-to-address were actually correct, so they wouldn't really be sufficient.

Instead, we try to come up with a sufficiently wide range of stepping scenario's, and make sure that when we drive the debugger through those scenario's we arrive at the right place.

There are some tests for async stepping in tests/API/swift/async/stepping but there aren't all that many scenarios covered there. Might be worth seeing if those should be beefed up.

@felipepiovezan felipepiovezan force-pushed the felipe/remove_thread_plan_enhance_async_plan_try2 branch from 0c2474a to ec29891 Compare November 8, 2024 20:49
This will enable this function to be reused.
@felipepiovezan felipepiovezan force-pushed the felipe/remove_thread_plan_enhance_async_plan_try2 branch from ec29891 to 4088af0 Compare November 8, 2024 20:50
@felipepiovezan
Copy link
Author

I've addressed the comments related to code; will now look into writing another stepping test that is known to exercise this plan. The main challenge with a "full step" test is that it needs to dodge the existing issue in unwinding a specific funclet.

@felipepiovezan
Copy link
Author

I've addressed the comments related to code; will now look into writing another stepping test that is known to exercise this plan. The main challenge with a "full step" test is that it needs to dodge the existing issue in unwinding a specific funclet.

Maybe I should have started by posting a fix for that...

@felipepiovezan felipepiovezan force-pushed the felipe/remove_thread_plan_enhance_async_plan_try2 branch from 4088af0 to 5f35ba1 Compare November 12, 2024 17:13
@felipepiovezan
Copy link
Author

Added a step-over test for this!

@felipepiovezan felipepiovezan force-pushed the felipe/remove_thread_plan_enhance_async_plan_try2 branch from 5f35ba1 to 6f9bb81 Compare November 12, 2024 17:17
@felipepiovezan
Copy link
Author

@swift-ci test

@felipepiovezan
Copy link
Author

Argh, these tests are now running into swift's weird line tables :(

@felipepiovezan
Copy link
Author

Yup, there is some bad IRGen going on, this is the IR I am seeing before CoroSplit:

97:                                               ; preds = %79
  %98 = call swiftcc { ptr, ptr } @"$ss27_allocateUninitializedArrayySayxG_BptBwlFyp_Tg5"(i64 1), !dbg !84 ; [#uses=2 type={ ptr, ptr }] [debug line = 11:3]
  %99 = extractvalue { ptr, ptr } %98, 0, !dbg !84 ; [#uses=1 type=ptr] [debug line = 11:3]
  %100 = extractvalue { ptr, ptr } %98, 1, !dbg !84 ; [#uses=3 type=ptr] [debug line = 11:3]
  %101 = call swiftcc { i64, ptr } @"$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC"(ptr @".str.5.here!", i64 5, i1 true), !dbg !86 ; [#uses=2 type={ i64, ptr }] [debug line = 9:13]
  %102 = extractvalue { i64, ptr } %101, 0, !dbg !86 ; [#uses=1 type=i64] [debug line = 9:13]

Notice how the first three instructions are attributed to line 11 and subsequent ones to line 9.

I could swear this test was passing when I wrote it...

The symbol for swift_task_switch is marked as a trampoline, which triggers a
generic handling of it through DynamicLoaderDarwin. This makes sense, as it is
usually just a symbol stub that we jump through to get to the real task_switch:

```
...
 0x1000031bc <+112>: b      0x100003d20    ; symbol stub for: swift_task_switch
...

testt.out`swift_task_switch:
swift_task_switch
->  0x100003d20 <+0>: adrp   x16, 1
    0x100003d24 <+4>: ldr    x16, [x16, #0x10]
    0x100003d28 <+8>: br     x16  ; <<< real task_switch_addr
```

DynamicLoaderDarwin will take the thread all the way to the target of the
`br x16` instruction, which is the real task switch.

However, once that's done, there are no thread plans taking the thread
to the target funclet we're switching to. Recall that a task_switch is a
request to run a given funclet inside _some_ executor/thread. It's like
a function call, but with scheduling in the middle.

This commit adds a new step through task_switch thread plan inside
SwiftLanguageRuntime; it is a simplified version of the (now removed)
ThreadPlanStepInAsync: a breakpoint is created to the target funclet,
and we compare async contexts to make sure the breakpoint was hit on the
right task.

Note: we could remove the handling inside DynamicLoaderDarwin with some
kind of "if (symbol == swift_task_switch) don't do anything". The new
thread plan would work even on the stub.
@felipepiovezan felipepiovezan force-pushed the felipe/remove_thread_plan_enhance_async_plan_try2 branch from 6f9bb81 to 1d2f744 Compare November 13, 2024 21:38
@felipepiovezan
Copy link
Author

I've updated the test to account for the bad line table and linked a radar tracking the fix

@felipepiovezan
Copy link
Author

@swift-ci test

@felipepiovezan felipepiovezan merged commit c763551 into swiftlang:stable/20240723 Nov 15, 2024
2 checks passed
@felipepiovezan felipepiovezan deleted the felipe/remove_thread_plan_enhance_async_plan_try2 branch November 15, 2024 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants