Skip to content

[Concurrency] Defensive copying of task locals in task groups, rather than crashing #73978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ktoso
Copy link
Contributor

@ktoso ktoso commented May 29, 2024

This is a more advanced version of #73881

This addresses a memory safety issue that task groups and task locals have faced since their introduction and has been previously guarded against by crashing rather than allowing the unsafety. The very specific situation this is about is the following:

      await withTaskGroup(of: Void.self) { group in
        TL.$two.withValue(2222) { // PREVIOUSLY: crash and warn against this usage pattern
          group.addTask {
            print("Survived, two: \(TL.two)")
          }
        }

The problem is that the withValue is effectively a push and a pop of the task local binding. The and child tasks refer directly to their parent's task local storage in order to minimize copying. This is fantastic because operations such as this:

TL.$a.withValue(...) {
TL.$b.withValue(...) {
TL.$c.withValue(...) {
TL.$d.withValue(...) { 
await withTaskGroup(of: Void.self) { group in
  group.addTask {}  // no copying, direct to parent reference (safe and good)
}

will cause zero copying. (good and safe)

However, the case of DIRECTLY wrapping an addTask with a task local binding like this: TL.$two.withValue(2222) { group.addTask {} } is unsafe because the "pop" of the $two binding would happen BEFORE the child task gets to run to completion -- would would allow it to refer to released memory.

Previous solution:
We previously would crash whenever withValue was used inside a task group at all. Which led to problematic and unexpected crashes when more complex libraries which wrap code in task groups would call to user code or set task locals, causing people to work around issues: swiftlang/swift-testing#339 and a few more in server libraries.

Sometimes they may not be able to work around these incompatibilities though, leading to a not well composing structured task hierarchy (bad!)

This solution:
We instead now detect task locals pushed during the body of a task group. Then, as we addTask we defensively COPY them if directly in the parent task. This way the following situation is safe:

  await TL.$one.withValue(11) {
    await TL.$one.withValue(1111) {
      await withTaskGroup(of: Void.self) { group in

        TL.$two.withValue(2222) {
          group.addTask { 
          // task locals:
          // - copy 2222 item
          // - next reference to 1111
          // [skip] 11 no need to refer to it, since one already bound   
          }
        }

etc.

TL;DR:

  • Instead of crashing in order to prevent bad memory access, we not handle it by copying.
  • The situation where we copy is very rare and does not impact most of users: because the usual way is to bind around the task group
  • It is simple to avoid the copying: move from withValue { addTask {} } to addTask { withValue {} } which causes no copying.

Result: Instead of crashing, we now can support this pattern - which sometimes comes up unbeknownst to actual developers, because they may be using libraries which wrap them in task groups and whatnot.

resolves rdar://127156303

@ktoso ktoso requested review from mikeash, al45tair and a team as code owners May 29, 2024 15:42
@ktoso ktoso requested a review from FranzBusch May 29, 2024 15:42
@ktoso
Copy link
Contributor Author

ktoso commented May 29, 2024

@swift-ci please test

ktoso added 2 commits May 30, 2024 00:47
These are something CLion understands and suggests, but yeah let's avoid
having warnings in the build.
@ktoso ktoso force-pushed the wip-task-locals-defensive-copying-rather-than-crashing branch from 16b3fb0 to b99b37a Compare May 29, 2024 15:47
@ktoso
Copy link
Contributor Author

ktoso commented May 29, 2024

@swift-ci please test

@ktoso ktoso marked this pull request as draft May 29, 2024 15:52
@ktoso ktoso force-pushed the wip-task-locals-defensive-copying-rather-than-crashing branch from 2facfe1 to d713aef Compare May 30, 2024 05:57
@ktoso
Copy link
Contributor Author

ktoso commented May 30, 2024

@swift-ci please test

@@ -378,7 +376,6 @@ bool swift_bincompat_useLegacyNonCrashingExecutorChecks() {

return legacyMode;
}
#pragma clang diagnostic pop
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was producing warnings, i had introduced this recently -- sorry for noise

ktoso added 3 commits May 30, 2024 16:51
@ktoso ktoso force-pushed the wip-task-locals-defensive-copying-rather-than-crashing branch from b713d76 to 30e1c69 Compare May 30, 2024 07:51
@ktoso
Copy link
Contributor Author

ktoso commented May 30, 2024

@swift-ci please test

@ktoso ktoso marked this pull request as ready for review May 30, 2024 10:19
@ktoso ktoso requested a review from DougGregor May 30, 2024 13:24
@ktoso
Copy link
Contributor Author

ktoso commented May 31, 2024

@swift-ci please smoke test

@ktoso ktoso enabled auto-merge (squash) May 31, 2024 01:37
@ktoso
Copy link
Contributor Author

ktoso commented May 31, 2024

resolves #73217

@ktoso ktoso changed the title [Concurrency] Implement defensive copying in task groups, rather than crashing [Concurrency] Defensive copying of task locals in task groups, rather than crashing May 31, 2024
@ktoso
Copy link
Contributor Author

ktoso commented May 31, 2024

@swift-ci please smoke test macOS

2 similar comments
@ktoso
Copy link
Contributor Author

ktoso commented May 31, 2024

@swift-ci please smoke test macOS

@ktoso
Copy link
Contributor Author

ktoso commented May 31, 2024

@swift-ci please smoke test macOS

@ktoso ktoso merged commit 0c44645 into swiftlang:main May 31, 2024
3 checks passed
Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is eagerly copying everything that was pushed inside of any task group, right? That's actually subtly less than optimal in the case where there are multiple task groups active at once — we only really need to copy up to the task group that the task is being created within. It's fine to do that for now, but let's not do anything that commits us to that in the long term, like making anything about this public ABI. (This doesn't need to be public anyway — it's all internal to the concurrency library.)

/// Items of this kind must be copied by a group child task for access
/// safety reasons, as otherwise the pop would happen before the child task
/// has completed.
IsNextCreatedInTaskGroupBody = 0b10,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to do this now, but it kind of sounds like these are flags instead of a single "kind".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, lemme think if I can put these somewhere else 🤔

nextOverride->isParentPointer() &&
"Currently relinking is only done within a task group to "
"avoid within-taskgroup next pointers; attempted to point at "
"task local declared within task group body though!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A || B && C is parsed as A || (B && C). That turns out to be logically equivalent in this case, but still, it's better to get into the habit of parenthesizing assertion conditions.

The assertion and behavior here is strange — the method name suggests that it's just changing the next pointer, but you're really assuming a stronger condition. It feels like you can't really assert that condition properly, which to me is usually a sign that the operation isn't as general as I'm thinking it is. That's okay; operations don't always generalize. It just suggests that you ought to name this method something that indicates its special behavior, and maybe feel less bad about writing something that doesn't feel like a generic operation.

I assume what we're doing here is cloning the entries and then calling relinkNext on them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good point to rename this method, it is a very specific trick. When we detect the "next" task local item is "safe" as in "not the bad pattern where we have to copy" we immediately link to it rather than continuing our copying dance. I'll rename and see if I can clarify some more.

Fixing the assertion too, thank you for noticing that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up in #74089

if (taskCreateFlags.addPendingGroupTaskUnconditionally()) {
assert(group && "Missing group");
swift_taskGroup_addPending(group, /*unconditionally=*/true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a logically-unrelated cleanup? Seems reasonable, just trying to find the connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this bit is just cleanup -- same functionality.

/// func swift_task_localsCopyToTaskGroupChildTaskDefensively<Key>(AsyncTask* task)
/// \endcode
SWIFT_EXPORT_FROM(swift_Concurrency) SWIFT_CC(swift)
void swift_task_localsCopyToTaskGroupChildTaskDefensively(AsyncTask* target);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be an exported runtime function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can managed without exposing it -- will remove (and remove before getting this into 6.0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed in #74089

@ktoso
Copy link
Contributor Author

ktoso commented Jun 3, 2024

Thanks for the review John!

So, this is eagerly copying everything that was pushed inside of any task group, right? That's actually subtly less than optimal in the case where there are multiple task groups active at once
Right, we can optimize this a bit more eventually, I didn't find a good way to do this yet, we'd have to store "which group it was created in" and I guess we'd probably either need to make task local 'item' larger or find a different trick, like another kind of record...

I'll make sure we don't lock ourselfs out of the optimization, following up now

@@ -212,6 +258,8 @@ class TaskLocal {
/// "pop" of the `B` value - it was spawned from a scope where only B was observable.
void copyTo(AsyncTask *target);

void copyToOnlyOnlyFromCurrent(AsyncTask *target);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed this in follow up #74089

@ktoso ktoso deleted the wip-task-locals-defensive-copying-rather-than-crashing branch June 3, 2024 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants