Skip to content

[Concurrency] Fix too optimistic TaskGroup bail-out-when-empty, #63016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jan 27, 2023

Conversation

ktoso
Copy link
Contributor

@ktoso ktoso commented Jan 13, 2023

discarding group may need to emit an error out of such waitAll attempt, if a previous error was already stored.

The exact dance is:

  • we store an error from child task
  • the pending task count is 0
  • the code trying to waitAll would check status ONLY and bail out, noticing that pending count is zero -- returning the actual body value

Instead, it should:

  • check again the readyQueue if we're in, if there was a stored value, because if so, it may be an RawError that the DiscardingTaskGroup should emit.

This also is now able to easily handle the "first error always wins" but we choose to implement the "error thrown by body wins" after discussions within the team.

Resolves rdar://104198700
Resolves #63147
Resolves rdar://104507347
Resolves rdar://104332560
Resolves rdar://104212282

@ktoso
Copy link
Contributor Author

ktoso commented Jan 13, 2023

This resolves a flaky test (and an actual bug) that @shahmishal and @beccadax reported today.
Thank you for the ping, was a real omission in the code.

@ktoso
Copy link
Contributor Author

ktoso commented Jan 13, 2023

@swift-ci please smoke test

@ktoso ktoso added the concurrency Feature: umbrella label for concurrency language features label Jan 13, 2023
@ktoso
Copy link
Contributor Author

ktoso commented Jan 13, 2023

I'll have another look and add more tests about the retain/release correctness of the errors... I have a suspicion it might be wrong and over-retain the error a bit... 🤔

@ktoso
Copy link
Contributor Author

ktoso commented Jan 16, 2023

Had another bug hiding, we must offerError WHILE holding the group locks ofc, otherwise we get interleaving between the offer and completions of other tasks and may end up returning a success while we intended to throw the body error.

Fix: 95628b1

@ktoso ktoso force-pushed the wip-waitAll-race branch 2 times, most recently from 642dc77 to 13de654 Compare January 16, 2023 09:36
@ktoso
Copy link
Contributor Author

ktoso commented Jan 16, 2023

@swift-ci please smoke test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 16, 2023

@swift-ci please smoke test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 17, 2023

@swift-ci please smoke test linux

@ktoso
Copy link
Contributor Author

ktoso commented Jan 17, 2023

@swift-ci please smoke test

1 similar comment
@ktoso
Copy link
Contributor Author

ktoso commented Jan 17, 2023

@swift-ci please smoke test

@@ -1641,8 +1652,9 @@ static void swift_taskGroup_waitAllImpl(
ThrowingTaskFutureWaitContinuationFunction *resumeFunction,
AsyncContext *rawContext) {
auto waitingTask = swift_task_getCurrent();
waitingTask->ResumeTask = task_group_wait_resume_adapter;
waitingTask->ResumeContext = rawContext;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was incorrect, we cannot do this without holding the lock, as otherwise an offer may pull the waiting task off the queue; but only one of the paths shall update and run the task.

@@ -1739,21 +1739,52 @@ PollResult TaskGroupBase::tryEnqueueWaitingTask(AsyncTask *waitingTask) {
bool haveRunOneChildTaskInline = false;

reevaluate_if_TaskGroup_has_results:;
auto assumed = statusMarkWaitingAssumeAcquire();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here is that we don't mark and undo, but only mark when we indeed decide that we'll be waiting.

This is all still done with a lock, but the dance is more correct anyway.


@main struct Main {
static func main() async {
for i in 0...1_000 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quick enough tbh, let's leave it like this

@ktoso
Copy link
Contributor Author

ktoso commented Jan 18, 2023

@swift-ci please smoke test

@ktoso ktoso removed the request for review from shahmishal January 18, 2023 13:10
@ktoso
Copy link
Contributor Author

ktoso commented Jan 19, 2023

All passed but I want to make double-sure here, run again.

@ktoso
Copy link
Contributor Author

ktoso commented Jan 19, 2023

@swift-ci please smoke test

may need to emit an error out of such waitAll attempt, if a previous
error was already stored.
@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

@swift-ci please smoke test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

@swift-ci please smoke test linux

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

Heh, only remaining thing is a pure virtual method called happening only on linux that I can't explain :-( Getting a linux box to dig deeper.

Only on our Ubuntu 20.04 CI this happens:

[0x7fce8ebfd700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:847][group(0x55e4c18fc5b0)] (swift_taskGroup_initializeWithFlagsImpl) create group; flags: isDiscardingResults=1
[0x7fce8ebfd700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:631][group(0x55e4c18fc5b0,discardResults)] (statusAddPendingTaskRelaxed) addPending, after = TaskGroupStatus{ C:n W:n P:1 0000000000000000000000000000000000000000000000000000000000000001 }
[0x7fce8ebfd700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:631][group(0x55e4c18fc5b0,discardResults)] (statusAddPendingTaskRelaxed) addPending, after = TaskGroupStatus{ C:n W:n P:2 0000000000000000000000000000000000000000000000000000000000000010 }
[0x7fce8f5fe700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1159][group(0x55e4c18fc5b0,discardResults)] (offer) offer, completedTask:0x7fce8802a2f0, error:0, status:TaskGroupStatus{ C:n W:n P:2 0000000000000000000000000000000000000000000000000000000000000010 }
[0x7fce8f5fe700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1167][group(0x55e4c18fc5b0,discardResults)] (offer) offer, complete, status afterComplete:TaskGroupStatus{ C:n W:n P:1 0000000000000000000000000000000000000000000000000000000000000001 }
[0x7fce8ebfd700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1728][group(0x55e4c18fc5b0,discardResults)] (waitAll) waitAll, bodyError = (nil), status = TaskGroupStatus{ C:n W:n P:1 0000000000000000000000000000000000000000000000000000000000000001 }
[0x7fce8ebfd700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1743][group(0x55e4c18fc5b0,discardResults)] (waitAll) waitAll, status = TaskGroupStatus{ C:n W:n P:1 0000000000000000000000000000000000000000000000000000000000000001 }
[0x7fce8ebfd700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1792][group(0x55e4c18fc5b0,discardResults)] (waitAll) WATCH OUT, set waiter onto... waitQueue.head = (nil)
[0x7fce8ebfd700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1799][group(0x55e4c18fc5b0,discardResults)] (waitAll) waitAll, marked waiting status = TaskGroupStatus{ C:n W:y P:1 0100000000000000000000000000000000000000000000000000000000000001 }
[0x7fce8f5fe700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1159][group(0x55e4c18fc5b0,discardResults)] (offer) offer, completedTask:0x7fce88029c70, error:1, status:TaskGroupStatus{ C:n W:y P:1 0100000000000000000000000000000000000000000000000000000000000001 }
[0x7fce8f5fe700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1167][group(0x55e4c18fc5b0,discardResults)] (offer) offer, complete, status afterComplete:TaskGroupStatus{ C:n W:y P:0 0100000000000000000000000000000000000000000000000000000000000000 }
[0x7fce8f5fe700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1256][group(0x55e4c18fc5b0,discardResults)] (resumeWaitingTask) resume waiting task = 0x55e4c18fc290, alreadyDecremented:1, error:1, complete with = 0x7fce88029c70
[0x7fce8f5fe700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1259][group(0x55e4c18fc5b0,discardResults)] (resumeWaitingTask) resumeWaitingTask, attempt CAS, waiting task = 0x55e4c18fc290, waitQueue.head = 0x55e4c18fc290, error:1, complete with = 0x7fce88029c70
[0x7fce8f5fe700] [/home/build-user/swift/stdlib/public/Concurrency/TaskGroup.cpp:1296][group(0x55e4c18fc5b0,discardResults)] (resumeWaitingTask) resume waiting DONE, task = 0x55e4c18fc290, backup = 0x55e4c18fc290, complete with = 0x7fce88029c70, status = TaskGroupStatus{ C:y W:y P:0 1100000000000000000000000000000000000000000000000000000000000000 }
pure virtual method called
terminate called without an active exception
/home/build-user/build/buildbot_linux/swift-linux-x86_64/test-linux-x86_64/Concurrency/Runtime/Output/async_taskgroup_discarding.swift.script: line 1: 25881 Aborted                 (core dumped) /usr/bin/env LD_LIBRARY_PATH='/home/build-user/build/buildbot_linux/swift-linux-x86_64/lib/swift/linux' /home/build-user/build/buildbot_linux/swift-linux-x86_64/test-linux-x86_64/Concurrency/Runtime/Output/async_taskgroup_discarding.swift.tmp/a.out

--

Unable to reproduce this issue on a linux host 👎 Haven been trying all day to find some reason for this.

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

@swift-ci please smoke test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

Managed to reproduce the other occasional issue, with debug output:

[0x7f032a0a9640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:847][group(0x559fad1a15f0)] (swift_taskGroup_initializeWithFlagsImpl) create group; flags: isDiscardingResults=1
[0x7f032a0a9640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:631][group(0x559fad1a15f0,discardResults)] (statusAddPendingTaskRelaxed) addPending, after = TaskGroupStatus{ C:n W:n P:1 0000000000000000000000000000000000000000000000000000000000000001 }
[0x7f032a0a9640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:631][group(0x559fad1a15f0,discardResults)] (statusAddPendingTaskRelaxed) addPending, after = TaskGroupStatus{ C:n W:n P:2 0000000000000000000000000000000000000000000000000000000000000010 }
[0x7f032a0a9640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1728][group(0x559fad1a15f0,discardResults)] (waitAll) waitAll, bodyError = (nil), status = TaskGroupStatus{ C:n W:n P:2 0000000000000000000000000000000000000000000000000000000000000010 }
[0x7f032a0a9640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1743][group(0x559fad1a15f0,discardResults)] (waitAll) waitAll, status = TaskGroupStatus{ C:n W:n P:2 0000000000000000000000000000000000000000000000000000000000000010 }
[0x7f032a0a9640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1792][group(0x559fad1a15f0,discardResults)] (waitAll) WATCH OUT, set waiter onto... waitQueue.head = (nil)
[0x7f032a0a9640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1799][group(0x559fad1a15f0,discardResults)] (waitAll) waitAll, marked waiting status = TaskGroupStatus{ C:n W:y P:2 0100000000000000000000000000000000000000000000000000000000000010 }
[0x7f032b8ac640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1159][group(0x559fad1a15f0,discardResults)] (offer) offer, completedTask:0x7f031c00ed00, error:0, status:TaskGroupStatus{ C:n W:y P:2 0100000000000000000000000000000000000000000000000000000000000010 }
[0x7f032b8ac640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1167][group(0x559fad1a15f0,discardResults)] (offer) offer, complete, status afterComplete:TaskGroupStatus{ C:n W:y P:1 0100000000000000000000000000000000000000000000000000000000000001 }
[0x7f032a8aa640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1159][group(0x559fad1a15f0,discardResults)] (offer) offer, completedTask:0x7f031c00e820, error:1, status:TaskGroupStatus{ C:n W:y P:1 0100000000000000000000000000000000000000000000000000000000000001 }
[0x7f032a8aa640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1167][group(0x559fad1a15f0,discardResults)] (offer) offer, complete, status afterComplete:TaskGroupStatus{ C:n W:y P:0 0100000000000000000000000000000000000000000000000000000000000000 }
[0x7f032a8aa640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1256][group(0x559fad1a15f0,discardResults)] (resumeWaitingTask) resume waiting task = 0x559fad1a1290, alreadyDecremented:1, error:1, complete with = 0x7f031c00e820
[0x7f032a8aa640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1259][group(0x559fad1a15f0,discardResults)] (resumeWaitingTask) resumeWaitingTask, attempt CAS, waiting task = 0x559fad1a1290, waitQueue.head = 0x559fad1a1290, error:1, complete with = 0x7f031c00e820
[0x7f032a8aa640] [/home/ktoso/swift-project/swift/stdlib/public/Concurrency/TaskGroup.cpp:1296][group(0x559fad1a15f0,discardResults)] (resumeWaitingTask) resume waiting DONE, task = 0x559fad1a1290, backup = 0x559fad1a1290, complete with = 0x7f031c00e820, status = TaskGroupStatus{ C:y W:y P:0 1100000000000000000000000000000000000000000000000000000000000000 }
main/async_taskgroup_discarding.swift:56: Fatal error: (iteration:33) expected error to be re-thrown, got: 12
Current stack trace:
0    libswiftCore.so                    0x00007f032c5df890 _swift_stdlib_reportFatalErrorInFile + 112
1    libswiftCore.so                    0x00007f032c1d3892 <unavailable> + 1517714
2    libswiftCore.so                    0x00007f032c1d36e6 <unavailable> + 1517286

but I can't understand how that's possible; we never hit any "store the result" codepath in the first offer, so how does the 12 end up in the result...

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

preset=asan
@swift-ci Please test with preset macOS

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

@swift-ci Please smoke test linux

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

@swift-ci Please smoke test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

@swift-ci Please smoke test macOS

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

Seems linux is passing with the latest rework of waitAll body -- avoiding a race on context mutation. Let's give it another spin though.

Yeah all those issues can be tracked back to an use-after-free I found and finally fixed now.

@ktoso
Copy link
Contributor Author

ktoso commented Jan 24, 2023

@swift-ci Please smoke test linux

@ktoso
Copy link
Contributor Author

ktoso commented Jan 25, 2023

@swift-ci Please smoke test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 25, 2023

Unlocking one more test

@swift-ci Please smoke test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 25, 2023

Also picked over the overflow protection we did way back then in #62271

I believe we're finally golden here, let's give it another test run and I'll pick forward.

@ktoso
Copy link
Contributor Author

ktoso commented Jan 25, 2023

@swift-ci please test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 25, 2023

ARM_ARCH_7K compilation fun™, fixed now.

@ktoso
Copy link
Contributor Author

ktoso commented Jan 25, 2023

@swift-ci please test

@ktoso
Copy link
Contributor Author

ktoso commented Jan 26, 2023

@swift-ci please smoke test windows

@ktoso ktoso merged commit 99fb37f into swiftlang:main Jan 27, 2023
@ktoso ktoso deleted the wip-waitAll-race branch January 27, 2023 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
concurrency Feature: umbrella label for concurrency language features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TaskGroups waitForAll changed behaviour in recent 5.8 toolchains
1 participant