Skip to content

[Coroutines] Always set the calling convention of generated resuming call from 'llvm.coro.await.suspend.handle' as fast #93167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ChuanqiXu9
Copy link
Member

See the post commit message in
#89751

We met a regression due to a change of calling convention of this patch. Previously, the calling convention of indirect resume calls is always fast.

And in this patch, although we tried to take care of it in the cloner, we forget the case that we have to update the resuming calls in the ramp functions. So this is the root cause of the downstream failure.

This patch tries to mark the generated resuming calls as fast immediately after they got created to make sure the calling convention is correct.

@ChuanqiXu9 ChuanqiXu9 added the coroutines C++20 coroutines label May 23, 2024
@ChuanqiXu9 ChuanqiXu9 requested a review from zmodem May 23, 2024 10:34
@ChuanqiXu9 ChuanqiXu9 self-assigned this May 23, 2024
@llvmbot
Copy link
Member

llvmbot commented May 23, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-coroutines

Author: Chuanqi Xu (ChuanqiXu9)

Changes

See the post commit message in
#89751

We met a regression due to a change of calling convention of this patch. Previously, the calling convention of indirect resume calls is always fast.

And in this patch, although we tried to take care of it in the cloner, we forget the case that we have to update the resuming calls in the ramp functions. So this is the root cause of the downstream failure.

This patch tries to mark the generated resuming calls as fast immediately after they got created to make sure the calling convention is correct.


Full diff: https://github.com/llvm/llvm-project/pull/93167.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Coroutines/CoroSplit.cpp (+1-1)
  • (added) llvm/test/Transforms/Coroutines/coro-await-suspend-handle-in-ramp.ll (+59)
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index 1d9cf185b75a7..5a58a99d2879e 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -227,6 +227,7 @@ static void lowerAwaitSuspend(IRBuilder<> &Builder, CoroAwaitSuspendInst *CB,
     FunctionType *ResumeTy = FunctionType::get(
         Type::getVoidTy(Ctx), PointerType::getUnqual(Ctx), false);
     auto *ResumeCall = Builder.CreateCall(ResumeTy, ResumeAddr, {NewCall});
+    ResumeCall->setCallingConv(CallingConv::Fast);
 
     // We can't insert the 'ret' instruction and adjust the cc until the
     // function has been split, so remember this for later.
@@ -1088,7 +1089,6 @@ void CoroCloner::create() {
   // Turn symmetric transfers into musttail calls.
   for (CallInst *ResumeCall : Shape.SymmetricTransfers) {
     ResumeCall = cast<CallInst>(VMap[ResumeCall]);
-    ResumeCall->setCallingConv(NewF->getCallingConv());
     if (TTI.supportsTailCallFor(ResumeCall)) {
       // FIXME: Could we support symmetric transfer effectively without
       // musttail?
diff --git a/llvm/test/Transforms/Coroutines/coro-await-suspend-handle-in-ramp.ll b/llvm/test/Transforms/Coroutines/coro-await-suspend-handle-in-ramp.ll
new file mode 100644
index 0000000000000..85e8bb52fee35
--- /dev/null
+++ b/llvm/test/Transforms/Coroutines/coro-await-suspend-handle-in-ramp.ll
@@ -0,0 +1,59 @@
+; Tests lowerings of different versions of coro.await.suspend
+; RUN: opt < %s -passes='module(coro-early),cgscc(coro-split),simplifycfg' -S | FileCheck %s
+
+%Awaiter = type {}
+
+define void @f() presplitcoroutine {
+entry:
+  %awaiter = alloca %Awaiter
+  %id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call ptr @malloc(i32 %size)
+  %hdl = call ptr @llvm.coro.begin(token %id, ptr %alloc)
+  call void @llvm.coro.await.suspend.handle(ptr %awaiter, ptr %hdl, ptr @await_suspend_wrapper_handle)
+  %suspend.init = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %suspend.init, label %ret [
+    i8 0, label %step
+    i8 1, label %cleanup
+  ]
+
+; Check the calling convention for resuming function is fastcc
+; CHECK:     define {{[^@]*}} @f()
+; CHECK:      entry:
+; CHECK:        %[[NEXT_HDL:.+]] = call ptr @await_suspend_wrapper_handle(
+; CHECK-NEXT:   %[[CONT:.+]] = call ptr @llvm.coro.subfn.addr(ptr %[[NEXT_HDL]], i8 0)
+; CHECK-NEXT:   musttail call fastcc void %[[CONT]](ptr %[[NEXT_HDL]])
+step:
+  br label %cleanup
+
+cleanup:
+  %mem = call ptr @llvm.coro.free(token %id, ptr %hdl)
+  call void @free(ptr %mem)
+  br label %ret
+
+ret:
+  call i1 @llvm.coro.end(ptr %hdl, i1 0, token none)
+  ret void
+}
+
+; check that we were haven't accidentally went out of @f body
+; CHECK-LABEL: @f.resume(
+; CHECK-LABEL: @f.destroy(
+; CHECK-LABEL: @f.cleanup(
+
+declare ptr @await_suspend_wrapper_handle(ptr, ptr)
+
+declare ptr @llvm.coro.free(token, ptr)
+declare i32 @llvm.coro.size.i32()
+declare i8  @llvm.coro.suspend(token, i1)
+declare void @llvm.coro.resume(ptr)
+declare void @llvm.coro.destroy(ptr)
+
+declare token @llvm.coro.id(i32, ptr, ptr, ptr)
+declare i1 @llvm.coro.alloc(token)
+declare ptr @llvm.coro.begin(token, ptr)
+declare void @llvm.coro.await.suspend.handle(ptr, ptr, ptr)
+declare i1 @llvm.coro.end(ptr, i1, token)
+
+declare noalias ptr @malloc(i32)
+declare void @free(ptr)

…call from 'llvm.coro.await.suspend.handle' as fast

See the post commit message in
llvm#89751

We met a regression due to a change of calling convention of this patch.
Previously, the calling convention of indirect resume calls is always
fast.

And in this patch, although we tried to take care of it in the cloner,
we forget the case that we have to update the resuming calls in the
ramp functions. So this is the root cause of the downstream failure.

This patch tries to mark the generated resuming calls as fast
immediately after they got created to make sure the calling convention
is correct.
@ChuanqiXu9 ChuanqiXu9 force-pushed the AlwaysFastConventionForCoroResuming branch from 322b821 to cbcb66d Compare May 23, 2024 10:38
@zmodem
Copy link
Collaborator

zmodem commented May 23, 2024

lgtm, thanks!

We met a regression due to a change of calling convention of this patch.

How did you detect the regression? Was the resume call crashing due to mismatched calling convention? Did the verifier catch it?

@ChuanqiXu9
Copy link
Member Author

lgtm, thanks!

We met a regression due to a change of calling convention of this patch.

How did you detect the regression? Was the resume call crashing due to mismatched calling convention? Did the verifier catch it?

No, it was an amazing double free segfault. I tried to reduce the case but it shows too hard. Then I tried to review the code second time and find this point luckily.

@ChuanqiXu9
Copy link
Member Author

I'll land this to give it a try. Feel free to revert it if the bots alert again and I don't respond in time.

@ChuanqiXu9 ChuanqiXu9 merged commit 44430de into llvm:main May 23, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants