Skip to content

[Coroutines] Introduce [[clang::coro_only_destroy_when_complete]] #71014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 9, 2023

Conversation

ChuanqiXu9
Copy link
Member

Close #56980.

This patch tries to introduce a light-weight optimization attribute for coroutines which are guaranteed to only be destroyed after it reached the final suspend.

The rationale behind the patch is simple. See the example:

A foo() {
  dtor d;
  co_await something();
  dtor d1;
  co_await something();
  dtor d2;
  co_return 43;
}

Generally the generated .destroy function may be:

void foo.destroy(foo.Frame *frame) {
  switch(frame->suspend_index()) {
    case 1:
      frame->d.~dtor();
      break;
    case 2:
      frame->d.~dtor();
      frame->d1.~dtor();
      break;
    case 3:
      frame->d.~dtor();
      frame->d1.~dtor();
      frame->d2.~dtor();
      break;
    default: // coroutine completed or haven't started
      break;
  }

  frame->promise.~promise_type();
  delete frame;
}

Since the compiler need to be ready for all the cases that the coroutine may be destroyed in a valid state.

However, from the user's perspective, we can understand that certain coroutine types may only be destroyed after it reached to the final suspend point. And we need a method to teach the compiler about this. Then this is the patch. After the compiler recognized that the coroutines can only be destroyed after complete, it can optimize the above example to:

void foo.destroy(foo.Frame *frame) {
  frame->promise.~promise_type();
  delete frame;
}

I spent a lot of time experimenting and experiencing this in the downstream. The numbers are really good. In a real-world coroutine-heavy workload, the size of the build dir (including .o files) reduces 14%. And the size of final libraries (excluding the .o files) reduces 8% in Debug mode and 1% in Release mode.

@ChuanqiXu9 ChuanqiXu9 added the coroutines C++20 coroutines label Nov 2, 2023
@ChuanqiXu9 ChuanqiXu9 self-assigned this Nov 2, 2023
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:ir llvm:transforms labels Nov 2, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 2, 2023

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-coroutines

Author: Chuanqi Xu (ChuanqiXu9)

Changes

Close #56980.

This patch tries to introduce a light-weight optimization attribute for coroutines which are guaranteed to only be destroyed after it reached the final suspend.

The rationale behind the patch is simple. See the example:

A foo() {
  dtor d;
  co_await something();
  dtor d1;
  co_await something();
  dtor d2;
  co_return 43;
}

Generally the generated .destroy function may be:

void foo.destroy(foo.Frame *frame) {
  switch(frame->suspend_index()) {
    case 1:
      frame->d.~dtor();
      break;
    case 2:
      frame->d.~dtor();
      frame->d1.~dtor();
      break;
    case 3:
      frame->d.~dtor();
      frame->d1.~dtor();
      frame->d2.~dtor();
      break;
    default: // coroutine completed or haven't started
      break;
  }

  frame->promise.~promise_type();
  delete frame;
}

Since the compiler need to be ready for all the cases that the coroutine may be destroyed in a valid state.

However, from the user's perspective, we can understand that certain coroutine types may only be destroyed after it reached to the final suspend point. And we need a method to teach the compiler about this. Then this is the patch. After the compiler recognized that the coroutines can only be destroyed after complete, it can optimize the above example to:

void foo.destroy(foo.Frame *frame) {
  frame->promise.~promise_type();
  delete frame;
}

I spent a lot of time experimenting and experiencing this in the downstream. The numbers are really good. In a real-world coroutine-heavy workload, the size of the build dir (including .o files) reduces 14%. And the size of final libraries (excluding the .o files) reduces 8% in Debug mode and 1% in Release mode.


Full diff: https://github.com/llvm/llvm-project/pull/71014.diff

14 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+12)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+66)
  • (modified) clang/lib/CodeGen/CGCoroutine.cpp (+6)
  • (added) clang/test/CodeGenCoroutines/coro-only-destroy-when-complete.cpp (+59)
  • (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
  • (modified) llvm/docs/Coroutines.rst (+11)
  • (modified) llvm/include/llvm/Bitcode/LLVMBitCodes.h (+1)
  • (modified) llvm/include/llvm/IR/Attributes.td (+3)
  • (modified) llvm/include/llvm/IR/Function.h (+7)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+2)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+2)
  • (modified) llvm/lib/Transforms/Coroutines/CoroSplit.cpp (+14-7)
  • (modified) llvm/lib/Transforms/Utils/CodeExtractor.cpp (+1)
  • (added) llvm/test/Transforms/Coroutines/coro-only-destroy-when-done.ll (+137)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 60b549999c155e5..31434565becaec6 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1082,6 +1082,18 @@ def CFConsumed : InheritableParamAttr {
   let Documentation = [RetainBehaviorDocs];
 }
 
+
+// coro_only_destroy_when_complete indicates the coroutines whose return type
+// is marked by coro_only_destroy_when_complete can only be destroyed when the
+// coroutine completes. Then the space for the destroy functions can be saved.
+def CoroOnlyDestroyWhenComplete : InheritableAttr {
+  let Spellings = [Clang<"coro_only_destroy_when_complete">];
+  let Subjects = SubjectList<[CXXRecord]>;
+  let LangOpts = [CPlusPlus];
+  let Documentation = [CoroOnlyDestroyWhenCompleteDocs];
+  let SimpleHandler = 1;
+}
+
 // OSObject-based attributes.
 def OSConsumed : InheritableParamAttr {
   let Spellings = [Clang<"os_consumed">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 05703df2129f612..dd1b44801a76947 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -7416,3 +7416,69 @@ that ``p->array`` must have at least ``p->count`` number of elements available:
 
   }];
 }
+
+def CoroOnlyDestroyWhenCompleteDocs : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+The `coro_only_destroy_when_complete` attribute should be marked on a C++ class. The coroutines
+whose return type is marked as the attribute are assumed to be destroyed only after then coroutines
+reached to the final suspend point.
+
+This is helpful for the optimizers to perform more optimizations.
+
+For example,
+
+.. code-block:: c++
+
+  A foo() {
+    dtor d;
+    co_await something();
+    dtor d1;
+    co_await something();
+    dtor d2;
+    co_return 43;
+  }
+
+The compiler may generate the following pseudocode:
+
+.. code-block:: c++
+
+  void foo.destroy(foo.Frame *frame) {
+    switch(frame->suspend_index()) {
+      case 1:
+        frame->d.~dtor();
+        break;
+      case 2:
+        frame->d.~dtor();
+        frame->d1.~dtor();
+        break;
+      case 3:
+        frame->d.~dtor();
+        frame->d1.~dtor();
+        frame->d2.~dtor();
+        break;
+      default: // coroutine completed or haven't started
+        break;
+    }
+
+    frame->promise.~promise_type();
+    delete frame;
+  }
+
+The rationale of the `foo.destroy()` function is: we need to release all the
+resources initialized if we're going to be destroyed in any suspend point.
+
+But if the coroutine never got destroyed besides the final suspend point, the generated
+big switch case is a waste. In this case, users can pass the information to the optimizer
+by the `coro_only_destroy_when_complete` attribute, then the optimizier can optimize the
+above code to:
+
+.. code-block:: c++
+
+  void foo.destroy(foo.Frame *frame) {
+    frame->promise.~promise_type();
+    delete frame;
+  }
+
+  }];
+}
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 58310216ecff1f5..b6ac3fabeab7bc2 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -777,6 +777,12 @@ void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {
 
   // LLVM require the frontend to mark the coroutine.
   CurFn->setPresplitCoroutine();
+
+  {
+    CXXRecordDecl *RD = FnRetTy->getAsCXXRecordDecl();
+    if (RD && RD->hasAttr<CoroOnlyDestroyWhenCompleteAttr>())
+      CurFn->setCoroDestroyOnlyWhenDone();
+  }
 }
 
 // Emit coroutine intrinsic and patch up arguments of the token type.
diff --git a/clang/test/CodeGenCoroutines/coro-only-destroy-when-complete.cpp b/clang/test/CodeGenCoroutines/coro-only-destroy-when-complete.cpp
new file mode 100644
index 000000000000000..d7eb64f915484be
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/coro-only-destroy-when-complete.cpp
@@ -0,0 +1,59 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -std=c++20 \
+// RUN:     -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -std=c++20 \
+// RUN:     -O3 -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-O
+
+#include "Inputs/coroutine.h"
+
+using namespace std;
+
+struct A;
+struct A_promise_type {
+  A get_return_object();
+  suspend_always initial_suspend();
+  suspend_always final_suspend() noexcept;
+  void return_value(int);
+  void unhandled_exception();
+
+  std::coroutine_handle<> handle;
+};
+
+struct Awaitable{
+  bool await_ready();
+  int await_resume();
+  template <typename F>
+  void await_suspend(F);
+};
+Awaitable something();
+
+struct dtor {
+    dtor();
+    ~dtor();
+};
+
+struct [[clang::coro_only_destroy_when_complete]] A {
+  using promise_type = A_promise_type;
+  A();
+  A(std::coroutine_handle<>);
+  ~A();
+
+  std::coroutine_handle<promise_type> handle;
+};
+
+A foo() {
+    dtor d;
+    co_await something();
+    dtor d1;
+    co_await something();
+    dtor d2;
+    co_return 43;
+}
+
+// CHECK: define{{.*}}@_Z3foov({{.*}}) #[[ATTR_NUM:[0-9]+]]
+// CHECK: attributes #[[ATTR_NUM]] = {{.*}}coro_only_destroy_when_done
+
+// CHECK-O: define{{.*}}@_Z3foov.destroy
+// CHECK-O: {{^.*}}:
+// CHECK-O-NOT: br
+// CHECK-O: ret void
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index f48126775c8685c..969794073a5f2e8 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -56,6 +56,7 @@
 // CHECK-NEXT: ConsumableAutoCast (SubjectMatchRule_record)
 // CHECK-NEXT: ConsumableSetOnRead (SubjectMatchRule_record)
 // CHECK-NEXT: Convergent (SubjectMatchRule_function)
+// CHECK-NEXT: CoroOnlyDestroyWhenComplete (SubjectMatchRule_record)
 // CHECK-NEXT: CountedBy (SubjectMatchRule_field)
 // CHECK-NEXT: DLLExport (SubjectMatchRule_function, SubjectMatchRule_variable, SubjectMatchRule_record, SubjectMatchRule_objc_interface)
 // CHECK-NEXT: DLLImport (SubjectMatchRule_function, SubjectMatchRule_variable, SubjectMatchRule_record, SubjectMatchRule_objc_interface)
diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index f4a27812bc7bac8..84a5c4b6d7ec0a5 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -1775,6 +1775,17 @@ CoroCleanup
 This pass runs late to lower all coroutine related intrinsics not replaced by
 earlier passes.
 
+Attributes
+==========
+
+coro_only_destroy_when_done
+---------------------------
+
+When the coroutine are marked with coro_only_destroy_when_done, it indicates
+the coroutine must reach the final suspend point when it get destroyed.
+
+This attribute only works for switched-resume coroutines now.
+
 Metadata
 ========
 
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index 2a522c517c7fa2f..57c94c6411d67c4 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -718,6 +718,7 @@ enum AttributeKindCodes {
   ATTR_KIND_NOFPCLASS = 87,
   ATTR_KIND_OPTIMIZE_FOR_DEBUGGING = 88,
   ATTR_KIND_WRITABLE = 89,
+  ATTR_KIND_CORO_ONLY_DESTROY_WHEN_DONE = 90,
 };
 
 enum ComdatSelectionKindCodes {
diff --git a/llvm/include/llvm/IR/Attributes.td b/llvm/include/llvm/IR/Attributes.td
index 693ffbbf32e0e33..2920d672e60b393 100644
--- a/llvm/include/llvm/IR/Attributes.td
+++ b/llvm/include/llvm/IR/Attributes.td
@@ -318,6 +318,9 @@ def MustProgress : EnumAttr<"mustprogress", [FnAttr]>;
 /// Function is a presplit coroutine.
 def PresplitCoroutine : EnumAttr<"presplitcoroutine", [FnAttr]>;
 
+/// The coroutine would only be destroyed when it is done.
+def CoroDestroyOnlyWhenDone : EnumAttr<"coro_only_destroy_when_done", [FnAttr]>;
+
 /// Target-independent string attributes.
 def LessPreciseFPMAD : StrBoolAttr<"less-precise-fpmad">;
 def NoInfsFPMath : StrBoolAttr<"no-infs-fp-math">;
diff --git a/llvm/include/llvm/IR/Function.h b/llvm/include/llvm/IR/Function.h
index 692ef7535f0e6b1..eca772e6f8c53cd 100644
--- a/llvm/include/llvm/IR/Function.h
+++ b/llvm/include/llvm/IR/Function.h
@@ -493,6 +493,13 @@ class LLVM_EXTERNAL_VISIBILITY Function : public GlobalObject,
   void setPresplitCoroutine() { addFnAttr(Attribute::PresplitCoroutine); }
   void setSplittedCoroutine() { removeFnAttr(Attribute::PresplitCoroutine); }
 
+  bool isCoroOnlyDestroyWhenDone() const {
+    return hasFnAttribute(Attribute::CoroDestroyOnlyWhenDone);
+  }
+  void setCoroDestroyOnlyWhenDone() {
+    addFnAttr(Attribute::CoroDestroyOnlyWhenDone);
+  }
+
   MemoryEffects getMemoryEffects() const;
   void setMemoryEffects(MemoryEffects ME);
 
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 747968b51407cd9..d0d2289dda0273c 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -2060,6 +2060,8 @@ static Attribute::AttrKind getAttrFromCode(uint64_t Code) {
     return Attribute::PresplitCoroutine;
   case bitc::ATTR_KIND_WRITABLE:
     return Attribute::Writable;
+  case bitc::ATTR_KIND_CORO_ONLY_DESTROY_WHEN_DONE:
+    return Attribute::CoroDestroyOnlyWhenDone;
   }
 }
 
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 5e3341d4de13323..6a9285dc05948cb 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -826,6 +826,8 @@ static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) {
     return bitc::ATTR_KIND_PRESPLIT_COROUTINE;
   case Attribute::Writable:
     return bitc::ATTR_KIND_WRITABLE;
+  case Attribute::CoroDestroyOnlyWhenDone:
+    return bitc::ATTR_KIND_CORO_ONLY_DESTROY_WHEN_DONE;
   case Attribute::EndAttrKinds:
     llvm_unreachable("Can not encode end-attribute kinds marker.");
   case Attribute::None:
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index e3eb8c4d8f1dc11..43c969fa372bedc 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -529,13 +529,20 @@ void CoroCloner::handleFinalSuspend() {
     BasicBlock *OldSwitchBB = Switch->getParent();
     auto *NewSwitchBB = OldSwitchBB->splitBasicBlock(Switch, "Switch");
     Builder.SetInsertPoint(OldSwitchBB->getTerminator());
-    auto *GepIndex = Builder.CreateStructGEP(Shape.FrameTy, NewFramePtr,
-                                       coro::Shape::SwitchFieldIndex::Resume,
-                                             "ResumeFn.addr");
-    auto *Load = Builder.CreateLoad(Shape.getSwitchResumePointerType(),
-                                    GepIndex);
-    auto *Cond = Builder.CreateIsNull(Load);
-    Builder.CreateCondBr(Cond, ResumeBB, NewSwitchBB);
+
+    if (NewF->isCoroOnlyDestroyWhenDone()) {
+      // When the coroutine can only be destroyed when done, we don't need
+      // to generate code for other cases.
+      Builder.CreateBr(ResumeBB);
+    } else {
+      auto *GepIndex = Builder.CreateStructGEP(
+          Shape.FrameTy, NewFramePtr, coro::Shape::SwitchFieldIndex::Resume,
+          "ResumeFn.addr");
+      auto *Load =
+          Builder.CreateLoad(Shape.getSwitchResumePointerType(), GepIndex);
+      auto *Cond = Builder.CreateIsNull(Load);
+      Builder.CreateCondBr(Cond, ResumeBB, NewSwitchBB);
+    }
     OldSwitchBB->getTerminator()->eraseFromParent();
   }
 }
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
index 32f1bb1db9c3547..18d4aed16fb1e61 100644
--- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp
+++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
@@ -922,6 +922,7 @@ Function *CodeExtractor::constructFunction(const ValueSet &inputs,
       case Attribute::PresplitCoroutine:
       case Attribute::Memory:
       case Attribute::NoFPClass:
+      case Attribute::CoroDestroyOnlyWhenDone:
         continue;
       // Those attributes should be safe to propagate to the extracted function.
       case Attribute::AlwaysInline:
diff --git a/llvm/test/Transforms/Coroutines/coro-only-destroy-when-done.ll b/llvm/test/Transforms/Coroutines/coro-only-destroy-when-done.ll
new file mode 100644
index 000000000000000..57de1b4376048a1
--- /dev/null
+++ b/llvm/test/Transforms/Coroutines/coro-only-destroy-when-done.ll
@@ -0,0 +1,137 @@
+; RUN: opt < %s -passes='cgscc(coro-split),early-cse,dce,simplifycfg' -S | FileCheck %s
+
+%"struct.std::__n4861::noop_coroutine_promise" = type { i8 }
+%struct.Promise = type { %"struct.std::__n4861::coroutine_handle" }
+%"struct.std::__n4861::coroutine_handle" = type { ptr }
+
+define ptr @foo() #1 {
+entry:
+  %__promise = alloca %struct.Promise, align 8
+  %0 = call token @llvm.coro.id(i32 16, ptr nonnull %__promise, ptr nonnull @foo, ptr null)
+  %1 = call i1 @llvm.coro.alloc(token %0)
+  br i1 %1, label %coro.alloc, label %init.suspend
+
+coro.alloc:                                       ; preds = %entry
+  %2 = tail call i64 @llvm.coro.size.i64()
+  %call = call noalias noundef nonnull ptr @_Znwm(i64 noundef %2) #11
+  br label %init.suspend
+
+init.suspend:                                     ; preds = %entry, %coro.alloc
+  %3 = phi ptr [ null, %entry ], [ %call, %coro.alloc ]
+  %4 = call ptr @llvm.coro.begin(token %0, ptr %3) #12
+  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %__promise) #3
+  store ptr null, ptr %__promise, align 8
+  %5 = call token @llvm.coro.save(ptr null)
+  %6 = call i8 @llvm.coro.suspend(token %5, i1 false)
+  switch i8 %6, label %coro.ret [
+  i8 0, label %await.suspend
+  i8 1, label %cleanup1
+  ]
+
+await.suspend:                                    ; preds = %init.suspend
+  %7 = call token @llvm.coro.save(ptr null)
+  %8 = call i8 @llvm.coro.suspend(token %7, i1 false)
+  switch i8 %8, label %coro.ret [
+  i8 0, label %await2.suspend
+  i8 1, label %cleanup2
+  ]
+
+await2.suspend:                                   ; preds = %await.suspend
+  %call27 = call ptr @_Z5Innerv() #3
+  %9 = call token @llvm.coro.save(ptr null)
+  %10 = getelementptr inbounds i8, ptr %__promise, i64 -16
+  store ptr %10, ptr %call27, align 8
+  %11 = getelementptr inbounds i8, ptr %call27, i64 -16
+  %12 = call ptr @llvm.coro.subfn.addr(ptr nonnull %11, i8 0)
+  call fastcc void %12(ptr nonnull %11) #3
+  %13 = call i8 @llvm.coro.suspend(token %9, i1 false)
+  switch i8 %13, label %coro.ret [
+  i8 0, label %final.suspend
+  i8 1, label %cleanup3
+  ]
+
+final.suspend:                                    ; preds = %await2.suspend
+  %14 = call ptr @llvm.coro.subfn.addr(ptr nonnull %11, i8 1)
+  call fastcc void %14(ptr nonnull %11) #3
+  %15 = call token @llvm.coro.save(ptr null)
+  %retval.sroa.0.0.copyload.i = load ptr, ptr %__promise, align 8
+  %16 = call ptr @llvm.coro.subfn.addr(ptr %retval.sroa.0.0.copyload.i, i8 0)
+  call fastcc void %16(ptr %retval.sroa.0.0.copyload.i) #3
+  %17 = call i8 @llvm.coro.suspend(token %15, i1 true) #12
+  switch i8 %17, label %coro.ret [
+  i8 0, label %final.ready
+  i8 1, label %cleanup62
+  ]
+
+final.ready:                                      ; preds = %final.suspend
+  call void @exit(i32 noundef 1)
+  unreachable
+
+cleanup1:
+  call void @dtor1()
+  br label %cleanup62
+
+cleanup2:
+  call void @dtor2()
+  br label %cleanup62
+
+cleanup3:
+  call void @dtor3()
+  br label %cleanup62
+
+cleanup62:                                        ; preds = %await2.suspend, %await.suspend, %init.suspend, %final.suspend
+  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %__promise) #3
+  %18 = call ptr @llvm.coro.free(token %0, ptr %4)
+  %.not = icmp eq ptr %18, null
+  br i1 %.not, label %coro.ret, label %coro.free
+
+coro.free:                                        ; preds = %cleanup62
+  call void @_ZdlPv(ptr noundef nonnull %18) #3
+  br label %coro.ret
+
+coro.ret:                                         ; preds = %coro.free, %cleanup62, %final.suspend, %await2.suspend, %await.suspend, %init.suspend
+  %19 = call i1 @llvm.coro.end(ptr null, i1 false, token none) #12
+  ret ptr %__promise
+}
+
+declare token @llvm.coro.id(i32, ptr readnone, ptr nocapture readonly, ptr) #2
+declare i1 @llvm.coro.alloc(token) #3
+declare dso_local noundef nonnull ptr @_Znwm(i64 noundef) local_unnamed_addr #4
+declare i64 @llvm.coro.size.i64() #5
+declare ptr @llvm.coro.begin(token, ptr writeonly) #3
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #6
+declare token @llvm.coro.save(ptr) #7
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #6
+declare i8 @llvm.coro.suspend(token, i1) #3
+declare ptr @_Z5Innerv() local_unnamed_addr
+declare dso_local void @_ZdlPv(ptr noundef) local_unnamed_addr #8
+declare ptr @llvm.coro.free(token, ptr nocapture readonly) #2
+declare i1 @llvm.coro.end(ptr, i1, token) #3
+declare void @exit(i32 noundef)
+declare ptr @llvm.coro.subfn.addr(ptr nocapture readonly, i8) #10
+declare void @dtor1()
+declare void @dtor2()
+declare void @dtor3()
+
+attributes #0 = { mustprogress nounwind uwtable }
+attributes #1 = { nounwind presplitcoroutine uwtable coro_only_destroy_when_done }
+attributes #2 = { argmemonly nofree nounwind readonly }
+attributes #3 = { nounwind }
+attributes #4 = { nobuiltin allocsize(0) }
+attributes #5 = { nofree nosync nounwind readnone }
+attributes #6 = { argmemonly mustprogress nocallback nofree nosync nounwind willreturn }
+attributes #7 = { nomerge nounwind }
+attributes #8 = { nobuiltin nounwind }
+attributes #9 = { noreturn }
+attributes #10 = { argmemonly nounwind readonly }
+attributes #11 = { nounwind allocsize(0) }
+attributes #12 = { noduplicate }
+
+; CHECK: define{{.*}}@foo.destroy(
+; CHECK-NEXT: entry.destroy:
+; CHECK-NEXT: call void @_ZdlPv
+; CHECK-NEXT: ret void
+
+; CHECK: define{{.*}}@foo.cleanup(
+; CHECK-NEXT: entry.cleanup:
+; CHECK-NEXT: ret void

@ChuanqiXu9 ChuanqiXu9 requested a review from rjmccall November 2, 2023 03:37
Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't dig into the LLVM section of this, but the CFE section looks good delta some small documentation changes.

whose return type is marked as the attribute are assumed to be destroyed only after then coroutines
reached to the final suspend point.

This is helpful for the optimizers to perform more optimizations.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify what benefits here, just an ominous 'better optimizations' isn't as helpful as a description of the type of optimizations (in laymans terms) we can provide. The example helps, but a better 'intro summary' would be greatly appreciated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to This is helpful for the optimizers to reduce the size of the destroy function for the coroutines.

To be honest, I didn't think this is a problem. For example, [[likey]], [[unlikely]] and [[assume]] don't explain what optimization would be performed actually. The contract between the users and the compilers is: the users should provide the correct information then the compiler at least shouldn't generate worse code.

BTW, an interesting point here is that, according to the design intention of coroutines, users shouldn't care about what is destroy function of coroutines. But actually, many C++ programmers (who don't know about compilers) understand the concept of destroy functions. This may be the culture of C++ : )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its better enough to be an 'intro' sentence at least. And yes, our other documentation is bad, but we're trying to improve it.

Those are at least better 'understood' (or at least, understood in their lack of understanding) attributes that most folks can reason about, since they are documented plenty elsewhere (blogs, standards, etc).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I am not saying other documentation is bad. I mean, from the perspective of optimizers, it will never be able (or better not to) explain what the optimizations made by [[likey]], [[unlikely]] and [[assume]]. Since the optimizers want the freedom to perform any optimization they want as long as they doesn't violate the semantics. Otherwise, it'll be the responsibility of optimizers to report what they did after they made some non-trivial changes. It is generally not wanted by optimizers. We wish the programmers and implementers can focus more on semantics.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'm just suggesting we give 'motivating' text for folks to read, not "this pass does X, this pass does Y". In this case, if I were to see "better optimizations" that doesn't help/isn't particularly actionable. But if I see, "Allows for better code size in cases where a coroutine is known to ONLY be destroyed after its final suspend point", and I have one of those, it might be 'relevant to my interests!'.

While the example was good, I found I wasn't able to determine WHAT the attribute was doing until investing a lot of time reading, we want our users/doc readers to be 'hooked' on being interest (or, alternatively, be able to tell they don't need to continue reading) as soon as possible. Does that make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got. Completely understood. Many thanks for reviewing this and sharing thoughts : )

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFE stuff looks good, but someone more familiar with the LLVM changes needs to do final approval.

@erichkeane
Copy link
Collaborator

Also note: This needs a Release Note in ReleaseNotes.rst.

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFE + Release note LGTM. Again, please wait on someone to be review the LLVM changes.

Copy link
Contributor

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is substantial inconsistency between "coro only destroy when complete" and "... when done" throughout the PR (in various forms; e.g. literal strings, snake_case, CamelCase, ALL_CAPS as well as file names).

Presumably there should only be one spelling everywhere.

@ChuanqiXu9 ChuanqiXu9 force-pushed the CoroOnlyDestroyWhenDone branch from 411bcd9 to c69205d Compare November 8, 2023 01:27
@ChuanqiXu9
Copy link
Member Author

There is substantial inconsistency between "coro only destroy when complete" and "... when done" throughout the PR (in various forms; e.g. literal strings, snake_case, CamelCase, ALL_CAPS as well as file names).

Presumably there should only be one spelling everywhere.

Done

@ChuanqiXu9
Copy link
Member Author

@bcardosolopes @apolloww @rjmccall would you like to take a look for the middle end part?

Copy link
Contributor

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably there should only be one spelling everywhere.

Done

Missed a few. ;-)

~dtor();
};

struct [[clang::coro_only_destroy_when_complete]] A {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the attribute is on the return type, would that mean it would apply to any coroutine returning the particular type? Say user is using some library implementation (i.e. folly::coro::Task), they would not be able to use the attribute because it would mark all the coroutines this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the attribute is on the return type, would that mean it would apply to any coroutine returning the particular type?

Yes.

Say user is using some library implementation (i.e. folly::coro::Task), they would not be able to use the attribute because it would mark all the coroutines this way.

Yes. My thought is that such attribute can only be applied if the programmers understand the behavior of the coroutines very well. So that the users of the attribute should be the library writers instead of end users.

And for folly::coro::Task, (I didn't check it actively), it can't use the attribute if I remember correctly since its framework supports cancelling, that said, a task can be cancelled before being complete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, while it is technically possible to give finer grained marks, I don't think it is good. Since the behaviors of coroutines are controlled by the framework, in another word, the library writers. And it is not good for the end users to predict the behaviors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explaination

Close llvm#56980.

This patch tries to introduce a light-weight optimizer for coroutines
which are guaranteed to only be destroyed afer it reached the final
suspend.

The rationale behind the patch is simple. See the example:

```C++
A foo() {
  dtor d;
  co_await something();
  dtor d1;
  co_await something();
  dtor d2;
  co_return 43;
}
```

Generally the generated .destroy function may be:

```C++
void foo.destroy(foo.Frame *frame) {
  switch(frame->suspend_index()) {
    case 1:
      frame->d.~dtor();
      break;
    case 2:
      frame->d.~dtor();
      frame->d1.~dtor();
      break;
    case 3:
      frame->d.~dtor();
      frame->d1.~dtor();
      frame->d2.~dtor();
      break;
    default: // coroutine completed or haven't started
      break;
  }

  frame->promise.~promise_type();
  delete frame;
}
```

Since the compiler need to be ready for all the cases that the coroutine
may be destroyed in a valid state.

However, from the user's perspective, we can understand that certain
coroutine types may only be destroyed after it reached to the final
suspend point. And we need a method to teach the compiler about this.
Then this is the patch. After the compiler recognized that the
coroutines can only be destroyed after complete, it can optimize the
above example to:

```C++
void foo.destroy(foo.Frame *frame) {
  frame->promise.~promise_type();
  delete frame;
}
```

I spent a lot of time experimenting and experiencing this in the
downstream. The numbers are really good. In a real-world coroutine-heavy
workload, the size of the build dir (including .o files) reduces 14%. And
the size of final libraries (excluding the .o files) reduces 8% in Debug
mode and 1% in Release mode.
@ChuanqiXu9
Copy link
Member Author

Presumably there should only be one spelling everywhere.

Done

Missed a few. ;-)

Sorry and thanks for the detailed comment!

@ChuanqiXu9 ChuanqiXu9 force-pushed the CoroOnlyDestroyWhenDone branch from c69205d to 4531d8b Compare November 8, 2023 11:42
@ChuanqiXu9
Copy link
Member Author

Thanks for reviewing this!

@ChuanqiXu9 ChuanqiXu9 merged commit b7b5907 into llvm:main Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category coroutines C++20 coroutines llvm:ir llvm:transforms
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Feature request: allow teaching llvm that coroutines can only be destroyed when done
6 participants