-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Coroutines] Introduce [[clang::coro_only_destroy_when_complete]] #71014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-coroutines Author: Chuanqi Xu (ChuanqiXu9) ChangesClose #56980. This patch tries to introduce a light-weight optimization attribute for coroutines which are guaranteed to only be destroyed after it reached the final suspend. The rationale behind the patch is simple. See the example: A foo() {
dtor d;
co_await something();
dtor d1;
co_await something();
dtor d2;
co_return 43;
} Generally the generated .destroy function may be: void foo.destroy(foo.Frame *frame) {
switch(frame->suspend_index()) {
case 1:
frame->d.~dtor();
break;
case 2:
frame->d.~dtor();
frame->d1.~dtor();
break;
case 3:
frame->d.~dtor();
frame->d1.~dtor();
frame->d2.~dtor();
break;
default: // coroutine completed or haven't started
break;
}
frame->promise.~promise_type();
delete frame;
} Since the compiler need to be ready for all the cases that the coroutine may be destroyed in a valid state. However, from the user's perspective, we can understand that certain coroutine types may only be destroyed after it reached to the final suspend point. And we need a method to teach the compiler about this. Then this is the patch. After the compiler recognized that the coroutines can only be destroyed after complete, it can optimize the above example to: void foo.destroy(foo.Frame *frame) {
frame->promise.~promise_type();
delete frame;
} I spent a lot of time experimenting and experiencing this in the downstream. The numbers are really good. In a real-world coroutine-heavy workload, the size of the build dir (including .o files) reduces 14%. And the size of final libraries (excluding the .o files) reduces 8% in Debug mode and 1% in Release mode. Full diff: https://github.com/llvm/llvm-project/pull/71014.diff 14 Files Affected:
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 60b549999c155e5..31434565becaec6 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1082,6 +1082,18 @@ def CFConsumed : InheritableParamAttr {
let Documentation = [RetainBehaviorDocs];
}
+
+// coro_only_destroy_when_complete indicates the coroutines whose return type
+// is marked by coro_only_destroy_when_complete can only be destroyed when the
+// coroutine completes. Then the space for the destroy functions can be saved.
+def CoroOnlyDestroyWhenComplete : InheritableAttr {
+ let Spellings = [Clang<"coro_only_destroy_when_complete">];
+ let Subjects = SubjectList<[CXXRecord]>;
+ let LangOpts = [CPlusPlus];
+ let Documentation = [CoroOnlyDestroyWhenCompleteDocs];
+ let SimpleHandler = 1;
+}
+
// OSObject-based attributes.
def OSConsumed : InheritableParamAttr {
let Spellings = [Clang<"os_consumed">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 05703df2129f612..dd1b44801a76947 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -7416,3 +7416,69 @@ that ``p->array`` must have at least ``p->count`` number of elements available:
}];
}
+
+def CoroOnlyDestroyWhenCompleteDocs : Documentation {
+ let Category = DocCatDecl;
+ let Content = [{
+The `coro_only_destroy_when_complete` attribute should be marked on a C++ class. The coroutines
+whose return type is marked as the attribute are assumed to be destroyed only after then coroutines
+reached to the final suspend point.
+
+This is helpful for the optimizers to perform more optimizations.
+
+For example,
+
+.. code-block:: c++
+
+ A foo() {
+ dtor d;
+ co_await something();
+ dtor d1;
+ co_await something();
+ dtor d2;
+ co_return 43;
+ }
+
+The compiler may generate the following pseudocode:
+
+.. code-block:: c++
+
+ void foo.destroy(foo.Frame *frame) {
+ switch(frame->suspend_index()) {
+ case 1:
+ frame->d.~dtor();
+ break;
+ case 2:
+ frame->d.~dtor();
+ frame->d1.~dtor();
+ break;
+ case 3:
+ frame->d.~dtor();
+ frame->d1.~dtor();
+ frame->d2.~dtor();
+ break;
+ default: // coroutine completed or haven't started
+ break;
+ }
+
+ frame->promise.~promise_type();
+ delete frame;
+ }
+
+The rationale of the `foo.destroy()` function is: we need to release all the
+resources initialized if we're going to be destroyed in any suspend point.
+
+But if the coroutine never got destroyed besides the final suspend point, the generated
+big switch case is a waste. In this case, users can pass the information to the optimizer
+by the `coro_only_destroy_when_complete` attribute, then the optimizier can optimize the
+above code to:
+
+.. code-block:: c++
+
+ void foo.destroy(foo.Frame *frame) {
+ frame->promise.~promise_type();
+ delete frame;
+ }
+
+ }];
+}
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 58310216ecff1f5..b6ac3fabeab7bc2 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -777,6 +777,12 @@ void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {
// LLVM require the frontend to mark the coroutine.
CurFn->setPresplitCoroutine();
+
+ {
+ CXXRecordDecl *RD = FnRetTy->getAsCXXRecordDecl();
+ if (RD && RD->hasAttr<CoroOnlyDestroyWhenCompleteAttr>())
+ CurFn->setCoroDestroyOnlyWhenDone();
+ }
}
// Emit coroutine intrinsic and patch up arguments of the token type.
diff --git a/clang/test/CodeGenCoroutines/coro-only-destroy-when-complete.cpp b/clang/test/CodeGenCoroutines/coro-only-destroy-when-complete.cpp
new file mode 100644
index 000000000000000..d7eb64f915484be
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/coro-only-destroy-when-complete.cpp
@@ -0,0 +1,59 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -std=c++20 \
+// RUN: -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -std=c++20 \
+// RUN: -O3 -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-O
+
+#include "Inputs/coroutine.h"
+
+using namespace std;
+
+struct A;
+struct A_promise_type {
+ A get_return_object();
+ suspend_always initial_suspend();
+ suspend_always final_suspend() noexcept;
+ void return_value(int);
+ void unhandled_exception();
+
+ std::coroutine_handle<> handle;
+};
+
+struct Awaitable{
+ bool await_ready();
+ int await_resume();
+ template <typename F>
+ void await_suspend(F);
+};
+Awaitable something();
+
+struct dtor {
+ dtor();
+ ~dtor();
+};
+
+struct [[clang::coro_only_destroy_when_complete]] A {
+ using promise_type = A_promise_type;
+ A();
+ A(std::coroutine_handle<>);
+ ~A();
+
+ std::coroutine_handle<promise_type> handle;
+};
+
+A foo() {
+ dtor d;
+ co_await something();
+ dtor d1;
+ co_await something();
+ dtor d2;
+ co_return 43;
+}
+
+// CHECK: define{{.*}}@_Z3foov({{.*}}) #[[ATTR_NUM:[0-9]+]]
+// CHECK: attributes #[[ATTR_NUM]] = {{.*}}coro_only_destroy_when_done
+
+// CHECK-O: define{{.*}}@_Z3foov.destroy
+// CHECK-O: {{^.*}}:
+// CHECK-O-NOT: br
+// CHECK-O: ret void
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index f48126775c8685c..969794073a5f2e8 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -56,6 +56,7 @@
// CHECK-NEXT: ConsumableAutoCast (SubjectMatchRule_record)
// CHECK-NEXT: ConsumableSetOnRead (SubjectMatchRule_record)
// CHECK-NEXT: Convergent (SubjectMatchRule_function)
+// CHECK-NEXT: CoroOnlyDestroyWhenComplete (SubjectMatchRule_record)
// CHECK-NEXT: CountedBy (SubjectMatchRule_field)
// CHECK-NEXT: DLLExport (SubjectMatchRule_function, SubjectMatchRule_variable, SubjectMatchRule_record, SubjectMatchRule_objc_interface)
// CHECK-NEXT: DLLImport (SubjectMatchRule_function, SubjectMatchRule_variable, SubjectMatchRule_record, SubjectMatchRule_objc_interface)
diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index f4a27812bc7bac8..84a5c4b6d7ec0a5 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -1775,6 +1775,17 @@ CoroCleanup
This pass runs late to lower all coroutine related intrinsics not replaced by
earlier passes.
+Attributes
+==========
+
+coro_only_destroy_when_done
+---------------------------
+
+When the coroutine are marked with coro_only_destroy_when_done, it indicates
+the coroutine must reach the final suspend point when it get destroyed.
+
+This attribute only works for switched-resume coroutines now.
+
Metadata
========
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index 2a522c517c7fa2f..57c94c6411d67c4 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -718,6 +718,7 @@ enum AttributeKindCodes {
ATTR_KIND_NOFPCLASS = 87,
ATTR_KIND_OPTIMIZE_FOR_DEBUGGING = 88,
ATTR_KIND_WRITABLE = 89,
+ ATTR_KIND_CORO_ONLY_DESTROY_WHEN_DONE = 90,
};
enum ComdatSelectionKindCodes {
diff --git a/llvm/include/llvm/IR/Attributes.td b/llvm/include/llvm/IR/Attributes.td
index 693ffbbf32e0e33..2920d672e60b393 100644
--- a/llvm/include/llvm/IR/Attributes.td
+++ b/llvm/include/llvm/IR/Attributes.td
@@ -318,6 +318,9 @@ def MustProgress : EnumAttr<"mustprogress", [FnAttr]>;
/// Function is a presplit coroutine.
def PresplitCoroutine : EnumAttr<"presplitcoroutine", [FnAttr]>;
+/// The coroutine would only be destroyed when it is done.
+def CoroDestroyOnlyWhenDone : EnumAttr<"coro_only_destroy_when_done", [FnAttr]>;
+
/// Target-independent string attributes.
def LessPreciseFPMAD : StrBoolAttr<"less-precise-fpmad">;
def NoInfsFPMath : StrBoolAttr<"no-infs-fp-math">;
diff --git a/llvm/include/llvm/IR/Function.h b/llvm/include/llvm/IR/Function.h
index 692ef7535f0e6b1..eca772e6f8c53cd 100644
--- a/llvm/include/llvm/IR/Function.h
+++ b/llvm/include/llvm/IR/Function.h
@@ -493,6 +493,13 @@ class LLVM_EXTERNAL_VISIBILITY Function : public GlobalObject,
void setPresplitCoroutine() { addFnAttr(Attribute::PresplitCoroutine); }
void setSplittedCoroutine() { removeFnAttr(Attribute::PresplitCoroutine); }
+ bool isCoroOnlyDestroyWhenDone() const {
+ return hasFnAttribute(Attribute::CoroDestroyOnlyWhenDone);
+ }
+ void setCoroDestroyOnlyWhenDone() {
+ addFnAttr(Attribute::CoroDestroyOnlyWhenDone);
+ }
+
MemoryEffects getMemoryEffects() const;
void setMemoryEffects(MemoryEffects ME);
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 747968b51407cd9..d0d2289dda0273c 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -2060,6 +2060,8 @@ static Attribute::AttrKind getAttrFromCode(uint64_t Code) {
return Attribute::PresplitCoroutine;
case bitc::ATTR_KIND_WRITABLE:
return Attribute::Writable;
+ case bitc::ATTR_KIND_CORO_ONLY_DESTROY_WHEN_DONE:
+ return Attribute::CoroDestroyOnlyWhenDone;
}
}
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 5e3341d4de13323..6a9285dc05948cb 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -826,6 +826,8 @@ static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) {
return bitc::ATTR_KIND_PRESPLIT_COROUTINE;
case Attribute::Writable:
return bitc::ATTR_KIND_WRITABLE;
+ case Attribute::CoroDestroyOnlyWhenDone:
+ return bitc::ATTR_KIND_CORO_ONLY_DESTROY_WHEN_DONE;
case Attribute::EndAttrKinds:
llvm_unreachable("Can not encode end-attribute kinds marker.");
case Attribute::None:
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index e3eb8c4d8f1dc11..43c969fa372bedc 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -529,13 +529,20 @@ void CoroCloner::handleFinalSuspend() {
BasicBlock *OldSwitchBB = Switch->getParent();
auto *NewSwitchBB = OldSwitchBB->splitBasicBlock(Switch, "Switch");
Builder.SetInsertPoint(OldSwitchBB->getTerminator());
- auto *GepIndex = Builder.CreateStructGEP(Shape.FrameTy, NewFramePtr,
- coro::Shape::SwitchFieldIndex::Resume,
- "ResumeFn.addr");
- auto *Load = Builder.CreateLoad(Shape.getSwitchResumePointerType(),
- GepIndex);
- auto *Cond = Builder.CreateIsNull(Load);
- Builder.CreateCondBr(Cond, ResumeBB, NewSwitchBB);
+
+ if (NewF->isCoroOnlyDestroyWhenDone()) {
+ // When the coroutine can only be destroyed when done, we don't need
+ // to generate code for other cases.
+ Builder.CreateBr(ResumeBB);
+ } else {
+ auto *GepIndex = Builder.CreateStructGEP(
+ Shape.FrameTy, NewFramePtr, coro::Shape::SwitchFieldIndex::Resume,
+ "ResumeFn.addr");
+ auto *Load =
+ Builder.CreateLoad(Shape.getSwitchResumePointerType(), GepIndex);
+ auto *Cond = Builder.CreateIsNull(Load);
+ Builder.CreateCondBr(Cond, ResumeBB, NewSwitchBB);
+ }
OldSwitchBB->getTerminator()->eraseFromParent();
}
}
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
index 32f1bb1db9c3547..18d4aed16fb1e61 100644
--- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp
+++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
@@ -922,6 +922,7 @@ Function *CodeExtractor::constructFunction(const ValueSet &inputs,
case Attribute::PresplitCoroutine:
case Attribute::Memory:
case Attribute::NoFPClass:
+ case Attribute::CoroDestroyOnlyWhenDone:
continue;
// Those attributes should be safe to propagate to the extracted function.
case Attribute::AlwaysInline:
diff --git a/llvm/test/Transforms/Coroutines/coro-only-destroy-when-done.ll b/llvm/test/Transforms/Coroutines/coro-only-destroy-when-done.ll
new file mode 100644
index 000000000000000..57de1b4376048a1
--- /dev/null
+++ b/llvm/test/Transforms/Coroutines/coro-only-destroy-when-done.ll
@@ -0,0 +1,137 @@
+; RUN: opt < %s -passes='cgscc(coro-split),early-cse,dce,simplifycfg' -S | FileCheck %s
+
+%"struct.std::__n4861::noop_coroutine_promise" = type { i8 }
+%struct.Promise = type { %"struct.std::__n4861::coroutine_handle" }
+%"struct.std::__n4861::coroutine_handle" = type { ptr }
+
+define ptr @foo() #1 {
+entry:
+ %__promise = alloca %struct.Promise, align 8
+ %0 = call token @llvm.coro.id(i32 16, ptr nonnull %__promise, ptr nonnull @foo, ptr null)
+ %1 = call i1 @llvm.coro.alloc(token %0)
+ br i1 %1, label %coro.alloc, label %init.suspend
+
+coro.alloc: ; preds = %entry
+ %2 = tail call i64 @llvm.coro.size.i64()
+ %call = call noalias noundef nonnull ptr @_Znwm(i64 noundef %2) #11
+ br label %init.suspend
+
+init.suspend: ; preds = %entry, %coro.alloc
+ %3 = phi ptr [ null, %entry ], [ %call, %coro.alloc ]
+ %4 = call ptr @llvm.coro.begin(token %0, ptr %3) #12
+ call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %__promise) #3
+ store ptr null, ptr %__promise, align 8
+ %5 = call token @llvm.coro.save(ptr null)
+ %6 = call i8 @llvm.coro.suspend(token %5, i1 false)
+ switch i8 %6, label %coro.ret [
+ i8 0, label %await.suspend
+ i8 1, label %cleanup1
+ ]
+
+await.suspend: ; preds = %init.suspend
+ %7 = call token @llvm.coro.save(ptr null)
+ %8 = call i8 @llvm.coro.suspend(token %7, i1 false)
+ switch i8 %8, label %coro.ret [
+ i8 0, label %await2.suspend
+ i8 1, label %cleanup2
+ ]
+
+await2.suspend: ; preds = %await.suspend
+ %call27 = call ptr @_Z5Innerv() #3
+ %9 = call token @llvm.coro.save(ptr null)
+ %10 = getelementptr inbounds i8, ptr %__promise, i64 -16
+ store ptr %10, ptr %call27, align 8
+ %11 = getelementptr inbounds i8, ptr %call27, i64 -16
+ %12 = call ptr @llvm.coro.subfn.addr(ptr nonnull %11, i8 0)
+ call fastcc void %12(ptr nonnull %11) #3
+ %13 = call i8 @llvm.coro.suspend(token %9, i1 false)
+ switch i8 %13, label %coro.ret [
+ i8 0, label %final.suspend
+ i8 1, label %cleanup3
+ ]
+
+final.suspend: ; preds = %await2.suspend
+ %14 = call ptr @llvm.coro.subfn.addr(ptr nonnull %11, i8 1)
+ call fastcc void %14(ptr nonnull %11) #3
+ %15 = call token @llvm.coro.save(ptr null)
+ %retval.sroa.0.0.copyload.i = load ptr, ptr %__promise, align 8
+ %16 = call ptr @llvm.coro.subfn.addr(ptr %retval.sroa.0.0.copyload.i, i8 0)
+ call fastcc void %16(ptr %retval.sroa.0.0.copyload.i) #3
+ %17 = call i8 @llvm.coro.suspend(token %15, i1 true) #12
+ switch i8 %17, label %coro.ret [
+ i8 0, label %final.ready
+ i8 1, label %cleanup62
+ ]
+
+final.ready: ; preds = %final.suspend
+ call void @exit(i32 noundef 1)
+ unreachable
+
+cleanup1:
+ call void @dtor1()
+ br label %cleanup62
+
+cleanup2:
+ call void @dtor2()
+ br label %cleanup62
+
+cleanup3:
+ call void @dtor3()
+ br label %cleanup62
+
+cleanup62: ; preds = %await2.suspend, %await.suspend, %init.suspend, %final.suspend
+ call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %__promise) #3
+ %18 = call ptr @llvm.coro.free(token %0, ptr %4)
+ %.not = icmp eq ptr %18, null
+ br i1 %.not, label %coro.ret, label %coro.free
+
+coro.free: ; preds = %cleanup62
+ call void @_ZdlPv(ptr noundef nonnull %18) #3
+ br label %coro.ret
+
+coro.ret: ; preds = %coro.free, %cleanup62, %final.suspend, %await2.suspend, %await.suspend, %init.suspend
+ %19 = call i1 @llvm.coro.end(ptr null, i1 false, token none) #12
+ ret ptr %__promise
+}
+
+declare token @llvm.coro.id(i32, ptr readnone, ptr nocapture readonly, ptr) #2
+declare i1 @llvm.coro.alloc(token) #3
+declare dso_local noundef nonnull ptr @_Znwm(i64 noundef) local_unnamed_addr #4
+declare i64 @llvm.coro.size.i64() #5
+declare ptr @llvm.coro.begin(token, ptr writeonly) #3
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #6
+declare token @llvm.coro.save(ptr) #7
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #6
+declare i8 @llvm.coro.suspend(token, i1) #3
+declare ptr @_Z5Innerv() local_unnamed_addr
+declare dso_local void @_ZdlPv(ptr noundef) local_unnamed_addr #8
+declare ptr @llvm.coro.free(token, ptr nocapture readonly) #2
+declare i1 @llvm.coro.end(ptr, i1, token) #3
+declare void @exit(i32 noundef)
+declare ptr @llvm.coro.subfn.addr(ptr nocapture readonly, i8) #10
+declare void @dtor1()
+declare void @dtor2()
+declare void @dtor3()
+
+attributes #0 = { mustprogress nounwind uwtable }
+attributes #1 = { nounwind presplitcoroutine uwtable coro_only_destroy_when_done }
+attributes #2 = { argmemonly nofree nounwind readonly }
+attributes #3 = { nounwind }
+attributes #4 = { nobuiltin allocsize(0) }
+attributes #5 = { nofree nosync nounwind readnone }
+attributes #6 = { argmemonly mustprogress nocallback nofree nosync nounwind willreturn }
+attributes #7 = { nomerge nounwind }
+attributes #8 = { nobuiltin nounwind }
+attributes #9 = { noreturn }
+attributes #10 = { argmemonly nounwind readonly }
+attributes #11 = { nounwind allocsize(0) }
+attributes #12 = { noduplicate }
+
+; CHECK: define{{.*}}@foo.destroy(
+; CHECK-NEXT: entry.destroy:
+; CHECK-NEXT: call void @_ZdlPv
+; CHECK-NEXT: ret void
+
+; CHECK: define{{.*}}@foo.cleanup(
+; CHECK-NEXT: entry.cleanup:
+; CHECK-NEXT: ret void
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't dig into the LLVM section of this, but the CFE section looks good delta some small documentation changes.
whose return type is marked as the attribute are assumed to be destroyed only after then coroutines | ||
reached to the final suspend point. | ||
|
||
This is helpful for the optimizers to perform more optimizations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify what benefits here, just an ominous 'better optimizations' isn't as helpful as a description of the type of optimizations (in laymans terms) we can provide. The example helps, but a better 'intro summary' would be greatly appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to This is helpful for the optimizers to reduce the size of the destroy function for the coroutines
.
To be honest, I didn't think this is a problem. For example, [[likey]], [[unlikely]] and [[assume]] don't explain what optimization would be performed actually. The contract between the users and the compilers is: the users should provide the correct information then the compiler at least shouldn't generate worse code.
BTW, an interesting point here is that, according to the design intention of coroutines, users shouldn't care about what is destroy
function of coroutines. But actually, many C++ programmers (who don't know about compilers) understand the concept of destroy
functions. This may be the culture of C++ : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its better enough to be an 'intro' sentence at least. And yes, our other documentation is bad, but we're trying to improve it.
Those are at least better 'understood' (or at least, understood in their lack of understanding) attributes that most folks can reason about, since they are documented plenty elsewhere (blogs, standards, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I am not saying other documentation is bad. I mean, from the perspective of optimizers, it will never be able (or better not to) explain what the optimizations made by [[likey]], [[unlikely]] and [[assume]]. Since the optimizers want the freedom to perform any optimization they want as long as they doesn't violate the semantics. Otherwise, it'll be the responsibility of optimizers to report what they did after they made some non-trivial changes. It is generally not wanted by optimizers. We wish the programmers and implementers can focus more on semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I'm just suggesting we give 'motivating' text for folks to read, not "this pass does X, this pass does Y". In this case, if I were to see "better optimizations" that doesn't help/isn't particularly actionable. But if I see, "Allows for better code size in cases where a coroutine is known to ONLY be destroyed after its final suspend point", and I have one of those, it might be 'relevant to my interests!'.
While the example was good, I found I wasn't able to determine WHAT the attribute was doing until investing a lot of time reading, we want our users/doc readers to be 'hooked' on being interest (or, alternatively, be able to tell they don't need to continue reading) as soon as possible. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got. Completely understood. Many thanks for reviewing this and sharing thoughts : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFE stuff looks good, but someone more familiar with the LLVM changes needs to do final approval.
Also note: This needs a Release Note in ReleaseNotes.rst. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFE + Release note LGTM. Again, please wait on someone to be review the LLVM changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is substantial inconsistency between "coro only destroy when complete" and "... when done" throughout the PR (in various forms; e.g. literal strings, snake_case, CamelCase, ALL_CAPS as well as file names).
Presumably there should only be one spelling everywhere.
411bcd9
to
c69205d
Compare
Done |
@bcardosolopes @apolloww @rjmccall would you like to take a look for the middle end part? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably there should only be one spelling everywhere.
Done
Missed a few. ;-)
clang/test/CodeGenCoroutines/coro-only-destroy-when-complete.cpp
Outdated
Show resolved
Hide resolved
~dtor(); | ||
}; | ||
|
||
struct [[clang::coro_only_destroy_when_complete]] A { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the attribute is on the return type, would that mean it would apply to any coroutine returning the particular type? Say user is using some library implementation (i.e. folly::coro::Task
), they would not be able to use the attribute because it would mark all the coroutines this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the attribute is on the return type, would that mean it would apply to any coroutine returning the particular type?
Yes.
Say user is using some library implementation (i.e. folly::coro::Task), they would not be able to use the attribute because it would mark all the coroutines this way.
Yes. My thought is that such attribute can only be applied if the programmers understand the behavior of the coroutines very well. So that the users of the attribute should be the library writers instead of end users.
And for folly::coro::Task
, (I didn't check it actively), it can't use the attribute if I remember correctly since its framework supports cancelling, that said, a task can be cancelled before being complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, while it is technically possible to give finer grained marks, I don't think it is good. Since the behaviors of coroutines are controlled by the framework, in another word, the library writers. And it is not good for the end users to predict the behaviors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explaination
Close llvm#56980. This patch tries to introduce a light-weight optimizer for coroutines which are guaranteed to only be destroyed afer it reached the final suspend. The rationale behind the patch is simple. See the example: ```C++ A foo() { dtor d; co_await something(); dtor d1; co_await something(); dtor d2; co_return 43; } ``` Generally the generated .destroy function may be: ```C++ void foo.destroy(foo.Frame *frame) { switch(frame->suspend_index()) { case 1: frame->d.~dtor(); break; case 2: frame->d.~dtor(); frame->d1.~dtor(); break; case 3: frame->d.~dtor(); frame->d1.~dtor(); frame->d2.~dtor(); break; default: // coroutine completed or haven't started break; } frame->promise.~promise_type(); delete frame; } ``` Since the compiler need to be ready for all the cases that the coroutine may be destroyed in a valid state. However, from the user's perspective, we can understand that certain coroutine types may only be destroyed after it reached to the final suspend point. And we need a method to teach the compiler about this. Then this is the patch. After the compiler recognized that the coroutines can only be destroyed after complete, it can optimize the above example to: ```C++ void foo.destroy(foo.Frame *frame) { frame->promise.~promise_type(); delete frame; } ``` I spent a lot of time experimenting and experiencing this in the downstream. The numbers are really good. In a real-world coroutine-heavy workload, the size of the build dir (including .o files) reduces 14%. And the size of final libraries (excluding the .o files) reduces 8% in Debug mode and 1% in Release mode.
Sorry and thanks for the detailed comment! |
c69205d
to
4531d8b
Compare
Thanks for reviewing this! |
Close #56980.
This patch tries to introduce a light-weight optimization attribute for coroutines which are guaranteed to only be destroyed after it reached the final suspend.
The rationale behind the patch is simple. See the example:
Generally the generated .destroy function may be:
Since the compiler need to be ready for all the cases that the coroutine may be destroyed in a valid state.
However, from the user's perspective, we can understand that certain coroutine types may only be destroyed after it reached to the final suspend point. And we need a method to teach the compiler about this. Then this is the patch. After the compiler recognized that the coroutines can only be destroyed after complete, it can optimize the above example to:
I spent a lot of time experimenting and experiencing this in the downstream. The numbers are really good. In a real-world coroutine-heavy workload, the size of the build dir (including .o files) reduces 14%. And the size of final libraries (excluding the .o files) reduces 8% in Debug mode and 1% in Release mode.