-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[DirectX] Eliminate resource global variables from module #114105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
By giving these intrinsics their appropriate attributes, loads of allocas that are stored on the other side of these calls can be eliminated. Adds a test that verifies that the unneeded loads can be eliminated and also that the attributes are set properly. Fixes llvm#104271 This may be the first part of a broader audit of
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-backend-directx Author: Greg Roth (pow2clk) ChangesBy giving these intrinsics their appropriate attributes, loads of allocas that are stored on the other side of these calls can be eliminated. Adds a test that verifies that the unneeded loads can be eliminated and also that the attributes are set properly. Fixes #104271 This may be the first part of a broader audit of Full diff: https://github.com/llvm/llvm-project/pull/114105.diff 2 Files Affected:
diff --git a/llvm/include/llvm/IR/IntrinsicsDirectX.td b/llvm/include/llvm/IR/IntrinsicsDirectX.td
index e30d37f69f781e..88c17a3378cda5 100644
--- a/llvm/include/llvm/IR/IntrinsicsDirectX.td
+++ b/llvm/include/llvm/IR/IntrinsicsDirectX.td
@@ -28,12 +28,12 @@ def int_dx_handle_fromBinding
[IntrNoMem]>;
def int_dx_typedBufferLoad
- : DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty]>;
+ : DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferLoad_checkbit
: DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
- [llvm_any_ty, llvm_i32_ty]>;
+ [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferStore
- : DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty]>;
+ : DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty], [IntrWriteMem]>;
// Cast between target extension handle types and dxil-style opaque handles
def int_dx_cast_handle : Intrinsic<[llvm_any_ty], [llvm_any_ty]>;
diff --git a/llvm/test/CodeGen/DirectX/ResourceGlobalElimination.ll b/llvm/test/CodeGen/DirectX/ResourceGlobalElimination.ll
new file mode 100644
index 00000000000000..c7ba9377911008
--- /dev/null
+++ b/llvm/test/CodeGen/DirectX/ResourceGlobalElimination.ll
@@ -0,0 +1,85 @@
+; RUN: opt -S -passes='early-cse<memssa>' %s | FileCheck %s
+
+; Ensure that EarlyCSE is able to eliminate unneeded loads of resource globals across typedBufferLoad.
+
+target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
+target triple = "dxilv1.6-unknown-shadermodel6.6-compute"
+
+%"class.hlsl::RWBuffer" = type { target("dx.TypedBuffer", <4 x float>, 1, 0, 0) }
+
+@In = global %"class.hlsl::RWBuffer" zeroinitializer, align 4
+@Out = global %"class.hlsl::RWBuffer" zeroinitializer, align 4
+
+; Function Attrs: convergent noinline norecurse
+; CHECK-LABEL define void @main()
+define void @main() local_unnamed_addr #0 {
+entry:
+ %tmp = alloca target("dx.TypedBuffer", <4 x float>, 1, 0, 0), align 4
+ %In_h.i = call target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32 0, i32 0, i32 1, i32 0, i1 false)
+ store target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %In_h.i, ptr @In, align 4
+ %Out_h.i = call target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32 4, i32 1, i32 1, i32 0, i1 false)
+ store target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %Out_h.i, ptr @Out, align 4
+ ; CHECK: call i32 @llvm.dx.flattened.thread.id.in.group()
+ %0 = call i32 @llvm.dx.flattened.thread.id.in.group()
+ ; CHECK-NOT: load {{.*}} ptr @In
+ %1 = load target("dx.TypedBuffer", <4 x float>, 1, 0, 0), ptr @In, align 4
+ ; CHECK call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t
+ %2 = call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %1, i32 %0)
+ ; CHECK-NOT: load {{.*}} ptr @In
+ %3 = load target("dx.TypedBuffer", <4 x float>, 1, 0, 0), ptr @In, align 4
+ %4 = call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %3, i32 %0)
+ %add.i = fadd <4 x float> %2, %4
+ store target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %Out_h.i, ptr %tmp, align 4
+ call void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %Out_h.i, i32 %0, <4 x float> %add.i)
+ ret void
+}
+
+; Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
+declare i32 @llvm.dx.flattened.thread.id.in.group() #1
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
+; CHECK: declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) [[ROAttr:#[0-9]+]]
+declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) #2
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
+; CHECK: declare void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32, <4 x float>) [[WOAttr:#[0-9]+]]
+declare void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32, <4 x float>) #2
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32, i32, i32, i32, i1) #3
+
+; CHECK: attributes [[ROAttr]] = { {{.*}} memory(read) }
+; CHECK: attributes [[WOAttr]] = { {{.*}} memory(write) }
+
+attributes #0 = { convergent noinline norecurse "frame-pointer"="all" "hlsl.numthreads"="8,1,1" "hlsl.shader"="compute" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+attributes #1 = { mustprogress nofree nosync nounwind willreturn memory(none) }
+attributes #2 = { mustprogress nocallback nofree nosync nounwind willreturn }
+attributes #3 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+
+!llvm.module.flags = !{!0, !1}
+!dx.valver = !{!2}
+!llvm.ident = !{!3}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 7, !"frame-pointer", i32 2}
+!2 = !{i32 1, i32 8}
+!3 = !{!"clang version 20.0.0git ([email protected]:llvm/llvm-project.git 54dc966bd3d375d7c1604fac5fdac20989c1072a)"}
+!4 = !{!5}
+!5 = distinct !{!5, !6, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
+!6 = distinct !{!6, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
+!7 = !{!8, !9, i64 0}
+!8 = !{!"_ZTSN4hlsl8RWBufferIDv4_fEE", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C++ TBAA"}
+!11 = !{!12}
+!12 = distinct !{!12, !13, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
+!13 = distinct !{!13, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
+!14 = !{!15}
+!15 = distinct !{!15, !16, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
+!16 = distinct !{!16, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
+!17 = !{!18, !9, i64 0}
+!18 = !{!"_ZTSN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EE", !9, i64 0, !19, i64 4}
+!19 = !{!"int", !9, i64 0}
+!20 = !{!21}
+!21 = distinct !{!21, !22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_: %agg.result"}
+!22 = distinct !{!22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_"}
|
This is certainly a step in the right direction, but it will probably not eliminate extra |
By giving these intrinsics their appropriate attributes, loads of globals that are stored on the other side of these calls can be eliminated by the EarlyCSE pass. Stores to the same globals and the globals themselves require more direct intervention as part of the handleFromBinding lowering. Adds a test that verifies that the unneeded globals and their uses can be eliminated and also that the attributes are set properly. Fixes llvm#104271
✅ With the latest revision this PR passed the C/C++ code formatter. |
; RUN: opt -S -passes='early-cse<memssa>' %s -o %t | ||
; RUN: FileCheck --check-prefixes=CSE,CHECK %s < %t | ||
; finish compiling to verify that dxil-op-lower removes the globals entirely | ||
; RUN: llc -mtriple=dxil-pc-shadermodel6.0-compute --filetype=asm -o - %t | FileCheck --check-prefixes=LLC,CHECK %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are both shader model 6.0 and 6.6 used here? Is there some special difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the difference is whether the handle is created or created and annotated. We have separate paths for them, so even though they call the same utility function, I thought it better to test both paths.
; Also that DXILOpLowering eliminates the globals entirely. | ||
|
||
target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64" | ||
target triple = "dxilv1.6-unknown-shadermodel6.6-compute" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this triple get overridden by the -mtriple set above in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. We should only set the triple one way or the other here, so I would probably drop the truple from the llc
command line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather be able to run both options above, I've removed this too.
void removeResourceGlobals(CallInst *CI) { | ||
for (User *User : make_early_inc_range(CI->users())) { | ||
if(StoreInst *Store = dyn_cast<StoreInst>(User)) { | ||
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(Store->getOperand(1))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is special about operand 1? in Store->getOperand(1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the location of the global that gets stored to. In DXC, we would declare constants that made clear what these refer to, but I see no similar practice in LLVM.
def int_dx_typedBufferLoad | ||
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty]>; | ||
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>; | ||
def int_dx_typedBufferLoad_checkbit | ||
: DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty], | ||
[llvm_any_ty, llvm_i32_ty]>; | ||
[llvm_any_ty, llvm_i32_ty], [IntrReadMem]>; | ||
def int_dx_typedBufferStore | ||
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty]>; | ||
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty], [IntrWriteMem]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file isn't exactly clang-format clean, but it would be nice to keep this formatting for this part to what clang-format says, ie:
def int_dx_typedBufferLoad
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty],
[IntrReadMem]>;
def int_dx_typedBufferLoad_checkbit
: DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
[llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferStore
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty],
[IntrWriteMem]>;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I don't know how to apply clang-format-diff to a specific file that's not included by the existing rules
!llvm.module.flags = !{!0, !1} | ||
!dx.valver = !{!2} | ||
!llvm.ident = !{!3} | ||
|
||
!0 = !{i32 1, !"wchar_size", i32 4} | ||
!1 = !{i32 7, !"frame-pointer", i32 2} | ||
!2 = !{i32 1, i32 8} | ||
!3 = !{!"clang version 20.0.0git ([email protected]:llvm/llvm-project.git 54dc966bd3d375d7c1604fac5fdac20989c1072a)"} | ||
!4 = !{!5} | ||
!5 = distinct !{!5, !6, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"} | ||
!6 = distinct !{!6, !"_ZN4hlsl8RWBufferIDv4_fEixEi"} | ||
!7 = !{!8, !9, i64 0} | ||
!8 = !{!"_ZTSN4hlsl8RWBufferIDv4_fEE", !9, i64 0} | ||
!9 = !{!"omnipotent char", !10, i64 0} | ||
!10 = !{!"Simple C++ TBAA"} | ||
!11 = !{!12} | ||
!12 = distinct !{!12, !13, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"} | ||
!13 = distinct !{!13, !"_ZN4hlsl8RWBufferIDv4_fEixEi"} | ||
!14 = !{!15} | ||
!15 = distinct !{!15, !16, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"} | ||
!16 = distinct !{!16, !"_ZN4hlsl8RWBufferIDv4_fEixEi"} | ||
!17 = !{!18, !9, i64 0} | ||
!18 = !{!"_ZTSN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EE", !9, i64 0, !19, i64 4} | ||
!19 = !{!"int", !9, i64 0} | ||
!20 = !{!21} | ||
!21 = distinct !{!21, !22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_: %agg.result"} | ||
!22 = distinct !{!22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to remove all of this metadata. Most of it isn't even referenced currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Funny that this shows as commenting on a large number of lines, but besides the text at the top, there's no visual indication of it.
; Also that DXILOpLowering eliminates the globals entirely. | ||
|
||
target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64" | ||
target triple = "dxilv1.6-unknown-shadermodel6.6-compute" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. We should only set the triple one way or the other here, so I would probably drop the truple from the llc
command line
; Ensure that EarlyCSE is able to eliminate unneeded loads of resource globals across typedBufferLoad. | ||
; Also that DXILOpLowering eliminates the globals entirely. | ||
|
||
target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default datalayout for the triple. Best to remove it since it doesn't really affect anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
@In = global %"class.hlsl::RWBuffer" zeroinitializer, align 4 | ||
@Out = global %"class.hlsl::RWBuffer" zeroinitializer, align 4 | ||
|
||
; Function Attrs: convergent noinline norecurse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally drop these function attr comments in tests - they're mostly just noise and can easily get out of sync with the actual attributes if we update those for one reason or another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, removed.
|
||
; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn | ||
; CSE: declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) [[ROAttr:#[0-9]+]] | ||
declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) #2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're declaring the intrinsic with the wrong attributes here and then testing that it gets overwritten with the correct ones? That's kind of a weird thing to do...
I'd probably leave the declarations of the intrinsics out entirely since they'll be declared implicitly anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the only way I have at the moment to directly test the attributes applied in IntrinsicsDirectX.td. That does go better with a CodeGen test when these can be generated. I'm happy to leave that for that change and limit this testing to the indirect effects of the attribute here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the LLVM IR parser has a feature where if you use an intrinsic without declaring it, it will find the declaration implicitly. So what I'm saying is that you could omit the declare <4 x float> @llvm.dx.typedBufferLoad...
line entirely and still check the generated attributes, rather than having a declaration whose attributes are intentionally wrong. The only potential pitfall is that the order of the declarations might not be guaranteed - it probably depends on the traversal order of some pass or other over the IR.
For example, consider https://hlsl.godbolt.org/z/6bohv41s5 - the source doesn't declare any intrinsics, but they're all present in the output of opt
running no passes. Bonus - you don't even have to spell out the overload parts of the intrinsic names if you don't want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've restored the tests for intrinsic attributes as you suggested.
Value *V = Store->getOperand(1); | ||
Store->eraseFromParent(); | ||
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V)) | ||
if (GV->use_empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth leaving a TODO comment around here somewhere that says we should really validate that all of the globals do eventually get removed, since otherwise we'll generate a broken module. Actually implementing that validation can probably be left for later for now, since it would be quite difficult to do locally here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Format IntrinsicsDirectX.td Add todo comment about future work clean up test to remove unneeded elements and better name check prefixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responses
; RUN: opt -S -passes='early-cse<memssa>' %s -o %t | ||
; RUN: FileCheck --check-prefixes=CSE,CHECK %s < %t | ||
; finish compiling to verify that dxil-op-lower removes the globals entirely | ||
; RUN: llc -mtriple=dxil-pc-shadermodel6.0-compute --filetype=asm -o - %t | FileCheck --check-prefixes=LLC,CHECK %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the difference is whether the handle is created or created and annotated. We have separate paths for them, so even though they call the same utility function, I thought it better to test both paths.
; Ensure that EarlyCSE is able to eliminate unneeded loads of resource globals across typedBufferLoad. | ||
; Also that DXILOpLowering eliminates the globals entirely. | ||
|
||
target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
; Also that DXILOpLowering eliminates the globals entirely. | ||
|
||
target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64" | ||
target triple = "dxilv1.6-unknown-shadermodel6.6-compute" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather be able to run both options above, I've removed this too.
@In = global %"class.hlsl::RWBuffer" zeroinitializer, align 4 | ||
@Out = global %"class.hlsl::RWBuffer" zeroinitializer, align 4 | ||
|
||
; Function Attrs: convergent noinline norecurse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, removed.
|
||
; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn | ||
; CSE: declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) [[ROAttr:#[0-9]+]] | ||
declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) #2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the only way I have at the moment to directly test the attributes applied in IntrinsicsDirectX.td. That does go better with a CodeGen test when these can be generated. I'm happy to leave that for that change and limit this testing to the indirect effects of the attribute here.
!llvm.module.flags = !{!0, !1} | ||
!dx.valver = !{!2} | ||
!llvm.ident = !{!3} | ||
|
||
!0 = !{i32 1, !"wchar_size", i32 4} | ||
!1 = !{i32 7, !"frame-pointer", i32 2} | ||
!2 = !{i32 1, i32 8} | ||
!3 = !{!"clang version 20.0.0git ([email protected]:llvm/llvm-project.git 54dc966bd3d375d7c1604fac5fdac20989c1072a)"} | ||
!4 = !{!5} | ||
!5 = distinct !{!5, !6, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"} | ||
!6 = distinct !{!6, !"_ZN4hlsl8RWBufferIDv4_fEixEi"} | ||
!7 = !{!8, !9, i64 0} | ||
!8 = !{!"_ZTSN4hlsl8RWBufferIDv4_fEE", !9, i64 0} | ||
!9 = !{!"omnipotent char", !10, i64 0} | ||
!10 = !{!"Simple C++ TBAA"} | ||
!11 = !{!12} | ||
!12 = distinct !{!12, !13, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"} | ||
!13 = distinct !{!13, !"_ZN4hlsl8RWBufferIDv4_fEixEi"} | ||
!14 = !{!15} | ||
!15 = distinct !{!15, !16, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"} | ||
!16 = distinct !{!16, !"_ZN4hlsl8RWBufferIDv4_fEixEi"} | ||
!17 = !{!18, !9, i64 0} | ||
!18 = !{!"_ZTSN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EE", !9, i64 0, !19, i64 4} | ||
!19 = !{!"int", !9, i64 0} | ||
!20 = !{!21} | ||
!21 = distinct !{!21, !22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_: %agg.result"} | ||
!22 = distinct !{!22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Funny that this shows as commenting on a large number of lines, but besides the text at the top, there's no visual indication of it.
void removeResourceGlobals(CallInst *CI) { | ||
for (User *User : make_early_inc_range(CI->users())) { | ||
if(StoreInst *Store = dyn_cast<StoreInst>(User)) { | ||
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(Store->getOperand(1))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the location of the global that gets stored to. In DXC, we would declare constants that made clear what these refer to, but I see no similar practice in LLVM.
def int_dx_typedBufferLoad | ||
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty]>; | ||
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>; | ||
def int_dx_typedBufferLoad_checkbit | ||
: DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty], | ||
[llvm_any_ty, llvm_i32_ty]>; | ||
[llvm_any_ty, llvm_i32_ty], [IntrReadMem]>; | ||
def int_dx_typedBufferStore | ||
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty]>; | ||
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty], [IntrWriteMem]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I don't know how to apply clang-format-diff to a specific file that's not included by the existing rules
Value *V = Store->getOperand(1); | ||
Store->eraseFromParent(); | ||
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V)) | ||
if (GV->use_empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
By giving these intrinsics their appropriate attributes, loads of globals that are stored on the other side of these calls can be eliminated by the EarlyCSE pass. Stores to the same globals and the globals themselves require more direct intervention as part of the handleFromBinding lowering.
Adds a test that verifies that the unneeded globals and their uses can be eliminated and also that the attributes are set properly.
Fixes #104271