Skip to content

[DirectX] Eliminate resource global variables from module #114105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 12, 2024

Conversation

pow2clk
Copy link
Contributor

@pow2clk pow2clk commented Oct 29, 2024

By giving these intrinsics their appropriate attributes, loads of globals that are stored on the other side of these calls can be eliminated by the EarlyCSE pass. Stores to the same globals and the globals themselves require more direct intervention as part of the handleFromBinding lowering.

Adds a test that verifies that the unneeded globals and their uses can be eliminated and also that the attributes are set properly.

Fixes #104271

By giving these intrinsics their appropriate attributes, loads of allocas that are stored on the other side of these calls can be eliminated.

Adds a test that verifies that the unneeded loads can be eliminated and also that the attributes are set properly.

Fixes llvm#104271

This may be the first part of a broader audit of
@llvmbot
Copy link
Member

llvmbot commented Oct 29, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-directx

Author: Greg Roth (pow2clk)

Changes

By giving these intrinsics their appropriate attributes, loads of allocas that are stored on the other side of these calls can be eliminated.

Adds a test that verifies that the unneeded loads can be eliminated and also that the attributes are set properly.

Fixes #104271

This may be the first part of a broader audit of


Full diff: https://github.com/llvm/llvm-project/pull/114105.diff

2 Files Affected:

  • (modified) llvm/include/llvm/IR/IntrinsicsDirectX.td (+3-3)
  • (added) llvm/test/CodeGen/DirectX/ResourceGlobalElimination.ll (+85)
diff --git a/llvm/include/llvm/IR/IntrinsicsDirectX.td b/llvm/include/llvm/IR/IntrinsicsDirectX.td
index e30d37f69f781e..88c17a3378cda5 100644
--- a/llvm/include/llvm/IR/IntrinsicsDirectX.td
+++ b/llvm/include/llvm/IR/IntrinsicsDirectX.td
@@ -28,12 +28,12 @@ def int_dx_handle_fromBinding
           [IntrNoMem]>;
 
 def int_dx_typedBufferLoad
-    : DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty]>;
+    : DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
 def int_dx_typedBufferLoad_checkbit
     : DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
-                            [llvm_any_ty, llvm_i32_ty]>;
+                            [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
 def int_dx_typedBufferStore
-    : DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty]>;
+    : DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty], [IntrWriteMem]>;
 
 // Cast between target extension handle types and dxil-style opaque handles
 def int_dx_cast_handle : Intrinsic<[llvm_any_ty], [llvm_any_ty]>;
diff --git a/llvm/test/CodeGen/DirectX/ResourceGlobalElimination.ll b/llvm/test/CodeGen/DirectX/ResourceGlobalElimination.ll
new file mode 100644
index 00000000000000..c7ba9377911008
--- /dev/null
+++ b/llvm/test/CodeGen/DirectX/ResourceGlobalElimination.ll
@@ -0,0 +1,85 @@
+; RUN: opt -S -passes='early-cse<memssa>' %s | FileCheck %s
+
+; Ensure that EarlyCSE is able to eliminate unneeded loads of resource globals across typedBufferLoad.
+
+target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
+target triple = "dxilv1.6-unknown-shadermodel6.6-compute"
+
+%"class.hlsl::RWBuffer" = type { target("dx.TypedBuffer", <4 x float>, 1, 0, 0) }
+
+@In = global %"class.hlsl::RWBuffer" zeroinitializer, align 4
+@Out = global %"class.hlsl::RWBuffer" zeroinitializer, align 4
+
+; Function Attrs: convergent noinline norecurse
+; CHECK-LABEL define void @main()
+define void @main() local_unnamed_addr #0 {
+entry:
+  %tmp = alloca target("dx.TypedBuffer", <4 x float>, 1, 0, 0), align 4
+  %In_h.i = call target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32 0, i32 0, i32 1, i32 0, i1 false)
+  store target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %In_h.i, ptr @In, align 4
+  %Out_h.i = call target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32 4, i32 1, i32 1, i32 0, i1 false)
+  store target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %Out_h.i, ptr @Out, align 4
+  ; CHECK: call i32 @llvm.dx.flattened.thread.id.in.group()
+  %0 = call i32 @llvm.dx.flattened.thread.id.in.group()
+  ; CHECK-NOT: load {{.*}} ptr @In
+  %1 = load target("dx.TypedBuffer", <4 x float>, 1, 0, 0), ptr @In, align 4
+  ; CHECK call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t
+  %2 = call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %1, i32 %0)
+  ; CHECK-NOT: load {{.*}} ptr @In
+  %3 = load target("dx.TypedBuffer", <4 x float>, 1, 0, 0), ptr @In, align 4
+  %4 = call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %3, i32 %0)
+  %add.i = fadd <4 x float> %2, %4
+  store target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %Out_h.i, ptr %tmp, align 4
+  call void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %Out_h.i, i32 %0, <4 x float> %add.i)
+  ret void
+}
+
+; Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
+declare i32 @llvm.dx.flattened.thread.id.in.group() #1
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
+; CHECK: declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) [[ROAttr:#[0-9]+]]
+declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) #2
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
+; CHECK: declare void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32, <4 x float>) [[WOAttr:#[0-9]+]]
+declare void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32, <4 x float>) #2
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32, i32, i32, i32, i1) #3
+
+; CHECK: attributes [[ROAttr]] = { {{.*}} memory(read) }
+; CHECK: attributes [[WOAttr]] = { {{.*}} memory(write) }
+
+attributes #0 = { convergent noinline norecurse "frame-pointer"="all" "hlsl.numthreads"="8,1,1" "hlsl.shader"="compute" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+attributes #1 = { mustprogress nofree nosync nounwind willreturn memory(none) }
+attributes #2 = { mustprogress nocallback nofree nosync nounwind willreturn }
+attributes #3 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+
+!llvm.module.flags = !{!0, !1}
+!dx.valver = !{!2}
+!llvm.ident = !{!3}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 7, !"frame-pointer", i32 2}
+!2 = !{i32 1, i32 8}
+!3 = !{!"clang version 20.0.0git ([email protected]:llvm/llvm-project.git 54dc966bd3d375d7c1604fac5fdac20989c1072a)"}
+!4 = !{!5}
+!5 = distinct !{!5, !6, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
+!6 = distinct !{!6, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
+!7 = !{!8, !9, i64 0}
+!8 = !{!"_ZTSN4hlsl8RWBufferIDv4_fEE", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C++ TBAA"}
+!11 = !{!12}
+!12 = distinct !{!12, !13, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
+!13 = distinct !{!13, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
+!14 = !{!15}
+!15 = distinct !{!15, !16, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
+!16 = distinct !{!16, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
+!17 = !{!18, !9, i64 0}
+!18 = !{!"_ZTSN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EE", !9, i64 0, !19, i64 4}
+!19 = !{!"int", !9, i64 0}
+!20 = !{!21}
+!21 = distinct !{!21, !22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_: %agg.result"}
+!22 = distinct !{!22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_"}

@hekota
Copy link
Member

hekota commented Oct 29, 2024

This is certainly a step in the right direction, but it will probably not eliminate extra loada on a handle after a typedBufferStore is used on the same buffer, right? Or is that part of the "broader audit" you mentioned above?

By giving these intrinsics their appropriate attributes, loads of globals that are stored on the other side of these calls can be eliminated by the EarlyCSE pass.  Stores to the same globals and the globals themselves require more direct intervention as part of the handleFromBinding lowering.

Adds a test that verifies that the unneeded globals and their uses can be eliminated and also that the attributes are set properly.

Fixes llvm#104271
@pow2clk pow2clk changed the title [DirectX] Mark buffer load/store as mem read/write [DirectX] Eliminate resource global variables from module Oct 30, 2024
Copy link

github-actions bot commented Oct 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

; RUN: opt -S -passes='early-cse<memssa>' %s -o %t
; RUN: FileCheck --check-prefixes=CSE,CHECK %s < %t
; finish compiling to verify that dxil-op-lower removes the globals entirely
; RUN: llc -mtriple=dxil-pc-shadermodel6.0-compute --filetype=asm -o - %t | FileCheck --check-prefixes=LLC,CHECK %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are both shader model 6.0 and 6.6 used here? Is there some special difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the difference is whether the handle is created or created and annotated. We have separate paths for them, so even though they call the same utility function, I thought it better to test both paths.

; Also that DXILOpLowering eliminates the globals entirely.

target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
target triple = "dxilv1.6-unknown-shadermodel6.6-compute"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this triple get overridden by the -mtriple set above in the tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. We should only set the triple one way or the other here, so I would probably drop the truple from the llc command line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather be able to run both options above, I've removed this too.

void removeResourceGlobals(CallInst *CI) {
for (User *User : make_early_inc_range(CI->users())) {
if(StoreInst *Store = dyn_cast<StoreInst>(User)) {
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(Store->getOperand(1))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is special about operand 1? in Store->getOperand(1)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the location of the global that gets stored to. In DXC, we would declare constants that made clear what these refer to, but I see no similar practice in LLVM.

Comment on lines 30 to 36
def int_dx_typedBufferLoad
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty]>;
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferLoad_checkbit
: DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
[llvm_any_ty, llvm_i32_ty]>;
[llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferStore
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty]>;
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty], [IntrWriteMem]>;
Copy link
Contributor

@bogner bogner Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file isn't exactly clang-format clean, but it would be nice to keep this formatting for this part to what clang-format says, ie:

def int_dx_typedBufferLoad
    : DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty],
                            [IntrReadMem]>;
def int_dx_typedBufferLoad_checkbit
    : DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
                            [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferStore
    : DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty],
                            [IntrWriteMem]>;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I don't know how to apply clang-format-diff to a specific file that's not included by the existing rules

Comment on lines 68 to 94
!llvm.module.flags = !{!0, !1}
!dx.valver = !{!2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"frame-pointer", i32 2}
!2 = !{i32 1, i32 8}
!3 = !{!"clang version 20.0.0git ([email protected]:llvm/llvm-project.git 54dc966bd3d375d7c1604fac5fdac20989c1072a)"}
!4 = !{!5}
!5 = distinct !{!5, !6, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!6 = distinct !{!6, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!7 = !{!8, !9, i64 0}
!8 = !{!"_ZTSN4hlsl8RWBufferIDv4_fEE", !9, i64 0}
!9 = !{!"omnipotent char", !10, i64 0}
!10 = !{!"Simple C++ TBAA"}
!11 = !{!12}
!12 = distinct !{!12, !13, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!13 = distinct !{!13, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!14 = !{!15}
!15 = distinct !{!15, !16, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!16 = distinct !{!16, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!17 = !{!18, !9, i64 0}
!18 = !{!"_ZTSN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EE", !9, i64 0, !19, i64 4}
!19 = !{!"int", !9, i64 0}
!20 = !{!21}
!21 = distinct !{!21, !22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_: %agg.result"}
!22 = distinct !{!22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to remove all of this metadata. Most of it isn't even referenced currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Funny that this shows as commenting on a large number of lines, but besides the text at the top, there's no visual indication of it.

; Also that DXILOpLowering eliminates the globals entirely.

target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
target triple = "dxilv1.6-unknown-shadermodel6.6-compute"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. We should only set the triple one way or the other here, so I would probably drop the truple from the llc command line

; Ensure that EarlyCSE is able to eliminate unneeded loads of resource globals across typedBufferLoad.
; Also that DXILOpLowering eliminates the globals entirely.

target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default datalayout for the triple. Best to remove it since it doesn't really affect anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@In = global %"class.hlsl::RWBuffer" zeroinitializer, align 4
@Out = global %"class.hlsl::RWBuffer" zeroinitializer, align 4

; Function Attrs: convergent noinline norecurse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally drop these function attr comments in tests - they're mostly just noise and can easily get out of sync with the actual attributes if we update those for one reason or another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, removed.


; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
; CSE: declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) [[ROAttr:#[0-9]+]]
declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) #2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're declaring the intrinsic with the wrong attributes here and then testing that it gets overwritten with the correct ones? That's kind of a weird thing to do...

I'd probably leave the declarations of the intrinsics out entirely since they'll be declared implicitly anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the only way I have at the moment to directly test the attributes applied in IntrinsicsDirectX.td. That does go better with a CodeGen test when these can be generated. I'm happy to leave that for that change and limit this testing to the indirect effects of the attribute here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the LLVM IR parser has a feature where if you use an intrinsic without declaring it, it will find the declaration implicitly. So what I'm saying is that you could omit the declare <4 x float> @llvm.dx.typedBufferLoad... line entirely and still check the generated attributes, rather than having a declaration whose attributes are intentionally wrong. The only potential pitfall is that the order of the declarations might not be guaranteed - it probably depends on the traversal order of some pass or other over the IR.

For example, consider https://hlsl.godbolt.org/z/6bohv41s5 - the source doesn't declare any intrinsics, but they're all present in the output of opt running no passes. Bonus - you don't even have to spell out the overload parts of the intrinsic names if you don't want to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restored the tests for intrinsic attributes as you suggested.

Value *V = Store->getOperand(1);
Store->eraseFromParent();
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V))
if (GV->use_empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth leaving a TODO comment around here somewhere that says we should really validate that all of the globals do eventually get removed, since otherwise we'll generate a broken module. Actually implementing that validation can probably be left for later for now, since it would be quite difficult to do locally here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Format IntrinsicsDirectX.td

Add todo comment about future work

clean up test to remove unneeded elements and better name check prefixes
Copy link
Contributor Author

@pow2clk pow2clk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responses

; RUN: opt -S -passes='early-cse<memssa>' %s -o %t
; RUN: FileCheck --check-prefixes=CSE,CHECK %s < %t
; finish compiling to verify that dxil-op-lower removes the globals entirely
; RUN: llc -mtriple=dxil-pc-shadermodel6.0-compute --filetype=asm -o - %t | FileCheck --check-prefixes=LLC,CHECK %s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the difference is whether the handle is created or created and annotated. We have separate paths for them, so even though they call the same utility function, I thought it better to test both paths.

; Ensure that EarlyCSE is able to eliminate unneeded loads of resource globals across typedBufferLoad.
; Also that DXILOpLowering eliminates the globals entirely.

target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

; Also that DXILOpLowering eliminates the globals entirely.

target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
target triple = "dxilv1.6-unknown-shadermodel6.6-compute"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather be able to run both options above, I've removed this too.

@In = global %"class.hlsl::RWBuffer" zeroinitializer, align 4
@Out = global %"class.hlsl::RWBuffer" zeroinitializer, align 4

; Function Attrs: convergent noinline norecurse
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, removed.


; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
; CSE: declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) [[ROAttr:#[0-9]+]]
declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) #2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the only way I have at the moment to directly test the attributes applied in IntrinsicsDirectX.td. That does go better with a CodeGen test when these can be generated. I'm happy to leave that for that change and limit this testing to the indirect effects of the attribute here.

Comment on lines 68 to 94
!llvm.module.flags = !{!0, !1}
!dx.valver = !{!2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"frame-pointer", i32 2}
!2 = !{i32 1, i32 8}
!3 = !{!"clang version 20.0.0git ([email protected]:llvm/llvm-project.git 54dc966bd3d375d7c1604fac5fdac20989c1072a)"}
!4 = !{!5}
!5 = distinct !{!5, !6, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!6 = distinct !{!6, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!7 = !{!8, !9, i64 0}
!8 = !{!"_ZTSN4hlsl8RWBufferIDv4_fEE", !9, i64 0}
!9 = !{!"omnipotent char", !10, i64 0}
!10 = !{!"Simple C++ TBAA"}
!11 = !{!12}
!12 = distinct !{!12, !13, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!13 = distinct !{!13, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!14 = !{!15}
!15 = distinct !{!15, !16, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!16 = distinct !{!16, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!17 = !{!18, !9, i64 0}
!18 = !{!"_ZTSN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EE", !9, i64 0, !19, i64 4}
!19 = !{!"int", !9, i64 0}
!20 = !{!21}
!21 = distinct !{!21, !22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_: %agg.result"}
!22 = distinct !{!22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_"}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Funny that this shows as commenting on a large number of lines, but besides the text at the top, there's no visual indication of it.

void removeResourceGlobals(CallInst *CI) {
for (User *User : make_early_inc_range(CI->users())) {
if(StoreInst *Store = dyn_cast<StoreInst>(User)) {
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(Store->getOperand(1))) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the location of the global that gets stored to. In DXC, we would declare constants that made clear what these refer to, but I see no similar practice in LLVM.

Comment on lines 30 to 36
def int_dx_typedBufferLoad
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty]>;
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferLoad_checkbit
: DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
[llvm_any_ty, llvm_i32_ty]>;
[llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferStore
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty]>;
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty], [IntrWriteMem]>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I don't know how to apply clang-format-diff to a specific file that's not included by the existing rules

Value *V = Store->getOperand(1);
Store->eraseFromParent();
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V))
if (GV->use_empty()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pow2clk pow2clk merged commit 47ef3a0 into llvm:main Nov 12, 2024
8 checks passed
@pow2clk pow2clk deleted the elim_res_loads_pr branch November 12, 2024 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[HLSL] Eliminate resource global variables from module
5 participants