Skip to content

[DirectX] Clean up extra vectors when lowering to buffer store #116721

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 2, 2024

Conversation

bogner
Copy link
Contributor

@bogner bogner commented Nov 19, 2024

DXILOpLowering runs after scalarization but @llvm.dx.typedbuffer.store takes a vector, so the argument is usually an artifact. Avoid creating a vector just to extract elements from it immediately.

DXILOpLowering runs after scalarization but `@llvm.dx.typedbuffer.store`
takes a vector, so the argument is usually an artifact. Avoid creating a
vector just to extract elements from it immediately.
@llvmbot
Copy link
Member

llvmbot commented Nov 19, 2024

@llvm/pr-subscribers-backend-directx

Author: Justin Bogner (bogner)

Changes

DXILOpLowering runs after scalarization but @<!-- -->llvm.dx.typedbuffer.store takes a vector, so the argument is usually an artifact. Avoid creating a vector just to extract elements from it immediately.


Full diff: https://github.com/llvm/llvm-project/pull/116721.diff

2 Files Affected:

  • (modified) llvm/lib/Target/DirectX/DXILOpLowering.cpp (+34-11)
  • (modified) llvm/test/CodeGen/DirectX/BufferStore.ll (+24)
diff --git a/llvm/lib/Target/DirectX/DXILOpLowering.cpp b/llvm/lib/Target/DirectX/DXILOpLowering.cpp
index 9f124394363a38..00daa87a53aa13 100644
--- a/llvm/lib/Target/DirectX/DXILOpLowering.cpp
+++ b/llvm/lib/Target/DirectX/DXILOpLowering.cpp
@@ -531,23 +531,46 @@ class OpLowerer {
         return make_error<StringError>(
             "typedBufferStore data must be a vector of 4 elements",
             inconvertibleErrorCode());
-      Value *Data0 =
-          IRB.CreateExtractElement(Data, ConstantInt::get(Int32Ty, 0));
-      Value *Data1 =
-          IRB.CreateExtractElement(Data, ConstantInt::get(Int32Ty, 1));
-      Value *Data2 =
-          IRB.CreateExtractElement(Data, ConstantInt::get(Int32Ty, 2));
-      Value *Data3 =
-          IRB.CreateExtractElement(Data, ConstantInt::get(Int32Ty, 3));
-
-      std::array<Value *, 8> Args{Handle, Index0, Index1, Data0,
-                                  Data1,  Data2,  Data3,  Mask};
+
+      // Since we're post-scalarizer, we likely have a vector that's constructed
+      // solely for the argument of the store.
+      std::array<Value *, 4> DataElements{nullptr, nullptr, nullptr, nullptr};
+      auto *IEI = dyn_cast<InsertElementInst>(Data);
+      while (IEI) {
+        auto *IndexOp = dyn_cast<ConstantInt>(IEI->getOperand(2));
+        if (!IndexOp)
+          break;
+        size_t IndexVal = IndexOp->getZExtValue();
+        assert(IndexVal < 4 && "Too many elements for buffer store");
+        DataElements[IndexVal] = IEI->getOperand(1);
+        IEI = dyn_cast<InsertElementInst>(IEI->getOperand(0));
+      }
+
+      // If for some reason we weren't able to forward the arguments from the
+      // scalarizer artifact, then we need to actually extract elements from the
+      // vector.
+      for (int I = 0, E = 4; I != E; ++I)
+        if (DataElements[I] == nullptr)
+          DataElements[I] =
+              IRB.CreateExtractElement(Data, ConstantInt::get(Int32Ty, I));
+
+      std::array<Value *, 8> Args{
+          Handle,          Index0,          Index1,          DataElements[0],
+          DataElements[1], DataElements[2], DataElements[3], Mask};
       Expected<CallInst *> OpCall =
           OpBuilder.tryCreateOp(OpCode::BufferStore, Args, CI->getName());
       if (Error E = OpCall.takeError())
         return E;
 
       CI->eraseFromParent();
+      // Clean up any leftover `insertelement`s
+      IEI = dyn_cast<InsertElementInst>(Data);
+      while (IEI && IEI->use_empty()) {
+        InsertElementInst *Tmp = IEI;
+        IEI = dyn_cast<InsertElementInst>(IEI->getOperand(0));
+        Tmp->eraseFromParent();
+      }
+
       return Error::success();
     });
   }
diff --git a/llvm/test/CodeGen/DirectX/BufferStore.ll b/llvm/test/CodeGen/DirectX/BufferStore.ll
index 9ea7735be59c81..81cc5fd328e0a7 100644
--- a/llvm/test/CodeGen/DirectX/BufferStore.ll
+++ b/llvm/test/CodeGen/DirectX/BufferStore.ll
@@ -90,3 +90,27 @@ define void @storei16(<4 x i16> %data, i32 %index) {
 
   ret void
 }
+
+define void @store_scalarized_floats(float %data0, float %data1, float %data2, float %data3, i32 %index) {
+
+  ; CHECK: [[BIND:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217,
+  ; CHECK: [[HANDLE:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[BIND]]
+  %buffer = call target("dx.TypedBuffer", <4 x float>, 1, 0, 0)
+      @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0(
+          i32 0, i32 0, i32 1, i32 0, i1 false)
+
+  ; We shouldn't end up with any inserts/extracts.
+  ; CHECK-NOT: insertelement
+  ; CHECK-NOT: extractelement
+
+  ; CHECK: call void @dx.op.bufferStore.f32(i32 69, %dx.types.Handle [[HANDLE]], i32 %index, i32 undef, float %data0, float %data1, float %data2, float %data3, i8 15)
+  %vec.upto0 = insertelement <4 x float> poison, float %data0, i64 0
+  %vec.upto1 = insertelement <4 x float> %vec.upto0, float %data1, i64 1
+  %vec.upto2 = insertelement <4 x float> %vec.upto1, float %data2, i64 2
+  %vec = insertelement <4 x float> %vec.upto2, float %data3, i64 3
+  call void @llvm.dx.typedBufferStore(
+      target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %buffer,
+      i32 %index, <4 x float> %vec)
+
+  ret void
+}

Copy link
Contributor

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good To My Untrained Eye

Comment on lines 535 to 536
// Since we're post-scalarizer, we likely have a vector that's constructed
// solely for the argument of the store.
Copy link
Contributor

@damyanp damyanp Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I feel that there's a "...so we can (do whatever this loop does)" missing from this comment, but that might just be my lack of familiarity with this sort of code speaking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added "If so, just use the scalar values from before they're inserted into the temporary." to the comment.

if (!IndexOp)
break;
size_t IndexVal = IndexOp->getZExtValue();
assert(IndexVal < 4 && "Too many elements for buffer store");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we start support > 4 element vectors, are asserts like this going to bite us (ie hard to debug errors that don't show up in production builds?)

Copy link
Contributor Author

@bogner bogner Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, this is fairly likely to crash in a non-asserts build if it comes up anyways, and this is a fairly obvious place we'd need to update if we were to make such a change. Note that we'd have to expand into multiple calls to dx.op.BufferStore here if we were to support larger vectors in HLSL's typed buffers, so this isn't just some implementation detail we'd be likely to forget about.

@bogner bogner merged commit 2c88ac9 into llvm:main Dec 2, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants