[clang][HLSL][SPIR-V] Add convergence intrinsics #80680

Merged (9 commits) on Mar 28, 2024

Keenuts
Contributor

@Keenuts Keenuts commented Feb 5, 2024

HLSL has wave operations and other kinds of functions which require the control flow to either be converged, or to respect certain constraints as to where and how to re-converge.

At the HLSL level, convergence is mostly obvious: the control flow is expected to re-converge at the end of a scope.
Once translated to IR, HLSL scopes disappear. This means we need a way to communicate convergence restrictions down to the backend.

For this, the SPIR-V backend uses convergence intrinsics. So this commit adds some code to generate convergence intrinsics when required.
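
As a rough mental model of the token chain the frontend has to build, here is a minimal, self-contained sketch. It is not Clang or LLVM code: `Token`, `Loop`, `Function`, `tokenForCall` and the other names are hypothetical stand-ins. It only illustrates the get-or-create idea, assuming one entry token per function and one loop token per loop, each loop token pointing at its parent.

```cpp
#include <cassert>
#include <memory>

// Toy model only: none of these types exist in Clang or LLVM.
enum class TokenKind { Entry, Loop };

struct Token {
  TokenKind Kind;
  const Token *Parent; // nullptr for the entry token
};

struct Loop {
  Loop *Parent = nullptr;     // enclosing loop, if any
  std::unique_ptr<Token> Tok; // created lazily, at most once
};

struct Function {
  std::unique_ptr<Token> EntryTok; // created lazily, at most once
};

// Returns the function's entry token, creating it on first use.
const Token *getOrCreateEntryToken(Function &F) {
  if (!F.EntryTok)
    F.EntryTok = std::make_unique<Token>(Token{TokenKind::Entry, nullptr});
  return F.EntryTok.get();
}

// Returns the loop's token, creating it (and any missing ancestors) on
// first use. The parent is the enclosing loop's token, or the entry token
// for an outermost loop.
const Token *getOrCreateLoopToken(Function &F, Loop &L) {
  if (L.Tok)
    return L.Tok.get();
  const Token *Parent = L.Parent ? getOrCreateLoopToken(F, *L.Parent)
                                 : getOrCreateEntryToken(F);
  L.Tok = std::make_unique<Token>(Token{TokenKind::Loop, Parent});
  return L.Tok.get();
}

// Token a convergent call should reference: the innermost loop's token
// when the call sits inside a loop, the entry token otherwise.
const Token *tokenForCall(Function &F, Loop *Innermost) {
  return Innermost ? getOrCreateLoopToken(F, *Innermost)
                   : getOrCreateEntryToken(F);
}

// Number of parent links between a token and the entry token.
int depth(const Token *T) {
  int D = 0;
  while (T->Parent) {
    ++D;
    T = T->Parent;
  }
  return D;
}
```

In this model, a call nested in two loops references the inner loop's token, whose parent chain leads back to the function's entry token; asking twice for the same loop returns the same token.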

@Keenuts force-pushed the generate-convergence-frontent branch 2 times, most recently from 55509e5 to b62ed15 on February 22, 2024 20:31
@arsenm requested a review from ssahasra on February 28, 2024 07:13
@Keenuts force-pushed the generate-convergence-frontent branch from b62ed15 to 5fbfecb on March 11, 2024 16:47
@Keenuts marked this pull request as ready for review on March 11, 2024 16:49
@llvmbot added the clang, backend:X86, clang:frontend, clang:headers, clang:codegen, HLSL, and llvm:ir labels on Mar 11, 2024
@llvmbot
Member

llvmbot commented Mar 11, 2024

@llvm/pr-subscribers-hlsl
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Nathan Gauër (Keenuts)

Changes

HLSL has wave operations and other kinds of functions which require the control flow to either be converged, or to respect certain constraints as to where and how to re-converge.

At the HLSL level, convergence is mostly obvious: the control flow is expected to re-converge at the end of a scope.
Once translated to IR, HLSL scopes disappear. This means we need a way to communicate convergence restrictions down to the backend.

For this, the SPIR-V backend uses convergence intrinsics. So this commit adds some code to generate convergence intrinsics when required.

This commit is not to be submitted as-is (lacks testing), but should serve as a basis for an upcoming RFC.


Full diff: https://github.com/llvm/llvm-project/pull/80680.diff

10 Files Affected:

  • (modified) clang/include/clang/Basic/Builtins.td (+7)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+97)
  • (modified) clang/lib/CodeGen/CGCall.cpp (+4)
  • (modified) clang/lib/CodeGen/CGLoopInfo.h (+7-1)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+19)
  • (modified) clang/lib/Headers/hlsl/hlsl_intrinsics.h (+5)
  • (added) clang/test/CodeGenHLSL/builtins/wave_get_lane_index_do_while.hlsl (+40)
  • (added) clang/test/CodeGenHLSL/builtins/wave_get_lane_index_simple.hlsl (+14)
  • (added) clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl (+21)
  • (modified) llvm/include/llvm/IR/IntrinsicInst.h (+13)
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 9c703377ca8d3e..11c857cfa3f374 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4554,6 +4554,13 @@ def HLSLWaveActiveCountBits : LangBuiltin<"HLSL_LANG"> {
   let Prototype = "unsigned int(bool)";
 }
 
+// HLSL
+def HLSLWaveGetLaneIndex : LangBuiltin<"HLSL_LANG"> {
+  let Spellings = ["__builtin_hlsl_wave_get_lane_index"];
+  let Attributes = [NoThrow, Const];
+  let Prototype = "unsigned int()";
+}
+
 def HLSLCreateHandle : LangBuiltin<"HLSL_LANG"> {
   let Spellings = ["__builtin_hlsl_create_handle"];
   let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 20c35757939152..12fc855fb92bb8 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -1130,8 +1130,96 @@ struct BitTest {
 
   static BitTest decodeBitTestBuiltin(unsigned BuiltinID);
 };
+
+// Returns the first convergence entry/loop/anchor instruction found in |BB|.
+// std::nullopt otherwise.
+std::optional<llvm::IntrinsicInst *> getConvergenceToken(llvm::BasicBlock *BB) {
+  for (auto &I : *BB) {
+    auto *II = dyn_cast<llvm::IntrinsicInst>(&I);
+    if (II && isConvergenceControlIntrinsic(II->getIntrinsicID()))
+      return II;
+  }
+  return std::nullopt;
+}
+
 } // namespace
 
+llvm::CallBase *
+CodeGenFunction::AddConvergenceControlAttr(llvm::CallBase *Input,
+                                           llvm::Value *ParentToken) {
+  llvm::Value *bundleArgs[] = {ParentToken};
+  llvm::OperandBundleDef OB("convergencectrl", bundleArgs);
+  auto Output = llvm::CallBase::addOperandBundle(
+      Input, llvm::LLVMContext::OB_convergencectrl, OB, Input);
+  Input->replaceAllUsesWith(Output);
+  Input->eraseFromParent();
+  return Output;
+}
+
+llvm::IntrinsicInst *
+CodeGenFunction::EmitConvergenceLoop(llvm::BasicBlock *BB,
+                                     llvm::Value *ParentToken) {
+  CGBuilderTy::InsertPoint IP = Builder.saveIP();
+  Builder.SetInsertPoint(&BB->front());
+  auto CB = Builder.CreateIntrinsic(
+      llvm::Intrinsic::experimental_convergence_loop, {}, {});
+  Builder.restoreIP(IP);
+
+  auto I = AddConvergenceControlAttr(CB, ParentToken);
+  // Controlled convergence is incompatible with uncontrolled convergence.
+  // Removing any old attributes.
+  I->setNotConvergent();
+
+  return cast<llvm::IntrinsicInst>(I);
+}
+
+llvm::IntrinsicInst *
+CodeGenFunction::getOrEmitConvergenceEntryToken(llvm::Function *F) {
+  auto *BB = &F->getEntryBlock();
+  auto token = getConvergenceToken(BB);
+  if (token.has_value())
+    return token.value();
+
+  // Adding a convergence token requires the function to be marked as
+  // convergent.
+  F->setConvergent();
+
+  CGBuilderTy::InsertPoint IP = Builder.saveIP();
+  Builder.SetInsertPoint(&BB->front());
+  auto I = Builder.CreateIntrinsic(
+      llvm::Intrinsic::experimental_convergence_entry, {}, {});
+  assert(isa<llvm::IntrinsicInst>(I));
+  Builder.restoreIP(IP);
+
+  return cast<llvm::IntrinsicInst>(I);
+}
+
+llvm::IntrinsicInst *
+CodeGenFunction::getOrEmitConvergenceLoopToken(const LoopInfo *LI) {
+  assert(LI != nullptr);
+
+  auto token = getConvergenceToken(LI->getHeader());
+  if (token.has_value())
+    return *token;
+
+  llvm::IntrinsicInst *PII =
+      LI->getParent()
+          ? EmitConvergenceLoop(LI->getHeader(),
+                                getOrEmitConvergenceLoopToken(LI->getParent()))
+          : getOrEmitConvergenceEntryToken(LI->getHeader()->getParent());
+
+  return EmitConvergenceLoop(LI->getHeader(), PII);
+}
+
+llvm::CallBase *
+CodeGenFunction::AddControlledConvergenceAttr(llvm::CallBase *Input) {
+  llvm::Value *ParentToken =
+      LoopStack.hasInfo()
+          ? getOrEmitConvergenceLoopToken(&LoopStack.getInfo())
+          : getOrEmitConvergenceEntryToken(Input->getFunction());
+  return AddConvergenceControlAttr(Input, ParentToken);
+}
+
 BitTest BitTest::decodeBitTestBuiltin(unsigned BuiltinID) {
   switch (BuiltinID) {
     // Main portable variants.
@@ -5698,6 +5786,15 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
         {NDRange, Kernel, Block}));
   }
 
+  case Builtin::BI__builtin_hlsl_wave_get_lane_index: {
+    auto *CI = EmitRuntimeCall(CGM.CreateRuntimeFunction(
+        llvm::FunctionType::get(IntTy, {}, false), "__hlsl_wave_get_lane_index",
+        {}, false, true));
+    if (getTarget().getTriple().isSPIRVLogical())
+      CI = dyn_cast<CallInst>(AddControlledConvergenceAttr(CI));
+    return RValue::get(CI);
+  }
+
   case Builtin::BI__builtin_store_half:
   case Builtin::BI__builtin_store_halff: {
     Value *Val = EmitScalarExpr(E->getArg(0));
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index a28d7888715d85..4b24367a8e19d9 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5686,6 +5686,10 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   if (!CI->getType()->isVoidTy())
     CI->setName("call");
 
+  if (getTarget().getTriple().isSPIRVLogical() &&
+      CI->getCalledFunction()->isConvergent())
+    CI = AddControlledConvergenceAttr(CI);
+
   // Update largest vector width from the return type.
   LargestVectorWidth =
       std::max(LargestVectorWidth, getMaxVectorWidth(CI->getType()));
diff --git a/clang/lib/CodeGen/CGLoopInfo.h b/clang/lib/CodeGen/CGLoopInfo.h
index a1c8c7e5307fd9..7c2f7443bd3c99 100644
--- a/clang/lib/CodeGen/CGLoopInfo.h
+++ b/clang/lib/CodeGen/CGLoopInfo.h
@@ -110,6 +110,10 @@ class LoopInfo {
   /// been processed.
   void finish();
 
+  /// Returns the first outer loop containing this loop if any, nullptr
+  /// otherwise.
+  const LoopInfo *getParent() const { return Parent; }
+
 private:
   /// Loop ID metadata.
   llvm::TempMDTuple TempLoopID;
@@ -291,12 +295,14 @@ class LoopInfoStack {
   /// Set no progress for the next loop pushed.
   void setMustProgress(bool P) { StagedAttrs.MustProgress = P; }
 
-private:
   /// Returns true if there is LoopInfo on the stack.
   bool hasInfo() const { return !Active.empty(); }
+
   /// Return the LoopInfo for the current loop. HasInfo should be called
   /// first to ensure LoopInfo is present.
   const LoopInfo &getInfo() const { return *Active.back(); }
+
+private:
   /// The set of attributes that will be applied to the next pushed loop.
   LoopAttributes StagedAttrs;
   /// Stack of active loops.
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 6c825a302913df..c475b80db0fc41 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -4868,6 +4868,25 @@ class CodeGenFunction : public CodeGenTypeCache {
   llvm::Value *emitBoolVecConversion(llvm::Value *SrcVec,
                                      unsigned NumElementsDst,
                                      const llvm::Twine &Name = "");
+  // Adds a convergence_ctrl attribute to |Input| and emits the required parent
+  // convergence instructions.
+  llvm::CallBase *AddControlledConvergenceAttr(llvm::CallBase *Input);
+
+private:
+  // Emits a convergence_loop instruction for the given |BB|, with |ParentToken|
+  // as its parent convergence instr.
+  llvm::IntrinsicInst *EmitConvergenceLoop(llvm::BasicBlock *BB,
+                                           llvm::Value *ParentToken);
+  // Adds a convergence_ctrl attribute with |ParentToken| as parent convergence
+  // instr to the call |Input|.
+  llvm::CallBase *AddConvergenceControlAttr(llvm::CallBase *Input,
+                                            llvm::Value *ParentToken);
+  // Find the convergence_entry instruction of |F|, or emits one if none exists.
+  // Returns the convergence instruction.
+  llvm::IntrinsicInst *getOrEmitConvergenceEntryToken(llvm::Function *F);
+  // Find the convergence_loop instruction for the loop defined by |LI|, or
+  // emits one if none exists. Returns the convergence instruction.
+  llvm::IntrinsicInst *getOrEmitConvergenceLoopToken(const LoopInfo *LI);
 
 private:
   llvm::MDNode *getRangeForLoadFromType(QualType Ty);
diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
index 45f8544392584e..108588e5e0af60 100644
--- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h
+++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
@@ -1297,5 +1297,10 @@ _HLSL_AVAILABILITY(shadermodel, 6.0)
 _HLSL_BUILTIN_ALIAS(__builtin_hlsl_wave_active_count_bits)
 uint WaveActiveCountBits(bool Val);
 
+/// \brief Returns the index of the current lane within the current wave.
+_HLSL_AVAILABILITY(shadermodel, 6.0)
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_wave_get_lane_index)
+uint WaveGetLaneIndex();
+
 } // namespace hlsl
 #endif //_HLSL_HLSL_INTRINSICS_H_
diff --git a/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_do_while.hlsl b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_do_while.hlsl
new file mode 100644
index 00000000000000..9481b0d60a2723
--- /dev/null
+++ b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_do_while.hlsl
@@ -0,0 +1,40 @@
+// RUN: %clang_cc1 -std=hlsl2021 -finclude-default-header -x hlsl -triple \
+// RUN:   spirv-pc-vulkan-library %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+// CHECK: define spir_func void @main() [[A0:#[0-9]+]] {
+void main() {
+// CHECK: entry:
+// CHECK:   %[[CT_ENTRY:[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK:   br label %[[LABEL_WHILE_COND:.+]]
+  int cond = 0;
+
+// CHECK: [[LABEL_WHILE_COND]]:
+// CHECK:   %[[CT_LOOP:[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %[[CT_ENTRY]]) ]
+// CHECK:   br label %[[LABEL_WHILE_BODY:.+]]
+  while (true) {
+
+// CHECK: [[LABEL_WHILE_BODY]]:
+// CHECK:   br i1 {{%.+}}, label %[[LABEL_IF_THEN:.+]], label %[[LABEL_IF_END:.+]]
+
+// CHECK: [[LABEL_IF_THEN]]:
+// CHECK:   call i32 @__hlsl_wave_get_lane_index() [ "convergencectrl"(token %[[CT_LOOP]]) ]
+// CHECK:   br label %[[LABEL_WHILE_END:.+]]
+    if (cond == 2) {
+      uint index = WaveGetLaneIndex();
+      break;
+    }
+
+// CHECK: [[LABEL_IF_END]]:
+// CHECK:   br label %[[LABEL_WHILE_COND]]
+    cond++;
+  }
+
+// CHECK: [[LABEL_WHILE_END]]:
+// CHECK:   ret void
+}
+
+// CHECK-DAG: declare i32 @__hlsl_wave_get_lane_index() [[A1:#[0-9]+]]
+
+// CHECK-DAG: attributes [[A0]] = {{{.*}}convergent{{.*}}}
+// CHECK-DAG: attributes [[A1]] = {{{.*}}convergent{{.*}}}
+
diff --git a/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_simple.hlsl b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_simple.hlsl
new file mode 100644
index 00000000000000..8f52d81091c180
--- /dev/null
+++ b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_simple.hlsl
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -std=hlsl2021 -finclude-default-header -x hlsl -triple \
+// RUN:   spirv-pc-vulkan-library %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+// CHECK: define spir_func noundef i32 @_Z6test_1v() [[A0:#[0-9]+]] {
+// CHECK: %[[CI:[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: call i32 @__hlsl_wave_get_lane_index() [ "convergencectrl"(token %[[CI]]) ]
+uint test_1() {
+  return WaveGetLaneIndex();
+}
+
+// CHECK: declare i32 @__hlsl_wave_get_lane_index() [[A1:#[0-9]+]]
+
+// CHECK-DAG: attributes [[A0]] = { {{.*}}convergent{{.*}} }
+// CHECK-DAG: attributes [[A1]] = { {{.*}}convergent{{.*}} }
diff --git a/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl
new file mode 100644
index 00000000000000..379c8f118f52f3
--- /dev/null
+++ b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl
@@ -0,0 +1,21 @@
+// RUN: %clang_cc1 -std=hlsl2021 -finclude-default-header -x hlsl -triple \
+// RUN:   spirv-pc-vulkan-library %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+// CHECK: define spir_func noundef i32 @_Z6test_1v() [[A0:#[0-9]+]] {
+// CHECK: %[[C1:[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: call i32 @__hlsl_wave_get_lane_index() [ "convergencectrl"(token %[[C1]]) ]
+uint test_1() {
+  return WaveGetLaneIndex();
+}
+
+// CHECK-DAG: declare i32 @__hlsl_wave_get_lane_index() [[A1:#[0-9]+]]
+
+// CHECK: define spir_func noundef i32 @_Z6test_2v() [[A0]] {
+// CHECK: %[[C2:[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: call spir_func noundef i32 @_Z6test_1v() [ "convergencectrl"(token %[[C2]]) ]
+uint test_2() {
+  return test_1();
+}
+
+// CHECK-DAG: attributes [[A0]] = {{{.*}}convergent{{.*}}}
+// CHECK-DAG: attributes [[A1]] = {{{.*}}convergent{{.*}}}
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index c07b83a81a63e1..4f22720f1c558d 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1782,6 +1782,19 @@ class ConvergenceControlInst : public IntrinsicInst {
   static bool classof(const Value *V) {
     return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
   }
+
+  // Returns the convergence intrinsic referenced by |I|'s convergencectrl
+  // attribute if any.
+  static IntrinsicInst *getParentConvergenceToken(Instruction *I) {
+    auto *CI = dyn_cast<llvm::CallInst>(I);
+    if (!CI)
+      return nullptr;
+
+    auto Bundle = CI->getOperandBundle(llvm::LLVMContext::OB_convergencectrl);
+    assert(Bundle->Inputs.size() == 1 &&
+           Bundle->Inputs[0]->getType()->isTokenTy());
+    return dyn_cast<llvm::IntrinsicInst>(Bundle->Inputs[0].get());
+  }
 };
 
 } // end namespace llvm
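
One detail in the `getParentConvergenceToken` hunk above: its comment says "if any", but the helper asserts on `Bundle->Inputs` without first checking that a bundle was found. The sketch below shows the check-before-dereference pattern on stand-in types. It is an illustration only, assuming the lookup returns a `std::optional`; `FakeToken`, `FakeBundle`, `FakeCall` and `getParentToken` are hypothetical names, not LLVM classes or functions.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Stand-in types for illustration only; they are not the LLVM classes.
struct FakeToken {
  int Id;
};

struct FakeBundle {
  std::vector<const FakeToken *> Inputs;
};

struct FakeCall {
  // Empty when the call carries no convergence bundle.
  std::optional<FakeBundle> ConvergenceBundle;

  std::optional<FakeBundle> getConvergenceBundle() const {
    return ConvergenceBundle;
  }
};

// Returns the token referenced by the call's bundle, or nullptr when there
// is no call or the call carries no such bundle.
const FakeToken *getParentToken(const FakeCall *Call) {
  if (!Call)
    return nullptr;

  std::optional<FakeBundle> Bundle = Call->getConvergenceBundle();
  if (!Bundle) // check before dereferencing the optional
    return nullptr;

  assert(Bundle->Inputs.size() == 1 && "expected exactly one token input");
  return Bundle->Inputs[0];
}
```

With this shape, a call without a bundle yields `nullptr` instead of dereferencing an empty optional, which matches the "if any" wording of the comment.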

@llvmbot
Member

llvmbot commented Mar 11, 2024

@llvm/pr-subscribers-backend-x86

Author: Nathan Gauër (Keenuts)

Changes

HLSL has wave operations and other kind of function which required the control flow to either be converged, or respect certain constraints as where and how to re-converge.

At the HLSL level, the convergence are mostly obvious: the control flow is expected to re-converge at the end of a scope.
Once translated to IR, HLSL scopes disapear. This means we need a way to communicate convergence restrictions down to the backend.

For this, the SPIR-V backend uses convergence intrinsics. So this commit adds some code to generate convergence intrinsics when required.

This commit is not to be submitted as-is (lacks testing), but should serve as a basis for an upcoming RFC.


Full diff: https://github.com/llvm/llvm-project/pull/80680.diff

10 Files Affected:

  • (modified) clang/include/clang/Basic/Builtins.td (+7)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+97)
  • (modified) clang/lib/CodeGen/CGCall.cpp (+4)
  • (modified) clang/lib/CodeGen/CGLoopInfo.h (+7-1)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+19)
  • (modified) clang/lib/Headers/hlsl/hlsl_intrinsics.h (+5)
  • (added) clang/test/CodeGenHLSL/builtins/wave_get_lane_index_do_while.hlsl (+40)
  • (added) clang/test/CodeGenHLSL/builtins/wave_get_lane_index_simple.hlsl (+14)
  • (added) clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl (+21)
  • (modified) llvm/include/llvm/IR/IntrinsicInst.h (+13)
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 9c703377ca8d3e..11c857cfa3f374 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4554,6 +4554,13 @@ def HLSLWaveActiveCountBits : LangBuiltin<"HLSL_LANG"> {
   let Prototype = "unsigned int(bool)";
 }
 
+// HLSL
+def HLSLWaveGetLaneIndex : LangBuiltin<"HLSL_LANG"> {
+  let Spellings = ["__builtin_hlsl_wave_get_lane_index"];
+  let Attributes = [NoThrow, Const];
+  let Prototype = "unsigned int()";
+}
+
 def HLSLCreateHandle : LangBuiltin<"HLSL_LANG"> {
   let Spellings = ["__builtin_hlsl_create_handle"];
   let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 20c35757939152..12fc855fb92bb8 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -1130,8 +1130,96 @@ struct BitTest {
 
   static BitTest decodeBitTestBuiltin(unsigned BuiltinID);
 };
+
+// Returns the first convergence entry/loop/anchor instruction found in |BB|.
+// std::nullopt otherwise.
+std::optional<llvm::IntrinsicInst *> getConvergenceToken(llvm::BasicBlock *BB) {
+  for (auto &I : *BB) {
+    auto *II = dyn_cast<llvm::IntrinsicInst>(&I);
+    if (II && isConvergenceControlIntrinsic(II->getIntrinsicID()))
+      return II;
+  }
+  return std::nullopt;
+}
+
 } // namespace
 
+llvm::CallBase *
+CodeGenFunction::AddConvergenceControlAttr(llvm::CallBase *Input,
+                                           llvm::Value *ParentToken) {
+  llvm::Value *bundleArgs[] = {ParentToken};
+  llvm::OperandBundleDef OB("convergencectrl", bundleArgs);
+  auto Output = llvm::CallBase::addOperandBundle(
+      Input, llvm::LLVMContext::OB_convergencectrl, OB, Input);
+  Input->replaceAllUsesWith(Output);
+  Input->eraseFromParent();
+  return Output;
+}
+
+llvm::IntrinsicInst *
+CodeGenFunction::EmitConvergenceLoop(llvm::BasicBlock *BB,
+                                     llvm::Value *ParentToken) {
+  CGBuilderTy::InsertPoint IP = Builder.saveIP();
+  Builder.SetInsertPoint(&BB->front());
+  auto CB = Builder.CreateIntrinsic(
+      llvm::Intrinsic::experimental_convergence_loop, {}, {});
+  Builder.restoreIP(IP);
+
+  auto I = AddConvergenceControlAttr(CB, ParentToken);
+  // Controlled convergence is incompatible with uncontrolled convergence.
+  // Removing any old attributes.
+  I->setNotConvergent();
+
+  return cast<llvm::IntrinsicInst>(I);
+}
+
+llvm::IntrinsicInst *
+CodeGenFunction::getOrEmitConvergenceEntryToken(llvm::Function *F) {
+  auto *BB = &F->getEntryBlock();
+  auto token = getConvergenceToken(BB);
+  if (token.has_value())
+    return token.value();
+
+  // Adding a convergence token requires the function to be marked as
+  // convergent.
+  F->setConvergent();
+
+  CGBuilderTy::InsertPoint IP = Builder.saveIP();
+  Builder.SetInsertPoint(&BB->front());
+  auto I = Builder.CreateIntrinsic(
+      llvm::Intrinsic::experimental_convergence_entry, {}, {});
+  assert(isa<llvm::IntrinsicInst>(I));
+  Builder.restoreIP(IP);
+
+  return cast<llvm::IntrinsicInst>(I);
+}
+
+llvm::IntrinsicInst *
+CodeGenFunction::getOrEmitConvergenceLoopToken(const LoopInfo *LI) {
+  assert(LI != nullptr);
+
+  auto token = getConvergenceToken(LI->getHeader());
+  if (token.has_value())
+    return *token;
+
+  llvm::IntrinsicInst *PII =
+      LI->getParent()
+          ? EmitConvergenceLoop(LI->getHeader(),
+                                getOrEmitConvergenceLoopToken(LI->getParent()))
+          : getOrEmitConvergenceEntryToken(LI->getHeader()->getParent());
+
+  return EmitConvergenceLoop(LI->getHeader(), PII);
+}
+
+llvm::CallBase *
+CodeGenFunction::AddControlledConvergenceAttr(llvm::CallBase *Input) {
+  llvm::Value *ParentToken =
+      LoopStack.hasInfo()
+          ? getOrEmitConvergenceLoopToken(&LoopStack.getInfo())
+          : getOrEmitConvergenceEntryToken(Input->getFunction());
+  return AddConvergenceControlAttr(Input, ParentToken);
+}
+
 BitTest BitTest::decodeBitTestBuiltin(unsigned BuiltinID) {
   switch (BuiltinID) {
     // Main portable variants.
@@ -5698,6 +5786,15 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
         {NDRange, Kernel, Block}));
   }
 
+  case Builtin::BI__builtin_hlsl_wave_get_lane_index: {
+    auto *CI = EmitRuntimeCall(CGM.CreateRuntimeFunction(
+        llvm::FunctionType::get(IntTy, {}, false), "__hlsl_wave_get_lane_index",
+        {}, false, true));
+    if (getTarget().getTriple().isSPIRVLogical())
+      CI = dyn_cast<CallInst>(AddControlledConvergenceAttr(CI));
+    return RValue::get(CI);
+  }
+
   case Builtin::BI__builtin_store_half:
   case Builtin::BI__builtin_store_halff: {
     Value *Val = EmitScalarExpr(E->getArg(0));
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index a28d7888715d85..4b24367a8e19d9 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5686,6 +5686,10 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   if (!CI->getType()->isVoidTy())
     CI->setName("call");
 
+  if (getTarget().getTriple().isSPIRVLogical() &&
+      CI->getCalledFunction()->isConvergent())
+    CI = AddControlledConvergenceAttr(CI);
+
   // Update largest vector width from the return type.
   LargestVectorWidth =
       std::max(LargestVectorWidth, getMaxVectorWidth(CI->getType()));
diff --git a/clang/lib/CodeGen/CGLoopInfo.h b/clang/lib/CodeGen/CGLoopInfo.h
index a1c8c7e5307fd9..7c2f7443bd3c99 100644
--- a/clang/lib/CodeGen/CGLoopInfo.h
+++ b/clang/lib/CodeGen/CGLoopInfo.h
@@ -110,6 +110,10 @@ class LoopInfo {
   /// been processed.
   void finish();
 
+  /// Returns the first outer loop containing this loop if any, nullptr
+  /// otherwise.
+  const LoopInfo *getParent() const { return Parent; }
+
 private:
   /// Loop ID metadata.
   llvm::TempMDTuple TempLoopID;
@@ -291,12 +295,14 @@ class LoopInfoStack {
   /// Set no progress for the next loop pushed.
   void setMustProgress(bool P) { StagedAttrs.MustProgress = P; }
 
-private:
   /// Returns true if there is LoopInfo on the stack.
   bool hasInfo() const { return !Active.empty(); }
+
   /// Return the LoopInfo for the current loop. HasInfo should be called
   /// first to ensure LoopInfo is present.
   const LoopInfo &getInfo() const { return *Active.back(); }
+
+private:
   /// The set of attributes that will be applied to the next pushed loop.
   LoopAttributes StagedAttrs;
   /// Stack of active loops.
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 6c825a302913df..c475b80db0fc41 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -4868,6 +4868,25 @@ class CodeGenFunction : public CodeGenTypeCache {
   llvm::Value *emitBoolVecConversion(llvm::Value *SrcVec,
                                      unsigned NumElementsDst,
                                      const llvm::Twine &Name = "");
+  // Adds a convergence_ctrl attribute to |Input| and emits the required parent
+  // convergence instructions.
+  llvm::CallBase *AddControlledConvergenceAttr(llvm::CallBase *Input);
+
+private:
+  // Emits a convergence_loop instruction for the given |BB|, with |ParentToken|
+  // as it's parent convergence instr.
+  llvm::IntrinsicInst *EmitConvergenceLoop(llvm::BasicBlock *BB,
+                                           llvm::Value *ParentToken);
+  // Adds a convergence_ctrl attribute with |ParentToken| as parent convergence
+  // instr to the call |Input|.
+  llvm::CallBase *AddConvergenceControlAttr(llvm::CallBase *Input,
+                                            llvm::Value *ParentToken);
+  // Find the convergence_entry instruction |F|, or emits ones if none exists.
+  // Returns the convergence instruction.
+  llvm::IntrinsicInst *getOrEmitConvergenceEntryToken(llvm::Function *F);
+  // Find the convergence_loop instruction for the loop defined by |LI|, or
+  // emits one if none exists. Returns the convergence instruction.
+  llvm::IntrinsicInst *getOrEmitConvergenceLoopToken(const LoopInfo *LI);
 
 private:
   llvm::MDNode *getRangeForLoadFromType(QualType Ty);
diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
index 45f8544392584e..108588e5e0af60 100644
--- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h
+++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
@@ -1297,5 +1297,10 @@ _HLSL_AVAILABILITY(shadermodel, 6.0)
 _HLSL_BUILTIN_ALIAS(__builtin_hlsl_wave_active_count_bits)
 uint WaveActiveCountBits(bool Val);
 
+/// \brief Returns the index of the current lane within the current wave.
+_HLSL_AVAILABILITY(shadermodel, 6.0)
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_wave_get_lane_index)
+uint WaveGetLaneIndex();
+
 } // namespace hlsl
 #endif //_HLSL_HLSL_INTRINSICS_H_
diff --git a/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_do_while.hlsl b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_do_while.hlsl
new file mode 100644
index 00000000000000..9481b0d60a2723
--- /dev/null
+++ b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_do_while.hlsl
@@ -0,0 +1,40 @@
+// RUN: %clang_cc1 -std=hlsl2021 -finclude-default-header -x hlsl -triple \
+// RUN:   spirv-pc-vulkan-library %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+// CHECK: define spir_func void @main() [[A0:#[0-9]+]] {
+void main() {
+// CHECK: entry:
+// CHECK:   %[[CT_ENTRY:[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK:   br label %[[LABEL_WHILE_COND:.+]]
+  int cond = 0;
+
+// CHECK: [[LABEL_WHILE_COND]]:
+// CHECK:   %[[CT_LOOP:[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %[[CT_ENTRY]]) ]
+// CHECK:   br label %[[LABEL_WHILE_BODY:.+]]
+  while (true) {
+
+// CHECK: [[LABEL_WHILE_BODY]]:
+// CHECK:   br i1 {{%.+}}, label %[[LABEL_IF_THEN:.+]], label %[[LABEL_IF_END:.+]]
+
+// CHECK: [[LABEL_IF_THEN]]:
+// CHECK:   call i32 @__hlsl_wave_get_lane_index() [ "convergencectrl"(token %[[CT_LOOP]]) ]
+// CHECK:   br label %[[LABEL_WHILE_END:.+]]
+    if (cond == 2) {
+      uint index = WaveGetLaneIndex();
+      break;
+    }
+
+// CHECK: [[LABEL_IF_END]]:
+// CHECK:   br label %[[LABEL_WHILE_COND]]
+    cond++;
+  }
+
+// CHECK: [[LABEL_WHILE_END]]:
+// CHECK:   ret void
+}
+
+// CHECK-DAG: declare i32 @__hlsl_wave_get_lane_index() [[A1:#[0-9]+]]
+
+// CHECK-DAG: attributes [[A0]] = {{{.*}}convergent{{.*}}}
+// CHECK-DAG: attributes [[A1]] = {{{.*}}convergent{{.*}}}
+
diff --git a/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_simple.hlsl b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_simple.hlsl
new file mode 100644
index 00000000000000..8f52d81091c180
--- /dev/null
+++ b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_simple.hlsl
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -std=hlsl2021 -finclude-default-header -x hlsl -triple \
+// RUN:   spirv-pc-vulkan-library %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+// CHECK: define spir_func noundef i32 @_Z6test_1v() [[A0:#[0-9]+]] {
+// CHECK: %[[CI:[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: call i32 @__hlsl_wave_get_lane_index() [ "convergencectrl"(token %[[CI]]) ]
+uint test_1() {
+  return WaveGetLaneIndex();
+}
+
+// CHECK: declare i32 @__hlsl_wave_get_lane_index() [[A1:#[0-9]+]]
+
+// CHECK-DAG: attributes [[A0]] = { {{.*}}convergent{{.*}} }
+// CHECK-DAG: attributes [[A1]] = { {{.*}}convergent{{.*}} }
diff --git a/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl
new file mode 100644
index 00000000000000..379c8f118f52f3
--- /dev/null
+++ b/clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl
@@ -0,0 +1,21 @@
+// RUN: %clang_cc1 -std=hlsl2021 -finclude-default-header -x hlsl -triple \
+// RUN:   spirv-pc-vulkan-library %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+// CHECK: define spir_func noundef i32 @_Z6test_1v() [[A0:#[0-9]+]] {
+// CHECK: %[[C1:[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: call i32 @__hlsl_wave_get_lane_index() [ "convergencectrl"(token %[[C1]]) ]
+uint test_1() {
+  return WaveGetLaneIndex();
+}
+
+// CHECK-DAG: declare i32 @__hlsl_wave_get_lane_index() [[A1:#[0-9]+]]
+
+// CHECK: define spir_func noundef i32 @_Z6test_2v() [[A0]] {
+// CHECK: %[[C2:[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: call spir_func noundef i32 @_Z6test_1v() [ "convergencectrl"(token %[[C2]]) ]
+uint test_2() {
+  return test_1();
+}
+
+// CHECK-DAG: attributes [[A0]] = {{{.*}}convergent{{.*}}}
+// CHECK-DAG: attributes [[A1]] = {{{.*}}convergent{{.*}}}
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index c07b83a81a63e1..4f22720f1c558d 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1782,6 +1782,19 @@ class ConvergenceControlInst : public IntrinsicInst {
   static bool classof(const Value *V) {
     return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
   }
+
+  // Returns the convergence intrinsic referenced by |I|'s convergencectrl
+  // attribute if any.
+  static IntrinsicInst *getParentConvergenceToken(Instruction *I) {
+    auto *CI = dyn_cast<llvm::CallInst>(I);
+    if (!CI)
+      return nullptr;
+
+    auto Bundle = CI->getOperandBundle(llvm::LLVMContext::OB_convergencectrl);
+    assert(Bundle->Inputs.size() == 1 &&
+           Bundle->Inputs[0]->getType()->isTokenTy());
+    return dyn_cast<llvm::IntrinsicInst>(Bundle->Inputs[0].get());
+  }
 };
 
 } // end namespace llvm
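
Reader's note on the helper above: as far as I can tell, `getOperandBundle` returns a `std::optional`, and the `assert` dereferences it without first checking that the call actually carries a `convergencectrl` bundle. A more defensive shape can be sketched with stand-in types. Everything below (`Token`, `Bundle`, `Call`, `getParentToken`) is hypothetical and only models the pattern; it is not LLVM's API and not what the PR merged.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT LLVM types.
struct Token {
  int Id;
};

struct Bundle {
  std::vector<const Token *> Inputs;
};

struct Call {
  // Empty when the call carries no convergence bundle at all.
  std::optional<Bundle> ConvergenceCtrl;
};

// Returns the token referenced by the call's bundle, or nullptr when the
// call has no bundle. The optional is checked before it is dereferenced.
const Token *getParentToken(const Call &C) {
  if (!C.ConvergenceCtrl)
    return nullptr;
  assert(C.ConvergenceCtrl->Inputs.size() == 1 &&
         "expected exactly one token operand");
  return C.ConvergenceCtrl->Inputs[0];
}
```

With this shape, a call without a bundle yields `nullptr` instead of reaching the assertion, which matches the "if any" wording of the original comment.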

@Keenuts
Contributor Author

Keenuts commented Mar 11, 2024

Hi, thanks for the reviews so far!
I believe we are ready to move forward on our side. Adding @llvm-beanz for the HLSL part.
The builtin I added is mostly there so we have something to generate those intrinsics for. I am fine with changing the name, or the implementation around it; it's just an easy wave intrinsic to use/test with (no input, simplest one).

github-actions bot commented Mar 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Member

@sudonatalie sudonatalie left a comment

LGTM in terms of how this affects the SPIR-V backend, but if others who've reviewed here could +1 when ready, that would be good feedback for the other parts of the codebase that this touches before merging.

@Keenuts Keenuts force-pushed the generate-convergence-frontent branch from 4951e53 to 4d407ce Compare March 25, 2024 10:30
@Keenuts
Contributor Author

Keenuts commented Mar 25, 2024

Rebased on main (almost, HEAD is slightly broken), and added back the convergence attribute.
The backend changes are ready for this intrinsic.

✅ With the latest revision this PR passed the Python code formatter.

@Keenuts
Contributor Author

Keenuts commented Mar 26, 2024

@arsenm would you be fine with those codegen changes as-is? Given that the convergent/no-convergent switch will be done later, depending on when the required IR change is merged?

@arsenm
Contributor

arsenm commented Mar 26, 2024

> @arsenm would you be fine with those codegen changes as-is? Given that the convergent/no-convergent switch will be done later, depending on when the required IR change is merged?

Yes

@Keenuts
Contributor Author

Keenuts commented Mar 26, 2024

@ssahasra it is up to you then 😊

Collaborator

@llvm-beanz llvm-beanz left a comment

LGTM too.

Collaborator

@ssahasra ssahasra left a comment

LGTM, with a few nits.

For the record, I did not have the bandwidth to comment on the translation from HLSL to SPIR-V. But given that the generated IR passes the verifier, this seems like a correct and very interesting first use of convergence control.

@Keenuts
Contributor Author

Keenuts commented Mar 28, 2024

Thanks all for the reviews! We have 3 LGTMs and an ack from Arsenm, so I'm going to rebase on main, wait for the bots & tests, and if all is green, merge this.

Keenuts added 9 commits March 28, 2024 14:11
HLSL has wave operations and other kinds of functions which require the
control flow to either be converged, or to respect certain constraints on
where and how to re-converge.

At the HLSL level, convergence is mostly obvious: the control flow
is expected to re-converge at the end of a scope.
Once translated to IR, HLSL scopes disappear. This means we need a way to
communicate convergence restrictions down to the backend.

For this, the SPIR-V backend uses convergence intrinsics. So this commit
adds some code to generate convergence intrinsics when required.

This commit is not to be submitted as-is (lacks testing), but
should serve as a basis for an upcoming RFC.

Signed-off-by: Nathan Gauër <[email protected]>
Signed-off-by: Nathan Gauër <[email protected]>
Signed-off-by: Nathan Gauër <[email protected]>
This reverts commit bc6fd04b73a195981ee77823cf1382d04ab96c44.
Signed-off-by: Nathan Gauër <[email protected]>
@Keenuts Keenuts force-pushed the generate-convergence-frontent branch from 98b4ee8 to cffb7d8 Compare March 28, 2024 13:20
@Keenuts
Contributor Author

Keenuts commented Mar 28, 2024

Local tests for SPIR-V & DXIL pass

@Keenuts Keenuts merged commit 0f61051 into llvm:main Mar 28, 2024
@Keenuts Keenuts deleted the generate-convergence-frontent branch March 28, 2024 16:18
@@ -5803,6 +5887,15 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
{NDRange, Kernel, Block}));
}

case Builtin::BI__builtin_hlsl_wave_get_lane_index: {
Member

I missed this PR. Can you move this to Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID, const CallExpr *E)?

Keenuts added a commit to Keenuts/llvm-project that referenced this pull request Apr 16, 2024
PR llvm#80680 added bits in the codegen to lazily add convergence intrinsics
when required. This logic relied on the LoopStack. The issue is
when parsing the condition, the loopstack doesn't yet reflect the
correct values, as expected since we are not yet in the loop.

However, convergence tokens should sometimes already be available.
The solution which seemed the simplest is to greedily generate the
tokens when we generate SPIR-V.

Fixes llvm#88144

Signed-off-by: Nathan Gauër <[email protected]>
Keenuts added a commit to Keenuts/llvm-project that referenced this pull request May 14, 2024
PR llvm#80680 added bits in the codegen to lazily add convergence intrinsics
when required. This logic relied on the LoopStack. The issue is
when parsing the condition, the loopstack doesn't yet reflect the
correct values, as expected since we are not yet in the loop.

However, convergence tokens should sometimes already be available.
The solution which seemed the simplest is to greedily generate the
tokens when we generate SPIR-V.

Fixes llvm#88144

Signed-off-by: Nathan Gauër <[email protected]>
Keenuts added a commit that referenced this pull request May 14, 2024
PR #80680 added bits in the codegen to lazily add convergence intrinsics
when required. This logic relied on the LoopStack. The issue is when
parsing the condition, the loopstack doesn't yet reflect the correct
values, as expected since we are not yet in the loop.

However, convergence tokens should sometimes already be available. The
solution which seemed the simplest is to greedily generate the tokens
when we generate SPIR-V.

Fixes #88144

---------

Signed-off-by: Nathan Gauër <[email protected]>
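
The lazy-versus-greedy distinction in the commit message above can be pictured with a toy model. The sketch assumes, following that message, that the loop stack is only updated after the loop condition has been emitted. Every name in it (`Strategy`, `emitWhileLoop`, the token strings) is hypothetical; none of it is Clang code, and it does not claim to reproduce the actual failure in the linked issue.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: all names are hypothetical and none of this is Clang code.
enum class Strategy { Lazy, Eager };

// Simulates emitting `while (cond()) { body(); }` where cond() and body() are
// both convergent calls. Returns the token each call is attached to
// (condition call first, body call second).
std::vector<std::string> emitWhileLoop(Strategy S) {
  std::vector<std::string> LoopStack;          // loops the emitter knows it is in
  std::vector<std::string> Tokens = {"entry"}; // tokens created so far
  std::vector<std::string> Used;

  auto tokenForCall = [&]() -> std::string {
    if (S == Strategy::Eager) {
      assert(!Tokens.empty() && "entry token is always present");
      return Tokens.back();
    }
    // Lazy: derive the token from the loop stack at the time of the call.
    if (LoopStack.empty())
      return std::string("entry");
    return "loop(" + LoopStack.back() + ")";
  };

  // 1. The loop header is created. The eager strategy makes its token now.
  if (S == Strategy::Eager)
    Tokens.push_back("loop(while)");

  // 2. The condition is emitted; the loop stack has not been pushed yet.
  Used.push_back(tokenForCall());

  // 3. The loop is pushed, then the body is emitted.
  LoopStack.push_back("while");
  Used.push_back(tokenForCall());
  LoopStack.pop_back();

  return Used;
}
```

In this model the lazy strategy attaches the call in the condition to the enclosing token and only the body call to the loop token, while the eager strategy attaches both to the loop token because the token already exists when the condition is emitted.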
Labels
backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category HLSL HLSL Language Support llvm:ir
7 participants