[NVPTX] Add support for nvvm.flo.[us] intrinsics #114489

AlexMaclean · 2024-10-31T23:24:18Z

Add support for the 'llvm.nvvm.flo.[su].*' intrinsics, which correspond to the PTX bfind instruction. See PTX ISA 9.7.1.16, Integer Arithmetic Instructions: bfind.

The 'llvm.nvvm.flo.u' family of intrinsics identifies the bit position of the leading one, returning its offset from either the most or least significant bit.

The 'llvm.nvvm.flo.s' family of intrinsics identifies the bit position of the leading non-sign bit, returning its offset from either the most or least significant bit.
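
For anyone who wants to sanity-check the semantics, here is a minimal C++ reference model of the 32-bit variants. It is a sketch written from the description above and the bfind pseudocode in the PTX ISA, not code from this patch. The names `flo_u32`, `flo_s32`, `kNotFound`, and `check_flo_model` are made up for illustration. The 64-bit variants are assumed to behave the same way, with bit 63 as the most significant bit.

```cpp
#include <cassert>
#include <cstdint>

// Value returned when no qualifying bit exists.
constexpr std::uint32_t kNotFound = 0xffffffffu;

// Hypothetical reference model of llvm.nvvm.flo.u.i32.
// Scans from bit 31 down to bit 0 for the first set bit.
inline std::uint32_t flo_u32(std::uint32_t a, bool shiftamt) {
  for (int i = 31; i >= 0; --i) {
    if ((a >> i) & 1u) {
      // i is the position counted from the least significant bit.
      // With shiftamt, report the left-shift distance that moves the
      // found bit into bit 31 instead.
      return shiftamt ? static_cast<std::uint32_t>(31 - i)
                      : static_cast<std::uint32_t>(i);
    }
  }
  return kNotFound;
}

// Hypothetical reference model of llvm.nvvm.flo.s.i32.
// For negative inputs the leading non-sign bit is the most significant 0,
// so invert the bits and reuse the unsigned scan.
inline std::uint32_t flo_s32(std::int32_t a, bool shiftamt) {
  std::uint32_t bits = static_cast<std::uint32_t>(a);
  if (a < 0)
    bits = ~bits;
  return flo_u32(bits, shiftamt);
}

// Spot checks against hand-computed values.
inline void check_flo_model() {
  assert(flo_u32(0x10u, false) == 4u);   // bit 4 is the leading one
  assert(flo_u32(0x10u, true) == 27u);   // 31 - 4
  assert(flo_u32(0u, false) == kNotFound);
  assert(flo_s32(-8, false) == 2u);      // ~0xfffffff8 == 0x7
  assert(flo_s32(-1, false) == kNotFound); // all sign bits
}
```

Calling `check_flo_model()` from a test driver exercises the cases above. Note that `flo_s32(-1, ...)` and `flo_s32(0, ...)` both report "not found", since neither input has a non-sign bit.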

llvmbot · 2024-10-31T23:24:50Z

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Add support for the 'llvm.nvvm.flo.[su].*' intrinsics, which correspond to the PTX bfind instruction. See PTX ISA 9.7.1.16, Integer Arithmetic Instructions: bfind.

The 'llvm.nvvm.flo.u' family of intrinsics identifies the bit position of the leading one, returning its offset from either the most or least significant bit.

The 'llvm.nvvm.flo.s' family of intrinsics identifies the bit position of the leading non-sign bit, returning its offset from either the most or least significant bit.

Full diff: https://github.com/llvm/llvm-project/pull/114489.diff

4 Files Affected:

(modified) llvm/docs/NVPTXUsage.rst (+53)
(modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+8)
(modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+32)
(added) llvm/test/CodeGen/NVPTX/flo.ll (+132)

diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index b7097308f6e890..f225b9e8bd268b 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -378,6 +378,59 @@ right, and the least significant bits are extracted to produce a result that is
 the same size as the original arguments. The shift amount is the minimum of the
 value of %n and the bit width of the integer type.
 
+'``llvm.nvvm.flo.u.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+    declare i32 @llvm.nvvm.flo.u.i32(i32 %a, i1 %shiftamt)
+    declare i32 @llvm.nvvm.flo.u.i64(i64 %a, i1 %shiftamt)
+
+Overview:
+"""""""""
+
+The '``llvm.nvvm.flo.u``' family of intrinsics identifies the bit position of the
+leading one, returning its offset from either the most or least significant bit.
+
+Semantics:
+""""""""""
+
+The '``llvm.nvvm.flo.u``' family of intrinsics returns the bit position of the
+most significant 1. If %shiftamt is true, the result is the shift amount needed
+to left-shift the found bit into the most-significant bit position, otherwise
+the result is the shift amount needed to right-shift the found bit into the
+least-significant bit position. 0xffffffff is returned if no 1 bit is found.
+
+'``llvm.nvvm.flo.s.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+    declare i32 @llvm.nvvm.flo.s.i32(i32 %a, i1 %shiftamt)
+    declare i32 @llvm.nvvm.flo.s.i64(i64 %a, i1 %shiftamt)
+
+Overview:
+"""""""""
+
+The '``llvm.nvvm.flo.s``' family of intrinsics identifies the bit position of the
+leading non-sign bit, returning its offset from either the most or least
+significant bit.
+
+Semantics:
+""""""""""
+
+The '``llvm.nvvm.flo.s``' family of intrinsics returns the bit position of the
+most significant 0 for negative inputs and the most significant 1 for
+non-negative inputs. If %shiftamt is true, the result is the shift amount needed
+to left-shift the found bit into the most-significant bit position, otherwise
+the result is the shift amount needed to right-shift the found bit into the
+least-significant bit position. 0xffffffff is returned if no non-sign bit is found.
 
 Other Intrinsics
 ----------------
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 5164f873d00f48..fd0cbed8b25661 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -1092,6 +1092,14 @@ let TargetPrefix = "nvvm" in {
         [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
         [IntrNoMem, IntrSpeculatable, IntrWillReturn]>;
 
+//
+// FLO - Find Leading One
+//
+  foreach sign = ["s", "u"] in
+    def int_nvvm_flo_ # sign :
+      DefaultAttrsIntrinsic<[llvm_i32_ty],
+        [llvm_anyint_ty, llvm_i1_ty],
+        [IntrNoMem, IntrSpeculatable, IntrWillReturn, ImmArg<ArgIndex<1>>]>;
 
 //
 // Convert
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index f5ac3c4e964363..1f4938d9fcf5a5 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -6,6 +6,19 @@
 //
 //===----------------------------------------------------------------------===//
 
+// Utility class to wrap up information about a register and DAG type for more
+// convenient iteration and parameterization
+class RegTyInfo<ValueType ty, NVPTXRegClass rc, Operand imm> {
+  ValueType Ty = ty;
+  NVPTXRegClass RC = rc;
+  Operand Imm = imm;
+  int Size = ty.Size;
+}
+
+def I32RT : RegTyInfo<i32, Int32Regs, i32imm>;
+def I64RT : RegTyInfo<i64, Int64Regs, i64imm>;
+
+
 def immFloat0 : PatLeaf<(fpimm), [{
     float f = (float)N->getValueAPF().convertToFloat();
     return (f==0.0f);
@@ -1299,6 +1312,25 @@ def INT_NVVM_ADD_RM_D : F_MATH_2<"add.rm.f64 \t$dst, $src0, $src1;",
 def INT_NVVM_ADD_RP_D : F_MATH_2<"add.rp.f64 \t$dst, $src0, $src1;",
   Float64Regs, Float64Regs, Float64Regs, int_nvvm_add_rp_d>;
 
+//
+// BFIND
+//
+
+foreach t = [I32RT, I64RT] in {
+  foreach sign = ["s", "u"] in {
+    defvar flo_intrin = !cast<Intrinsic>("int_nvvm_flo_" # sign);
+    def BFIND_ # sign # t.Size
+      : NVPTXInst<(outs Int32Regs:$dst), (ins t.RC:$src),
+                  "bfind." # sign # t.Size # " \t$dst, $src;",
+                  [(set (i32 Int32Regs:$dst), (flo_intrin (t.Ty t.RC:$src), 0))]>;
+
+    def BFIND_SHIFTAMT_ # sign # t.Size
+      : NVPTXInst<(outs Int32Regs:$dst), (ins t.RC:$src),
+                  "bfind.shiftamt." # sign # t.Size # " \t$dst, $src;",
+                  [(set (i32 Int32Regs:$dst), (flo_intrin (t.Ty t.RC:$src), -1))]>;
+  }
+}
+
 //
 // Convert
 //
diff --git a/llvm/test/CodeGen/NVPTX/flo.ll b/llvm/test/CodeGen/NVPTX/flo.ll
new file mode 100644
index 00000000000000..bc7f765e40ab43
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/flo.ll
@@ -0,0 +1,132 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+; RUN: %if ptxas %{ llc < %s | %ptxas-verify %}
+
+target triple = "nvptx64-nvidia-cuda"
+
+define i32 @flo_1(i32 %a) {
+; CHECK-LABEL: flo_1(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [flo_1_param_0];
+; CHECK-NEXT:    bfind.s32 %r2, %r1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.s.i32(i32 %a, i1 false)
+  ret i32 %r
+}
+
+
+define i32 @flo_2(i32 %a) {
+; CHECK-LABEL: flo_2(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [flo_2_param_0];
+; CHECK-NEXT:    bfind.shiftamt.s32 %r2, %r1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.s.i32(i32 %a, i1 true)
+  ret i32 %r
+}
+
+define i32 @flo_3(i32 %a) {
+; CHECK-LABEL: flo_3(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [flo_3_param_0];
+; CHECK-NEXT:    bfind.u32 %r2, %r1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.u.i32(i32 %a, i1 false)
+  ret i32 %r
+}
+
+
+define i32 @flo_4(i32 %a) {
+; CHECK-LABEL: flo_4(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [flo_4_param_0];
+; CHECK-NEXT:    bfind.shiftamt.u32 %r2, %r1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.u.i32(i32 %a, i1 true)
+  ret i32 %r
+}
+
+
+
+define i32 @flo_5(i64 %a) {
+; CHECK-LABEL: flo_5(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u64 %rd1, [flo_5_param_0];
+; CHECK-NEXT:    bfind.s64 %r1, %rd1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.s.i64(i64 %a, i1 false)
+  ret i32 %r
+}
+
+
+define i32 @flo_6(i64 %a) {
+; CHECK-LABEL: flo_6(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u64 %rd1, [flo_6_param_0];
+; CHECK-NEXT:    bfind.shiftamt.s64 %r1, %rd1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.s.i64(i64 %a, i1 true)
+  ret i32 %r
+}
+
+define i32 @flo_7(i64 %a) {
+; CHECK-LABEL: flo_7(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u64 %rd1, [flo_7_param_0];
+; CHECK-NEXT:    bfind.u64 %r1, %rd1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.u.i64(i64 %a, i1 false)
+  ret i32 %r
+}
+
+
+define i32 @flo_8(i64 %a) {
+; CHECK-LABEL: flo_8(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u64 %rd1, [flo_8_param_0];
+; CHECK-NEXT:    bfind.shiftamt.u64 %r1, %rd1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.u.i64(i64 %a, i1 true)
+  ret i32 %r
+}
+
+declare i32 @llvm.nvvm.flo.s.i32(i32, i1)
+declare i32 @llvm.nvvm.flo.u.i32(i32, i1)
+declare i32 @llvm.nvvm.flo.s.i64(i64, i1)
+declare i32 @llvm.nvvm.flo.u.i64(i64, i1)

Artem-B · 2024-11-01T17:22:46Z

These intrinsics look like a solution in search of a problem. The instruction has been present in PTX ~ forever, but I've only learned about it today and I can't think of a single case where I would wish for something like llvm.nvvm.flo, nor has any LLVM/NVPTX user ever asked me for them.

So, my first question is -- do we really need them? If so, why?

If we do need them, is there any benefit in using the instructions? In cases where there's no direct h/w support for the functionality, it may be better to expand to LLVM IR and let LLVM optimize that. To me it looks like a combination of logical ops and llvm.ctlz might just do the job.

AlexMaclean · 2024-11-01T20:24:30Z

> These intrinsics look like a solution in search of a problem. The instruction has been present in PTX ~ forever, but I've only learned about it today and I can't think of a single case where I would wish for something like llvm.nvvm.flo, nor has any LLVM/NVPTX user ever asked me for them.
>
> So, my first question is -- do we really need them? If so, why?
>
> If we do need them, is there any benefit in using the instructions? In cases where there's no direct h/w support for the functionality, it may be better to expand to LLVM IR and let LLVM optimize that. To me it looks like a combination of logical ops and llvm.ctlz might just do the job.

Here is a small example of what the PTX for this instruction will be lowered to according to godbolt: https://godbolt.org/z/4crzY67oo

While this operation could be simulated with other existing generic instructions, it would be quite complex, especially for the signed case. The expansion could then be transformed by optimizations in various ways, making it difficult for NVPTX ISel or ptxas to fold back to this instruction.
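
To make the shape of such an expansion concrete, here is one possible version for the 32-bit case, modelled in C++. This is only a sketch and is not taken from the patch: `ctlz32` is a hypothetical stand-in for `llvm.ctlz.i32` with a defined result of 32 for a zero input, and `flo_u32_via_ctlz`, `flo_s32_via_ctlz`, and `check_ctlz_expansion` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for llvm.ctlz.i32 with a defined result for zero:
// counts leading zero bits and returns 32 when x == 0.
inline std::uint32_t ctlz32(std::uint32_t x) {
  std::uint32_t n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// Unsigned case: a ctlz, a select for the "no bit found" result, and a
// subtract when the position is wanted relative to the LSB.
inline std::uint32_t flo_u32_via_ctlz(std::uint32_t a, bool shiftamt) {
  const std::uint32_t lz = ctlz32(a);
  if (lz == 32u)
    return 0xffffffffu;
  return shiftamt ? lz : 31u - lz;
}

// Signed case: additionally needs a sign test and a conditional invert
// (in IR terms, an xor with the result of an arithmetic shift right by 31)
// before the unsigned sequence can be reused.
inline std::uint32_t flo_s32_via_ctlz(std::int32_t a, bool shiftamt) {
  const std::uint32_t bits = static_cast<std::uint32_t>(a);
  const std::uint32_t sign_mask = (a < 0) ? 0xffffffffu : 0u;
  return flo_u32_via_ctlz(bits ^ sign_mask, shiftamt);
}

// Spot checks against hand-computed values.
inline void check_ctlz_expansion() {
  assert(ctlz32(0u) == 32u);
  assert(flo_u32_via_ctlz(0x10u, false) == 4u);  // 31 - 27
  assert(flo_u32_via_ctlz(0x10u, true) == 27u);
  assert(flo_s32_via_ctlz(-8, false) == 2u);     // ~0xfffffff8 == 0x7
  assert(flo_s32_via_ctlz(-1, true) == 0xffffffffu);
}
```

Even in this small form the signed path is a chain of a shift, an xor, a ctlz, a compare, a select, and a subtract, and each of those is something later optimization passes are free to rewrite.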

While I agree it is not the most commonly used operation, there are cases where this intrinsic is right for the job and where using it can improve performance. A quick search of GitHub did yield some cases where people have fallen back to inline asm: https://github.com/search?type=code&auto_enroll=true&q=%28%22bfind.. There were also internal use cases which motivated the introduction of this intrinsic, though I'm not sure I can elaborate.

Artem-B · 2024-11-01T21:03:08Z

> Here is a small example of what the PTX for this instruction will be lowered to according to godbolt: https://godbolt.org/z/4crzY67oo

Fascinating. OK, given that there's h/w support for exactly that instruction, I see no problem providing an intrinsic for it.

llvm-ci · 2024-11-02T01:28:40Z

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/4883

Here is the relevant piece of the build log for reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86836 of 86837 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.
TIMEOUT: Clangd Unit Tests :: ./ClangdTests/25/158 (2959 of 86836)
******************** TEST 'Clangd Unit Tests :: ./ClangdTests/25/158' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-1112544-25-158.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=158 GTEST_SHARD_INDEX=25 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests
--

Note: This is test shard 26 of 158.
[==========] Running 8 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 1 test from BackgroundQueueTest
[ RUN      ] BackgroundQueueTest.Boost
[       OK ] BackgroundQueueTest.Boost (0 ms)
[----------] 1 test from BackgroundQueueTest (0 ms total)

[----------] 1 test from CompletionTest
[ RUN      ] CompletionTest.NamespaceDoubleInsertion
Built preamble of size 215784 for file /clangd-test/foo.cpp version null in 3.06 seconds
Ignored diagnostic. /clangd-test/foo.cpp:5:5:expected unqualified-id
Sema said no scope specifier, but we saw ns:: in the source code
Code complete: fuzzyFind({
  "AnyScope": false,
  "Limit": null,
  "PreferredTypes": [],
  "ProximityPaths": [
    "/clangd-test/foo.cpp"
  ],
  "Query": "ABC",
  "RestrictForCodeCompletion": true,
  "Scopes": [
    "foo::ns::",
    "ns::"
  ]
})
Code complete: sema context TopLevel, query scopes [foo::ns::,ns::] (AnyScope=false), expected type <none>
Code complete: 0 results from Sema, 1 from Index, 0 matched, 0 from identifiers, 1 returned.
[       OK ] CompletionTest.NamespaceDoubleInsertion (4337 ms)
[----------] 1 test from CompletionTest (4337 ms total)

[----------] 1 test from DexTest
Step 10 (stage2/asan_ubsan check) failure: stage2/asan_ubsan check (failure)

Add support for the '`llvm.nvvm.flo.[su].*`' intrinsics, which correspond to the PTX `bfind` instruction. See [PTX ISA 9.7.1.16. Integer Arithmetic Instructions: bfind](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind). The '`llvm.nvvm.flo.u`' family of intrinsics identifies the bit position of the leading one, returning its offset from either the most or least significant bit. The '`llvm.nvvm.flo.s`' family of intrinsics identifies the bit position of the leading non-sign bit, returning its offset from either the most or least significant bit.

[NVPTX] Add support for nvvm.flo.[us] intrinsics

6c866ec

AlexMaclean requested review from Artem-B and justinfargnoli October 31, 2024 23:24

AlexMaclean self-assigned this Oct 31, 2024

llvmbot added backend:NVPTX llvm:ir labels Oct 31, 2024

justinfargnoli approved these changes Nov 1, 2024

Artem-B approved these changes Nov 1, 2024

AlexMaclean merged commit 8ff60c4 into llvm:main Nov 1, 2024
10 of 12 checks passed
