[NVPTX] Add support for nvvm.flo.[us] intrinsics #114489

Merged
merged 1 commit into from
Nov 1, 2024

AlexMaclean
Member

Add support for 'llvm.nvvm.flo.[su].*' intrinsics which correspond to a PTX bfind instruction. See PTX ISA 9.7.1.16. Integer Arithmetic Instructions: bfind

The 'llvm.nvvm.flo.u' family of intrinsics identifies the bit position of the leading one, returning its offset from either the most or least significant bit.

The 'llvm.nvvm.flo.s' family of intrinsics identifies the bit position of the leading non-sign bit, returning its offset from either the most or least significant bit.
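
For readers unfamiliar with bfind, the behaviour described above can be modelled on the host roughly as follows. This is only a sketch for the 32-bit variants, based on the semantics given in the PTX ISA; the function names `flo_u32` and `flo_s32` are hypothetical and are not part of LLVM or CUDA.

```cpp
#include <cstdint>

// Hypothetical host-side model of bfind.u32 / bfind.shiftamt.u32.
// Returns the index of the most significant 1 bit, or, when shiftamt is
// true, the left shift needed to move that bit into bit 31.
// Returns 0xffffffff if no 1 bit is found.
constexpr std::uint32_t flo_u32(std::uint32_t a, bool shiftamt) {
  for (int i = 31; i >= 0; --i) {
    if (a & (std::uint32_t{1} << i))
      return shiftamt ? static_cast<std::uint32_t>(31 - i)
                      : static_cast<std::uint32_t>(i);
  }
  return 0xffffffffu;
}

// Hypothetical host-side model of bfind.s32 / bfind.shiftamt.s32.
// Negative inputs are inverted first, so the search finds the most
// significant bit that differs from the sign bit.
constexpr std::uint32_t flo_s32(std::int32_t a, bool shiftamt) {
  std::uint32_t bits = static_cast<std::uint32_t>(a);
  if (a < 0)
    bits = ~bits;
  return flo_u32(bits, shiftamt);
}

static_assert(flo_u32(0x10u, false) == 4u, "leading one is bit 4");
static_assert(flo_u32(0x10u, true) == 27u, "31 - 4 = 27");
static_assert(flo_u32(0u, false) == 0xffffffffu, "no 1 bit");
static_assert(flo_s32(-1, false) == 0xffffffffu, "no non-sign bit");
static_assert(flo_s32(-2, false) == 0u, "~(-2) == 1, so bit 0");
static_assert(flo_s32(1, true) == 31u, "31 - 0 = 31");
```

The 64-bit variants would follow the same pattern with the most significant bit at index 63; the result is still a 32-bit value.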

@llvmbot
Member

llvmbot commented Oct 31, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/114489.diff

4 Files Affected:

  • (modified) llvm/docs/NVPTXUsage.rst (+53)
  • (modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+8)
  • (modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+32)
  • (added) llvm/test/CodeGen/NVPTX/flo.ll (+132)
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index b7097308f6e890..f225b9e8bd268b 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -378,6 +378,59 @@ right, and the least significant bits are extracted to produce a result that is
 the same size as the original arguments. The shift amount is the minimum of the
 value of %n and the bit width of the integer type.
 
+'``llvm.nvvm.flo.u.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+    declare i32 @llvm.nvvm.flo.u.i32(i32 %a, i1 %shiftamt)
+    declare i32 @llvm.nvvm.flo.u.i64(i64 %a, i1 %shiftamt)
+
+Overview:
+"""""""""
+
+The '``llvm.nvvm.flo.u``' family of intrinsics identifies the bit position of the
+leading one, returning its offset from either the most or least significant bit.
+
+Semantics:
+""""""""""
+
+The '``llvm.nvvm.flo.u``' family of intrinsics returns the bit position of the
+most significant 1. If %shiftamt is true, the result is the shift amount needed
+to left-shift the found bit into the most-significant bit position; otherwise,
+the result is the shift amount needed to right-shift the found bit into the
+least-significant bit position. 0xffffffff is returned if no 1 bit is found.
+
+'``llvm.nvvm.flo.s.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+    declare i32 @llvm.nvvm.flo.s.i32(i32 %a, i1 %shiftamt)
+    declare i32 @llvm.nvvm.flo.s.i64(i64 %a, i1 %shiftamt)
+
+Overview:
+"""""""""
+
+The '``llvm.nvvm.flo.s``' family of intrinsics identifies the bit position of the
+leading non-sign bit, returning its offset from either the most or least
+significant bit.
+
+Semantics:
+""""""""""
+
+The '``llvm.nvvm.flo.s``' family of intrinsics returns the bit position of the
+most significant 0 for negative inputs and the most significant 1 for
+non-negative inputs. If %shiftamt is true, the result is the shift amount needed
+to left-shift the found bit into the most-significant bit position; otherwise,
+the result is the shift amount needed to right-shift the found bit into the
+least-significant bit position. 0xffffffff is returned if no such bit is found.
 
 Other Intrinsics
 ----------------
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 5164f873d00f48..fd0cbed8b25661 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -1092,6 +1092,14 @@ let TargetPrefix = "nvvm" in {
         [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
         [IntrNoMem, IntrSpeculatable, IntrWillReturn]>;
 
+//
+// FLO - Find Leading One
+//
+  foreach sign = ["s", "u"] in
+    def int_nvvm_flo_ # sign :
+      DefaultAttrsIntrinsic<[llvm_i32_ty],
+        [llvm_anyint_ty, llvm_i1_ty],
+        [IntrNoMem, IntrSpeculatable, IntrWillReturn, ImmArg<ArgIndex<1>>]>;
 
 //
 // Convert
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index f5ac3c4e964363..1f4938d9fcf5a5 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -6,6 +6,19 @@
 //
 //===----------------------------------------------------------------------===//
 
+// Utility class to wrap up information about a register and DAG type for more
+// convenient iteration and parameterization
+class RegTyInfo<ValueType ty, NVPTXRegClass rc, Operand imm> {
+  ValueType Ty = ty;
+  NVPTXRegClass RC = rc;
+  Operand Imm = imm;
+  int Size = ty.Size;
+}
+
+def I32RT : RegTyInfo<i32, Int32Regs, i32imm>;
+def I64RT : RegTyInfo<i64, Int64Regs, i64imm>;
+
+
 def immFloat0 : PatLeaf<(fpimm), [{
     float f = (float)N->getValueAPF().convertToFloat();
     return (f==0.0f);
@@ -1299,6 +1312,25 @@ def INT_NVVM_ADD_RM_D : F_MATH_2<"add.rm.f64 \t$dst, $src0, $src1;",
 def INT_NVVM_ADD_RP_D : F_MATH_2<"add.rp.f64 \t$dst, $src0, $src1;",
   Float64Regs, Float64Regs, Float64Regs, int_nvvm_add_rp_d>;
 
+//
+// BFIND
+//
+
+foreach t = [I32RT, I64RT] in {
+  foreach sign = ["s", "u"] in {
+    defvar flo_intrin = !cast<Intrinsic>("int_nvvm_flo_" # sign);
+    def BFIND_ # sign # t.Size
+      : NVPTXInst<(outs Int32Regs:$dst), (ins t.RC:$src),
+                  "bfind." # sign # t.Size # " \t$dst, $src;",
+                  [(set (i32 Int32Regs:$dst), (flo_intrin (t.Ty t.RC:$src), 0))]>;
+
+    def BFIND_SHIFTAMT_ # sign # t.Size
+      : NVPTXInst<(outs Int32Regs:$dst), (ins t.RC:$src),
+                  "bfind.shiftamt." # sign # t.Size # " \t$dst, $src;",
+                  [(set (i32 Int32Regs:$dst), (flo_intrin (t.Ty t.RC:$src), -1))]>;
+  }
+}
+
 //
 // Convert
 //
diff --git a/llvm/test/CodeGen/NVPTX/flo.ll b/llvm/test/CodeGen/NVPTX/flo.ll
new file mode 100644
index 00000000000000..bc7f765e40ab43
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/flo.ll
@@ -0,0 +1,132 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+; RUN: %if ptxas %{ llc < %s | %ptxas-verify %}
+
+target triple = "nvptx64-nvidia-cuda"
+
+define i32 @flo_1(i32 %a) {
+; CHECK-LABEL: flo_1(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [flo_1_param_0];
+; CHECK-NEXT:    bfind.s32 %r2, %r1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.s.i32(i32 %a, i1 false)
+  ret i32 %r
+}
+
+
+define i32 @flo_2(i32 %a) {
+; CHECK-LABEL: flo_2(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [flo_2_param_0];
+; CHECK-NEXT:    bfind.shiftamt.s32 %r2, %r1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.s.i32(i32 %a, i1 true)
+  ret i32 %r
+}
+
+define i32 @flo_3(i32 %a) {
+; CHECK-LABEL: flo_3(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [flo_3_param_0];
+; CHECK-NEXT:    bfind.u32 %r2, %r1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.u.i32(i32 %a, i1 false)
+  ret i32 %r
+}
+
+
+define i32 @flo_4(i32 %a) {
+; CHECK-LABEL: flo_4(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u32 %r1, [flo_4_param_0];
+; CHECK-NEXT:    bfind.shiftamt.u32 %r2, %r1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.u.i32(i32 %a, i1 true)
+  ret i32 %r
+}
+
+
+
+define i32 @flo_5(i64 %a) {
+; CHECK-LABEL: flo_5(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u64 %rd1, [flo_5_param_0];
+; CHECK-NEXT:    bfind.s64 %r1, %rd1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.s.i64(i64 %a, i1 false)
+  ret i32 %r
+}
+
+
+define i32 @flo_6(i64 %a) {
+; CHECK-LABEL: flo_6(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u64 %rd1, [flo_6_param_0];
+; CHECK-NEXT:    bfind.shiftamt.s64 %r1, %rd1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.s.i64(i64 %a, i1 true)
+  ret i32 %r
+}
+
+define i32 @flo_7(i64 %a) {
+; CHECK-LABEL: flo_7(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u64 %rd1, [flo_7_param_0];
+; CHECK-NEXT:    bfind.u64 %r1, %rd1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.u.i64(i64 %a, i1 false)
+  ret i32 %r
+}
+
+
+define i32 @flo_8(i64 %a) {
+; CHECK-LABEL: flo_8(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.u64 %rd1, [flo_8_param_0];
+; CHECK-NEXT:    bfind.shiftamt.u64 %r1, %rd1;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT:    ret;
+  %r = call i32 @llvm.nvvm.flo.u.i64(i64 %a, i1 true)
+  ret i32 %r
+}
+
+declare i32 @llvm.nvvm.flo.s.i32(i32, i1)
+declare i32 @llvm.nvvm.flo.u.i32(i32, i1)
+declare i32 @llvm.nvvm.flo.s.i64(i64, i1)
+declare i32 @llvm.nvvm.flo.u.i64(i64, i1)

@Artem-B
Member

Artem-B commented Nov 1, 2024

These intrinsics look like a solution in search of a problem. The instruction has been present in PTX ~ forever, but I've only learned about it today and I can't think of a single case where I would wish for something like llvm.nvvm.flo, nor did any LLVM/NVPTX users ever ask me for them.

So, my first question is -- do we really need them? If so, why?

If we do need them, is there any benefit in using the instructions? In cases where there's no direct h/w support for the functionality, it may be better to expand to LLVM IR and let LLVM optimize that. To me it looks like a combination of logical ops and llvm.ctlz might just do the job.

@AlexMaclean
Member Author

These intrinsics look like a solution in search of a problem. The instruction has been present in PTX ~ forever, but I've only learned about it today and I can't think of a single case where I would wish for something like llvm.nvvm.flo, nor did any LLVM/NVPTX users ever ask me for them.

So, my first question is -- do we really need them? If so, why?

If we do need them, is there any benefit in using the instructions? In cases where there's no direct h/w support for the functionality, it may be better to expand to LLVM IR and let LLVM optimize that. To me it looks like a combination of logical ops and llvm.ctlz might just do the job.

Here is a small example of what the PTX for this instruction will be lowered to according to godbolt: https://godbolt.org/z/4crzY67oo

While this operation could be simulated with other existing generic instructions, it would be quite complex, especially for the signed case. The expansion could then be transformed by optimizations in various ways, making it difficult for NVPTX ISel or ptxas to fold back to this instruction.
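
To make the comparison concrete, a generic expansion built from logical ops and a count-leading-zeros primitive might look roughly like the sketch below. It is an illustration only: `ctlz32` is a portable stand-in for `llvm.ctlz.i32` with `is_zero_poison` set to false, the other function names are hypothetical, and nothing here is claimed to be what LLVM or ptxas would actually emit.

```cpp
#include <cstdint>

// Portable stand-in for llvm.ctlz.i32 with is_zero_poison = false:
// counts leading zero bits and returns 32 for a zero input.
constexpr std::uint32_t ctlz32(std::uint32_t x) {
  std::uint32_t n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// Unsigned case: the bit index is 31 - ctlz(a). For a == 0 the
// subtraction wraps to 0xffffffff, which matches the "not found" value.
// The shiftamt form is ctlz(a) itself, but needs a select for a == 0.
constexpr std::uint32_t flo_u32_via_ctlz(std::uint32_t a, bool shiftamt) {
  const std::uint32_t lz = ctlz32(a);
  if (shiftamt)
    return a == 0 ? 0xffffffffu : lz;
  return 31u - lz;
}

// Signed case: invert negative inputs first so the leading non-sign bit
// becomes the leading 1, then reuse the unsigned expansion.
constexpr std::uint32_t flo_s32_via_ctlz(std::int32_t a, bool shiftamt) {
  const std::uint32_t bits = static_cast<std::uint32_t>(a);
  const std::uint32_t folded = a < 0 ? ~bits : bits;
  return flo_u32_via_ctlz(folded, shiftamt);
}

static_assert(flo_u32_via_ctlz(0x10u, false) == 4u, "bit index");
static_assert(flo_u32_via_ctlz(0x10u, true) == 27u, "shift amount");
static_assert(flo_u32_via_ctlz(0u, false) == 0xffffffffu, "wraps to -1");
static_assert(flo_s32_via_ctlz(-1, true) == 0xffffffffu, "no non-sign bit");
```

Even in this small form, the signed variant needs an extra conditional inversion and the shiftamt variant needs a select on zero, which is the kind of multi-instruction pattern that would have to survive optimization intact to be folded back into a single bfind.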

While I agree it is not the most commonly used operation, there are cases where this intrinsic is right for the job and where using it can improve performance. A quick search of GitHub did yield some cases where people have fallen back to inline asm: https://github.com/search?type=code&auto_enroll=true&q=%28%22bfind.. There were also internal use cases which motivate the introduction of this intrinsic, though I'm not sure I can elaborate.

@Artem-B
Member

Artem-B commented Nov 1, 2024

Here is a small example of what the PTX for this instruction will be lowered to according to godbolt: https://godbolt.org/z/4crzY67oo

Fascinating. OK, given that there's h/w support for exactly that instruction, I see no problem providing an intrinsic for it.

@AlexMaclean AlexMaclean merged commit 8ff60c4 into llvm:main Nov 1, 2024
10 of 12 checks passed
@llvm-ci
Collaborator

llvm-ci commented Nov 2, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/4883

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86836 of 86837 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.
TIMEOUT: Clangd Unit Tests :: ./ClangdTests/25/158 (2959 of 86836)
******************** TEST 'Clangd Unit Tests :: ./ClangdTests/25/158' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-1112544-25-158.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=158 GTEST_SHARD_INDEX=25 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests
--

Note: This is test shard 26 of 158.
[==========] Running 8 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 1 test from BackgroundQueueTest
[ RUN      ] BackgroundQueueTest.Boost
[       OK ] BackgroundQueueTest.Boost (0 ms)
[----------] 1 test from BackgroundQueueTest (0 ms total)

[----------] 1 test from CompletionTest
[ RUN      ] CompletionTest.NamespaceDoubleInsertion
Built preamble of size 215784 for file /clangd-test/foo.cpp version null in 3.06 seconds
Ignored diagnostic. /clangd-test/foo.cpp:5:5:expected unqualified-id
Sema said no scope specifier, but we saw ns:: in the source code
Code complete: fuzzyFind({
  "AnyScope": false,
  "Limit": null,
  "PreferredTypes": [],
  "ProximityPaths": [
    "/clangd-test/foo.cpp"
  ],
  "Query": "ABC",
  "RestrictForCodeCompletion": true,
  "Scopes": [
    "foo::ns::",
    "ns::"
  ]
})
Code complete: sema context TopLevel, query scopes [foo::ns::,ns::] (AnyScope=false), expected type <none>
Code complete: 0 results from Sema, 1 from Index, 0 matched, 0 from identifiers, 1 returned.
[       OK ] CompletionTest.NamespaceDoubleInsertion (4337 ms)
[----------] 1 test from CompletionTest (4337 ms total)

[----------] 1 test from DexTest
Step 10 (stage2/asan_ubsan check) failure: stage2/asan_ubsan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86836 of 86837 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.
TIMEOUT: Clangd Unit Tests :: ./ClangdTests/25/158 (2959 of 86836)
******************** TEST 'Clangd Unit Tests :: ./ClangdTests/25/158' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-1112544-25-158.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=158 GTEST_SHARD_INDEX=25 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests
--

Note: This is test shard 26 of 158.
[==========] Running 8 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 1 test from BackgroundQueueTest
[ RUN      ] BackgroundQueueTest.Boost
[       OK ] BackgroundQueueTest.Boost (0 ms)
[----------] 1 test from BackgroundQueueTest (0 ms total)

[----------] 1 test from CompletionTest
[ RUN      ] CompletionTest.NamespaceDoubleInsertion
Built preamble of size 215784 for file /clangd-test/foo.cpp version null in 3.06 seconds
Ignored diagnostic. /clangd-test/foo.cpp:5:5:expected unqualified-id
Sema said no scope specifier, but we saw ns:: in the source code
Code complete: fuzzyFind({
  "AnyScope": false,
  "Limit": null,
  "PreferredTypes": [],
  "ProximityPaths": [
    "/clangd-test/foo.cpp"
  ],
  "Query": "ABC",
  "RestrictForCodeCompletion": true,
  "Scopes": [
    "foo::ns::",
    "ns::"
  ]
})
Code complete: sema context TopLevel, query scopes [foo::ns::,ns::] (AnyScope=false), expected type <none>
Code complete: 0 results from Sema, 1 from Index, 0 matched, 0 from identifiers, 1 returned.
[       OK ] CompletionTest.NamespaceDoubleInsertion (4337 ms)
[----------] 1 test from CompletionTest (4337 ms total)

[----------] 1 test from DexTest

smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
Add support for '`llvm.nvvm.flo.[su].*`' intrinsics which correspond to
a PTX `bfind` instruction.
See [PTX ISA 9.7.1.16. Integer Arithmetic Instructions: bfind]
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind)

The '`llvm.nvvm.flo.u`' family of intrinsics identifies the bit position
of the leading one, returning its offset from either the most or least
significant bit.

The '`llvm.nvvm.flo.s`' family of intrinsics identifies the bit position
of the leading non-sign bit, returning its offset from either the most
or least significant bit.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024