Skip to content

[RISCV] Use a DAG combine to prune pointless vrgather.vi #135392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 12, 2025

Conversation

preames
Copy link
Collaborator

@preames preames commented Apr 11, 2025

If the vrgather.vi is preceeded by a vmv.v.x which writes a superset of the lanes writen by the vrgather, and the vrgather has no passthru, then the vrgather has no semantic effect.

This is the start of a mini-series of patches around rewriting vrgather.vi/vx preceeded by vmv.v.x, vfmf.v.f, vmv.s.x, etc... Starting with the simplest, but also lowest impact.

One point I'd like a second oppinion on is the out of bounds semenatic change. As far as I can tell, all the indices are in bounds by construction. The doc change is as much as I couldn't figure out how to test the alternative as anything else.

If the vrgather.vi is preceeded by a vmv.v.x which writes a superset
of the lanes writen by the vrgather, and the vrgather has no passthru,
then the vrgather has no semantic effect.

This is the start of a mini-series of patches around rewriting
vrgather.vi/vx preceeded by vmv.v.x, vfmf.v.f, vmv.s.x, etc...
Starting with the simplest, but also lowest impact.

One point I'd like a second oppinion on is the out of bounds
semenatic change.  As far as I can tell, all the indices are in
bounds by construction.  The doc change is as much as I couldn't
figure out how to test the alternative as anything else.
@llvmbot
Copy link
Member

llvmbot commented Apr 11, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

If the vrgather.vi is preceeded by a vmv.v.x which writes a superset of the lanes writen by the vrgather, and the vrgather has no passthru, then the vrgather has no semantic effect.

This is the start of a mini-series of patches around rewriting vrgather.vi/vx preceeded by vmv.v.x, vfmf.v.f, vmv.s.x, etc... Starting with the simplest, but also lowest impact.

One point I'd like a second oppinion on is the out of bounds semenatic change. As far as I can tell, all the indices are in bounds by construction. The doc change is as much as I couldn't figure out how to test the alternative as anything else.


Full diff: https://github.com/llvm/llvm-project/pull/135392.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+13)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (+2-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll (+2-3)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index fef9084bd0e73..e5843477e04e5 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -19709,6 +19709,19 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
     if (SDValue V = combineToVCPOP(N, DAG, Subtarget))
       return V;
     break;
+  case RISCVISD::VRGATHER_VX_VL: {
+    // Drop a redundant vrgather_vx.
+    // Note this assumes that out of bounds indices produce poison
+    // and can thus be replaced without having to prove them inbounds..
+    SDValue Src = N->getOperand(0);
+    SDValue Passthru = N->getOperand(2);
+    SDValue VL = N->getOperand(4);
+    // TODO: Handle fmv.v.f?
+    if (Src.getOpcode() == RISCVISD::VMV_V_X_VL && Passthru.isUndef() &&
+        VL == Src.getOperand(2))
+      return Src;
+    break;
+  }
   }
 
   return SDValue();
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 5ebdbbd51f2b1..d624ee33d6a63 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -388,7 +388,8 @@ enum NodeType : unsigned {
   VMSET_VL,
 
   // Matches the semantics of vrgather.vx and vrgather.vv with extra operands
-  // for passthru and VL. Operands are (src, index, mask, passthru, vl).
+  // for passthru and VL, except that out of bound indices result in a poison
+  // result not zero.  Operands are (src, index, mask, passthru, vl).
   VRGATHER_VX_VL,
   VRGATHER_VV_VL,
   VRGATHEREI16_VV_VL,
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll
index e6375e276d37f..185b8b6d03134 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll
@@ -1346,10 +1346,9 @@ define <4 x i16> @vmerge_2(<4 x i16> %x) {
 define <4 x i16> @vmerge_3(<4 x i16> %x) {
 ; CHECK-LABEL: vmerge_3:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v0, 6
-; CHECK-NEXT:    vmv.v.i v9, 5
-; CHECK-NEXT:    vrgather.vi v8, v9, 1, v0.t
+; CHECK-NEXT:    vmerge.vim v8, v8, 5, v0
 ; CHECK-NEXT:    ret
    %s = shufflevector <4 x i16> %x, <4 x i16> <i16 poison, i16 5, i16 poison, i16 poison>, <4 x i32> <i32 0, i32 5, i32 5, i32 3>
    ret <4 x i16> %s

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@preames preames merged commit 336b290 into llvm:main Apr 12, 2025
13 checks passed
@preames preames deleted the pr-riscv-vrgather_vx-dag-combine branch April 12, 2025 03:02
@llvm-ci
Copy link
Collaborator

llvm-ci commented Apr 12, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-qemu running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/139/builds/13436

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[75/77] Generating ScudoUnitTestsObjects.combined_test.cpp.mips64el.o
clang++: warning: -Wl,-z,execstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,execstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,execstack: 'linker' input unused [-Wunused-command-line-argument]
[76/77] Generating ScudoUnitTest-mips64el-Test
[76/77] Running Scudo Standalone tests
llvm-lit: /home/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 159 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
TIMEOUT: ScudoStandalone-Unit :: ./ScudoUnitTest-mips64el-Test/69/141 (159 of 159)
******************** TEST 'ScudoStandalone-Unit :: ./ScudoUnitTest-mips64el-Test/69/141' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-qemu/build/llvm_build2_debug_mips64el_qemu/lib/scudo/standalone/tests/./ScudoUnitTest-mips64el-Test-ScudoStandalone-Unit-1202096-69-141.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=141 GTEST_SHARD_INDEX=69 /home/b/sanitizer-x86_64-linux-qemu/build/qemu_build/qemu-mips64el -L /usr/mips64el-linux-gnuabi64 /home/b/sanitizer-x86_64-linux-qemu/build/llvm_build2_debug_mips64el_qemu/lib/scudo/standalone/tests/./ScudoUnitTest-mips64el-Test
--

Note: This is test shard 70 of 141.
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from ScudoCombinedDeathTestBasicCombined14_DefaultConfig
[ RUN      ] ScudoCombinedDeathTestBasicCombined14_DefaultConfig.BasicCombined14
Stats: SizeClassAllocator64: 2M mapped (0M rss) in 34 allocations; remains 34; ReleaseToOsIntervalMs = 5000
  00 (   128): mapped:    256K popped:      24 pushed:       0 inuse:     24 total:     64 releases attempted:      0 last released:      0K latest pushed bytes:      5K region: 0x556c56872000 (0x556c56869000)
  32 ( 16384): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    112K region: 0x556d56871000 (0x556d56869000)
  33 ( 20480): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    140K region: 0x557856872000 (0x557856869000)
  34 ( 24576): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    168K region: 0x557e56873000 (0x557e56869000)
  35 ( 28672): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    196K region: 0x557556878000 (0x557556869000)
  36 ( 32768): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    224K region: 0x556156871000 (0x556156869000)
  37 ( 40960): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      6 releases attempted:      0 last released:      0K latest pushed bytes:    200K region: 0x558056878000 (0x558056869000)
  38 ( 49152): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      5 releases attempted:      0 last released:      0K latest pushed bytes:    192K region: 0x556456877000 (0x556456869000)
  39 ( 57344): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      4 releases attempted:      0 last released:      0K latest pushed bytes:    168K region: 0x55745686e000 (0x557456869000)
  41 ( 81920): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      3 releases attempted:      0 last released:      0K latest pushed bytes:    160K region: 0x55635686e000 (0x556356869000)
  42 ( 98304): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      2 releases attempted:      0 last released:      0K latest pushed bytes:     96K region: 0x556e56876000 (0x556e56869000)
Stats: MapAllocator: allocated 0 times (0K), freed 0 times (0K), remains 0 (0K) max 0M, Fragmented 0K
Stats: MapAllocatorCache: EntriesCount: 0, MaxEntriesCount: 32, MaxEntrySize: 524288, ReleaseToOsSkips: 0, ReleaseToOsIntervalMs = 5000
Stats: CacheRetrievalStats: SuccessRate: 0/0 (100.00%)
Cache Entry Info (Most Recent -> Least Recent):
Stats: Quarantine: batches: 0; bytes: 0 (user: 0); chunks: 0 (capacity: 0); 0% chunks used; 0% memory overhead
Quarantine limits: global: 0K; thread local: 0K
Exclusive TSD don't support iterating each TSD
Fragmentation Stats: SizeClassAllocator64: page size = 4096 bytes
  01 (    32): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  02 (    64): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  03 (    96): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  04 (   128): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  05 (   160): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  06 (   192): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  07 (   224): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  08 (   256): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
Step 23 (scudo debug_mips64el_qemu) failure: scudo debug_mips64el_qemu (failure)
...
[75/77] Generating ScudoUnitTestsObjects.combined_test.cpp.mips64el.o
clang++: warning: -Wl,-z,execstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,execstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,execstack: 'linker' input unused [-Wunused-command-line-argument]
[76/77] Generating ScudoUnitTest-mips64el-Test
[76/77] Running Scudo Standalone tests
llvm-lit: /home/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 159 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
TIMEOUT: ScudoStandalone-Unit :: ./ScudoUnitTest-mips64el-Test/69/141 (159 of 159)
******************** TEST 'ScudoStandalone-Unit :: ./ScudoUnitTest-mips64el-Test/69/141' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-qemu/build/llvm_build2_debug_mips64el_qemu/lib/scudo/standalone/tests/./ScudoUnitTest-mips64el-Test-ScudoStandalone-Unit-1202096-69-141.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=141 GTEST_SHARD_INDEX=69 /home/b/sanitizer-x86_64-linux-qemu/build/qemu_build/qemu-mips64el -L /usr/mips64el-linux-gnuabi64 /home/b/sanitizer-x86_64-linux-qemu/build/llvm_build2_debug_mips64el_qemu/lib/scudo/standalone/tests/./ScudoUnitTest-mips64el-Test
--

Note: This is test shard 70 of 141.
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from ScudoCombinedDeathTestBasicCombined14_DefaultConfig
[ RUN      ] ScudoCombinedDeathTestBasicCombined14_DefaultConfig.BasicCombined14
Stats: SizeClassAllocator64: 2M mapped (0M rss) in 34 allocations; remains 34; ReleaseToOsIntervalMs = 5000
  00 (   128): mapped:    256K popped:      24 pushed:       0 inuse:     24 total:     64 releases attempted:      0 last released:      0K latest pushed bytes:      5K region: 0x556c56872000 (0x556c56869000)
  32 ( 16384): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    112K region: 0x556d56871000 (0x556d56869000)
  33 ( 20480): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    140K region: 0x557856872000 (0x557856869000)
  34 ( 24576): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    168K region: 0x557e56873000 (0x557e56869000)
  35 ( 28672): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    196K region: 0x557556878000 (0x557556869000)
  36 ( 32768): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      8 releases attempted:      0 last released:      0K latest pushed bytes:    224K region: 0x556156871000 (0x556156869000)
  37 ( 40960): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      6 releases attempted:      0 last released:      0K latest pushed bytes:    200K region: 0x558056878000 (0x558056869000)
  38 ( 49152): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      5 releases attempted:      0 last released:      0K latest pushed bytes:    192K region: 0x556456877000 (0x556456869000)
  39 ( 57344): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      4 releases attempted:      0 last released:      0K latest pushed bytes:    168K region: 0x55745686e000 (0x557456869000)
  41 ( 81920): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      3 releases attempted:      0 last released:      0K latest pushed bytes:    160K region: 0x55635686e000 (0x556356869000)
  42 ( 98304): mapped:    256K popped:       1 pushed:       0 inuse:      1 total:      2 releases attempted:      0 last released:      0K latest pushed bytes:     96K region: 0x556e56876000 (0x556e56869000)
Stats: MapAllocator: allocated 0 times (0K), freed 0 times (0K), remains 0 (0K) max 0M, Fragmented 0K
Stats: MapAllocatorCache: EntriesCount: 0, MaxEntriesCount: 32, MaxEntrySize: 524288, ReleaseToOsSkips: 0, ReleaseToOsIntervalMs = 5000
Stats: CacheRetrievalStats: SuccessRate: 0/0 (100.00%)
Cache Entry Info (Most Recent -> Least Recent):
Stats: Quarantine: batches: 0; bytes: 0 (user: 0); chunks: 0 (capacity: 0); 0% chunks used; 0% memory overhead
Quarantine limits: global: 0K; thread local: 0K
Exclusive TSD don't support iterating each TSD
Fragmentation Stats: SizeClassAllocator64: page size = 4096 bytes
  01 (    32): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  02 (    64): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  03 (    96): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  04 (   128): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  05 (   160): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  06 (   192): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  07 (   224): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%
  08 (   256): inuse/total blocks:      0/     0 inuse/total pages:      0/     0 inuse bytes:      0K util: 100.00%

var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
If the vrgather.vi is preceeded by a vmv.v.x which writes a superset of
the lanes writen by the vrgather, and the vrgather has no passthru, then
the vrgather has no semantic effect.

This is the start of a mini-series of patches around rewriting
vrgather.vi/vx preceeded by vmv.v.x, vfmf.v.f, vmv.s.x, etc... Starting
with the simplest, but also lowest impact.

One point I'd like a second oppinion on is the out of bounds semenatic
change. As far as I can tell, all the indices are in bounds by
construction. The doc change is as much as I couldn't figure out how to
test the alternative as anything else.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants