
Commit bc1f3eb

[DAGCombiner] Pre-commit test case for ReduceLoadOpStoreWidth. NFC
Adding test cases related to narrowing of load-op-store sequences. ReduceLoadOpStoreWidth isn't careful enough, so it may end up creating load/store operations that access memory outside the region touched by the original load/store. ARM is used as the target for the test cases to show what happens for both little-endian and big-endian.

This patch also adds a way to override the TLI.isNarrowingProfitable check in DAGCombiner::ReduceLoadOpStoreWidth, using the option -combiner-reduce-load-op-store-width-force-narrowing-profitable. The idea is that this makes it simpler to, for example, add lit tests verifying that the code is correct for big-endian (which is otherwise difficult, since no in-tree big-endian target overrides TLI.isNarrowingProfitable).

This is a pre-commit for #119203
1 parent 03019c6 commit bc1f3eb

File tree

2 files changed, +80 -3 lines


llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Lines changed: 23 additions & 3 deletions
@@ -138,6 +138,11 @@ static cl::opt<bool> EnableReduceLoadOpStoreWidth(
     "combiner-reduce-load-op-store-width", cl::Hidden, cl::init(true),
     cl::desc("DAG combiner enable reducing the width of load/op/store "
              "sequence"));
+static cl::opt<bool> ReduceLoadOpStoreWidthForceNarrowingProfitable(
+    "combiner-reduce-load-op-store-width-force-narrowing-profitable",
+    cl::Hidden, cl::init(false),
+    cl::desc("DAG combiner force override the narrowing profitable check "
+             "when reducing the width of load/op/store sequences"));
 
 static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(
     "combiner-shrink-load-replace-store-with-store", cl::Hidden, cl::init(true),
@@ -20351,19 +20356,34 @@ SDValue DAGCombiner::ReduceLoadOpStoreWidth(SDNode *N) {
   EVT NewVT = EVT::getIntegerVT(*DAG.getContext(), NewBW);
   // The narrowing should be profitable, the load/store operation should be
   // legal (or custom) and the store size should be equal to the NewVT width.
-  while (NewBW < BitWidth && (NewVT.getStoreSizeInBits() != NewBW ||
-                              !TLI.isOperationLegalOrCustom(Opc, NewVT) ||
-                              !TLI.isNarrowingProfitable(N, VT, NewVT))) {
+  while (NewBW < BitWidth &&
+         (NewVT.getStoreSizeInBits() != NewBW ||
+          !TLI.isOperationLegalOrCustom(Opc, NewVT) ||
+          (!ReduceLoadOpStoreWidthForceNarrowingProfitable &&
+           !TLI.isNarrowingProfitable(N, VT, NewVT)))) {
     NewBW = NextPowerOf2(NewBW);
     NewVT = EVT::getIntegerVT(*DAG.getContext(), NewBW);
   }
   if (NewBW >= BitWidth)
     return SDValue();
 
+  // TODO: For big-endian we probably want to align given the most significant
+  // bit being modified instead of adjusting ShAmt based on least significant
+  // bits. This is to reduce the risk of failing on the alignment check below.
+  // If for example VT.getStoreSize()==5 and Imm is 0x0000ffff00, then we want
+  // to find NewBW=16, and we want to load/store with a PtrOff set to 2. But
+  // then ShAmt should be set to 8, which isn't a multiple of NewBW. Given
+  // that isNarrowingProfitable doesn't seem to be overridden for any in-tree
+  // big-endian target, the support for big-endian here isn't covered by any
+  // in-tree lit tests, so it is unfortunately not highly optimized either.
+  // It should be possible to improve that by using
+  // ReduceLoadOpStoreWidthForceNarrowingProfitable.
+
   // If the lsb changed does not start at the type bitwidth boundary,
   // start at the previous one.
   if (ShAmt % NewBW)
     ShAmt = (((ShAmt + NewBW - 1) / NewBW) * NewBW) - NewBW;
+
   APInt Mask = APInt::getBitsSet(BitWidth, ShAmt,
                                  std::min(BitWidth, ShAmt + NewBW));
   if ((Imm & Mask) == Imm) {
Lines changed: 57 additions & 0 deletions

@@ -0,0 +1,57 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple armv7 -O1 | FileCheck %s -check-prefix=CHECK-LE-NORMAL
+; RUN: llc < %s -mtriple armv7 -O1 -combiner-reduce-load-op-store-width-force-narrowing-profitable=1 | FileCheck %s -check-prefix=CHECK-LE-NARROW
+; RUN: llc < %s -mtriple armv7eb -O1 | FileCheck %s -check-prefix=CHECK-BE-NORMAL
+; XXXRUNXXX: llc < %s -mtriple armv7eb -O1 -combiner-reduce-load-op-store-width-force-narrowing-profitable=1 | FileCheck %s -check-prefix=CHECK-BE-NARROW
+
+; This is a reproducer for a bug when DAGCombiner::ReduceLoadOpStoreWidth
+; would end up narrowing the load-op-store sequence into this SDNode sequence
+; for little-endian
+;
+;   t18: i32,ch = load<(load (s32) from %ir.p1 + 8, align 8)> t0, t17, undef:i32
+;   t20: i32 = or t18, Constant:i32<65534>
+;   t21: ch = store<(store (s32) into %ir.p1 + 8, align 8)> t18:1, t20, t17, undef:i32
+;
+; This is wrong since it accesses memory above %ir.p1+9 which is outside the
+; "store size" for the original store.
+;
+; For big-endian we hit an assertion due to passing a negative offset to
+; getMemBasePlusOffset (at least after commit 3e1b55cafc95d4ef4, while before
+; that commit we got load/store instructions that accessed memory at a
+; negative offset from %p1).
+;
+define i16 @test(ptr %p1) {
+; CHECK-LE-NORMAL-LABEL: test:
+; CHECK-LE-NORMAL:       @ %bb.0: @ %entry
+; CHECK-LE-NORMAL-NEXT:    ldrh r1, [r0, #8]
+; CHECK-LE-NORMAL-NEXT:    movw r2, #65534
+; CHECK-LE-NORMAL-NEXT:    orr r1, r1, r2
+; CHECK-LE-NORMAL-NEXT:    strh r1, [r0, #8]
+; CHECK-LE-NORMAL-NEXT:    mov r0, #0
+; CHECK-LE-NORMAL-NEXT:    bx lr
+;
+; CHECK-LE-NARROW-LABEL: test:
+; CHECK-LE-NARROW:       @ %bb.0: @ %entry
+; CHECK-LE-NARROW-NEXT:    ldr r1, [r0, #8]
+; CHECK-LE-NARROW-NEXT:    movw r2, #65534
+; CHECK-LE-NARROW-NEXT:    orr r1, r1, r2
+; CHECK-LE-NARROW-NEXT:    str r1, [r0, #8]
+; CHECK-LE-NARROW-NEXT:    mov r0, #0
+; CHECK-LE-NARROW-NEXT:    bx lr
+;
+; CHECK-BE-NORMAL-LABEL: test:
+; CHECK-BE-NORMAL:       @ %bb.0: @ %entry
+; CHECK-BE-NORMAL-NEXT:    ldrh r1, [r0]
+; CHECK-BE-NORMAL-NEXT:    movw r2, #65534
+; CHECK-BE-NORMAL-NEXT:    orr r1, r1, r2
+; CHECK-BE-NORMAL-NEXT:    strh r1, [r0]
+; CHECK-BE-NORMAL-NEXT:    mov r0, #0
+; CHECK-BE-NORMAL-NEXT:    bx lr
+entry:
+  %load = load i80, ptr %p1, align 32
+  %mask = shl i80 -1, 65
+  %op = or i80 %load, %mask
+  store i80 %op, ptr %p1, align 32
+  ret i16 0
+}
