-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[RISCV][CostModel] Correct the cost of some reductions #118072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-risc-v Author: LiqinWeng (LiqinWeng) ChangesReductions include: and/or/max/min Full diff: https://github.com/llvm/llvm-project/pull/118072.diff 1 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 8f0ef69258b165..b098dd0b7613e5 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1470,27 +1470,27 @@ RISCVTTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
llvm_unreachable("Unsupported intrinsic");
case Intrinsic::smax:
SplitOp = RISCV::VMAX_VV;
- Opcodes = {RISCV::VMV_S_X, RISCV::VREDMAX_VS, RISCV::VMV_X_S};
+ Opcodes = {RISCV::VREDMAX_VS, RISCV::VMV_X_S};
break;
case Intrinsic::smin:
SplitOp = RISCV::VMIN_VV;
- Opcodes = {RISCV::VMV_S_X, RISCV::VREDMIN_VS, RISCV::VMV_X_S};
+ Opcodes = {RISCV::VREDMIN_VS, RISCV::VMV_X_S};
break;
case Intrinsic::umax:
SplitOp = RISCV::VMAXU_VV;
- Opcodes = {RISCV::VMV_S_X, RISCV::VREDMAXU_VS, RISCV::VMV_X_S};
+ Opcodes = {RISCV::VREDMAXU_VS, RISCV::VMV_X_S};
break;
case Intrinsic::umin:
SplitOp = RISCV::VMINU_VV;
- Opcodes = {RISCV::VMV_S_X, RISCV::VREDMINU_VS, RISCV::VMV_X_S};
+ Opcodes = {RISCV::VREDMINU_VS, RISCV::VMV_X_S};
break;
case Intrinsic::maxnum:
SplitOp = RISCV::VFMAX_VV;
- Opcodes = {RISCV::VFMV_S_F, RISCV::VFREDMAX_VS, RISCV::VFMV_F_S};
+ Opcodes = {RISCV::VFREDMAX_VS, RISCV::VFMV_F_S};
break;
case Intrinsic::minnum:
SplitOp = RISCV::VFMIN_VV;
- Opcodes = {RISCV::VFMV_S_F, RISCV::VFREDMIN_VS, RISCV::VFMV_F_S};
+ Opcodes = {RISCV::VFREDMIN_VS, RISCV::VFMV_F_S};
break;
}
// Add a cost for data larger than LMUL8
@@ -1534,6 +1534,13 @@ RISCVTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
getRISCVInstructionCost(Opcodes, LT.second, CostKind) +
getCmpSelInstrCost(Instruction::ICmp, ElementTy, ElementTy,
CmpInst::ICMP_EQ, CostKind);
+ } else if (ISD == ISD::XOR) {
+ // Example sequences:
+ // vsetvli a0, zero, e8, mf8, ta, ma
+ // vcpop.m a0, v0
+ // andi a0, a0, 1
+ Opcodes = {RISCV::VCPOP_M};
+ return LT.first + getRISCVInstructionCost(Opcodes, LT.second, CostKind);
} else {
// Example sequences:
// vsetvli a0, zero, e8, mf8, ta, ma
|
429a864
to
ffe5b35
Compare
Reductions include: and/or/max/min
ffe5b35
to
2a43729
Compare
// vcpop.m a0, v0 | ||
// andi a0, a0, 1 | ||
Opcodes = {RISCV::VCPOP_M}; | ||
return LT.first + getRISCVInstructionCost(Opcodes, LT.second, CostKind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be
return LT.first + getRISCVInstructionCost(Opcodes, LT.second, CostKind); | |
return LT.first * getRISCVInstructionCost(Opcodes, LT.second, CostKind); |
With that said, I'm also not sure why the existing sequences are adding the legalization cost instead of multiplying it. We multiply it everywhere else in RISCVTargetTransformInfo. Maybe this is something to fix in a follow up patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deal with the type: ElementTy->isIntegerTy(1), should we real need use *???? I don't understand the case of ISD == ISD::AND
, it uses LT.first - 1
. I'm not sure why it subtracts 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mycost : LT.first - 1 + getRISCVInstructionCost + 1(1 is cost of the andi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(LT.first - 1)
first appeared in this commit.
It seems to be used to calculate the additional cost when the LMUL exceeds 8.
For example,
define zeroext i1 @vreduce_and_nxv1i1(<vscale x 128 x i1> %v) {
; vmand.mm v8, v0, v8 // addtional cost when LMUL exceeds 8.
; vmnot.m v8, v8
; vcpop.m a0, v8
; seqz a0, a0
%red = call i1 @llvm.vector.reduce.and.nxv128i1(<vscale x 128 x i1> %v)
ret i1 %red
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah LT.first is the cost of type legalization, i.e. splitting. But it's the number of times the operation will be split, so we need to multiply that by the cost of whatever node. I'll open up a PR to fix the existing cases, but for this PR we might as well do it the correct way i.e. by multiplying. (Although I don't think it will make a difference to the costs, since these mask instructions are always LMUL 1!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone ahead and done this in df10f1c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
65d8998
to
1588e92
Compare
} else if (ISD == ISD::XOR) { | ||
// Example sequences: | ||
// vsetvli a0, zero, e8, mf8, ta, ma | ||
// vcpop.m a0, v0 | ||
// andi a0, a0, 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the else block already cover this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XOR doesn't use a seqz though so it shouldn't use getCmpSelInstrCost
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukel97 You're right. I initially considered replacing getCmpSelInstrCost with 1 to streamline the code. However, I'm fine if you'd prefer to handle the cost separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please wait for one more approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
Reductions include: and/or/max/min