Commit 473f0a7

petechou authored and igcbot committed
Avoid folding pseudo-and/pseudo-or into its 2 source defining instructions in some cases.
Do not perform flag opt for the pseudo-and/pseudo-or when its mask option is mismatched with the mask options of its 2-source defining instructions, and the dst of pseudo-and/pseudo-or is global.
1 parent cd8b365 commit 473f0a7

1 file changed

+18
-9
lines changed

visa/Optimizer.cpp

Lines changed: 18 additions & 9 deletions
@@ -3690,7 +3690,7 @@ bool Optimizer::foldPseudoNot(G4_BB *bb, INST_LIST_ITER &iter) {
 }

 /***
-this function optmize the following cases:
+this function optimizes the following cases:

 case 1:
 cmp.gt.P0 s0 s1
@@ -3723,7 +3723,7 @@ mov (1) P0 Imm (NoMask)
 smov (8) r[A0, 0] src0 src1 Imm

 case 5:
-psuedo_not (1) P2 P1
+pseudo_not (1) P2 P1
 and (1) P4 P3 P2
 ==>
 and (1) P4 P3 ~P1
@@ -3818,7 +3818,7 @@ void Optimizer::optimizeLogicOperation() {
     merged = foldPseudoAndOr(bb, ii);
   }

-  // translate the psuedo op
+  // translate the pseudo op
   if (!merged) {
     expandPseudoLogic(builder, bb, ii);
   }
@@ -3835,7 +3835,9 @@ bool Optimizer::foldPseudoAndOr(G4_BB *bb, INST_LIST_ITER &ii) {

   // optimization should apply even when the dst of the pseudo-and/pseudo-or is
   // global, since we are just hoisting it up, and WAR/WAW checks should be
-  // performed as we search for the src0 and src1 inst.
+  // performed as we search for the src0 and src1 inst. Also need to check if
+  // the mask option of the pseudo-and/pseudo-or matches with the options of
+  // the defining instructions when dst is global.

   G4_INST *inst = *ii;
   // look for def of srcs
@@ -3852,7 +3854,7 @@ bool Optimizer::foldPseudoAndOr(G4_BB *bb, INST_LIST_ITER &ii) {

 The new code uses defInstList directly, and aborts if there are more then are
 two definitions. Which means there is more then one instruction writing to
-source. Disadvantage of that is that it is less precisise. For example if we
+source. Disadvantage of that is that it is less precise. For example if we
 are folding in to closest definition then before it was OK, but now will be
 disallowed.
 */
@@ -3889,13 +3891,13 @@ bool Optimizer::foldPseudoAndOr(G4_BB *bb, INST_LIST_ITER &ii) {
     std::swap(defInstructions[0], defInstructions[1]);
     std::swap(maxSrc1, maxSrc2);
   }
-  // Doing backward scan until earlist src to make sure dst of and/or is not
+  // Doing backward scan until earliest src to make sure dst of and/or is not
   // being written to or being read
   /*
   handling case like in spmv_csr
-  cmp.lt (M1, 1) P15 V40(0,0)<0;1,0> 0x10:w /// $191 cmp.lt (M1, 1) P16
-  V110(0,0)<0;1,0> V34(0,0)<0;1,0> /// $192 and (M1,
-  1) P16 P16 P15 /// $193
+  cmp.lt (M1, 1) P15 V40(0,0)<0;1,0> 0x10:w /// $191
+  cmp.lt (M1, 1) P16 V110(0,0)<0;1,0> V34(0,0)<0;1,0> /// $192
+  and (M1, 1) P16 P16 P15 /// $193
   */
   if (chkBwdOutputHazard(defInstructions[1], ii, defInstructions[0])) {
     return false;
@@ -3950,6 +3952,13 @@ bool Optimizer::foldPseudoAndOr(G4_BB *bb, INST_LIST_ITER &ii) {
     return false;
   }

+  // Check if mask options are mismatched between the pseudo-and/pseudo-or and
+  // its defining instructions.
+  if ((inst->getMaskOption() != src0DefInst->getMaskOption() ||
+       inst->getMaskOption() != src1DefInst->getMaskOption()) &&
+      fg.globalOpndHT.isOpndGlobal(inst->getDst()))
+    return false;
+
   // do the case 3 optimization

   G4_PredState ps =
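
The guard added in the last hunk can be read in isolation as a small predicate. The following is a minimal sketch of that condition only, not the vISA implementation: `MaskOption`, `Inst`, and `blocksFold` are hypothetical stand-ins invented for illustration, and `dstIsGlobal` simply models the result of `fg.globalOpndHT.isOpndGlobal(inst->getDst())`.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are NOT the vISA types.
enum class MaskOption { NoMask, M0, M16 };

struct Inst {
  MaskOption mask;   // models inst->getMaskOption()
  bool dstIsGlobal;  // models fg.globalOpndHT.isOpndGlobal(inst->getDst())
};

// Returns true when the fold should be skipped: the pseudo-and/pseudo-or
// has a mask option that differs from at least one of its two defining
// instructions AND its destination is global.
bool blocksFold(const Inst &pseudo, const Inst &src0Def, const Inst &src1Def) {
  const bool maskMismatch =
      pseudo.mask != src0Def.mask || pseudo.mask != src1Def.mask;
  return maskMismatch && pseudo.dstIsGlobal;
}
```

Under this model, a mask mismatch alone does not block the fold; the destination must also be global. For example, `blocksFold({MaskOption::NoMask, true}, {MaskOption::M0, false}, {MaskOption::NoMask, false})` is true, while the same call with `dstIsGlobal` set to false on the first argument is false.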
