-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AArch64][GlobalISel] Select UMULL instruction #65469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobalISel's custom legalization action doesn't work in the way that most would expect. When a custom legalization rule runs, the legalizer expects the result to be now fully legalized. Therefore you must completely handle every possible situation, since the legalizer will not re-add the instruction onto the worklist.
In this case, the legalization of <2 x s64>
G_MUL for a specific pattern can fail and as a result illegal <2 x s64>
G_MULs will escape. E.g the following:
---
name: long_mul
legalized: true
liveins:
- { reg: '$x0' }
- { reg: '$x1' }
body: |
bb.1:
liveins: $x0, $x1
%0:_(s64) = COPY $x0
%1:_(s64) = COPY $x1
%bv1:_(<2 x s64>) = G_BUILD_VECTOR %0, %0
%bv2:_(<2 x s64>) = G_BUILD_VECTOR %1, %1
%mul:_(<2 x s64>) = G_MUL %bv1, %bv2
$q0 = COPY %mul(<2 x s64>)
RET_ReallyLR implicit $q0
...
will not scalarize the G_MUL. I think you can fix this by directly calling the LegalizerHelper::fewerElementsVector()
helper with a scalar type.
eb78b42
to
aa0918f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobalISel's custom legalization action doesn't work in the way that most would expect. When a custom legalization rule runs, the legalizer expects the result to be now fully legalized. Therefore you must completely handle every possible situation, since the legalizer will not re-add the instruction onto the worklist.
In this case, the legalization of
<2 x s64>
G_MUL for a specific pattern can fail and as a result illegal<2 x s64>
G_MULs will escape. E.g the following:--- name: long_mul legalized: true liveins: - { reg: '$x0' } - { reg: '$x1' } body: | bb.1: liveins: $x0, $x1 %0:_(s64) = COPY $x0 %1:_(s64) = COPY $x1 %bv1:_(<2 x s64>) = G_BUILD_VECTOR %0, %0 %bv2:_(<2 x s64>) = G_BUILD_VECTOR %1, %1 %mul:_(<2 x s64>) = G_MUL %bv1, %bv2 $q0 = COPY %mul(<2 x s64>) RET_ReallyLR implicit $q0 ...
will not scalarize the G_MUL. I think you can fix this by directly calling the
LegalizerHelper::fewerElementsVector()
helper with a scalar type.
That matches how this patch currently works, and I think the testcase works with it. There is a call to fewerElementsVector, it just gets complex with the way larger vectors are expanded and the order nodes are visited during legalization.
We are trying to convert mul(zext, zext)
into UMULL
. The legal types for this are v8i8->v8i16, v4i16->v4i32 and v2i32->v2i64. The last one is the awkward one as there is no v2i64 mul and so it needs to be scalarized. The other types are simpler and could honestly be done with tblgen patterns (as nodes like uaddl are). The scalarization of v2i64 mul means we end up needing to come up with something special for them, and we use the same method for all umull to be consistent.
On it's own I think those types would be easy enough to support. It would just match them during legalization and produce a UMULL in a custom method. The problem comes from v4i32->v4i64 type operations, and all the others larger type sizes. They need to be split up into parts of size v2i64. DAG would split types before operations, so it becomes simpler. GlobalISel is trying to do them both at the same time, so even if it splits the larger vectors it can be left with merge/unmerge pairs between the mul and the extends. (It seems to re-vist the mul repeatedly until legal, not move up to the operands that could be simplified).
So this patch adds a check to see if the operands will be simplified away and are extends. What I understand happens is that the operands are then visited and split/simplified, and the legalizer reiterates and legalizes the mul on the second iteration to a UMULL. It did have to lie about the mul being legal though.
I think there are several ways to try and get this to work, they each have some difficulties.
- Do it post-legalization by letting the v2i64 mul scalarize and getting it to recombine later. I don't think this would work very well as it's often too difficult or impossible to reliably recombine the scalarized muls.
- Do it during legalization like this patch does. It just needs to deal with the case where the larger types are being split as operation is being legalized, and still generate something that is optimal. From what I can tell this seems to work (minus my comment about multi-use extends).
- Do it pre-legalization (or during legalization) but generate UMULL for all type sizes, then have separate legalization for UMULL to split it to legal vectors.
- Pretend all v2i64 mul are legal and do the legalization of them in AArch64PostLegalizerLowering? The rest of GlobalISel then might believe that v2s64 mul are legal.
- I think there may be another way in the long run, where we check the known bits of the operands and generate UMULL more aggressively, even generating UMULL(trunc, trunc) if needed. That might make the custom method more "complete", not relying on reiteration, providing the extra truncs would get removed.
I'm not sure which method would be best. Probably either generating the UMULL with larger types and having separate legalization or this method so long as we are not too unhappy about the lying and reiteration from the custom method. AArch64PostLegalizerLowering probably works, providing the normal v2s64 mul get cleaned up well enough, we are just re-inventing progressive type lowering and lying about v2s64 muls being legal more generally.
Indeed, I see that now. Not sure how I managed to see non-scalarization behaviour, must have done something wrong on my end.
Yep, agreed on this.
I'm generally ok with this now that I'm no longer blind and see the fewerElementsVector being called.
Personally: prefer not to introduce target specific ops before legalization.
This could actually work. Our definition of legal is that the instructions can be "selected", which is a somewhat fuzzier concept than just " |
Yeah it might work OK. We can give it a try. My worry would be that |
As long as PostLegalizerLowering scalarizes it I think it's fine. That pass runs even at -O0 so a v2s64 MUL should never survive past that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell this looks OK. The expanded v2i64 muls are a little less optimized in places as the legalization artefact combiners haven't kicked in. That is probably something that could be improved with some other targeted optimizations.
@aemerson any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, for some reason this doesn't appear on my "Review Requested" page.
That's unfortunate about the artifacts, I hadn't considered that issue when using LegalizerHelper at the lowering phase. In fact in the general case not running the artifact combiner can be a fatal error since we rely on the combines to eliminate un-selectable operations. In this case it doesn't result in that issue however.
To fix that we can schedule artifact combining after lowering, either as a new pass or just append it to the end of lowering's runOnMachineFunction()
. It's not ideal but we may need to do that eventually anyway for other reasons.
Don't need to fix that right now though, it can be follow up later.
e7ab290
to
51d7bab
Compare
You can merge this when you're ready. |
There is a problem with my stage 2 built right now, I will need to see if this patch is causing the failure. |
Global ISel now selects UMULL and UMULL2 instructions. MUL instructions with input operands coming from ZEXT or SEXT operations are turned into UMULL.
51d7bab
to
552c09c
Compare
Global ISel now selects
UMULL
andUMULL2
instructions.G_MUL instruction with input operands coming from
SEXT
orZEXT
operations are turned into UMULLG_MUL instructions with v2s64 result type is always scalarised except:
mul ( unmerge( ext ), unmerge( ext ))
So the extend could be unmerged and fold away the unmerge in the middle:
mul ( unmerge( ext ), unmerge( ext ))
=>mul ( unmerge( merge( ext( unmerge )), unmerge( merge( ext( unmerge ))))
=>mul ( ext(unmerge)), ( ext( unmerge )))