Skip to content

[AArch64][GlobalISel] Select UMULL instruction #65469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 25, 2023

Conversation

chuongg3
Copy link
Contributor

@chuongg3 chuongg3 commented Sep 6, 2023

Global ISel now selects UMULL and UMULL2 instructions.
G_MUL instruction with input operands coming from SEXT or ZEXT operations are turned into UMULL

G_MUL instructions with v2s64 result type is always scalarised except:
mul ( unmerge( ext ), unmerge( ext ))

So the extend could be unmerged and fold away the unmerge in the middle:
mul ( unmerge( ext ), unmerge( ext )) =>
mul ( unmerge( merge( ext( unmerge )), unmerge( merge( ext( unmerge )))) =>
mul ( ext(unmerge)), ( ext( unmerge )))

Copy link
Contributor

@aemerson aemerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GlobalISel's custom legalization action doesn't work in the way that most would expect. When a custom legalization rule runs, the legalizer expects the result to be now fully legalized. Therefore you must completely handle every possible situation, since the legalizer will not re-add the instruction onto the worklist.

In this case, the legalization of <2 x s64> G_MUL for a specific pattern can fail and as a result illegal <2 x s64> G_MULs will escape. E.g the following:

---
name:            long_mul
legalized: true
liveins:
  - { reg: '$x0' }
  - { reg: '$x1' }
body:             |
  bb.1:
    liveins: $x0, $x1

    %0:_(s64) = COPY $x0
    %1:_(s64) = COPY $x1
    %bv1:_(<2 x s64>) = G_BUILD_VECTOR %0, %0
    %bv2:_(<2 x s64>) = G_BUILD_VECTOR %1, %1
    %mul:_(<2 x s64>) = G_MUL %bv1, %bv2
    $q0 = COPY %mul(<2 x s64>)
    RET_ReallyLR implicit $q0

...

will not scalarize the G_MUL. I think you can fix this by directly calling the LegalizerHelper::fewerElementsVector() helper with a scalar type.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GlobalISel's custom legalization action doesn't work in the way that most would expect. When a custom legalization rule runs, the legalizer expects the result to be now fully legalized. Therefore you must completely handle every possible situation, since the legalizer will not re-add the instruction onto the worklist.

In this case, the legalization of <2 x s64> G_MUL for a specific pattern can fail and as a result illegal <2 x s64> G_MULs will escape. E.g the following:

---
name:            long_mul
legalized: true
liveins:
  - { reg: '$x0' }
  - { reg: '$x1' }
body:             |
  bb.1:
    liveins: $x0, $x1

    %0:_(s64) = COPY $x0
    %1:_(s64) = COPY $x1
    %bv1:_(<2 x s64>) = G_BUILD_VECTOR %0, %0
    %bv2:_(<2 x s64>) = G_BUILD_VECTOR %1, %1
    %mul:_(<2 x s64>) = G_MUL %bv1, %bv2
    $q0 = COPY %mul(<2 x s64>)
    RET_ReallyLR implicit $q0

...

will not scalarize the G_MUL. I think you can fix this by directly calling the LegalizerHelper::fewerElementsVector() helper with a scalar type.

That matches how this patch currently works, and I think the testcase works with it. There is a call to fewerElementsVector, it just gets complex with the way larger vectors are expanded and the order nodes are visited during legalization.

We are trying to convert mul(zext, zext) into UMULL. The legal types for this are v8i8->v8i16, v4i16->v4i32 and v2i32->v2i64. The last one is the awkward one as there is no v2i64 mul and so it needs to be scalarized. The other types are simpler and could honestly be done with tblgen patterns (as nodes like uaddl are). The scalarization of v2i64 mul means we end up needing to come up with something special for them, and we use the same method for all umull to be consistent.

On it's own I think those types would be easy enough to support. It would just match them during legalization and produce a UMULL in a custom method. The problem comes from v4i32->v4i64 type operations, and all the others larger type sizes. They need to be split up into parts of size v2i64. DAG would split types before operations, so it becomes simpler. GlobalISel is trying to do them both at the same time, so even if it splits the larger vectors it can be left with merge/unmerge pairs between the mul and the extends. (It seems to re-vist the mul repeatedly until legal, not move up to the operands that could be simplified).

So this patch adds a check to see if the operands will be simplified away and are extends. What I understand happens is that the operands are then visited and split/simplified, and the legalizer reiterates and legalizes the mul on the second iteration to a UMULL. It did have to lie about the mul being legal though.

I think there are several ways to try and get this to work, they each have some difficulties.

  • Do it post-legalization by letting the v2i64 mul scalarize and getting it to recombine later. I don't think this would work very well as it's often too difficult or impossible to reliably recombine the scalarized muls.
  • Do it during legalization like this patch does. It just needs to deal with the case where the larger types are being split as operation is being legalized, and still generate something that is optimal. From what I can tell this seems to work (minus my comment about multi-use extends).
  • Do it pre-legalization (or during legalization) but generate UMULL for all type sizes, then have separate legalization for UMULL to split it to legal vectors.
  • Pretend all v2i64 mul are legal and do the legalization of them in AArch64PostLegalizerLowering? The rest of GlobalISel then might believe that v2s64 mul are legal.
  • I think there may be another way in the long run, where we check the known bits of the operands and generate UMULL more aggressively, even generating UMULL(trunc, trunc) if needed. That might make the custom method more "complete", not relying on reiteration, providing the extra truncs would get removed.

I'm not sure which method would be best. Probably either generating the UMULL with larger types and having separate legalization or this method so long as we are not too unhappy about the lying and reiteration from the custom method. AArch64PostLegalizerLowering probably works, providing the normal v2s64 mul get cleaned up well enough, we are just re-inventing progressive type lowering and lying about v2s64 muls being legal more generally.

@aemerson
Copy link
Contributor

That matches how this patch currently works, and I think the testcase works with it. There is a call to fewerElementsVector, it just gets complex with the way larger vectors are expanded and the order nodes are visited during legalization.

Indeed, I see that now. Not sure how I managed to see non-scalarization behaviour, must have done something wrong on my end.

...

I think there are several ways to try and get this to work, they each have some difficulties.

* Do it post-legalization by letting the v2i64 mul scalarize and getting it to recombine later. I don't think this would work very well as it's often too difficult or impossible to reliably recombine the scalarized muls.

Yep, agreed on this.

* Do it during legalization like this patch does. It just needs to deal with the case where the larger types are being split as operation is being legalized, and still generate something that is optimal. From what I can tell this seems to work (minus my comment about multi-use extends).

I'm generally ok with this now that I'm no longer blind and see the fewerElementsVector being called.

* Do it pre-legalization (or during legalization) but generate UMULL for all type sizes, then have separate legalization for UMULL to split it to legal vectors.

Personally: prefer not to introduce target specific ops before legalization.

* Pretend all v2i64 mul are legal and do the legalization of them in AArch64PostLegalizerLowering? The rest of GlobalISel then might believe that v2s64 mul are legal.

This could actually work. Our definition of legal is that the instructions can be "selected", which is a somewhat fuzzier concept than just "-run-pass=instruction-select doesn't abort". An instruction doesn't have to have native instruction to be "legal", and our concept of "selection" basically means the mandatory post-legalization pipeline. So as long as the combination of PostLegalizationLowering + InstructionSelect can handle an operation, that to me makes the operation legal. I think this method feels cleaner architecturally to me than the others.

@davemgreen
Copy link
Collaborator

  • Pretend all v2i64 mul are legal and do the legalization of them in AArch64PostLegalizerLowering? The rest of GlobalISel then might believe that v2s64 mul are legal.

This could actually work. Our definition of legal is that the instructions can be "selected", which is a somewhat fuzzier concept than just "-run-pass=instruction-select doesn't abort". An instruction doesn't have to have native instruction to be "legal", and our concept of "selection" basically means the mandatory post-legalization pipeline. So as long as the combination of PostLegalizationLowering + InstructionSelect can handle an operation, that to me makes the operation legal. I think this method feels cleaner architecturally to me than the others.

Yeah it might work OK. We can give it a try. My worry would be that LegalityQuery->getAction({G_MUL, v2s64}).Action would return Legal, and other operations or optimizations in the generic parts of GISel create them where they would not end up being legal. I'm not sure if there are many combines that would want to generate a MUL though, if it didn't start from a MUL already.

@aemerson
Copy link
Contributor

As long as PostLegalizerLowering scalarizes it I think it's fine. That pass runs even at -O0 so a v2s64 MUL should never survive past that.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell this looks OK. The expanded v2i64 muls are a little less optimized in places as the legalization artefact combiners haven't kicked in. That is probably something that could be improved with some other targeted optimizations.
@aemerson any thoughts?

Copy link
Contributor

@aemerson aemerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, for some reason this doesn't appear on my "Review Requested" page.

That's unfortunate about the artifacts, I hadn't considered that issue when using LegalizerHelper at the lowering phase. In fact in the general case not running the artifact combiner can be a fatal error since we rely on the combines to eliminate un-selectable operations. In this case it doesn't result in that issue however.

To fix that we can schedule artifact combining after lowering, either as a new pass or just append it to the end of lowering's runOnMachineFunction(). It's not ideal but we may need to do that eventually anyway for other reasons.

Don't need to fix that right now though, it can be follow up later.

@aemerson
Copy link
Contributor

You can merge this when you're ready.

@chuongg3
Copy link
Contributor Author

There is a problem with my stage 2 built right now, I will need to see if this patch is causing the failure.

@chuongg3 chuongg3 merged commit 45f51f9 into llvm:main Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants