Skip to content

[AArch64][SME] Don't use glue for call lowering's SMSTART. #68630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 2 additions & 3 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7687,14 +7687,13 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
if (!MemOpChains.empty())
Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, MemOpChains);

SDValue InGlue;
if (RequiresSMChange) {
SDValue NewChain = changeStreamingMode(DAG, DL, *RequiresSMChange, Chain,
InGlue, PStateSM, true);
SDValue(), PStateSM, true);
Chain = NewChain.getValue(0);
InGlue = NewChain.getValue(1);
}

SDValue InGlue;
// Build a sequence of copy-to-reg nodes chained together with token chain
// and flag operands which copy the outgoing args into the appropriate regs.
for (auto &RegToPass : RegsToPass) {
Expand Down
12 changes: 4 additions & 8 deletions llvm/test/CodeGen/AArch64/sme-streaming-interface.ll
Original file line number Diff line number Diff line change
Expand Up @@ -368,15 +368,11 @@ define i8 @call_to_non_streaming_pass_sve_objects(ptr nocapture noundef readnone
; CHECK-NEXT: stp d9, d8, [sp, #48] // 16-byte Folded Spill
; CHECK-NEXT: stp x29, x30, [sp, #64] // 16-byte Folded Spill
; CHECK-NEXT: addvl sp, sp, #-3
; CHECK-NEXT: rdsvl x8, #1
; CHECK-NEXT: addvl x9, sp, #2
; CHECK-NEXT: addvl x10, sp, #1
; CHECK-NEXT: mov x11, sp
; CHECK-NEXT: rdsvl x3, #1
; CHECK-NEXT: smstop sm
; CHECK-NEXT: mov x0, x9
; CHECK-NEXT: mov x1, x10
; CHECK-NEXT: mov x2, x11
; CHECK-NEXT: mov x3, x8
; CHECK-NEXT: addvl x0, sp, #2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is semantically different because addvl has a different meaning after the smstop then it did before.
Before the smstop it would have calculated the address of %Data1, %Data2 and %Data3 using the streaming VL (SVL), whereas now it uses the regular (non-streaming) VL. If SVL != VL, then this is a different address.

That's also the reason for having this Glue, so that these are not scheduled after the smstop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Do you have any suggestions for how to fix this GPR move issue? It causes unnecessary size bloat.

; CHECK-NEXT: addvl x1, sp, #1
; CHECK-NEXT: mov x2, sp
; CHECK-NEXT: bl foo
; CHECK-NEXT: smstart sm
; CHECK-NEXT: ptrue p0.b
Expand Down