Skip to content

[StackFrameLayoutAnalysis] Add basic Scalable stack slot output #99883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 22, 2024

Conversation

davemgreen
Copy link
Collaborator

The existing StackFrameLayoutAnalysis details do not do well with Scalable vector stack slots, which are not marked as scalable and intertwined with the other fixed-size slots. This patch adds some very basic support, marking them as scalable and sorting them to the end of the list. The slot addresses are not really correct (for fixed as well as scalable), but this prints something a little better with the limited information curently available.

The existing StackFrameLayoutAnalysis details do not do well with Scalable
vector stack slots, which are not marked as scalable and intertwined with the
other fixed-size slots. This patch adds some very basic support, marking them
as scalable and sorting them to the end of the list. The slot addresses are not
really correct (for fixed as well as scalable), but this prints something a
little better with the limited information curently available.
@llvmbot
Copy link
Member

llvmbot commented Jul 22, 2024

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

The existing StackFrameLayoutAnalysis details do not do well with Scalable vector stack slots, which are not marked as scalable and intertwined with the other fixed-size slots. This patch adds some very basic support, marking them as scalable and sorting them to the end of the list. The slot addresses are not really correct (for fixed as well as scalable), but this prints something a little better with the limited information curently available.


Full diff: https://github.com/llvm/llvm-project/pull/99883.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp (+11-5)
  • (modified) llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll (+4-4)
diff --git a/llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp b/llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
index 5d3903ed84ce8..940aecd1cb363 100644
--- a/llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
+++ b/llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
@@ -62,11 +62,14 @@ struct StackFrameLayoutAnalysisPass : public MachineFunctionPass {
     int Align;
     int Offset;
     SlotType SlotTy;
+    bool Scalable;
 
     SlotData(const MachineFrameInfo &MFI, const int ValOffset, const int Idx)
         : Slot(Idx), Size(MFI.getObjectSize(Idx)),
           Align(MFI.getObjectAlign(Idx).value()),
-          Offset(MFI.getObjectOffset(Idx) - ValOffset), SlotTy(Invalid) {
+          Offset(MFI.getObjectOffset(Idx) - ValOffset), SlotTy(Invalid),
+          Scalable(false) {
+      Scalable = MFI.getStackID(Idx) == TargetStackID::ScalableVector;
       if (MFI.isSpillSlotObjectIndex(Idx))
         SlotTy = SlotType::Spill;
       else if (Idx == MFI.getStackProtectorIndex())
@@ -75,9 +78,12 @@ struct StackFrameLayoutAnalysisPass : public MachineFunctionPass {
         SlotTy = SlotType::Variable;
     }
 
-    // we use this to sort in reverse order, so that the layout is displayed
-    // correctly
-    bool operator<(const SlotData &Rhs) const { return Offset > Rhs.Offset; }
+    // We use this to sort in reverse order, so that the layout is displayed
+    // correctly. Scalable slots are sorted to the end of the list.
+    bool operator<(const SlotData &Rhs) const {
+      return std::make_tuple(!Scalable, Offset) >
+             std::make_tuple(!Rhs.Scalable, Rhs.Offset);
+    }
   };
 
   StackFrameLayoutAnalysisPass() : MachineFunctionPass(ID) {}
@@ -153,7 +159,7 @@ struct StackFrameLayoutAnalysisPass : public MachineFunctionPass {
     Rem << Prefix << ore::NV("Offset", D.Offset)
         << "], Type: " << ore::NV("Type", getTypeString(D.SlotTy))
         << ", Align: " << ore::NV("Align", D.Align)
-        << ", Size: " << ore::NV("Size", D.Size);
+        << ", Size: " << ore::NV("Size", ElementCount::get(D.Size, D.Scalable));
   }
 
   void emitSourceLocRemark(const MachineFunction &MF, const DILocalVariable *N,
diff --git a/llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll b/llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll
index 67c1486b7a8c0..34d85d1f76086 100644
--- a/llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll
+++ b/llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll
@@ -4,10 +4,10 @@
 
 ; CHECK-FRAMELAYOUT-LABEL: Function: csr_d8_allocnxv4i32i32f64
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-8], Type: Spill, Align: 8, Size: 8
-; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Variable, Align: 16, Size: 16
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Spill, Align: 8, Size: 8
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-20], Type: Variable, Align: 4, Size: 4
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-32], Type: Variable, Align: 8, Size: 8
+; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Variable, Align: 16, Size: vscale x 16
 
 define i32 @csr_d8_allocnxv4i32i32f64(double %d) "aarch64_pstate_sm_compatible" {
 ; CHECK-LABEL: csr_d8_allocnxv4i32i32f64:
@@ -46,11 +46,11 @@ entry:
 
 ; CHECK-FRAMELAYOUT-LABEL: Function: csr_d8_allocnxv4i32i32f64_fp
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-8], Type: Spill, Align: 8, Size: 8
-; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Variable, Align: 16, Size: 16
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Spill, Align: 8, Size: 8
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-20], Type: Variable, Align: 4, Size: 4
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-32], Type: Spill, Align: 16, Size: 8
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-40], Type: Variable, Align: 8, Size: 8
+; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Variable, Align: 16, Size: vscale x 16
 
 define i32 @csr_d8_allocnxv4i32i32f64_fp(double %d) "aarch64_pstate_sm_compatible" "frame-pointer"="all" {
 ; CHECK-LABEL: csr_d8_allocnxv4i32i32f64_fp:
@@ -92,11 +92,11 @@ entry:
 
 ; CHECK-FRAMELAYOUT-LABEL: Function: svecc_z8_allocnxv4i32i32f64_fp
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-8], Type: Spill, Align: 8, Size: 8
-; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Spill, Align: 16, Size: 16
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Spill, Align: 8, Size: 8
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-20], Type: Variable, Align: 4, Size: 4
-; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-32], Type: Variable, Align: 16, Size: 16
 ; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-32], Type: Variable, Align: 8, Size: 8
+; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-16], Type: Spill, Align: 16, Size: vscale x 16
+; CHECK-FRAMELAYOUT-NEXT: Offset: [SP-32], Type: Variable, Align: 16, Size: vscale x 16
 
 define i32 @svecc_z8_allocnxv4i32i32f64_fp(double %d, <vscale x 4 x i32> %v) "aarch64_pstate_sm_compatible" "frame-pointer"="all" {
 ; CHECK-LABEL: svecc_z8_allocnxv4i32i32f64_fp:

Copy link
Contributor

@hazzlim hazzlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Contributor

@ilovepi ilovepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for catching this, I hadn’t realized we were handling scalable vector differently. I’ll see if I can find a way to sort out the offsets being incorrect.

@davemgreen
Copy link
Collaborator Author

I’ll see if I can find a way to sort out the offsets being incorrect.

Thanks folks. I'd be interested if there was a good way to handle this. Unfortunately we might need to invent something. The AArch64 stack frame can have scalable and fixed-width objects combined in its layout, where some parts are fp-based and others are sp. I don't know if there is a way at the moment to get the real offset and was wondering what the best way to add it might be.

@ilovepi
Copy link
Contributor

ilovepi commented Jul 22, 2024

I’ll see if I can find a way to sort out the offsets being incorrect.

Thanks folks. I'd be interested if there was a good way to handle this. Unfortunately we might need to invent something. The AArch64 stack frame can have scalable and fixed-width objects combined in its layout, where some parts are fp-based and others are sp. I don't know if there is a way at the moment to get the real offset and was wondering what the best way to add it might be.

Yeah, I think this will probably take some surgery to work out. I know the DWARF for RISC-V vector spills is pretty complicated, too, so I doubt it will be simple to fix.

I’m probably going to have to rethink a few assumptions this pass made. I wonder if BOLT has some tricks for this kind of thing? Anyway, lots to think about.

@davemgreen
Copy link
Collaborator Author

The test I pre-committed for this earlier is acting differently on different bots due to the sort order of the objects with identical offsets not being deterministic. Hopefully this should fix it as the objects will have different orders now.

@davemgreen davemgreen merged commit e09032f into llvm:main Jul 22, 2024
9 checks passed
@davemgreen davemgreen deleted the gh-stackframelayout-scalable branch July 22, 2024 19:45
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
The existing StackFrameLayoutAnalysis details do not do well with
Scalable vector stack slots, which are not marked as scalable and
intertwined with the other fixed-size slots. This patch adds some very
basic support, marking them as scalable and sorting them to the end of
the list. The slot addresses are not really correct (for fixed as well
as scalable), but this prints something a little better with the limited
information curently available.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251075
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants