Skip to content

Adding support in llvm-exegesis for Aarch64 for handling FPR64/128, PPR16 and ZPR128 reg class. #127564

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Mar 6, 2025

Conversation

lakshayk-nv
Copy link
Contributor

@lakshayk-nv lakshayk-nv commented Feb 18, 2025

Current implementation (for Aarch64) in llvm-exegesis only supports GRP32 and GPR64 bit register class, thus for opcodes variants which used FPR64/128, PPR16 and ZPR128, llvm-exegesis throws warning "setReg is not implemented". This code will handle the above register class and initialize the registers using appropriate base instruction class.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Feb 18, 2025

@llvm/pr-subscribers-tools-llvm-exegesis

Author: None (lakshayk-nv)

Changes

Current implementation (for Aarch64) in llvm-exegesis only supports GRP32 and GPR64 bit register class, thus for opcodes variants which used FPR64/128, PPR16 and ZPR128, llvm-exegesis throws error "setReg is not implemented". This code will handle the above register class and initialize the registers using appropriate base instruction class.


Full diff: https://github.com/llvm/llvm-project/pull/127564.diff

1 Files Affected:

  • (modified) llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp (+55-1)
diff --git a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
index 5a7cc6f5e30d3..dc312f4916703 100644
--- a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
@@ -35,6 +35,48 @@ static MCInst loadImmediate(MCRegister Reg, unsigned RegBitWidth,
       .addImm(Value.getZExtValue());
 }
 
+static MCInst loadZPRImmediate(MCRegister Reg, unsigned RegBitWidth,
+                               const APInt &Value) {
+  if (Value.getBitWidth() > RegBitWidth)
+    llvm_unreachable("Value must fit in the ZPR Register");
+  // For ZPR, we typically use DUPM instruction to load immediate values
+  return MCInstBuilder(AArch64::DUPM_ZI)
+      .addReg(Reg)
+      .addImm(Value.getZExtValue());
+}
+
+static MCInst loadPPRImmediate(MCRegister Reg, unsigned RegBitWidth,
+                               const APInt &Value) {
+  if (Value.getBitWidth() > RegBitWidth)
+    llvm_unreachable("Value must fit in the PPR Register");
+  // For PPR, we typically use PTRUE instruction to set predicate registers
+  return MCInstBuilder(AArch64::PTRUE_B)
+      .addReg(Reg)
+      .addImm(31); // All lanes true
+}
+
+// Generates instruction to load an FP immediate value into a register.
+static unsigned getLoadFPImmediateOpcode(unsigned RegBitWidth) {
+  switch (RegBitWidth) {
+  case 64:
+    return AArch64::FMOVDi; 
+  case 128:
+    return AArch64::MOVIv2d_ns;
+  }
+  llvm_unreachable("Invalid Value Width");
+}
+
+
+// Generates instruction to load an FP immediate value into a register.
+static MCInst loadFPImmediate(MCRegister Reg, unsigned RegBitWidth,
+                            const APInt &Value) {
+  if (Value.getBitWidth() > RegBitWidth)
+    llvm_unreachable("Value must fit in the FP Register");
+  return MCInstBuilder(getLoadFPImmediateOpcode(RegBitWidth))
+      .addReg(Reg)
+      .addImm(Value.getZExtValue());
+}
+
 #include "AArch64GenExegesis.inc"
 
 namespace {
@@ -51,6 +93,18 @@ class ExegesisAArch64Target : public ExegesisTarget {
       return {loadImmediate(Reg, 32, Value)};
     if (AArch64::GPR64RegClass.contains(Reg))
       return {loadImmediate(Reg, 64, Value)};
+
+    if (AArch64::PPRRegClass.contains(Reg))
+      return {loadPPRImmediate(Reg, 16, Value)}; 
+
+    if (AArch64::FPR64RegClass.contains(Reg)) 
+      return {loadFPImmediate(Reg, 64, Value)};
+    if (AArch64::FPR128RegClass.contains(Reg)) 
+      return {loadFPImmediate(Reg, 128, Value)};
+
+    if (AArch64::ZPRRegClass.contains(Reg)) 
+      return {loadZPRImmediate(Reg, 128, Value)};
+    
     errs() << "setRegTo is not implemented, results will be unreliable\n";
     return {};
   }
@@ -77,4 +131,4 @@ void InitializeAArch64ExegesisTarget() {
 }
 
 } // namespace exegesis
-} // namespace llvm
+} // namespace llvm
\ No newline at end of file

@boomanaiden154 boomanaiden154 self-requested a review February 18, 2025 06:24
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thus for opcodes variants which used FPR64/128, PPR16 and ZPR128, llvm-exegesis throws error "setReg is not implemented"

I believe it is a warning - it still produces results when I've tried it in the past.

Can you add a test, and maybe fix the existing one for ADDXrr?

Copy link

github-actions bot commented Feb 18, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff d7784a649e80d346ba57ccc39f5e2a3e2f7e0a51 c21ee8bfa51ff7a51060c1048258bf9af086cf51 --extensions cpp -- llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
View the diff from clang-format here.
diff --git a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
index 781ac37bfc..d06059186c 100644
--- a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
@@ -57,7 +57,7 @@ static MCInst loadPPRImmediate(MCRegister Reg, unsigned RegBitWidth,
 static unsigned getLoadFPImmediateOpcode(unsigned RegBitWidth) {
   switch (RegBitWidth) {
   case 64:
-    return AArch64::MOVID; //FMOVDi;
+    return AArch64::MOVID; // FMOVDi;
   case 128:
     return AArch64::MOVIv2d_ns;
   }
@@ -67,8 +67,7 @@ static unsigned getLoadFPImmediateOpcode(unsigned RegBitWidth) {
 // Generates instruction to load an FP immediate value into a register.
 static MCInst loadFPImmediate(MCRegister Reg, unsigned RegBitWidth,
                               const APInt &Value) {
-  assert(Value.getZExtValue() == 0 &&
-         "Expected initialisation value 0");
+  assert(Value.getZExtValue() == 0 && "Expected initialisation value 0");
   return MCInstBuilder(getLoadFPImmediateOpcode(RegBitWidth))
       .addReg(Reg)
       .addImm(Value.getZExtValue());

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly style nits. We want to use assertions to check a value in most cases rather than the if + llvm_unreachable pattern used here as the former is more readable and canonical throughout the rest of LLVM.

@boomanaiden154
Copy link
Contributor

I believe it is a warning - it still produces results when I've tried it in the past.

It's a warning, but definitely one that should be fixed. Leaving whatever values were "live-in" to the registers upon starting the benchmark can cause a lot of problems depending upon the instructions being benchmarked. Having consistent values can be very important for benchmark reproducibility. Not familiar enough with value dependent variable latency instructions on ARM to say whether or not it would matter here, but good to fix it regardless.

@sjoerdmeijer
Copy link
Collaborator

About the warning, yes, it will produce a result, but:

  • the warning is annoying,
  • and for some instructions it might actually influence results.

Can you add a test, and maybe fix the existing one for ADDXrr?

Yep, we definitely need some tests.

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after addressing my last comment.

const APInt &Value) {
assert(Value.getBitWidth() <= RegBitWidth && "Value must fit in the PPR Register");
// For ZPR, we typically use DUPM instruction to load immediate values
return MCInstBuilder(AArch64::DUPM_ZI)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just use DUP and not DUPM? It is simpler to get the immediate correct. Both have a limit to the value they can represent. Can we add an assert that it will be encoded validly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZPR Reg Class seems to require DUPM.
DUP has only variants for 8 bits to 64 bits (DUPi8, DIPi16, DUPi32, DUPi64) which can't set a vector
For opcode FADDV_VPZ_D when tried to use base instruction DUPi64, It tries $z6 = DUPi64 0 which throws error $z6 is not a FPR64 register. .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use a DUP_ZI and check the value is -128 to 127? The instruction might then need an extra .addImm(0) to represent the shift.

It think dupm is causing it to be a different value, which it why it is coming through as dupm z{{[0-9]+}}.s, #0x1.

@sjoerdmeijer
Copy link
Collaborator

Here's an idea for the tests, for example:

$ bin/llvm-exegesis -mcpu=neoverse-v2 --mode=latency --opcode-name=ADDVv16i8v --dump-object-to-disk=t.o
$ bin/llvm-objdump -d t.o | head

t.o:    file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <foo>:
       0: 6f00e414      movi    v20.2d, #0000000000000000
       4: 4e31ba94      addv    b20, v20.16b
       8: 4e31ba94      addv    b20, v20.16b
       c: 4e31ba94      addv    b20, v20.16b

In other words, can we dump the object file, disassemble and then just match the assembly?
This allows us to CHECK: the MOVI instruction, which is a lot clearer than matching a hexidecimal string.

What do we think?

I

@@ -2,22 +2,30 @@

# ppr register class initialization testcase 
# ideally we should use PTRUE_{B/H?S/D} instead of FADDV_VPZ_D for isolated testcase; but exegesis does not support PTRUE_{B/H?S/D} yet;
# RUN: llvm-exegesis -mcpu=neoverse-v2 -mode=latency -opcode-name=FADDV_VPZ_D 2>&1 | FileCheck %s --check-prefix=PPR
# RUN: llvm-exegesis -mcpu=neoverse-v2 -mode=latency --dump-object-to-disk=FADDV_VPZ_D.o --opcode-name=FADDV_VPZ_D 2>&1 | FileCheck %s --check-prefix=PPR
# RUN: llvm-objdump -d FADDV_VPZ_D.o | FileCheck %s --check-prefix=PPR_ASM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a hardcoded file name here and above, you can use %t, so that will look like something like this:

   --dump-object-to-disk=%t 
   llvm-objdump -d %t 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, Thanks !

# PPR-NOT: setRegTo is not implemented, results will be unreliable
# PPR: assembled_snippet: {{.*}}C0035FD6
# PPR_ASM: {{0|4}}: {{.*}} ptrue p{{[0-9]|1[0-5]}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to match the file offset, you can just match the ptrue and probably simplify the regexp, something like this:

     ptrue p{{[0-9]+}}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, Thanks!

# ZPR-NOT: setRegTo is not implemented, results will be unreliable
# ZPR: assembled_snippet: {{.*}}C0035FD6
# ZPR_ASM: {{4|8}}: {{.*}} dupm z{{[0-9]|[1-2][0-9]|3[0-1]}}.s, #0x1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here and below, can you simplify the regexps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# FPR64-NOT: setRegTo is not implemented, results will be unreliable
# FPR64: assembled_snippet: {{.*}}C0035FD6
# FPR64-ASM: {{0|4}}: {{.*}} fmov d{{[0-9]|[1-2][0-9]|3[0-1]}}, #2.0{{.*}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the {{.*}} at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@lakshayk-nv
Copy link
Contributor Author

lakshayk-nv commented Feb 27, 2025

Here's an idea for the tests, for example:

$ bin/llvm-exegesis -mcpu=neoverse-v2 --mode=latency --opcode-name=ADDVv16i8v --dump-object-to-disk=t.o
$ bin/llvm-objdump -d t.o | head

t.o:    file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <foo>:
       0: 6f00e414      movi    v20.2d, #0000000000000000
       4: 4e31ba94      addv    b20, v20.16b
       8: 4e31ba94      addv    b20, v20.16b
       c: 4e31ba94      addv    b20, v20.16b

In other words, can we dump the object file, disassemble and then just match the assembly? This allows us to CHECK: the MOVI instruction, which is a lot clearer than matching a hexidecimal string.

What do we think?

I

Seems like good idea.
Added checks for disassembly, It checks for correct setting of register based on generated assembly.

# FPR128-ASM: {{0|4}}: {{.*}} movi v{{[0-9]|[1-2][0-9]|3[0-1]}}.2d, {{#0x0|#0000000000000000}}

So, now test case checks for setReg warning, return in assembly snippet and setting of register in assembly; for all four register classes.
Thanks.

…n permissible range of base instruction. Style nits changes to Testcases.
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updated. This looks OK to me baring some nits about the exact range of instructions.

Comment on lines 32 to 33
assert(Value.getZExtValue() < (1 << 16) &&
"Value must be in the range of the immediate opcode");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MOVi32imm and MOVi64imm are pseudo's that should be able to handle any immedated AFAIU. They should be expanded later into a series of MOVs if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, MOVi32imm and MOVi64imm are pseudo-instructions that decompose into the MOV instruction, such as mov w21, #0x0, as seen in disassembly using GPR32/64 for opcodes like LSRVWr.
The assertion primarily ensures codebase consistency. In practice, setRegTo is almost always used to set a register to zero.
Since the MOV instruction accepts a 16-bit immediate, the upper bound check is set to 2^16 accordingly. [Referred ISA doc here].

PS: I tried with larger value setting using the pseudo, it works for larger values than 2^16. In my opinion, we can skip the assert. What do you think ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah removing this assert sounds OK to me.

const APInt &Value) {
assert(Value.getBitWidth() <= RegBitWidth && "Value must fit in the PPR Register");
// For ZPR, we typically use DUPM instruction to load immediate values
return MCInstBuilder(AArch64::DUPM_ZI)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use a DUP_ZI and check the value is -128 to 127? The instruction might then need an extra .addImm(0) to represent the shift.

It think dupm is causing it to be a different value, which it why it is coming through as dupm z{{[0-9]+}}.s, #0x1.

static MCInst loadFPImmediate(MCRegister Reg, unsigned RegBitWidth,
const APInt &Value) {
// -31 <= Value.getZExtValue() <= 31
assert(Value.getZExtValue() <= 31 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this and loadFP128Immediate (for FMOVDi and MOVIv2d_ns) the immediate is encoded. For example MOVIv2d_ns is a vector of all ones for every element in the vector.

Is it possible to just check that the Value == 0 for the time being, and we can expand it in the future if we need to? And use MOVID instead of FMOVDi to make sure the value is 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use a DUP_ZI and check the value is -128 to 127?

Updated the loadZPRImmediate() to use DUP_ZI_D base instruction and range assert and corresponding disassembly check in test file.

just check that the Value == 0

Added assert for Value == 0 and used MOVID base instruction for FPR64, thus check is in disassembly is for 0 only.

Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert is not checking what the comment is suggesting. You could check the >= -31 condition too, but I am happy to keep it simple and just check the upperbound. As I wrote below too, the comment is then not adding anything to what is already in the code, so just get rid of the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value.getZExtValue() returns an unsigned integer since Value is of type APInt. Given this, the >= -31 check is not applicable. Removed the unnecessary comment, as suggested.

Thanks!

Copy link
Contributor Author

@lakshayk-nv lakshayk-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated base instructions for FPR64 and ZPR with corrected disassembly check in testfile

const APInt &Value) {
assert(Value.getBitWidth() <= RegBitWidth && "Value must fit in the PPR Register");
// For ZPR, we typically use DUPM instruction to load immediate values
return MCInstBuilder(AArch64::DUPM_ZI)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZPR Reg Class seems to require DUPM.
DUP has only variants for 8 bits to 64 bits (DUPi8, DIPi16, DUPi32, DUPi64) which can't set a vector
For opcode FADDV_VPZ_D when tried to use base instruction DUPi64, It tries $z6 = DUPi64 0 which throws error $z6 is not a FPR64 register. .

@@ -0,0 +1,45 @@
; REQUIRES: aarch64-registered-target

# PPR REGISTER CLASS INITIALIZATION TESTCASE 
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are working on your suggested changes.
We misclicked the re-request option (Mistaken it for reload). Apologies

@@ -0,0 +1,45 @@
; REQUIRES: aarch64-registered-target

# PPR REGISTER CLASS INITIALIZATION TESTCASE 
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Reverted capitalized (all char) comments.
  • Using ## for comments in testcases. Updated testcases.
    Thanks!

Comment on lines 32 to 33
assert(Value.getZExtValue() < (1 << 16) &&
"Value must be in the range of the immediate opcode");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, MOVi32imm and MOVi64imm are pseudo-instructions that decompose into the MOV instruction, such as mov w21, #0x0, as seen in disassembly using GPR32/64 for opcodes like LSRVWr.
The assertion primarily ensures codebase consistency. In practice, setRegTo is almost always used to set a register to zero.
Since the MOV instruction accepts a 16-bit immediate, the upper bound check is set to 2^16 accordingly. [Referred ISA doc here].

PS: I tried with larger value setting using the pseudo, it works for larger values than 2^16. In my opinion, we can skip the assert. What do you think ?

PPR_ASM: {{<foo>:}}
PPR_ASM: ptrue p{{[0-9]+}}.b
PPR_ASM-NEXT: dupm z{{[0-9]+}}.s, #0x1
PPR_ASM-NEXT: faddv d{{[0-9]+}}, p{{[0-9]+}}, z{{[0-9]+}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nitpick, but I think we are testing the same things with:

setRegTo is not implemented, results will be unreliable

and

PPR_ASM-NEXT: dupm z{{[0-9]+}}.s, #0x1

So I think we can just test one thing, and the best thing to check is the assembly.
I also don't know what this means or is testing:

PPR: assembled_snippet: {{.*}}C0035FD6

Here's my suggestion for this test just to clean this up, and with a bit of extra formatting and spacing to make it slightly more easy to read (in my opinion):

    ## PPR Register Class Initialization Testcase
    ## Ideally, we should use PTRUE_{B/H/S/D} instead of FADDV_VPZ_D for an isolated test case; however,             
    Exegesis does not yet support PTRUE_{B/H/S/D}.

    RUN: llvm-exegesis -mcpu=neoverse-v2 -mode=latency --dump-object-to-disk=%d --opcode- 
    name=FADDV_VPZ_D 2>&1 | FileCheck %s --check-prefix=PPR
    RUN: llvm-objdump -d %d > %t.s
    RUN: FileCheck %s --check-prefix=PPR_ASM < %t.s

    PPR_ASM:       {{<foo>:}}
    PPR_ASM:       ptrue p{{[0-9]+}}.b
    PPR_ASM-NEXT:  dupm z{{[0-9]+}}.s, #0x1
    PPR_ASM-NEXT:  faddv d{{[0-9]+}}, p{{[0-9]+}}, z{{[0-9]+}}

Can you give the other tests below a similar treatment?

Copy link
Contributor Author

@lakshayk-nv lakshayk-nv Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give the other tests below a similar treatment?

Cleaned up format in testfile. Only checking required assembly checks.

PPR: assembled_snippet: {{.*}}C0035FD6

No needed, Removed.
Just for reference, this checks for assembly snippet to end with return statement (C0035FD6 is return instruction for aarch64) i.e "Check that we add ret instr to snippet" as one of the exegesis existing test case used, took inspiration from there.

setRegTo is not implemented, results will be unreliable

and

PPR_ASM-NEXT: dupm z{{[0-9]+}}.s, #0x1

They check the same things using different commands. Thus removed.

Thanks!

static MCInst loadFPImmediate(MCRegister Reg, unsigned RegBitWidth,
const APInt &Value) {
// -31 <= Value.getZExtValue() <= 31
assert(Value.getZExtValue() <= 31 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert is not checking what the comment is suggesting. You could check the >= -31 condition too, but I am happy to keep it simple and just check the upperbound. As I wrote below too, the comment is then not adding anything to what is already in the code, so just get rid of the comment.

Copy link
Collaborator

@sjoerdmeijer sjoerdmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks reasonable as a first version and I know you'll continue to work on this, so LGTM.

Please merge this tomorrow morning if there are no objections.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the instructions, I think they look OK now.

I think it is worth removing the assert from loadImmediate, or setting it back to how it was, but otherwise this looks OK to me.

Comment on lines 32 to 33
assert(Value.getZExtValue() < (1 << 16) &&
"Value must be in the range of the immediate opcode");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah removing this assert sounds OK to me.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM

@sjoerdmeijer sjoerdmeijer merged commit d61d219 into llvm:main Mar 6, 2025
5 of 7 checks passed
Copy link

github-actions bot commented Mar 6, 2025

@lakshayk-nv Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 6, 2025

LLVM Buildbot has detected a new failure on builder clang-aarch64-quick running on linaro-clang-aarch64-quick while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/65/builds/13279

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'LLVM :: tools/llvm-exegesis/AArch64/setReg_init_check.s' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
llvm-exegesis error: benchmarking unavailable, LLVM was built without libpfm. You can pass --benchmark-phase=... to skip the actual benchmarking or --use-dummy-perf-counters to not query the kernel for real event counts.

--
Command Output (stderr):
--
RUN: at line 6: /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/bin/llvm-exegesis -mcpu=neoverse-v2 -mode=latency --dump-object-to-disk=%d --opcode-name=FADDV_VPZ_D 2>&1
+ /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/bin/llvm-exegesis -mcpu=neoverse-v2 -mode=latency --dump-object-to-disk=%d --opcode-name=FADDV_VPZ_D

--

********************


@sjoerdmeijer
Copy link
Collaborator

sjoerdmeijer commented Mar 6, 2025

I think that e.g. this run command tries to execute and run llvm-exegesis:

llvm-exegesis -mcpu=neoverse-v2 -mode=latency --dump-object-to-disk=%d --opcode-name=FADDV_VPZ_D 

But we don't want or need to run it, I think, we just want to prepare the snippet. Can you check if we achieve that with:

 --benchmark-phase=assemble-measured-code

or some other value to this option.

@sjoerdmeijer
Copy link
Collaborator

I think we need to add:

--benchmark-phase=assemble-measured-code

Please check whether this works locally, then you can quickly follow up to get the build green again.

lakshayk-nv added a commit to lakshayk-nv/llvm-project that referenced this pull request Mar 6, 2025
We don't need to run llvm-exegesis for the regression tests, just dump
the object code, so this adds option --benchmark-phase=assemble-measured-code
to achieve that.
sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Mar 6, 2025
We don't need to run llvm-exegesis for the regression tests, just dump
the object code, so this adds option --benchmark-phase=assemble-measured-code
to achieve that.
sjoerdmeijer added a commit that referenced this pull request Mar 6, 2025
We don't need to run llvm-exegesis for the regression tests, just dump
the object code, so this adds option
--benchmark-phase=assemble-measured-code to achieve that.
lakshayk-nv added a commit to lakshayk-nv/llvm-project that referenced this pull request Mar 6, 2025
lakshayk-nv added a commit to lakshayk-nv/llvm-project that referenced this pull request Mar 6, 2025
We don't need to run llvm-exegesis for the regression tests, just dump
the object code, so this adds option --benchmark-phase=assemble-measured-code
to achieve that.
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
…PR16 and ZPR128 reg class. (llvm#127564)

Current implementation (for Aarch64) in llvm-exegesis only supports
GRP32 and GPR64 bit register class, thus for opcodes variants which used
FPR64/128, PPR16 and ZPR128, llvm-exegesis throws warning "setReg is not
implemented". This code will handle the above register class and
initialize the registers using appropriate base instruction class.
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
We don't need to run llvm-exegesis for the regression tests, just dump
the object code, so this adds option
--benchmark-phase=assemble-measured-code to achieve that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants