Skip to content

[RISCV] Implement RISCVTargetLowering::getRoundingControlRegisters #139864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 19, 2025

Conversation

futog
Copy link
Contributor

@futog futog commented May 14, 2025

By adding FRM/FFLAGS as implicit defs, ReadFRM is not optimized out.

@llvmbot
Copy link
Member

llvmbot commented May 14, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Gergely Futo (futog)

Changes

ReadFRM should not be optimized out.


Full diff: https://github.com/llvm/llvm-project/pull/139864.diff

1 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+2)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index e9bdeb88e4ca8..88ee311b51543 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -2006,7 +2006,9 @@ class SwapSysRegImm<SysReg SR, list<Register> Regs>
   let Defs = Regs;
 }
 
+let hasSideEffects = true in
 def ReadFRM : ReadSysReg<SysRegFRM, [FRM]>;
+
 let hasPostISelHook = 1 in {
 def WriteFRM : WriteSysReg<SysRegFRM, [FRM]>;
 def WriteFRMImm : WriteSysRegImm<SysRegFRM, [FRM]>;

@futog futog requested a review from topperc May 14, 2025 09:22
Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a case to show the miscompile?

@futog
Copy link
Contributor Author

futog commented May 14, 2025

Do you have a case to show the miscompile?

Sure, here you go: https://godbolt.org/z/Kx1qsTeGY

@wangpc-pp
Copy link
Contributor

Do you have a case to show the miscompile?

Sure, here you go: https://godbolt.org/z/Kx1qsTeGY

Then please add a .ll test and precommit it.

@futog futog force-pushed the futog/riscv-readfrm branch from 240ec09 to d2515dd Compare May 14, 2025 15:07
@topperc
Copy link
Collaborator

topperc commented May 14, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

@futog
Copy link
Contributor Author

futog commented May 15, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

@tclin914
Copy link
Contributor

Should we mark frm not preserved across call?

@topperc
Copy link
Collaborator

topperc commented May 16, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

@futog
Copy link
Contributor Author

futog commented May 16, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

Oops I missed that if, sorry, but yes I also implemented RISCVTargetLowering::getRoundingControlRegisters() and machine-cse still reordered things.

@topperc
Copy link
Collaborator

topperc commented May 16, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

Oops I missed that if, sorry, but yes I also implemented RISCVTargetLowering::getRoundingControlRegisters() and machine-cse still reordered things.

Do you have a test case that shows machine-cse reordering things?

@futog
Copy link
Contributor Author

futog commented May 17, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

Oops I missed that if, sorry, but yes I also implemented RISCVTargetLowering::getRoundingControlRegisters() and machine-cse still reordered things.

Do you have a test case that shows machine-cse reordering things?

Sure, the test in the precommit have this (#139921).

This is the behavior without the patch:

# *** IR Dump After Verify generated machine code (machineverifier) ***:
# Machine code for function test_get_rounding_sideeffect: IsSSA, TracksLiveness

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %3:gpr = ADDI $x0, 1
  $x10 = COPY %3:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %5:gpr = ReadFRM implicit $frm
  %6:gpr = SLLIW killed %5:gpr, 2
  %7:gpr = LUI 66
  %8:gpr = ADDIW killed %7:gpr, 769
  %9:gpr = SRL killed %8:gpr, killed %6:gpr
  %10:gpr = ANDI killed %9:gpr, 7
  %11:gpr = COPY $x0
  %2:gpr = COPY %11:gpr
  BNE killed %10:gpr, $x0, %bb.2
  PseudoBR %bb.1

bb.1.if.end:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %12:gpr = COPY $x0
  $x10 = COPY %12:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %14:gpr = ReadFRM implicit $frm
  %15:gpr = SLLIW killed %14:gpr, 2
  %16:gpr = LUI 66
  %17:gpr = ADDIW killed %16:gpr, 769
  %18:gpr = SRL killed %17:gpr, killed %15:gpr
  %19:gpr = ANDI killed %18:gpr, 7
  %20:gpr = ADDI killed %19:gpr, -1
  %0:gpr = SLTIU killed %20:gpr, 1

bb.2.return:
; predecessors: %bb.0, %bb.1

  %1:gpr = PHI %2:gpr, %bb.0, %0:gpr, %bb.1
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

# End machine code for function test_get_rounding_sideeffect.
# *** IR Dump After Machine Common Subexpression Elimination (machine-cse) ***:
# Machine code for function test_get_rounding_sideeffect: IsSSA, TracksLiveness

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %3:gpr = ADDI $x0, 1
  $x10 = COPY %3:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %5:gpr = ReadFRM implicit $frm
  %6:gpr = SLLIW %5:gpr, 2
  %7:gpr = LUI 66
  %8:gpr = ADDIW %7:gpr, 769
  %9:gpr = SRL %8:gpr, %6:gpr
  %10:gpr = ANDI %9:gpr, 7
  %11:gpr = COPY $x0
  %2:gpr = COPY %11:gpr
  BNE %10:gpr, $x0, %bb.2
  PseudoBR %bb.1

bb.1.if.end:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %12:gpr = COPY $x0
  $x10 = COPY %12:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %20:gpr = ADDI %10:gpr, -1
  %0:gpr = SLTIU killed %20:gpr, 1

bb.2.return:
; predecessors: %bb.0, %bb.1

  %1:gpr = PHI %2:gpr, %bb.0, %0:gpr, %bb.1
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

# End machine code for function test_get_rounding_sideeffect.

@topperc
Copy link
Collaborator

topperc commented May 17, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

Oops I missed that if, sorry, but yes I also implemented RISCVTargetLowering::getRoundingControlRegisters() and machine-cse still reordered things.

Do you have a test case that shows machine-cse reordering things?

Sure, the test in the precommit have this (#139921).

This is the behavior without the patch:

# *** IR Dump After Verify generated machine code (machineverifier) ***:
# Machine code for function test_get_rounding_sideeffect: IsSSA, TracksLiveness

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %3:gpr = ADDI $x0, 1
  $x10 = COPY %3:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %5:gpr = ReadFRM implicit $frm
  %6:gpr = SLLIW killed %5:gpr, 2
  %7:gpr = LUI 66
  %8:gpr = ADDIW killed %7:gpr, 769
  %9:gpr = SRL killed %8:gpr, killed %6:gpr
  %10:gpr = ANDI killed %9:gpr, 7
  %11:gpr = COPY $x0
  %2:gpr = COPY %11:gpr
  BNE killed %10:gpr, $x0, %bb.2
  PseudoBR %bb.1

bb.1.if.end:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %12:gpr = COPY $x0
  $x10 = COPY %12:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %14:gpr = ReadFRM implicit $frm
  %15:gpr = SLLIW killed %14:gpr, 2
  %16:gpr = LUI 66
  %17:gpr = ADDIW killed %16:gpr, 769
  %18:gpr = SRL killed %17:gpr, killed %15:gpr
  %19:gpr = ANDI killed %18:gpr, 7
  %20:gpr = ADDI killed %19:gpr, -1
  %0:gpr = SLTIU killed %20:gpr, 1

bb.2.return:
; predecessors: %bb.0, %bb.1

  %1:gpr = PHI %2:gpr, %bb.0, %0:gpr, %bb.1
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

# End machine code for function test_get_rounding_sideeffect.
# *** IR Dump After Machine Common Subexpression Elimination (machine-cse) ***:
# Machine code for function test_get_rounding_sideeffect: IsSSA, TracksLiveness

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %3:gpr = ADDI $x0, 1
  $x10 = COPY %3:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %5:gpr = ReadFRM implicit $frm
  %6:gpr = SLLIW %5:gpr, 2
  %7:gpr = LUI 66
  %8:gpr = ADDIW %7:gpr, 769
  %9:gpr = SRL %8:gpr, %6:gpr
  %10:gpr = ANDI %9:gpr, 7
  %11:gpr = COPY $x0
  %2:gpr = COPY %11:gpr
  BNE %10:gpr, $x0, %bb.2
  PseudoBR %bb.1

bb.1.if.end:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %12:gpr = COPY $x0
  $x10 = COPY %12:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %20:gpr = ADDI %10:gpr, -1
  %0:gpr = SLTIU killed %20:gpr, 1

bb.2.return:
; predecessors: %bb.0, %bb.1

  %1:gpr = PHI %2:gpr, %bb.0, %0:gpr, %bb.1
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

# End machine code for function test_get_rounding_sideeffect.

The strictfp attribute is missing from the fesetround calls so the code I pasted from InstrEmitter.cpp that uses getRoundingControlRegisters doesn't fire.

@futog futog force-pushed the futog/riscv-readfrm branch from d2515dd to 73b1089 Compare May 17, 2025 07:57
@futog futog changed the title [RISCV] Add hasSideEffects = true to ReadFRM [RISCV] Implement RISCVTargetLowering::getRoundingControlRegisters May 17, 2025
@futog
Copy link
Contributor Author

futog commented May 17, 2025

updated the patch, thanks @topperc for your time and patience.

@futog futog force-pushed the futog/riscv-readfrm branch from 73b1089 to 3d808a0 Compare May 18, 2025 19:03
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

By adding FRM/FFLAGS as implicit defs, ReadFRM is not optimized out.
@futog futog force-pushed the futog/riscv-readfrm branch from 3d808a0 to bc2e640 Compare May 19, 2025 19:30
@futog futog merged commit e3b167c into llvm:main May 19, 2025
11 checks passed
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025
…lvm#139864)

By adding FRM/FFLAGS as implicit defs, ReadFRM is not optimized out.
ajaden-codes pushed a commit to Jaddyen/llvm-project that referenced this pull request Jun 6, 2025
…lvm#139864)

By adding FRM/FFLAGS as implicit defs, ReadFRM is not optimized out.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants