[RISCV] Mark {vl, vtype} as clobber in inline assembly #128636


Conversation

HankChang736
Contributor

This patch uses the getClobbers() hook in the RISC-V target to mark {vl, vtype} as clobbered, preventing the Post-RA scheduler from moving vsetvl across inline assembly.

Fixing #97794.
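For context, a minimal reproducer sketch in the spirit of #97794 (hypothetical code, not taken from the patch): the asm statement relies on the vl/vtype configured by the preceding vsetvl, but since the constraint string does not mention vl/vtype, nothing stops the Post-RA scheduler from moving a later vsetvli above the asm block.

#include <riscv_vector.h>

void saxpy_asm(float *y, const float *x, float a, size_t n) {
  size_t vl = __riscv_vsetvl_e32m1(n);             // configures vl/vtype
  vfloat32m1_t vx = __riscv_vle32_v_f32m1(x, vl);
  vfloat32m1_t vy = __riscv_vle32_v_f32m1(y, vl);
  // Relies on the current vl/vtype but declares no vl/vtype dependency.
  asm volatile("vfmacc.vf %0, %1, %2" : "+vr"(vy) : "f"(a), "vr"(vx));
  __riscv_vse32_v_f32m1(y, vy, vl);
}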


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:RISC-V clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Feb 25, 2025
@llvmbot
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-risc-v

Author: Hank Chang (HankChang736)

Changes

This patch uses the getClobbers() hook in the RISC-V target to mark {vl, vtype} as clobbered, preventing the Post-RA scheduler from moving vsetvl across inline assembly.

Fixing #97794.


Full diff: https://github.com/llvm/llvm-project/pull/128636.diff

6 Files Affected:

  • (modified) clang/lib/Basic/Targets/RISCV.h (+1-1)
  • (modified) clang/test/CodeGen/RISCV/riscv-inline-asm-clobber.c (+36-12)
  • (modified) clang/test/CodeGen/RISCV/riscv-inline-asm-rvv.c (+4-4)
  • (modified) clang/test/CodeGen/RISCV/riscv-inline-asm.c (+21-21)
  • (added) llvm/test/CodeGen/RISCV/rvv/vsetvl-cross-inline-asm-clobber.ll (+37)
  • (added) llvm/test/CodeGen/RISCV/rvv/vsetvl-cross-inline-asm.ll (+37)
diff --git a/clang/lib/Basic/Targets/RISCV.h b/clang/lib/Basic/Targets/RISCV.h
index c26aa19080162..5590aa9d03c75 100644
--- a/clang/lib/Basic/Targets/RISCV.h
+++ b/clang/lib/Basic/Targets/RISCV.h
@@ -68,7 +68,7 @@ class RISCVTargetInfo : public TargetInfo {
     return TargetInfo::VoidPtrBuiltinVaList;
   }
 
-  std::string_view getClobbers() const override { return ""; }
+  std::string_view getClobbers() const override { return "~{vl},~{vtype}"; }
 
   StringRef getConstraintRegister(StringRef Constraint,
                                   StringRef Expression) const override {
diff --git a/clang/test/CodeGen/RISCV/riscv-inline-asm-clobber.c b/clang/test/CodeGen/RISCV/riscv-inline-asm-clobber.c
index 8aa80386f205f..65cfc081fe26d 100644
--- a/clang/test/CodeGen/RISCV/riscv-inline-asm-clobber.c
+++ b/clang/test/CodeGen/RISCV/riscv-inline-asm-clobber.c
@@ -7,38 +7,62 @@
 
 // Test RISC-V specific clobbered registers in inline assembly.
 
-// CHECK-LABEL: define {{.*}} void @test_fflags
-// CHECK:    tail call void asm sideeffect "", "~{fflags}"()
+// CHECK-LABEL: define dso_local void @test_fflags(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void asm sideeffect "", "~{fflags},~{vl},~{vtype}"() #[[ATTR1:[0-9]+]], !srcloc [[META6:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_fflags(void) {
   asm volatile ("" :::"fflags");
 }
 
-// CHECK-LABEL: define {{.*}} void @test_frm
-// CHECK:    tail call void asm sideeffect "", "~{frm}"()
+// CHECK-LABEL: define dso_local void @test_frm(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void asm sideeffect "", "~{frm},~{vl},~{vtype}"() #[[ATTR1]], !srcloc [[META7:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_frm(void) {
   asm volatile ("" :::"frm");
 }
 
-// CHECK-LABEL: define {{.*}} void @test_vtype
-// CHECK:    tail call void asm sideeffect "", "~{vtype}"()
+// CHECK-LABEL: define dso_local void @test_vtype(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void asm sideeffect "", "~{vtype},~{vl},~{vtype}"() #[[ATTR1]], !srcloc [[META8:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_vtype(void) {
   asm volatile ("" :::"vtype");
 }
 
-// CHECK-LABEL: define {{.*}} void @test_vl
-// CHECK:    tail call void asm sideeffect "", "~{vl}"()
+// CHECK-LABEL: define dso_local void @test_vl(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void asm sideeffect "", "~{vl},~{vl},~{vtype}"() #[[ATTR1]], !srcloc [[META9:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_vl(void) {
   asm volatile ("" :::"vl");
 }
 
-// CHECK-LABEL: define {{.*}} void @test_vxsat
-// CHECK:    tail call void asm sideeffect "", "~{vxsat}"()
+// CHECK-LABEL: define dso_local void @test_vxsat(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void asm sideeffect "", "~{vxsat},~{vl},~{vtype}"() #[[ATTR1]], !srcloc [[META10:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_vxsat(void) {
   asm volatile ("" :::"vxsat");
 }
 
-// CHECK-LABEL: define {{.*}} void @test_vxrm
-// CHECK:    tail call void asm sideeffect "", "~{vxrm}"()
+// CHECK-LABEL: define dso_local void @test_vxrm(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void asm sideeffect "", "~{vxrm},~{vl},~{vtype}"() #[[ATTR1]], !srcloc [[META11:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_vxrm(void) {
   asm volatile ("" :::"vxrm");
 }
diff --git a/clang/test/CodeGen/RISCV/riscv-inline-asm-rvv.c b/clang/test/CodeGen/RISCV/riscv-inline-asm-rvv.c
index 879fb1238d83a..e9444c6a76d87 100644
--- a/clang/test/CodeGen/RISCV/riscv-inline-asm-rvv.c
+++ b/clang/test/CodeGen/RISCV/riscv-inline-asm-rvv.c
@@ -18,12 +18,12 @@ void test_v_reg() {
       :
       : "v1", "x1");
 // CHECK-LABEL: define{{.*}} @test_v_reg
-// CHECK: "~{v1},~{x1}"
+// CHECK: "~{v1},~{x1},~{vl},~{vtype}"
 }
 
 vint32m1_t test_vr(vint32m1_t a, vint32m1_t b) {
 // CHECK-LABEL: define{{.*}} @test_vr
-// CHECK: %0 = tail call <vscale x 2 x i32> asm sideeffect "vadd.vv $0, $1, $2", "=^vr,^vr,^vr"(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
+// CHECK: %0 = tail call <vscale x 2 x i32> asm sideeffect "vadd.vv $0, $1, $2", "=^vr,^vr,^vr,~{vl},~{vtype}"(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
   vint32m1_t ret;
   asm volatile ("vadd.vv %0, %1, %2" : "=vr"(ret) : "vr"(a), "vr"(b));
   return ret;
@@ -31,7 +31,7 @@ vint32m1_t test_vr(vint32m1_t a, vint32m1_t b) {
 
 vint32m1_t test_vd(vint32m1_t a, vint32m1_t b) {
 // CHECK-LABEL: define{{.*}} @test_vd
-// CHECK: %0 = tail call <vscale x 2 x i32> asm sideeffect "vadd.vv $0, $1, $2", "=^vd,^vd,^vd"(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
+// CHECK: %0 = tail call <vscale x 2 x i32> asm sideeffect "vadd.vv $0, $1, $2", "=^vd,^vd,^vd,~{vl},~{vtype}"(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
   vint32m1_t ret;
   asm volatile ("vadd.vv %0, %1, %2" : "=vd"(ret) : "vd"(a), "vd"(b));
   return ret;
@@ -39,7 +39,7 @@ vint32m1_t test_vd(vint32m1_t a, vint32m1_t b) {
 
 vbool1_t test_vm(vbool1_t a, vbool1_t b) {
 // CHECK-LABEL: define{{.*}} @test_vm
-// CHECK: %0 = tail call <vscale x 64 x i1> asm sideeffect "vmand.mm $0, $1, $2", "=^vm,^vm,^vm"(<vscale x 64 x i1> %a, <vscale x 64 x i1> %b)
+// CHECK: %0 = tail call <vscale x 64 x i1> asm sideeffect "vmand.mm $0, $1, $2", "=^vm,^vm,^vm,~{vl},~{vtype}"(<vscale x 64 x i1> %a, <vscale x 64 x i1> %b)
   vbool1_t ret;
   asm volatile ("vmand.mm %0, %1, %2" : "=vm"(ret) : "vm"(a), "vm"(b));
   return ret;
diff --git a/clang/test/CodeGen/RISCV/riscv-inline-asm.c b/clang/test/CodeGen/RISCV/riscv-inline-asm.c
index f2031e0adcbcb..1a8ccb7ea5a68 100644
--- a/clang/test/CodeGen/RISCV/riscv-inline-asm.c
+++ b/clang/test/CodeGen/RISCV/riscv-inline-asm.c
@@ -7,17 +7,17 @@
 
 long test_r(long x) {
 // CHECK-LABEL: define{{.*}} {{i64|i32}} @test_r(
-// CHECK: call {{i64|i32}} asm sideeffect "", "=r,r"({{i64|i32}} %{{.*}})
+// CHECK: call {{i64|i32}} asm sideeffect "", "=r,r,~{vl},~{vtype}"({{i64|i32}} %{{.*}})
   long ret;
   asm volatile ("" : "=r"(ret) : "r"(x));
-// CHECK: call {{i64|i32}} asm sideeffect "", "=r,r"({{i64|i32}} %{{.*}})
+// CHECK: call {{i64|i32}} asm sideeffect "", "=r,r,~{vl},~{vtype}"({{i64|i32}} %{{.*}})
   asm volatile ("" : "=r"(ret) : "r"(x));
   return ret;
 }
 
 long test_cr(long x) {
 // CHECK-LABEL: define{{.*}} {{i64|i32}} @test_cr(
-// CHECK: call {{i64|i32}} asm sideeffect "", "=^cr,^cr"({{i64|i32}} %{{.*}})
+// CHECK: call {{i64|i32}} asm sideeffect "", "=^cr,^cr,~{vl},~{vtype}"({{i64|i32}} %{{.*}})
   long ret;
   asm volatile ("" : "=cr"(ret) : "cr"(x));
   return ret;
@@ -27,9 +27,9 @@ float cf;
 double cd;
 void test_cf(float f, double d) {
 // CHECK-LABEL: define{{.*}} void @test_cf(
-// CHECK: call float asm sideeffect "", "=^cf,^cf"(float %{{.*}})
+// CHECK: call float asm sideeffect "", "=^cf,^cf,~{vl},~{vtype}"(float %{{.*}})
   asm volatile("" : "=cf"(cf) : "cf"(f));
-// CHECK: call double asm sideeffect "", "=^cf,^cf"(double %{{.*}})
+// CHECK: call double asm sideeffect "", "=^cf,^cf,~{vl},~{vtype}"(double %{{.*}})
   asm volatile("" : "=cf"(cd) : "cf"(d));
 }
 
@@ -40,7 +40,7 @@ typedef __int128_t double_xlen_t;
 #endif
 double_xlen_t test_R_wide_scalar(double_xlen_t p) {
 // CHECK-LABEL: define{{.*}} {{i128|i64}} @test_R_wide_scalar(
-// CHECK: call {{i128|i64}} asm sideeffect "", "=R,R"({{i128|i64}} %{{.*}})
+// CHECK: call {{i128|i64}} asm sideeffect "", "=R,R,~{vl},~{vtype}"({{i128|i64}} %{{.*}})
   double_xlen_t ret;
   asm volatile("" : "=R"(ret) : "R"(p));
   return ret;
@@ -48,7 +48,7 @@ double_xlen_t test_R_wide_scalar(double_xlen_t p) {
 
 double_xlen_t test_cR_wide_scalar(double_xlen_t p) {
 // CHECK-LABEL: define{{.*}} {{i128|i64}} @test_cR_wide_scalar(
-// CHECK: call {{i128|i64}} asm sideeffect "", "=^cR,^cR"({{i128|i64}} %{{.*}})
+// CHECK: call {{i128|i64}} asm sideeffect "", "=^cR,^cR,~{vl},~{vtype}"({{i128|i64}} %{{.*}})
   double_xlen_t ret;
   asm volatile("" : "=cR"(ret) : "cR"(p));
   return ret;
@@ -56,23 +56,23 @@ double_xlen_t test_cR_wide_scalar(double_xlen_t p) {
 
 void test_I(void) {
 // CHECK-LABEL: define{{.*}} void @test_I()
-// CHECK: call void asm sideeffect "", "I"(i32 2047)
+// CHECK: call void asm sideeffect "", "I,~{vl},~{vtype}"(i32 2047)
   asm volatile ("" :: "I"(2047));
-// CHECK: call void asm sideeffect "", "I"(i32 -2048)
+// CHECK: call void asm sideeffect "", "I,~{vl},~{vtype}"(i32 -2048)
   asm volatile ("" :: "I"(-2048));
 }
 
 void test_J(void) {
 // CHECK-LABEL: define{{.*}} void @test_J()
-// CHECK: call void asm sideeffect "", "J"(i32 0)
+// CHECK: call void asm sideeffect "", "J,~{vl},~{vtype}"(i32 0)
   asm volatile ("" :: "J"(0));
 }
 
 void test_K(void) {
 // CHECK-LABEL: define{{.*}} void @test_K()
-// CHECK: call void asm sideeffect "", "K"(i32 31)
+// CHECK: call void asm sideeffect "", "K,~{vl},~{vtype}"(i32 31)
   asm volatile ("" :: "K"(31));
-// CHECK: call void asm sideeffect "", "K"(i32 0)
+// CHECK: call void asm sideeffect "", "K,~{vl},~{vtype}"(i32 0)
   asm volatile ("" :: "K"(0));
 }
 
@@ -81,16 +81,16 @@ double d;
 void test_f(void) {
 // CHECK-LABEL: define{{.*}} void @test_f()
 // CHECK: [[FLT_ARG:%[a-zA-Z_0-9]+]] = load float, ptr @f
-// CHECK: call void asm sideeffect "", "f"(float [[FLT_ARG]])
+// CHECK: call void asm sideeffect "", "f,~{vl},~{vtype}"(float [[FLT_ARG]])
   asm volatile ("" :: "f"(f));
 // CHECK: [[FLT_ARG:%[a-zA-Z_0-9]+]] = load double, ptr @d
-// CHECK: call void asm sideeffect "", "f"(double [[FLT_ARG]])
+// CHECK: call void asm sideeffect "", "f,~{vl},~{vtype}"(double [[FLT_ARG]])
   asm volatile ("" :: "f"(d));
 }
 
 void test_A(int *p) {
 // CHECK-LABEL: define{{.*}} void @test_A(ptr noundef %p)
-// CHECK: call void asm sideeffect "", "*A"(ptr elementtype(i32) %p)
+// CHECK: call void asm sideeffect "", "*A,~{vl},~{vtype}"(ptr elementtype(i32) %p)
   asm volatile("" :: "A"(*p));
 }
 
@@ -98,9 +98,9 @@ extern int var, arr[2][2];
 struct Pair { int a, b; } pair;
 
 // CHECK-LABEL: test_s(
-// CHECK:         call void asm sideeffect "// $0 $1 $2", "s,s,s"(ptr nonnull @var, ptr nonnull getelementptr inbounds nuw (i8, ptr @arr, {{.*}}), ptr nonnull @test_s)
-// CHECK:         call void asm sideeffect "// $0", "s"(ptr nonnull getelementptr inbounds nuw (i8, ptr @pair, {{.*}}))
-// CHECK:         call void asm sideeffect "// $0 $1 $2", "S,S,S"(ptr nonnull @var, ptr nonnull getelementptr inbounds nuw (i8, ptr @arr, {{.*}}), ptr nonnull @test_s)
+// CHECK:         call void asm sideeffect "// $0 $1 $2", "s,s,s,~{vl},~{vtype}"(ptr nonnull @var, ptr nonnull getelementptr inbounds nuw (i8, ptr @arr, {{.*}}), ptr nonnull @test_s)
+// CHECK:         call void asm sideeffect "// $0", "s,~{vl},~{vtype}"(ptr nonnull getelementptr inbounds nuw (i8, ptr @pair, {{.*}}))
+// CHECK:         call void asm sideeffect "// $0 $1 $2", "S,S,S,~{vl},~{vtype}"(ptr nonnull @var, ptr nonnull getelementptr inbounds nuw (i8, ptr @arr, {{.*}}), ptr nonnull @test_s)
 void test_s(void) {
   asm("// %0 %1 %2" :: "s"(&var), "s"(&arr[1][1]), "s"(test_s));
   asm("// %0" :: "s"(&pair.b));
@@ -109,9 +109,9 @@ void test_s(void) {
 }
 
 // CHECK-LABEL: test_modifiers(
-// CHECK: call void asm sideeffect "// ${0:i} ${1:i}", "r,r"({{i32|i64}} %val, i32 37)
-// CHECK: call void asm sideeffect "// ${0:z} ${1:z}", "i,i"(i32 0, i32 1)
-// CHECK: call void asm sideeffect "// ${0:N}", "r"({{i32|i64}} %val)
+// CHECK: call void asm sideeffect "// ${0:i} ${1:i}", "r,r,~{vl},~{vtype}"({{i32|i64}} %val, i32 37)
+// CHECK: call void asm sideeffect "// ${0:z} ${1:z}", "i,i,~{vl},~{vtype}"(i32 0, i32 1)
+// CHECK: call void asm sideeffect "// ${0:N}", "r,~{vl},~{vtype}"({{i32|i64}} %val)
 void test_modifiers(long val) {
   asm volatile("// %i0 %i1" :: "r"(val), "r"(37));
   asm volatile("// %z0 %z1" :: "i"(0), "i"(1));
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsetvl-cross-inline-asm-clobber.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvl-cross-inline-asm-clobber.ll
new file mode 100644
index 0000000000000..75d8b4c5437b1
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvl-cross-inline-asm-clobber.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 -mcpu=sifive-x280 -verify-machineinstrs < %s | FileCheck %s
+
+define void @foo(<vscale x 8 x half> %0, <vscale x 8 x half> %1) {
+; CHECK-LABEL: foo:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vmv.v.i v12, 0
+; CHECK-NEXT:    lui a0, 1
+; CHECK-NEXT:    addiw a0, a0, -1096
+; CHECK-NEXT:    vmv.v.i v16, 0
+; CHECK-NEXT:    vsetvli zero, a0, e8, m1, ta, ma
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    vfmadd.vv v16, v12, v12
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    vfmadd.vv v16, v12, v12
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    vse16.v v8, (zero)
+; CHECK-NEXT:    ret
+entry:
+  %2 = tail call i64 @llvm.riscv.vsetvli.i64(i64 3000, i64 0, i64 0)
+  %3 = tail call <vscale x 8 x float> asm sideeffect "vfmadd.vv $0, $1, $2", "=^vr,^vr,^vr,0,~{vl},~{vtype}"(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer)
+  %4 = tail call <vscale x 8 x float> asm sideeffect "vfmadd.vv $0, $1, $2", "=^vr,^vr,^vr,0,~{vl},~{vtype}"(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, <vscale x 8 x float> %3)
+  tail call void @llvm.riscv.vse.nxv8f16.i64(<vscale x 8 x half> %0, ptr null, i64 %2)
+  ret void
+}
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare i64 @llvm.riscv.vsetvli.i64(i64, i64 immarg, i64 immarg) #0
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: write)
+declare void @llvm.riscv.vse.nxv8f16.i64(<vscale x 8 x half>, ptr nocapture, i64) #1
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: write) }
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsetvl-cross-inline-asm.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvl-cross-inline-asm.ll
new file mode 100644
index 0000000000000..e60ce90d5ed8b
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvl-cross-inline-asm.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 -mcpu=sifive-x280 -verify-machineinstrs < %s | FileCheck %s
+
+define void @foo(<vscale x 8 x half> %0, <vscale x 8 x half> %1) {
+; CHECK-LABEL: foo:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vmv.v.i v12, 0
+; CHECK-NEXT:    lui a0, 1
+; CHECK-NEXT:    addiw a0, a0, -1096
+; CHECK-NEXT:    vmv.v.i v16, 0
+; CHECK-NEXT:    vsetvli zero, a0, e8, m1, ta, ma
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    vfmadd.vv v16, v12, v12
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    vfmadd.vv v16, v12, v12
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    vse16.v v8, (zero)
+; CHECK-NEXT:    ret
+entry:
+  %2 = tail call i64 @llvm.riscv.vsetvli.i64(i64 3000, i64 0, i64 0)
+  %3 = tail call <vscale x 8 x float> asm sideeffect "vfmadd.vv $0, $1, $2", "=^vr,^vr,^vr,0"(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer)
+  %4 = tail call <vscale x 8 x float> asm sideeffect "vfmadd.vv $0, $1, $2", "=^vr,^vr,^vr,0"(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, <vscale x 8 x float> %3)
+  tail call void @llvm.riscv.vse.nxv8f16.i64(<vscale x 8 x half> %0, ptr null, i64 %2)
+  ret void
+}
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare i64 @llvm.riscv.vsetvli.i64(i64, i64 immarg, i64 immarg) #0
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: write)
+declare void @llvm.riscv.vse.nxv8f16.i64(<vscale x 8 x half>, ptr nocapture, i64) #1
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: write) }

@topperc topperc requested review from preames and wangpc-pp February 25, 2025 06:02
@wangpc-pp
Contributor

I think this may not be the right way.

  1. We should at least add vl/vtype to the clobbered registers only when the V extension is specified.
  2. The asm may not depend on vl/vtype, but adding vl/vtype dependencies unconditionally stops further scheduling.

@HankChang736
Contributor Author

I think this may not be the right way.

1. We should at least add vl/vtype to the clobbered registers only when the V extension is specified.

2. The asm may not depend on vl/vtype, but adding vl/vtype dependencies unconditionally stops further scheduling.

Yes, you're right. But the purpose of adding the vl/vtype dependencies is to prevent the Post-RA scheduler from moving the vsetvl instruction across inline assembly. I'm not sure if there's a better approach to solving this problem.

@wangpc-pp wangpc-pp left a comment
Contributor

I see. But what's the difference if we add vl/vtype to the list of clobbered registers explicitly in C/C++ asm statements?

@@ -68,7 +68,7 @@ class RISCVTargetInfo : public TargetInfo {
     return TargetInfo::VoidPtrBuiltinVaList;
   }

-  std::string_view getClobbers() const override { return ""; }
+  std::string_view getClobbers() const override { return "~{vl},~{vtype}"; }
Contributor

Suggested change
-  std::string_view getClobbers() const override { return "~{vl},~{vtype}"; }
+  std::string_view getClobbers() const override {
+    if (ISAInfo->hasExtension("zve32x"))
+      return "~{vl},~{vtype}";
+    return "";
+  }

Contributor Author

I think this implementation is better; I'll just update the patch with your implementation.

Collaborator

That doesn't work with the target attribute.

@preames
Collaborator

preames commented Feb 25, 2025

But the purpose of adding the vl/vtype dependencies is to prevent the Post-RA scheduler from moving the vsetvl instruction across inline assembly. I'm not sure if there's a better approach to solving this problem.

Maybe have RISCVInsertVSETVLI add implicit use operands to the inline assembly at that point? It's what we do for all the vector instructions. In this case, you might want implicit defs.

@mshockwave
Member

I think what @wangpc-pp advocated here (please correct me if I'm wrong) was that users should be responsible for annotating these registers as clobbered, so that we have more freedom in scheduling when the inline assembly does not use any vector instructions. The other approaches -- whether marking the registers as clobbers or adding implicit-defs of them in RISCVInsertVSETVLI -- take a more conservative path to ensure correctness.

It's a shame that we're not able to analyze the instructions within inline assembly to make a better decision. Personally, I think we should prioritize correctness, so I'm more inclined toward the latter approaches. I do agree that we should probably only do this when vector instructions are present, as pointed out in one of @wangpc-pp's review comments.

@wangpc-pp
Contributor

wangpc-pp commented Feb 26, 2025

I think what @wangpc-pp advocated here (please correct me if I'm wrong) was that users should be responsible for annotating these registers as clobbered, so that we have more freedom in scheduling when the inline assembly does not use any vector instructions. The other approaches -- whether marking the registers as clobbers or adding implicit-defs of them in RISCVInsertVSETVLI -- take a more conservative path to ensure correctness.

Yes, this is exactly what I meant!

It's a shame that we're not able to analyze the instructions within inline assembly to make a better decision. Personally, I think we should prioritize correctness, so I'm more inclined toward the latter approaches. I do agree that we should probably only do this when vector instructions are present, as pointed out in one of @wangpc-pp's review comments.

I am not opposed to this. But I have to say, the use case is not that common: we don't usually mix asm statements containing RVV instructions with RVV intrinsics in the same code. In that scenario, one should either write all the RVV instructions in asm statements or just use RVV intrinsics. Even in these rare cases, the user can achieve the same effect by putting vl/vtype into the clobbered registers explicitly (see the sketch below).
And let me quote @kito-cheng's comments in #97794 (comment); there are some other approaches.
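For illustration, the user-side idiom being referred to could look like this (a hypothetical sketch; the variable names are made up). The asm statement itself declares that it may change vl/vtype, so the compiler knows to re-establish them afterwards:

size_t avl = n, new_vl;
// Explicitly listing vl/vtype as clobbers, as supported by the clang tests
// in this PR (riscv-inline-asm-clobber.c).
asm volatile("vsetvli %0, %1, e32, m1, ta, ma"
             : "=r"(new_vl)
             : "r"(avl)
             : "vl", "vtype");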

@topperc
Collaborator

topperc commented Feb 26, 2025

Even in these rare cases, the user can achieve the same effect by putting vl/vtype into the clobbered registers explicitly.

They can, but will they? Clobbers are a relatively uncommon feature; I suspect many people don't know about them.

I think we may need documentation on how to use RVV with inline assembly. I've seen a lot of things that aren't technically correct but usually work.

For example, one asm statement per instruction, with hand-allocated vector registers and no clobbers. The vector registers aren't technically live between the statements as far as the compiler is concerned.
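Something like the following (a hypothetical sketch of the pattern described above, not code from this PR): v8 is hand-allocated and assumed to survive from one statement to the next, but neither asm declares it, so the compiler sees no dependency between them.

// Loads into v8, which is never declared as an output or clobber.
asm volatile("vle32.v v8, (%0)" :: "r"(src));
// Assumes v8 still holds the loaded data -- nothing guarantees that.
asm volatile("vse32.v v8, (%0)" :: "r"(dst));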

@@ -68,7 +68,11 @@ class RISCVTargetInfo : public TargetInfo {
     return TargetInfo::VoidPtrBuiltinVaList;
   }

-  std::string_view getClobbers() const override { return ""; }
+  std::string_view getClobbers() const override {
+    if (ISAInfo->hasExtension("zve32x"))
Collaborator

This doesn't work if zve32x is added by __attribute__((target("zve32x"))).

@HankChang736 HankChang736 Mar 7, 2025
Contributor Author

I just reverted to the original implementation.

@wangpc-pp
Contributor

I am not familiar with the target attribute implementation. Can we get the list of function features here and add the clobbers at:

// Add machine specific clobbers
std::string_view MachineClobbers = getTarget().getClobbers();
if (!MachineClobbers.empty()) {
  if (!Constraints.empty())
    Constraints += ',';
  Constraints += MachineClobbers;
}

cc @topperc @4vtomat
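A rough sketch of that idea (hypothetical, not part of this patch; it assumes the surrounding code can reach the current FunctionDecl as FD): query the per-function feature map at the point where machine clobbers are appended, so that features added via __attribute__((target(...))) are honored.

// Hypothetical sketch, not the actual patch.
llvm::StringMap<bool> FeatureMap;
CGM.getContext().getFunctionFeatureMap(FeatureMap, FD);
if (FeatureMap.lookup("zve32x")) {
  if (!Constraints.empty())
    Constraints += ',';
  Constraints += "~{vl},~{vtype}";
}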

@topperc
Collaborator

topperc commented Mar 5, 2025

I am not familiar with the target attribute implementation. Can we get the list of function features here and add the clobbers at:

// Add machine specific clobbers
std::string_view MachineClobbers = getTarget().getClobbers();
if (!MachineClobbers.empty()) {
  if (!Constraints.empty())
    Constraints += ',';
  Constraints += MachineClobbers;
}

cc @topperc @4vtomat

I'm not sure. But I don't understand why we need to check for Zve32x at all?

@wangpc-pp
Contributor

I am not familiar with the target attribute implementation. Can we get the list of function features here and add the clobbers at:

// Add machine specific clobbers
std::string_view MachineClobbers = getTarget().getClobbers();
if (!MachineClobbers.empty()) {
  if (!Constraints.empty())
    Constraints += ',';
  Constraints += MachineClobbers;
}

cc @topperc @4vtomat

I'm not sure. But I don't understand why we need to check for Zve32x at all?

See the previous discussion: it is weird to add vector CSRs to the clobbers if there is no vector support.

@topperc
Collaborator

topperc commented Mar 5, 2025

But the purpose of adding the vl/vtype dependencies is to prevent the Post-RA scheduler from moving the vsetvl instruction across inline assembly. I'm not sure if there's a better approach to solving this problem.

Maybe have RISCVInsertVSETVLI add implicit use operands to the inline assembly at that point? It's what we do for all the vector instructions. In this case, you might want implicit defs.

I think we should try this RISCVInsertVSETVLI idea.

@HankChang736
Contributor Author

I tested the following case without passing the 'v' extension in the Clang command line argument:

__attribute__((target("arch=rv32gcv_zve32x")))
void test_A(int *p) {
  asm volatile("" :: "A"(*p));
}

The generated LLVM IR result is:

; Function Attrs: nounwind
define dso_local void @test_A(ptr noundef %p) local_unnamed_addr #0 {
entry:
  tail call void asm sideeffect "", "*A"(ptr elementtype(i32) %p) #1, !srcloc !6
  ret void
}

attributes #0 = { nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv32" "target-features"="+32bit,+a,+c,+d,+f,+m,+relax,+v,+zaamo,+zalrsc,+zicsr,+zifencei,+zmmul,+zve32f,+zve32x,+zve64d,+zve64f,+zve64x,+zvl128b,+zvl32b,+zvl64b ... 

From this result, it appears that the target attribute does not have the intended effect in this case. Given that, perhaps we should keep the initial implementation for now and try the RISCVInsertVSETVLI approach as a future improvement.

cc @wangpc-pp @topperc

@wangpc-pp
Contributor

Agree, we should try the RISCVInsertVSETVLI approach.

@topperc topperc self-requested a review March 6, 2025 03:56
@topperc
Collaborator

topperc commented Mar 8, 2025

The RISCVInsertVSETVLI approach is probably just a matter of doing something like the following. We may need to check whether an implicit def already exists before adding one; I don't think addOperand checks for duplicates.

diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 7433603daff8..2247610c21ff 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -1531,6 +1531,13 @@ void RISCVInsertVSETVLI::emitVSETVLIs(MachineBasicBlock &MBB) {
                                               /*isImp*/ true));
     }
 
+    if (MI.isInlineAsm()) {
+      MI.addOperand(MachineOperand::CreateReg(RISCV::VL, /*isDef*/ true,
+                                              /*isImp*/ true));
+      MI.addOperand(MachineOperand::CreateReg(RISCV::VTYPE, /*isDef*/ true,
+                                              /*isImp*/ true));
+    }
+
     if (MI.isCall() || MI.isInlineAsm() ||
         MI.modifiesRegister(RISCV::VL, /*TRI=*/nullptr) ||
         MI.modifiesRegister(RISCV::VTYPE, /*TRI=*/nullptr))

@dzaima

dzaima commented Mar 8, 2025

Would there be some way to explicitly request not to clobber vl/vtype? An asm block is quite useful as an "optimization fence" on x86 and Arm NEON, but it would be sad if that necessarily meant extra vsetvl spam on RVV. (Currently clang doesn't do much optimization on RVV intrinsics, so there's not much chance for it to misoptimize, but presumably that may change; this is more significant for __attribute__((vector_size)) vectors, though using those with asm is non-trivial and generally not ideal on RVV; some quite contrived examples.)
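The "optimization fence" idiom in question might look like this on RVV (a hypothetical sketch): an empty asm statement that pins a value in a vector register. If every asm statement implicitly clobbers vl/vtype, the compiler would have to re-issue vsetvli after each such fence.

static inline vfloat32m1_t opt_fence(vfloat32m1_t v) {
  // Empty asm: emits no instructions, but acts as an optimization barrier.
  asm("" : "+vr"(v));
  return v;
}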

@HankChang736
Contributor Author

I created another pull request, #130733, implemented with the RISCVInsertVSETVLI approach, since it's a new approach.

HankChang736 added a commit that referenced this pull request Mar 13, 2025
#130733)

…sing inline assembly.
Fixing [#128636](#128636).

This patch has RISCVInsertVSETVLI add implicit use operands to inline
assembly; this approach was suggested by @preames and the implementation
I referenced is from @topperc. The purpose of adding the vl/vtype implicit
operands is to prevent the Post-RA scheduler from moving vsetvl across
inline assembly.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 13, 2025
…TVLI when u… (#130733)

…sing inline assembly.
Fixing [#128636](llvm/llvm-project#128636).

This patch has RISCVInsertVSETVLI add implicit use operands to inline
assembly; this approach was suggested by @preames and the implementation
I referenced is from @topperc. The purpose of adding the vl/vtype implicit
operands is to prevent the Post-RA scheduler from moving vsetvl across
inline assembly.
@HankChang736
Contributor Author

#130733 has been merged. This PR will be marked as closed.

frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025
llvm#130733)

…sing inline assembly.
Fixing [llvm#128636](llvm#128636).

This patch has RISCVInsertVSETVLI add implicit use operands to inline
assembly; this approach was suggested by @preames and the implementation
I referenced is from @topperc. The purpose of adding the vl/vtype implicit
operands is to prevent the Post-RA scheduler from moving vsetvl across
inline assembly.