-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Clang] [ARM] Ensure FPU Features are collected when using the Clang Assembler #134366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…Assembler Previously, FPU features were not collected when forming a list of features for the Assembler. This fixes a regression from 8fa0f0e, which caused VFPv4 to be unavailable if assembling for an armv7s-apple-darwin target. Co-authored-by: Martin Storsjö <[email protected]>
@llvm/pr-subscribers-backend-arm @llvm/pr-subscribers-clang Author: Martin Storsjö (mstorsjo) ChangesPreviously, FPU features were not collected when forming a list of features for the Assembler. This fixes a regression from 8fa0f0e, which caused VFPv4 to be unavailable if assembling for an armv7s-apple-darwin target. Full diff: https://github.com/llvm/llvm-project/pull/134366.diff 6 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index c4e82678949ff..2267f0d06e834 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -38,6 +38,9 @@ Potentially Breaking Changes
- Fix missing diagnostics for uses of declarations when performing typename access,
such as when performing member access on a '[[deprecated]]' type alias.
(#GH58547)
+- For ARM targets, when using cc1as, the features included in the selected CPU or
+ Arch's FPU are now loaded and utilized. If you wish not to use a specific feature,
+ this will need appending to the command line used.
C/C++ Language Potentially Breaking Changes
-------------------------------------------
@@ -440,6 +443,7 @@ X86 Support
Arm and AArch64 Support
^^^^^^^^^^^^^^^^^^^^^^^
+- For ARM targets, cc1as now considers the FPU's features for the selected CPU or Arch.
Android Support
^^^^^^^^^^^^^^^
diff --git a/clang/lib/Driver/ToolChains/Arch/ARM.cpp b/clang/lib/Driver/ToolChains/Arch/ARM.cpp
index e50cb3836f2c9..ff0e52aa285b1 100644
--- a/clang/lib/Driver/ToolChains/Arch/ARM.cpp
+++ b/clang/lib/Driver/ToolChains/Arch/ARM.cpp
@@ -679,20 +679,17 @@ llvm::ARM::FPUKind arm::getARMTargetFeatures(const Driver &D,
CPUArgFPUKind != llvm::ARM::FK_INVALID ? CPUArgFPUKind : ArchArgFPUKind;
(void)llvm::ARM::getFPUFeatures(FPUKind, Features);
} else {
- bool Generic = true;
- if (!ForAS) {
- std::string CPU = arm::getARMTargetCPU(CPUName, ArchName, Triple);
- if (CPU != "generic")
- Generic = false;
- llvm::ARM::ArchKind ArchKind =
- arm::getLLVMArchKindForARM(CPU, ArchName, Triple);
- FPUKind = llvm::ARM::getDefaultFPU(CPU, ArchKind);
- (void)llvm::ARM::getFPUFeatures(FPUKind, Features);
- }
+ std::string CPU = arm::getARMTargetCPU(CPUName, ArchName, Triple);
+ bool Generic = CPU == "generic";
if (Generic && (Triple.isOSWindows() || Triple.isOSDarwin()) &&
getARMSubArchVersionNumber(Triple) >= 7) {
FPUKind = llvm::ARM::parseFPU("neon");
(void)llvm::ARM::getFPUFeatures(FPUKind, Features);
+ } else {
+ llvm::ARM::ArchKind ArchKind =
+ arm::getLLVMArchKindForARM(CPU, ArchName, Triple);
+ FPUKind = llvm::ARM::getDefaultFPU(CPU, ArchKind);
+ (void)llvm::ARM::getFPUFeatures(FPUKind, Features);
}
}
diff --git a/clang/test/Driver/arm-fpu-selection.s b/clang/test/Driver/arm-fpu-selection.s
new file mode 100644
index 0000000000000..c40e0f5b25852
--- /dev/null
+++ b/clang/test/Driver/arm-fpu-selection.s
@@ -0,0 +1,32 @@
+// REQUIRES: arm-registered-target
+// Ensures that when targeting an ARM target with an Asm file, clang
+// collects the features from the FPU. This is critical in the
+// activation of NEON for supported targets. The Cortex-R52 will be
+// used and tested for VFP and NEON Support
+
+// RUN: %clang -target arm-none-eabi -mcpu=cortex-r52 -c %s -o /dev/null | count 0
+// RUN: %clang -target arm-none-eabi -mcpu=cortex-r52 -c %s -o /dev/null -### 2>&1 | FileCheck --check-prefix=CHECK-TARGET-FEATURES %s
+
+// Check that NEON and VFPV5 have been activated when using Cortex-R52 when using cc1as
+// CHECK-TARGET-FEATURES: "-target-feature" "+vfp2sp"
+// CHECK-TARGET-FEATURES: "-target-feature" "+vfp3"
+// CHECK-TARGET-FEATURES: "-target-feature" "+fp-armv8"
+// CHECK-TARGET-FEATURES: "-target-feature" "+fp-armv8d16"
+// CHECK-TARGET-FEATURES: "-target-feature" "+fp-armv8d16sp"
+// CHECK-TARGET-FEATURES: "-target-feature" "+fp-armv8sp"
+// CHECK-TARGET-FEATURES: "-target-feature" "+neon"
+
+ vadd.f32 s0, s1, s2
+ vadd.f64 d0, d1, d2
+ vcvt.u32.f32 s0, s0, #1
+ vcvt.u32.f64 d0, d0, #1
+ vcvtb.f32.f16 s0, s1
+ vcvtb.f64.f16 d0, s1
+ vfma.f32 s0, s1, s2
+ vfma.f64 d0, d1, d2
+ vcvta.u32.f32 s0, s1
+ vcvta.u32.f64 s0, d1
+ vadd.f32 q0, q1, q2
+ vcvt.f32.f16 q0, d1
+ vfma.f32 q0, q1, q2
+ vcvta.u32.f32 q0, q1
diff --git a/clang/test/Driver/armv7-default-neon.s b/clang/test/Driver/armv7-default-neon.s
new file mode 100644
index 0000000000000..1b72cd65590bf
--- /dev/null
+++ b/clang/test/Driver/armv7-default-neon.s
@@ -0,0 +1,8 @@
+// Ensure that we can assemble NEON by just specifying an armv7
+// Apple or Windows target.
+
+// REQUIRES: arm-registered-target
+// RUN: %clang -c -target armv7-apple-darwin -o /dev/null %s
+// RUN: %clang -c -target armv7-windows -o /dev/null %s
+
+vadd.i32 q0, q0, q0
diff --git a/clang/test/Driver/armv7s-default-vfpv4.s b/clang/test/Driver/armv7s-default-vfpv4.s
new file mode 100644
index 0000000000000..bb7d8e860c400
--- /dev/null
+++ b/clang/test/Driver/armv7s-default-vfpv4.s
@@ -0,0 +1,6 @@
+// Ensure that we can assemble VFPv4 by just specifying an armv7s target.
+
+// REQUIRES: arm-registered-target
+// RUN: %clang -c -target armv7s-apple-darwin -o /dev/null %s
+
+vfma.f32 q1, q2, q3
diff --git a/clang/test/Driver/armv8.1m.main.s b/clang/test/Driver/armv8.1m.main.s
index 8fc94cf772fae..a660e56c2d1e7 100644
--- a/clang/test/Driver/armv8.1m.main.s
+++ b/clang/test/Driver/armv8.1m.main.s
@@ -8,21 +8,21 @@
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+fp -o /dev/null %s 2>%t
# RUN: FileCheck --check-prefix=ERROR-V81M_FP < %t %s
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+nofp -o /dev/null %s 2>%t
-# RUN: FileCheck --check-prefix=ERROR-V81M_FP < %t %s
+# RUN: FileCheck --check-prefix=ERROR-V81M_NOFP < %t %s
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+fp.dp -o /dev/null %s 2>%t
# RUN: FileCheck --check-prefix=ERROR-V81M_FPDP < %t %s
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+nofp.dp -o /dev/null %s 2>%t
-# RUN: FileCheck --check-prefix=ERROR-V81M_FPDP < %t %s
+# RUN: FileCheck --check-prefix=ERROR-V81M_NOFPDP < %t %s
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+mve -o /dev/null %s 2>%t
# RUN: FileCheck --check-prefix=ERROR-V81M_MVE < %t %s
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+nomve -o /dev/null %s 2>%t
-# RUN: FileCheck --check-prefix=ERROR-V81M_MVE < %t %s
+# RUN: FileCheck --check-prefix=ERROR-V81M_NOMVE < %t %s
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+mve+fp -o /dev/null %s 2>%t
# RUN: FileCheck --check-prefix=ERROR-V81M_MVE_FP < %t %s
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+mve.fp -o /dev/null %s 2>%t
# RUN: FileCheck --check-prefix=ERROR-V81M_MVEFP < %t %s
# RUN: not %clang -c -target arm-none-none-eabi -march=armv8.1-m.main+nomve.fp -o /dev/null %s 2>%t
-# RUN: FileCheck --check-prefix=ERROR-V81M_MVEFP < %t %s
+# RUN: FileCheck --check-prefix=ERROR-V81M_NOMVEFP < %t %s
.syntax unified
.thumb
@@ -35,28 +35,41 @@ qadd r0, r1, r2
# ERROR-V8M: :[[@LINE-1]]:1: error
# ERROR-V81M: :[[@LINE-2]]:1: error
# ERROR-V81M_FP: :[[@LINE-3]]:1: error
-# ERROR-V81M_FPDP: :[[@LINE-4]]:1: error
+# ERROR-V81M_NOFP: :[[@LINE-4]]:1: error
+# ERROR-V81M_FPDP: :[[@LINE-5]]:1: error
+# ERROR-V81M_NOFPDP: :[[@LINE-6]]:1: error
+# ERROR-V81M_NOMVE: :[[@LINE-7]]:1: error
+# ERROR-V81M_NOMVEFP: :[[@LINE-8]]:1: error
vadd.f16 s0, s1, s2
# ERROR-V8M: :[[@LINE-1]]:1: error
-# ERROR-V81M: :[[@LINE-2]]:1: error
-# ERROR-V81M_DSP: :[[@LINE-3]]:1: error
-# ERROR-V81M_MVE: :[[@LINE-4]]:1: error
+# ERROR-V81M_NOFP: :[[@LINE-2]]:1: error
vabs.f32 s0, s1
-# ERROR-V8M: :[[@LINE-1]]:1: error
-# ERROR-V81M: :[[@LINE-2]]:1: error
-# ERROR-V81M_DSP: :[[@LINE-3]]:1: error
-# ERROR-V81M_MVE: :[[@LINE-4]]:1: error
+# ERROR-V81M_NOFP: :[[@LINE-1]]:1: error
-vcmp.f64 d0,d1
+vabs.s32 q0, q1
# ERROR-V8M: :[[@LINE-1]]:1: error
# ERROR-V81M: :[[@LINE-2]]:1: error
# ERROR-V81M_DSP: :[[@LINE-3]]:1: error
# ERROR-V81M_FP: :[[@LINE-4]]:1: error
-# ERROR-V81M_MVE: :[[@LINE-5]]:1: error
-# ERROR-V81M_MVE_FP: :[[@LINE-6]]:1: error
-# ERROR-V81M_MVEFP: :[[@LINE-7]]:1: error
+# ERROR-V81M_NOFP: :[[@LINE-5]]:1: error
+# ERROR-V81M_FPDP: :[[@LINE-6]]:1: error
+# ERROR-V81M_NOFPDP: :[[@LINE-7]]:1: error
+# ERROR-V81M_NOMVE: :[[@LINE-8]]:1: error
+# ERROR-V81M_NOMVEFP: :[[@LINE-9]]:1: error
+
+vcmp.f64 d0,d1
+# ERROR-V81M: :[[@LINE-1]]:1: error
+# ERROR-V81M_DSP: :[[@LINE-2]]:1: error
+# ERROR-V81M_FP: :[[@LINE-3]]:1: error
+# ERROR-V81M_NOFP: :[[@LINE-4]]:1: error
+# ERROR-V81M_NOFPDP: :[[@LINE-5]]:1: error
+# ERROR-V81M_MVE: :[[@LINE-6]]:1: error
+# ERROR-V81M_NOMVE: :[[@LINE-7]]:1: error
+# ERROR-V81M_MVE_FP: :[[@LINE-8]]:1: error
+# ERROR-V81M_MVEFP: :[[@LINE-9]]:1: error
+# ERROR-V81M_NOMVEFP: :[[@LINE-10]]:1: error
asrl r0, r1, r2
# ERROR-V8M: :[[@LINE-1]]:1: error
@@ -64,6 +77,9 @@ asrl r0, r1, r2
# ERROR-V81M_DSP: :[[@LINE-3]]:1: error
# ERROR-V81M_FP: :[[@LINE-4]]:1: error
# ERROR-V81M_FPDP: :[[@LINE-5]]:1: error
+# ERROR-V81M_NOFPDP: :[[@LINE-6]]:1: error
+# ERROR-V81M_NOMVE: :[[@LINE-7]]:1: error
+# ERROR-V81M_NOMVEFP: :[[@LINE-8]]:1: error
vcadd.i8 q0, q1, q2, #90
# ERROR-V8M: :[[@LINE-1]]:1: error
@@ -71,3 +87,6 @@ vcadd.i8 q0, q1, q2, #90
# ERROR-V81M_DSP: :[[@LINE-3]]:1: error
# ERROR-V81M_FP: :[[@LINE-4]]:1: error
# ERROR-V81M_FPDP: :[[@LINE-5]]:1: error
+# ERROR-V81M_NOFPDP: :[[@LINE-6]]:1: error
+# ERROR-V81M_NOMVE: :[[@LINE-7]]:1: error
+# ERROR-V81M_NOMVEFP: :[[@LINE-8]]:1: error
|
This is a smaller, self-isolated change split out from #130623 - in order to fix a regression from 8fa0f0e as noted by @anemet in #122095 (comment); in a form that hopefully can be considered for backporting to 20.x. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this would be breaking for code unintentionally relying on the features being omitted, but it makes sense to fix it nevertheless. I would have expected us to add the features all along.
What does gcc/binutils do in this scenario? I don't think it would change the decision here, but curious anyway.
@@ -38,6 +38,9 @@ Potentially Breaking Changes | |||
- Fix missing diagnostics for uses of declarations when performing typename access, | |||
such as when performing member access on a '[[deprecated]]' type alias. | |||
(#GH58547) | |||
- For ARM targets, when using cc1as, the features included in the selected CPU or | |||
Arch's FPU are now loaded and utilized. If you wish not to use a specific feature, | |||
this will need appending to the command line used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Arch's" -> "Architecture's"
this will need appending to the command line used.
"this" seems to refer to "specific feature" but what you'd want to append is the option to disable that feature. Perhaps instead:
"If you wish not to use a specific feature, an option to disable it will need appending to the command line used."
And we assume folks know, or can find out, that "+no..." is probably what they want.
Regardless of whether this is backportable or not (it probably isn't), it's probably nicer for future digging into the history to land this as a separate split out part from the rest of #130623 (which currently changes quite a lot of different things), as this is mostly an independent step required by the rest of that PR. But as it's @Stylie777 driving that change, it probably makes most sense if he takes over this PR, or we close it and he opens a separate one under his own name. |
I think we close it and I will open a new PR under my name next week. |
I was going to suggest in #130623 that we undid this part of the change and made it an NFC except for +[no]simd. But after looking at it this morning.. I wasn't sure. It at least it deserved to be its own change :) I believe that if it used I would expect that for the same options, assembly and C compiles would have the same features enabled. That sounds like a sensible way for a compiler to behave. But I would expect the preprocessor and assembler to have the same features when compiling the same file but that doesn't appear to always happen at the moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding my review here so it can be tracked. I will make the changes in the PR I will open today @mstorsjo so these changes can be merged.
|
||
// REQUIRES: arm-registered-target | ||
// RUN: %clang -c -target armv7-apple-darwin -o /dev/null %s | ||
// RUN: %clang -c -target armv7-windows -o /dev/null %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test, and armv7s-default-vfpv4.s
, we should also test to ensure the expected -target-feature
arguments are passed to cc1as
by the Driver. That way we can ensure this change is having the desired effect. I did similar in arm-fpu-selection.s
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sure.
FWIW, the prime motivation behind all this, is to make sure that these testcases still compile successfully in this form, when downstream applies patches like https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/19/debian/patches/clang-arm-default-vfp3-on-armv7a.patch?ref_type=heads.
Previously, testing that was a bit hard as we didn't pass the -target-feature
flags when calling -cc1as
, but as we do now, we can definitely test that (and the actual RUN
when we try to assemble things, which isn't really kosher in clang/test/Driver
, perhaps even could be skipped - although it's nice to have for extra test coverage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the downstream patch you have listed there will need updating with the -target-feature
lines being checked as Armv7 defaults to NEON for the FPU. However I think for upstream consistency there should be there. If we feel the compilation step can be removed, then I think it's fair to do at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the downstream patch doesn't need to update these tests; the whole point of the code added in #122095 is to make sure that NEON still does get enabled by default for the windows/darwin targets. (I just retested building your PR 134612 together with that downstream patch, and those tests still pass.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes ok I am with you.
These changes were originally opened under llvm#134366 but were moved to this PR to allow me to drive further changes as I was the original author. This commit responds to comments made on the original PR.
I have created #134612 so I will now close this PR. @DavidSpickett your comments for the Release Notes have been worked on in the new PR. |
Previously, FPU features were not collected when forming a list of features for the Assembler.
This fixes a regression from 8fa0f0e, which caused VFPv4 to be unavailable if assembling for an armv7s-apple-darwin target.