-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[libc] Extend fputil::sqrt to use floating point instructions for arm32. #134499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-libc Author: None (lntue) ChangesFull diff: https://github.com/llvm/llvm-project/pull/134499.diff 3 Files Affected:
diff --git a/libc/src/__support/FPUtil/aarch64/sqrt.h b/libc/src/__support/FPUtil/aarch64/sqrt.h
index b69267ff91f5c..cfd3579f621d0 100644
--- a/libc/src/__support/FPUtil/aarch64/sqrt.h
+++ b/libc/src/__support/FPUtil/aarch64/sqrt.h
@@ -12,8 +12,9 @@
#include "src/__support/common.h"
#include "src/__support/macros/config.h"
#include "src/__support/macros/properties/architectures.h"
+#include "src/__support/macros/properties/cpu_features.h"
-#if !defined(LIBC_TARGET_ARCH_IS_AARCH64)
+#if !defined(LIBC_TARGET_ARCH_IS_ANY_ARM)
#error "Invalid include"
#endif
@@ -22,17 +23,21 @@
namespace LIBC_NAMESPACE_DECL {
namespace fputil {
+#ifdef LIBC_TARGET_CPU_HAS_ARM_FPU_FLOAT
template <> LIBC_INLINE float sqrt<float>(float x) {
float y;
- __asm__ __volatile__("fsqrt %s0, %s1\n\t" : "=w"(y) : "w"(x));
+ asm("fsqrt %s0, %s1\n\t" : "=w"(y) : "w"(x));
return y;
}
+#endif // LIBC_TARGET_CPU_HAS_ARM_FPU_FLOAT
+#ifdef LIBC_TARGET_CPU_HAS_ARM_FPU_DOUBLE
template <> LIBC_INLINE double sqrt<double>(double x) {
double y;
- __asm__ __volatile__("fsqrt %d0, %d1\n\t" : "=w"(y) : "w"(x));
+ asm("fsqrt %d0, %d1\n\t" : "=w"(y) : "w"(x));
return y;
}
+#endif // LIBC_TARGET_CPU_HAS_ARM_FPU_DOUBLE
} // namespace fputil
} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/__support/FPUtil/sqrt.h b/libc/src/__support/FPUtil/sqrt.h
index eb86ddfa89d8e..ca9890600168f 100644
--- a/libc/src/__support/FPUtil/sqrt.h
+++ b/libc/src/__support/FPUtil/sqrt.h
@@ -14,7 +14,7 @@
#if defined(LIBC_TARGET_ARCH_IS_X86_64) && defined(LIBC_TARGET_CPU_HAS_SSE2)
#include "x86_64/sqrt.h"
-#elif defined(LIBC_TARGET_ARCH_IS_AARCH64)
+#elif defined(LIBC_TARGET_ARCH_IS_ANY_ARM)
#include "aarch64/sqrt.h"
#elif defined(LIBC_TARGET_ARCH_IS_ANY_RISCV)
#include "riscv/sqrt.h"
diff --git a/libc/src/__support/macros/properties/cpu_features.h b/libc/src/__support/macros/properties/cpu_features.h
index 1714775ca334d..47dd8a03613e7 100644
--- a/libc/src/__support/macros/properties/cpu_features.h
+++ b/libc/src/__support/macros/properties/cpu_features.h
@@ -42,18 +42,30 @@
#define LIBC_TARGET_CPU_HAS_AVX512BW
#endif
+#if defined(__ARM_FP)
+#if (__ARM_FP & 0x2)
+#define LIBC_TARGET_CPU_HAS_ARM_FPU_HALF
+#endif // LIBC_TARGET_CPU_HAS_ARM_FPU_HALF
+#if (__ARM_FP & 0x4)
+#define LIBC_TARGET_CPU_HAS_ARM_FPU_FLOAT
+#endif // LIBC_TARGET_CPU_HAS_ARM_FPU_FLOAT
+#if (__ARM_FP & 0x8)
+#define LIBC_TARGET_CPU_HAS_ARM_FPU_DOUBLE
+#endif // LIBC_TARGET_CPU_HAS_ARM_FPU_DOUBLE
+#endif // __ARM_FP
+
#if defined(__ARM_FEATURE_FMA) || (defined(__AVX2__) && defined(__FMA__)) || \
defined(__NVPTX__) || defined(__AMDGPU__) || defined(__LIBC_RISCV_USE_FMA)
#define LIBC_TARGET_CPU_HAS_FMA
// Provide a more fine-grained control of FMA instruction for ARM targets.
#if defined(__ARM_FP)
-#if (__ARM_FP & 0x2)
+#if defined(LIBC_TARGET_CPU_HAS_ARM_FPU_HALF)
#define LIBC_TARGET_CPU_HAS_FMA_HALF
#endif // LIBC_TARGET_CPU_HAS_FMA_HALF
-#if (__ARM_FP & 0x4)
+#if defined(LIBC_TARGET_CPU_HAS_ARM_FPU_FLOAT)
#define LIBC_TARGET_CPU_HAS_FMA_FLOAT
#endif // LIBC_TARGET_CPU_HAS_FMA_FLOAT
-#if (__ARM_FP & 0x8)
+#if defined(LIBC_TARGET_CPU_HAS_ARM_FPU_DOUBLE)
#define LIBC_TARGET_CPU_HAS_FMA_DOUBLE
#endif // LIBC_TARGET_CPU_HAS_FMA_DOUBLE
#else
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this approach better than using a builtin, like it's done for fma?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The __builtin_elementwise_sqrt
seems to work properly, but that's clang only. https://godbolt.org/z/3s9z6G1a9.
Updated to use |
Does it have to be element wise? Isn't this only required if the argument is a vector? |
Yes, |
I see. Can we move the implementation out of the platform-specific dirs? Something like https://github.com/llvm/llvm-project/blob/main/libc/src/__support/FPUtil/FMA.h but for sqrt? |
It's a bit surprising (and unexpected) for the 32-bit ARM implementation to be in a directory called |
Done. I also make riscv support to match arm on the current set up. |
Done. |
The AMD docs say that f32 sqrt has 1ULP and f64 has (2**29)ULP accuracy. Currently AMDGPU uses the builtins, so we could probably remove |
are |
I was talking with Tue yesterday about this. Does it mean that the FMA implementation is incorrect? I see we also have a |
By incorrect, you mean not respecting the errno requirements (sometime) then yes, I think that's the current behavior. And it needs to be fixed and clean up a bit. I'll need to separate fma used for complete fma functions, and fma used for internal computations. Right now they are a bit mixing with each other. |
Thanks for the explanation, and sorry for dragging this on. I thought we should be following the pattern used for fma. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
#if defined(LIBC_TARGET_CPU_HAS_FPU_FLOAT) || \ | ||
defined(LIBC_TARGET_CPU_HAS_FPU_DOUBLE) | ||
|
||
#if __has_builtin(__builtin_elementwise_sqrt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang will always evaluate this to 1
and if the target doesn't support this instruction natively it'll lower __builtin_elementwise_sqrt
to a call to sqrt
(or sqrtf
) which is a problem when __builtin_elementwise_sqrt
is used to implement sqrt
(or sqrtf
).
I think we'll need a CMake check that compiles __builtin_elementwise_sqrt
and checks if the resulting object file has a reference to sqrt
. https://github.com/llvm/llvm-project/blob/36cb81cced6cb9ab8a68a4313963d4ccf7669e76/compiler-rt/cmake/Modules/CheckSectionExists.cmake is an example of such a check. Let me know if you need help with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even __ARM_FP
flag does not guard this correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like __ARM_FP
is sufficient to check if sqrt
is supported by the target (at least based on my local testing) so ignore my comment.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/183/builds/12004 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/188/builds/12907 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/182/builds/9788 Here is the relevant piece of the build log for the reference
|
No description provided.