-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[libc][math] Implement fast pass for double precision atan2 with 1 ULP errors. #100648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-libc Author: None (lntue) ChangesPatch is 26.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100648.diff 14 Files Affected:
diff --git a/libc/config/darwin/arm/entrypoints.txt b/libc/config/darwin/arm/entrypoints.txt
index 3b50666b968e8..38eace26f10ab 100644
--- a/libc/config/darwin/arm/entrypoints.txt
+++ b/libc/config/darwin/arm/entrypoints.txt
@@ -120,6 +120,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.acoshf
libc.src.math.asinf
libc.src.math.asinhf
+ libc.src.math.atan2
libc.src.math.atan2f
libc.src.math.atanf
libc.src.math.atanhf
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index 2334fed773702..b2c5341b39d27 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -343,6 +343,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.acoshf
libc.src.math.asinf
libc.src.math.asinhf
+ libc.src.math.atan2
libc.src.math.atan2f
libc.src.math.atanf
libc.src.math.atanhf
diff --git a/libc/config/linux/arm/entrypoints.txt b/libc/config/linux/arm/entrypoints.txt
index 61ee68ac66082..8e77105fdb13e 100644
--- a/libc/config/linux/arm/entrypoints.txt
+++ b/libc/config/linux/arm/entrypoints.txt
@@ -213,6 +213,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.acoshf
libc.src.math.asinf
libc.src.math.asinhf
+ libc.src.math.atan2
libc.src.math.atan2f
libc.src.math.atanf
libc.src.math.atanhf
diff --git a/libc/config/linux/riscv/entrypoints.txt b/libc/config/linux/riscv/entrypoints.txt
index 07466805b34cd..e3ed5a5988fed 100644
--- a/libc/config/linux/riscv/entrypoints.txt
+++ b/libc/config/linux/riscv/entrypoints.txt
@@ -365,6 +365,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.acoshf
libc.src.math.asinf
libc.src.math.asinhf
+ libc.src.math.atan2
libc.src.math.atan2f
libc.src.math.atanf
libc.src.math.atanhf
diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 035ceb8ca57bf..96f975520fe61 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -365,6 +365,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.acoshf
libc.src.math.asinf
libc.src.math.asinhf
+ libc.src.math.atan2
libc.src.math.atan2f
libc.src.math.atanf
libc.src.math.atanhf
diff --git a/libc/config/windows/entrypoints.txt b/libc/config/windows/entrypoints.txt
index b6aced83c5815..06c3682255c45 100644
--- a/libc/config/windows/entrypoints.txt
+++ b/libc/config/windows/entrypoints.txt
@@ -118,6 +118,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.acoshf
libc.src.math.asinf
libc.src.math.asinhf
+ libc.src.math.atan2
libc.src.math.atan2f
libc.src.math.atanf
libc.src.math.atanhf
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index e7db07f071a8c..9f88b4d9a44b3 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
@@ -260,7 +260,7 @@ Higher Math Functions
+-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
| atan | |check| | | | | | 7.12.4.3 | F.10.1.3 |
+-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
-| atan2 | |check| | | | | | 7.12.4.4 | F.10.1.4 |
+| atan2 | |check| | 1 ULP | | | | 7.12.4.4 | F.10.1.4 |
+-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
| atan2pi | | | | | | 7.12.4.11 | F.10.1.11 |
+-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
diff --git a/libc/spec/stdc.td b/libc/spec/stdc.td
index d5a5cb6fedb4b..9c84accd72cff 100644
--- a/libc/spec/stdc.td
+++ b/libc/spec/stdc.td
@@ -692,6 +692,7 @@ def StdC : StandardSpec<"stdc"> {
FunctionSpec<"atanf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
+ FunctionSpec<"atan2", RetValSpec<DoubleType>, [ArgSpec<DoubleType>, ArgSpec<DoubleType>]>,
FunctionSpec<"atan2f", RetValSpec<FloatType>, [ArgSpec<FloatType>, ArgSpec<FloatType>]>,
FunctionSpec<"acoshf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index 7280add7c8bb0..ff2e02878ffe4 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -3848,6 +3848,26 @@ add_entrypoint_object(
libc.src.__support.macros.optimization
)
+add_entrypoint_object(
+ atan2
+ SRCS
+ atan2.cpp
+ HDRS
+ ../atan2.h
+ COMPILE_OPTIONS
+ -O3
+ DEPENDS
+ .inv_trigf_utils
+ libc.src.__support.FPUtil.double_double
+ libc.src.__support.FPUtil.dyadic_float
+ libc.src.__support.FPUtil.fp_bits
+ libc.src.__support.FPUtil.multiply_add
+ libc.src.__support.FPUtil.nearest_integer
+ libc.src.__support.FPUtil.polyeval
+ libc.src.__support.FPUtil.rounding_mode
+ libc.src.__support.macros.optimization
+)
+
add_entrypoint_object(
scalblnf16
SRCS
diff --git a/libc/src/math/generic/atan2.cpp b/libc/src/math/generic/atan2.cpp
new file mode 100644
index 0000000000000..8d5717a7afec2
--- /dev/null
+++ b/libc/src/math/generic/atan2.cpp
@@ -0,0 +1,314 @@
+//===-- Double-precision atan2 function -----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/atan2.h"
+#include "inv_trigf_utils.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/FPUtil/rounding_mode.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace {
+
+using DoubleDouble = fputil::DoubleDouble;
+// using Float128 = fputil::DyadicFloat<128>;
+
+// atan(i/64) with i = 0..64, generated by Sollya with:
+// > for i from 0 to 64 do {
+// a = round(atan(i/64), D, RN);
+// b = round(atan(i/64) - a, D, RN);
+// print("{", b, ",", a, "},");
+// };
+constexpr fputil::DoubleDouble ATAN_I[65] = {
+ {0.0, 0.0},
+ {-0x1.220c39d4dff5p-61, 0x1.fff555bbb729bp-7},
+ {-0x1.5ec431444912cp-60, 0x1.ffd55bba97625p-6},
+ {-0x1.86ef8f794f105p-63, 0x1.7fb818430da2ap-5},
+ {-0x1.c934d86d23f1dp-60, 0x1.ff55bb72cfdeap-5},
+ {0x1.ac4ce285df847p-58, 0x1.3f59f0e7c559dp-4},
+ {-0x1.cfb654c0c3d98p-58, 0x1.7ee182602f10fp-4},
+ {0x1.f7b8f29a05987p-58, 0x1.be39ebe6f07c3p-4},
+ {-0x1.cd37686760c17p-59, 0x1.fd5ba9aac2f6ep-4},
+ {-0x1.b485914dacf8cp-59, 0x1.1e1fafb043727p-3},
+ {0x1.61a3b0ce9281bp-57, 0x1.3d6eee8c6626cp-3},
+ {-0x1.054ab2c010f3dp-58, 0x1.5c9811e3ec26ap-3},
+ {0x1.347b0b4f881cap-58, 0x1.7b97b4bce5b02p-3},
+ {0x1.cf601e7b4348ep-59, 0x1.9a6a8e96c8626p-3},
+ {0x1.17b10d2e0e5abp-61, 0x1.b90d7529260a2p-3},
+ {0x1.c648d1534597ep-57, 0x1.d77d5df205736p-3},
+ {0x1.8ab6e3cf7afbdp-57, 0x1.f5b75f92c80ddp-3},
+ {0x1.62e47390cb865p-56, 0x1.09dc597d86362p-2},
+ {0x1.30ca4748b1bf9p-57, 0x1.18bf5a30bf178p-2},
+ {-0x1.077cdd36dfc81p-56, 0x1.278372057ef46p-2},
+ {-0x1.963a544b672d8p-57, 0x1.362773707ebccp-2},
+ {-0x1.5d5e43c55b3bap-56, 0x1.44aa436c2af0ap-2},
+ {-0x1.2566480884082p-57, 0x1.530ad9951cd4ap-2},
+ {-0x1.a725715711fp-56, 0x1.614840309cfe2p-2},
+ {-0x1.c63aae6f6e918p-56, 0x1.6f61941e4def1p-2},
+ {0x1.69c885c2b249ap-56, 0x1.7d5604b63b3f7p-2},
+ {0x1.b6d0ba3748fa8p-56, 0x1.8b24d394a1b25p-2},
+ {0x1.9e6c988fd0a77p-56, 0x1.98cd5454d6b18p-2},
+ {-0x1.24dec1b50b7ffp-56, 0x1.a64eec3cc23fdp-2},
+ {0x1.ae187b1ca504p-56, 0x1.b3a911da65c6cp-2},
+ {-0x1.cc1ce70934c34p-56, 0x1.c0db4c94ec9fp-2},
+ {-0x1.a2cfa4418f1adp-56, 0x1.cde53432c1351p-2},
+ {0x1.a2b7f222f65e2p-56, 0x1.dac670561bb4fp-2},
+ {0x1.0e53dc1bf3435p-56, 0x1.e77eb7f175a34p-2},
+ {-0x1.a3992dc382a23p-57, 0x1.f40dd0b541418p-2},
+ {-0x1.b32c949c9d593p-55, 0x1.0039c73c1a40cp-1},
+ {-0x1.d5b495f6349e6p-56, 0x1.0657e94db30dp-1},
+ {0x1.974fa13b5404fp-58, 0x1.0c6145b5b43dap-1},
+ {-0x1.2bdaee1c0ee35p-58, 0x1.1255d9bfbd2a9p-1},
+ {0x1.c621cec00c301p-55, 0x1.1835a88be7c13p-1},
+ {-0x1.928df287a668fp-58, 0x1.1e00babdefeb4p-1},
+ {0x1.c421c9f38224ep-57, 0x1.23b71e2cc9e6ap-1},
+ {-0x1.09e73b0c6c087p-56, 0x1.2958e59308e31p-1},
+ {0x1.c5d5e9ff0cf8dp-55, 0x1.2ee628406cbcap-1},
+ {0x1.1021137c71102p-55, 0x1.345f01cce37bbp-1},
+ {-0x1.2304331d8bf46p-55, 0x1.39c391cd4171ap-1},
+ {0x1.ecf8b492644fp-56, 0x1.3f13fb89e96f4p-1},
+ {-0x1.f76d0163f79c8p-56, 0x1.445065b795b56p-1},
+ {0x1.2419a87f2a458p-56, 0x1.4978fa3269ee1p-1},
+ {0x1.4a33dbeb3796cp-55, 0x1.4e8de5bb6ec04p-1},
+ {-0x1.1bb74abda520cp-55, 0x1.538f57b89061fp-1},
+ {-0x1.5e5c9d8c5a95p-56, 0x1.587d81f732fbbp-1},
+ {0x1.0028e4bc5e7cap-57, 0x1.5d58987169b18p-1},
+ {-0x1.2b785350ee8c1p-57, 0x1.6220d115d7b8ep-1},
+ {-0x1.6ea6febe8bbbap-56, 0x1.66d663923e087p-1},
+ {-0x1.a80386188c50ep-55, 0x1.6b798920b3d99p-1},
+ {-0x1.8c34d25aadef6p-56, 0x1.700a7c5784634p-1},
+ {0x1.7b2a6165884a1p-59, 0x1.748978fba8e0fp-1},
+ {0x1.406a08980374p-55, 0x1.78f6bbd5d315ep-1},
+ {0x1.560821e2f3aa9p-55, 0x1.7d528289fa093p-1},
+ {-0x1.bf76229d3b917p-56, 0x1.819d0b7158a4dp-1},
+ {0x1.6b66e7fc8b8c3p-57, 0x1.85d69576cc2c5p-1},
+ {-0x1.55b9a5e177a1bp-55, 0x1.89ff5ff57f1f8p-1},
+ {-0x1.ec182ab042f61p-56, 0x1.8e17aa99cc05ep-1},
+ {0x1.1a62633145c07p-55, 0x1.921fb54442d18p-1},
+};
+
+// Approximate atan(x) for |x| <= 2^-7.
+// Using degree-9 Taylor polynomial:
+// P = x - x^3/3 + x^5/5 -x^7/7 + x^9/9;
+// Then the absolute error is bounded by:
+// |atan(x) - P(x)| < |x|^11/11 < 2^(-7*11) / 11 < 2^-80.
+// And the relative error is bounded by:
+// |(atan(x) - P(x))/atan(x)| < |x|^10 / 10 < 2^-73.
+// For x = x_hi + x_lo, fully expand the polynomial and drop any terms less than
+// ulp(x_hi^3 / 3) gives us:
+// P(x) ~ x_hi - x_hi^3/3 + x_hi^5/5 - x_hi^7/7 + x_hi^9/9 +
+// + x_lo * (1 - x_hi^2 + x_hi^4)
+DoubleDouble atan_eval(const DoubleDouble &x) {
+ DoubleDouble p;
+ p.hi = x.hi;
+ double x_hi_sq = x.hi * x.hi;
+ // c0 ~ x_hi^2 * 1/5 - 1/3
+ double c0 = fputil::multiply_add(x_hi_sq, 0x1.999999999999ap-3,
+ -0x1.5555555555555p-2);
+ // c1 ~ x_hi^2 * 1/9 - 1/7
+ double c1 = fputil::multiply_add(x_hi_sq, 0x1.c71c71c71c71cp-4,
+ -0x1.2492492492492p-3);
+ // x_hi^3
+ double x_hi_3 = x_hi_sq * x.hi;
+ // x_hi^4
+ double x_hi_4 = x_hi_sq * x_hi_sq;
+ // d0 ~ 1/3 - x_hi^2 / 5 + x_hi^4 / 7 - x_hi^6 / 9
+ double d0 = fputil::multiply_add(x_hi_4, c1, c0);
+ // x_lo - x_lo * x_hi^2 + x_lo * x_hi^4
+ double d1 = fputil::multiply_add(x_hi_4 - x_hi_sq, x.lo, x.lo);
+ // p.lo ~ -x_hi^3/3 + x_hi^5/5 - x_hi^7/7 + x_hi^9/9 +
+ // + x_lo * (1 - x_hi^2 + x_hi^4)
+ p.lo = fputil::multiply_add(x_hi_3, d0, d1);
+ return p;
+}
+
+} // anonymous namespace
+
+// There are several range reduction steps we can take for atan2(y, x) as
+// follow:
+
+// * Range reduction 1: signness
+// atan2(y, x) will return a number between -PI and PI representing the angle
+// forming by the 0x axis and the vector (x, y) on the 0xy-plane.
+// In particular, we have that:
+// atan2(y, x) = atan( y/x ) if x >= 0 and y >= 0 (I-quadrant)
+// = pi + atan( y/x ) if x < 0 and y >= 0 (II-quadrant)
+// = -pi + atan( y/x ) if x < 0 and y < 0 (III-quadrant)
+// = atan( y/x ) if x >= 0 and y < 0 (IV-quadrant)
+// Since atan function is odd, we can use the formula:
+// atan(-u) = -atan(u)
+// to adjust the above conditions a bit further:
+// atan2(y, x) = atan( |y|/|x| ) if x >= 0 and y >= 0 (I-quadrant)
+// = pi - atan( |y|/|x| ) if x < 0 and y >= 0 (II-quadrant)
+// = -pi + atan( |y|/|x| ) if x < 0 and y < 0 (III-quadrant)
+// = -atan( |y|/|x| ) if x >= 0 and y < 0 (IV-quadrant)
+// Which can be simplified to:
+// atan2(y, x) = sign(y) * atan( |y|/|x| ) if x >= 0
+// = sign(y) * (pi - atan( |y|/|x| )) if x < 0
+
+// * Range reduction 2: reciprocal
+// Now that the argument inside atan is positive, we can use the formula:
+// atan(1/x) = pi/2 - atan(x)
+// to make the argument inside atan <= 1 as follow:
+// atan2(y, x) = sign(y) * atan( |y|/|x|) if 0 <= |y| <= x
+// = sign(y) * (pi/2 - atan( |x|/|y| ) if 0 <= x < |y|
+// = sign(y) * (pi - atan( |y|/|x| )) if 0 <= |y| <= -x
+// = sign(y) * (pi/2 + atan( |x|/|y| )) if 0 <= -x < |y|
+
+// * Range reduction 3: look up table.
+// After the previous two range reduction steps, we reduce the problem to
+// compute atan(u) with 0 <= u <= 1, or to be precise:
+// atan( n / d ) where n = min(|x|, |y|) and d = max(|x|, |y|).
+// An accurate polynomial approximation for the whole [0, 1] input range will
+// require a very large degree. To make it more efficient, we reduce the input
+// range further by finding an integer idx such that:
+// | n/d - idx/64 | <= 1/128.
+// In particular,
+// idx := round(2^6 * n/d)
+// Then for the fast pass, we find a polynomial approximation for:
+// atan( n/d ) ~ atan( idx/64 ) + (n/d - idx/64) * Q(n/d - idx/64)
+// For the accurate pass, we use the addition formula:
+// atan( n/d ) - atan( idx/64 ) = atan( (n/d - idx/64)/(1 + (n*idx)/(64*d)) )
+// = atan( (n - d*(idx/64))/(d + n*(idx/64)) )
+// And for the fast pass, we use degree-9 Taylor polynomial to compute the RHS:
+// atan(u) ~ P(u) = u - u^3/3 + u^5/5 - u^7/7 + u^9/9
+// with absolute errors bounded by:
+// |atan(u) - P(u)| < |u|^11 / 11 < 2^-80
+// and relative errors bounded by:
+// |(atan(u) - P(u)) / P(u)| < u^10 / 11 < 2^-73.
+
+LLVM_LIBC_FUNCTION(double, atan2, (double y, double x)) {
+ using FPBits = typename fputil::FPBits<double>;
+
+ constexpr double IS_NEG[2] = {1.0, -1.0};
+ constexpr DoubleDouble ZERO = {0.0, 0.0};
+ constexpr DoubleDouble MZERO = {-0.0, -0.0};
+ constexpr DoubleDouble PI = {0x1.1a62633145c07p-53, 0x1.921fb54442d18p+1};
+ constexpr DoubleDouble MPI = {-0x1.1a62633145c07p-53, -0x1.921fb54442d18p+1};
+ constexpr DoubleDouble PI_OVER_2 = {0x1.1a62633145c07p-54,
+ 0x1.921fb54442d18p0};
+ constexpr DoubleDouble MPI_OVER_2 = {-0x1.1a62633145c07p-54,
+ -0x1.921fb54442d18p0};
+ constexpr DoubleDouble PI_OVER_4 = {0x1.1a62633145c07p-55,
+ 0x1.921fb54442d18p-1};
+ constexpr DoubleDouble THREE_PI_OVER_4 = {0x1.a79394c9e8a0ap-54,
+ 0x1.2d97c7f3321d2p+1};
+ // Adjustment for constant term:
+ // CONST_ADJ[x_sign][y_sign][recip]
+ constexpr DoubleDouble CONST_ADJ[2][2][2] = {
+ {{ZERO, MPI_OVER_2}, {MZERO, MPI_OVER_2}},
+ {{MPI, PI_OVER_2}, {MPI, PI_OVER_2}}};
+
+ FPBits x_bits(x), y_bits(y);
+ bool x_sign = x_bits.sign().is_neg();
+ bool y_sign = y_bits.sign().is_neg();
+ x_bits = x_bits.abs();
+ y_bits = y_bits.abs();
+ uint64_t x_abs = x_bits.uintval();
+ uint64_t y_abs = y_bits.uintval();
+ bool recip = x_abs < y_abs;
+ uint64_t min_abs = recip ? x_abs : y_abs;
+ uint64_t max_abs = !recip ? x_abs : y_abs;
+ unsigned min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
+ unsigned max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);
+
+ double num = FPBits(min_abs).get_val();
+ double den = FPBits(max_abs).get_val();
+
+ // Check for exceptional cases, whether inputs are 0, inf, nan, or close to
+ // overflow, or close to underflow.
+ if (LIBC_UNLIKELY(max_exp > 0x7ffU - 128U || min_exp < 128U)) {
+ if (x_bits.is_nan() || y_bits.is_nan())
+ return FPBits::quiet_nan().get_val();
+ unsigned x_except = x_abs == 0 ? 0 : (FPBits(x_abs).is_inf() ? 2 : 1);
+ unsigned y_except = y_abs == 0 ? 0 : (FPBits(y_abs).is_inf() ? 2 : 1);
+
+ // Exceptional cases:
+ // EXCEPT[y_except][x_except][x_is_neg]
+ // with x_except & y_except:
+ // 0: zero
+ // 1: finite, non-zero
+ // 2: infinity
+ constexpr DoubleDouble EXCEPTS[3][3][2] = {
+ {{ZERO, PI}, {ZERO, PI}, {ZERO, PI}},
+ {{PI_OVER_2, PI_OVER_2}, {ZERO, ZERO}, {ZERO, PI}},
+ {{PI_OVER_2, PI_OVER_2},
+ {PI_OVER_2, PI_OVER_2},
+ {PI_OVER_4, THREE_PI_OVER_4}},
+ };
+
+ if ((x_except != 1) || (y_except != 1)) {
+ DoubleDouble r = EXCEPTS[y_except][x_except][x_sign];
+ return fputil::multiply_add(IS_NEG[y_sign], r.hi, IS_NEG[y_sign] * r.lo);
+ }
+ bool scale_up = min_exp < 128U;
+ bool scale_down = max_exp > 0x7ffU - 128U;
+ // At least one input is denormal, multiply both numerator and denominator
+ // by some large enough power of 2 to normalize denormal inputs.
+ if (scale_up) {
+ num *= 0x1.0p64;
+ if (!scale_down)
+ den *= 0x1.0p64;
+ } else if (scale_down) {
+ den *= 0x1.0p-64;
+ if (!scale_up)
+ num *= 0x1.0p-64;
+ }
+
+ min_abs = FPBits(num).uintval();
+ max_abs = FPBits(den).uintval();
+ min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
+ max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);
+ }
+
+ double final_sign = IS_NEG[(x_sign != y_sign) != recip];
+ DoubleDouble const_term = CONST_ADJ[x_sign][y_sign][recip];
+ unsigned exp_diff = max_exp - min_exp;
+ // We have the following bound for normalized n and d:
+ // 2^(-exp_diff - 1) < n/d < 2^(-exp_diff + 1).
+ if (LIBC_UNLIKELY(exp_diff > 54)) {
+ return fputil::multiply_add(final_sign, const_term.hi,
+ final_sign * (const_term.lo + num / den));
+ }
+
+ double k = fputil::nearest_integer(64.0 * num / den);
+ unsigned idx = static_cast<unsigned>(k);
+ // k = idx / 64
+ k *= 0x1.0p-6;
+
+ // Range reduction:
+ // atan(n/d) - atan(k/64) = atan((n/d - k/64) / (1 + (n/d) * (k/64)))
+ // = atan((n - d * k/64)) / (d + n * k/64))
+ DoubleDouble num_k = fputil::exact_mult(num, k);
+ DoubleDouble den_k = fputil::exact_mult(den, k);
+
+ // num_dd = n - d * k
+ DoubleDouble num_dd = fputil::exact_add(num - den_k.hi, -den_k.lo);
+ // den_dd = d + n * k
+ DoubleDouble den_dd = fputil::exact_add(den, num_k.hi);
+ den_dd.lo += num_k.lo;
+
+ // q = (n - d * k) / (d + n * k)
+ DoubleDouble q = fputil::div(num_dd, den_dd);
+ // p ~ atan(q)
+ DoubleDouble p = atan_eval(q);
+
+ DoubleDouble r = fputil::add(const_term, fputil::add(ATAN_I[idx], p));
+ r.hi *= final_sign;
+ r.lo *= final_sign;
+
+ return r.hi + r.lo;
+}
+
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/test/src/math/CMakeLists.txt b/libc/test/src/math/CMakeLists.txt
index 3ad5d98858165..380d283763afa 100644
--- a/libc/test/src/math/CMakeLists.txt
+++ b/libc/test/src/math/CMakeLists.txt
@@ -2044,6 +2044,18 @@ add_fp_unittest(
libc.src.__support.FPUtil.fp_bits
)
+add_fp_unittest(
+ atan2_test
+ NEED_MPFR
+ SUITE
+ libc-math-unittests
+ SRCS
+ atan2_test.cpp
+ DEPENDS
+ libc.src.math.atan2
+ libc.src.__support.FPUtil.fp_bits
+)
+
add_fp_unittest(
f16add_test
NEED_MPFR
diff --git a/libc/test/src/math/atan2_test.cpp b/libc/test/src/math/at...
[truncated]
|
unsigned min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN); | ||
unsigned max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does using .get_biased_exponent()
instead generate extra instructions? Same comment for lines 271-272 below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they generate extra masking instructions, whereas these are already non-negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would be worth it to add a new function to FPBits
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about that, it + its name will be confusing to choose vs get_biased_exponent
. This optimization is kind of small and not always applied to all cases, so I think for new we could leave it as is.
|
||
namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, from the LLVM code style:
Avoid putting declarations other than classes into anonymous namespaces
https://llvm.org/docs/CodingStandards.html#anonymous-namespaces.
Clang and LLVM Core largely conform to this, but we don't seem to follow this guideline as much in LLVM libc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for us, these file we only want 1 public function, the rest are static, so I think anonymous namespace provide a more clear-cut boundary, and less chance for us to forget to mark the local functions static.
No description provided.