[libc][math] Implement fast pass for double precision atan2 with 1 ULP errors. #100648

lntue · 2024-07-25T20:06:48Z

No description provided.

…P errors.

llvmbot · 2024-07-25T20:07:36Z

@llvm/pr-subscribers-libc

Author: None (lntue)

Changes

Patch is 26.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100648.diff

14 Files Affected:

(modified) libc/config/darwin/arm/entrypoints.txt (+1)
(modified) libc/config/linux/aarch64/entrypoints.txt (+1)
(modified) libc/config/linux/arm/entrypoints.txt (+1)
(modified) libc/config/linux/riscv/entrypoints.txt (+1)
(modified) libc/config/linux/x86_64/entrypoints.txt (+1)
(modified) libc/config/windows/entrypoints.txt (+1)
(modified) libc/docs/math/index.rst (+1-1)
(modified) libc/spec/stdc.td (+1)
(modified) libc/src/math/generic/CMakeLists.txt (+20)
(added) libc/src/math/generic/atan2.cpp (+314)
(modified) libc/test/src/math/CMakeLists.txt (+12)
(added) libc/test/src/math/atan2_test.cpp (+131)
(modified) libc/test/src/math/smoke/CMakeLists.txt (+10)
(added) libc/test/src/math/smoke/atan2_test.cpp (+22)

diff --git a/libc/config/darwin/arm/entrypoints.txt b/libc/config/darwin/arm/entrypoints.txt
index 3b50666b968e8..38eace26f10ab 100644
--- a/libc/config/darwin/arm/entrypoints.txt
+++ b/libc/config/darwin/arm/entrypoints.txt
@@ -120,6 +120,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index 2334fed773702..b2c5341b39d27 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -343,6 +343,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/linux/arm/entrypoints.txt b/libc/config/linux/arm/entrypoints.txt
index 61ee68ac66082..8e77105fdb13e 100644
--- a/libc/config/linux/arm/entrypoints.txt
+++ b/libc/config/linux/arm/entrypoints.txt
@@ -213,6 +213,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/linux/riscv/entrypoints.txt b/libc/config/linux/riscv/entrypoints.txt
index 07466805b34cd..e3ed5a5988fed 100644
--- a/libc/config/linux/riscv/entrypoints.txt
+++ b/libc/config/linux/riscv/entrypoints.txt
@@ -365,6 +365,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 035ceb8ca57bf..96f975520fe61 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -365,6 +365,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/windows/entrypoints.txt b/libc/config/windows/entrypoints.txt
index b6aced83c5815..06c3682255c45 100644
--- a/libc/config/windows/entrypoints.txt
+++ b/libc/config/windows/entrypoints.txt
@@ -118,6 +118,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index e7db07f071a8c..9f88b4d9a44b3 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
@@ -260,7 +260,7 @@ Higher Math Functions
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | atan      | |check|          |                 |                        |                      |                        | 7.12.4.3               | F.10.1.3                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
-| atan2     | |check|          |                 |                        |                      |                        | 7.12.4.4               | F.10.1.4                   |
+| atan2     | |check|          | 1 ULP           |                        |                      |                        | 7.12.4.4               | F.10.1.4                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | atan2pi   |                  |                 |                        |                      |                        | 7.12.4.11              | F.10.1.11                  |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
diff --git a/libc/spec/stdc.td b/libc/spec/stdc.td
index d5a5cb6fedb4b..9c84accd72cff 100644
--- a/libc/spec/stdc.td
+++ b/libc/spec/stdc.td
@@ -692,6 +692,7 @@ def StdC : StandardSpec<"stdc"> {
 
           FunctionSpec<"atanf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
 
+          FunctionSpec<"atan2", RetValSpec<DoubleType>, [ArgSpec<DoubleType>, ArgSpec<DoubleType>]>,
           FunctionSpec<"atan2f", RetValSpec<FloatType>, [ArgSpec<FloatType>, ArgSpec<FloatType>]>,
 
           FunctionSpec<"acoshf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index 7280add7c8bb0..ff2e02878ffe4 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -3848,6 +3848,26 @@ add_entrypoint_object(
     libc.src.__support.macros.optimization
 )
 
+add_entrypoint_object(
+  atan2
+  SRCS
+    atan2.cpp
+  HDRS
+    ../atan2.h
+  COMPILE_OPTIONS
+    -O3
+  DEPENDS
+    .inv_trigf_utils
+    libc.src.__support.FPUtil.double_double
+    libc.src.__support.FPUtil.dyadic_float
+    libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.FPUtil.multiply_add
+    libc.src.__support.FPUtil.nearest_integer
+    libc.src.__support.FPUtil.polyeval
+    libc.src.__support.FPUtil.rounding_mode
+    libc.src.__support.macros.optimization
+)
+
 add_entrypoint_object(
   scalblnf16
   SRCS
diff --git a/libc/src/math/generic/atan2.cpp b/libc/src/math/generic/atan2.cpp
new file mode 100644
index 0000000000000..8d5717a7afec2
--- /dev/null
+++ b/libc/src/math/generic/atan2.cpp
@@ -0,0 +1,314 @@
+//===-- Double-precision atan2 function -----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/atan2.h"
+#include "inv_trigf_utils.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/FPUtil/rounding_mode.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace {
+
+using DoubleDouble = fputil::DoubleDouble;
+// using Float128 = fputil::DyadicFloat<128>;
+
+// atan(i/64) with i = 0..64, generated by Sollya with:
+// > for i from 0 to 64 do {
+//     a = round(atan(i/64), D, RN);
+//     b = round(atan(i/64) - a, D, RN);
+//     print("{", b, ",", a, "},");
+//   };
+constexpr fputil::DoubleDouble ATAN_I[65] = {
+    {0.0, 0.0},
+    {-0x1.220c39d4dff5p-61, 0x1.fff555bbb729bp-7},
+    {-0x1.5ec431444912cp-60, 0x1.ffd55bba97625p-6},
+    {-0x1.86ef8f794f105p-63, 0x1.7fb818430da2ap-5},
+    {-0x1.c934d86d23f1dp-60, 0x1.ff55bb72cfdeap-5},
+    {0x1.ac4ce285df847p-58, 0x1.3f59f0e7c559dp-4},
+    {-0x1.cfb654c0c3d98p-58, 0x1.7ee182602f10fp-4},
+    {0x1.f7b8f29a05987p-58, 0x1.be39ebe6f07c3p-4},
+    {-0x1.cd37686760c17p-59, 0x1.fd5ba9aac2f6ep-4},
+    {-0x1.b485914dacf8cp-59, 0x1.1e1fafb043727p-3},
+    {0x1.61a3b0ce9281bp-57, 0x1.3d6eee8c6626cp-3},
+    {-0x1.054ab2c010f3dp-58, 0x1.5c9811e3ec26ap-3},
+    {0x1.347b0b4f881cap-58, 0x1.7b97b4bce5b02p-3},
+    {0x1.cf601e7b4348ep-59, 0x1.9a6a8e96c8626p-3},
+    {0x1.17b10d2e0e5abp-61, 0x1.b90d7529260a2p-3},
+    {0x1.c648d1534597ep-57, 0x1.d77d5df205736p-3},
+    {0x1.8ab6e3cf7afbdp-57, 0x1.f5b75f92c80ddp-3},
+    {0x1.62e47390cb865p-56, 0x1.09dc597d86362p-2},
+    {0x1.30ca4748b1bf9p-57, 0x1.18bf5a30bf178p-2},
+    {-0x1.077cdd36dfc81p-56, 0x1.278372057ef46p-2},
+    {-0x1.963a544b672d8p-57, 0x1.362773707ebccp-2},
+    {-0x1.5d5e43c55b3bap-56, 0x1.44aa436c2af0ap-2},
+    {-0x1.2566480884082p-57, 0x1.530ad9951cd4ap-2},
+    {-0x1.a725715711fp-56, 0x1.614840309cfe2p-2},
+    {-0x1.c63aae6f6e918p-56, 0x1.6f61941e4def1p-2},
+    {0x1.69c885c2b249ap-56, 0x1.7d5604b63b3f7p-2},
+    {0x1.b6d0ba3748fa8p-56, 0x1.8b24d394a1b25p-2},
+    {0x1.9e6c988fd0a77p-56, 0x1.98cd5454d6b18p-2},
+    {-0x1.24dec1b50b7ffp-56, 0x1.a64eec3cc23fdp-2},
+    {0x1.ae187b1ca504p-56, 0x1.b3a911da65c6cp-2},
+    {-0x1.cc1ce70934c34p-56, 0x1.c0db4c94ec9fp-2},
+    {-0x1.a2cfa4418f1adp-56, 0x1.cde53432c1351p-2},
+    {0x1.a2b7f222f65e2p-56, 0x1.dac670561bb4fp-2},
+    {0x1.0e53dc1bf3435p-56, 0x1.e77eb7f175a34p-2},
+    {-0x1.a3992dc382a23p-57, 0x1.f40dd0b541418p-2},
+    {-0x1.b32c949c9d593p-55, 0x1.0039c73c1a40cp-1},
+    {-0x1.d5b495f6349e6p-56, 0x1.0657e94db30dp-1},
+    {0x1.974fa13b5404fp-58, 0x1.0c6145b5b43dap-1},
+    {-0x1.2bdaee1c0ee35p-58, 0x1.1255d9bfbd2a9p-1},
+    {0x1.c621cec00c301p-55, 0x1.1835a88be7c13p-1},
+    {-0x1.928df287a668fp-58, 0x1.1e00babdefeb4p-1},
+    {0x1.c421c9f38224ep-57, 0x1.23b71e2cc9e6ap-1},
+    {-0x1.09e73b0c6c087p-56, 0x1.2958e59308e31p-1},
+    {0x1.c5d5e9ff0cf8dp-55, 0x1.2ee628406cbcap-1},
+    {0x1.1021137c71102p-55, 0x1.345f01cce37bbp-1},
+    {-0x1.2304331d8bf46p-55, 0x1.39c391cd4171ap-1},
+    {0x1.ecf8b492644fp-56, 0x1.3f13fb89e96f4p-1},
+    {-0x1.f76d0163f79c8p-56, 0x1.445065b795b56p-1},
+    {0x1.2419a87f2a458p-56, 0x1.4978fa3269ee1p-1},
+    {0x1.4a33dbeb3796cp-55, 0x1.4e8de5bb6ec04p-1},
+    {-0x1.1bb74abda520cp-55, 0x1.538f57b89061fp-1},
+    {-0x1.5e5c9d8c5a95p-56, 0x1.587d81f732fbbp-1},
+    {0x1.0028e4bc5e7cap-57, 0x1.5d58987169b18p-1},
+    {-0x1.2b785350ee8c1p-57, 0x1.6220d115d7b8ep-1},
+    {-0x1.6ea6febe8bbbap-56, 0x1.66d663923e087p-1},
+    {-0x1.a80386188c50ep-55, 0x1.6b798920b3d99p-1},
+    {-0x1.8c34d25aadef6p-56, 0x1.700a7c5784634p-1},
+    {0x1.7b2a6165884a1p-59, 0x1.748978fba8e0fp-1},
+    {0x1.406a08980374p-55, 0x1.78f6bbd5d315ep-1},
+    {0x1.560821e2f3aa9p-55, 0x1.7d528289fa093p-1},
+    {-0x1.bf76229d3b917p-56, 0x1.819d0b7158a4dp-1},
+    {0x1.6b66e7fc8b8c3p-57, 0x1.85d69576cc2c5p-1},
+    {-0x1.55b9a5e177a1bp-55, 0x1.89ff5ff57f1f8p-1},
+    {-0x1.ec182ab042f61p-56, 0x1.8e17aa99cc05ep-1},
+    {0x1.1a62633145c07p-55, 0x1.921fb54442d18p-1},
+};
+
+// Approximate atan(x) for |x| <= 2^-7.
+// Using degree-9 Taylor polynomial:
+//  P = x - x^3/3 + x^5/5 -x^7/7 + x^9/9;
+// Then the absolute error is bounded by:
+//   |atan(x) - P(x)| < |x|^11/11 < 2^(-7*11) / 11 < 2^-80.
+// And the relative error is bounded by:
+//   |(atan(x) - P(x))/atan(x)| < |x|^10 / 10 < 2^-73.
+// For x = x_hi + x_lo, fully expand the polynomial and drop any terms less than
+//   ulp(x_hi^3 / 3) gives us:
+// P(x) ~ x_hi - x_hi^3/3 + x_hi^5/5 - x_hi^7/7 + x_hi^9/9 +
+//        + x_lo * (1 - x_hi^2 + x_hi^4)
+DoubleDouble atan_eval(const DoubleDouble &x) {
+  DoubleDouble p;
+  p.hi = x.hi;
+  double x_hi_sq = x.hi * x.hi;
+  // c0 ~ x_hi^2 * 1/5 - 1/3
+  double c0 = fputil::multiply_add(x_hi_sq, 0x1.999999999999ap-3,
+                                   -0x1.5555555555555p-2);
+  // c1 ~ x_hi^2 * 1/9 - 1/7
+  double c1 = fputil::multiply_add(x_hi_sq, 0x1.c71c71c71c71cp-4,
+                                   -0x1.2492492492492p-3);
+  // x_hi^3
+  double x_hi_3 = x_hi_sq * x.hi;
+  // x_hi^4
+  double x_hi_4 = x_hi_sq * x_hi_sq;
+  // d0 ~ 1/3 - x_hi^2 / 5 + x_hi^4 / 7 - x_hi^6 / 9
+  double d0 = fputil::multiply_add(x_hi_4, c1, c0);
+  // x_lo - x_lo * x_hi^2 + x_lo * x_hi^4
+  double d1 = fputil::multiply_add(x_hi_4 - x_hi_sq, x.lo, x.lo);
+  // p.lo ~ -x_hi^3/3 + x_hi^5/5 - x_hi^7/7 + x_hi^9/9 +
+  //        + x_lo * (1 - x_hi^2 + x_hi^4)
+  p.lo = fputil::multiply_add(x_hi_3, d0, d1);
+  return p;
+}
+
+} // anonymous namespace
+
+// There are several range reduction steps we can take for atan2(y, x) as
+// follow:
+
+// * Range reduction 1: signness
+// atan2(y, x) will return a number between -PI and PI representing the angle
+// forming by the 0x axis and the vector (x, y) on the 0xy-plane.
+// In particular, we have that:
+//   atan2(y, x) = atan( y/x )         if x >= 0 and y >= 0 (I-quadrant)
+//               = pi + atan( y/x )    if x < 0 and y >= 0  (II-quadrant)
+//               = -pi + atan( y/x )   if x < 0 and y < 0   (III-quadrant)
+//               = atan( y/x )         if x >= 0 and y < 0  (IV-quadrant)
+// Since atan function is odd, we can use the formula:
+//   atan(-u) = -atan(u)
+// to adjust the above conditions a bit further:
+//   atan2(y, x) = atan( |y|/|x| )         if x >= 0 and y >= 0 (I-quadrant)
+//               = pi - atan( |y|/|x| )    if x < 0 and y >= 0  (II-quadrant)
+//               = -pi + atan( |y|/|x| )   if x < 0 and y < 0   (III-quadrant)
+//               = -atan( |y|/|x| )        if x >= 0 and y < 0  (IV-quadrant)
+// Which can be simplified to:
+//   atan2(y, x) = sign(y) * atan( |y|/|x| )             if x >= 0
+//               = sign(y) * (pi - atan( |y|/|x| ))      if x < 0
+
+// * Range reduction 2: reciprocal
+// Now that the argument inside atan is positive, we can use the formula:
+//   atan(1/x) = pi/2 - atan(x)
+// to make the argument inside atan <= 1 as follow:
+//   atan2(y, x) = sign(y) * atan( |y|/|x|)            if 0 <= |y| <= x
+//               = sign(y) * (pi/2 - atan( |x|/|y| )   if 0 <= x < |y|
+//               = sign(y) * (pi - atan( |y|/|x| ))    if 0 <= |y| <= -x
+//               = sign(y) * (pi/2 + atan( |x|/|y| ))  if 0 <= -x < |y|
+
+// * Range reduction 3: look up table.
+// After the previous two range reduction steps, we reduce the problem to
+// compute atan(u) with 0 <= u <= 1, or to be precise:
+//   atan( n / d ) where n = min(|x|, |y|) and d = max(|x|, |y|).
+// An accurate polynomial approximation for the whole [0, 1] input range will
+// require a very large degree.  To make it more efficient, we reduce the input
+// range further by finding an integer idx such that:
+//   | n/d - idx/64 | <= 1/128.
+// In particular,
+//   idx := round(2^6 * n/d)
+// Then for the fast pass, we find a polynomial approximation for:
+//   atan( n/d ) ~ atan( idx/64 ) + (n/d - idx/64) * Q(n/d - idx/64)
+// For the accurate pass, we use the addition formula:
+//   atan( n/d ) - atan( idx/64 ) = atan( (n/d - idx/64)/(1 + (n*idx)/(64*d)) )
+//                                = atan( (n - d*(idx/64))/(d + n*(idx/64)) )
+// And for the fast pass, we use degree-9 Taylor polynomial to compute the RHS:
+//   atan(u) ~ P(u) = u - u^3/3 + u^5/5 - u^7/7 + u^9/9
+// with absolute errors bounded by:
+//   |atan(u) - P(u)| < |u|^11 / 11 < 2^-80
+// and relative errors bounded by:
+//   |(atan(u) - P(u)) / P(u)| < u^10 / 11 < 2^-73.
+
+LLVM_LIBC_FUNCTION(double, atan2, (double y, double x)) {
+  using FPBits = typename fputil::FPBits<double>;
+
+  constexpr double IS_NEG[2] = {1.0, -1.0};
+  constexpr DoubleDouble ZERO = {0.0, 0.0};
+  constexpr DoubleDouble MZERO = {-0.0, -0.0};
+  constexpr DoubleDouble PI = {0x1.1a62633145c07p-53, 0x1.921fb54442d18p+1};
+  constexpr DoubleDouble MPI = {-0x1.1a62633145c07p-53, -0x1.921fb54442d18p+1};
+  constexpr DoubleDouble PI_OVER_2 = {0x1.1a62633145c07p-54,
+                                      0x1.921fb54442d18p0};
+  constexpr DoubleDouble MPI_OVER_2 = {-0x1.1a62633145c07p-54,
+                                       -0x1.921fb54442d18p0};
+  constexpr DoubleDouble PI_OVER_4 = {0x1.1a62633145c07p-55,
+                                      0x1.921fb54442d18p-1};
+  constexpr DoubleDouble THREE_PI_OVER_4 = {0x1.a79394c9e8a0ap-54,
+                                            0x1.2d97c7f3321d2p+1};
+  // Adjustment for constant term:
+  //   CONST_ADJ[x_sign][y_sign][recip]
+  constexpr DoubleDouble CONST_ADJ[2][2][2] = {
+      {{ZERO, MPI_OVER_2}, {MZERO, MPI_OVER_2}},
+      {{MPI, PI_OVER_2}, {MPI, PI_OVER_2}}};
+
+  FPBits x_bits(x), y_bits(y);
+  bool x_sign = x_bits.sign().is_neg();
+  bool y_sign = y_bits.sign().is_neg();
+  x_bits = x_bits.abs();
+  y_bits = y_bits.abs();
+  uint64_t x_abs = x_bits.uintval();
+  uint64_t y_abs = y_bits.uintval();
+  bool recip = x_abs < y_abs;
+  uint64_t min_abs = recip ? x_abs : y_abs;
+  uint64_t max_abs = !recip ? x_abs : y_abs;
+  unsigned min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
+  unsigned max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);
+
+  double num = FPBits(min_abs).get_val();
+  double den = FPBits(max_abs).get_val();
+
+  // Check for exceptional cases, whether inputs are 0, inf, nan, or close to
+  // overflow, or close to underflow.
+  if (LIBC_UNLIKELY(max_exp > 0x7ffU - 128U || min_exp < 128U)) {
+    if (x_bits.is_nan() || y_bits.is_nan())
+      return FPBits::quiet_nan().get_val();
+    unsigned x_except = x_abs == 0 ? 0 : (FPBits(x_abs).is_inf() ? 2 : 1);
+    unsigned y_except = y_abs == 0 ? 0 : (FPBits(y_abs).is_inf() ? 2 : 1);
+
+    // Exceptional cases:
+    //   EXCEPT[y_except][x_except][x_is_neg]
+    // with x_except & y_except:
+    //   0: zero
+    //   1: finite, non-zero
+    //   2: infinity
+    constexpr DoubleDouble EXCEPTS[3][3][2] = {
+        {{ZERO, PI}, {ZERO, PI}, {ZERO, PI}},
+        {{PI_OVER_2, PI_OVER_2}, {ZERO, ZERO}, {ZERO, PI}},
+        {{PI_OVER_2, PI_OVER_2},
+         {PI_OVER_2, PI_OVER_2},
+         {PI_OVER_4, THREE_PI_OVER_4}},
+    };
+
+    if ((x_except != 1) || (y_except != 1)) {
+      DoubleDouble r = EXCEPTS[y_except][x_except][x_sign];
+      return fputil::multiply_add(IS_NEG[y_sign], r.hi, IS_NEG[y_sign] * r.lo);
+    }
+    bool scale_up = min_exp < 128U;
+    bool scale_down = max_exp > 0x7ffU - 128U;
+    // At least one input is denormal, multiply both numerator and denominator
+    // by some large enough power of 2 to normalize denormal inputs.
+    if (scale_up) {
+      num *= 0x1.0p64;
+      if (!scale_down)
+        den *= 0x1.0p64;
+    } else if (scale_down) {
+      den *= 0x1.0p-64;
+      if (!scale_up)
+        num *= 0x1.0p-64;
+    }
+
+    min_abs = FPBits(num).uintval();
+    max_abs = FPBits(den).uintval();
+    min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
+    max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);
+  }
+
+  double final_sign = IS_NEG[(x_sign != y_sign) != recip];
+  DoubleDouble const_term = CONST_ADJ[x_sign][y_sign][recip];
+  unsigned exp_diff = max_exp - min_exp;
+  // We have the following bound for normalized n and d:
+  //   2^(-exp_diff - 1) < n/d < 2^(-exp_diff + 1).
+  if (LIBC_UNLIKELY(exp_diff > 54)) {
+    return fputil::multiply_add(final_sign, const_term.hi,
+                                final_sign * (const_term.lo + num / den));
+  }
+
+  double k = fputil::nearest_integer(64.0 * num / den);
+  unsigned idx = static_cast<unsigned>(k);
+  // k = idx / 64
+  k *= 0x1.0p-6;
+
+  // Range reduction:
+  // atan(n/d) - atan(k/64) = atan((n/d - k/64) / (1 + (n/d) * (k/64)))
+  //                        = atan((n - d * k/64)) / (d + n * k/64))
+  DoubleDouble num_k = fputil::exact_mult(num, k);
+  DoubleDouble den_k = fputil::exact_mult(den, k);
+
+  // num_dd = n - d * k
+  DoubleDouble num_dd = fputil::exact_add(num - den_k.hi, -den_k.lo);
+  // den_dd = d + n * k
+  DoubleDouble den_dd = fputil::exact_add(den, num_k.hi);
+  den_dd.lo += num_k.lo;
+
+  // q = (n - d * k) / (d + n * k)
+  DoubleDouble q = fputil::div(num_dd, den_dd);
+  // p ~ atan(q)
+  DoubleDouble p = atan_eval(q);
+
+  DoubleDouble r = fputil::add(const_term, fputil::add(ATAN_I[idx], p));
+  r.hi *= final_sign;
+  r.lo *= final_sign;
+
+  return r.hi + r.lo;
+}
+
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/test/src/math/CMakeLists.txt b/libc/test/src/math/CMakeLists.txt
index 3ad5d98858165..380d283763afa 100644
--- a/libc/test/src/math/CMakeLists.txt
+++ b/libc/test/src/math/CMakeLists.txt
@@ -2044,6 +2044,18 @@ add_fp_unittest(
     libc.src.__support.FPUtil.fp_bits
 )
 
+add_fp_unittest(
+  atan2_test
+  NEED_MPFR
+  SUITE
+    libc-math-unittests
+  SRCS
+    atan2_test.cpp
+  DEPENDS
+    libc.src.math.atan2
+    libc.src.__support.FPUtil.fp_bits
+)
+
 add_fp_unittest(
   f16add_test
   NEED_MPFR
diff --git a/libc/test/src/math/atan2_test.cpp b/libc/test/src/math/at...
[truncated]

libc/test/src/math/atan2_test.cpp

libc/src/math/generic/atan2.cpp

overmighty · 2024-07-25T21:24:48Z

libc/src/math/generic/atan2.cpp

+  unsigned min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
+  unsigned max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);


Does using .get_biased_exponent() instead generate extra instructions? Same comment for lines 271-272 below.

Yes, they generate extra masking instructions, whereas these are already non-negative.

Do you think it would be worth it to add a new function to FPBits instead?

I'm not sure about that, it + its name will be confusing to choose vs get_biased_exponent. This optimization is kind of small and not always applied to all cases, so I think for new we could leave it as is.

overmighty · 2024-07-26T09:04:55Z

libc/src/math/generic/atan2.cpp

+
+namespace {


Actually, from the LLVM code style:

Avoid putting declarations other than classes into anonymous namespaces

https://llvm.org/docs/CodingStandards.html#anonymous-namespaces.

Clang and LLVM Core largely conform to this, but we don't seem to follow this guideline as much in LLVM libc.

for us, these file we only want 1 public function, the rest are static, so I think anonymous namespace provide a more clear-cut boundary, and less chance for us to forget to mark the local functions static.

[libc][math] Implement fast pass for double precision atan2 with 1 UL…

37ac236

…P errors.

lntue requested review from jhuber6, overmighty and petrhosek July 25, 2024 20:06

llvmbot added the libc label Jul 25, 2024

overmighty reviewed Jul 25, 2024

View reviewed changes

Address comments.

ae493b7

overmighty approved these changes Jul 26, 2024

View reviewed changes

overmighty reviewed Jul 26, 2024

View reviewed changes

lntue merged commit ca8b14d into llvm:main Jul 26, 2024
7 checks passed

lntue deleted the atan2 branch July 26, 2024 13:56

lntue mentioned this pull request Aug 12, 2024

Provide atan2 in libc #92876

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[libc][math] Implement fast pass for double precision atan2 with 1 ULP errors. #100648

[libc][math] Implement fast pass for double precision atan2 with 1 ULP errors. #100648

Uh oh!

lntue commented Jul 25, 2024

Uh oh!

llvmbot commented Jul 25, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

overmighty Jul 25, 2024

Uh oh!

lntue Jul 25, 2024

Uh oh!

overmighty Jul 26, 2024

Uh oh!

lntue Jul 26, 2024

Uh oh!

overmighty Jul 26, 2024

Uh oh!

lntue Jul 26, 2024

Uh oh!

Uh oh!

Uh oh!

		unsigned min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
		unsigned max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);

[libc][math] Implement fast pass for double precision atan2 with 1 ULP errors. #100648

[libc][math] Implement fast pass for double precision atan2 with 1 ULP errors. #100648

Uh oh!

Conversation

lntue commented Jul 25, 2024

Uh oh!

llvmbot commented Jul 25, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

overmighty Jul 25, 2024

Choose a reason for hiding this comment

Uh oh!

lntue Jul 25, 2024

Choose a reason for hiding this comment

Uh oh!

overmighty Jul 26, 2024

Choose a reason for hiding this comment

Uh oh!

lntue Jul 26, 2024

Choose a reason for hiding this comment

Uh oh!

overmighty Jul 26, 2024

Choose a reason for hiding this comment

Uh oh!

lntue Jul 26, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!