Skip to content

[libc][math] Implement fast pass for double precision atan2 with 1 ULP errors. #100648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 26, 2024

Conversation

lntue
Copy link
Contributor

@lntue lntue commented Jul 25, 2024

No description provided.

@llvmbot llvmbot added the libc label Jul 25, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 25, 2024

@llvm/pr-subscribers-libc

Author: None (lntue)

Changes

Patch is 26.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100648.diff

14 Files Affected:

  • (modified) libc/config/darwin/arm/entrypoints.txt (+1)
  • (modified) libc/config/linux/aarch64/entrypoints.txt (+1)
  • (modified) libc/config/linux/arm/entrypoints.txt (+1)
  • (modified) libc/config/linux/riscv/entrypoints.txt (+1)
  • (modified) libc/config/linux/x86_64/entrypoints.txt (+1)
  • (modified) libc/config/windows/entrypoints.txt (+1)
  • (modified) libc/docs/math/index.rst (+1-1)
  • (modified) libc/spec/stdc.td (+1)
  • (modified) libc/src/math/generic/CMakeLists.txt (+20)
  • (added) libc/src/math/generic/atan2.cpp (+314)
  • (modified) libc/test/src/math/CMakeLists.txt (+12)
  • (added) libc/test/src/math/atan2_test.cpp (+131)
  • (modified) libc/test/src/math/smoke/CMakeLists.txt (+10)
  • (added) libc/test/src/math/smoke/atan2_test.cpp (+22)
diff --git a/libc/config/darwin/arm/entrypoints.txt b/libc/config/darwin/arm/entrypoints.txt
index 3b50666b968e8..38eace26f10ab 100644
--- a/libc/config/darwin/arm/entrypoints.txt
+++ b/libc/config/darwin/arm/entrypoints.txt
@@ -120,6 +120,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index 2334fed773702..b2c5341b39d27 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -343,6 +343,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/linux/arm/entrypoints.txt b/libc/config/linux/arm/entrypoints.txt
index 61ee68ac66082..8e77105fdb13e 100644
--- a/libc/config/linux/arm/entrypoints.txt
+++ b/libc/config/linux/arm/entrypoints.txt
@@ -213,6 +213,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/linux/riscv/entrypoints.txt b/libc/config/linux/riscv/entrypoints.txt
index 07466805b34cd..e3ed5a5988fed 100644
--- a/libc/config/linux/riscv/entrypoints.txt
+++ b/libc/config/linux/riscv/entrypoints.txt
@@ -365,6 +365,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 035ceb8ca57bf..96f975520fe61 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -365,6 +365,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/config/windows/entrypoints.txt b/libc/config/windows/entrypoints.txt
index b6aced83c5815..06c3682255c45 100644
--- a/libc/config/windows/entrypoints.txt
+++ b/libc/config/windows/entrypoints.txt
@@ -118,6 +118,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.acoshf
     libc.src.math.asinf
     libc.src.math.asinhf
+    libc.src.math.atan2
     libc.src.math.atan2f
     libc.src.math.atanf
     libc.src.math.atanhf
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index e7db07f071a8c..9f88b4d9a44b3 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
@@ -260,7 +260,7 @@ Higher Math Functions
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | atan      | |check|          |                 |                        |                      |                        | 7.12.4.3               | F.10.1.3                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
-| atan2     | |check|          |                 |                        |                      |                        | 7.12.4.4               | F.10.1.4                   |
+| atan2     | |check|          | 1 ULP           |                        |                      |                        | 7.12.4.4               | F.10.1.4                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | atan2pi   |                  |                 |                        |                      |                        | 7.12.4.11              | F.10.1.11                  |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
diff --git a/libc/spec/stdc.td b/libc/spec/stdc.td
index d5a5cb6fedb4b..9c84accd72cff 100644
--- a/libc/spec/stdc.td
+++ b/libc/spec/stdc.td
@@ -692,6 +692,7 @@ def StdC : StandardSpec<"stdc"> {
 
           FunctionSpec<"atanf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
 
+          FunctionSpec<"atan2", RetValSpec<DoubleType>, [ArgSpec<DoubleType>, ArgSpec<DoubleType>]>,
           FunctionSpec<"atan2f", RetValSpec<FloatType>, [ArgSpec<FloatType>, ArgSpec<FloatType>]>,
 
           FunctionSpec<"acoshf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index 7280add7c8bb0..ff2e02878ffe4 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -3848,6 +3848,26 @@ add_entrypoint_object(
     libc.src.__support.macros.optimization
 )
 
+add_entrypoint_object(
+  atan2
+  SRCS
+    atan2.cpp
+  HDRS
+    ../atan2.h
+  COMPILE_OPTIONS
+    -O3
+  DEPENDS
+    .inv_trigf_utils
+    libc.src.__support.FPUtil.double_double
+    libc.src.__support.FPUtil.dyadic_float
+    libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.FPUtil.multiply_add
+    libc.src.__support.FPUtil.nearest_integer
+    libc.src.__support.FPUtil.polyeval
+    libc.src.__support.FPUtil.rounding_mode
+    libc.src.__support.macros.optimization
+)
+
 add_entrypoint_object(
   scalblnf16
   SRCS
diff --git a/libc/src/math/generic/atan2.cpp b/libc/src/math/generic/atan2.cpp
new file mode 100644
index 0000000000000..8d5717a7afec2
--- /dev/null
+++ b/libc/src/math/generic/atan2.cpp
@@ -0,0 +1,314 @@
+//===-- Double-precision atan2 function -----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/atan2.h"
+#include "inv_trigf_utils.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/FPUtil/rounding_mode.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace {
+
+using DoubleDouble = fputil::DoubleDouble;
+// using Float128 = fputil::DyadicFloat<128>;
+
+// atan(i/64) with i = 0..64, generated by Sollya with:
+// > for i from 0 to 64 do {
+//     a = round(atan(i/64), D, RN);
+//     b = round(atan(i/64) - a, D, RN);
+//     print("{", b, ",", a, "},");
+//   };
+constexpr fputil::DoubleDouble ATAN_I[65] = {
+    {0.0, 0.0},
+    {-0x1.220c39d4dff5p-61, 0x1.fff555bbb729bp-7},
+    {-0x1.5ec431444912cp-60, 0x1.ffd55bba97625p-6},
+    {-0x1.86ef8f794f105p-63, 0x1.7fb818430da2ap-5},
+    {-0x1.c934d86d23f1dp-60, 0x1.ff55bb72cfdeap-5},
+    {0x1.ac4ce285df847p-58, 0x1.3f59f0e7c559dp-4},
+    {-0x1.cfb654c0c3d98p-58, 0x1.7ee182602f10fp-4},
+    {0x1.f7b8f29a05987p-58, 0x1.be39ebe6f07c3p-4},
+    {-0x1.cd37686760c17p-59, 0x1.fd5ba9aac2f6ep-4},
+    {-0x1.b485914dacf8cp-59, 0x1.1e1fafb043727p-3},
+    {0x1.61a3b0ce9281bp-57, 0x1.3d6eee8c6626cp-3},
+    {-0x1.054ab2c010f3dp-58, 0x1.5c9811e3ec26ap-3},
+    {0x1.347b0b4f881cap-58, 0x1.7b97b4bce5b02p-3},
+    {0x1.cf601e7b4348ep-59, 0x1.9a6a8e96c8626p-3},
+    {0x1.17b10d2e0e5abp-61, 0x1.b90d7529260a2p-3},
+    {0x1.c648d1534597ep-57, 0x1.d77d5df205736p-3},
+    {0x1.8ab6e3cf7afbdp-57, 0x1.f5b75f92c80ddp-3},
+    {0x1.62e47390cb865p-56, 0x1.09dc597d86362p-2},
+    {0x1.30ca4748b1bf9p-57, 0x1.18bf5a30bf178p-2},
+    {-0x1.077cdd36dfc81p-56, 0x1.278372057ef46p-2},
+    {-0x1.963a544b672d8p-57, 0x1.362773707ebccp-2},
+    {-0x1.5d5e43c55b3bap-56, 0x1.44aa436c2af0ap-2},
+    {-0x1.2566480884082p-57, 0x1.530ad9951cd4ap-2},
+    {-0x1.a725715711fp-56, 0x1.614840309cfe2p-2},
+    {-0x1.c63aae6f6e918p-56, 0x1.6f61941e4def1p-2},
+    {0x1.69c885c2b249ap-56, 0x1.7d5604b63b3f7p-2},
+    {0x1.b6d0ba3748fa8p-56, 0x1.8b24d394a1b25p-2},
+    {0x1.9e6c988fd0a77p-56, 0x1.98cd5454d6b18p-2},
+    {-0x1.24dec1b50b7ffp-56, 0x1.a64eec3cc23fdp-2},
+    {0x1.ae187b1ca504p-56, 0x1.b3a911da65c6cp-2},
+    {-0x1.cc1ce70934c34p-56, 0x1.c0db4c94ec9fp-2},
+    {-0x1.a2cfa4418f1adp-56, 0x1.cde53432c1351p-2},
+    {0x1.a2b7f222f65e2p-56, 0x1.dac670561bb4fp-2},
+    {0x1.0e53dc1bf3435p-56, 0x1.e77eb7f175a34p-2},
+    {-0x1.a3992dc382a23p-57, 0x1.f40dd0b541418p-2},
+    {-0x1.b32c949c9d593p-55, 0x1.0039c73c1a40cp-1},
+    {-0x1.d5b495f6349e6p-56, 0x1.0657e94db30dp-1},
+    {0x1.974fa13b5404fp-58, 0x1.0c6145b5b43dap-1},
+    {-0x1.2bdaee1c0ee35p-58, 0x1.1255d9bfbd2a9p-1},
+    {0x1.c621cec00c301p-55, 0x1.1835a88be7c13p-1},
+    {-0x1.928df287a668fp-58, 0x1.1e00babdefeb4p-1},
+    {0x1.c421c9f38224ep-57, 0x1.23b71e2cc9e6ap-1},
+    {-0x1.09e73b0c6c087p-56, 0x1.2958e59308e31p-1},
+    {0x1.c5d5e9ff0cf8dp-55, 0x1.2ee628406cbcap-1},
+    {0x1.1021137c71102p-55, 0x1.345f01cce37bbp-1},
+    {-0x1.2304331d8bf46p-55, 0x1.39c391cd4171ap-1},
+    {0x1.ecf8b492644fp-56, 0x1.3f13fb89e96f4p-1},
+    {-0x1.f76d0163f79c8p-56, 0x1.445065b795b56p-1},
+    {0x1.2419a87f2a458p-56, 0x1.4978fa3269ee1p-1},
+    {0x1.4a33dbeb3796cp-55, 0x1.4e8de5bb6ec04p-1},
+    {-0x1.1bb74abda520cp-55, 0x1.538f57b89061fp-1},
+    {-0x1.5e5c9d8c5a95p-56, 0x1.587d81f732fbbp-1},
+    {0x1.0028e4bc5e7cap-57, 0x1.5d58987169b18p-1},
+    {-0x1.2b785350ee8c1p-57, 0x1.6220d115d7b8ep-1},
+    {-0x1.6ea6febe8bbbap-56, 0x1.66d663923e087p-1},
+    {-0x1.a80386188c50ep-55, 0x1.6b798920b3d99p-1},
+    {-0x1.8c34d25aadef6p-56, 0x1.700a7c5784634p-1},
+    {0x1.7b2a6165884a1p-59, 0x1.748978fba8e0fp-1},
+    {0x1.406a08980374p-55, 0x1.78f6bbd5d315ep-1},
+    {0x1.560821e2f3aa9p-55, 0x1.7d528289fa093p-1},
+    {-0x1.bf76229d3b917p-56, 0x1.819d0b7158a4dp-1},
+    {0x1.6b66e7fc8b8c3p-57, 0x1.85d69576cc2c5p-1},
+    {-0x1.55b9a5e177a1bp-55, 0x1.89ff5ff57f1f8p-1},
+    {-0x1.ec182ab042f61p-56, 0x1.8e17aa99cc05ep-1},
+    {0x1.1a62633145c07p-55, 0x1.921fb54442d18p-1},
+};
+
+// Approximate atan(x) for |x| <= 2^-7.
+// Using degree-9 Taylor polynomial:
+//  P = x - x^3/3 + x^5/5 -x^7/7 + x^9/9;
+// Then the absolute error is bounded by:
+//   |atan(x) - P(x)| < |x|^11/11 < 2^(-7*11) / 11 < 2^-80.
+// And the relative error is bounded by:
+//   |(atan(x) - P(x))/atan(x)| < |x|^10 / 10 < 2^-73.
+// For x = x_hi + x_lo, fully expand the polynomial and drop any terms less than
+//   ulp(x_hi^3 / 3) gives us:
+// P(x) ~ x_hi - x_hi^3/3 + x_hi^5/5 - x_hi^7/7 + x_hi^9/9 +
+//        + x_lo * (1 - x_hi^2 + x_hi^4)
+DoubleDouble atan_eval(const DoubleDouble &x) {
+  DoubleDouble p;
+  p.hi = x.hi;
+  double x_hi_sq = x.hi * x.hi;
+  // c0 ~ x_hi^2 * 1/5 - 1/3
+  double c0 = fputil::multiply_add(x_hi_sq, 0x1.999999999999ap-3,
+                                   -0x1.5555555555555p-2);
+  // c1 ~ x_hi^2 * 1/9 - 1/7
+  double c1 = fputil::multiply_add(x_hi_sq, 0x1.c71c71c71c71cp-4,
+                                   -0x1.2492492492492p-3);
+  // x_hi^3
+  double x_hi_3 = x_hi_sq * x.hi;
+  // x_hi^4
+  double x_hi_4 = x_hi_sq * x_hi_sq;
+  // d0 ~ 1/3 - x_hi^2 / 5 + x_hi^4 / 7 - x_hi^6 / 9
+  double d0 = fputil::multiply_add(x_hi_4, c1, c0);
+  // x_lo - x_lo * x_hi^2 + x_lo * x_hi^4
+  double d1 = fputil::multiply_add(x_hi_4 - x_hi_sq, x.lo, x.lo);
+  // p.lo ~ -x_hi^3/3 + x_hi^5/5 - x_hi^7/7 + x_hi^9/9 +
+  //        + x_lo * (1 - x_hi^2 + x_hi^4)
+  p.lo = fputil::multiply_add(x_hi_3, d0, d1);
+  return p;
+}
+
+} // anonymous namespace
+
+// There are several range reduction steps we can take for atan2(y, x) as
+// follow:
+
+// * Range reduction 1: signness
+// atan2(y, x) will return a number between -PI and PI representing the angle
+// forming by the 0x axis and the vector (x, y) on the 0xy-plane.
+// In particular, we have that:
+//   atan2(y, x) = atan( y/x )         if x >= 0 and y >= 0 (I-quadrant)
+//               = pi + atan( y/x )    if x < 0 and y >= 0  (II-quadrant)
+//               = -pi + atan( y/x )   if x < 0 and y < 0   (III-quadrant)
+//               = atan( y/x )         if x >= 0 and y < 0  (IV-quadrant)
+// Since atan function is odd, we can use the formula:
+//   atan(-u) = -atan(u)
+// to adjust the above conditions a bit further:
+//   atan2(y, x) = atan( |y|/|x| )         if x >= 0 and y >= 0 (I-quadrant)
+//               = pi - atan( |y|/|x| )    if x < 0 and y >= 0  (II-quadrant)
+//               = -pi + atan( |y|/|x| )   if x < 0 and y < 0   (III-quadrant)
+//               = -atan( |y|/|x| )        if x >= 0 and y < 0  (IV-quadrant)
+// Which can be simplified to:
+//   atan2(y, x) = sign(y) * atan( |y|/|x| )             if x >= 0
+//               = sign(y) * (pi - atan( |y|/|x| ))      if x < 0
+
+// * Range reduction 2: reciprocal
+// Now that the argument inside atan is positive, we can use the formula:
+//   atan(1/x) = pi/2 - atan(x)
+// to make the argument inside atan <= 1 as follow:
+//   atan2(y, x) = sign(y) * atan( |y|/|x|)            if 0 <= |y| <= x
+//               = sign(y) * (pi/2 - atan( |x|/|y| )   if 0 <= x < |y|
+//               = sign(y) * (pi - atan( |y|/|x| ))    if 0 <= |y| <= -x
+//               = sign(y) * (pi/2 + atan( |x|/|y| ))  if 0 <= -x < |y|
+
+// * Range reduction 3: look up table.
+// After the previous two range reduction steps, we reduce the problem to
+// compute atan(u) with 0 <= u <= 1, or to be precise:
+//   atan( n / d ) where n = min(|x|, |y|) and d = max(|x|, |y|).
+// An accurate polynomial approximation for the whole [0, 1] input range will
+// require a very large degree.  To make it more efficient, we reduce the input
+// range further by finding an integer idx such that:
+//   | n/d - idx/64 | <= 1/128.
+// In particular,
+//   idx := round(2^6 * n/d)
+// Then for the fast pass, we find a polynomial approximation for:
+//   atan( n/d ) ~ atan( idx/64 ) + (n/d - idx/64) * Q(n/d - idx/64)
+// For the accurate pass, we use the addition formula:
+//   atan( n/d ) - atan( idx/64 ) = atan( (n/d - idx/64)/(1 + (n*idx)/(64*d)) )
+//                                = atan( (n - d*(idx/64))/(d + n*(idx/64)) )
+// And for the fast pass, we use degree-9 Taylor polynomial to compute the RHS:
+//   atan(u) ~ P(u) = u - u^3/3 + u^5/5 - u^7/7 + u^9/9
+// with absolute errors bounded by:
+//   |atan(u) - P(u)| < |u|^11 / 11 < 2^-80
+// and relative errors bounded by:
+//   |(atan(u) - P(u)) / P(u)| < u^10 / 11 < 2^-73.
+
+LLVM_LIBC_FUNCTION(double, atan2, (double y, double x)) {
+  using FPBits = typename fputil::FPBits<double>;
+
+  constexpr double IS_NEG[2] = {1.0, -1.0};
+  constexpr DoubleDouble ZERO = {0.0, 0.0};
+  constexpr DoubleDouble MZERO = {-0.0, -0.0};
+  constexpr DoubleDouble PI = {0x1.1a62633145c07p-53, 0x1.921fb54442d18p+1};
+  constexpr DoubleDouble MPI = {-0x1.1a62633145c07p-53, -0x1.921fb54442d18p+1};
+  constexpr DoubleDouble PI_OVER_2 = {0x1.1a62633145c07p-54,
+                                      0x1.921fb54442d18p0};
+  constexpr DoubleDouble MPI_OVER_2 = {-0x1.1a62633145c07p-54,
+                                       -0x1.921fb54442d18p0};
+  constexpr DoubleDouble PI_OVER_4 = {0x1.1a62633145c07p-55,
+                                      0x1.921fb54442d18p-1};
+  constexpr DoubleDouble THREE_PI_OVER_4 = {0x1.a79394c9e8a0ap-54,
+                                            0x1.2d97c7f3321d2p+1};
+  // Adjustment for constant term:
+  //   CONST_ADJ[x_sign][y_sign][recip]
+  constexpr DoubleDouble CONST_ADJ[2][2][2] = {
+      {{ZERO, MPI_OVER_2}, {MZERO, MPI_OVER_2}},
+      {{MPI, PI_OVER_2}, {MPI, PI_OVER_2}}};
+
+  FPBits x_bits(x), y_bits(y);
+  bool x_sign = x_bits.sign().is_neg();
+  bool y_sign = y_bits.sign().is_neg();
+  x_bits = x_bits.abs();
+  y_bits = y_bits.abs();
+  uint64_t x_abs = x_bits.uintval();
+  uint64_t y_abs = y_bits.uintval();
+  bool recip = x_abs < y_abs;
+  uint64_t min_abs = recip ? x_abs : y_abs;
+  uint64_t max_abs = !recip ? x_abs : y_abs;
+  unsigned min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
+  unsigned max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);
+
+  double num = FPBits(min_abs).get_val();
+  double den = FPBits(max_abs).get_val();
+
+  // Check for exceptional cases, whether inputs are 0, inf, nan, or close to
+  // overflow, or close to underflow.
+  if (LIBC_UNLIKELY(max_exp > 0x7ffU - 128U || min_exp < 128U)) {
+    if (x_bits.is_nan() || y_bits.is_nan())
+      return FPBits::quiet_nan().get_val();
+    unsigned x_except = x_abs == 0 ? 0 : (FPBits(x_abs).is_inf() ? 2 : 1);
+    unsigned y_except = y_abs == 0 ? 0 : (FPBits(y_abs).is_inf() ? 2 : 1);
+
+    // Exceptional cases:
+    //   EXCEPT[y_except][x_except][x_is_neg]
+    // with x_except & y_except:
+    //   0: zero
+    //   1: finite, non-zero
+    //   2: infinity
+    constexpr DoubleDouble EXCEPTS[3][3][2] = {
+        {{ZERO, PI}, {ZERO, PI}, {ZERO, PI}},
+        {{PI_OVER_2, PI_OVER_2}, {ZERO, ZERO}, {ZERO, PI}},
+        {{PI_OVER_2, PI_OVER_2},
+         {PI_OVER_2, PI_OVER_2},
+         {PI_OVER_4, THREE_PI_OVER_4}},
+    };
+
+    if ((x_except != 1) || (y_except != 1)) {
+      DoubleDouble r = EXCEPTS[y_except][x_except][x_sign];
+      return fputil::multiply_add(IS_NEG[y_sign], r.hi, IS_NEG[y_sign] * r.lo);
+    }
+    bool scale_up = min_exp < 128U;
+    bool scale_down = max_exp > 0x7ffU - 128U;
+    // At least one input is denormal, multiply both numerator and denominator
+    // by some large enough power of 2 to normalize denormal inputs.
+    if (scale_up) {
+      num *= 0x1.0p64;
+      if (!scale_down)
+        den *= 0x1.0p64;
+    } else if (scale_down) {
+      den *= 0x1.0p-64;
+      if (!scale_up)
+        num *= 0x1.0p-64;
+    }
+
+    min_abs = FPBits(num).uintval();
+    max_abs = FPBits(den).uintval();
+    min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
+    max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);
+  }
+
+  double final_sign = IS_NEG[(x_sign != y_sign) != recip];
+  DoubleDouble const_term = CONST_ADJ[x_sign][y_sign][recip];
+  unsigned exp_diff = max_exp - min_exp;
+  // We have the following bound for normalized n and d:
+  //   2^(-exp_diff - 1) < n/d < 2^(-exp_diff + 1).
+  if (LIBC_UNLIKELY(exp_diff > 54)) {
+    return fputil::multiply_add(final_sign, const_term.hi,
+                                final_sign * (const_term.lo + num / den));
+  }
+
+  double k = fputil::nearest_integer(64.0 * num / den);
+  unsigned idx = static_cast<unsigned>(k);
+  // k = idx / 64
+  k *= 0x1.0p-6;
+
+  // Range reduction:
+  // atan(n/d) - atan(k/64) = atan((n/d - k/64) / (1 + (n/d) * (k/64)))
+  //                        = atan((n - d * k/64)) / (d + n * k/64))
+  DoubleDouble num_k = fputil::exact_mult(num, k);
+  DoubleDouble den_k = fputil::exact_mult(den, k);
+
+  // num_dd = n - d * k
+  DoubleDouble num_dd = fputil::exact_add(num - den_k.hi, -den_k.lo);
+  // den_dd = d + n * k
+  DoubleDouble den_dd = fputil::exact_add(den, num_k.hi);
+  den_dd.lo += num_k.lo;
+
+  // q = (n - d * k) / (d + n * k)
+  DoubleDouble q = fputil::div(num_dd, den_dd);
+  // p ~ atan(q)
+  DoubleDouble p = atan_eval(q);
+
+  DoubleDouble r = fputil::add(const_term, fputil::add(ATAN_I[idx], p));
+  r.hi *= final_sign;
+  r.lo *= final_sign;
+
+  return r.hi + r.lo;
+}
+
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/test/src/math/CMakeLists.txt b/libc/test/src/math/CMakeLists.txt
index 3ad5d98858165..380d283763afa 100644
--- a/libc/test/src/math/CMakeLists.txt
+++ b/libc/test/src/math/CMakeLists.txt
@@ -2044,6 +2044,18 @@ add_fp_unittest(
     libc.src.__support.FPUtil.fp_bits
 )
 
+add_fp_unittest(
+  atan2_test
+  NEED_MPFR
+  SUITE
+    libc-math-unittests
+  SRCS
+    atan2_test.cpp
+  DEPENDS
+    libc.src.math.atan2
+    libc.src.__support.FPUtil.fp_bits
+)
+
 add_fp_unittest(
   f16add_test
   NEED_MPFR
diff --git a/libc/test/src/math/atan2_test.cpp b/libc/test/src/math/at...
[truncated]

Comment on lines +223 to +224
unsigned min_exp = static_cast<unsigned>(min_abs >> FPBits::FRACTION_LEN);
unsigned max_exp = static_cast<unsigned>(max_abs >> FPBits::FRACTION_LEN);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using .get_biased_exponent() instead generate extra instructions? Same comment for lines 271-272 below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they generate extra masking instructions, whereas these are already non-negative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be worth it to add a new function to FPBits instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about that, it + its name will be confusing to choose vs get_biased_exponent. This optimization is kind of small and not always applied to all cases, so I think for new we could leave it as is.

Comment on lines +21 to +22

namespace {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, from the LLVM code style:

Avoid putting declarations other than classes into anonymous namespaces

https://llvm.org/docs/CodingStandards.html#anonymous-namespaces.

Clang and LLVM Core largely conform to this, but we don't seem to follow this guideline as much in LLVM libc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for us, these file we only want 1 public function, the rest are static, so I think anonymous namespace provide a more clear-cut boundary, and less chance for us to forget to mark the local functions static.

@lntue lntue merged commit ca8b14d into llvm:main Jul 26, 2024
7 checks passed
@lntue lntue deleted the atan2 branch July 26, 2024 13:56
@lntue lntue mentioned this pull request Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants