Skip to content

gh-100833: Remove 'volatile' qualifiers in fsum algorithm #100845

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 8, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Speed up :func:`math.fsum` by removing defensive ``volatile`` qualifiers.
44 changes: 22 additions & 22 deletions Modules/mathmodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -1358,30 +1358,30 @@ FUNC1(tanh, tanh, 0,
Dickinson's post at <http://bugs.python.org/file10357/msum4.py>.
See those links for more details, proofs and other references.

Note 1: IEEE 754R floating point semantics are assumed,
but the current implementation does not re-establish special
value semantics across iterations (i.e. handling -Inf + Inf).
Note 1: IEEE 754 floating-point semantics with a rounding mode of
roundTiesToEven are assumed.

Note 2: No provision is made for intermediate overflow handling;
therefore, sum([1e+308, 1e-308, 1e+308]) returns 1e+308 while
sum([1e+308, 1e+308, 1e-308]) raises an OverflowError due to the
Note 2: No provision is made for intermediate overflow handling;
therefore, fsum([1e+308, -1e+308, 1e+308]) returns 1e+308 while
fsum([1e+308, 1e+308, -1e+308]) raises an OverflowError due to the
overflow of the first partial sum.

Note 3: The intermediate values lo, yr, and hi are declared volatile so
aggressive compilers won't algebraically reduce lo to always be exactly 0.0.
Also, the volatile declaration forces the values to be stored in memory as
regular doubles instead of extended long precision (80-bit) values. This
prevents double rounding because any addition or subtraction of two doubles
can be resolved exactly into double-sized hi and lo values. As long as the
hi value gets forced into a double before yr and lo are computed, the extra
bits in downstream extended precision operations (x87 for example) will be
exactly zero and therefore can be losslessly stored back into a double,
thereby preventing double rounding.

Note 4: A similar implementation is in Modules/cmathmodule.c.
Be sure to update both when making changes.

Note 5: The signature of math.fsum() differs from builtins.sum()
Note 3: The algorithm has two potential sources of fragility. First, C
permits arithmetic operations on `double`s to be performed in an
intermediate format whose range and precision may be greater than those of
`double` (see for example C99 §5.2.4.2.2, paragraph 8). This can happen for
example on machines using the now largely historical x87 FPUs. In this case,
`fsum` can produce incorrect results. If `FLT_EVAL_METHOD` is `0` or `1`, or
`FLT_EVAL_METHOD` is `2` and `long double` is identical to `double`, then we
should be safe from this source of errors. Second, an aggressively
optimizing compiler can re-associate operations so that (for example) the
statement `yr = hi - x;` is treated as `yr = (x + y) - x` and then
re-associated as `yr = y + (x - x)`, giving `y = yr` and `lo = 0.0`. That
re-association would be in violation of the C standard, and should not occur
except possibly in the presence of unsafe optimizations (e.g., -ffast-math,
-fassociative-math). Such optimizations should be avoided for this module.

Note 4: The signature of math.fsum() differs from builtins.sum()
because the start argument doesn't make sense in the context of
accurate summation. Since the partials table is collapsed before
returning a result, sum(seq2, start=sum(seq1)) may not equal the
Expand Down Expand Up @@ -1467,7 +1467,7 @@ math_fsum(PyObject *module, PyObject *seq)
Py_ssize_t i, j, n = 0, m = NUM_PARTIALS;
double x, y, t, ps[NUM_PARTIALS], *p = ps;
double xsave, special_sum = 0.0, inf_sum = 0.0;
volatile double hi, yr, lo;
double hi, yr, lo;

iter = PyObject_GetIter(seq);
if (iter == NULL)
Expand Down