
Commit 745c0f3

Simplify vector_norm() by eliminating special cases in the main loop (GH-9006)
The *max* value is no longer treated as a special case in the main loop. Besides making the main loop simpler and branchless, this also lets us relax the input restriction that *vec* contain only non-negative values.
1 parent aada63b commit 745c0f3


1 file changed, +18 -22 lines changed


Modules/mathmodule.c

Lines changed: 18 additions & 22 deletions
@@ -2032,14 +2032,14 @@ math_fmod_impl(PyObject *module, double x, double y)
 }
 
 /*
-Given an *n* length *vec* of non-negative values
-where *max* is the largest value in the vector, compute:
+Given an *n* length *vec* of values and a value *max*, compute:
 
     max * sqrt(sum((x / max) ** 2 for x in vec))
 
-The value of the *max* variable must be present in *vec*
-or should equal to 0.0 when n==0. Likewise, *max* will
-be INF if an infinity is present in the vec.
+The value of the *max* variable must be non-negative and
+at least equal to the absolute value of the largest magnitude
+entry in the vector. If n==0, then *max* should be 0.0.
+If an infinity is present in the vec, *max* should be INF.
 
 The *found_nan* variable indicates whether some member of
 the *vec* is a NaN.
@@ -2053,16 +2053,19 @@ The *csum* variable tracks the cumulative sum and *frac* tracks
 the cumulative fractional errors at each step. Since this
 variant assumes that |csum| >= |x| at each step, we establish
 the precondition by starting the accumulation from 1.0 which
-represents an entry equal to *max*. This also provides a nice
-side benefit in that it lets us skip over a *max* entry (which
-is swapped into *last*) saving us one iteration through the loop.
+represents the largest possible value of (x/max)**2.
+
+After the loop is finished, the initial 1.0 is subtracted out
+for a net zero effect on the final sum. Since *csum* will be
+greater than 1.0, the subtraction of 1.0 will not cause
+fractional digits to be dropped from *csum*.
 
 */
 
 static inline double
 vector_norm(Py_ssize_t n, double *vec, double max, int found_nan)
 {
-    double x, csum = 1.0, oldcsum, frac = 0.0, last;
+    double x, csum = 1.0, oldcsum, frac = 0.0;
     Py_ssize_t i;
 
     if (Py_IS_INFINITY(max)) {
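The new comment explains why accumulation starts at 1.0 and why that 1.0 is removed after the loop. A minimal sketch of just that accumulation step, pulled into a hypothetical helper `offset_sum_of_squares()` that assumes its inputs were already divided by *max*: because every square is at most 1.0 and *csum* never drops below 1.0, the precondition |csum| >= |x| holds at each step, so `(oldcsum - csum) + x` recovers the rounding error of `csum += x` (the Fast2Sum error term), and *frac* collects those errors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of CPython. Assumes every scaled[i]
   lies in [-1, 1], i.e. the entries were already divided by max. */
double offset_sum_of_squares(size_t n, const double *scaled)
{
    double csum = 1.0, frac = 0.0;   /* 1.0 = largest possible (x/max)**2 */
    for (size_t i = 0; i < n; i++) {
        double x = scaled[i] * scaled[i];
        assert(x <= 1.0);            /* so oldcsum >= 1.0 >= x below */
        double oldcsum = csum;
        csum += x;
        frac += (oldcsum - csum) + x;  /* low-order bits lost by csum += x */
    }
    return csum - 1.0 + frac;        /* remove the initial 1.0 again */
}
```

For 1000 entries of 1e-9, each square (about 1e-18) is smaller than half an ulp of 1.0, so *csum* stays exactly 1.0 and `csum - 1.0` alone would return 0.0; the result of about 1e-15 survives only through *frac*.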
@@ -2071,27 +2074,20 @@ vector_norm(Py_ssize_t n, double *vec, double max, int found_nan)
     if (found_nan) {
         return Py_NAN;
     }
-    if (max == 0.0) {
-        return 0.0;
+    if (max == 0.0 || n == 1) {
+        return max;
     }
-    assert(n > 0);
-    last = vec[n-1];
-    for (i=0 ; i < n-1 ; i++) {
+    for (i=0 ; i < n ; i++) {
         x = vec[i];
-        assert(Py_IS_FINITE(x) && x >= 0.0 && x <= max);
-        if (x == max) {
-            x = last;
-            last = max;
-        }
+        assert(Py_IS_FINITE(x) && fabs(x) <= max);
         x /= max;
         x = x*x;
-        assert(csum >= x);
         oldcsum = csum;
         csum += x;
+        assert(csum >= x);
         frac += (oldcsum - csum) + x;
     }
-    assert(last == max);
-    return max * sqrt(csum + frac);
+    return max * sqrt(csum - 1.0 + frac);
 }
 
 #define NUM_STACK_ELEMS 16
