Skip to content

bpo-29882: Fix portability bug introduced in GH-30774 #30794

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 23, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 14 additions & 4 deletions Include/internal/pycore_bitutils.h
Original file line number Diff line number Diff line change
Expand Up @@ -115,17 +115,27 @@ _Py_popcount32(uint32_t x)
const uint32_t M2 = 0x33333333;
// Binary: 0000 1111 0000 1111 ...
const uint32_t M4 = 0x0F0F0F0F;
// 256**4 + 256**3 + 256**2 + 256**1
const uint32_t SUM = 0x01010101;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removing this constant? I added it to not have to use the macro to get an uint32_t literal number. UINT32_C() if I recall correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vstinner This is already explained in the long comment that you didn't review. :-)

The problem here isn't the constant; it's the type declaration.

For the multiplication in the last line of the function to be portable, we need at least one of the unsigned operands in that multiplication to not be promoted to int. An inline constant 0x01010101U satisfies that criterion: by C99 §6.4.4.1, together with C's guarantees about the minimum precision of long, it has type either unsigned int or unsigned long. A uint32_t constant with the same value does not satisfy that criterion, for reasons already explained.

And there isn't a type declaration that works here. I just explained why uint32_t won't work. If we declare SUM as unsigned int instead of uint32_t, we have portability issues on machines where int isn't large enough to represent the value 0x01010101. If we declare it as unsigned long, we're back to doing a 64-bit-by-64-bit multiply on almost all current Linux and macOS boxes.

If you really want to keep the constant, another option is to leave this line exactly as-is and change the multiplication in the last line to x * (SUM + 0U); that + 0U effectively forces the second multiplicand to have type with rank greater than or equal to that of int, making it immune to further integer promotion.

But as Tim observed in the #30774 discussion, none of this prevents us from potentially doing a 512-bit-by-512-bit multiply on a box that has 512-bit integers. But short of relying on compiler-specific intrinsics, that's inescapable anyway: standard C simply isn't capable of doing arithmetic on anything smaller than an int.


// Put count of each 2 bits into those 2 bits
x = x - ((x >> 1) & M1);
// Put count of each 4 bits into those 4 bits
x = (x & M2) + ((x >> 2) & M2);
// Put count of each 8 bits into those 8 bits
x = (x + (x >> 4)) & M4;
// Sum of the 4 byte counts
return (x * SUM) >> 24;
// Sum of the 4 byte counts.
// Take care when considering changes to the next line. Portability and
// correctness are delicate here, thanks to C's "integer promotions" (C99
// §6.3.1.1p2). On machines where the `int` type has width greater than 32
// bits, `x` will be promoted to an `int`, and following C's "usual
// arithmetic conversions" (C99 §6.3.1.8), the multiplication will be
// performed as a multiplication of two `unsigned int` operands. In this
// case it's critical that we cast back to `uint32_t` in order to keep only
// the least significant 32 bits. On machines where the `int` type has
// width no greater than 32, the multiplication is of two 32-bit unsigned
// integer types, and the (uint32_t) cast is a no-op. In both cases, we
// avoid the risk of undefined behaviour due to overflow of a
// multiplication of signed integer types.
return (uint32_t)(x * 0x01010101U) >> 24;
#endif
}

Expand Down
1 change: 1 addition & 0 deletions Modules/_testinternalcapi.c
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@ test_popcount(PyObject *self, PyObject *Py_UNUSED(args))
CHECK(0, 0);
CHECK(1, 1);
CHECK(0x08080808, 4);
CHECK(0x10000001, 2);
CHECK(0x10101010, 4);
CHECK(0x10204080, 4);
CHECK(0xDEADCAFE, 22);
Expand Down