Skip to content

[libc] Make 'rand()' thread-safe using atomics instead of TLS #96692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jun 25, 2024

Summary:
Currently, we implement the rand function using thread-local storage.
This is somewhat problematic because not every target supports TLS, and
even more do not support non-zero initializers on TLS.

The C standard states that the rand() function need not be thread,
safe. However, many implementations provide thread-safety anyway.
There's some confusing language in the 'rationale' section of
https://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html,
but given that glibc uses a lock, I think we should make this thread
safe as well. it mentions that threaded behavior is desirable and can be
done in the two ways:

  1. A single per-process sequence of pseudo-random numbers that is shared by all threads that call rand()
  2. A different sequence of pseudo-random numbers for each thread that calls rand()

The current implementation is (2.) and this patch moves it to (1.). This
is beneficial for the GPU case and more generic support. The downside is
that it's slightly slower to do these atomic operations, the fast path
will be two atomic reads and an atomic write.

@llvmbot
Copy link
Member

llvmbot commented Jun 25, 2024

@llvm/pr-subscribers-libc

Author: Joseph Huber (jhuber6)

Changes

Summary:
Currently, we implement the rand function using thread-local storage.
This is somewhat problematic because not every target supports TLS, and
even more do not support non-zero initializers on TLS.

The C standard states that the rand() function need not be thread,
safe. However, many implementations provide thread-safety anyway.
There's some confusing language in the 'rationale' section of
https://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html,
but given that glibc uses a lock, I think we should make this thread
safe as well. it mentions that threaded behavior is desirable and can be
done in the two ways:

  1. A single per-process sequence of pseudo-random numbers that is shared by all threads that call rand()
  2. A different sequence of pseudo-random numbers for each thread that calls rand()

The current implementation is (2.) and this patch moves it to (1.). This
is beneficial for the GPU case and more generic support. The downside is
that it's slightly slower to do these atomic operations, the fast path
will be two atomic reads and an atomic write.


Full diff: https://github.com/llvm/llvm-project/pull/96692.diff

5 Files Affected:

  • (modified) libc/src/stdlib/rand.cpp (+10-6)
  • (modified) libc/src/stdlib/rand_util.cpp (+4-9)
  • (modified) libc/src/stdlib/rand_util.h (+5-23)
  • (modified) libc/src/stdlib/srand.cpp (+3-1)
  • (modified) libc/test/src/stdlib/rand_test.cpp (-3)
diff --git a/libc/src/stdlib/rand.cpp b/libc/src/stdlib/rand.cpp
index ad543b4048a94..eaa5c130bf7c1 100644
--- a/libc/src/stdlib/rand.cpp
+++ b/libc/src/stdlib/rand.cpp
@@ -15,12 +15,16 @@ namespace LIBC_NAMESPACE {
 // An implementation of the xorshift64star pseudo random number generator. This
 // is a good general purpose generator for most non-cryptographics applications.
 LLVM_LIBC_FUNCTION(int, rand, (void)) {
-  unsigned long x = rand_next;
-  x ^= x >> 12;
-  x ^= x << 25;
-  x ^= x >> 27;
-  rand_next = x;
-  return static_cast<int>((x * 0x2545F4914F6CDD1Dul) >> 32) & RAND_MAX;
+  unsigned long orig = rand_next.load(cpp::MemoryOrder::RELAXED);
+  for (;;) {
+    unsigned long x = orig;
+    x ^= x >> 12;
+    x ^= x << 25;
+    x ^= x >> 27;
+    if (rand_next.compare_exchange_weak(orig, x, cpp::MemoryOrder::ACQUIRE,
+                                        cpp::MemoryOrder::RELAXED))
+      return static_cast<int>((x * 0x2545F4914F6CDD1Dul) >> 32) & RAND_MAX;
+  }
 }
 
 } // namespace LIBC_NAMESPACE
diff --git a/libc/src/stdlib/rand_util.cpp b/libc/src/stdlib/rand_util.cpp
index 1f3dbce9c7860..81ee358935716 100644
--- a/libc/src/stdlib/rand_util.cpp
+++ b/libc/src/stdlib/rand_util.cpp
@@ -7,18 +7,13 @@
 //===----------------------------------------------------------------------===//
 
 #include "src/stdlib/rand_util.h"
+#include "src/__support/CPP/atomic.h"
 #include "src/__support/macros/attributes.h"
 
 namespace LIBC_NAMESPACE {
 
-#ifdef LIBC_TARGET_ARCH_IS_GPU
-// FIXME: Local GPU memory cannot be initialized so we cannot currently provide
-// a standard compliant default value.
-ThreadLocal<unsigned long> rand_next;
-#else
-// C standard 7.10p2: If 'rand' is called before 'srand' it is to proceed as if
-// the 'srand' function was called with a value of '1'.
-LIBC_THREAD_LOCAL unsigned long rand_next = 1;
-#endif
+//  C standard 7.10p2: If 'rand' is called before 'srand' it is to
+// proceed as if the 'srand' function was called with a value of '1'.
+cpp::Atomic<unsigned long> rand_next = 1;
 
 } // namespace LIBC_NAMESPACE
diff --git a/libc/src/stdlib/rand_util.h b/libc/src/stdlib/rand_util.h
index cadd6b5cdcbb8..5d7febf8248d8 100644
--- a/libc/src/stdlib/rand_util.h
+++ b/libc/src/stdlib/rand_util.h
@@ -9,33 +9,15 @@
 #ifndef LLVM_LIBC_SRC_STDLIB_RAND_UTIL_H
 #define LLVM_LIBC_SRC_STDLIB_RAND_UTIL_H
 
-#include "src/__support/GPU/utils.h"
+#include "src/__support/CPP/atomic.h"
 #include "src/__support/macros/attributes.h"
 
 namespace LIBC_NAMESPACE {
 
-#ifdef LIBC_TARGET_ARCH_IS_GPU
-// Implement thread local storage on the GPU using local memory. Each thread
-// gets its slot in the local memory array and is private to the group.
-// TODO: We need to implement the 'thread_local' keyword on the GPU. This is an
-// inefficient and incomplete stand-in until that is done.
-template <typename T> class ThreadLocal {
-private:
-  static constexpr long MAX_THREADS = 1024;
-  [[clang::loader_uninitialized]] static inline gpu::Local<T>
-      storage[MAX_THREADS];
-
-public:
-  LIBC_INLINE operator T() const { return storage[gpu::get_thread_id()]; }
-  LIBC_INLINE void operator=(const T &value) {
-    storage[gpu::get_thread_id()] = value;
-  }
-};
-
-extern ThreadLocal<unsigned long> rand_next;
-#else
-extern LIBC_THREAD_LOCAL unsigned long rand_next;
-#endif
+// The ISO C standard does not explicitly require thread-safe behavior for the
+// generic `rand()` function. Some implementations expect it however, so we
+// provide it here.
+extern cpp::Atomic<unsigned long> rand_next;
 
 } // namespace LIBC_NAMESPACE
 
diff --git a/libc/src/stdlib/srand.cpp b/libc/src/stdlib/srand.cpp
index 008c7a9e565e4..21166c7a6754e 100644
--- a/libc/src/stdlib/srand.cpp
+++ b/libc/src/stdlib/srand.cpp
@@ -12,6 +12,8 @@
 
 namespace LIBC_NAMESPACE {
 
-LLVM_LIBC_FUNCTION(void, srand, (unsigned int seed)) { rand_next = seed; }
+LLVM_LIBC_FUNCTION(void, srand, (unsigned int seed)) {
+  rand_next.store(seed, cpp::MemoryOrder::RELAXED);
+}
 
 } // namespace LIBC_NAMESPACE
diff --git a/libc/test/src/stdlib/rand_test.cpp b/libc/test/src/stdlib/rand_test.cpp
index 7934dc16aa461..6f25708e53905 100644
--- a/libc/test/src/stdlib/rand_test.cpp
+++ b/libc/test/src/stdlib/rand_test.cpp
@@ -23,15 +23,12 @@ TEST(LlvmLibcRandTest, UnsetSeed) {
     vals[i] = val;
   }
 
-  // FIXME: The GPU implementation cannot initialize the seed correctly.
-#ifndef LIBC_TARGET_ARCH_IS_GPU
   // The C standard specifies that if 'srand' is never called it should behave
   // as if 'srand' was called with a value of 1. If we seed the value with 1 we
   // should get the same sequence as the unseeded version.
   LIBC_NAMESPACE::srand(1);
   for (size_t i = 0; i < 1000; ++i)
     ASSERT_EQ(LIBC_NAMESPACE::rand(), vals[i]);
-#endif
 }
 
 TEST(LlvmLibcRandTest, SetSeed) {

@SchrodingerZhu
Copy link
Contributor

Realtime people may not be happy that rand function is not wait-free.

Copy link
Contributor

@frobtech frobtech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a better approach for this code than thread_local or MT-unsafe or locks. The only potential issue is if there are any embedded cases that are purely single-threaded and atomics are not available (or too costly). (For example, on AArch64 you can't use atomics when the MMU is off.)

Summary:
Currently, we implement the `rand` function using thread-local storage.
This is somewhat problematic because not every target supports TLS, and
even more do not support non-zero initializers on TLS.

The C standard states that the `rand()` function need not be thread,
safe. However, many implementations provide thread-safety anyway.
There's some confusing language in the 'rationale' section of
https://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html,
but given that `glibc` uses a lock, I think we should make this thread
safe as well. it mentions that threaded behavior is desirable and can be
done in the two ways:

1. A single per-process sequence of pseudo-random numbers that is shared by all threads that call rand()
2. A different sequence of pseudo-random numbers for each thread that calls rand()

The current implementation is (2.) and this patch moves it to (1.). This
is beneficial for the GPU case and more generic support. The downside is
that it's slightly slower to do these atomic operations, the fast path
will be two atomic reads and an atomic write.
@jhuber6 jhuber6 merged commit 86860be into llvm:main Jun 26, 2024
6 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 26, 2024

LLVM Buildbot has detected a new failure on builder fuchsia-x86_64-linux running on fuchsia-debian-64-us-central1-a-1 while building libc at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/11/builds/642

Here is the relevant piece of the build log for the reference:

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/fuchsia-linux.py ...' (failure)
...
[353/1568] Building CXX object libc/src/stdfix/CMakeFiles/libc.src.stdfix.roundulk.dir/roundulk.cpp.obj
[354/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.asinf.dir/asinf.cpp.obj
[355/1568] Building CXX object libc/src/stdfix/CMakeFiles/libc.src.stdfix.uksqrtui.dir/uksqrtui.cpp.obj
[356/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.acosf.dir/acosf.cpp.obj
[357/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.bsearch.dir/bsearch.cpp.obj
[358/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.hypot.dir/hypot.cpp.obj
[359/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.div.dir/div.cpp.obj
[360/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.ldiv.dir/ldiv.cpp.obj
[361/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.lldiv.dir/lldiv.cpp.obj
[362/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj
FAILED: libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj 
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/bin/clang++ --target=armv6m-unknown-eabi -DLIBC_NAMESPACE=__llvm_libc_19_0_0_git -I/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc -isystem /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/include/armv6m-unknown-eabi --target=armv6m-unknown-eabi -mthumb -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/runtimes/runtimes-armv6m-unknown-eabi-bins=../../../../llvm-project -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/= -no-canonical-prefixes -O2 -g -DNDEBUG -std=gnu++17 --target=armv6m-unknown-eabi -fpie -DLIBC_FULL_BUILD -ffreestanding -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -MD -MT libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj -MF libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj.d -o libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj -c /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/srand.cpp
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/srand.cpp:11:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/rand_util.h:12:
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/__support/CPP/atomic.h:90:5: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0  bytes) [-Werror,-Watomic-alignment]
   90 |     __scoped_atomic_store_n(&val, rhs, int(mem_ord), (int)(mem_scope));
      |     ^
1 error generated.
[363/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strerror.dir/strerror.cpp.obj
[364/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strerror_r.dir/strerror_r.cpp.obj
[365/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.scalbn.dir/scalbn.cpp.obj
[366/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj
FAILED: libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj 
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/bin/clang++ --target=armv6m-unknown-eabi -DLIBC_NAMESPACE=__llvm_libc_19_0_0_git -I/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc -isystem /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/include/armv6m-unknown-eabi --target=armv6m-unknown-eabi -mthumb -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/runtimes/runtimes-armv6m-unknown-eabi-bins=../../../../llvm-project -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/= -no-canonical-prefixes -O2 -g -DNDEBUG -std=gnu++17 --target=armv6m-unknown-eabi -fpie -DLIBC_FULL_BUILD -ffreestanding -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -MD -MT libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj -MF libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj.d -o libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj -c /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/rand.cpp
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/rand.cpp:12:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/rand_util.h:12:
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/__support/CPP/atomic.h:75:12: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0  bytes) [-Werror,-Watomic-alignment]
   75 |     return __scoped_atomic_load_n(&val, int(mem_ord), (int)(mem_scope));
      |            ^
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/__support/CPP/atomic.h:109:12: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0  bytes) [-Werror,-Watomic-alignment]
  109 |     return __atomic_compare_exchange_n(&val, &expected, desired, false,
      |            ^
2 errors generated.
[367/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.exp2.dir/exp2.cpp.obj
[368/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.scalbnl.dir/scalbnl.cpp.obj
[369/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.mempcpy.dir/mempcpy.cpp.obj
[370/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.index.dir/index.cpp.obj
[371/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.rindex.dir/rindex.cpp.obj
[372/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strchr.dir/strchr.cpp.obj
[373/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.memchr.dir/memchr.cpp.obj
[374/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strcpy.dir/strcpy.cpp.obj
[375/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.stpcpy.dir/stpcpy.cpp.obj
[376/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.stpncpy.dir/stpncpy.cpp.obj
[377/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.qsort.dir/qsort.cpp.obj
[378/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strlen.dir/strlen.cpp.obj
[379/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strrchr.dir/strrchr.cpp.obj
[380/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.atoi.dir/atoi.cpp.obj
[381/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.atol.dir/atol.cpp.obj
[382/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strnlen.dir/strnlen.cpp.obj
Step 6 (build) failure: build (failure)
...
[353/1568] Building CXX object libc/src/stdfix/CMakeFiles/libc.src.stdfix.roundulk.dir/roundulk.cpp.obj
[354/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.asinf.dir/asinf.cpp.obj
[355/1568] Building CXX object libc/src/stdfix/CMakeFiles/libc.src.stdfix.uksqrtui.dir/uksqrtui.cpp.obj
[356/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.acosf.dir/acosf.cpp.obj
[357/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.bsearch.dir/bsearch.cpp.obj
[358/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.hypot.dir/hypot.cpp.obj
[359/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.div.dir/div.cpp.obj
[360/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.ldiv.dir/ldiv.cpp.obj
[361/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.lldiv.dir/lldiv.cpp.obj
[362/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj
FAILED: libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj 
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/bin/clang++ --target=armv6m-unknown-eabi -DLIBC_NAMESPACE=__llvm_libc_19_0_0_git -I/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc -isystem /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/include/armv6m-unknown-eabi --target=armv6m-unknown-eabi -mthumb -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/runtimes/runtimes-armv6m-unknown-eabi-bins=../../../../llvm-project -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/= -no-canonical-prefixes -O2 -g -DNDEBUG -std=gnu++17 --target=armv6m-unknown-eabi -fpie -DLIBC_FULL_BUILD -ffreestanding -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -MD -MT libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj -MF libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj.d -o libc/src/stdlib/CMakeFiles/libc.src.stdlib.srand.dir/srand.cpp.obj -c /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/srand.cpp
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/srand.cpp:11:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/rand_util.h:12:
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/__support/CPP/atomic.h:90:5: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0  bytes) [-Werror,-Watomic-alignment]
   90 |     __scoped_atomic_store_n(&val, rhs, int(mem_ord), (int)(mem_scope));
      |     ^
1 error generated.
[363/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strerror.dir/strerror.cpp.obj
[364/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strerror_r.dir/strerror_r.cpp.obj
[365/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.scalbn.dir/scalbn.cpp.obj
[366/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj
FAILED: libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj 
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/bin/clang++ --target=armv6m-unknown-eabi -DLIBC_NAMESPACE=__llvm_libc_19_0_0_git -I/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc -isystem /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/include/armv6m-unknown-eabi --target=armv6m-unknown-eabi -mthumb -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-8e4temoe/runtimes/runtimes-armv6m-unknown-eabi-bins=../../../../llvm-project -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/= -no-canonical-prefixes -O2 -g -DNDEBUG -std=gnu++17 --target=armv6m-unknown-eabi -fpie -DLIBC_FULL_BUILD -ffreestanding -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -MD -MT libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj -MF libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj.d -o libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand.dir/rand.cpp.obj -c /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/rand.cpp
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/rand.cpp:12:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdlib/rand_util.h:12:
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/__support/CPP/atomic.h:75:12: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0  bytes) [-Werror,-Watomic-alignment]
   75 |     return __scoped_atomic_load_n(&val, int(mem_ord), (int)(mem_scope));
      |            ^
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/__support/CPP/atomic.h:109:12: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0  bytes) [-Werror,-Watomic-alignment]
  109 |     return __atomic_compare_exchange_n(&val, &expected, desired, false,
      |            ^
2 errors generated.
[367/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.exp2.dir/exp2.cpp.obj
[368/1568] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.scalbnl.dir/scalbnl.cpp.obj
[369/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.mempcpy.dir/mempcpy.cpp.obj
[370/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.index.dir/index.cpp.obj
[371/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.rindex.dir/rindex.cpp.obj
[372/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strchr.dir/strchr.cpp.obj
[373/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.memchr.dir/memchr.cpp.obj
[374/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strcpy.dir/strcpy.cpp.obj
[375/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.stpcpy.dir/stpcpy.cpp.obj
[376/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.stpncpy.dir/stpncpy.cpp.obj
[377/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.qsort.dir/qsort.cpp.obj
[378/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strlen.dir/strlen.cpp.obj
[379/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strrchr.dir/strrchr.cpp.obj
[380/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.atoi.dir/atoi.cpp.obj
[381/1568] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.atol.dir/atol.cpp.obj
[382/1568] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strnlen.dir/strnlen.cpp.obj

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jun 26, 2024

So, the Fuscia builder thinks that its atomic inline width is zero, which leads to a warning and because we compile with -Werror it turns into an error. Any suggestions for this? I don't know the platform, but a maximum atomic inline size of zero seems weird. This is usually set with MaxAtomicInlineWidth = TargetPointerWidth; in clang/lib/Basic/Targets/. Does anyone who knows Fuscia know if it's true that every single atomic there goes through compiler-rt? Otherwise I think the solution is to ignore the warning.

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Jun 26, 2024

https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free I think can be used to detect if usage of an atomic type would result in a libcall. Perhaps we can have two variants of the function?

@SchrodingerZhu
Copy link
Contributor

I think at least for clang, is_lock_free can also be detected using macros directly

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jun 26, 2024

I think at least for clang, is_lock_free can also be detected using macros directly

Can you give me an example of that? I tried as in https://godbolt.org/z/v9chvGxe9 but it doesn't seem to work within a macro.

And even then, should we really make this MT-unsafe if the atomics aren't fast? It should still work as long as the platform has some support. The alternatives are to 1. leave it slow and suppress the diagnostics, or 2. make baremetal submit a new config COPT_MT_UNSAFE or similar.

@nickdesaulniers
Copy link
Member

And even then, should we really make this MT-unsafe if the atomics aren't fast?

I can forsee primitive microcontrollers which are single threaded wanting to make use of rand, and not caring about MT safety.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jun 26, 2024

And even then, should we really make this MT-unsafe if the atomics aren't fast?

I can forsee primitive microcontrollers which are single threaded wanting to make use of rand, and not caring about MT safety.

Yeah, what's the preferred way to handle this? I'd say just an option for baremetal, but even that might have some MT targets.

@nickdesaulniers
Copy link
Member

We do have libc/src/stdlib/baremetal/abort.cpp, perhaps it's time for libc/src/stdlib/baremetal/rand.cpp?

but even that might have some MT targets.
make baremetal submit a new config COPT_MT_UNSAFE or similar.

If the C standard makes no guarantee, then let baremetal callers of rand use their own locking if they need MT safety. Until we have customers asking for baremetal MT safety, I'd prefer to keep the number of configurations (and thus combinations of libc to test) at a minimum.

lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
…6692)

Summary:
Currently, we implement the `rand` function using thread-local storage.
This is somewhat problematic because not every target supports TLS, and
even more do not support non-zero initializers on TLS.

The C standard states that the `rand()` function need not be thread,
safe. However, many implementations provide thread-safety anyway.
There's some confusing language in the 'rationale' section of
https://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html,
but given that `glibc` uses a lock, I think we should make this thread
safe as well. it mentions that threaded behavior is desirable and can be
done in the two ways:

1. A single per-process sequence of pseudo-random numbers that is shared
by all threads that call rand()
2. A different sequence of pseudo-random numbers for each thread that
calls rand()

The current implementation is (2.) and this patch moves it to (1.). This
is beneficial for the GPU case and more generic support. The downside is
that it's slightly slower to do these atomic operations, the fast path
will be two atomic reads and an atomic write.
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…6692)

Summary:
Currently, we implement the `rand` function using thread-local storage.
This is somewhat problematic because not every target supports TLS, and
even more do not support non-zero initializers on TLS.

The C standard states that the `rand()` function need not be thread,
safe. However, many implementations provide thread-safety anyway.
There's some confusing language in the 'rationale' section of
https://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html,
but given that `glibc` uses a lock, I think we should make this thread
safe as well. it mentions that threaded behavior is desirable and can be
done in the two ways:

1. A single per-process sequence of pseudo-random numbers that is shared
by all threads that call rand()
2. A different sequence of pseudo-random numbers for each thread that
calls rand()

The current implementation is (2.) and this patch moves it to (1.). This
is beneficial for the GPU case and more generic support. The downside is
that it's slightly slower to do these atomic operations, the fast path
will be two atomic reads and an atomic write.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants