[libc] Add loader option to force serial execution of GPU region #101601
@llvm/pr-subscribers-libc
Author: Joseph Huber (jhuber6)
Full diff: https://github.com/llvm/llvm-project/pull/101601.diff
1 file affected:
diff --git a/libc/utils/gpu/loader/Main.cpp b/libc/utils/gpu/loader/Main.cpp
index 44ed8bf58ab87..7037d772ad2bc 100644
--- a/libc/utils/gpu/loader/Main.cpp
+++ b/libc/utils/gpu/loader/Main.cpp
@@ -11,6 +11,8 @@
//
//===----------------------------------------------------------------------===//
+#include <sys/file.h>
+
#include "Loader.h"
#include "llvm/BinaryFormat/Magic.h"
@@ -62,6 +64,12 @@ static cl::opt<bool>
cl::desc("Output resource usage of launched kernels"),
cl::init(false), cl::cat(loader_category));
+static cl::opt<bool>
+ no_parallelism("no-parallelism",
+ cl::desc("Allows only a single process to use the GPU at a "
+ "time. Useful to suppress out-of-resource errors"),
+ cl::init(false), cl::cat(loader_category));
+
static cl::opt<std::string> file(cl::Positional, cl::Required,
cl::desc("<gpu executable>"),
cl::cat(loader_category));
@@ -98,6 +106,15 @@ int main(int argc, const char **argv, const char **envp) {
llvm::transform(args, std::back_inserter(new_argv),
[](const std::string &arg) { return arg.c_str(); });
+ // Claim a file lock on the executable so only a single process can enter this
+ // region if requested. This prevents the loader from spurious failures.
+ int fd = -1;
+ if (no_parallelism) {
+ fd = open(argv[0], O_RDONLY);
+ if (flock(fd, LOCK_EX) == -1)
+ report_error(createStringError("Failed to lock '%s'", argv[0]));
+ }
+
// Drop the loader from the program arguments.
LaunchParameters params{threads_x, threads_y, threads_z,
blocks_x, blocks_y, blocks_z};
@@ -105,5 +122,10 @@ int main(int argc, const char **argv, const char **envp) {
const_cast<char *>(image.getBufferStart()),
image.getBufferSize(), params, print_resource_usage);
+ if (no_parallelism) {
+ if (flock(fd, LOCK_UN) == -1)
+ report_error(createStringError("Failed to unlock '%s'", argv[0]));
+ }
+
return ret;
}
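
The lock/unlock pairing in the diff can also be expressed as a scope guard. The following is only a sketch under assumptions, not code from this PR: `ScopedFileLock` and `can_lock_now` are hypothetical names. It relies on two documented properties of `flock`: it returns -1 on failure, and locks belong to the open file description, so a second `open` of the same file conflicts with the first even inside one process.

```cpp
#include <fcntl.h>    // open, O_RDONLY
#include <sys/file.h> // flock, LOCK_EX, LOCK_NB, LOCK_UN
#include <unistd.h>   // close

#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the loader): holds an exclusive flock()
// on `path` for as long as the object is alive.
class ScopedFileLock {
public:
  explicit ScopedFileLock(const std::string &path)
      : fd_(open(path.c_str(), O_RDONLY)) {
    if (fd_ == -1)
      throw std::runtime_error("Failed to open '" + path +
                               "': " + std::strerror(errno));
    // Blocks until no other open file description holds the lock.
    if (flock(fd_, LOCK_EX) == -1) {
      int err = errno;
      close(fd_);
      throw std::runtime_error("Failed to lock '" + path +
                               "': " + std::strerror(err));
    }
  }

  ~ScopedFileLock() {
    // Explicit unlock; closing the descriptor would also drop the lock.
    flock(fd_, LOCK_UN);
    close(fd_);
  }

  ScopedFileLock(const ScopedFileLock &) = delete;
  ScopedFileLock &operator=(const ScopedFileLock &) = delete;

private:
  int fd_;
};

// Hypothetical probe: returns true if an exclusive lock on `path` could be
// taken right now without blocking. The probe lock is dropped on close().
bool can_lock_now(const std::string &path) {
  int fd = open(path.c_str(), O_RDONLY);
  if (fd == -1)
    return false;
  bool ok = flock(fd, LOCK_EX | LOCK_NB) == 0;
  close(fd);
  return ok;
}
```

With a guard like this, the GPU launch would sit inside the guard's scope and the unlock would happen on every exit path, including early returns.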
Summary: The loader is used as a test utility to run traditionally CPU-based unit tests on the GPU. This causes problems when used with something like `llvm-lit`, because the GPU runtimes have a nasty habit of either running out of resources or hanging when they are overloaded. To combat this, I added this option to force each process to perform the GPU part serially.

This is done right now with a simple file lock on the executing file. I was originally thinking about using more complex IPC to allow N processes to share execution, but that seemed overly complicated given the large number of failure modes it introduces. File locks are nice here because if the process crashes or is killed, the lock is released automatically (at least on Linux). This is in contrast to something like POSIX shared memory, which sticks around until it is unlinked: if someone sent `SIGKILL` to the program, it would never get cleaned up, and other processes might wait on a mutex that is never released.

Restricting this to one process at a time isn't ideal, given that the runtime can likely handle at least a *few* separate processes, but this was easy and it works, so it is a reasonable place to start. This will hopefully unblock me on running `libcxx` tests, which ran with so much parallelism that spurious failures were very common.
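
The claim that a killed process releases its lock can be checked with a small experiment. This is a sketch assuming Linux `flock` semantics; `lock_released_after_kill` is a hypothetical name and nothing here is part of the loader. A child process takes the lock and sleeps, and the parent confirms the lock is held, sends `SIGKILL`, then confirms the lock is free again.

```cpp
#include <fcntl.h>    // open, O_RDONLY
#include <signal.h>   // kill, SIGKILL
#include <sys/file.h> // flock, LOCK_EX, LOCK_NB
#include <sys/wait.h> // waitpid
#include <unistd.h>   // fork, pipe, read, write, pause, close, _exit

// Hypothetical demo: returns true if a child's exclusive lock on `path` is
// observed while the child is alive and is gone once the child was killed.
bool lock_released_after_kill(const char *path) {
  int ready[2];
  if (pipe(ready) == -1)
    return false;

  pid_t child = fork();
  if (child == -1)
    return false;

  if (child == 0) {
    // Child: take the lock, notify the parent, then sleep until killed.
    close(ready[0]);
    int fd = open(path, O_RDONLY);
    if (fd == -1 || flock(fd, LOCK_EX) == -1)
      _exit(1);
    char byte = 'x';
    if (write(ready[1], &byte, 1) != 1)
      _exit(1);
    for (;;)
      pause();
  }

  // Parent: wait until the child reports that it holds the lock.
  close(ready[1]);
  char byte;
  bool child_locked = read(ready[0], &byte, 1) == 1;
  close(ready[0]);

  // A non-blocking attempt must fail while the child is alive.
  int fd = open(path, O_RDONLY);
  bool held_while_alive =
      child_locked && fd != -1 && flock(fd, LOCK_EX | LOCK_NB) == -1;

  // SIGKILL gives the child no chance to run any cleanup code.
  kill(child, SIGKILL);
  waitpid(child, nullptr, 0);

  // The kernel closed the child's descriptors, which dropped its lock.
  bool free_after_kill = fd != -1 && flock(fd, LOCK_EX | LOCK_NB) == 0;
  if (fd != -1)
    close(fd);
  return held_while_alive && free_after_kill;
}
```

Nothing in the child ever unlocks the file, so no user-space cleanup step exists that a `SIGKILL` could skip. That is the property the file-lock approach relies on.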
LGTM