[amdgpu-arch] Replace use of HSA with reading sysfs directly #116651
Conversation
Summary: For Linux systems, we currently use the HSA library to determine the installed GPUs. However, this isn't really necessary and adds a dependency on the HSA runtime as well as a lot of overhead. Instead, this patch uses the `sysfs` interface exposed by `amdkfd` to do this directly.
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Full diff: https://github.com/llvm/llvm-project/pull/116651.diff

4 Files Affected:
diff --git a/clang/tools/amdgpu-arch/AMDGPUArch.cpp b/clang/tools/amdgpu-arch/AMDGPUArch.cpp
index 7ae57b7877e1fe..6c10cbc5c46a83 100644
--- a/clang/tools/amdgpu-arch/AMDGPUArch.cpp
+++ b/clang/tools/amdgpu-arch/AMDGPUArch.cpp
@@ -25,7 +25,7 @@ static void PrintVersion(raw_ostream &OS) {
OS << clang::getClangToolFullVersion("amdgpu-arch") << '\n';
}
-int printGPUsByHSA();
+int printGPUsByKFD();
int printGPUsByHIP();
int main(int argc, char *argv[]) {
@@ -45,7 +45,7 @@ int main(int argc, char *argv[]) {
}
#ifndef _WIN32
- if (!printGPUsByHSA())
+ if (!printGPUsByKFD())
return 0;
#endif
diff --git a/clang/tools/amdgpu-arch/AMDGPUArchByHSA.cpp b/clang/tools/amdgpu-arch/AMDGPUArchByHSA.cpp
deleted file mode 100644
index 432f2c414ed244..00000000000000
--- a/clang/tools/amdgpu-arch/AMDGPUArchByHSA.cpp
+++ /dev/null
@@ -1,122 +0,0 @@
-//===- AMDGPUArchByHSA.cpp - list AMDGPU installed ------*- C++ -*---------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file implements a tool for detecting name of AMDGPU installed in system
-// using HSA on Linux. This tool is used by AMDGPU OpenMP and HIP driver.
-//
-//===----------------------------------------------------------------------===//
-
-#include "clang/Basic/Version.h"
-#include "llvm/Support/CommandLine.h"
-#include "llvm/Support/DynamicLibrary.h"
-#include "llvm/Support/Error.h"
-#include "llvm/Support/raw_ostream.h"
-#include <memory>
-#include <string>
-#include <vector>
-
-using namespace llvm;
-
-typedef enum {
- HSA_STATUS_SUCCESS = 0x0,
-} hsa_status_t;
-
-typedef enum {
- HSA_DEVICE_TYPE_CPU = 0,
- HSA_DEVICE_TYPE_GPU = 1,
-} hsa_device_type_t;
-
-typedef enum {
- HSA_AGENT_INFO_NAME = 0,
- HSA_AGENT_INFO_DEVICE = 17,
-} hsa_agent_info_t;
-
-typedef struct hsa_agent_s {
- uint64_t handle;
-} hsa_agent_t;
-
-hsa_status_t (*hsa_init)();
-hsa_status_t (*hsa_shut_down)();
-hsa_status_t (*hsa_agent_get_info)(hsa_agent_t, hsa_agent_info_t, void *);
-hsa_status_t (*hsa_iterate_agents)(hsa_status_t (*)(hsa_agent_t, void *),
- void *);
-
-constexpr const char *DynamicHSAPath = "libhsa-runtime64.so";
-
-llvm::Error loadHSA() {
- std::string ErrMsg;
- auto DynlibHandle = std::make_unique<llvm::sys::DynamicLibrary>(
- llvm::sys::DynamicLibrary::getPermanentLibrary(DynamicHSAPath, &ErrMsg));
- if (!DynlibHandle->isValid()) {
- return llvm::createStringError(llvm::inconvertibleErrorCode(),
- "Failed to 'dlopen' %s", DynamicHSAPath);
- }
-#define DYNAMIC_INIT(SYMBOL) \
- { \
- void *SymbolPtr = DynlibHandle->getAddressOfSymbol(#SYMBOL); \
- if (!SymbolPtr) \
- return llvm::createStringError(llvm::inconvertibleErrorCode(), \
- "Failed to 'dlsym' " #SYMBOL); \
- SYMBOL = reinterpret_cast<decltype(SYMBOL)>(SymbolPtr); \
- }
- DYNAMIC_INIT(hsa_init);
- DYNAMIC_INIT(hsa_shut_down);
- DYNAMIC_INIT(hsa_agent_get_info);
- DYNAMIC_INIT(hsa_iterate_agents);
-#undef DYNAMIC_INIT
- return llvm::Error::success();
-}
-
-static hsa_status_t iterateAgentsCallback(hsa_agent_t Agent, void *Data) {
- hsa_device_type_t DeviceType;
- hsa_status_t Status =
- hsa_agent_get_info(Agent, HSA_AGENT_INFO_DEVICE, &DeviceType);
-
- // continue only if device type if GPU
- if (Status != HSA_STATUS_SUCCESS || DeviceType != HSA_DEVICE_TYPE_GPU) {
- return Status;
- }
-
- std::vector<std::string> *GPUs =
- static_cast<std::vector<std::string> *>(Data);
- char GPUName[64];
- Status = hsa_agent_get_info(Agent, HSA_AGENT_INFO_NAME, GPUName);
- if (Status != HSA_STATUS_SUCCESS) {
- return Status;
- }
- GPUs->push_back(GPUName);
- return HSA_STATUS_SUCCESS;
-}
-
-int printGPUsByHSA() {
- // Attempt to load the HSA runtime.
- if (llvm::Error Err = loadHSA()) {
- logAllUnhandledErrors(std::move(Err), llvm::errs());
- return 1;
- }
-
- hsa_status_t Status = hsa_init();
- if (Status != HSA_STATUS_SUCCESS) {
- return 1;
- }
-
- std::vector<std::string> GPUs;
- Status = hsa_iterate_agents(iterateAgentsCallback, &GPUs);
- if (Status != HSA_STATUS_SUCCESS) {
- return 1;
- }
-
- for (const auto &GPU : GPUs)
- llvm::outs() << GPU << '\n';
-
- if (GPUs.size() < 1)
- return 1;
-
- hsa_shut_down();
- return 0;
-}
diff --git a/clang/tools/amdgpu-arch/AMDGPUArchByKFD.cpp b/clang/tools/amdgpu-arch/AMDGPUArchByKFD.cpp
new file mode 100644
index 00000000000000..c7590572de63d4
--- /dev/null
+++ b/clang/tools/amdgpu-arch/AMDGPUArchByKFD.cpp
@@ -0,0 +1,74 @@
+//===- AMDGPUArchByKFD.cpp - list AMDGPU installed ------*- C++ -*---------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements a tool for detecting name of AMD GPUs installed in
+// system using the Linux sysfs interface for the AMD KFD driver.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/LineIterator.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
+#include <memory>
+#include <string>
+
+using namespace llvm;
+
+constexpr const char *KFD_SYSFS_NODE_PATH =
+ "/sys/devices/virtual/kfd/kfd/topology/nodes";
+
+constexpr static long getMajor(long Ver) { return (Ver / 10000) % 100; }
+constexpr static long getMinor(long Ver) { return (Ver / 100) % 100; }
+constexpr static long getStep(long Ver) { return Ver % 100; }
+
+int printGPUsByKFD() {
+ SmallVector<std::pair<long, long>> Devices;
+ std::error_code EC;
+ for (sys::fs::directory_iterator Begin(KFD_SYSFS_NODE_PATH, EC), End;
+ Begin != End; Begin.increment(EC)) {
+ if (EC)
+ return 1;
+
+ long Node = 0;
+ if (sys::path::stem(Begin->path()).consumeInteger(10, Node))
+ return 1;
+
+ SmallVector<char> Path(Begin->path().begin(), Begin->path().end());
+ sys::path::append(Path, "properties");
+
+ ErrorOr<std::unique_ptr<MemoryBuffer>> BufferOrErr =
+ MemoryBuffer::getFileOrSTDIN(Path);
+ if (std::error_code EC = BufferOrErr.getError())
+ return 1;
+
+ long GFXVersion = 0;
+ for (line_iterator Lines(**BufferOrErr, false); !Lines.is_at_end();
+ ++Lines) {
+ if (Lines->starts_with("gfx_target_version")) {
+ if (Lines->drop_front(sizeof("gfx_target_version"))
+ .consumeInteger(10, GFXVersion))
+ return 1;
+ break;
+ }
+ }
+
+ // If this is zero the node is a CPU.
+ if (GFXVersion == 0)
+ continue;
+ Devices.emplace_back(Node, GFXVersion);
+ }
+
+ // Sort the devices by their node to make sure it prints in order.
+ llvm::sort(Devices, [](auto &L, auto &R) { return L.first < R.first; });
+ for (const auto &[Node, GFXVersion] : Devices)
+ std::fprintf(stdout, "gfx%ld%ld%lx\n", getMajor(GFXVersion),
+ getMinor(GFXVersion), getStep(GFXVersion));
+
+ return 0;
+}
diff --git a/clang/tools/amdgpu-arch/CMakeLists.txt b/clang/tools/amdgpu-arch/CMakeLists.txt
index 1657c701251308..c4c8de614565a7 100644
--- a/clang/tools/amdgpu-arch/CMakeLists.txt
+++ b/clang/tools/amdgpu-arch/CMakeLists.txt
@@ -8,6 +8,6 @@
set(LLVM_LINK_COMPONENTS Support)
-add_clang_tool(amdgpu-arch AMDGPUArch.cpp AMDGPUArchByHSA.cpp AMDGPUArchByHIP.cpp)
+add_clang_tool(amdgpu-arch AMDGPUArch.cpp AMDGPUArchByKFD.cpp AMDGPUArchByHIP.cpp)
target_link_libraries(amdgpu-arch PRIVATE clangBasic)
for (sys::fs::directory_iterator Begin(KFD_SYSFS_NODE_PATH, EC), End;
     Begin != End; Begin.increment(EC)) {
  if (EC)
    return 1;
Should print some errors when this fails
I was unsure if I should, since we can still fall back to HIP in the current implementation.
@jhuber6 can you comment on "lot of overhead" and if that matters? Also, not sure why the HSA library dependence is a problem. This seems to be exposing amdgpu-arch to more maintenance overhead.
Sometimes the driver will hang, and since this is used inside of
I don't really understand why cluster users are compiling on a system where the GPUs are being stressed, and I still don't see why it's a good idea to break layering for this case. Also, I wasn't aware that the "native" offload arch is supported by ROCm.
I'd like to eliminate a class of failures I've seen with
HIP does, at least upstream; I don't know if that was modified in ROCm. OpenMP supports it in the fork.
They must never ever do this. It is their fault and problem if this breaks.
The issue also happens on machines with one GPU if amdgpu-arch is executed multiple times in a short period, due to some limitation of the driver. --offload-arch=native calls amdgpu-arch to get the actual GPU archs.
I think we could use this approach for Linux. amdgpu-arch can still use the HIP runtime to detect GPUs for both Linux and Windows, so we still have a fallback even if we remove the HSA approach.
I feel like this is a workaround. Can we not fix the "limitation of the driver", and if there is an issue with HSA overhead, shouldn't we file a ticket?
I don't know for sure, but I'd guess that the limitations seen may be related to the limit of 128 or so doorbells I think we have somewhere. I don't see this as a workaround; it deliberately gets the same information without requiring
We shouldn't need to load a driver to query the devices on the system. This exists for this purpose, if there isn't going to be some ioctl to read it for you.
Here's a question, should it respect
I think it depends on whether you consider this as a ROCm toolchain. If not, I'd prefer not to be bound to any ROCm related stuff.
Agreed, maybe in the fork, but it makes sense to leave community LLVM alone.
Maybe add this information into the beginning of the file.
No. That is a transient runtime debug flag. This is not executing code on the device.
It depends on the device and the system. In this case, since Linux provides an alternative, I agree it makes sense to use it, and I now see why this is not a workaround. Thanks.
I think there was already a ticket, but it cannot be fixed easily.
Absolutely yes, awesome. I had a todo to have the kernel export this under sysfs literally years ago and didn't get around to working out their commit structure; delighted to see it is exposed. The unreliable HSA calls have been a consistent nuisance for ages, great to see we don't need that hack any more.
A possible after-commit fix is to replace all the `return 1` statements with different numbers (maybe an enum?) so that we have exit 0 on success and an exit code which indicates what went wrong on failure.
Oh. I now see there was a bunch of discussion about this, will add some context.

The driver has a hard limit on how many processes can open it at a time. clang calls this utility to ask what GPU to compile for by default. Put those together and a parallel build on a vaguely modern desktop immediately blows through that process limit and proceeds to fail, so the user has to deliberately build code slowly, or specify the GPU by hand, to work around our tooling falling over.

The limit on the number of processes is generally reasonable: launching hundreds of processes that all open the GPU, allocate queues, and launch kernels is generally a disaster, and having the kernel return an equivalent to "too many open" is great. However, in the specific case where we are only asking for trivial information, which doesn't need to allocate a queue or do anything whatsoever with the GPU, this is a spurious and annoying limitation.

I suppose it could be "fixed" in the driver, some sort of reference count which is incremented when you do something non-trivial instead of on open, but I'd expect the kernel people to tell us to stop opening hundreds of processes when none of them do any work.

One clumsy outstanding thing is that this should now be some code in a header that clang includes, so that instead of the subprocess shell to handle arch=native, clang just looks up the information directly. @b-sumner does the additional context make the design choice clear?
Would only work for Linux unfortunately, unless some Windows driver developer out there knows if there's some similar win32 magic.
Windows getting subprocess calls until their driver catches up (or someone points out how to do this) seems fine to me. Linux people get a slightly faster clang, and maybe even one that prints useful error messages while it dies derived from the return code.
LLVM Buildbot has detected new failures on several builders. Full details are available at:

- https://lab.llvm.org/buildbot/#/builders/168/builds/5681
- https://lab.llvm.org/buildbot/#/builders/72/builds/5473
- https://lab.llvm.org/buildbot/#/builders/51/builds/6547
- https://lab.llvm.org/buildbot/#/builders/145/builds/3271
- https://lab.llvm.org/buildbot/#/builders/186/builds/4148
- https://lab.llvm.org/buildbot/#/builders/64/builds/1492