Skip to content

[Offload][AMDGPU] accept generic target #118919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 10, 2024
Merged

Conversation

hidekisaito
Copy link
Contributor

@hidekisaito hidekisaito commented Dec 6, 2024

Enables generic ISA, e.g., "--offload-arch=gfx11-generic" device code to run on gfx11-generic ISA capable device.

Executable may contain one ELF that has specific target ISA and another ELF that has compatible generic ISA.
Under that circumstance, this code should say both ELFs are compatible, leaving the rest to PluginManager to handle.
Suggestions on how best to address that is welcome.

@llvmbot
Copy link
Member

llvmbot commented Dec 6, 2024

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-backend-amdgpu

Author: None (hidekisaito)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/118919.diff

3 Files Affected:

  • (modified) offload/DeviceRTL/CMakeLists.txt (+10-5)
  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+27-21)
  • (modified) offload/plugins-nextgen/common/src/Utils/ELF.cpp (+2-1)
diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt
index 32a7510be980d8..b9659ca3d7a9c2 100644
--- a/offload/DeviceRTL/CMakeLists.txt
+++ b/offload/DeviceRTL/CMakeLists.txt
@@ -42,11 +42,16 @@ set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR})
 set(include_directory ${devicertl_base_directory}/include)
 set(source_directory ${devicertl_base_directory}/src)
 
-set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803;gfx900;gfx902;gfx906"
-                             "gfx908;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx950;gfx1010"
-                             "gfx1012;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035"
-                             "gfx1036;gfx1100;gfx1101;gfx1102;gfx1103;gfx1150"
-                             "gfx1151;gfx1152;gfx1153")
+set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803"
+                             "gfx9-generic;gfx900;gfx902;gfx906;gfx908"
+                             "gfx90a;gfx90c"
+                             "gfx9-4-generic;gfx940;gfx941;gfx942;gfx950"
+                             "gfx10-1-generic;gfx1010;gfx1012"
+                             "gfx10-3-generic;gfx1030;gfx1031;gfx1032;gfx1033"
+                             "gfx1034;gfx1035;gfx1036"
+                             "gfx11-generic;gfx1100;gfx1101;gfx1102;gfx1103"
+                             "gfx1150;gfx1151;gfx1152;gfx1153"
+                             "gfx12-generic")
 set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62"
                             "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90")
 set(all_gpu_architectures
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index d74e65d4165679..cdc7f5ae0427ad 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -190,8 +190,8 @@ Error asyncMemCopy(bool UseMultipleSdmaEngines, void *Dst, hsa_agent_t DstAgent,
 #endif
 }
 
-Expected<std::string> getTargetTripleAndFeatures(hsa_agent_t Agent) {
-  std::string Target;
+Error getTargetTripleAndFeatures(hsa_agent_t Agent,
+                                 SmallVector<std::string> &Targets) {
   auto Err = hsa_utils::iterateAgentISAs(Agent, [&](hsa_isa_t ISA) {
     uint32_t Length;
     hsa_status_t Status;
@@ -205,13 +205,16 @@ Expected<std::string> getTargetTripleAndFeatures(hsa_agent_t Agent) {
       return Status;
 
     llvm::StringRef TripleTarget(ISAName.begin(), Length);
-    if (TripleTarget.consume_front("amdgcn-amd-amdhsa"))
-      Target = TripleTarget.ltrim('-').rtrim('\0').str();
-    return HSA_STATUS_INFO_BREAK;
+    if (TripleTarget.consume_front("amdgcn-amd-amdhsa")) {
+      auto Target = TripleTarget.ltrim('-').rtrim('\0').str();
+      if (Target.find("generic") != std::string::npos)
+        Targets.push_back(Target);
+      else
+        Targets[0] = Target;
+    }
+    return HSA_STATUS_SUCCESS;
   });
-  if (Err)
-    return Err;
-  return Target;
+  return Err;
 }
 } // namespace hsa_utils
 
@@ -1988,12 +1991,11 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
       return Err;
 
     // Detect if XNACK is enabled
-    auto TargeTripleAndFeaturesOrError =
-        hsa_utils::getTargetTripleAndFeatures(Agent);
-    if (!TargeTripleAndFeaturesOrError)
-      return TargeTripleAndFeaturesOrError.takeError();
-    if (static_cast<StringRef>(*TargeTripleAndFeaturesOrError)
-            .contains("xnack+"))
+    SmallVector<std::string> Targets;
+    Targets.push_back("");
+    if (auto Err = hsa_utils::getTargetTripleAndFeatures(Agent, Targets))
+      return Err;
+    if (Targets[0].find("xnack+") != std::string::npos)
       IsXnackEnabled = true;
 
     // detect if device is an APU.
@@ -3207,13 +3209,17 @@ struct AMDGPUPluginTy final : public GenericPluginTy {
     if (!Processor)
       return false;
 
-    auto TargeTripleAndFeaturesOrError =
-        hsa_utils::getTargetTripleAndFeatures(getKernelAgent(DeviceId));
-    if (!TargeTripleAndFeaturesOrError)
-      return TargeTripleAndFeaturesOrError.takeError();
-    return offloading::amdgpu::isImageCompatibleWithEnv(
-        Processor ? *Processor : "", ElfOrErr->getPlatformFlags(),
-        *TargeTripleAndFeaturesOrError);
+    SmallVector<std::string> Targets;
+    Targets.push_back("");
+    if (auto Err = hsa_utils::getTargetTripleAndFeatures(
+            getKernelAgent(DeviceId), Targets))
+      return Err;
+    for (auto &Target : Targets)
+      if (offloading::amdgpu::isImageCompatibleWithEnv(
+              Processor ? *Processor : "", ElfOrErr->getPlatformFlags(),
+              Target))
+        return true;
+    return false;
   }
 
   bool isDataExchangable(int32_t SrcDeviceId, int32_t DstDeviceId) override {
diff --git a/offload/plugins-nextgen/common/src/Utils/ELF.cpp b/offload/plugins-nextgen/common/src/Utils/ELF.cpp
index f5037611e72e0e..10b32440dc8778 100644
--- a/offload/plugins-nextgen/common/src/Utils/ELF.cpp
+++ b/offload/plugins-nextgen/common/src/Utils/ELF.cpp
@@ -68,7 +68,8 @@ checkMachineImpl(const object::ELFObjectFile<ELFT> &ELFObj, uint16_t EMachine) {
         Header.e_ident[EI_ABIVERSION] != ELFABIVERSION_AMDGPU_HSA_V6)
       return createError("Invalid AMD ABI version, must be version 4 or above");
     if ((Header.e_flags & EF_AMDGPU_MACH) < EF_AMDGPU_MACH_AMDGCN_GFX700 ||
-        (Header.e_flags & EF_AMDGPU_MACH) > EF_AMDGPU_MACH_AMDGCN_GFX1201)
+        (Header.e_flags & EF_AMDGPU_MACH) >
+            EF_AMDGPU_MACH_AMDGCN_GFX9_4_GENERIC)
       return createError("Unsupported AMDGPU architecture");
   } else if (Header.e_machine == EM_CUDA) {
     if (~Header.e_flags & EF_CUDA_64BIT_ADDRESS)

Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there needs to be any other handling of generic targets other than device runtime support and collecting all ISAs instead of just one.

@jplehr jplehr self-requested a review December 6, 2024 21:33
Expected<std::string> getTargetTripleAndFeatures(hsa_agent_t Agent) {
std::string Target;
Error getTargetTripleAndFeatures(hsa_agent_t Agent,
SmallVector<SmallString<32>> &Targets) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see too much point of using SmallString here because every use of it is to convert it to std::string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SmallString is std::string with a template argument for the size of the SSO buffer. I would recommend we choose a number we expect to always fit the string so we don't need to malloc it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then 32 is sufficient here

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@shiltian shiltian merged commit f2bceb2 into llvm:main Dec 10, 2024
6 checks passed
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
Enables generic ISA, e.g., "--offload-arch=gfx11-generic" device code to
run on gfx11-generic ISA capable device.

Executable may contain one ELF that has specific target ISA and another
ELF that has compatible generic ISA.
Under that circumstance, this code should say both ELFs are compatible,
leaving the rest to PluginManager to handle.
Suggestions on how best to address that is welcome.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 27, 2025
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request May 20, 2025
Enables generic ISA, e.g., "--offload-arch=gfx11-generic" device code to
run on gfx11-generic ISA capable device.

Executable may contain one ELF that has specific target ISA and another
ELF that has compatible generic ISA.
Under that circumstance, this code should say both ELFs are compatible,
leaving the rest to PluginManager to handle.
Suggestions on how best to address that is welcome.
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request May 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants