[ELF] Add CPU name detection for CUDA architectures #75964


Merged
merged 1 commit into from
Dec 20, 2023
1 change: 1 addition & 0 deletions llvm/include/llvm/Object/ELFObjectFile.h
@@ -64,6 +64,7 @@ class ELFObjectFileBase : public ObjectFile {
  SubtargetFeatures getLoongArchFeatures() const;

  StringRef getAMDGPUCPUName() const;
  StringRef getNVPTXCPUName() const;

protected:
  ELFObjectFileBase(unsigned int Type, MemoryBufferRef Source);
69 changes: 69 additions & 0 deletions llvm/lib/Object/ELFObjectFile.cpp
@@ -358,6 +358,8 @@ std::optional<StringRef> ELFObjectFileBase::tryGetCPUName() const {
  switch (getEMachine()) {
  case ELF::EM_AMDGPU:
    return getAMDGPUCPUName();
  case ELF::EM_CUDA:
    return getNVPTXCPUName();
  case ELF::EM_PPC:
  case ELF::EM_PPC64:
    return StringRef("future");
@@ -517,6 +519,73 @@ StringRef ELFObjectFileBase::getAMDGPUCPUName() const {
  }
}

StringRef ELFObjectFileBase::getNVPTXCPUName() const {
Member
getNVPTXGPUName() or getNVPTXArchName()?

Contributor Author
I wasn't sure which to call it. AMDGPU calls everything a processor, so I copied that naming format here.

  assert(getEMachine() == ELF::EM_CUDA);
  unsigned SM = getPlatformFlags() & ELF::EF_CUDA_SM;
Member
If SM is only used once, just inline it at the use site.

Contributor Author
I mostly copied this from the existing AMDGPU version and kept the style.

Member
🤷 my feeling is two wrongs don't make a right. Especially since AMD does have CPUs. But nbd.


  switch (SM) {
  // Fermi architecture.
  case ELF::EF_CUDA_SM20:
    return "sm_20";
  case ELF::EF_CUDA_SM21:
    return "sm_21";

  // Kepler architecture.
  case ELF::EF_CUDA_SM30:
    return "sm_30";
  case ELF::EF_CUDA_SM32:
    return "sm_32";
  case ELF::EF_CUDA_SM35:
    return "sm_35";
  case ELF::EF_CUDA_SM37:
    return "sm_37";

  // Maxwell architecture.
  case ELF::EF_CUDA_SM50:
    return "sm_50";
  case ELF::EF_CUDA_SM52:
    return "sm_52";
  case ELF::EF_CUDA_SM53:
    return "sm_53";

  // Pascal architecture.
  case ELF::EF_CUDA_SM60:
    return "sm_60";
  case ELF::EF_CUDA_SM61:
    return "sm_61";
  case ELF::EF_CUDA_SM62:
    return "sm_62";

  // Volta architecture.
  case ELF::EF_CUDA_SM70:
    return "sm_70";
  case ELF::EF_CUDA_SM72:
    return "sm_72";

  // Turing architecture.
  case ELF::EF_CUDA_SM75:
    return "sm_75";

  // Ampere architecture.
  case ELF::EF_CUDA_SM80:
    return "sm_80";
  case ELF::EF_CUDA_SM86:
    return "sm_86";
  case ELF::EF_CUDA_SM87:
    return "sm_87";

  // Ada architecture.
  case ELF::EF_CUDA_SM89:
    return "sm_89";

  // Hopper architecture.
  case ELF::EF_CUDA_SM90:
    return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_90a" : "sm_90";
  default:
    llvm_unreachable("Unknown EF_CUDA_SM value");
Member
report_fatal_error? In some builds llvm_unreachable is UB if executed.

Contributor Author
It's probably good enough to consider it unreachable, since this covers the full range of expected values inside the mask. However, a fatal error may be more obvious if we forget to update this when a new SM is added. It would be nice to tie this in with clang somehow to guarantee coherency, but that's a bit out of scope.

Member
Right, the issue is that if I happen to run an old llvm-readelf or whatever with a new ELF binary, it shouldn't have UB; it should crash gracefully (at least).
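To illustrate the concern in this thread, here is a minimal standalone sketch (not the code in this patch) of a lookup that stays well defined when it sees an SM value it does not recognize. Everything in the `graceful` namespace is hypothetical, and the numeric case values are assumptions chosen for illustration. In LLVM itself the reviewer's suggestion is `report_fatal_error`; the sketch uses `std::optional` only to stay free of LLVM headers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

namespace graceful {

// Hypothetical lookup: returns std::nullopt for an SM value this build does
// not know about, so the caller decides how to report it.
// The case values below are assumptions for illustration only.
inline std::optional<std::string_view> tryArchName(std::uint32_t Sm) {
  assert(Sm <= 0xff && "expects the already-masked SM field");
  switch (Sm) {
  case 0x59:
    return "sm_89";
  case 0x5a:
    return "sm_90";
  default:
    return std::nullopt; // newer binary, older tool: still well defined
  }
}

// Hypothetical caller: turns an unknown value into a readable diagnostic
// string instead of relying on the value being unreachable.
inline std::string archNameOrDiagnostic(std::uint32_t Sm) {
  if (std::optional<std::string_view> Name = tryArchName(Sm))
    return std::string(*Name);
  return "<unknown SM value " + std::to_string(Sm) + ">";
}

} // namespace graceful
```

With this shape, an older tool reading a newer binary reports the unknown value rather than executing an unreachable path. The trade-off is that every caller has to handle the empty case.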

  }
}
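For readers without an LLVM checkout, the decoding in `getNVPTXCPUName()` can be approximated by the standalone sketch below. It masks the flags word to get the SM field, then checks a separate bit to tell `sm_90a` from `sm_90`. All names in the `sketch` namespace are hypothetical stand-ins, and the numeric values are assumptions for illustration; the authoritative constants are the `ELF::EM_CUDA` and `ELF::EF_CUDA_*` definitions in LLVM. Only three SM values are listed to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

namespace sketch {

// Hypothetical stand-ins for the ELF constants used in the patch.
// The values are assumptions, not copied from the patch.
constexpr std::uint16_t MachineCuda = 190;    // stand-in for ELF::EM_CUDA
constexpr std::uint32_t SmMask = 0xff;        // stand-in for ELF::EF_CUDA_SM
constexpr std::uint32_t Sm80 = 0x50;          // stand-in for ELF::EF_CUDA_SM80
constexpr std::uint32_t Sm89 = 0x59;          // stand-in for ELF::EF_CUDA_SM89
constexpr std::uint32_t Sm90 = 0x5a;          // stand-in for ELF::EF_CUDA_SM90
constexpr std::uint32_t Accelerators = 0x800; // stand-in for ELF::EF_CUDA_ACCELERATORS

// Maps a flags word to an architecture name. Returns an empty view for an
// SM value that is not listed (the patch uses llvm_unreachable there).
inline std::string_view cudaArchName(std::uint16_t Machine,
                                     std::uint32_t Flags) {
  assert(Machine == MachineCuda && "only meaningful for CUDA ELF files");
  (void)Machine; // silence unused-parameter warnings when asserts are off

  switch (Flags & SmMask) { // bits outside the mask do not pick the case
  case Sm80:
    return "sm_80";
  case Sm89:
    return "sm_89";
  case Sm90:
    // The accelerator bit lives outside the SM field, so test it on the
    // unmasked flags word.
    return (Flags & Accelerators) ? "sm_90a" : "sm_90";
  default:
    return {};
  }
}

} // namespace sketch
```

Under these assumed values, `sketch::cudaArchName(190, 0x5a | 0x800)` yields `sm_90a`, while the same SM field without the extra bit yields `sm_90`.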

// FIXME Encode from a tablegen description or target parser.
void ELFObjectFileBase::setARMSubArch(Triple &TheTriple) const {
  if (TheTriple.getSubArch() != Triple::NoSubArch)