[NVPTX] Add support for PTX ISA v8.8 #136639

Merged on May 3, 2025 (1 commit)

@Prince781 (Contributor) commented Apr 22, 2025

Support PTX version 8.8 (-mattr=+ptx88) from CUDA 12.9. The following new targets are also added:

  • SM103 and SM121: sm_103, sm_103a, sm_121, sm_121a.

Also, the Proc definitions in NVPTX.td were realigned into columns.

https://docs.nvidia.com/cuda/parallel-thread-execution/#changes-in-ptx-isa-version-8-8

@llvmbot (Member) commented Apr 22, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Princeton Ferro (Prince781)

Changes

Support PTX version 8.8 (-mattr=+ptx88) from CUDA 12.9. The following new targets are also added:

  • Family-specific targets, which are compatible with all the subsequent architectures belonging to the same GPU family: sm_100f, sm_101f, sm_103f, sm_120f, sm_121f.
  • New archs: SM103 and SM121: sm_103, sm_103a, sm_121, sm_121a.

Also, the Proc definitions in NVPTX.td were realigned into columns.
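
The `FeatureSM` definitions in this change appear to follow a simple numeric scheme: a plain target uses `sm * 10`, an arch-specific `a` target adds 1, and a family-specific `f` target adds 2 (for example `"103a"` maps to 1031 and `"121f"` to 1212). A minimal Python sketch of that scheme, read off the diff rather than taken from any LLVM API; the helper names `full_sm_version` and `split_full_sm_version` are hypothetical:

```python
import re

# Offsets inferred from the FeatureSM<...> values in NVPTX.td in this PR:
# plain -> sm * 10, "a" (arch-specific) -> +1, "f" (family-specific) -> +2.
# This table is an illustration, not an LLVM data structure.
SUFFIX_OFFSET = {"": 0, "a": 1, "f": 2}
OFFSET_SUFFIX = {offset: suffix for suffix, offset in SUFFIX_OFFSET.items()}


def full_sm_version(target):
    """Map a target name such as 'sm_103a' to its numeric value (1031)."""
    match = re.fullmatch(r"sm_(\d+)([af]?)", target)
    if match is None:
        raise ValueError(f"unrecognized target name: {target!r}")
    return int(match.group(1)) * 10 + SUFFIX_OFFSET[match.group(2)]


def split_full_sm_version(value):
    """Inverse mapping: 1212 -> (121, 'f')."""
    sm, offset = divmod(value, 10)
    return sm, OFFSET_SUFFIX[offset]


if __name__ == "__main__":
    for name in ("sm_90a", "sm_103", "sm_103a", "sm_121f"):
        print(name, full_sm_version(name))
```

The sketch only computes the encoding; it does not check whether a given target (say, a hypothetical `sm_90f`) is actually defined in NVPTX.td.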


Full diff: https://github.com/llvm/llvm-project/pull/136639.diff

2 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTX.td (+48-27)
  • (modified) llvm/test/CodeGen/NVPTX/sm-version.ll (+36)
diff --git a/llvm/lib/Target/NVPTX/NVPTX.td b/llvm/lib/Target/NVPTX/NVPTX.td
index 5467ae011a208..895f331ed3124 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.td
+++ b/llvm/lib/Target/NVPTX/NVPTX.td
@@ -36,17 +36,29 @@ class FeaturePTX<int version>:
 
 foreach sm = [20, 21, 30, 32, 35, 37, 50, 52, 53,
               60, 61, 62, 70, 72, 75, 80, 86, 87,
-              89, 90, 100, 101, 120] in
+              89, 90, 100, 101, 103, 120, 121] in
   def SM#sm: FeatureSM<""#sm, !mul(sm, 10)>;
 
-def SM90a: FeatureSM<"90a", 901>;
+// Arch-specific targets. PTX for these is not compatible with any other
+// architectures.
+def SM90a : FeatureSM<"90a", 901>;
 def SM100a: FeatureSM<"100a", 1001>;
 def SM101a: FeatureSM<"101a", 1011>;
+def SM103a: FeatureSM<"103a", 1031>;
 def SM120a: FeatureSM<"120a", 1201>;
+def SM121a: FeatureSM<"121a", 1211>;
+
+// Family-specific targets. PTX for these is compatible with all subsequent
+// targets in the same family.
+def SM100f: FeatureSM<"100f", 1002>;
+def SM101f: FeatureSM<"101f", 1012>;
+def SM103f: FeatureSM<"103f", 1032>;
+def SM120f: FeatureSM<"120f", 1202>;
+def SM121f: FeatureSM<"121f", 1212>;
 
 foreach version = [32, 40, 41, 42, 43, 50, 60, 61, 62, 63, 64, 65,
                    70, 71, 72, 73, 74, 75, 76, 77, 78,
-                   80, 81, 82, 83, 84, 85, 86, 87] in
+                   80, 81, 82, 83, 84, 85, 86, 87, 88] in
   def PTX#version: FeaturePTX<version>;
 
 //===----------------------------------------------------------------------===//
@@ -56,33 +68,42 @@ foreach version = [32, 40, 41, 42, 43, 50, 60, 61, 62, 63, 64, 65,
 class Proc<string Name, list<SubtargetFeature> Features>
  : Processor<Name, NoItineraries, Features>;
 
-def : Proc<"sm_20", [SM20, PTX32]>;
-def : Proc<"sm_21", [SM21, PTX32]>;
-def : Proc<"sm_30", [SM30]>;
-def : Proc<"sm_32", [SM32, PTX40]>;
-def : Proc<"sm_35", [SM35, PTX32]>;
-def : Proc<"sm_37", [SM37, PTX41]>;
-def : Proc<"sm_50", [SM50, PTX40]>;
-def : Proc<"sm_52", [SM52, PTX41]>;
-def : Proc<"sm_53", [SM53, PTX42]>;
-def : Proc<"sm_60", [SM60, PTX50]>;
-def : Proc<"sm_61", [SM61, PTX50]>;
-def : Proc<"sm_62", [SM62, PTX50]>;
-def : Proc<"sm_70", [SM70, PTX60]>;
-def : Proc<"sm_72", [SM72, PTX61]>;
-def : Proc<"sm_75", [SM75, PTX63]>;
-def : Proc<"sm_80", [SM80, PTX70]>;
-def : Proc<"sm_86", [SM86, PTX71]>;
-def : Proc<"sm_87", [SM87, PTX74]>;
-def : Proc<"sm_89", [SM89, PTX78]>;
-def : Proc<"sm_90", [SM90, PTX78]>;
-def : Proc<"sm_90a", [SM90a, PTX80]>;
-def : Proc<"sm_100", [SM100, PTX86]>;
+def : Proc<"sm_20",   [SM20, PTX32]>;
+def : Proc<"sm_21",   [SM21, PTX32]>;
+def : Proc<"sm_30",   [SM30]>;
+def : Proc<"sm_32",   [SM32, PTX40]>;
+def : Proc<"sm_35",   [SM35, PTX32]>;
+def : Proc<"sm_37",   [SM37, PTX41]>;
+def : Proc<"sm_50",   [SM50, PTX40]>;
+def : Proc<"sm_52",   [SM52, PTX41]>;
+def : Proc<"sm_53",   [SM53, PTX42]>;
+def : Proc<"sm_60",   [SM60, PTX50]>;
+def : Proc<"sm_61",   [SM61, PTX50]>;
+def : Proc<"sm_62",   [SM62, PTX50]>;
+def : Proc<"sm_70",   [SM70, PTX60]>;
+def : Proc<"sm_72",   [SM72, PTX61]>;
+def : Proc<"sm_75",   [SM75, PTX63]>;
+def : Proc<"sm_80",   [SM80, PTX70]>;
+def : Proc<"sm_86",   [SM86, PTX71]>;
+def : Proc<"sm_87",   [SM87, PTX74]>;
+def : Proc<"sm_89",   [SM89, PTX78]>;
+def : Proc<"sm_90",   [SM90, PTX78]>;
+def : Proc<"sm_90a",  [SM90a, PTX80]>;
+def : Proc<"sm_100",  [SM100, PTX86]>;
 def : Proc<"sm_100a", [SM100a, PTX86]>;
-def : Proc<"sm_101", [SM101, PTX86]>;
+def : Proc<"sm_100f", [SM100f, PTX88]>;
+def : Proc<"sm_101",  [SM101, PTX86]>;
 def : Proc<"sm_101a", [SM101a, PTX86]>;
-def : Proc<"sm_120", [SM120, PTX87]>;
+def : Proc<"sm_101f", [SM101f, PTX88]>;
+def : Proc<"sm_103",  [SM103, PTX88]>;
+def : Proc<"sm_103a", [SM103a, PTX88]>;
+def : Proc<"sm_103f", [SM103f, PTX88]>;
+def : Proc<"sm_120",  [SM120, PTX87]>;
 def : Proc<"sm_120a", [SM120a, PTX87]>;
+def : Proc<"sm_120f", [SM120f, PTX88]>;
+def : Proc<"sm_121",  [SM121, PTX88]>;
+def : Proc<"sm_121a", [SM121a, PTX88]>;
+def : Proc<"sm_121f", [SM121f, PTX88]>;
 
 def NVPTXInstrInfo : InstrInfo {
 }
diff --git a/llvm/test/CodeGen/NVPTX/sm-version.ll b/llvm/test/CodeGen/NVPTX/sm-version.ll
index ce9a1b1b161dc..3a154a1b9ac9c 100644
--- a/llvm/test/CodeGen/NVPTX/sm-version.ll
+++ b/llvm/test/CodeGen/NVPTX/sm-version.ll
@@ -18,10 +18,19 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_90a | FileCheck %s --check-prefix=SM90a
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_100 | FileCheck %s --check-prefix=SM100
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_100a | FileCheck %s --check-prefix=SM100a
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_100f | FileCheck %s --check-prefix=SM100f
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_101 | FileCheck %s --check-prefix=SM101
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_101a | FileCheck %s --check-prefix=SM101a
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_101f | FileCheck %s --check-prefix=SM101f
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_103 | FileCheck %s --check-prefix=SM103
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_103a | FileCheck %s --check-prefix=SM103a
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_103f | FileCheck %s --check-prefix=SM103f
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_120 | FileCheck %s --check-prefix=SM120
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_120a | FileCheck %s --check-prefix=SM120a
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_120f | FileCheck %s --check-prefix=SM120f
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_121 | FileCheck %s --check-prefix=SM121
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_121a | FileCheck %s --check-prefix=SM121a
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_121f | FileCheck %s --check-prefix=SM121f
 
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s --check-prefix=SM20
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_21 | FileCheck %s --check-prefix=SM21
@@ -43,10 +52,19 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90a | FileCheck %s --check-prefix=SM90a
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100 | FileCheck %s --check-prefix=SM100
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100a | FileCheck %s --check-prefix=SM100a
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100f | FileCheck %s --check-prefix=SM100f
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101 | FileCheck %s --check-prefix=SM101
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101a | FileCheck %s --check-prefix=SM101a
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101f | FileCheck %s --check-prefix=SM101f
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_103 | FileCheck %s --check-prefix=SM103
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_103a | FileCheck %s --check-prefix=SM103a
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_103f | FileCheck %s --check-prefix=SM103f
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120 | FileCheck %s --check-prefix=SM120
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120a | FileCheck %s --check-prefix=SM120a
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120f | FileCheck %s --check-prefix=SM120f
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_121 | FileCheck %s --check-prefix=SM121
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_121a | FileCheck %s --check-prefix=SM121a
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_121f | FileCheck %s --check-prefix=SM121f
 
 ; SM20: .version 3.2
 ; SM21: .version 3.2
@@ -68,10 +86,19 @@
 ; SM90a: .version 8.0
 ; SM100: .version 8.6
 ; SM100a: .version 8.6
+; SM100f: .version 8.8
 ; SM101: .version 8.6
 ; SM101a: .version 8.6
+; SM101f: .version 8.8
+; SM103: .version 8.8
+; SM103a: .version 8.8
+; SM103f: .version 8.8
 ; SM120: .version 8.7
 ; SM120a: .version 8.7
+; SM120f: .version 8.8
+; SM121: .version 8.8
+; SM121a: .version 8.8
+; SM121f: .version 8.8
 
 ; SM20: .target sm_20
 ; SM21: .target sm_21
@@ -93,7 +120,16 @@
 ; SM90a: .target sm_90a
 ; SM100: .target sm_100
 ; SM100a: .target sm_100a
+; SM100f: .target sm_100f
 ; SM101: .target sm_101
 ; SM101a: .target sm_101a
+; SM101f: .target sm_101f
+; SM103: .target sm_103
+; SM103a: .target sm_103a
+; SM103f: .target sm_103f
 ; SM120: .target sm_120
 ; SM120a: .target sm_120a
+; SM120f: .target sm_120f
+; SM121: .target sm_121
+; SM121a: .target sm_121a
+; SM121f: .target sm_121f
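
The test additions above pair each `-mcpu` value with a `.version` line derived from that processor's default PTX feature and a `.target` line that echoes the processor name. As a rough sketch of how those expectations relate to the `Proc` entries, the following Python reproduces the check lines for the sm_100 through sm_121 targets; the table is copied from the diff, and the names `DEFAULT_PTX`, `expected_header`, and `check_lines` are illustrative only:

```python
# Default PTX feature for each processor, copied from the Proc<...> entries
# in the NVPTX.td diff (sm_100 and newer only). Illustrative table.
DEFAULT_PTX = {
    "sm_100": 86, "sm_100a": 86, "sm_100f": 88,
    "sm_101": 86, "sm_101a": 86, "sm_101f": 88,
    "sm_103": 88, "sm_103a": 88, "sm_103f": 88,
    "sm_120": 87, "sm_120a": 87, "sm_120f": 88,
    "sm_121": 88, "sm_121a": 88, "sm_121f": 88,
}


def expected_header(cpu):
    """Return the .version/.target lines the test expects for a processor."""
    major, minor = divmod(DEFAULT_PTX[cpu], 10)  # e.g. 88 -> (8, 8)
    return [f".version {major}.{minor}", f".target {cpu}"]


def check_lines(cpu):
    """Render FileCheck-style lines using the SM<N> prefix from sm-version.ll."""
    prefix = cpu.replace("sm_", "SM")
    return [f"; {prefix}: {line}" for line in expected_header(cpu)]


if __name__ == "__main__":
    for cpu in DEFAULT_PTX:
        print("\n".join(check_lines(cpu)))
```

Note that the pre-existing sm_100, sm_101, and sm_120 entries (and their `a` variants) keep their earlier defaults of 8.6 and 8.7, while every target added here defaults to 8.8.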

@durga4github (Contributor)

Changes LGTM

@Prince781 force-pushed the dev/pferro/ptx88 branch from 5f736d5 to aa515be on May 2, 2025 19:56
Support PTX version 8.8 (`-mattr=+ptx88`) from CUDA 12.9. The following
new targets are also added:

- SM103 and SM121: sm_103, sm_103a, sm_121, sm_121a.

Also, some things were reformatted.

https://docs.nvidia.com/cuda/parallel-thread-execution/#changes-in-ptx-isa-version-8-8
@Prince781 force-pushed the dev/pferro/ptx88 branch from aa515be to a9c1073 on May 2, 2025 19:57
@Prince781 merged commit 659f5ac into llvm:main on May 3, 2025
11 checks passed
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025