Commit e542576

Added NVPTX architecture description
1 parent e2837d1 commit e542576

2 files changed: +38 -7 lines changed

llvm/lib/Target/NVPTX/NVPTX.td

Lines changed: 37 additions & 6 deletions
@@ -33,18 +33,49 @@ class FeaturePTX<int version>:
 SubtargetFeature<"ptx"# version, "PTXVersion",
 "" # version,
 "Use PTX version " # version>;
-
+//
+// NVPTX Architecture Hierarchy and Ordering:
+//
+// Family: 2/3/5/6/7/8/9/10/12 (Follows Onion model, older family is compatible with newer family)
+// Arch: 2*/3*/5*/6*/7*/8*/9*/10*/12*
+//
+// Family-specific: F*f : F*f > F* =>
+// + The plain base architecture is compatible with the family-specific architecture
+// (e.g. sm_100 compatible with >= sm_100*f*)
+// + The family-specific architecture is compatible with future family-specific
+// architectures within the same family (e.g. sm_100f compatible with >= sm_10X*f*
+// but not with sm_12X*f*)
+//
+// Family and SM Target Definition:
+// +----------------+--------------------------------------------------------+
+// | Family         | Target SM architectures included                       |
+// +----------------+--------------------------------------------------------+
+// | sm_10x family  | sm_100f, sm_103f, future targets in sm_10x family      |
+// | sm_101 family  | sm_101f (exception)                                    |
+// | sm_12x family  | sm_120f, sm_121f, future targets in sm_12x family      |
+// +----------------+--------------------------------------------------------+
+//
+// Architecture-specific: F*a : F*a > F*f > F* =>
+// + The plain base architecture is compatible with the architecture-specific architecture
+// (e.g. sm_100 compatible with >= sm_100*a*)
+// + The family-specific architecture is compatible with the architecture-specific architecture
+// (e.g. sm_100f compatible with >= sm_100*a*)
+// + The architecture-specific architecture is incompatible with any other architecture
+// (e.g. sm_100a is only compatible with sm_100*a*)
+//
+// Encoding: Arch * 1000 + 'f' * 10 + 'a' * 1 (where 'a' ⇒ 'f')
+//
+// This encoding allows simple implementation of the partial ordering of the architectures.
+// + Compare Family and Arch by dividing FullSMVersion by 1000 and 100 respectively before the comparison.
+// + Compare within the family by comparing FullSMVersion, given both belongs to the same family.
+// + Detect 'a' variants by checking FullSMVersion % 10.
+//
 foreach sm = [20, 21, 30, 32, 35, 37, 50, 52, 53,
 60, 61, 62, 70, 72, 75, 80, 86, 87,
 89, 90, 100, 101, 103, 120, 121] in {
 // Base SM version (e.g. FullSMVersion for sm_100 is 10000)
 def SM#sm : FeatureSM<""#sm, !mul(sm, 100)>;
 
-// Note: Subset of the architecture-specific features, normally
-// available in "a" variants that will be compatible with subsequent targets
-// in the same family. I.e they are only ordered within the major architecture,
-// but are not comparable with other major architectures
-
 // Family-specific targets which are compatible within same family
 // (e.g. FullSMVersion for sm_100f is 10010)
 if !ge(sm, 100) then {
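
The encoding described in the new comment can be sketched in a few lines of C++. This is only an illustration: the helper names below are hypothetical and do not exist in LLVM. One assumption is worth stating up front: the comment writes the encoding as `Arch * 1000`, but the worked examples in the same hunk (`!mul(sm, 100)`, sm_100 is 10000, sm_100f is 10010) imply a multiplier of 100, so the sketch follows the examples.

```cpp
// Hypothetical helpers for illustration only; none of these names exist in LLVM.

// Builds a FullSMVersion from an SM number and its suffix flags.
// Assumption: the multiplier is 100, matching the worked examples in the
// diff (sm_100 -> 10000, sm_100f -> 10010).
constexpr unsigned encodeFullSMVersion(unsigned Arch, bool HasF, bool HasA) {
  // 'a' implies 'f', so an 'a' target sets both low digits.
  return Arch * 100 + ((HasF || HasA) ? 10u : 0u) + (HasA ? 1u : 0u);
}

// Family: divide by 1000 (sm_100 and sm_103 both give 10).
constexpr unsigned familyOf(unsigned Full) { return Full / 1000; }

// Arch: divide by 100 (sm_103a gives 103).
constexpr unsigned archOf(unsigned Full) { return Full / 100; }

// 'a' variants are the only ones with a non-zero last digit.
constexpr bool isArchSpecific(unsigned Full) { return Full % 10 != 0; }
```

Under these assumptions sm_103a encodes to 10311, which decomposes into family 10, arch 103, and a set 'a' digit. Note that dividing by 1000 also maps sm_101 (10100) to family 10, so the sm_101 exception listed in the table is not captured by the division alone.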

llvm/lib/Target/NVPTX/NVPTXSubtarget.h

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo {
 bool hasArchAccelFeatures() const { return getFullSmVersion() % 10; }
 // GPUs with 'f' suffix have architecture-accelerated features which are
 // portable across all future architectures under same SM major. For example,
-// sm_100f features will work for sm_10X future architectures.
+// sm_100f features will work for sm_10X*f*/sm_10X*a* future architectures.
 // - false represents non-family-specific architecture.
 // - true represents family-specific variant.
 bool hasFamilySpecificFeatures() const {
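
The compatibility rule stated in this comment (sm_100f features work on sm_10X 'f' and 'a' targets) can be written as a single predicate over two FullSMVersion values. The function below is a hypothetical sketch, not the subtarget's actual API. It assumes the digit layout shown in the examples (arch * 100, plus 10 for 'f', plus 1 for 'a') and deliberately ignores the sm_101 exception, so it should not be relied on for sm_101 targets.

```cpp
// Hypothetical predicate, not part of NVPTXSubtarget: is a feature that was
// introduced at `Required` usable when compiling for `Subtarget`?
// Both arguments are FullSMVersion values (e.g. 10010 for sm_100f).
// Assumption: the sm_101 family exception is not modeled here.
constexpr bool featureAvailable(unsigned Required, unsigned Subtarget) {
  return (Required % 10 != 0)
             // 'a' feature: only the identical architecture-specific target.
             ? (Required == Subtarget)
         : (Required % 100 != 0)
             // 'f' feature: same family, target carries an 'f' or 'a' suffix,
             // and the target arch is not older than the required one.
             ? (Required / 1000 == Subtarget / 1000 &&
                Subtarget % 100 != 0 &&
                Subtarget / 100 >= Required / 100)
             // Plain feature: onion model, any same-or-newer architecture.
             : (Subtarget / 100 >= Required / 100);
}
```

With this reading, an sm_100f feature is available on sm_103a (10311) but not on plain sm_103 (10300) or on sm_120f (12010), while an sm_100a feature is available only on sm_100a itself.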
