[X86] AMD Zen 5 Initial enablement #107964

Merged: 4 commits into llvm:main on Sep 13, 2024

ganeshgit
Contributor

This patch adds the basic skeleton enablement for AMD's next-generation Zen 5 (znver5) CPUs.
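
As a rough illustration of what the enablement means for user code: the predefined-arch-macros.c test in this patch expects `-march=znver5` to define `__znver5__`, `__znver5`, and `__tune_znver5__`. The sketch below dispatches at compile time on those macros. The helper name `compiledForCpu` is hypothetical, and the fallback branches exist only so the snippet builds with any `-march` setting or with a compiler that predates this patch.

```cpp
#include <string_view>

// Hypothetical helper: names the Zen target this translation unit was built
// for, based on the macros the compiler predefines for -march.
constexpr std::string_view compiledForCpu() {
#if defined(__znver5__)
  return "znver5"; // expected with -march=znver5 once this patch is in
#elif defined(__znver4__)
  return "znver4"; // -march=znver4
#else
  return "other"; // any other -march, or a compiler without znver5 support
#endif
}
```

Built with `-march=znver5` on a compiler that includes this patch, the first branch should be the one selected; on anything else the function falls through to one of the other two strings.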

@llvmbot llvmbot added the clang, compiler-rt, backend:X86, clang:driver, clang:frontend, compiler-rt:builtins, mc, and llvm:transforms labels Sep 10, 2024
@llvmbot
Member

llvmbot commented Sep 10, 2024

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-mc

@llvm/pr-subscribers-clang-driver

Author: Ganesh (ganeshgit)

Changes

This patch adds the basic skeleton enablement for AMD's next-generation Zen 5 (znver5) CPUs.


Patch is 31.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107964.diff

30 Files Affected:

  • (modified) clang/lib/Basic/Targets/X86.cpp (+4)
  • (modified) clang/test/CodeGen/target-builtin-noerror.c (+1)
  • (modified) clang/test/Driver/x86-march.c (+4)
  • (modified) clang/test/Frontend/x86-target-cpu.c (+1)
  • (modified) clang/test/Misc/target-invalid-cpu-note/x86.c (+4)
  • (modified) clang/test/Preprocessor/predefined-arch-macros.c (+142)
  • (modified) compiler-rt/lib/builtins/cpu_model/x86.c (+20)
  • (modified) llvm/include/llvm/TargetParser/X86TargetParser.def (+3)
  • (modified) llvm/include/llvm/TargetParser/X86TargetParser.h (+1)
  • (modified) llvm/lib/Target/X86/X86.td (+15)
  • (modified) llvm/lib/Target/X86/X86PfmCounters.td (+1)
  • (modified) llvm/lib/TargetParser/Host.cpp (+19)
  • (modified) llvm/lib/TargetParser/X86TargetParser.cpp (+5)
  • (modified) llvm/test/CodeGen/X86/bypass-slow-division-64.ll (+1)
  • (modified) llvm/test/CodeGen/X86/cmp16.ll (+1)
  • (modified) llvm/test/CodeGen/X86/cpus-amd.ll (+1)
  • (modified) llvm/test/CodeGen/X86/rdpru.ll (+1)
  • (modified) llvm/test/CodeGen/X86/shuffle-as-shifts.ll (+1)
  • (modified) llvm/test/CodeGen/X86/slow-unaligned-mem.ll (+1)
  • (modified) llvm/test/CodeGen/X86/sqrt-fastmath-tune.ll (+1)
  • (modified) llvm/test/CodeGen/X86/tuning-shuffle-permilpd-avx512.ll (+1)
  • (modified) llvm/test/CodeGen/X86/tuning-shuffle-permilps-avx512.ll (+1)
  • (modified) llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll (+1)
  • (modified) llvm/test/CodeGen/X86/tuning-shuffle-unpckps-avx512.ll (+1)
  • (modified) llvm/test/CodeGen/X86/vector-shuffle-fast-per-lane.ll (+1)
  • (modified) llvm/test/CodeGen/X86/vpdpwssd.ll (+1)
  • (modified) llvm/test/CodeGen/X86/x86-64-double-shifts-var.ll (+1)
  • (modified) llvm/test/MC/X86/x86_long_nop.s (+2)
  • (modified) llvm/test/Transforms/LoopUnroll/X86/call-remark.ll (+1)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/pr63668.ll (+1)
diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp
index 62c382b67ad14a..5448bd841959f4 100644
--- a/clang/lib/Basic/Targets/X86.cpp
+++ b/clang/lib/Basic/Targets/X86.cpp
@@ -728,6 +728,9 @@ void X86TargetInfo::getTargetDefines(const LangOptions &Opts,
   case CK_ZNVER4:
     defineCPUMacros(Builder, "znver4");
     break;
+  case CK_ZNVER5:
+    defineCPUMacros(Builder, "znver5");
+    break;
   case CK_Geode:
     defineCPUMacros(Builder, "geode");
     break;
@@ -1626,6 +1629,7 @@ std::optional<unsigned> X86TargetInfo::getCPUCacheLineSize() const {
     case CK_ZNVER2:
     case CK_ZNVER3:
     case CK_ZNVER4:
+    case CK_ZNVER5:
     // Deprecated
     case CK_x86_64:
     case CK_x86_64_v2:
diff --git a/clang/test/CodeGen/target-builtin-noerror.c b/clang/test/CodeGen/target-builtin-noerror.c
index 14024e3953182c..2a05074d7c2b68 100644
--- a/clang/test/CodeGen/target-builtin-noerror.c
+++ b/clang/test/CodeGen/target-builtin-noerror.c
@@ -207,4 +207,5 @@ void verifycpustrings(void) {
   (void)__builtin_cpu_is("znver2");
   (void)__builtin_cpu_is("znver3");
   (void)__builtin_cpu_is("znver4");
+  (void)__builtin_cpu_is("znver5");
 }
diff --git a/clang/test/Driver/x86-march.c b/clang/test/Driver/x86-march.c
index cc993b53937c17..3bc2a82ae778d6 100644
--- a/clang/test/Driver/x86-march.c
+++ b/clang/test/Driver/x86-march.c
@@ -242,6 +242,10 @@
 // RUN: %clang -target x86_64-unknown-unknown -c -### %s -march=znver4 2>&1 \
 // RUN:   | FileCheck %s -check-prefix=znver4
 // znver4: "-target-cpu" "znver4"
+//
+// RUN: %clang -target x86_64-unknown-unknown -c -### %s -march=znver5 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=znver5
+// znver5: "-target-cpu" "znver5"
 
 // RUN: %clang -target x86_64 -c -### %s -march=x86-64 2>&1 | FileCheck %s --check-prefix=x86-64
 // x86-64: "-target-cpu" "x86-64"
diff --git a/clang/test/Frontend/x86-target-cpu.c b/clang/test/Frontend/x86-target-cpu.c
index 6c8502ac2c21ee..f2885a040c3701 100644
--- a/clang/test/Frontend/x86-target-cpu.c
+++ b/clang/test/Frontend/x86-target-cpu.c
@@ -38,5 +38,6 @@
 // RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-cpu znver2 -verify %s
 // RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-cpu znver3 -verify %s
 // RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-cpu znver4 -verify %s
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-cpu znver5 -verify %s
 //
 // expected-no-diagnostics
diff --git a/clang/test/Misc/target-invalid-cpu-note/x86.c b/clang/test/Misc/target-invalid-cpu-note/x86.c
index 607192a5409ba8..7879676040af46 100644
--- a/clang/test/Misc/target-invalid-cpu-note/x86.c
+++ b/clang/test/Misc/target-invalid-cpu-note/x86.c
@@ -99,6 +99,7 @@
 // X86-SAME: {{^}}, znver2
 // X86-SAME: {{^}}, znver3
 // X86-SAME: {{^}}, znver4
+// X86-SAME: {{^}}, znver5
 // X86-SAME: {{^}}, x86-64
 // X86-SAME: {{^}}, x86-64-v2
 // X86-SAME: {{^}}, x86-64-v3
@@ -175,6 +176,7 @@
 // X86_64-SAME: {{^}}, znver2
 // X86_64-SAME: {{^}}, znver3
 // X86_64-SAME: {{^}}, znver4
+// X86_64-SAME: {{^}}, znver5
 // X86_64-SAME: {{^}}, x86-64
 // X86_64-SAME: {{^}}, x86-64-v2
 // X86_64-SAME: {{^}}, x86-64-v3
@@ -278,6 +280,7 @@
 // TUNE_X86-SAME: {{^}}, znver2
 // TUNE_X86-SAME: {{^}}, znver3
 // TUNE_X86-SAME: {{^}}, znver4
+// TUNE_X86-SAME: {{^}}, znver5
 // TUNE_X86-SAME: {{^}}, x86-64
 // TUNE_X86-SAME: {{^}}, geode
 // TUNE_X86-SAME: {{$}}
@@ -379,6 +382,7 @@
 // TUNE_X86_64-SAME: {{^}}, znver2
 // TUNE_X86_64-SAME: {{^}}, znver3
 // TUNE_X86_64-SAME: {{^}}, znver4
+// TUNE_X86_64-SAME: {{^}}, znver5
 // TUNE_X86_64-SAME: {{^}}, x86-64
 // TUNE_X86_64-SAME: {{^}}, geode
 // TUNE_X86_64-SAME: {{$}}
diff --git a/clang/test/Preprocessor/predefined-arch-macros.c b/clang/test/Preprocessor/predefined-arch-macros.c
index 49646d94d920c8..a149c69ee0cdb2 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -3923,6 +3923,148 @@
 // CHECK_ZNVER4_M64: #define __znver4 1
 // CHECK_ZNVER4_M64: #define __znver4__ 1
 
+// RUN: %clang -march=znver5 -m32 -E -dM %s -o - 2>&1 \
+// RUN:     -target i386-unknown-linux \
+// RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_ZNVER5_M32
+// CHECK_ZNVER5_M32-NOT: #define __3dNOW_A__ 1
+// CHECK_ZNVER5_M32-NOT: #define __3dNOW__ 1
+// CHECK_ZNVER5_M32: #define __ADX__ 1
+// CHECK_ZNVER5_M32: #define __AES__ 1
+// CHECK_ZNVER5_M32: #define __AVX2__ 1
+// CHECK_ZNVER5_M32: #define __AVX512BF16__ 1
+// CHECK_ZNVER5_M32: #define __AVX512BITALG__ 1
+// CHECK_ZNVER5_M32: #define __AVX512BW__ 1
+// CHECK_ZNVER5_M32: #define __AVX512CD__ 1
+// CHECK_ZNVER5_M32: #define __AVX512DQ__ 1
+// CHECK_ZNVER5_M32: #define __AVX512F__ 1
+// CHECK_ZNVER5_M32: #define __AVX512IFMA__ 1
+// CHECK_ZNVER5_M32: #define __AVX512VBMI2__ 1
+// CHECK_ZNVER5_M32: #define __AVX512VBMI__ 1
+// CHECK_ZNVER5_M32: #define __AVX512VL__ 1
+// CHECK_ZNVER5_M32: #define __AVX512VNNI__ 1
+// CHECK_ZNVER5_M32: #define __AVX512VP2INTERSECT__ 1
+// CHECK_ZNVER5_M32: #define __AVX512VPOPCNTDQ__ 1
+// CHECK_ZNVER5_M32: #define __AVXVNNI__ 1
+// CHECK_ZNVER5_M32: #define __AVX__ 1
+// CHECK_ZNVER5_M32: #define __BMI2__ 1
+// CHECK_ZNVER5_M32: #define __BMI__ 1
+// CHECK_ZNVER5_M32: #define __CLFLUSHOPT__ 1
+// CHECK_ZNVER5_M32: #define __CLWB__ 1
+// CHECK_ZNVER5_M32: #define __CLZERO__ 1
+// CHECK_ZNVER5_M32: #define __F16C__ 1
+// CHECK_ZNVER5_M32-NOT: #define __FMA4__ 1
+// CHECK_ZNVER5_M32: #define __FMA__ 1
+// CHECK_ZNVER5_M32: #define __FSGSBASE__ 1
+// CHECK_ZNVER5_M32: #define __GFNI__ 1
+// CHECK_ZNVER5_M32: #define __LZCNT__ 1
+// CHECK_ZNVER5_M32: #define __MMX__ 1
+// CHECK_ZNVER5_M32: #define __MOVDIR64B__ 1
+// CHECK_ZNVER5_M32: #define __MOVDIRI__ 1
+// CHECK_ZNVER5_M32: #define __PCLMUL__ 1
+// CHECK_ZNVER5_M32: #define __PKU__ 1
+// CHECK_ZNVER5_M32: #define __POPCNT__ 1
+// CHECK_ZNVER5_M32: #define __PREFETCHI__ 1
+// CHECK_ZNVER5_M32: #define __PRFCHW__ 1
+// CHECK_ZNVER5_M32: #define __RDPID__ 1
+// CHECK_ZNVER5_M32: #define __RDPRU__ 1
+// CHECK_ZNVER5_M32: #define __RDRND__ 1
+// CHECK_ZNVER5_M32: #define __RDSEED__ 1
+// CHECK_ZNVER5_M32: #define __SHA__ 1
+// CHECK_ZNVER5_M32: #define __SSE2_MATH__ 1
+// CHECK_ZNVER5_M32: #define __SSE2__ 1
+// CHECK_ZNVER5_M32: #define __SSE3__ 1
+// CHECK_ZNVER5_M32: #define __SSE4A__ 1
+// CHECK_ZNVER5_M32: #define __SSE4_1__ 1
+// CHECK_ZNVER5_M32: #define __SSE4_2__ 1
+// CHECK_ZNVER5_M32: #define __SSE_MATH__ 1
+// CHECK_ZNVER5_M32: #define __SSE__ 1
+// CHECK_ZNVER5_M32: #define __SSSE3__ 1
+// CHECK_ZNVER5_M32-NOT: #define __TBM__ 1
+// CHECK_ZNVER5_M32: #define __WBNOINVD__ 1
+// CHECK_ZNVER5_M32-NOT: #define __XOP__ 1
+// CHECK_ZNVER5_M32: #define __XSAVEC__ 1
+// CHECK_ZNVER5_M32: #define __XSAVEOPT__ 1
+// CHECK_ZNVER5_M32: #define __XSAVES__ 1
+// CHECK_ZNVER5_M32: #define __XSAVE__ 1
+// CHECK_ZNVER5_M32: #define __i386 1
+// CHECK_ZNVER5_M32: #define __i386__ 1
+// CHECK_ZNVER5_M32: #define __tune_znver5__ 1
+// CHECK_ZNVER5_M32: #define __znver5 1
+// CHECK_ZNVER5_M32: #define __znver5__ 1
+
+// RUN: %clang -march=znver5 -m64 -E -dM %s -o - 2>&1 \
+// RUN:     -target i386-unknown-linux \
+// RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_ZNVER5_M64
+// CHECK_ZNVER5_M64-NOT: #define __3dNOW_A__ 1
+// CHECK_ZNVER5_M64-NOT: #define __3dNOW__ 1
+// CHECK_ZNVER5_M64: #define __ADX__ 1
+// CHECK_ZNVER5_M64: #define __AES__ 1
+// CHECK_ZNVER5_M64: #define __AVX2__ 1
+// CHECK_ZNVER5_M64: #define __AVX512BF16__ 1
+// CHECK_ZNVER5_M64: #define __AVX512BITALG__ 1
+// CHECK_ZNVER5_M64: #define __AVX512BW__ 1
+// CHECK_ZNVER5_M64: #define __AVX512CD__ 1
+// CHECK_ZNVER5_M64: #define __AVX512DQ__ 1
+// CHECK_ZNVER5_M64: #define __AVX512F__ 1
+// CHECK_ZNVER5_M64: #define __AVX512IFMA__ 1
+// CHECK_ZNVER5_M64: #define __AVX512VBMI2__ 1
+// CHECK_ZNVER5_M64: #define __AVX512VBMI__ 1
+// CHECK_ZNVER5_M64: #define __AVX512VL__ 1
+// CHECK_ZNVER5_M64: #define __AVX512VNNI__ 1
+// CHECK_ZNVER5_M64: #define __AVX512VP2INTERSECT__ 1
+// CHECK_ZNVER5_M64: #define __AVX512VPOPCNTDQ__ 1
+// CHECK_ZNVER5_M64: #define __AVXVNNI__ 1
+// CHECK_ZNVER5_M64: #define __AVX__ 1
+// CHECK_ZNVER5_M64: #define __BMI2__ 1
+// CHECK_ZNVER5_M64: #define __BMI__ 1
+// CHECK_ZNVER5_M64: #define __CLFLUSHOPT__ 1
+// CHECK_ZNVER5_M64: #define __CLWB__ 1
+// CHECK_ZNVER5_M64: #define __CLZERO__ 1
+// CHECK_ZNVER5_M64: #define __F16C__ 1
+// CHECK_ZNVER5_M64-NOT: #define __FMA4__ 1
+// CHECK_ZNVER5_M64: #define __FMA__ 1
+// CHECK_ZNVER5_M64: #define __FSGSBASE__ 1
+// CHECK_ZNVER5_M64: #define __GFNI__ 1
+// CHECK_ZNVER5_M64: #define __LZCNT__ 1
+// CHECK_ZNVER5_M64: #define __MMX__ 1
+// CHECK_ZNVER5_M64: #define __MOVDIR64B__ 1
+// CHECK_ZNVER5_M64: #define __MOVDIRI__ 1
+// CHECK_ZNVER5_M64: #define __PCLMUL__ 1
+// CHECK_ZNVER5_M64: #define __PKU__ 1
+// CHECK_ZNVER5_M64: #define __POPCNT__ 1
+// CHECK_ZNVER5_M64: #define __PREFETCHI__ 1
+// CHECK_ZNVER5_M64: #define __PRFCHW__ 1
+// CHECK_ZNVER5_M64: #define __RDPID__ 1
+// CHECK_ZNVER5_M64: #define __RDPRU__ 1
+// CHECK_ZNVER5_M64: #define __RDRND__ 1
+// CHECK_ZNVER5_M64: #define __RDSEED__ 1
+// CHECK_ZNVER5_M64: #define __SHA__ 1
+// CHECK_ZNVER5_M64: #define __SSE2_MATH__ 1
+// CHECK_ZNVER5_M64: #define __SSE2__ 1
+// CHECK_ZNVER5_M64: #define __SSE3__ 1
+// CHECK_ZNVER5_M64: #define __SSE4A__ 1
+// CHECK_ZNVER5_M64: #define __SSE4_1__ 1
+// CHECK_ZNVER5_M64: #define __SSE4_2__ 1
+// CHECK_ZNVER5_M64: #define __SSE_MATH__ 1
+// CHECK_ZNVER5_M64: #define __SSE__ 1
+// CHECK_ZNVER5_M64: #define __SSSE3__ 1
+// CHECK_ZNVER5_M64-NOT: #define __TBM__ 1
+// CHECK_ZNVER5_M64: #define __VAES__ 1
+// CHECK_ZNVER5_M64: #define __VPCLMULQDQ__ 1
+// CHECK_ZNVER5_M64: #define __WBNOINVD__ 1
+// CHECK_ZNVER5_M64-NOT: #define __XOP__ 1
+// CHECK_ZNVER5_M64: #define __XSAVEC__ 1
+// CHECK_ZNVER5_M64: #define __XSAVEOPT__ 1
+// CHECK_ZNVER5_M64: #define __XSAVES__ 1
+// CHECK_ZNVER5_M64: #define __XSAVE__ 1
+// CHECK_ZNVER5_M64: #define __amd64 1
+// CHECK_ZNVER5_M64: #define __amd64__ 1
+// CHECK_ZNVER5_M64: #define __tune_znver5__ 1
+// CHECK_ZNVER5_M64: #define __x86_64 1
+// CHECK_ZNVER5_M64: #define __x86_64__ 1
+// CHECK_ZNVER5_M64: #define __znver5 1
+// CHECK_ZNVER5_M64: #define __znver5__ 1
+
 // End X86/GCC/Linux tests ------------------
 
 // Begin PPC/GCC/Linux tests ----------------
diff --git a/compiler-rt/lib/builtins/cpu_model/x86.c b/compiler-rt/lib/builtins/cpu_model/x86.c
index 069defc970190e..e5d74daf26d3de 100644
--- a/compiler-rt/lib/builtins/cpu_model/x86.c
+++ b/compiler-rt/lib/builtins/cpu_model/x86.c
@@ -63,6 +63,7 @@ enum ProcessorTypes {
   INTEL_SIERRAFOREST,
   INTEL_GRANDRIDGE,
   INTEL_CLEARWATERFOREST,
+  AMDFAM1AH,
   CPU_TYPE_MAX
 };
 
@@ -101,6 +102,7 @@ enum ProcessorSubtypes {
   INTEL_COREI7_ARROWLAKE,
   INTEL_COREI7_ARROWLAKE_S,
   INTEL_COREI7_PANTHERLAKE,
+  AMDFAM1AH_ZNVER5,
   CPU_SUBTYPE_MAX
 };
 
@@ -748,6 +750,24 @@ static const char *getAMDProcessorTypeAndSubtype(unsigned Family,
       break; //  "znver4"
     }
     break; // family 19h
+  case 26:
+    CPU = "znver5";
+    *Type = AMDFAM1AH;
+    if (Model <= 0x77) {
+     // Models 00h-0Fh (Breithorn).
+     // Models 10h-1Fh (Breithorn-Dense).
+     // Models 20h-2Fh (Strix 1).
+     // Models 30h-37h (Strix 2).
+     // Models 38h-3Fh (Strix 3).
+     // Models 40h-4Fh (Granite Ridge).
+     // Models 50h-5Fh (Weisshorn).
+     // Models 60h-6Fh (Krackan1).
+     // Models 70h-77h (Sarlak).
+     CPU = "znver5";
+     *Subtype = AMDFAM1AH_ZNVER5;
+     break; //  "znver5"
+    }
+    break;
   default:
     break; // Unknown AMD CPU.
   }
diff --git a/llvm/include/llvm/TargetParser/X86TargetParser.def b/llvm/include/llvm/TargetParser/X86TargetParser.def
index cd160f54e66705..e5bf196559ba63 100644
--- a/llvm/include/llvm/TargetParser/X86TargetParser.def
+++ b/llvm/include/llvm/TargetParser/X86TargetParser.def
@@ -49,11 +49,13 @@ X86_CPU_TYPE(ZHAOXIN_FAM7H,       "zhaoxin_fam7h")
 X86_CPU_TYPE(INTEL_SIERRAFOREST,  "sierraforest")
 X86_CPU_TYPE(INTEL_GRANDRIDGE,    "grandridge")
 X86_CPU_TYPE(INTEL_CLEARWATERFOREST, "clearwaterforest")
+X86_CPU_TYPE(AMDFAM1AH,           "amdfam1ah")
 
 // Alternate names supported by __builtin_cpu_is and target multiversioning.
 X86_CPU_TYPE_ALIAS(INTEL_BONNELL,    "atom")
 X86_CPU_TYPE_ALIAS(AMDFAM10H,        "amdfam10")
 X86_CPU_TYPE_ALIAS(AMDFAM15H,        "amdfam15")
+X86_CPU_TYPE_ALIAS(AMDFAM1AH,        "amdfam1a")
 X86_CPU_TYPE_ALIAS(INTEL_SILVERMONT, "slm")
 
 #undef X86_CPU_TYPE_ALIAS
@@ -104,6 +106,7 @@ X86_CPU_SUBTYPE(INTEL_COREI7_GRANITERAPIDS_D,"graniterapids-d")
 X86_CPU_SUBTYPE(INTEL_COREI7_ARROWLAKE,      "arrowlake")
 X86_CPU_SUBTYPE(INTEL_COREI7_ARROWLAKE_S,    "arrowlake-s")
 X86_CPU_SUBTYPE(INTEL_COREI7_PANTHERLAKE,    "pantherlake")
+X86_CPU_SUBTYPE(AMDFAM1AH_ZNVER5,            "znver5")
 
 // Alternate names supported by __builtin_cpu_is and target multiversioning.
 X86_CPU_SUBTYPE_ALIAS(INTEL_COREI7_ALDERLAKE, "raptorlake")
diff --git a/llvm/include/llvm/TargetParser/X86TargetParser.h b/llvm/include/llvm/TargetParser/X86TargetParser.h
index 2083e585af4ac8..0e17c4674719cf 100644
--- a/llvm/include/llvm/TargetParser/X86TargetParser.h
+++ b/llvm/include/llvm/TargetParser/X86TargetParser.h
@@ -142,6 +142,7 @@ enum CPUKind {
   CK_ZNVER2,
   CK_ZNVER3,
   CK_ZNVER4,
+  CK_ZNVER5,
   CK_x86_64,
   CK_x86_64_v2,
   CK_x86_64_v3,
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 988966fa6a6c46..6cf37836f921d4 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -1549,6 +1549,19 @@ def ProcessorFeatures {
                                                   FeatureVPOPCNTDQ];
   list<SubtargetFeature> ZN4Features =
     !listconcat(ZN3Features, ZN4AdditionalFeatures);
+
+
+  list<SubtargetFeature> ZN5Tuning = ZN4Tuning;
+  list<SubtargetFeature> ZN5AdditionalFeatures = [FeatureVNNI,
+                                                  FeatureMOVDIRI,
+                                                  FeatureMOVDIR64B,
+                                                  FeatureVP2INTERSECT,
+                                                  FeaturePREFETCHI,
+                                                  FeatureAVXVNNI
+                                                  ];
+  list<SubtargetFeature> ZN5Features =
+    !listconcat(ZN4Features, ZN5AdditionalFeatures);
+
 }
 
 //===----------------------------------------------------------------------===//
@@ -1898,6 +1911,8 @@ def : ProcModel<"znver3", Znver3Model, ProcessorFeatures.ZN3Features,
                 ProcessorFeatures.ZN3Tuning>;
 def : ProcModel<"znver4", Znver4Model, ProcessorFeatures.ZN4Features,
            ProcessorFeatures.ZN4Tuning>;
+def : ProcModel<"znver5", Znver4Model, ProcessorFeatures.ZN5Features,
+                ProcessorFeatures.ZN5Tuning>;
 
 def : Proc<"geode",           [FeatureX87, FeatureCX8, FeatureMMX, FeaturePRFCHW],
                               [TuningSlowUAMem16, TuningInsertVZEROUPPER]>;
diff --git a/llvm/lib/Target/X86/X86PfmCounters.td b/llvm/lib/Target/X86/X86PfmCounters.td
index 2b1dac411c9927..c30e989cdc2af1 100644
--- a/llvm/lib/Target/X86/X86PfmCounters.td
+++ b/llvm/lib/Target/X86/X86PfmCounters.td
@@ -350,3 +350,4 @@ def ZnVer4PfmCounters : ProcPfmCounters {
   let ValidationCounters = DefaultAMDPfmValidationCounters;
 }
 def : PfmCountersBinding<"znver4", ZnVer4PfmCounters>;
+def : PfmCountersBinding<"znver5", ZnVer4PfmCounters>;
diff --git a/llvm/lib/TargetParser/Host.cpp b/llvm/lib/TargetParser/Host.cpp
index 986b9a211ce6c1..a85d28d8308308 100644
--- a/llvm/lib/TargetParser/Host.cpp
+++ b/llvm/lib/TargetParser/Host.cpp
@@ -1151,6 +1151,25 @@ static const char *getAMDProcessorTypeAndSubtype(unsigned Family,
       break; //  "znver4"
     }
     break; // family 19h
+  case 26:
+    CPU = "znver5";
+    *Type = X86::AMDFAM1AH;
+    if (Model <= 0x77) {
+     // Models 00h-0Fh (Breithorn).
+     // Models 10h-1Fh (Breithorn-Dense).
+     // Models 20h-2Fh (Strix 1).
+     // Models 30h-37h (Strix 2).
+     // Models 38h-3Fh (Strix 3).
+     // Models 40h-4Fh (Granite Ridge).
+     // Models 50h-5Fh (Weisshorn).
+     // Models 60h-6Fh (Krackan1).
+     // Models 70h-77h (Sarlak).
+     CPU = "znver5";
+     *Subtype = X86::AMDFAM1AH_ZNVER5;
+     break; //  "znver5"
+    } 
+    break;
+
   default:
     break; // Unknown AMD CPU.
   }
diff --git a/llvm/lib/TargetParser/X86TargetParser.cpp b/llvm/lib/TargetParser/X86TargetParser.cpp
index 57bda0651ea829..27feb9c78912a5 100644
--- a/llvm/lib/TargetParser/X86TargetParser.cpp
+++ b/llvm/lib/TargetParser/X86TargetParser.cpp
@@ -238,6 +238,10 @@ static constexpr FeatureBitset FeaturesZNVER4 =
     FeatureAVX512BITALG | FeatureAVX512VPOPCNTDQ | FeatureAVX512BF16 |
     FeatureGFNI | FeatureSHSTK;
 
+static constexpr FeatureBitset FeaturesZNVER5 =
+    FeaturesZNVER4 | FeatureAVXVNNI | FeatureMOVDIRI | FeatureMOVDIR64B |
+    FeatureAVX512VP2INTERSECT | FeaturePREFETCHI | FeatureAVXVNNI ;
+
 // D151696 tranplanted Mangling and OnlyForCPUDispatchSpecific from
 // X86TargetParser.def to here. They are assigned by following ways:
 // 1. Copy the mangling from the original CPU_SPEICIFC MACROs. If no, assign
@@ -417,6 +421,7 @@ constexpr ProcInfo Processors[] = {
   { {"znver2"}, CK_ZNVER2, FEATURE_AVX2, FeaturesZNVER2, '\0', false },
   { {"znver3"}, CK_ZNVER3, FEATURE_AVX2, FeaturesZNVER3, '\0', false },
   { {"znver4"}, CK_ZNVER4, FEATURE_AVX512VBMI2, FeaturesZNVER4, '\0', false },
+  { {"znver5"}, CK_ZNVER5, FEATURE_AVX512VP2INTERSECT, FeaturesZNVER5, '\0', false },
   // Generic 64-bit processor.
   { {"x86-64"}, CK_x86_64, FEATURE_SSE2 , FeaturesX86_64, '\0', false },
   { {"x86-64-v2"}, CK_x86_64_v2, FEATURE_SSE4_2 , FeaturesX86_64_V2, '\0', false },
diff --git a/llvm/test/CodeGen/X86/bypass-slow-division-64.ll b/llvm/test/CodeGen/X86/bypass-slow-division-64.ll
index 6e0cfdd26a7866..b0ca0069a526b7 100644
--- a/llvm/test/CodeGen/X86/bypass-slow-division-64.ll
+++ b/llvm/test/CodeGen/X86/bypass-slow-division-64.ll
@@ -23,6 +23,7 @@
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver2          | FileCheck %s --check-prefixes=CHECK,SLOW-DIVQ
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver3          | FileCheck %s --check-prefixes=CHECK,SLOW-DIVQ
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver4          | FileCheck %s --check-prefixes=CHECK,SLOW-DIVQ
+; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver5          | FileCheck %s --check-prefixes=CHECK,SLOW-DIVQ
 
 ; Additional tests for 64-bit divide bypass
 
diff --git a/llvm/test/CodeGen/X86/cmp16.ll b/llvm/test/CodeGen/X86/cmp16.ll
index fa9e75ff16a5ca..8c14a78d9e1138 100644
--- a/llvm/test/CodeGen/X86/cmp16.ll
+++ b/llvm/test/CodeGen/X86/cmp16.ll
@@ -13,6 +13,7 @@
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver2 | FileCheck %s --check-prefixes=X64,X64-FAST
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver3 | FileCheck %s --check-prefixes=X64,X64-FAST
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver4 | FileCheck %s --check-prefixes=X64,X64-FAST
+; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver5 | FileCheck %s --check-prefixes=X64,X64-FAST
 
 define i1 @cmp16_reg_eq_reg(i16 %a0, i16 %a1) {
 ; X86-GENERIC-LABEL: cmp16_reg_eq_reg:
diff --git a/llvm/test/CodeGen/X86/cpus-amd.ll b/llvm/test/CodeGen/X86/cpus-amd.ll
index 228a00428c4571..33b2cf37314788 100644
--- a/llvm/test/CodeGen/X86/cpus-amd.ll
+++ b/llvm/test/CodeGen/X86/cpus-amd.ll
@@ -29,6 +29,7 @@
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=znver2 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=znver3 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=znver4 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
+; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=znver5 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 
 define void @foo() {
   ret void
diff --git a/llvm/test/CodeGen/X86/rdpru.ll b/llvm/test/CodeGen/X86/rdpru.ll
index 7771f52653cb50..be79a4499a3389 100644
--- a/llvm/test/CodeGen/X86/rdpru.ll
+++ b/llvm/test/CodeGen/X86/rdpru.ll
@@ -6,6 +6,7 @@
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver2 | FileCheck %s --check-prefix=X64
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver3 -fast-isel | FileCheck %s --check-prefix=X64
 ; RUN: llc < %s -mtriple=x86_64-- -mcpu=znver4 -fast-isel | FileCheck ...
[truncated]
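
The `getAMDProcessorTypeAndSubtype` hunks above (Host.cpp and compiler-rt's x86.c) switch on family 26, i.e. 1Ah, and assign the znver5 subtype for models up to 77h. As a hedged sketch of where those two numbers come from, the snippet below decodes family and model from CPUID leaf 1 EAX using the usual extended-family/extended-model rule. `FamilyModel`, `decodeFamilyModel`, and `isZnver5Subtype` are hypothetical names rather than LLVM APIs, and only the base-family-0Fh path relevant to recent AMD parts is handled (Intel family 6 also uses the extended model bits, which is omitted here).

```cpp
#include <cstdint>

// Hypothetical container for the decoded CPUID family and model.
struct FamilyModel {
  unsigned Family;
  unsigned Model;
};

// Decode family/model from CPUID leaf 1 EAX.
// Bits 11:8 = base family, 7:4 = base model,
// bits 27:20 = extended family, 19:16 = extended model.
constexpr FamilyModel decodeFamilyModel(std::uint32_t Eax) {
  unsigned Family = (Eax >> 8) & 0xF;
  unsigned Model = (Eax >> 4) & 0xF;
  if (Family == 0xF) {
    Family += (Eax >> 20) & 0xFF;       // add the extended family
    Model |= ((Eax >> 16) & 0xF) << 4;  // extended model forms the high nibble
  }
  return {Family, Model};
}

// Mirrors the shape of the new `case 26:` branch in the patch:
// family 1Ah with a model in the range 00h-77h is treated as znver5.
constexpr bool isZnver5Subtype(FamilyModel FM) {
  return FM.Family == 26 && FM.Model <= 0x77;
}
```

For instance, an illustrative EAX value of 0x00B40F40 decodes to family 26 (1Ah) and model 44h, which falls inside the 40h-4Fh range that the patch comments label Granite Ridge.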

github-actions bot commented Sep 10, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -1151,6 +1151,25 @@ static const char *getAMDProcessorTypeAndSubtype(unsigned Family,
break; // "znver4"
}
break; // family 19h
case 26:
Contributor

Can you bump the equivalent code in compiler-rt too?

Contributor Author

Will do!

Contributor

I posted some patches a while ago to start unifying things so that there's a single canonical version that gets copied and pasted, but those have been stalled for a while on reviewers, so for now we're mostly stuck with the ad-hoc way of doing things.

Contributor Author

I posted some patches a while ago to start unifying things so that there's a single canonical version that gets copied and pasted, but those have been stalled for a while on reviewers, so for now we're mostly stuck with the ad-hoc way of doing things.

Can you point me to those pull requests?
BTW, I see that the compiler-rt change is in place. Do you find any issues there? I think the ordering with respect to libgcc is also okay.
b68bcc1#diff-f0c447f4acaa87cede210a02796f8a3b2f08f6d8acaf951ed4c65465bf37c967

Contributor

Oh, looks like I missed it. Sorry about that!

There's #97856, #97861, and #97872 (still a draft).

I want to try and land some test infrastructure first (#101927) so I can ensure the refactorings don't break anything though. There are some more patches that I need to finish doing, but haven't gotten to the rest yet given the current ones are stalled.

@ganeshgit
Contributor Author

ganeshgit commented Sep 10, 2024

This patch adds the basic skeleton enablement for AMD's next-generation Zen 5 (znver5) CPUs.

@RKSimon Please post your comments. I have a few subsequent patches for scheduler enablement, and some tuning patches lined up as well.

@tschuett tschuett requested a review from RKSimon September 10, 2024 07:00
Collaborator

@RKSimon RKSimon left a comment

LGTM as a base patch (znver4 + extra ISAs) - we should hold off from cherry-picking into 19.x until we see the scope of the follow-up patches.
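
The "znver4 + extra ISAs" shape is visible in the patch's `FeaturesZNVER5` definition, which ORs a handful of extra features onto `FeaturesZNVER4`. The sketch below imitates that composition with plain integer flags. It is an assumption-laden stand-in: the real code uses LLVM's `FeatureBitset`, the bit positions are made up, and the znver4 set is reduced to two placeholder features. It also shows that listing `FeatureAVXVNNI` twice, as the patch does, has no effect on the resulting set because OR is idempotent.

```cpp
#include <cstdint>

// Hypothetical one-bit-per-feature flags (bit positions are arbitrary).
enum Feature : std::uint64_t {
  AVX512BF16 = 1ull << 0,         // placeholder for the znver4 baseline
  GFNI = 1ull << 1,               // placeholder for the znver4 baseline
  AVXVNNI = 1ull << 2,
  MOVDIRI = 1ull << 3,
  MOVDIR64B = 1ull << 4,
  AVX512VP2INTERSECT = 1ull << 5,
  PREFETCHI = 1ull << 6,
};

// Shortened stand-in for the znver4 feature set.
constexpr std::uint64_t FeaturesZnver4 = AVX512BF16 | GFNI;

// Same shape as the patch: baseline plus extras, with AVXVNNI listed twice.
constexpr std::uint64_t FeaturesZnver5 =
    FeaturesZnver4 | AVXVNNI | MOVDIRI | MOVDIR64B | AVX512VP2INTERSECT |
    PREFETCHI | AVXVNNI;

// Returns true when feature F is present in Set.
constexpr bool hasFeature(std::uint64_t Set, Feature F) {
  return (Set & F) != 0;
}
```

Under these assumptions, `hasFeature(FeaturesZnver5, GFNI)` holds because the baseline is inherited, while `hasFeature(FeaturesZnver4, AVXVNNI)` does not.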

@RKSimon
Collaborator

RKSimon commented Sep 10, 2024

@ganeshgit Can you address the clang-format warnings please?

@ganeshgit
Contributor Author

@ganeshgit Can you address the clang-format warnings please?

Yes sure. I will correct them!

@RKSimon
Collaborator

RKSimon commented Sep 13, 2024

@tru This patch at the very least needs to make it for 19.x but I was hoping we'd get some of the tuning improvements in as well - should we wait for those PRs or just get this committed and cherry picked straight away?

@tru
Collaborator

tru commented Sep 13, 2024

Hi,

This looks pretty safe, but feel free to think out loud about the risks of merging this. Technically it doesn't really fit the criteria of a regression or an important bugfix, but I understand the need to get this into the release and I don't want to be a blocker for that.

Tuning improvements could be riskier if I understand it correctly?

@tru tru added this to the LLVM 19.X Release milestone Sep 13, 2024
@RKSimon
Collaborator

RKSimon commented Sep 13, 2024

@ganeshgit Ignore what I said earlier about waiting for the tuning patches :) Please can we get this committed to trunk, we'll let it brew for a few days and then cherry pick for 19.x - if you can create PRs for the tuning changes as soon as possible we can review them for 19.x on a case by case basis.

@nikic
Contributor

nikic commented Sep 13, 2024

Zen 5 support in GCC was upstreamed more than half a year ago -- why is the LLVM support being upstreamed only now, after missing the 19.x window? What steps are being taken to ensure this does not happen again?

The change to X86TargetParser.h looks ABI breaking to me.

@ganeshgit
Contributor Author

@ganeshgit Ignore what I said earlier about waiting for the tuning patches :) Please can we get this committed to trunk, we'll let it brew for a few days and then cherry pick for 19.x - if you can create PRs for the tuning changes as soon as possible we can review them for 19.x on a case by case basis.

I will upload the rebased code addressing the format errors shortly. Yes PRs will work for tuning changes. I think it will take some time.

@ganeshgit
Contributor Author

Zen 5 support in GCC was upstreamed more than half a year ago -- why is the LLVM support being upstreamed only now, after missing the 19.x window? What steps are being taken to ensure this does not happen again?

The change to X86TargetParser.h looks ABI breaking to me.

We have a dependency in libpfm for llvm which requires legal clearances. In future, we will make sure this gets addressed in advance and will try to upload our patches in sync with GCC patches. Apologies for the inconvenience.

@tru
Collaborator

tru commented Sep 13, 2024

The change to X86TargetParser.h looks ABI breaking to me.

This seems unfortunate to me. But I don't think it would be good to insert the enum at the end of the list and change the sorting order.

How big of a problem would an ABI break be now? I know you have requested that we try to avoid those, even if it's strictly within the policy to still do that before the final release.

@RKSimon
Collaborator

RKSimon commented Sep 13, 2024

It would be messy, but could we not place the CK_ZNVER5 enum entry at the end of the enum list just for 19.x and then fix the sorting in trunk?
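
A minimal sketch of why enumerator position matters here: with implicit numbering, inserting `CK_ZNVER5` in sorted position shifts the value of every enumerator after it, whereas appending it leaves the existing values alone. The three enums below are hypothetical, heavily shortened stand-ins for `CPUKind` in X86TargetParser.h, and whether a shift actually breaks anything depends on whether those values cross a library boundary.

```cpp
// Hypothetical, shortened stand-ins for the CPUKind enum.
namespace released { // layout before this patch
enum CPUKind { CK_ZNVER4, CK_x86_64, CK_Geode };
}
namespace sorted { // new entry inserted next to znver4, as in this patch
enum CPUKind { CK_ZNVER4, CK_ZNVER5, CK_x86_64, CK_Geode };
}
namespace appended { // new entry placed last, as suggested for 19.x
enum CPUKind { CK_ZNVER4, CK_x86_64, CK_Geode, CK_ZNVER5 };
}

// True if every enumerator of the released layout keeps its numeric value.
constexpr bool SortedKeepsValues =
    int(sorted::CK_ZNVER4) == int(released::CK_ZNVER4) &&
    int(sorted::CK_x86_64) == int(released::CK_x86_64) &&
    int(sorted::CK_Geode) == int(released::CK_Geode);

constexpr bool AppendedKeepsValues =
    int(appended::CK_ZNVER4) == int(released::CK_ZNVER4) &&
    int(appended::CK_x86_64) == int(released::CK_x86_64) &&
    int(appended::CK_Geode) == int(released::CK_Geode);
```

In this sketch `SortedKeepsValues` is false: `CK_x86_64` moves from 1 to 2, so a binary built against the old header and a library built against the new one would disagree about what the value 1 means. `AppendedKeepsValues` is true, at the cost of the unsorted list mentioned in the thread.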

@AaronBallman
Collaborator

It would be messy, but could we not place the CK_ZNVER5 enum entry at the end of the enum list just for 19.x and then fix the sorting in trunk?

Seems better than an ABI break this late in the cycle, but I don't have super strong feelings.

@tru
Collaborator

tru commented Sep 13, 2024

It would be messy, but could we not place the CK_ZNVER5 enum entry at the end of the enum list just for 19.x and then fix the sorting in trunk?

I would be fine with that. WDYT @nikic ?

@ganeshgit
Contributor Author

It would be messy, but could we not place the CK_ZNVER5 enum entry at the end of the enum list just for 19.x and then fix the sorting in trunk?

It would be messy, but could we not place the CK_ZNVER5 enum entry at the end of the enum list just for 19.x and then fix the sorting in trunk?

I would be fine with that. WDYT @nikic ?

I will make the change, placing the enum entry at the end of the list after CK_Geode, and submit it for this branch instead of rebasing, then.

@tru
Collaborator

tru commented Sep 13, 2024

I think it's better that it lands in main as it should be, and then you can create a new PR against the release/19.x branch with the different enum layout.

@RKSimon
Collaborator

RKSimon commented Sep 13, 2024

@ganeshgit OK to commit?

@ganeshgit
Contributor Author

@ganeshgit OK to commit?

Yes, I am okay. Can you please commit and close this PR? I will submit another PR for the 19.x release.

@RKSimon RKSimon merged commit 02e4186 into llvm:main Sep 13, 2024
5 of 7 checks passed
@ganeshgit ganeshgit deleted the znver5 branch September 13, 2024 16:47
@RKSimon RKSimon removed this from the LLVM 19.X Release milestone Sep 15, 2024
@tru
Collaborator

tru commented Sep 16, 2024

@ganeshgit or @RKSimon can one of you put up a PR against release/19.x with the ABI-compatible changes so that we have a chance to look at it and approve it before the release tomorrow?

@ganeshgit
Contributor Author

@ganeshgit or @RKSimon can one of you put up a PR against release/19.x with the ABI-compatible changes so that we have a chance to look at it and approve it before the release tomorrow?

I will submit the changes shortly. I will probably create a branch based on a commit prior to this commit on main, for easy integration.

@tru
Collaborator

tru commented Sep 16, 2024

Create a branch from release/19.x in your own fork, then cherry-pick over the changes from main and edit them to match what we talked about above (and fix any merge problems). Then submit a PR that merges your branch into release/19.x. Also check the "maintainers can edit this branch" checkbox; that will help my integration!

Labels
backend:X86, clang:driver, clang:frontend, clang, compiler-rt:builtins, compiler-rt, llvm:transforms, mc
Projects
Status: Needs Backport PR