
[Clang][BPF] Add __BPF_CPU_VERSION__ macro #71856


Merged
merged 1 commit into from
Nov 10, 2023


yonghong-song
Contributor

@yonghong-song yonghong-song commented Nov 9, 2023

Sometimes BPF developers might want to write different code
depending on the cpu version. For example, the branch target is
16-bit for cpu v1/v2/v3 and 32-bit for cpu v4, so cpu v4 allows
more aggressive loop unrolling than cpu v1/v2/v3
(see [1] for a kernel selftest failure due to this).
We would like to keep aggressive loop unrolling for cpu v4
while limiting loop unrolling for earlier cpu versions.
As another example, signed divide is only available with cpu v4.

Actually, adding cpu-specific macros is fairly common
in LLVM. For example, x86 has macros like '__i486__', '__pentium_mmx__', etc.
AArch64 has '__ARM_NEON', '__ARM_FEATURE_SVE', etc.

This patch adds the __BPF_CPU_VERSION__ macro. Current possible values
are 0/1/2/3/4. The following is the -mcpu=... to __BPF_CPU_VERSION__
mapping:

       cpu                  __BPF_CPU_VERSION__
       no -mcpu=<...>       1
       -mcpu=v1             1
       -mcpu=v2             2
       -mcpu=v3             3
       -mcpu=v4             4
       -mcpu=generic        1
       -mcpu=probe          0
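
As a minimal sketch of how a program might use this mapping, the snippet below picks a loop-unrolling budget from `__BPF_CPU_VERSION__`. The names `UNROLL_FACTOR` and `sum_values` and the factors 8 and 2 are assumptions made for illustration; they are not part of this patch. The `defined()` guard makes the code fall back to the conservative setting on compilers that do not provide the macro.

```c
/* Illustrative unroll budget; the numbers are assumptions, not values from the patch. */
#if defined(__BPF_CPU_VERSION__) && __BPF_CPU_VERSION__ >= 4
#define UNROLL_FACTOR 8 /* cpu v4: 32-bit branch targets, larger bodies are fine */
#else
#define UNROLL_FACTOR 2 /* v1/v2/v3, -mcpu=probe, or macro not available */
#endif

/* Hypothetical helper: sums n values, UNROLL_FACTOR elements per outer iteration. */
static unsigned int sum_values(const unsigned int *vals, unsigned int n)
{
    unsigned int total = 0;
    unsigned int i = 0;

    /* The inner loop has a constant trip count, so the compiler may unroll it fully. */
    for (; i + UNROLL_FACTOR <= n; i += UNROLL_FACTOR) {
        for (unsigned int j = 0; j < UNROLL_FACTOR; j++)
            total += vals[i + j];
    }

    /* Remainder elements that do not fill a whole chunk. */
    for (; i < n; i++)
        total += vals[i];

    return total;
}
```

The result is the same for every cpu version; only the amount of straight-line code per iteration changes.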

This patch also adds some macros for developers to identify cpu
insn features:

      feature macro               enabled in which cpu
      __BPF_FEATURE_JMP_EXT       >= v2
      __BPF_FEATURE_JMP32         >= v3
      __BPF_FEATURE_ALU32         >= v3
      __BPF_FEATURE_LDSX          >= v4
      __BPF_FEATURE_MOVSX         >= v4
      __BPF_FEATURE_BSWAP         >= v4
      __BPF_FEATURE_SDIV_SMOD     >= v4
      __BPF_FEATURE_GOTOL         >= v4
      __BPF_FEATURE_ST            >= v4

[1] https://lore.kernel.org/bpf/3e3a8a30-dde0-43a1-981e-2274962780ef@linux.dev/
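
The feature macros can be tested with `#ifdef` in the same way. The sketch below guards a signed 32-bit division on `__BPF_FEATURE_SDIV_SMOD`; the helper name `div_s32` and the unsigned fallback are assumptions chosen for illustration, not code from this patch. The caller is assumed to pass a non-zero divisor.

```c
#include <stdint.h>

/* Hypothetical helper: signed division that truncates toward zero. */
static inline int32_t div_s32(int32_t a, int32_t b)
{
#ifdef __BPF_FEATURE_SDIV_SMOD
    /* Signed divide insn is available (cpu v4 per the table above). */
    return a / b;
#else
    /* Fallback: divide the magnitudes as unsigned values, then restore the sign. */
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t q = ua / ub;

    /* Negate in unsigned arithmetic; the cast assumes two's complement. */
    return ((a < 0) != (b < 0)) ? (int32_t)(0u - q) : (int32_t)q;
#endif
}
```

Both paths give the same quotient for ordinary inputs, so callers do not need to know which cpu version the object was built for.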

@yonghong-song yonghong-song requested a review from eddyz87 November 9, 2023 19:42
@llvmbot llvmbot added the clang and clang:frontend labels Nov 9, 2023
@llvmbot
Member

llvmbot commented Nov 9, 2023

@llvm/pr-subscribers-clang

Author: None (yonghong-song)

Changes

Sometimes BPF developers might want to write different code depending on the cpu version. For example, the branch target is 16-bit for cpu v1/v2/v3 and 32-bit for cpu v4, so cpu v4 allows more aggressive loop unrolling than cpu v1/v2/v3 (see [1] for a kernel selftest failure due to this). We would like to keep aggressive loop unrolling for cpu v4 while limiting loop unrolling for earlier cpu versions. As another example, signed divide is only available with cpu v4.

Actually, adding cpu-specific macros is fairly common in LLVM. For example, x86 has macros like '__i486__', '__pentium_mmx__', etc. AArch64 has '__ARM_NEON', '__ARM_FEATURE_SVE', etc.

This patch adds the __bpf_cpu_version__ macro. Current possible values are 0/1/2/3/4. The following is the -mcpu=... to __bpf_cpu_version__ mapping:

       cpu                  __bpf_cpu_version__
       no -mcpu=<...>       1
       -mcpu=v1             1
       -mcpu=v2             2
       -mcpu=v3             3
       -mcpu=v4             4
       -mcpu=generic        1
       -mcpu=probe          0

[1] https://lore.kernel.org/bpf/3e3a8a30-dde0-43a1-981e-2274962780ef@linux.dev/


Full diff: https://github.com/llvm/llvm-project/pull/71856.diff

2 Files Affected:

  • (modified) clang/lib/Basic/Targets/BPF.cpp (+8)
  • (modified) clang/test/Preprocessor/bpf-predefined-macros.c (+35-2)
diff --git a/clang/lib/Basic/Targets/BPF.cpp b/clang/lib/Basic/Targets/BPF.cpp
index d6288d2e0d0e176..a61e279f395ea31 100644
--- a/clang/lib/Basic/Targets/BPF.cpp
+++ b/clang/lib/Basic/Targets/BPF.cpp
@@ -29,6 +29,14 @@ void BPFTargetInfo::getTargetDefines(const LangOptions &Opts,
                                      MacroBuilder &Builder) const {
   Builder.defineMacro("__bpf__");
   Builder.defineMacro("__BPF__");
+
+  std::string CPU = getTargetOpts().CPU;
+  if (CPU == "probe")
+    Builder.defineMacro("__bpf_cpu_version__", "0");
+  else if (CPU.empty() || CPU == "generic")
+    Builder.defineMacro("__bpf_cpu_version__", "1");
+  else
+    Builder.defineMacro("__bpf_cpu_version__", CPU.substr(1));
 }
 
 static constexpr llvm::StringLiteral ValidCPUNames[] = {"generic", "v1", "v2",
diff --git a/clang/test/Preprocessor/bpf-predefined-macros.c b/clang/test/Preprocessor/bpf-predefined-macros.c
index bcb985f95426622..364f4cc3f4d0c3c 100644
--- a/clang/test/Preprocessor/bpf-predefined-macros.c
+++ b/clang/test/Preprocessor/bpf-predefined-macros.c
@@ -1,5 +1,11 @@
-// RUN: %clang -E -target bpfel -x c -o - %s | FileCheck %s
-// RUN: %clang -E -target bpfeb -x c -o - %s | FileCheck %s
+// RUN: %clang -E -target bpfel -x c -o - %s | FileCheck -check-prefix=CHECK -check-prefix=CPU_NO %s
+// RUN: %clang -E -target bpfeb -x c -o - %s | FileCheck -check-prefix=CHECK -check-prefix=CPU_NO %s
+// RUN: %clang -E -target bpfel -mcpu=v1 -x c -o - %s | FileCheck -check-prefix=CHECK -check-prefix=CPU_V1 %s
+// RUN: %clang -E -target bpfel -mcpu=v2 -x c -o - %s | FileCheck -check-prefix=CHECK -check-prefix=CPU_V2 %s
+// RUN: %clang -E -target bpfel -mcpu=v3 -x c -o - %s | FileCheck -check-prefix=CHECK -check-prefix=CPU_V3 %s
+// RUN: %clang -E -target bpfel -mcpu=v4 -x c -o - %s | FileCheck -check-prefix=CHECK -check-prefix=CPU_V4 %s
+// RUN: %clang -E -target bpfel -mcpu=generic -x c -o - %s | FileCheck -check-prefix=CHECK -check-prefix=CPU_GENERIC %s
+// RUN: %clang -E -target bpfel -mcpu=probe -x c -o - %s | FileCheck -check-prefix=CHECK -check-prefix=CPU_PROBE %s
 
 #ifdef __bpf__
 int b;
@@ -10,7 +16,34 @@ int c;
 #ifdef bpf
 int d;
 #endif
+#ifdef __bpf_cpu_version__
+int e;
+#endif
+#if __bpf_cpu_version__ == 0
+int f;
+#endif
+#if __bpf_cpu_version__ == 1
+int g;
+#endif
+#if __bpf_cpu_version__ == 2
+int h;
+#endif
+#if __bpf_cpu_version__ == 3
+int i;
+#endif
+#if __bpf_cpu_version__ == 4
+int j;
+#endif
 
 // CHECK: int b;
 // CHECK: int c;
 // CHECK-NOT: int d;
+// CHECK: int e;
+
+// CPU_NO: int g;
+// CPU_V1: int g;
+// CPU_V2: int h;
+// CPU_V3: int i;
+// CPU_V4: int j;
+// CPU_GENERIC: int g;
+// CPU_PROBE: int f;
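
For readers who want to experiment with the mapping outside of clang, the logic in the `BPF.cpp` hunk above can be mirrored in a few lines of plain C. This is only a sketch: `bpf_cpu_version_from_name` is a hypothetical name, it is not part of clang, and it assumes the input is one of the valid CPU names ("generic", "probe", "v1" to "v4") or empty/NULL when no `-mcpu` is given.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-alone mirror of the mapping in the diff above.
 * Returns the value the version macro would expand to for a given
 * -mcpu name. NULL or "" stands for "no -mcpu option".
 */
static int bpf_cpu_version_from_name(const char *cpu)
{
    if (cpu == NULL || cpu[0] == '\0' || strcmp(cpu, "generic") == 0)
        return 1;
    if (strcmp(cpu, "probe") == 0)
        return 0;
    /* Remaining valid names have the form "vN": skip the 'v', parse N. */
    return atoi(cpu + 1);
}
```

The order of the checks differs from the patch, but the results agree with the expectations encoded in the `CPU_*` check prefixes of the test file.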

@yonghong-song
Contributor Author

cc @4ast @anakryiko

Contributor

@eddyz87 eddyz87 left a comment


Hi Yonghong,

Thank you for working on this.

  else if (CPU.empty() || CPU == "generic")
    Builder.defineMacro("__bpf_cpu_version__", "1");
  else
    Builder.defineMacro("__bpf_cpu_version__", CPU.substr(1));
Member


That works and is somewhat similar to other archs, like amdgcn_processor.
I have a slight preference for upper case: BPF_CPU_VERSION.
Let's add all groups too:
HasJmpExt = true;
HasJmp32 = true;
HasAlu32 = true;
HasJmpExt = true;
HasJmp32 = true;
HasAlu32 = true;
HasLdsx = !Disable_ldsx;
HasMovsx = !Disable_movsx;
HasBswap = !Disable_bswap;
HasSdivSmod = !Disable_sdiv_smod;
HasGotol = !Disable_gotol;

arm does __ARM_FEATURE_xx.
We can do __BPF_FEATURE_ALU32, __BPF_FEATURE_GOTOL.

Contributor Author

@yonghong-song yonghong-song Nov 9, 2023


Some architectures prefer lower case (x86, ppc, loongarch, riscv) while others (arm, etc.) prefer upper case. Yes, we can use upper case for all the proposed macros.
For BPF_CPU_VERSION, I would like __BPF_CPU_VERSION. Basically adding '__' as prefix. What do you think?

Contributor Author


Okay, I don't particularly like __BPF_CPU_VERSION either and actually prefer double underscores before and after BPF_CPU_VERSION. Now I realize that you actually meant double underscores before and after BPF_CPU_VERSION; GitHub just rendered them as highlighting.

@yonghong-song yonghong-song changed the title [Clang][BPF] Add __bpf_cpu_version__ macro [Clang][BPF] Add __BPF_CPU_VERSION__ macro Nov 10, 2023
@yonghong-song yonghong-song merged commit 4e67234 into llvm:main Nov 10, 2023
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023