Skip to content

[OpenMP][USM] Adds test for -fopenmp-force-usm flag #75467

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jplehr
Copy link
Contributor

@jplehr jplehr commented Dec 14, 2023

This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode.
I am a bit unhappy with the way this test is set up, but could not find a better way to do it. Happy to improve that and add more tests then.

Marked as XFAIL to first land test and then enable in subsequent patch.

@jplehr jplehr added the openmp:libomptarget OpenMP offload runtime label Dec 14, 2023
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:openmp OpenMP related changes to Clang labels Dec 14, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 14, 2023

@llvm/pr-subscribers-clang

Author: Jan Patrick Lehr (jplehr)

Changes

This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode.
I am a bit unhappy with the way this test is set up, but could not find a better way to do it. Happy to improve that and add more tests then.

Marked as XFAIL to first land test and then enable in subsequent patch.


Full diff: https://github.com/llvm/llvm-project/pull/75467.diff

1 Files Affected:

  • (added) clang/test/OpenMP/force-usm.c (+73)
diff --git a/clang/test/OpenMP/force-usm.c b/clang/test/OpenMP/force-usm.c
new file mode 100644
index 00000000000000..222705322b8976
--- /dev/null
+++ b/clang/test/OpenMP/force-usm.c
@@ -0,0 +1,73 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _ --version 3
+// XFAIL: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -include %S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -include %S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefix=CHECK-USM %s
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefix=CHECK-DEFAULT %s
+// expected-no-diagnostics
+
+extern "C" void *malloc(unsigned int b);
+
+int GI;
+#pragma omp declare target
+int *pGI;
+#pragma omp end declare target
+
+int main(void) {
+
+  GI = 0;
+
+  pGI = (int *) malloc(sizeof(int));
+  *pGI = 42;
+
+#pragma omp target map(pGI[:1], GI)
+  {
+    GI = 1;
+    *pGI = 2;
+  }
+
+  return 0;
+}
+
+// CHECK-USM-LABEL: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25
+// CHECK-USM-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[GI:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-USM-NEXT:  entry:
+// CHECK-USM-NEXT:    [[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-USM-NEXT:    [[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[GI_ADDR]] to ptr
+// CHECK-USM-NEXT:    store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-USM-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-USM-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-USM-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-USM-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK-USM:       user_code.entry:
+// CHECK-USM-NEXT:    store i32 1, ptr [[TMP0]], align 4
+// CHECK-USM-NEXT:    [[TMP2:%.*]] = load ptr, ptr @pGI_decl_tgt_ref_ptr, align 8
+// CHECK-USM-NEXT:    [[TMP3:%.*]] = load ptr, ptr [[TMP2]], align 8
+// CHECK-USM-NEXT:    store i32 2, ptr [[TMP3]], align 4
+// CHECK-USM-NEXT:    call void @__kmpc_target_deinit(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i8 1)
+// CHECK-USM-NEXT:    ret void
+// CHECK-USM:       worker.exit:
+// CHECK-USM-NEXT:    ret void
+//
+//
+// CHECK-DEFAULT-LABEL: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25
+// CHECK-DEFAULT-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[GI:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-DEFAULT-NEXT:  entry:
+// CHECK-DEFAULT-NEXT:    [[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-DEFAULT-NEXT:    [[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[GI_ADDR]] to ptr
+// CHECK-DEFAULT-NEXT:    store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-DEFAULT-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-DEFAULT-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK-DEFAULT:       user_code.entry:
+// CHECK-DEFAULT-NEXT:    store i32 1, ptr [[TMP0]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load ptr, ptr addrspacecast (ptr addrspace(1) @pGI to ptr), align 8
+// CHECK-DEFAULT-NEXT:    store i32 2, ptr [[TMP2]], align 4
+// CHECK-DEFAULT-NEXT:    call void @__kmpc_target_deinit(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i8 1)
+// CHECK-DEFAULT-NEXT:    ret void
+// CHECK-DEFAULT:       worker.exit:
+// CHECK-DEFAULT-NEXT:    ret void
+//

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.

Marked as XFAIL to first land test and then enable in subsequent patch.
@jplehr jplehr force-pushed the feat/force-usm-test branch from ea2a919 to d3d073d Compare December 14, 2023 14:07
@jplehr
Copy link
Contributor Author

jplehr commented Dec 29, 2023

I updated the feature PR (#75468) with a different solution. Will update the test after feedback if the route I took in the other PR is seen as OK.

@jdoerfert
Copy link
Member

How do the IR checks verify the flag works?

@jhuber6
Copy link
Contributor

jhuber6 commented Jan 4, 2024

Test should probably show that IR is equivalent to #pragma omp requires unified_shared_memory or however that's spelled. Basic documentation should be provided by the help text in the new flag, but we probably have somewhere in the OpenMP docs you could add it to if desired.

@jdoerfert
Copy link
Member

I mean, the test shows device IR for two functions, doesn't it? I am not aware that part is impacted by USM. Host globals/register functions are.

@jplehr
Copy link
Contributor Author

jplehr commented Jan 4, 2024

The IR is impacted for the global that is in that test case. Lines ~46-50 (first IR section) vs line ~68 (second IR section). The remaining code is indeed the same.
The way that this test is executed is out of date however, given that I have reimplemented the flag.
My plan is to keep this test and add a few runtime tests as well, since we should be able to validate that we do not see data transfers.

@jdoerfert
Copy link
Member

I see. FWIW "pGI_decl_tgt_ref_ptr" is really not a good name (not your fault).

@jplehr
Copy link
Contributor Author

jplehr commented Jan 18, 2024

Closing this. Test is now part of feature-PR.

@jplehr jplehr closed this Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants