Codegen changes for strict modifier with grainsize/num_tasks of taskloop construct #117196

Merged
merged 9 commits into from
Nov 28, 2024

chandraghale
Contributor

@chandraghale chandraghale commented Nov 21, 2024

Initial parsing and sema support for the 'strict' modifier on the 'num_tasks' and 'grainsize' clauses was added in earlier commits (grainsize_parsing and num_tasks_parsing). That implementation is incomplete because it lacks code generation support. A separate runtime commit (runtime_patch) added a new API, __kmpc_taskloop_5, to accommodate the strict modifier.
This patch adds the codegen support. When the strict modifier is present on the grainsize or num_tasks clause of a taskloop construct, the compiler now emits a call to __kmpc_taskloop_5, which takes an additional i32 parameter set to 1 to indicate the strict modifier. If the strict modifier is absent, codegen falls back to the existing __kmpc_taskloop call.
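
The selection between the two entry points can be sketched in isolation. The block below is a simplified model, not the clang implementation: `TaskLoopData`, `RuntimeCall`, and `selectTaskLoopCall` are hypothetical names, and only the trailing schedule-related arguments are modelled. The schedule-kind values (0, 1, 2) and the extra trailing `1` follow the diff in this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for clang's internal state; not the real CodeGen API.
enum ScheduleKind { NoSchedule = 0, Grainsize = 1, NumTasks = 2 };

struct TaskLoopData {
  bool HasSchedule;                 // a grainsize or num_tasks clause is present
  bool IsNumTasks;                  // true for num_tasks, false for grainsize
  unsigned long long ScheduleValue; // the clause argument
  bool HasModifier;                 // the clause carries the strict modifier
};

struct RuntimeCall {
  std::string Callee;                  // runtime entry point to call
  std::vector<long long> TrailingArgs; // schedule kind, value, [modifier]
};

// Picks the entry point the way the patch describes: the _5 variant with an
// extra modifier argument of 1 when strict is present, the old call otherwise.
RuntimeCall selectTaskLoopCall(const TaskLoopData &D) {
  const long long Kind =
      D.HasSchedule ? (D.IsNumTasks ? NumTasks : Grainsize) : NoSchedule;
  const long long Value =
      D.HasSchedule ? static_cast<long long>(D.ScheduleValue) : 0;
  if (D.HasModifier)
    return {"__kmpc_taskloop_5", {Kind, Value, 1}};
  return {"__kmpc_taskloop", {Kind, Value}};
}

// Mirrors two cases from the test file added in this PR.
void selfCheck() {
  // num_tasks(strict: 4) is checked as "i32 2, i64 4, i32 1".
  RuntimeCall Strict = selectTaskLoopCall({true, true, 4, true});
  assert(Strict.Callee == "__kmpc_taskloop_5");
  assert((Strict.TrailingArgs == std::vector<long long>{2, 4, 1}));

  // No schedule clause is checked as "i32 0, i64 0" on the old entry point.
  RuntimeCall Plain = selectTaskLoopCall({false, false, 0, false});
  assert(Plain.Callee == "__kmpc_taskloop");
  assert((Plain.TrailingArgs == std::vector<long long>{0, 0}));
}
```

In this model a clause without the strict modifier, such as a plain `grainsize(8)`, still goes through `__kmpc_taskloop` with schedule kind 1, which matches the fallback described above.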

@llvmbot added the clang, clang:codegen, flang:openmp, and clang:openmp labels on Nov 21, 2024
@llvmbot
Member

llvmbot commented Nov 21, 2024

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-clang

Author: CHANDRA GHALE (chandraghale)

Full diff: https://github.com/llvm/llvm-project/pull/117196.diff

5 Files Affected:

  • (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+28)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntime.h (+1)
  • (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+2)
  • (added) clang/test/OpenMP/taskloop_strictmodifier_codegen.cpp (+256)
  • (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+3)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index cc389974e04081..361550d2f102b4 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -4666,6 +4666,33 @@ void CGOpenMPRuntime::emitTaskLoopCall(CodeGenFunction &CGF, SourceLocation Loc,
                                CGF.getContext().VoidPtrTy);
   }
   enum { NoSchedule = 0, Grainsize = 1, NumTasks = 2 };
+  if (Data.HasModifier) {
+    llvm::Value *TaskArgs[] = {
+      UpLoc,
+      ThreadID,
+      Result.NewTask,
+      IfVal,
+      LBLVal.getPointer(CGF),
+      UBLVal.getPointer(CGF),
+      CGF.EmitLoadOfScalar(StLVal, Loc),
+      llvm::ConstantInt::getSigned(
+          CGF.IntTy, 1), // Always 1 because taskgroup emitted by the compiler
+      llvm::ConstantInt::getSigned(
+          CGF.IntTy, Data.Schedule.getPointer()
+                         ? Data.Schedule.getInt() ? NumTasks : Grainsize
+                         : NoSchedule),
+      Data.Schedule.getPointer()
+          ? CGF.Builder.CreateIntCast(Data.Schedule.getPointer(), CGF.Int64Ty,
+                                      /*isSigned=*/false)
+          : llvm::ConstantInt::get(CGF.Int64Ty, /*V=*/0),
+      llvm::ConstantInt::get(CGF.Int32Ty, 1), // strict modifier enabled
+      Result.TaskDupFn ? CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+                             Result.TaskDupFn, CGF.VoidPtrTy)
+                       : llvm::ConstantPointerNull::get(CGF.VoidPtrTy)};
+    CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+                            CGM.getModule(), OMPRTL___kmpc_taskloop_5),
+                        TaskArgs);
+  } else {
   llvm::Value *TaskArgs[] = {
       UpLoc,
       ThreadID,
@@ -4690,6 +4717,7 @@ void CGOpenMPRuntime::emitTaskLoopCall(CodeGenFunction &CGF, SourceLocation Loc,
   CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
                           CGM.getModule(), OMPRTL___kmpc_taskloop),
                       TaskArgs);
+  }
 }
 
 /// Emit reduction operation for each element of array (required for
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index 5e7715743afb58..56d502d92806eb 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -122,6 +122,7 @@ struct OMPTaskDataTy final {
   bool IsReductionWithTaskMod = false;
   bool IsWorksharingReduction = false;
   bool HasNowaitClause = false;
+  bool HasModifier = false;
 };
 
 /// Class intended to support codegen of all kind of the reduction clauses.
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 390516fea38498..88c862d2975174 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -7831,10 +7831,12 @@ void CodeGenFunction::EmitOMPTaskLoopBasedDirective(const OMPLoopDirective &S) {
     // grainsize clause
     Data.Schedule.setInt(/*IntVal=*/false);
     Data.Schedule.setPointer(EmitScalarExpr(Clause->getGrainsize()));
+    Data.HasModifier = Clause->getModifier() == OMPC_GRAINSIZE_strict;
   } else if (const auto *Clause = S.getSingleClause<OMPNumTasksClause>()) {
     // num_tasks clause
     Data.Schedule.setInt(/*IntVal=*/true);
     Data.Schedule.setPointer(EmitScalarExpr(Clause->getNumTasks()));
+    Data.HasModifier = Clause->getModifier() == OMPC_NUMTASKS_strict;
   }
 
   auto &&BodyGen = [CS, &S](CodeGenFunction &CGF, PrePostActionTy &) {
diff --git a/clang/test/OpenMP/taskloop_strictmodifier_codegen.cpp b/clang/test/OpenMP/taskloop_strictmodifier_codegen.cpp
new file mode 100644
index 00000000000000..d84ff181f66156
--- /dev/null
+++ b/clang/test/OpenMP/taskloop_strictmodifier_codegen.cpp
@@ -0,0 +1,256 @@
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c++ -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-apple-darwin10 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp-simd -x c++ -emit-llvm %s -o - | FileCheck --check-prefix SIMD-ONLY0 %s
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -triple x86_64-apple-darwin10 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s
+// SIMD-ONLY0-NOT: {{__kmpc|__tgt}}
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+// CHECK-LABEL: @main
+int main(int argc, char **argv) {
+// CHECK: [[GTID:%.+]] = call i32 @__kmpc_global_thread_num(ptr [[DEFLOC:@.+]])
+// CHECK: call ptr @__kmpc_omp_task_alloc(ptr [[DEFLOC]], i32 [[GTID]],
+// CHECK: call i32 @__kmpc_omp_task(ptr [[DEFLOC]], i32 [[GTID]],
+#pragma omp task
+  ;
+// CHECK:       [[RES:%.+]] = call {{.*}}i32 @__kmpc_master(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  [[IS_MASTER:%.+]] = icmp ne i32 [[RES]], 0
+// CHECK-NEXT:  br i1 [[IS_MASTER]], label {{%?}}[[THEN:.+]], label {{%?}}[[EXIT:.+]]
+// CHECK:       [[THEN]]
+// CHECK: call void @__kmpc_taskgroup(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK: [[TASKV:%.+]] = call ptr @__kmpc_omp_task_alloc(ptr [[DEFLOC]], i32 [[GTID]], i32 33, i64 80, i64 1, ptr [[TASK1:@.+]])
+// CHECK: [[TASK_DATA:%.+]] = getelementptr inbounds nuw %{{.+}}, ptr [[TASKV]], i32 0, i32 0
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr [[TASK_DATA]], i32 0, i32 5
+// CHECK: store i64 0, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 6
+// CHECK: store i64 9, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 7
+// CHECK: store i64 1, ptr [[ST]],
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: call void @__kmpc_taskloop(ptr [[DEFLOC]], i32 [[GTID]], ptr [[TASKV]], i32 1, ptr [[DOWN]], ptr [[UP]], i64 [[ST_VAL]], i32 1, i32 0, i64 0, ptr null)
+// CHECK: call void @__kmpc_end_taskgroup(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  call {{.*}}void @__kmpc_end_master(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  br label {{%?}}[[EXIT]]
+// CHECK:       [[EXIT]]
+#pragma omp master taskloop priority(argc)
+  for (int i = 0; i < 10; ++i)
+    ;
+// CHECK:       [[RES:%.+]] = call {{.*}}i32 @__kmpc_master(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  [[IS_MASTER:%.+]] = icmp ne i32 [[RES]], 0
+// CHECK-NEXT:  br i1 [[IS_MASTER]], label {{%?}}[[THEN:.+]], label {{%?}}[[EXIT:.+]]
+// CHECK:       [[THEN]]
+// CHECK: [[TASKV:%.+]] = call ptr @__kmpc_omp_task_alloc(ptr [[DEFLOC]], i32 [[GTID]], i32 1, i64 80, i64 1, ptr [[TASK2:@.+]])
+// CHECK: [[TASK_DATA:%.+]] = getelementptr inbounds nuw %{{.+}}, ptr [[TASKV]], i32 0, i32 0
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr [[TASK_DATA]], i32 0, i32 5
+// CHECK: store i64 0, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 6
+// CHECK: store i64 9, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 7
+// CHECK: store i64 1, ptr [[ST]],
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[GRAINSIZE:%.+]] = zext i32 %{{.+}} to i64
+// CHECK: call void @__kmpc_taskloop_5(ptr [[DEFLOC]], i32 [[GTID]], ptr [[TASKV]], i32 1, ptr [[DOWN]], ptr [[UP]], i64 [[ST_VAL]], i32 1, i32 1, i64 [[GRAINSIZE]], i32 1, ptr null)
+// CHECK-NEXT:  call {{.*}}void @__kmpc_end_master(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  br label {{%?}}[[EXIT]]
+// CHECK:       [[EXIT]]
+#pragma omp master taskloop nogroup grainsize(strict:argc)
+  for (int i = 0; i < 10; ++i)
+    ;
+// CHECK:       [[RES:%.+]] = call {{.*}}i32 @__kmpc_master(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  [[IS_MASTER:%.+]] = icmp ne i32 [[RES]], 0
+// CHECK-NEXT:  br i1 [[IS_MASTER]], label {{%?}}[[THEN:.+]], label {{%?}}[[EXIT:.+]]
+// CHECK:       [[THEN]]
+// CHECK: call void @__kmpc_taskgroup(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK: [[TASKV:%.+]] = call ptr @__kmpc_omp_task_alloc(ptr [[DEFLOC]], i32 [[GTID]], i32 1, i64 80, i64 16, ptr [[TASK3:@.+]])
+// CHECK: [[TASK_DATA:%.+]] = getelementptr inbounds nuw %{{.+}}, ptr [[TASKV]], i32 0, i32 0
+// CHECK: [[IF:%.+]] = icmp ne i32 %{{.+}}, 0
+// CHECK: [[IF_INT:%.+]] = sext i1 [[IF]] to i32
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr [[TASK_DATA]], i32 0, i32 5
+// CHECK: store i64 0, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 6
+// CHECK: store i64 %{{.+}}, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 7
+// CHECK: store i64 1, ptr [[ST]],
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: call void @__kmpc_taskloop_5(ptr [[DEFLOC]], i32 [[GTID]], ptr [[TASKV]], i32 [[IF_INT]], ptr [[DOWN]], ptr [[UP]], i64 [[ST_VAL]], i32 1, i32 2, i64 4, i32 1, ptr null)
+// CHECK: call void @__kmpc_end_taskgroup(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  call {{.*}}void @__kmpc_end_master(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  br label {{%?}}[[EXIT]]
+// CHECK:       [[EXIT]]
+  int i;
+#pragma omp master taskloop if(argc) shared(argc, argv) collapse(2) num_tasks(strict: 4)
+  for (i = 0; i < argc; ++i)
+  for (int j = argc; j < argv[argc][argc]; ++j)
+    ;
+// CHECK:       [[RES:%.+]] = call {{.*}}i32 @__kmpc_master(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  [[IS_MASTER:%.+]] = icmp ne i32 [[RES]], 0
+// CHECK-NEXT:  br i1 [[IS_MASTER]], label {{%?}}[[THEN:.+]], label {{%?}}[[EXIT:.+]]
+// CHECK:       [[THEN]]
+// CHECK: call void @__kmpc_taskgroup(
+// CHECK: call ptr @__kmpc_omp_task_alloc(ptr @{{.+}}, i32 %{{.+}}, i32 1, i64 80, i64 1, ptr [[TASK_CANCEL:@.+]])
+// CHECK: call void @__kmpc_taskloop(
+// CHECK: call void @__kmpc_end_taskgroup(
+// CHECK-NEXT:  call {{.*}}void @__kmpc_end_master(ptr [[DEFLOC]], i32 [[GTID]])
+// CHECK-NEXT:  br label {{%?}}[[EXIT]]
+// CHECK:       [[EXIT]]
+#pragma omp master taskloop
+  for (int i = 0; i < 10; ++i) {
+#pragma omp cancel taskgroup
+#pragma omp cancellation point taskgroup
+  }
+}
+
+// CHECK: define internal noundef i32 [[TASK1]](
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr %{{.+}}, i32 0, i32 5
+// CHECK: [[DOWN_VAL:%.+]] = load i64, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 6
+// CHECK: [[UP_VAL:%.+]] = load i64, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 7
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[LITER:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 8
+// CHECK: [[LITER_VAL:%.+]] = load i32, ptr [[LITER]],
+// CHECK: store i64 [[DOWN_VAL]], ptr [[LB:%[^,]+]],
+// CHECK: store i64 [[UP_VAL]], ptr [[UB:%[^,]+]],
+// CHECK: store i64 [[ST_VAL]], ptr [[ST:%[^,]+]],
+// CHECK: store i32 [[LITER_VAL]], ptr [[LITER:%[^,]+]],
+// CHECK: [[LB_VAL:%.+]] = load i64, ptr [[LB]],
+// CHECK: [[LB_I32:%.+]] = trunc i64 [[LB_VAL]] to i32
+// CHECK: store i32 [[LB_I32]], ptr [[CNT:%.+]],
+// CHECK: br label
+// CHECK: [[VAL:%.+]] = load i32, ptr [[CNT]],
+// CHECK: [[VAL_I64:%.+]] = sext i32 [[VAL]] to i64
+// CHECK: [[UB_VAL:%.+]] = load i64, ptr [[UB]],
+// CHECK: [[CMP:%.+]] = icmp ule i64 [[VAL_I64]], [[UB_VAL]]
+// CHECK: br i1 [[CMP]], label %{{.+}}, label %{{.+}}
+// CHECK: load i32, ptr %
+// CHECK: store i32 %
+// CHECK: load i32, ptr %
+// CHECK: add nsw i32 %{{.+}}, 1
+// CHECK: store i32 %{{.+}}, ptr %
+// CHECK: br label %
+// CHECK: ret i32 0
+
+// CHECK: define internal noundef i32 [[TASK2]](
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr %{{.+}}, i32 0, i32 5
+// CHECK: [[DOWN_VAL:%.+]] = load i64, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 6
+// CHECK: [[UP_VAL:%.+]] = load i64, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 7
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[LITER:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 8
+// CHECK: [[LITER_VAL:%.+]] = load i32, ptr [[LITER]],
+// CHECK: store i64 [[DOWN_VAL]], ptr [[LB:%[^,]+]],
+// CHECK: store i64 [[UP_VAL]], ptr [[UB:%[^,]+]],
+// CHECK: store i64 [[ST_VAL]], ptr [[ST:%[^,]+]],
+// CHECK: store i32 [[LITER_VAL]], ptr [[LITER:%[^,]+]],
+// CHECK: [[LB_VAL:%.+]] = load i64, ptr [[LB]],
+// CHECK: [[LB_I32:%.+]] = trunc i64 [[LB_VAL]] to i32
+// CHECK: store i32 [[LB_I32]], ptr [[CNT:%.+]],
+// CHECK: br label
+// CHECK: [[VAL:%.+]] = load i32, ptr [[CNT]],
+// CHECK: [[VAL_I64:%.+]] = sext i32 [[VAL]] to i64
+// CHECK: [[UB_VAL:%.+]] = load i64, ptr [[UB]],
+// CHECK: [[CMP:%.+]] = icmp ule i64 [[VAL_I64]], [[UB_VAL]]
+// CHECK: br i1 [[CMP]], label %{{.+}}, label %{{.+}}
+// CHECK: load i32, ptr %
+// CHECK: store i32 %
+// CHECK: load i32, ptr %
+// CHECK: add nsw i32 %{{.+}}, 1
+// CHECK: store i32 %{{.+}}, ptr %
+// CHECK: br label %
+// CHECK: ret i32 0
+
+// CHECK: define internal noundef i32 [[TASK3]](
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr %{{.+}}, i32 0, i32 5
+// CHECK: [[DOWN_VAL:%.+]] = load i64, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 6
+// CHECK: [[UP_VAL:%.+]] = load i64, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 7
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[LITER:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 8
+// CHECK: [[LITER_VAL:%.+]] = load i32, ptr [[LITER]],
+// CHECK: store i64 [[DOWN_VAL]], ptr [[LB:%[^,]+]],
+// CHECK: store i64 [[UP_VAL]], ptr [[UB:%[^,]+]],
+// CHECK: store i64 [[ST_VAL]], ptr [[ST:%[^,]+]],
+// CHECK: store i32 [[LITER_VAL]], ptr [[LITER:%[^,]+]],
+// CHECK: [[LB_VAL:%.+]] = load i64, ptr [[LB]],
+// CHECK: store i64 [[LB_VAL]], ptr [[CNT:%.+]],
+// CHECK: br label
+// CHECK: ret i32 0
+
+// CHECK: define internal noundef i32 [[TASK_CANCEL]](
+// CHECK: [[RES:%.+]] = call i32 @__kmpc_cancel(ptr @{{.+}}, i32 %{{.+}}, i32 4)
+// CHECK: [[IS_CANCEL:%.+]] = icmp ne i32 [[RES]], 0
+// CHECK: br i1 [[IS_CANCEL]], label %[[EXIT:.+]], label %[[CONTINUE:[^,]+]]
+// CHECK: [[EXIT]]:
+// CHECK: store i32 1, ptr [[CLEANUP_SLOT:%.+]],
+// CHECK: br label %[[DONE:[^,]+]]
+// CHECK: [[CONTINUE]]:
+// CHECK: [[RES:%.+]] = call i32 @__kmpc_cancellationpoint(ptr @{{.+}}, i32 %{{.+}}, i32 4)
+// CHECK: [[IS_CANCEL:%.+]] = icmp ne i32 [[RES]], 0
+// CHECK: br i1 [[IS_CANCEL]], label %[[EXIT2:.+]], label %[[CONTINUE2:[^,]+]]
+// CHECK: [[EXIT2]]:
+// CHECK: store i32 1, ptr [[CLEANUP_SLOT]],
+// CHECK: br label %[[DONE]]
+// CHECK: store i32 0, ptr [[CLEANUP_SLOT]],
+// CHECK: br label %[[DONE]]
+// CHECK: [[DONE]]:
+// CHECK: ret i32 0
+
+// CHECK-LABEL: @_ZN1SC2Ei
+struct S {
+  int a;
+  S(int c) {
+// CHECK: [[GTID:%.+]] = call i32 @__kmpc_global_thread_num(ptr [[DEFLOC:@.+]])
+// CHECK: [[TASKV:%.+]] = call ptr @__kmpc_omp_task_alloc(ptr [[DEFLOC]], i32 [[GTID]], i32 1, i64 80, i64 16, ptr [[TASK4:@.+]])
+// CHECK: [[TASK_DATA:%.+]] = getelementptr inbounds nuw %{{.+}}, ptr [[TASKV]], i32 0, i32 0
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr [[TASK_DATA]], i32 0, i32 5
+// CHECK: store i64 0, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 6
+// CHECK: store i64 %{{.+}}, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 7
+// CHECK: store i64 1, ptr [[ST]],
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[NUM_TASKS:%.+]] = zext i32 %{{.+}} to i64
+// CHECK: call void @__kmpc_taskloop_5(ptr [[DEFLOC]], i32 [[GTID]], ptr [[TASKV]], i32 1, ptr [[DOWN]], ptr [[UP]], i64 [[ST_VAL]], i32 1, i32 2, i64 [[NUM_TASKS]], i32 1, ptr null)
+#pragma omp master taskloop shared(c) num_tasks(strict:a)
+    for (a = 0; a < c; ++a)
+      ;
+  }
+} s(1);
+
+// CHECK: define internal noundef i32 [[TASK4]](
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr %{{.+}}, i32 0, i32 5
+// CHECK: [[DOWN_VAL:%.+]] = load i64, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 6
+// CHECK: [[UP_VAL:%.+]] = load i64, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 7
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[LITER:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 8
+// CHECK: [[LITER_VAL:%.+]] = load i32, ptr [[LITER]],
+// CHECK: store i64 [[DOWN_VAL]], ptr [[LB:%[^,]+]],
+// CHECK: store i64 [[UP_VAL]], ptr [[UB:%[^,]+]],
+// CHECK: store i64 [[ST_VAL]], ptr [[ST:%[^,]+]],
+// CHECK: store i32 [[LITER_VAL]], ptr [[LITER:%[^,]+]],
+// CHECK: [[LB_VAL:%.+]] = load i64, ptr [[LB]],
+// CHECK: [[LB_I32:%.+]] = trunc i64 [[LB_VAL]] to i32
+// CHECK: store i32 [[LB_I32]], ptr [[CNT:%.+]],
+// CHECK: br label
+// CHECK: [[VAL:%.+]] = load i32, ptr [[CNT]],
+// CHECK: [[VAL_I64:%.+]] = sext i32 [[VAL]] to i64
+// CHECK: [[UB_VAL:%.+]] = load i64, ptr [[UB]],
+// CHECK: [[CMP:%.+]] = icmp ule i64 [[VAL_I64]], [[UB_VAL]]
+// CHECK: br i1 [[CMP]], label %{{.+}}, label %{{.+}}
+// CHECK: load i32, ptr %
+// CHECK: store i32 %
+// CHECK: load i32, ptr %
+// CHECK: add nsw i32 %{{.+}}, 1
+// CHECK: store i32 %{{.+}}, ptr %
+// CHECK: br label %
+// CHECK: ret i32 0
+
+#endif
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index 6f26f853eca032..928a03148c4165 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -365,6 +365,9 @@ __OMP_RTL(__kmpc_omp_task_with_deps, false, Int32, IdentPtr, Int32,
 __OMP_RTL(__kmpc_taskloop, false, Void, IdentPtr, /* Int */ Int32, VoidPtr,
           /* Int */ Int32, Int64Ptr, Int64Ptr, Int64, /* Int */ Int32,
           /* Int */ Int32, Int64, VoidPtr)
+__OMP_RTL(__kmpc_taskloop_5, false, Void, IdentPtr, /* Int */ Int32, VoidPtr,
+          /* Int */ Int32, Int64Ptr, Int64Ptr, Int64, /* Int */ Int32,
+          /* Int */ Int32, Int64, Int32, VoidPtr)
 __OMP_RTL(__kmpc_omp_target_task_alloc, false, /* kmp_task_t */ VoidPtr,
           IdentPtr, Int32, Int32, SizeTy, SizeTy, TaskRoutineEntryPtr, Int64)
 __OMP_RTL(__kmpc_taskred_modifier_init, false, /* kmp_taskgroup */ VoidPtr,
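
For readers mapping the OMPKinds.def entries above to a C-level signature, the sketch below spells out both entry points as function-pointer types. The parameter types follow the `.def` lines in the diff. The parameter names are assumptions inferred from the argument order in `emitTaskLoopCall`, `ident_t` is declared here only as an opaque placeholder, and `Arity` is a hypothetical helper used to check the parameter counts at compile time.

```cpp
#include <cstddef>
#include <cstdint>

struct ident_t; // opaque placeholder for the source-location descriptor

using i32 = std::int32_t;
using i64 = std::int64_t;

// Existing entry point: 11 parameters.
using TaskLoopFn = void (*)(ident_t *loc, i32 gtid, void *task, i32 if_val,
                            i64 *lb, i64 *ub, i64 st, i32 nogroup, i32 sched,
                            i64 grainsize, void *task_dup);

// New entry point: the same list with one extra i32 (named "modifier" here
// as an assumption) inserted before the task-duplication callback.
using TaskLoop5Fn = void (*)(ident_t *loc, i32 gtid, void *task, i32 if_val,
                             i64 *lb, i64 *ub, i64 st, i32 nogroup, i32 sched,
                             i64 grainsize, i32 modifier, void *task_dup);

// Hypothetical helper: counts the parameters of a function-pointer type.
template <typename T> struct Arity;
template <typename R, typename... Args> struct Arity<R (*)(Args...)> {
  static constexpr std::size_t value = sizeof...(Args);
};

static_assert(Arity<TaskLoopFn>::value == 11, "__kmpc_taskloop takes 11 args");
static_assert(Arity<TaskLoop5Fn>::value == 12,
              "__kmpc_taskloop_5 adds one i32 argument");
```

The counts agree with the CHECK lines in the new test, where the `__kmpc_taskloop_5` calls carry one more `i32 1` than the `__kmpc_taskloop` call.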

@llvmbot
Member

llvmbot commented Nov 21, 2024

@llvm/pr-subscribers-clang-codegen

Author: CHANDRA GHALE (chandraghale)

+// CHECK: [[LB_I32:%.+]] = trunc i64 [[LB_VAL]] to i32
+// CHECK: store i32 [[LB_I32]], ptr [[CNT:%.+]],
+// CHECK: br label
+// CHECK: [[VAL:%.+]] = load i32, ptr [[CNT]],
+// CHECK: [[VAL_I64:%.+]] = sext i32 [[VAL]] to i64
+// CHECK: [[UB_VAL:%.+]] = load i64, ptr [[UB]],
+// CHECK: [[CMP:%.+]] = icmp ule i64 [[VAL_I64]], [[UB_VAL]]
+// CHECK: br i1 [[CMP]], label %{{.+}}, label %{{.+}}
+// CHECK: load i32, ptr %
+// CHECK: store i32 %
+// CHECK: load i32, ptr %
+// CHECK: add nsw i32 %{{.+}}, 1
+// CHECK: store i32 %{{.+}}, ptr %
+// CHECK: br label %
+// CHECK: ret i32 0
+
+// CHECK: define internal noundef i32 [[TASK2]](
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr %{{.+}}, i32 0, i32 5
+// CHECK: [[DOWN_VAL:%.+]] = load i64, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 6
+// CHECK: [[UP_VAL:%.+]] = load i64, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 7
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[LITER:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 8
+// CHECK: [[LITER_VAL:%.+]] = load i32, ptr [[LITER]],
+// CHECK: store i64 [[DOWN_VAL]], ptr [[LB:%[^,]+]],
+// CHECK: store i64 [[UP_VAL]], ptr [[UB:%[^,]+]],
+// CHECK: store i64 [[ST_VAL]], ptr [[ST:%[^,]+]],
+// CHECK: store i32 [[LITER_VAL]], ptr [[LITER:%[^,]+]],
+// CHECK: [[LB_VAL:%.+]] = load i64, ptr [[LB]],
+// CHECK: [[LB_I32:%.+]] = trunc i64 [[LB_VAL]] to i32
+// CHECK: store i32 [[LB_I32]], ptr [[CNT:%.+]],
+// CHECK: br label
+// CHECK: [[VAL:%.+]] = load i32, ptr [[CNT]],
+// CHECK: [[VAL_I64:%.+]] = sext i32 [[VAL]] to i64
+// CHECK: [[UB_VAL:%.+]] = load i64, ptr [[UB]],
+// CHECK: [[CMP:%.+]] = icmp ule i64 [[VAL_I64]], [[UB_VAL]]
+// CHECK: br i1 [[CMP]], label %{{.+}}, label %{{.+}}
+// CHECK: load i32, ptr %
+// CHECK: store i32 %
+// CHECK: load i32, ptr %
+// CHECK: add nsw i32 %{{.+}}, 1
+// CHECK: store i32 %{{.+}}, ptr %
+// CHECK: br label %
+// CHECK: ret i32 0
+
+// CHECK: define internal noundef i32 [[TASK3]](
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr %{{.+}}, i32 0, i32 5
+// CHECK: [[DOWN_VAL:%.+]] = load i64, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 6
+// CHECK: [[UP_VAL:%.+]] = load i64, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 7
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[LITER:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 8
+// CHECK: [[LITER_VAL:%.+]] = load i32, ptr [[LITER]],
+// CHECK: store i64 [[DOWN_VAL]], ptr [[LB:%[^,]+]],
+// CHECK: store i64 [[UP_VAL]], ptr [[UB:%[^,]+]],
+// CHECK: store i64 [[ST_VAL]], ptr [[ST:%[^,]+]],
+// CHECK: store i32 [[LITER_VAL]], ptr [[LITER:%[^,]+]],
+// CHECK: [[LB_VAL:%.+]] = load i64, ptr [[LB]],
+// CHECK: store i64 [[LB_VAL]], ptr [[CNT:%.+]],
+// CHECK: br label
+// CHECK: ret i32 0
+
+// CHECK: define internal noundef i32 [[TASK_CANCEL]](
+// CHECK: [[RES:%.+]] = call i32 @__kmpc_cancel(ptr @{{.+}}, i32 %{{.+}}, i32 4)
+// CHECK: [[IS_CANCEL:%.+]] = icmp ne i32 [[RES]], 0
+// CHECK: br i1 [[IS_CANCEL]], label %[[EXIT:.+]], label %[[CONTINUE:[^,]+]]
+// CHECK: [[EXIT]]:
+// CHECK: store i32 1, ptr [[CLEANUP_SLOT:%.+]],
+// CHECK: br label %[[DONE:[^,]+]]
+// CHECK: [[CONTINUE]]:
+// CHECK: [[RES:%.+]] = call i32 @__kmpc_cancellationpoint(ptr @{{.+}}, i32 %{{.+}}, i32 4)
+// CHECK: [[IS_CANCEL:%.+]] = icmp ne i32 [[RES]], 0
+// CHECK: br i1 [[IS_CANCEL]], label %[[EXIT2:.+]], label %[[CONTINUE2:[^,]+]]
+// CHECK: [[EXIT2]]:
+// CHECK: store i32 1, ptr [[CLEANUP_SLOT]],
+// CHECK: br label %[[DONE]]
+// CHECK: store i32 0, ptr [[CLEANUP_SLOT]],
+// CHECK: br label %[[DONE]]
+// CHECK: [[DONE]]:
+// CHECK: ret i32 0
+
+// CHECK-LABEL: @_ZN1SC2Ei
+struct S {
+  int a;
+  S(int c) {
+// CHECK: [[GTID:%.+]] = call i32 @__kmpc_global_thread_num(ptr [[DEFLOC:@.+]])
+// CHECK: [[TASKV:%.+]] = call ptr @__kmpc_omp_task_alloc(ptr [[DEFLOC]], i32 [[GTID]], i32 1, i64 80, i64 16, ptr [[TASK4:@.+]])
+// CHECK: [[TASK_DATA:%.+]] = getelementptr inbounds nuw %{{.+}}, ptr [[TASKV]], i32 0, i32 0
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr [[TASK_DATA]], i32 0, i32 5
+// CHECK: store i64 0, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 6
+// CHECK: store i64 %{{.+}}, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr [[TASK_DATA]], i32 0, i32 7
+// CHECK: store i64 1, ptr [[ST]],
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[NUM_TASKS:%.+]] = zext i32 %{{.+}} to i64
+// CHECK: call void @__kmpc_taskloop_5(ptr [[DEFLOC]], i32 [[GTID]], ptr [[TASKV]], i32 1, ptr [[DOWN]], ptr [[UP]], i64 [[ST_VAL]], i32 1, i32 2, i64 [[NUM_TASKS]], i32 1, ptr null)
+#pragma omp master taskloop shared(c) num_tasks(strict:a)
+    for (a = 0; a < c; ++a)
+      ;
+  }
+} s(1);
+
+// CHECK: define internal noundef i32 [[TASK4]](
+// CHECK: [[DOWN:%.+]] = getelementptr inbounds nuw [[TD_TY:%.+]], ptr %{{.+}}, i32 0, i32 5
+// CHECK: [[DOWN_VAL:%.+]] = load i64, ptr [[DOWN]],
+// CHECK: [[UP:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 6
+// CHECK: [[UP_VAL:%.+]] = load i64, ptr [[UP]],
+// CHECK: [[ST:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 7
+// CHECK: [[ST_VAL:%.+]] = load i64, ptr [[ST]],
+// CHECK: [[LITER:%.+]] = getelementptr inbounds nuw [[TD_TY]], ptr %{{.+}}, i32 0, i32 8
+// CHECK: [[LITER_VAL:%.+]] = load i32, ptr [[LITER]],
+// CHECK: store i64 [[DOWN_VAL]], ptr [[LB:%[^,]+]],
+// CHECK: store i64 [[UP_VAL]], ptr [[UB:%[^,]+]],
+// CHECK: store i64 [[ST_VAL]], ptr [[ST:%[^,]+]],
+// CHECK: store i32 [[LITER_VAL]], ptr [[LITER:%[^,]+]],
+// CHECK: [[LB_VAL:%.+]] = load i64, ptr [[LB]],
+// CHECK: [[LB_I32:%.+]] = trunc i64 [[LB_VAL]] to i32
+// CHECK: store i32 [[LB_I32]], ptr [[CNT:%.+]],
+// CHECK: br label
+// CHECK: [[VAL:%.+]] = load i32, ptr [[CNT]],
+// CHECK: [[VAL_I64:%.+]] = sext i32 [[VAL]] to i64
+// CHECK: [[UB_VAL:%.+]] = load i64, ptr [[UB]],
+// CHECK: [[CMP:%.+]] = icmp ule i64 [[VAL_I64]], [[UB_VAL]]
+// CHECK: br i1 [[CMP]], label %{{.+}}, label %{{.+}}
+// CHECK: load i32, ptr %
+// CHECK: store i32 %
+// CHECK: load i32, ptr %
+// CHECK: add nsw i32 %{{.+}}, 1
+// CHECK: store i32 %{{.+}}, ptr %
+// CHECK: br label %
+// CHECK: ret i32 0
+
+#endif
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index 6f26f853eca032..928a03148c4165 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -365,6 +365,9 @@ __OMP_RTL(__kmpc_omp_task_with_deps, false, Int32, IdentPtr, Int32,
 __OMP_RTL(__kmpc_taskloop, false, Void, IdentPtr, /* Int */ Int32, VoidPtr,
           /* Int */ Int32, Int64Ptr, Int64Ptr, Int64, /* Int */ Int32,
           /* Int */ Int32, Int64, VoidPtr)
+__OMP_RTL(__kmpc_taskloop_5, false, Void, IdentPtr, /* Int */ Int32, VoidPtr,
+          /* Int */ Int32, Int64Ptr, Int64Ptr, Int64, /* Int */ Int32,
+          /* Int */ Int32, Int64, Int32, VoidPtr)
 __OMP_RTL(__kmpc_omp_target_task_alloc, false, /* kmp_task_t */ VoidPtr,
           IdentPtr, Int32, Int32, SizeTy, SizeTy, TaskRoutineEntryPtr, Int64)
 __OMP_RTL(__kmpc_taskred_modifier_init, false, /* kmp_taskgroup */ VoidPtr,

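To make the `OMPKinds.def` change above concrete, here is a minimal, self-contained sketch of how the two entry points differ. It does not use any LLVM API: `ScheduleInfo`, `schedKind`, `paramTypes`, and `StrictFlagIndex` are hypothetical names introduced only for illustration. The type names are copied from the `__OMP_RTL` entries, and the `NoSchedule`/`Grainsize`/`NumTasks` values come from the enum in `emitTaskLoopCall`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the clause information available to codegen.
struct ScheduleInfo {
  bool isNumTasks = false;            // false: grainsize, true: num_tasks
  std::optional<std::uint64_t> value; // empty when neither clause is present
  bool strict = false;                // set by the 'strict' modifier
};

// Schedule kind values, as in the enum shown in emitTaskLoopCall.
enum SchedKind { NoSchedule = 0, Grainsize = 1, NumTasks = 2 };

SchedKind schedKind(const ScheduleInfo &s) {
  if (!s.value)
    return NoSchedule;
  return s.isNumTasks ? NumTasks : Grainsize;
}

// 0-based position of the extra Int32 in the 12-parameter form.
constexpr std::size_t StrictFlagIndex = 10;

// Parameter types of __kmpc_taskloop (strict == false) or
// __kmpc_taskloop_5 (strict == true), following the __OMP_RTL entries.
std::vector<std::string> paramTypes(bool strict) {
  std::vector<std::string> t = {"IdentPtr", "Int32",    "VoidPtr", "Int32",
                                "Int64Ptr", "Int64Ptr", "Int64",   "Int32",
                                "Int32",    "Int64"};
  if (strict) {
    t.push_back("Int32"); // the strict-modifier flag
    assert(t.size() - 1 == StrictFlagIndex);
  }
  t.push_back("VoidPtr"); // task_dup function pointer stays last
  return t;
}
```

Under these assumptions, the only difference between the two signatures is the single `Int32` inserted just before the trailing `VoidPtr`.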

github-actions bot commented Nov 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Result.TaskDupFn, CGF.VoidPtrTy)
: llvm::ConstantPointerNull::get(CGF.VoidPtrTy));
if (Data.HasModifier)
CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
Contributor

Can you just use one EmitRuntimeCall here and check Data.HasModifier inside?

Contributor Author

Fixed as suggested !!
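
For illustration, the single-call shape the reviewer asks for might look like the sketch below. It is not the merged code and uses no LLVM API: `RuntimeCall` and `sketchTaskLoopCall` are hypothetical names, and the arguments are plain strings standing in for `llvm::Value *`. The idea is to build one argument list, append the strict flag only when `HasModifier` is set, and pick the callee inside a single emission.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one emitted runtime call.
struct RuntimeCall {
  std::string callee;
  std::vector<std::string> args;
};

// Builds one argument list and selects the entry point in a single place,
// instead of duplicating the whole list in an if/else.
RuntimeCall sketchTaskLoopCall(bool hasModifier, int schedKind,
                               std::uint64_t schedValue) {
  std::vector<std::string> args = {
      "loc", "gtid", "task", "if_val", "lb", "ub", "st",
      "i32 1",                             // always 1: taskgroup is emitted by the compiler
      "i32 " + std::to_string(schedKind),  // 0: none, 1: grainsize, 2: num_tasks
      "i64 " + std::to_string(schedValue), // grainsize or num_tasks value
  };
  if (hasModifier)
    args.push_back("i32 1"); // strict modifier enabled
  args.push_back("ptr null"); // task_dup function, null in this sketch

  const char *callee = hasModifier ? "__kmpc_taskloop_5" : "__kmpc_taskloop";
  assert(args.size() == (hasModifier ? 12u : 11u));
  return {callee, args};
}
```

With `num_tasks(strict: 4)` this yields the tail `i32 1, i32 2, i64 4, i32 1, ptr null`, matching the corresponding CHECK line in the new test; without a modifier the list has 11 entries and the callee is `__kmpc_taskloop`.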

chandraghale merged commit 76e6c8d into llvm:main on Nov 28, 2024
8 checks passed
Collaborator

llvm-ci commented Nov 28, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building clang,llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/9286

Here is the relevant piece of the build log for reference:
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/thread_state_1.c' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/thread_state_1.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/thread_state_1.c.tmp | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/thread_state_1.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/thread_state_1.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c
# RUN: at line 2
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/thread_state_1.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/thread_state_1.c.tmp | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/thread_state_1.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/thread_state_1.c.tmp
# .---command stderr------------
# | AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
# | AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
# | "PluginInterface" error: Failure to allocate device memory for global memory pool: Failed to allocate from memory manager
# | Display only launched kernel:
# | Kernel 'omp target in main @ 11 (__omp_offloading_802_d82835f_main_l11)'
# | OFFLOAD ERROR: Memory access fault by GPU 1 (agent 0x561e87e1f7d0) at virtual address (nil). Reasons: Page not present or supervisor privilege, Write access to a read-only page
# | Use 'OFFLOAD_TRACK_ALLOCATION_TRACES=true' to track device allocations
# `-----------------------------
# error: command failed with exit status: -6
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line:  /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/thread_state_1.c
# `-----------------------------
# error: command failed with exit status: 2

--

********************


@llvm-ci
Collaborator

llvm-ci commented Nov 28, 2024

LLVM Buildbot has detected a new failure on builder libc-x86_64-debian-fullbuild-dbg-asan running on libc-x86_64-debian-fullbuild while building clang,llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/171/builds/11252

Here is the relevant piece of the build log for reference:
Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcHashTest.SanityCheck
[       OK ] LlvmLibcHashTest.SanityCheck (16 ms)
[ RUN      ] LlvmLibcHashTest.Avalanche
[       OK ] LlvmLibcHashTest.Avalanche (2144 ms)
[ RUN      ] LlvmLibcHashTest.UniformLSB
[       OK ] LlvmLibcHashTest.UniformLSB (202 ms)
[ RUN      ] LlvmLibcHashTest.UniformMSB
[       OK ] LlvmLibcHashTest.UniformMSB (135 us)
Ran 4 tests.  PASS: 4  FAIL: 0
command timed out: 1200 seconds without output running [b'python', b'../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py', b'--debug', b'--asan'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1259.659335
Step 8 (libc-unit-tests) failure: libc-unit-tests (failure)
...
[ RUN      ] LlvmLibcStrtoint32Test.InvalidBase
[       OK ] LlvmLibcStrtoint32Test.InvalidBase (51 us)
[ RUN      ] LlvmLibcStrtoint32Test.CleanBaseTenDecode
[       OK ] LlvmLibcStrtoint32Test.CleanBaseTenDecode (122 us)
[ RUN      ] LlvmLibcStrtoint32Test.MessyBaseTenDecode
[       OK ] LlvmLibcStrtoint32Test.MessyBaseTenDecode (65 us)
[ RUN      ] LlvmLibcStrtoint32Test.DecodeInOtherBases
[       OK ] LlvmLibcStrtoint32Test.DecodeInOtherBases (462 ms)
[ RUN      ] LlvmLibcStrtoint32Test.CleanBaseSixteenDecode
[       OK ] LlvmLibcStrtoint32Test.CleanBaseSixteenDecode (111 us)
[ RUN      ] LlvmLibcStrtoint32Test.MessyBaseSixteenDecode
[       OK ] LlvmLibcStrtoint32Test.MessyBaseSixteenDecode (75 us)
[ RUN      ] LlvmLibcStrtoint32Test.AutomaticBaseSelection
[       OK ] LlvmLibcStrtoint32Test.AutomaticBaseSelection (28 us)
[ RUN      ] LlvmLibcStrtouint32Test.InvalidBase
[       OK ] LlvmLibcStrtouint32Test.InvalidBase (43 us)
[ RUN      ] LlvmLibcStrtouint32Test.CleanBaseTenDecode
[       OK ] LlvmLibcStrtouint32Test.CleanBaseTenDecode (52 us)
[ RUN      ] LlvmLibcStrtouint32Test.MessyBaseTenDecode
[       OK ] LlvmLibcStrtouint32Test.MessyBaseTenDecode (74 us)
[ RUN      ] LlvmLibcStrtouint32Test.DecodeInOtherBases
[       OK ] LlvmLibcStrtouint32Test.DecodeInOtherBases (227 ms)
[ RUN      ] LlvmLibcStrtouint32Test.CleanBaseSixteenDecode
[       OK ] LlvmLibcStrtouint32Test.CleanBaseSixteenDecode (71 us)
[ RUN      ] LlvmLibcStrtouint32Test.MessyBaseSixteenDecode
[       OK ] LlvmLibcStrtouint32Test.MessyBaseSixteenDecode (43 us)
[ RUN      ] LlvmLibcStrtouint32Test.AutomaticBaseSelection
[       OK ] LlvmLibcStrtouint32Test.AutomaticBaseSelection (9 us)
Ran 14 tests.  PASS: 14  FAIL: 0
[1096/1098] Running unit test libc.test.src.time.nanosleep_test.__unit__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcNanosleep.SmokeTest
[       OK ] LlvmLibcNanosleep.SmokeTest (132 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[1097/1098] Running unit test libc.test.src.__support.hash_test.__unit__
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcHashTest.SanityCheck
[       OK ] LlvmLibcHashTest.SanityCheck (16 ms)
[ RUN      ] LlvmLibcHashTest.Avalanche
[       OK ] LlvmLibcHashTest.Avalanche (2144 ms)
[ RUN      ] LlvmLibcHashTest.UniformLSB
[       OK ] LlvmLibcHashTest.UniformLSB (202 ms)
[ RUN      ] LlvmLibcHashTest.UniformMSB
[       OK ] LlvmLibcHashTest.UniformMSB (135 us)
Ran 4 tests.  PASS: 4  FAIL: 0

command timed out: 1200 seconds without output running [b'python', b'../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py', b'--debug', b'--asan'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1259.659335

Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category flang:openmp

4 participants