[OpenMP][CodeGen] Improved codegen for combined loop directives #72417

Closed

Conversation

ddpagan (Contributor) commented Nov 15, 2023

IR for 'target teams loop' now depends on the suitability of the associated loop-nest.

If the loop-nest:

  • contains no function calls, or every such call is to an OpenMP API routine, or -fopenmp-assume-no-nested-parallelism has been specified, AND
  • does not contain nested 'loop' directives with bind(parallel)

then it can be emitted as 'target teams distribute parallel for', which is the current default. Otherwise, it is emitted as 'target teams distribute'.
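For illustration, a minimal sketch of the three cases (these examples are not from the patch; helper() is a hypothetical user function):

void saxpy(float *x, float *y, float a, int n) {
  // Call-free loop-nest: still emitted as the default
  // 'target teams distribute parallel for'.
  #pragma omp target teams loop
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}

float helper(float v);  // hypothetical user function

void scale(float *y, int n) {
  // An opaque call could hide nested parallelism, so this loop-nest is
  // emitted as 'target teams distribute' instead, unless
  // -fopenmp-assume-no-nested-parallelism is given or the callee is an
  // omp_* API routine.
  #pragma omp target teams loop
  for (int i = 0; i < n; ++i)
    y[i] = helper(y[i]);
}

void nested(float *y, int n) {
  // A nested 'loop bind(parallel)' also forces the 'distribute' form.
  #pragma omp target teams loop
  for (int i = 0; i < n; ++i) {
    #pragma omp loop bind(parallel)
    for (int j = 0; j < n; ++j)
      y[i * n + j] *= 2.0f;  // y assumed to hold n*n elements
  }
}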

Added debug output indicating how 'target teams loop' was emitted; enable it with -mllvm -debug-only=target-teams-loop-codegen.

Added LIT tests that explicitly verify 'target teams loop' emitted as both a parallel loop and a distribute loop.
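For reference, the general shape of such a test (illustrative only; the RUN and CHECK lines below are not copied from the added files):

// RUN: %clang_cc1 -fopenmp -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s

extern "C" float helper(float v);  // opaque call forces the distribute form

void foo(float *y, int n) {
  #pragma omp target teams loop
  for (int i = 0; i < n; ++i)
    y[i] = helper(y[i]);
}
// The distribute form should not fork a parallel region inside the
// teams region on the host:
// CHECK-NOT: @__kmpc_fork_call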

Updated other 'loop'-related tests as needed to reflect the change in IR.

  • These updates account for most of the changed files and additions/deletions.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang labels Nov 15, 2023
llvmbot (Member) commented Nov 15, 2023

@llvm/pr-subscribers-clang-modules
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: David Pagan (ddpagan)

Changes

Patch is 1.06 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72417.diff

21 Files Affected:

  • (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+6-3)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+7-3)
  • (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+51-14)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+93)
  • (modified) clang/lib/CodeGen/CodeGenModule.h (+8)
  • (modified) clang/lib/Sema/SemaOpenMP.cpp (+8-2)
  • (modified) clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp (+72-42)
  • (modified) clang/test/OpenMP/nvptx_target_teams_generic_loop_generic_mode_codegen.cpp (+82-315)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_codegen-1.cpp (+25-25)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_codegen.cpp (+17-21)
  • (added) clang/test/OpenMP/target_teams_generic_loop_codegen_as_distribute.cpp (+1685)
  • (added) clang/test/OpenMP/target_teams_generic_loop_codegen_as_parallel_for.cpp (+3998)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_collapse_codegen.cpp (+16-16)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_if_codegen.cpp (+118-600)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_private_codegen.cpp (+242-1200)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_uses_allocators_codegen.cpp (+3-3)
  • (modified) clang/test/OpenMP/teams_generic_loop_codegen-1.cpp (+146-1210)
  • (modified) clang/test/OpenMP/teams_generic_loop_codegen.cpp (+156-510)
  • (modified) clang/test/OpenMP/teams_generic_loop_collapse_codegen.cpp (+142-678)
  • (modified) clang/test/OpenMP/teams_generic_loop_private_codegen.cpp (+101-580)
  • (modified) clang/test/OpenMP/teams_generic_loop_reduction_codegen.cpp (+122-689)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index d2be8141a3a4b31..97f963df7ad67b4 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -2644,7 +2644,8 @@ void CGOpenMPRuntime::emitForStaticFinish(CodeGenFunction &CGF,
   // Call __kmpc_for_static_fini(ident_t *loc, kmp_int32 tid);
   llvm::Value *Args[] = {
       emitUpdateLocation(CGF, Loc,
-                         isOpenMPDistributeDirective(DKind)
+                         isOpenMPDistributeDirective(DKind) ||
+                         (DKind == OMPD_target_teams_loop)
                              ? OMP_IDENT_WORK_DISTRIBUTE
                              : isOpenMPLoopDirective(DKind)
                                    ? OMP_IDENT_WORK_LOOP
@@ -8779,7 +8780,8 @@ getNestedDistributeDirective(ASTContext &Ctx, const OMPExecutableDirective &D) {
     OpenMPDirectiveKind DKind = NestedDir->getDirectiveKind();
     switch (D.getDirectiveKind()) {
     case OMPD_target:
-      // For now, just treat 'target teams loop' as if it's distributed.
+      // For now, treat 'target' with nested 'teams loop' as if it's
+      // distributed (target teams distribute).
       if (isOpenMPDistributeDirective(DKind) || DKind == OMPD_teams_loop)
         return NestedDir;
       if (DKind == OMPD_teams) {
@@ -9263,7 +9265,8 @@ llvm::Value *CGOpenMPRuntime::emitTargetNumIterationsCall(
         SizeEmitter) {
   OpenMPDirectiveKind Kind = D.getDirectiveKind();
   const OMPExecutableDirective *TD = &D;
-  // Get nested teams distribute kind directive, if any.
+  // Get nested teams distribute kind directive, if any. For now, treat
+  // 'target_teams_loop' as if it's really a target_teams_distribute.
   if ((!isOpenMPDistributeDirective(Kind) || !isOpenMPTeamsDirective(Kind)) &&
       Kind != OMPD_target_teams_loop)
     TD = getNestedDistributeDirective(CGM.getContext(), D);
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index abecf5250f4cf96..2e1cf1ed3abf40c 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -639,14 +639,14 @@ static bool hasNestedSPMDDirective(ASTContext &Ctx,
   return false;
 }
 
-static bool supportsSPMDExecutionMode(ASTContext &Ctx,
+static bool supportsSPMDExecutionMode(CodeGenModule &CGM,
                                       const OMPExecutableDirective &D) {
+  ASTContext &Ctx = CGM.getContext();
   OpenMPDirectiveKind DirectiveKind = D.getDirectiveKind();
   switch (DirectiveKind) {
   case OMPD_target:
   case OMPD_target_teams:
     return hasNestedSPMDDirective(Ctx, D);
-  case OMPD_target_teams_loop:
   case OMPD_target_parallel_loop:
   case OMPD_target_parallel:
   case OMPD_target_parallel_for:
@@ -658,6 +658,10 @@ static bool supportsSPMDExecutionMode(ASTContext &Ctx,
     return true;
   case OMPD_target_teams_distribute:
     return false;
+  case OMPD_target_teams_loop:
+    // Whether this is true or not depends on how the directive will
+    // eventually be emitted.
+    return CGM.teamsLoopCanBeParallelFor(D);
   case OMPD_parallel:
   case OMPD_for:
   case OMPD_parallel_for:
@@ -870,7 +874,7 @@ void CGOpenMPRuntimeGPU::emitTargetOutlinedFunction(
 
   assert(!ParentName.empty() && "Invalid target region parent name!");
 
-  bool Mode = supportsSPMDExecutionMode(CGM.getContext(), D);
+  bool Mode = supportsSPMDExecutionMode(CGM, D);
   bool IsBareKernel = D.getSingleClause<OMPXBareClause>();
   if (Mode || IsBareKernel)
     emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 62b056c5d08a18c..16f84cab181b427 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -34,11 +34,14 @@
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/Support/AtomicOrdering.h"
+#include "llvm/Support/Debug.h"
 #include <optional>
 using namespace clang;
 using namespace CodeGen;
 using namespace llvm::omp;
 
+#define TTL_CODEGEN_TYPE "target-teams-loop-codegen"
+
 static const VarDecl *getBaseDecl(const Expr *Ref);
 
 namespace {
@@ -1435,6 +1438,7 @@ void CodeGenFunction::EmitOMPReductionClauseFinal(
     }
     bool WithNowait = D.getSingleClause<OMPNowaitClause>() ||
                       isOpenMPParallelDirective(D.getDirectiveKind()) ||
+                      CGM.teamsLoopCanBeParallelFor(D) ||
                       ReductionKind == OMPD_simd;
     bool SimpleReduction = ReductionKind == OMPD_simd;
     // Emit nowait reduction if nowait clause is present or directive is a
@@ -7876,11 +7880,9 @@ void CodeGenFunction::EmitOMPParallelGenericLoopDirective(
 void CodeGenFunction::EmitOMPTeamsGenericLoopDirective(
     const OMPTeamsGenericLoopDirective &S) {
   // To be consistent with current behavior of 'target teams loop', emit
-  // 'teams loop' as if its constituent constructs are 'distribute,
-  // 'parallel, and 'for'.
+  // 'teams loop' as if its constituent constructs are 'teams' and 'distribute'.
   auto &&CodeGenDistribute = [&S](CodeGenFunction &CGF, PrePostActionTy &) {
-    CGF.EmitOMPDistributeLoop(S, emitInnerParallelForWhenCombined,
-                              S.getDistInc());
+    CGF.EmitOMPDistributeLoop(S, emitOMPLoopBodyWithStopPoint, S.getInc());
   };
 
   // Emit teams region as a standalone region.
@@ -7894,15 +7896,14 @@ void CodeGenFunction::EmitOMPTeamsGenericLoopDirective(
                                                     CodeGenDistribute);
     CGF.EmitOMPReductionClauseFinal(S, /*ReductionKind=*/OMPD_teams);
   };
-  emitCommonOMPTeamsDirective(*this, S, OMPD_distribute_parallel_for, CodeGen);
+  emitCommonOMPTeamsDirective(*this, S, OMPD_distribute, CodeGen);
   emitPostUpdateForReductionClause(*this, S,
                                    [](CodeGenFunction &) { return nullptr; });
 }
 
-static void
-emitTargetTeamsGenericLoopRegion(CodeGenFunction &CGF,
-                                 const OMPTargetTeamsGenericLoopDirective &S,
-                                 PrePostActionTy &Action) {
+static void emitTargetTeamsGenericLoopRegionAsParallel(
+    CodeGenFunction &CGF, PrePostActionTy &Action,
+    const OMPTargetTeamsGenericLoopDirective &S) {
   Action.Enter(CGF);
   // Emit 'teams loop' as if its constituent constructs are 'distribute,
   // 'parallel, and 'for'.
@@ -7922,19 +7923,52 @@ emitTargetTeamsGenericLoopRegion(CodeGenFunction &CGF,
         CGF, OMPD_distribute, CodeGenDistribute, /*HasCancel=*/false);
     CGF.EmitOMPReductionClauseFinal(S, /*ReductionKind=*/OMPD_teams);
   };
-
+  DEBUG_WITH_TYPE(TTL_CODEGEN_TYPE,
+      CGF.CGM.emitTargetTeamsLoopCodegenStatus(
+          TTL_CODEGEN_TYPE " as parallel for", S,
+          CGF.CGM.getLangOpts().OpenMPIsTargetDevice));
   emitCommonOMPTeamsDirective(CGF, S, OMPD_distribute_parallel_for,
                               CodeGenTeams);
   emitPostUpdateForReductionClause(CGF, S,
                                    [](CodeGenFunction &) { return nullptr; });
 }
 
-/// Emit combined directive 'target teams loop' as if its constituent
-/// constructs are 'target', 'teams', 'distribute', 'parallel', and 'for'.
+static void emitTargetTeamsGenericLoopRegionAsDistribute(
+    CodeGenFunction &CGF, PrePostActionTy &Action,
+    const OMPTargetTeamsGenericLoopDirective &S) {
+  Action.Enter(CGF);
+  // Emit 'teams loop' as if its constituent construct is 'distribute'.
+  auto &&CodeGenDistribute = [&S](CodeGenFunction &CGF, PrePostActionTy &) {
+    CGF.EmitOMPDistributeLoop(S, emitOMPLoopBodyWithStopPoint, S.getInc());
+  };
+
+  // Emit teams region as a standalone region.
+  auto &&CodeGen = [&S, &CodeGenDistribute](CodeGenFunction &CGF,
+                                            PrePostActionTy &Action) {
+    Action.Enter(CGF);
+    CodeGenFunction::OMPPrivateScope PrivateScope(CGF);
+    CGF.EmitOMPReductionClauseInit(S, PrivateScope);
+    (void)PrivateScope.Privatize();
+    CGF.CGM.getOpenMPRuntime().emitInlinedDirective(
+        CGF, OMPD_distribute, CodeGenDistribute, /*HasCancel=*/false);
+    CGF.EmitOMPReductionClauseFinal(S, /*ReductionKind=*/OMPD_teams);
+  };
+  DEBUG_WITH_TYPE(TTL_CODEGEN_TYPE,
+      CGF.CGM.emitTargetTeamsLoopCodegenStatus(
+          TTL_CODEGEN_TYPE " as distribute", S,
+          CGF.CGM.getLangOpts().OpenMPIsTargetDevice));
+  emitCommonOMPTeamsDirective(CGF, S, OMPD_distribute, CodeGen);
+  emitPostUpdateForReductionClause(CGF, S,
+                                   [](CodeGenFunction &) { return nullptr; });
+}
+
 void CodeGenFunction::EmitOMPTargetTeamsGenericLoopDirective(
     const OMPTargetTeamsGenericLoopDirective &S) {
   auto &&CodeGen = [&S](CodeGenFunction &CGF, PrePostActionTy &Action) {
-    emitTargetTeamsGenericLoopRegion(CGF, S, Action);
+    if (CGF.CGM.teamsLoopCanBeParallelFor(S))
+      emitTargetTeamsGenericLoopRegionAsParallel(CGF, Action, S);
+    else
+      emitTargetTeamsGenericLoopRegionAsDistribute(CGF, Action, S);
   };
   emitCommonOMPTargetDirective(*this, S, CodeGen);
 }
@@ -7944,7 +7978,10 @@ void CodeGenFunction::EmitOMPTargetTeamsGenericLoopDeviceFunction(
     const OMPTargetTeamsGenericLoopDirective &S) {
   // Emit SPMD target parallel loop region as a standalone region.
   auto &&CodeGen = [&S](CodeGenFunction &CGF, PrePostActionTy &Action) {
-    emitTargetTeamsGenericLoopRegion(CGF, S, Action);
+    if (CGF.CGM.teamsLoopCanBeParallelFor(S))
+      emitTargetTeamsGenericLoopRegionAsParallel(CGF, Action, S);
+    else
+      emitTargetTeamsGenericLoopRegionAsDistribute(CGF, Action, S);
   };
   llvm::Function *Fn;
   llvm::Constant *Addr;
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index f1b900be74b2cdf..0dd389b3a8f1338 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -7485,6 +7485,99 @@ void CodeGenModule::printPostfixForExternalizedDecl(llvm::raw_ostream &OS,
   }
 }
 
+namespace {
+/// A 'teams loop' with a nested 'loop bind(parallel)' or generic function
+/// call in the associated loop-nest cannot be a 'parallel for'.
+class TeamsLoopChecker final : public ConstStmtVisitor<TeamsLoopChecker> {
+public:
+  TeamsLoopChecker(CodeGenModule &CGM)
+      : CGM(CGM), TeamsLoopCanBeParallelFor{true} {}
+  bool teamsLoopCanBeParallelFor() const {
+    return TeamsLoopCanBeParallelFor;
+  }
+  // Is there a nested OpenMP loop bind(parallel)
+  void VisitOMPExecutableDirective(const OMPExecutableDirective *D) {
+    if (D->getDirectiveKind() == llvm::omp::Directive::OMPD_loop) {
+      if (const auto *C = D->getSingleClause<OMPBindClause>())
+        if (C->getBindKind() == OMPC_BIND_parallel) {
+          TeamsLoopCanBeParallelFor = false;
+          // No need to continue visiting any more
+          return;
+        }
+    }
+    for (const Stmt *Child : D->children())
+      if (Child)
+        Visit(Child);
+  }
+
+  void VisitCallExpr(const CallExpr *C) {
+    // Function calls inhibit parallel loop translation of 'target teams loop'
+    // unless the assume-no-nested-parallelism flag has been specified.
+    // OpenMP API runtime library calls do not inhibit parallel loop
+    // translation, regardless of the assume-no-nested-parallelism flag.
+    if (C) {
+      bool IsOpenMPAPI = false;
+      auto *FD = dyn_cast_or_null<FunctionDecl>(C->getCalleeDecl());
+      if (FD) {
+        std::string Name = FD->getNameInfo().getAsString();
+        IsOpenMPAPI = Name.find("omp_") == 0;
+      }
+      TeamsLoopCanBeParallelFor =
+          IsOpenMPAPI || CGM.getLangOpts().OpenMPNoNestedParallelism;
+      if (!TeamsLoopCanBeParallelFor)
+        return;
+    }
+    for (const Stmt *Child : C->children())
+      if (Child)
+        Visit(Child);
+  }
+
+  void VisitCapturedStmt(const CapturedStmt *S) {
+    if (!S)
+      return;
+    Visit(S->getCapturedDecl()->getBody());
+  }
+
+  void VisitStmt(const Stmt *S) {
+    if (!S)
+      return;
+    for (const Stmt *Child : S->children())
+      if (Child)
+        Visit(Child);
+  }
+
+private:
+  CodeGenModule &CGM;
+  bool TeamsLoopCanBeParallelFor;
+};
+} // namespace
+
+/// Determine if 'teams loop' can be emitted using 'parallel for'.
+bool CodeGenModule::teamsLoopCanBeParallelFor(const OMPExecutableDirective &D) {
+  if (D.getDirectiveKind() != llvm::omp::Directive::OMPD_target_teams_loop)
+    return false;
+  assert(D.hasAssociatedStmt() &&
+      "Loop directive must have associated statement.");
+  TeamsLoopChecker Checker(*this);
+  Checker.Visit(D.getAssociatedStmt());
+  return Checker.teamsLoopCanBeParallelFor();
+}
+
+void CodeGenModule::emitTargetTeamsLoopCodegenStatus(
+    std::string StatusMsg, const OMPExecutableDirective &D, bool IsDevice) {
+  if (IsDevice)
+    StatusMsg += ": DEVICE";
+  else
+    StatusMsg += ": HOST";
+  SourceLocation L = D.getBeginLoc();
+  SourceManager &SM = getContext().getSourceManager();
+  PresumedLoc PLoc = SM.getPresumedLoc(L);
+  const char *FileName = PLoc.isValid() ? PLoc.getFilename() : nullptr;
+  unsigned LineNo =
+      PLoc.isValid() ? PLoc.getLine() : SM.getExpansionLineNumber(L);
+  llvm::dbgs() << StatusMsg << ": " << FileName << ": " << LineNo << "\n";
+}
+
 void CodeGenModule::moveLazyEmissionStates(CodeGenModule *NewBuilder) {
   assert(DeferredDeclsToEmit.empty() &&
          "Should have emitted all decls deferred to emit.");
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index e81edc979c208b1..3f1a64b47a64bde 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1528,6 +1528,8 @@ class CodeGenModule : public CodeGenTypeCache {
                                            LValueBaseInfo *BaseInfo = nullptr,
                                            TBAAAccessInfo *TBAAInfo = nullptr);
   bool stopAutoInit();
+  /// Determine if 'teams loop' can be emitted using 'parallel for'.
+  bool teamsLoopCanBeParallelFor(const OMPExecutableDirective &D);
 
   /// Print the postfix for externalized static variable or kernels for single
   /// source offloading languages CUDA and HIP. The unique postfix is created
@@ -1537,6 +1539,12 @@ class CodeGenModule : public CodeGenTypeCache {
   void printPostfixForExternalizedDecl(llvm::raw_ostream &OS,
                                        const Decl *D) const;
 
+  /// Under debug mode, print status of target teams loop transformation,
+  /// which should be either '#distribute' or '#parallel for'
+  void emitTargetTeamsLoopCodegenStatus(std::string StatusMsg,
+                                        const OMPExecutableDirective &D,
+                                        bool IsDevice);
+
   /// Move some lazily-emitted states to the NewBuilder. This is especially
   /// essential for the incremental parsing environment like Clang Interpreter,
   /// because we'll lose all important information after each repl.
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index dcdd6e7a3f5c762..8cd2adff191e01a 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -4478,6 +4478,8 @@ void Sema::ActOnOpenMPRegionStart(OpenMPDirectiveKind DKind, Scope *CurScope) {
                              Params);
     break;
   }
+  // For 'target teams loop', collect all captured regions so codegen can
+  // later decide the best IR to emit given the associated loop-nest.
   case OMPD_target_teams_loop:
   case OMPD_target_teams_distribute_parallel_for:
   case OMPD_target_teams_distribute_parallel_for_simd: {
@@ -15573,6 +15575,12 @@ static OpenMPDirectiveKind getOpenMPCaptureRegionForClause(
       if (NameModifier == OMPD_unknown || NameModifier == OMPD_parallel)
         CaptureRegion = OMPD_target;
       break;
+    case OMPD_teams_loop:
+    case OMPD_target_teams_loop:
+      // For [target] teams loop, assume capture region is 'teams' so it's
+      // available for codegen later to use if/when necessary.
+      CaptureRegion = OMPD_teams;
+      break;
     case OMPD_target_teams_distribute_parallel_for_simd:
       if (OpenMPVersion >= 50 &&
           (NameModifier == OMPD_unknown || NameModifier == OMPD_simd)) {
@@ -15580,7 +15588,6 @@ static OpenMPDirectiveKind getOpenMPCaptureRegionForClause(
         break;
       }
       [[fallthrough]];
-    case OMPD_target_teams_loop:
     case OMPD_target_teams_distribute_parallel_for:
       // If this clause applies to the nested 'parallel' region, capture within
       // the 'teams' region, otherwise do not capture.
@@ -15703,7 +15710,6 @@ static OpenMPDirectiveKind getOpenMPCaptureRegionForClause(
     case OMPD_declare_target:
     case OMPD_end_declare_target:
     case OMPD_loop:
-    case OMPD_teams_loop:
     case OMPD_teams:
     case OMPD_tile:
     case OMPD_unroll:
diff --git a/clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp b/clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp
index fc83500a09f9846..ef07e607d1dbbf2 100644
--- a/clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp
+++ b/clang/test/OpenMP/nvptx_target_teams_generic_loop_codegen.cpp
@@ -219,7 +219,7 @@ int bar(int n){
 // CHECK1:       omp.loop.exit:
 // CHECK1-NEXT:    [[TMP40:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
 // CHECK1-NEXT:    [[TMP41:%.*]] = load i32, ptr [[TMP40]], align 4
-// CHECK1-NEXT:    call void @__kmpc_for_static_fini(ptr @[[GLOB3:[0-9]+]], i32 [[TMP41]])
+// CHECK1-NEXT:    call void @__kmpc_for_static_fini(ptr @[[GLOB2]], i32 [[TMP41]])
 // CHECK1-NEXT:    br label [[OMP_PRECOND_END]]
 // CHECK1:       omp.precond.end:
 // CHECK1-NEXT:    ret void
@@ -276,7 +276,7 @@ int bar(int n){
 // CHECK1-NEXT:    store i32 0, ptr [[DOTOMP_IS_LAST]], align 4
 // CHECK1-NEXT:    [[TMP7:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
 // CHECK1-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP7]], align 4
-// CHECK1-NEXT:    call void @__kmpc_for_static_init_4(ptr @[[GLOB3]], i32 [[TMP8]], i32 33, ptr [[DOTOMP_IS_LAST]], ptr [[DOTOMP_LB]], ptr [[DOTOMP_UB]], ptr [[DOTOMP_STRIDE]], i32 1, i32 1)
+// CHECK1-NEXT:    call void @__kmpc_for_static_init_4(ptr @[[GLOB3:[0-9]+]], i32 [[TMP8]], i32 33, ptr [[DOTOMP_IS_LAST]], ptr [[DOTOMP_LB]], ptr [[DOTOMP_UB]], ptr [[DOTOMP_STRIDE]], i32 1, i32 1)
 // CHECK1-NEXT:    [[TMP9:%.*]] = load i32, ptr [[DOTOMP_LB]], align 4
 // CHECK1-NEXT:    store i32 [[TMP9]], ptr [[DOTOMP_IV]], align 4
 // CHECK1-NEXT:    br label [[OMP_INNER_FOR_COND:%.*]]
@@ -313,30 +313,38 @@ int bar(int n){
 // CHECK1:       omp.loop.exit:
 // CHECK1-NEXT:    [[TMP17:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
 // CHECK1-NEXT:    [[TMP18:%.*]] = load i32, ptr [[TMP17]], align 4
-// CHECK1-NEXT:    call void @__kmpc_for_static_fini(ptr @[[GLOB3]], i32 [[TMP18]])
+// CHECK1-NEXT:    call void @__kmpc_for_static_fini(ptr @[[GLOB2]], i32 [[TMP18]])
 // CHECK1-NEXT:    br label [[OMP_PRECOND_END]]
 // CHECK1:       omp.precond.end:
 // CHECK1-NEXT:    ret void
 //
 //
 // CHECK1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l33
-// CHECK1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 4 dereferenceable(40) [[B:%.*]]) #[[ATTR4:[0-9]+]] {
+// CHECK1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 4 dereferenceable(40) [[B:%.*]], i64 noundef [[DOTCAPTURE_EXPR_:%.*]]) #[[ATTR4:[0-9]+]] {
 // CHECK1-NEXT:  entry:
 // CHECK1-NEXT:    [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8
 // CHECK1-NEXT:    [[B_ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:    [[DOTCAPTURE_EXPR__ADDR:%.*]] = alloca i64, align 8
+// CHECK1-NEXT:    [[DOTCAPTURE_EXPR__CASTED:%.*]] = alloca i64, align 8
 // CHECK1-NEXT:    [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:    [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:    store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR]], align 8
 // CHECK1-NEXT:    store ptr [[B]], ptr [[B_ADDR]], align 8
+// CHECK1-NEXT:    store i64 [[DOTCAPTURE_EXPR_]], ptr [[DOTCAPTURE_EXPR__ADDR]], align 8
 // CHECK1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[B_ADDR]], align 8
 // CHECK1-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l33_kernel_environment, ptr [[DYN_PTR]])
 // CHECK1-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
 // CHECK1-NEXT:    br i1 [[EXEC_USER_C...
[truncated]

github-actions bot commented Nov 15, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

namespace {
/// A 'teams loop' with a nested 'loop bind(parallel)' or generic function
/// call in the associated loop-nest cannot be a 'parallel for'.
class TeamsLoopChecker final : public ConstStmtVisitor<TeamsLoopChecker> {
Member

Better to do this analysis in Sema and mark the region appropriately

Contributor Author

Alexey - thanks for the review comments. Good suggestion here. I'll look at moving the analysis.

Comment on lines 7522 to 7523
std::string Name = FD->getNameInfo().getAsString();
IsOpenMPAPI = Name.find("omp_") == 0;
Member

What if this is user function with omp_ prefix?

Contributor

What if this is user function with omp_ prefix?

I think that was not allowed. Isn't omp_ a reserved prefix?

Member

No, it is not.

Contributor

No, it is not.

I see the following in the spec:

 Programs must not declare names that begin with the omp_ or ompx_ prefix, as these are reserved for the OpenMP implementation.

Member

Ah, ok, then it should be fine
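
As an aside, a hedged sketch of what such a prefix check can look like (this is not the patch's code; covering ompx_ as well is an assumption based on the spec wording quoted above):

// Sketch only: classify a callee as an OpenMP API routine by its reserved
// prefix; 'ompx_' coverage is an assumption from the spec quote above.
static bool isOpenMPAPICall(const CallExpr *C) {
  const auto *FD = dyn_cast_or_null<FunctionDecl>(C->getCalleeDecl());
  if (!FD)
    return false;
  if (const IdentifierInfo *II = FD->getIdentifier())
    return II->getName().starts_with("omp_") ||
           II->getName().starts_with("ompx_");
  return false;
}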

Comment on lines 7568 to 7597
  if (IsDevice)
    StatusMsg += ": DEVICE";
  else
    StatusMsg += ": HOST";
  SourceLocation L = D.getBeginLoc();
  SourceManager &SM = getContext().getSourceManager();
  PresumedLoc PLoc = SM.getPresumedLoc(L);
  const char *FileName = PLoc.isValid() ? PLoc.getFilename() : nullptr;
  unsigned LineNo =
      PLoc.isValid() ? PLoc.getLine() : SM.getExpansionLineNumber(L);
  llvm::dbgs() << StatusMsg << ": " << FileName << ": " << LineNo << "\n";
}
Member

This should be enabled only in debug mode
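
One way to satisfy this (a sketch, assuming the helper remains a CodeGenModule member) is to compile it out of release builds, matching the DEBUG_WITH_TYPE guard already used at its call sites:

// Sketch: only declare/define the debug helper in asserts-enabled builds.
#ifndef NDEBUG
  void emitTargetTeamsLoopCodegenStatus(std::string StatusMsg,
                                        const OMPExecutableDirective &D,
                                        bool IsDevice);
#endif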

@llvmbot llvmbot added the clang:modules C++20 modules and Clang Header Modules label Dec 14, 2023

ddpagan commented Jan 2, 2024

Ping.

Comment on lines 6109 to 6113
/// true if loop directive's associated loop can be a parallel for.
bool CanBeParallelFor = false;
Member

Do you still need this new field, taking into account that it no longer affects Sema, only codegen? You don't need to store this flag; instead, you can call teamsLoopCanBeParallelFor() directly in codegen.

Contributor Author

Hi Alexey - thanks for the comment. A clarification, as I'm not sure exactly what you're referring to.

So currently, when OMPTargetTeamsGenericLoopDirective is created in SemaOpenMP.cpp, teamsLoopCanBeParallelFor(AStmt) is called as an argument to Create, and the result is stored in CanBeParallelFor. Later, when the directive is seen in CodeGen/CGStmtOpenMP.cpp, the boolean CanBeParallelFor (via the function canBeParallelFor()) is checked to determine how to emit the directive (parallel or distribute).

Are you saying that instead of checking whether the loop can be parallel while we're in Sema, and saving that value when we create the target teams loop directive, we should determine this through a call to the Sema function teamsLoopCanBeParallelFor() while in CodeGen?

Member

I'm saying that you can move this function to codegen and call it there.

Contributor Author

Ok. Thanks, Alexey.

Contributor

I don't think it is possible to have the analysis in Sema and not use a flag here.

The two options we have are:

  1. Do the analysis in Sema and have the flag and then read the flag in CG.
  2. Have the analysis in CG and then there's no reason to pass anything around and CG can call the function when needed.

There is a 3rd hybrid way to do this where this function is moved back into CG:

bool Sema::teamsLoopCanBeParallelFor(Stmt *AStmt) {
  TeamsLoopChecker Checker(*this);
  Checker.Visit(AStmt);
  return Checker.teamsLoopCanBeParallelFor();
}

But then I don't know how you can call the TeamsLoopChecker which lives in Sema.


ddpagan commented Feb 15, 2024

Ping.

alexey-bataev (Member)

As I already said, you don't need changes in Sema or in the Stmt class. All the function changes are in codegen; no other changes are required.

Comment on lines 11314 to 10225
/// [target] teams loop is equivalent to parallel for if associated loop
/// nest meets certain criteria.
bool teamsLoopCanBeParallelFor(Stmt *Astmt);

Member

Do you really need to expose it in Sema, or can you make it just a static local function in SemaOpenMP.cpp?

Contributor Author

Do you really need to expose it in Sema, or can you make it just a static local function in SemaOpenMP.cpp?

Actually, no. Thanks for noticing that. I'll make the change.

Comment on lines 1540 to 1549
/// Under debug mode, print status of target teams loop transformation,
/// which should be either '#distribute' or '#parallel for'
void emitTargetTeamsLoopCodegenStatus(std::string StatusMsg,
                                      const OMPExecutableDirective &D,
                                      bool IsDevice);

Member

Don't like the idea of adding a new member function specifically for debugging.

Contributor Author

Don't like the idea of adding a new member function specifically for debugging.

Agreed. Changed it to a static local function.


ddpagan commented Mar 11, 2024

After some additional discussion with Alexey offline, he concluded that the current changes are okay, specifically for this reason:

"Then I realized that actually it does not require AST nodes building. In this case, this helper class should be moved to CodeGenStmt and hidden in the anonymous namespace. But you also need to use it in CodeGenModule. In this case better to use a flag in statement, as you have it right now. I.e. having this analysis in Sema looks good"


ddpagan commented Mar 19, 2024

Fixed conflicts with trunk so I can update changes to address Alexey's most recent couple of comments.
