Skip to content

Commit 88e31f6

Browse files
MattPDjhuber6
andauthored
[OpenMP][FIX] Remove unsound omp_get_thread_limit deduplication (#79524)
The deduplication of the calls to `omp_get_thread_limit` used to be legal when originally added in <e28936f#diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123>, as the result (thread_limit) was immutable. However, now that we have `thread_limit` clause, we no longer have immutability; therefore `omp_get_thread_limit()` is not a deduplicable runtime call. Thus, removing `omp_get_thread_limit` from the `DeduplicableRuntimeCallIDs` array. Here's a simple example: ``` #include <omp.h> #include <stdio.h> int main() { #pragma omp target thread_limit(4) { printf("\n1:target thread_limit: %d\n", omp_get_thread_limit()); } #pragma omp target thread_limit(3) { printf("\n2:target thread_limit: %d\n", omp_get_thread_limit()); } return 0; } ``` GCC-compiled binary execution: https://gcc.godbolt.org/z/Pjv3TWoTq ``` 1:target thread_limit: 4 2:target thread_limit: 3 ``` Clang/LLVM-compiled binary execution: https://clang.godbolt.org/z/zdPbrdMPn ``` 1:target thread_limit: 4 2:target thread_limit: 4 ``` By my reading of the OpenMP spec GCC does the right thing here; cf. <https://www.openmp.org/spec-html/5.2/openmpse12.html#x34-330002.4>: > If a target construct with a thread_limit clause is encountered, the thread-limit-var ICV from the data environment of the generated initial task is instead set to an implementation defined value between one and the value specified in the clause. The common subexpression elimination (CSE) of the second call to `omp_get_thread_limit` by LLVM does not seem to be correct, as it's not an available expression at any program point(s) (in the scope of the clause in question) after the second target construct with a `thread_limit` clause is encountered. Compiling with `-Rpass=openmp-opt -Rpass-analysis=openmp-opt -Rpass-missed=openmp-opt` we have: https://clang.godbolt.org/z/G7dfhP7jh ``` <source>:8:42: remark: OpenMP runtime call omp_get_thread_limit deduplicated. [OMP170] [-Rpass=openmp-opt] 8 | printf("\n1:target thread_limit: %d\n",omp_get_thread_limit()); | ^ ``` OMP170 has the following explanation: https://openmp.llvm.org/remarks/OMP170.html > This optimization remark indicates that a call to an OpenMP runtime call was replaced with the result of an existing one. This occurs when the compiler knows that the result of a runtime call is immutable. Removing duplicate calls is done by replacing all calls to that function with the result of the first call. This cannot be done automatically by the compiler because the implementations of the OpenMP runtime calls live in a separate library the compiler cannot see. This optimization will trigger for known OpenMP runtime calls whose return value will not change. At the same time I do not believe we have an analysis checking whether this precondition holds here: "This occurs when the compiler knows that the result of a runtime call is immutable." AFAICT, such analysis doesn't appear to exist in the original patch introducing deduplication, either: - 9548b74 - https://reviews.llvm.org/D69930 The fix is to remove it from `DeduplicableRuntimeCallIDs`, effectively reverting the addition in this commit (noting that `omp_get_max_threads` is not present in `DeduplicableRuntimeCallIDs`, so it's possible this addition was incorrect in the first place): - [OpenMP][Opt] Annotate known runtime functions and deduplicate more, - e28936f#diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123 As a result, we're no longer unsoundly deduplicating the OpenMP runtime call `omp_get_thread_limit` as illustrated by the test case: Note the (correctly) repeated `call i32 @omp_get_thread_limit()`. --------- Co-authored-by: Joseph Huber <[email protected]>
1 parent cbb24e1 commit 88e31f6

File tree

2 files changed

+59
-1
lines changed

2 files changed

+59
-1
lines changed

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1471,7 +1471,6 @@ struct OpenMPOpt {
14711471
OMPRTL_omp_get_num_threads,
14721472
OMPRTL_omp_in_parallel,
14731473
OMPRTL_omp_get_cancellation,
1474-
OMPRTL_omp_get_thread_limit,
14751474
OMPRTL_omp_get_supported_active_levels,
14761475
OMPRTL_omp_get_level,
14771476
OMPRTL_omp_get_ancestor_thread_num,
Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,59 @@
1+
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function main --scrub-attributes --filter "@omp_get_thread_limit|@use" --version 4
2+
; RUN: opt -passes=openmp-opt-cgscc -S < %s | FileCheck %s
3+
4+
declare void @use(i32 noundef)
5+
declare i32 @omp_get_thread_limit()
6+
declare void @__kmpc_set_thread_limit(ptr, i32, i32)
7+
declare i32 @__kmpc_global_thread_num(ptr)
8+
declare noalias ptr @__kmpc_omp_task_alloc(ptr, i32, i32, i64, i64, ptr)
9+
declare void @__kmpc_omp_task_complete_if0(ptr, i32, ptr)
10+
declare void @__kmpc_omp_task_begin_if0(ptr, i32, ptr)
11+
12+
%struct.ident_t = type { i32, i32, i32, i32, ptr }
13+
14+
@0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
15+
@1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @0 }, align 8
16+
17+
define i32 @main() local_unnamed_addr {
18+
; CHECK-LABEL: define i32 @main() local_unnamed_addr {
19+
; CHECK: [[CALL_I_I_I:%.*]] = call i32 @omp_get_thread_limit()
20+
; CHECK: call void @use(i32 noundef [[CALL_I_I_I]])
21+
; CHECK: [[CALL_I_I_I2:%.*]] = call i32 @omp_get_thread_limit()
22+
; CHECK: call void @use(i32 noundef [[CALL_I_I_I2]])
23+
;
24+
entry:
25+
%0 = call i32 @__kmpc_global_thread_num(ptr nonnull @1)
26+
%1 = call ptr @__kmpc_omp_task_alloc(ptr nonnull @1, i32 %0, i32 1, i64 40, i64 0, ptr nonnull @.omp_task_entry.)
27+
call void @__kmpc_omp_task_begin_if0(ptr nonnull @1, i32 %0, ptr %1)
28+
call void @__kmpc_set_thread_limit(ptr nonnull @1, i32 %0, i32 4)
29+
%call.i.i.i = call i32 @omp_get_thread_limit()
30+
call void @use(i32 noundef %call.i.i.i)
31+
call void @__kmpc_omp_task_complete_if0(ptr nonnull @1, i32 %0, ptr %1)
32+
%2 = call ptr @__kmpc_omp_task_alloc(ptr nonnull @1, i32 %0, i32 1, i64 40, i64 0, ptr nonnull @.omp_task_entry..2)
33+
call void @__kmpc_omp_task_begin_if0(ptr nonnull @1, i32 %0, ptr %2)
34+
call void @__kmpc_set_thread_limit(ptr nonnull @1, i32 %0, i32 3)
35+
%call.i.i.i2 = call i32 @omp_get_thread_limit()
36+
call void @use(i32 noundef %call.i.i.i2)
37+
call void @__kmpc_omp_task_complete_if0(ptr nonnull @1, i32 %0, ptr %2)
38+
ret i32 0
39+
}
40+
41+
define internal noundef i32 @.omp_task_entry.(i32 noundef %0, ptr noalias nocapture noundef readonly %1) {
42+
entry:
43+
tail call void @__kmpc_set_thread_limit(ptr nonnull @1, i32 %0, i32 4)
44+
%call.i.i = tail call i32 @omp_get_thread_limit()
45+
tail call void @use(i32 noundef %call.i.i)
46+
ret i32 0
47+
}
48+
49+
define internal noundef i32 @.omp_task_entry..2(i32 noundef %0, ptr noalias nocapture noundef readonly %1) {
50+
entry:
51+
tail call void @__kmpc_set_thread_limit(ptr nonnull @1, i32 %0, i32 3)
52+
%call.i.i = tail call i32 @omp_get_thread_limit()
53+
tail call void @use(i32 noundef %call.i.i)
54+
ret i32 0
55+
}
56+
57+
!llvm.module.flags = !{!0}
58+
59+
!0 = !{i32 7, !"openmp", i32 51}

0 commit comments

Comments
 (0)