-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[SimplifyCFG] Not folding branch in loop header with constant iterations #74268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SimplifyCFG] Not folding branch in loop header with constant iterations #74268
Conversation
@llvm/pr-subscribers-clang Author: None (xiangzh1) ChangesLoop header with constant usually can be optimized in unroll, folding branch in such loop header at SimplifyCFG will break unroll optimization. Patch is 48.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74268.diff 3 Files Affected:
diff --git a/clang/test/CodeGenCUDA/simplify-cfg-unroll.cu b/clang/test/CodeGenCUDA/simplify-cfg-unroll.cu
new file mode 100644
index 0000000000000..ecb421f9fc85c
--- /dev/null
+++ b/clang/test/CodeGenCUDA/simplify-cfg-unroll.cu
@@ -0,0 +1,364 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+// REQUIRES: amdgpu-registered-target
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -O2 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "amdgcn-amd-amdhsa" \
+// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -emit-llvm -o - %s | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// CHECK-LABEL: define dso_local void @_Z4funciPPiiS_(
+// CHECK-SAME: i32 noundef [[IDX:%.*]], ptr nocapture noundef readonly [[ARR:%.*]], i32 noundef [[DIMS:%.*]], ptr nocapture noundef [[OUT:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[DIMS]], 0
+// CHECK-NEXT: br i1 [[CMP1]], label [[CLEANUP:%.*]], label [[IF_END:%.*]]
+// CHECK: if.end:
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[ARR]], align 8, !tbaa [[TBAA3:![0-9]+]]
+// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4, !tbaa [[TBAA7:![0-9]+]]
+// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14:%.*]] = add nsw i32 [[TMP2]], [[TMP1]]
+// CHECK-NEXT: store i32 [[ADD14]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 1
+// CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[ARRAYIDX11_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX13_1:%.*]] = getelementptr inbounds i32, ptr [[OUT]], i64 1
+// CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1:%.*]] = add nsw i32 [[TMP4]], [[TMP3]]
+// CHECK-NEXT: store i32 [[ADD14_1]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 2
+// CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[ARRAYIDX11_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX13_2:%.*]] = getelementptr inbounds i32, ptr [[OUT]], i64 2
+// CHECK-NEXT: [[TMP6:%.*]] = load i32, ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]]
+// CHECK-NEXT: store i32 [[ADD14_2]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 3
+// CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr [[ARRAYIDX11_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX13_3:%.*]] = getelementptr inbounds i32, ptr [[OUT]], i64 3
+// CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]]
+// CHECK-NEXT: store i32 [[ADD14_3]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_1:%.*]] = icmp eq i32 [[DIMS]], 1
+// CHECK-NEXT: br i1 [[CMP1_1]], label [[CLEANUP]], label [[IF_END_1:%.*]]
+// CHECK: if.end.1:
+// CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 1
+// CHECK-NEXT: [[TMP9:%.*]] = load ptr, ptr [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr [[TMP9]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_129:%.*]] = add nsw i32 [[ADD14]], [[TMP10]]
+// CHECK-NEXT: store i32 [[ADD14_129]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_1:%.*]] = getelementptr inbounds i32, ptr [[TMP9]], i64 1
+// CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr [[ARRAYIDX11_1_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_1:%.*]] = add nsw i32 [[ADD14_1]], [[TMP11]]
+// CHECK-NEXT: store i32 [[ADD14_1_1]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_1:%.*]] = getelementptr inbounds i32, ptr [[TMP9]], i64 2
+// CHECK-NEXT: [[TMP12:%.*]] = load i32, ptr [[ARRAYIDX11_2_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_1:%.*]] = add nsw i32 [[ADD14_2]], [[TMP12]]
+// CHECK-NEXT: store i32 [[ADD14_2_1]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_1:%.*]] = getelementptr inbounds i32, ptr [[TMP9]], i64 3
+// CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr [[ARRAYIDX11_3_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_1:%.*]] = add nsw i32 [[ADD14_3]], [[TMP13]]
+// CHECK-NEXT: store i32 [[ADD14_3_1]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_2:%.*]] = icmp eq i32 [[DIMS]], 2
+// CHECK-NEXT: br i1 [[CMP1_2]], label [[CLEANUP]], label [[IF_END_2:%.*]]
+// CHECK: if.end.2:
+// CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 2
+// CHECK-NEXT: [[TMP14:%.*]] = load ptr, ptr [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[TMP14]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_230:%.*]] = add nsw i32 [[ADD14_129]], [[TMP15]]
+// CHECK-NEXT: store i32 [[ADD14_230]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_2:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 1
+// CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[ARRAYIDX11_1_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_2:%.*]] = add nsw i32 [[ADD14_1_1]], [[TMP16]]
+// CHECK-NEXT: store i32 [[ADD14_1_2]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_2:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 2
+// CHECK-NEXT: [[TMP17:%.*]] = load i32, ptr [[ARRAYIDX11_2_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_2:%.*]] = add nsw i32 [[ADD14_2_1]], [[TMP17]]
+// CHECK-NEXT: store i32 [[ADD14_2_2]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_2:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 3
+// CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[ARRAYIDX11_3_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_2:%.*]] = add nsw i32 [[ADD14_3_1]], [[TMP18]]
+// CHECK-NEXT: store i32 [[ADD14_3_2]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_3:%.*]] = icmp eq i32 [[DIMS]], 3
+// CHECK-NEXT: br i1 [[CMP1_3]], label [[CLEANUP]], label [[IF_END_3:%.*]]
+// CHECK: if.end.3:
+// CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 3
+// CHECK-NEXT: [[TMP19:%.*]] = load ptr, ptr [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[TMP19]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_331:%.*]] = add nsw i32 [[ADD14_230]], [[TMP20]]
+// CHECK-NEXT: store i32 [[ADD14_331]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_3:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i64 1
+// CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[ARRAYIDX11_1_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_3:%.*]] = add nsw i32 [[ADD14_1_2]], [[TMP21]]
+// CHECK-NEXT: store i32 [[ADD14_1_3]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_3:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i64 2
+// CHECK-NEXT: [[TMP22:%.*]] = load i32, ptr [[ARRAYIDX11_2_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_3:%.*]] = add nsw i32 [[ADD14_2_2]], [[TMP22]]
+// CHECK-NEXT: store i32 [[ADD14_2_3]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_3:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i64 3
+// CHECK-NEXT: [[TMP23:%.*]] = load i32, ptr [[ARRAYIDX11_3_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_3:%.*]] = add nsw i32 [[ADD14_3_2]], [[TMP23]]
+// CHECK-NEXT: store i32 [[ADD14_3_3]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_4:%.*]] = icmp eq i32 [[DIMS]], 4
+// CHECK-NEXT: br i1 [[CMP1_4]], label [[CLEANUP]], label [[IF_END_4:%.*]]
+// CHECK: if.end.4:
+// CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 4
+// CHECK-NEXT: [[TMP24:%.*]] = load ptr, ptr [[ARRAYIDX_4]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP25:%.*]] = load i32, ptr [[TMP24]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_4:%.*]] = add nsw i32 [[ADD14_331]], [[TMP25]]
+// CHECK-NEXT: store i32 [[ADD14_4]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_4:%.*]] = getelementptr inbounds i32, ptr [[TMP24]], i64 1
+// CHECK-NEXT: [[TMP26:%.*]] = load i32, ptr [[ARRAYIDX11_1_4]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_4:%.*]] = add nsw i32 [[ADD14_1_3]], [[TMP26]]
+// CHECK-NEXT: store i32 [[ADD14_1_4]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_4:%.*]] = getelementptr inbounds i32, ptr [[TMP24]], i64 2
+// CHECK-NEXT: [[TMP27:%.*]] = load i32, ptr [[ARRAYIDX11_2_4]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_4:%.*]] = add nsw i32 [[ADD14_2_3]], [[TMP27]]
+// CHECK-NEXT: store i32 [[ADD14_2_4]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_4:%.*]] = getelementptr inbounds i32, ptr [[TMP24]], i64 3
+// CHECK-NEXT: [[TMP28:%.*]] = load i32, ptr [[ARRAYIDX11_3_4]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_4:%.*]] = add nsw i32 [[ADD14_3_3]], [[TMP28]]
+// CHECK-NEXT: store i32 [[ADD14_3_4]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_5:%.*]] = icmp eq i32 [[DIMS]], 5
+// CHECK-NEXT: br i1 [[CMP1_5]], label [[CLEANUP]], label [[IF_END_5:%.*]]
+// CHECK: if.end.5:
+// CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 5
+// CHECK-NEXT: [[TMP29:%.*]] = load ptr, ptr [[ARRAYIDX_5]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP30:%.*]] = load i32, ptr [[TMP29]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_5:%.*]] = add nsw i32 [[ADD14_4]], [[TMP30]]
+// CHECK-NEXT: store i32 [[ADD14_5]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_5:%.*]] = getelementptr inbounds i32, ptr [[TMP29]], i64 1
+// CHECK-NEXT: [[TMP31:%.*]] = load i32, ptr [[ARRAYIDX11_1_5]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_5:%.*]] = add nsw i32 [[ADD14_1_4]], [[TMP31]]
+// CHECK-NEXT: store i32 [[ADD14_1_5]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_5:%.*]] = getelementptr inbounds i32, ptr [[TMP29]], i64 2
+// CHECK-NEXT: [[TMP32:%.*]] = load i32, ptr [[ARRAYIDX11_2_5]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_5:%.*]] = add nsw i32 [[ADD14_2_4]], [[TMP32]]
+// CHECK-NEXT: store i32 [[ADD14_2_5]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_5:%.*]] = getelementptr inbounds i32, ptr [[TMP29]], i64 3
+// CHECK-NEXT: [[TMP33:%.*]] = load i32, ptr [[ARRAYIDX11_3_5]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_5:%.*]] = add nsw i32 [[ADD14_3_4]], [[TMP33]]
+// CHECK-NEXT: store i32 [[ADD14_3_5]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_6:%.*]] = icmp eq i32 [[DIMS]], 6
+// CHECK-NEXT: br i1 [[CMP1_6]], label [[CLEANUP]], label [[IF_END_6:%.*]]
+// CHECK: if.end.6:
+// CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 6
+// CHECK-NEXT: [[TMP34:%.*]] = load ptr, ptr [[ARRAYIDX_6]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP35:%.*]] = load i32, ptr [[TMP34]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_6:%.*]] = add nsw i32 [[ADD14_5]], [[TMP35]]
+// CHECK-NEXT: store i32 [[ADD14_6]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_6:%.*]] = getelementptr inbounds i32, ptr [[TMP34]], i64 1
+// CHECK-NEXT: [[TMP36:%.*]] = load i32, ptr [[ARRAYIDX11_1_6]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_6:%.*]] = add nsw i32 [[ADD14_1_5]], [[TMP36]]
+// CHECK-NEXT: store i32 [[ADD14_1_6]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_6:%.*]] = getelementptr inbounds i32, ptr [[TMP34]], i64 2
+// CHECK-NEXT: [[TMP37:%.*]] = load i32, ptr [[ARRAYIDX11_2_6]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_6:%.*]] = add nsw i32 [[ADD14_2_5]], [[TMP37]]
+// CHECK-NEXT: store i32 [[ADD14_2_6]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_6:%.*]] = getelementptr inbounds i32, ptr [[TMP34]], i64 3
+// CHECK-NEXT: [[TMP38:%.*]] = load i32, ptr [[ARRAYIDX11_3_6]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_6:%.*]] = add nsw i32 [[ADD14_3_5]], [[TMP38]]
+// CHECK-NEXT: store i32 [[ADD14_3_6]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_7:%.*]] = icmp eq i32 [[DIMS]], 7
+// CHECK-NEXT: br i1 [[CMP1_7]], label [[CLEANUP]], label [[IF_END_7:%.*]]
+// CHECK: if.end.7:
+// CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 7
+// CHECK-NEXT: [[TMP39:%.*]] = load ptr, ptr [[ARRAYIDX_7]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP40:%.*]] = load i32, ptr [[TMP39]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_7:%.*]] = add nsw i32 [[ADD14_6]], [[TMP40]]
+// CHECK-NEXT: store i32 [[ADD14_7]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_7:%.*]] = getelementptr inbounds i32, ptr [[TMP39]], i64 1
+// CHECK-NEXT: [[TMP41:%.*]] = load i32, ptr [[ARRAYIDX11_1_7]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_7:%.*]] = add nsw i32 [[ADD14_1_6]], [[TMP41]]
+// CHECK-NEXT: store i32 [[ADD14_1_7]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_7:%.*]] = getelementptr inbounds i32, ptr [[TMP39]], i64 2
+// CHECK-NEXT: [[TMP42:%.*]] = load i32, ptr [[ARRAYIDX11_2_7]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_7:%.*]] = add nsw i32 [[ADD14_2_6]], [[TMP42]]
+// CHECK-NEXT: store i32 [[ADD14_2_7]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_7:%.*]] = getelementptr inbounds i32, ptr [[TMP39]], i64 3
+// CHECK-NEXT: [[TMP43:%.*]] = load i32, ptr [[ARRAYIDX11_3_7]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_7:%.*]] = add nsw i32 [[ADD14_3_6]], [[TMP43]]
+// CHECK-NEXT: store i32 [[ADD14_3_7]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_8:%.*]] = icmp eq i32 [[DIMS]], 8
+// CHECK-NEXT: br i1 [[CMP1_8]], label [[CLEANUP]], label [[IF_END_8:%.*]]
+// CHECK: if.end.8:
+// CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 8
+// CHECK-NEXT: [[TMP44:%.*]] = load ptr, ptr [[ARRAYIDX_8]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP45:%.*]] = load i32, ptr [[TMP44]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_8:%.*]] = add nsw i32 [[ADD14_7]], [[TMP45]]
+// CHECK-NEXT: store i32 [[ADD14_8]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_8:%.*]] = getelementptr inbounds i32, ptr [[TMP44]], i64 1
+// CHECK-NEXT: [[TMP46:%.*]] = load i32, ptr [[ARRAYIDX11_1_8]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_8:%.*]] = add nsw i32 [[ADD14_1_7]], [[TMP46]]
+// CHECK-NEXT: store i32 [[ADD14_1_8]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_8:%.*]] = getelementptr inbounds i32, ptr [[TMP44]], i64 2
+// CHECK-NEXT: [[TMP47:%.*]] = load i32, ptr [[ARRAYIDX11_2_8]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_8:%.*]] = add nsw i32 [[ADD14_2_7]], [[TMP47]]
+// CHECK-NEXT: store i32 [[ADD14_2_8]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_8:%.*]] = getelementptr inbounds i32, ptr [[TMP44]], i64 3
+// CHECK-NEXT: [[TMP48:%.*]] = load i32, ptr [[ARRAYIDX11_3_8]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_8:%.*]] = add nsw i32 [[ADD14_3_7]], [[TMP48]]
+// CHECK-NEXT: store i32 [[ADD14_3_8]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_9:%.*]] = icmp eq i32 [[DIMS]], 9
+// CHECK-NEXT: br i1 [[CMP1_9]], label [[CLEANUP]], label [[IF_END_9:%.*]]
+// CHECK: if.end.9:
+// CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 9
+// CHECK-NEXT: [[TMP49:%.*]] = load ptr, ptr [[ARRAYIDX_9]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP50:%.*]] = load i32, ptr [[TMP49]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_9:%.*]] = add nsw i32 [[ADD14_8]], [[TMP50]]
+// CHECK-NEXT: store i32 [[ADD14_9]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_9:%.*]] = getelementptr inbounds i32, ptr [[TMP49]], i64 1
+// CHECK-NEXT: [[TMP51:%.*]] = load i32, ptr [[ARRAYIDX11_1_9]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_9:%.*]] = add nsw i32 [[ADD14_1_8]], [[TMP51]]
+// CHECK-NEXT: store i32 [[ADD14_1_9]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_9:%.*]] = getelementptr inbounds i32, ptr [[TMP49]], i64 2
+// CHECK-NEXT: [[TMP52:%.*]] = load i32, ptr [[ARRAYIDX11_2_9]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_9:%.*]] = add nsw i32 [[ADD14_2_8]], [[TMP52]]
+// CHECK-NEXT: store i32 [[ADD14_2_9]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_9:%.*]] = getelementptr inbounds i32, ptr [[TMP49]], i64 3
+// CHECK-NEXT: [[TMP53:%.*]] = load i32, ptr [[ARRAYIDX11_3_9]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_9:%.*]] = add nsw i32 [[ADD14_3_8]], [[TMP53]]
+// CHECK-NEXT: store i32 [[ADD14_3_9]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_10:%.*]] = icmp eq i32 [[DIMS]], 10
+// CHECK-NEXT: br i1 [[CMP1_10]], label [[CLEANUP]], label [[IF_END_10:%.*]]
+// CHECK: if.end.10:
+// CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 10
+// CHECK-NEXT: [[TMP54:%.*]] = load ptr, ptr [[ARRAYIDX_10]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP55:%.*]] = load i32, ptr [[TMP54]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_10:%.*]] = add nsw i32 [[ADD14_9]], [[TMP55]]
+// CHECK-NEXT: store i32 [[ADD14_10]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_10:%.*]] = getelementptr inbounds i32, ptr [[TMP54]], i64 1
+// CHECK-NEXT: [[TMP56:%.*]] = load i32, ptr [[ARRAYIDX11_1_10]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_10:%.*]] = add nsw i32 [[ADD14_1_9]], [[TMP56]]
+// CHECK-NEXT: store i32 [[ADD14_1_10]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_10:%.*]] = getelementptr inbounds i32, ptr [[TMP54]], i64 2
+// CHECK-NEXT: [[TMP57:%.*]] = load i32, ptr [[ARRAYIDX11_2_10]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_10:%.*]] = add nsw i32 [[ADD14_2_9]], [[TMP57]]
+// CHECK-NEXT: store i32 [[ADD14_2_10]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_10:%.*]] = getelementptr inbounds i32, ptr [[TMP54]], i64 3
+// CHECK-NEXT: [[TMP58:%.*]] = load i32, ptr [[ARRAYIDX11_3_10]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_10:%.*]] = add nsw i32 [[ADD14_3_9]], [[TMP58]]
+// CHECK-NEXT: store i32 [[ADD14_3_10]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_11:%.*]] = icmp eq i32 [[DIMS]], 11
+// CHECK-NEXT: br i1 [[CMP1_11]], label [[CLEANUP]], label [[IF_END_11:%.*]]
+// CHECK: if.end.11:
+// CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 11
+// CHECK-NEXT: [[TMP59:%.*]] = load ptr, ptr [[ARRAYIDX_11]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP60:%.*]] = load i32, ptr [[TMP59]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_11:%.*]] = add nsw i32 [[ADD14_10]], [[TMP60]]
+// CHECK-NEXT: store i32 [[ADD14_11]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_11:%.*]] = getelementptr inbounds i32, ptr [[TMP59]], i64 1
+// CHECK-NEXT: [[TMP61:%.*]] = load i32, ptr [[ARRAYIDX11_1_11]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_11:%.*]] = add nsw i32 [[ADD14_1_10]], [[TMP61]]
+// CHECK-NEXT: store i32 [[ADD14_1_11]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_11:%.*]] = getelement...
[truncated]
|
@llvm/pr-subscribers-llvm-transforms Author: None (xiangzh1) ChangesLoop header with constant usually can be optimized in unroll, folding branch in such loop header at SimplifyCFG will break unroll optimization. Patch is 48.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74268.diff 3 Files Affected:
diff --git a/clang/test/CodeGenCUDA/simplify-cfg-unroll.cu b/clang/test/CodeGenCUDA/simplify-cfg-unroll.cu
new file mode 100644
index 0000000000000..ecb421f9fc85c
--- /dev/null
+++ b/clang/test/CodeGenCUDA/simplify-cfg-unroll.cu
@@ -0,0 +1,364 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+// REQUIRES: amdgpu-registered-target
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -O2 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "amdgcn-amd-amdhsa" \
+// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -emit-llvm -o - %s | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// CHECK-LABEL: define dso_local void @_Z4funciPPiiS_(
+// CHECK-SAME: i32 noundef [[IDX:%.*]], ptr nocapture noundef readonly [[ARR:%.*]], i32 noundef [[DIMS:%.*]], ptr nocapture noundef [[OUT:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[DIMS]], 0
+// CHECK-NEXT: br i1 [[CMP1]], label [[CLEANUP:%.*]], label [[IF_END:%.*]]
+// CHECK: if.end:
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[ARR]], align 8, !tbaa [[TBAA3:![0-9]+]]
+// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4, !tbaa [[TBAA7:![0-9]+]]
+// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14:%.*]] = add nsw i32 [[TMP2]], [[TMP1]]
+// CHECK-NEXT: store i32 [[ADD14]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 1
+// CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[ARRAYIDX11_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX13_1:%.*]] = getelementptr inbounds i32, ptr [[OUT]], i64 1
+// CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1:%.*]] = add nsw i32 [[TMP4]], [[TMP3]]
+// CHECK-NEXT: store i32 [[ADD14_1]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 2
+// CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[ARRAYIDX11_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX13_2:%.*]] = getelementptr inbounds i32, ptr [[OUT]], i64 2
+// CHECK-NEXT: [[TMP6:%.*]] = load i32, ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]]
+// CHECK-NEXT: store i32 [[ADD14_2]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 3
+// CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr [[ARRAYIDX11_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX13_3:%.*]] = getelementptr inbounds i32, ptr [[OUT]], i64 3
+// CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]]
+// CHECK-NEXT: store i32 [[ADD14_3]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_1:%.*]] = icmp eq i32 [[DIMS]], 1
+// CHECK-NEXT: br i1 [[CMP1_1]], label [[CLEANUP]], label [[IF_END_1:%.*]]
+// CHECK: if.end.1:
+// CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 1
+// CHECK-NEXT: [[TMP9:%.*]] = load ptr, ptr [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr [[TMP9]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_129:%.*]] = add nsw i32 [[ADD14]], [[TMP10]]
+// CHECK-NEXT: store i32 [[ADD14_129]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_1:%.*]] = getelementptr inbounds i32, ptr [[TMP9]], i64 1
+// CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr [[ARRAYIDX11_1_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_1:%.*]] = add nsw i32 [[ADD14_1]], [[TMP11]]
+// CHECK-NEXT: store i32 [[ADD14_1_1]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_1:%.*]] = getelementptr inbounds i32, ptr [[TMP9]], i64 2
+// CHECK-NEXT: [[TMP12:%.*]] = load i32, ptr [[ARRAYIDX11_2_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_1:%.*]] = add nsw i32 [[ADD14_2]], [[TMP12]]
+// CHECK-NEXT: store i32 [[ADD14_2_1]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_1:%.*]] = getelementptr inbounds i32, ptr [[TMP9]], i64 3
+// CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr [[ARRAYIDX11_3_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_1:%.*]] = add nsw i32 [[ADD14_3]], [[TMP13]]
+// CHECK-NEXT: store i32 [[ADD14_3_1]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_2:%.*]] = icmp eq i32 [[DIMS]], 2
+// CHECK-NEXT: br i1 [[CMP1_2]], label [[CLEANUP]], label [[IF_END_2:%.*]]
+// CHECK: if.end.2:
+// CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 2
+// CHECK-NEXT: [[TMP14:%.*]] = load ptr, ptr [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[TMP14]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_230:%.*]] = add nsw i32 [[ADD14_129]], [[TMP15]]
+// CHECK-NEXT: store i32 [[ADD14_230]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_2:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 1
+// CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[ARRAYIDX11_1_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_2:%.*]] = add nsw i32 [[ADD14_1_1]], [[TMP16]]
+// CHECK-NEXT: store i32 [[ADD14_1_2]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_2:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 2
+// CHECK-NEXT: [[TMP17:%.*]] = load i32, ptr [[ARRAYIDX11_2_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_2:%.*]] = add nsw i32 [[ADD14_2_1]], [[TMP17]]
+// CHECK-NEXT: store i32 [[ADD14_2_2]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_2:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 3
+// CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[ARRAYIDX11_3_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_2:%.*]] = add nsw i32 [[ADD14_3_1]], [[TMP18]]
+// CHECK-NEXT: store i32 [[ADD14_3_2]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_3:%.*]] = icmp eq i32 [[DIMS]], 3
+// CHECK-NEXT: br i1 [[CMP1_3]], label [[CLEANUP]], label [[IF_END_3:%.*]]
+// CHECK: if.end.3:
+// CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 3
+// CHECK-NEXT: [[TMP19:%.*]] = load ptr, ptr [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[TMP19]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_331:%.*]] = add nsw i32 [[ADD14_230]], [[TMP20]]
+// CHECK-NEXT: store i32 [[ADD14_331]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_3:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i64 1
+// CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[ARRAYIDX11_1_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_3:%.*]] = add nsw i32 [[ADD14_1_2]], [[TMP21]]
+// CHECK-NEXT: store i32 [[ADD14_1_3]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_3:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i64 2
+// CHECK-NEXT: [[TMP22:%.*]] = load i32, ptr [[ARRAYIDX11_2_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_3:%.*]] = add nsw i32 [[ADD14_2_2]], [[TMP22]]
+// CHECK-NEXT: store i32 [[ADD14_2_3]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_3:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i64 3
+// CHECK-NEXT: [[TMP23:%.*]] = load i32, ptr [[ARRAYIDX11_3_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_3:%.*]] = add nsw i32 [[ADD14_3_2]], [[TMP23]]
+// CHECK-NEXT: store i32 [[ADD14_3_3]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_4:%.*]] = icmp eq i32 [[DIMS]], 4
+// CHECK-NEXT: br i1 [[CMP1_4]], label [[CLEANUP]], label [[IF_END_4:%.*]]
+// CHECK: if.end.4:
+// CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 4
+// CHECK-NEXT: [[TMP24:%.*]] = load ptr, ptr [[ARRAYIDX_4]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP25:%.*]] = load i32, ptr [[TMP24]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_4:%.*]] = add nsw i32 [[ADD14_331]], [[TMP25]]
+// CHECK-NEXT: store i32 [[ADD14_4]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_4:%.*]] = getelementptr inbounds i32, ptr [[TMP24]], i64 1
+// CHECK-NEXT: [[TMP26:%.*]] = load i32, ptr [[ARRAYIDX11_1_4]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_4:%.*]] = add nsw i32 [[ADD14_1_3]], [[TMP26]]
+// CHECK-NEXT: store i32 [[ADD14_1_4]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_4:%.*]] = getelementptr inbounds i32, ptr [[TMP24]], i64 2
+// CHECK-NEXT: [[TMP27:%.*]] = load i32, ptr [[ARRAYIDX11_2_4]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_4:%.*]] = add nsw i32 [[ADD14_2_3]], [[TMP27]]
+// CHECK-NEXT: store i32 [[ADD14_2_4]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_4:%.*]] = getelementptr inbounds i32, ptr [[TMP24]], i64 3
+// CHECK-NEXT: [[TMP28:%.*]] = load i32, ptr [[ARRAYIDX11_3_4]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_4:%.*]] = add nsw i32 [[ADD14_3_3]], [[TMP28]]
+// CHECK-NEXT: store i32 [[ADD14_3_4]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_5:%.*]] = icmp eq i32 [[DIMS]], 5
+// CHECK-NEXT: br i1 [[CMP1_5]], label [[CLEANUP]], label [[IF_END_5:%.*]]
+// CHECK: if.end.5:
+// CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 5
+// CHECK-NEXT: [[TMP29:%.*]] = load ptr, ptr [[ARRAYIDX_5]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP30:%.*]] = load i32, ptr [[TMP29]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_5:%.*]] = add nsw i32 [[ADD14_4]], [[TMP30]]
+// CHECK-NEXT: store i32 [[ADD14_5]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_5:%.*]] = getelementptr inbounds i32, ptr [[TMP29]], i64 1
+// CHECK-NEXT: [[TMP31:%.*]] = load i32, ptr [[ARRAYIDX11_1_5]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_5:%.*]] = add nsw i32 [[ADD14_1_4]], [[TMP31]]
+// CHECK-NEXT: store i32 [[ADD14_1_5]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_5:%.*]] = getelementptr inbounds i32, ptr [[TMP29]], i64 2
+// CHECK-NEXT: [[TMP32:%.*]] = load i32, ptr [[ARRAYIDX11_2_5]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_5:%.*]] = add nsw i32 [[ADD14_2_4]], [[TMP32]]
+// CHECK-NEXT: store i32 [[ADD14_2_5]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_5:%.*]] = getelementptr inbounds i32, ptr [[TMP29]], i64 3
+// CHECK-NEXT: [[TMP33:%.*]] = load i32, ptr [[ARRAYIDX11_3_5]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_5:%.*]] = add nsw i32 [[ADD14_3_4]], [[TMP33]]
+// CHECK-NEXT: store i32 [[ADD14_3_5]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_6:%.*]] = icmp eq i32 [[DIMS]], 6
+// CHECK-NEXT: br i1 [[CMP1_6]], label [[CLEANUP]], label [[IF_END_6:%.*]]
+// CHECK: if.end.6:
+// CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 6
+// CHECK-NEXT: [[TMP34:%.*]] = load ptr, ptr [[ARRAYIDX_6]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP35:%.*]] = load i32, ptr [[TMP34]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_6:%.*]] = add nsw i32 [[ADD14_5]], [[TMP35]]
+// CHECK-NEXT: store i32 [[ADD14_6]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_6:%.*]] = getelementptr inbounds i32, ptr [[TMP34]], i64 1
+// CHECK-NEXT: [[TMP36:%.*]] = load i32, ptr [[ARRAYIDX11_1_6]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_6:%.*]] = add nsw i32 [[ADD14_1_5]], [[TMP36]]
+// CHECK-NEXT: store i32 [[ADD14_1_6]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_6:%.*]] = getelementptr inbounds i32, ptr [[TMP34]], i64 2
+// CHECK-NEXT: [[TMP37:%.*]] = load i32, ptr [[ARRAYIDX11_2_6]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_6:%.*]] = add nsw i32 [[ADD14_2_5]], [[TMP37]]
+// CHECK-NEXT: store i32 [[ADD14_2_6]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_6:%.*]] = getelementptr inbounds i32, ptr [[TMP34]], i64 3
+// CHECK-NEXT: [[TMP38:%.*]] = load i32, ptr [[ARRAYIDX11_3_6]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_6:%.*]] = add nsw i32 [[ADD14_3_5]], [[TMP38]]
+// CHECK-NEXT: store i32 [[ADD14_3_6]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_7:%.*]] = icmp eq i32 [[DIMS]], 7
+// CHECK-NEXT: br i1 [[CMP1_7]], label [[CLEANUP]], label [[IF_END_7:%.*]]
+// CHECK: if.end.7:
+// CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 7
+// CHECK-NEXT: [[TMP39:%.*]] = load ptr, ptr [[ARRAYIDX_7]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP40:%.*]] = load i32, ptr [[TMP39]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_7:%.*]] = add nsw i32 [[ADD14_6]], [[TMP40]]
+// CHECK-NEXT: store i32 [[ADD14_7]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_7:%.*]] = getelementptr inbounds i32, ptr [[TMP39]], i64 1
+// CHECK-NEXT: [[TMP41:%.*]] = load i32, ptr [[ARRAYIDX11_1_7]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_7:%.*]] = add nsw i32 [[ADD14_1_6]], [[TMP41]]
+// CHECK-NEXT: store i32 [[ADD14_1_7]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_7:%.*]] = getelementptr inbounds i32, ptr [[TMP39]], i64 2
+// CHECK-NEXT: [[TMP42:%.*]] = load i32, ptr [[ARRAYIDX11_2_7]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_7:%.*]] = add nsw i32 [[ADD14_2_6]], [[TMP42]]
+// CHECK-NEXT: store i32 [[ADD14_2_7]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_7:%.*]] = getelementptr inbounds i32, ptr [[TMP39]], i64 3
+// CHECK-NEXT: [[TMP43:%.*]] = load i32, ptr [[ARRAYIDX11_3_7]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_7:%.*]] = add nsw i32 [[ADD14_3_6]], [[TMP43]]
+// CHECK-NEXT: store i32 [[ADD14_3_7]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_8:%.*]] = icmp eq i32 [[DIMS]], 8
+// CHECK-NEXT: br i1 [[CMP1_8]], label [[CLEANUP]], label [[IF_END_8:%.*]]
+// CHECK: if.end.8:
+// CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 8
+// CHECK-NEXT: [[TMP44:%.*]] = load ptr, ptr [[ARRAYIDX_8]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP45:%.*]] = load i32, ptr [[TMP44]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_8:%.*]] = add nsw i32 [[ADD14_7]], [[TMP45]]
+// CHECK-NEXT: store i32 [[ADD14_8]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_8:%.*]] = getelementptr inbounds i32, ptr [[TMP44]], i64 1
+// CHECK-NEXT: [[TMP46:%.*]] = load i32, ptr [[ARRAYIDX11_1_8]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_8:%.*]] = add nsw i32 [[ADD14_1_7]], [[TMP46]]
+// CHECK-NEXT: store i32 [[ADD14_1_8]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_8:%.*]] = getelementptr inbounds i32, ptr [[TMP44]], i64 2
+// CHECK-NEXT: [[TMP47:%.*]] = load i32, ptr [[ARRAYIDX11_2_8]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_8:%.*]] = add nsw i32 [[ADD14_2_7]], [[TMP47]]
+// CHECK-NEXT: store i32 [[ADD14_2_8]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_8:%.*]] = getelementptr inbounds i32, ptr [[TMP44]], i64 3
+// CHECK-NEXT: [[TMP48:%.*]] = load i32, ptr [[ARRAYIDX11_3_8]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_8:%.*]] = add nsw i32 [[ADD14_3_7]], [[TMP48]]
+// CHECK-NEXT: store i32 [[ADD14_3_8]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_9:%.*]] = icmp eq i32 [[DIMS]], 9
+// CHECK-NEXT: br i1 [[CMP1_9]], label [[CLEANUP]], label [[IF_END_9:%.*]]
+// CHECK: if.end.9:
+// CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 9
+// CHECK-NEXT: [[TMP49:%.*]] = load ptr, ptr [[ARRAYIDX_9]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP50:%.*]] = load i32, ptr [[TMP49]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_9:%.*]] = add nsw i32 [[ADD14_8]], [[TMP50]]
+// CHECK-NEXT: store i32 [[ADD14_9]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_9:%.*]] = getelementptr inbounds i32, ptr [[TMP49]], i64 1
+// CHECK-NEXT: [[TMP51:%.*]] = load i32, ptr [[ARRAYIDX11_1_9]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_9:%.*]] = add nsw i32 [[ADD14_1_8]], [[TMP51]]
+// CHECK-NEXT: store i32 [[ADD14_1_9]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_9:%.*]] = getelementptr inbounds i32, ptr [[TMP49]], i64 2
+// CHECK-NEXT: [[TMP52:%.*]] = load i32, ptr [[ARRAYIDX11_2_9]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_9:%.*]] = add nsw i32 [[ADD14_2_8]], [[TMP52]]
+// CHECK-NEXT: store i32 [[ADD14_2_9]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_9:%.*]] = getelementptr inbounds i32, ptr [[TMP49]], i64 3
+// CHECK-NEXT: [[TMP53:%.*]] = load i32, ptr [[ARRAYIDX11_3_9]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_9:%.*]] = add nsw i32 [[ADD14_3_8]], [[TMP53]]
+// CHECK-NEXT: store i32 [[ADD14_3_9]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_10:%.*]] = icmp eq i32 [[DIMS]], 10
+// CHECK-NEXT: br i1 [[CMP1_10]], label [[CLEANUP]], label [[IF_END_10:%.*]]
+// CHECK: if.end.10:
+// CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 10
+// CHECK-NEXT: [[TMP54:%.*]] = load ptr, ptr [[ARRAYIDX_10]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP55:%.*]] = load i32, ptr [[TMP54]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_10:%.*]] = add nsw i32 [[ADD14_9]], [[TMP55]]
+// CHECK-NEXT: store i32 [[ADD14_10]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_10:%.*]] = getelementptr inbounds i32, ptr [[TMP54]], i64 1
+// CHECK-NEXT: [[TMP56:%.*]] = load i32, ptr [[ARRAYIDX11_1_10]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_10:%.*]] = add nsw i32 [[ADD14_1_9]], [[TMP56]]
+// CHECK-NEXT: store i32 [[ADD14_1_10]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_10:%.*]] = getelementptr inbounds i32, ptr [[TMP54]], i64 2
+// CHECK-NEXT: [[TMP57:%.*]] = load i32, ptr [[ARRAYIDX11_2_10]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_2_10:%.*]] = add nsw i32 [[ADD14_2_9]], [[TMP57]]
+// CHECK-NEXT: store i32 [[ADD14_2_10]], ptr [[ARRAYIDX13_2]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_3_10:%.*]] = getelementptr inbounds i32, ptr [[TMP54]], i64 3
+// CHECK-NEXT: [[TMP58:%.*]] = load i32, ptr [[ARRAYIDX11_3_10]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_3_10:%.*]] = add nsw i32 [[ADD14_3_9]], [[TMP58]]
+// CHECK-NEXT: store i32 [[ADD14_3_10]], ptr [[ARRAYIDX13_3]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[CMP1_11:%.*]] = icmp eq i32 [[DIMS]], 11
+// CHECK-NEXT: br i1 [[CMP1_11]], label [[CLEANUP]], label [[IF_END_11:%.*]]
+// CHECK: if.end.11:
+// CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds ptr, ptr [[ARR]], i64 11
+// CHECK-NEXT: [[TMP59:%.*]] = load ptr, ptr [[ARRAYIDX_11]], align 8, !tbaa [[TBAA3]]
+// CHECK-NEXT: [[TMP60:%.*]] = load i32, ptr [[TMP59]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_11:%.*]] = add nsw i32 [[ADD14_10]], [[TMP60]]
+// CHECK-NEXT: store i32 [[ADD14_11]], ptr [[OUT]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_1_11:%.*]] = getelementptr inbounds i32, ptr [[TMP59]], i64 1
+// CHECK-NEXT: [[TMP61:%.*]] = load i32, ptr [[ARRAYIDX11_1_11]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ADD14_1_11:%.*]] = add nsw i32 [[ADD14_1_10]], [[TMP61]]
+// CHECK-NEXT: store i32 [[ADD14_1_11]], ptr [[ARRAYIDX13_1]], align 4, !tbaa [[TBAA7]]
+// CHECK-NEXT: [[ARRAYIDX11_2_11:%.*]] = getelement...
[truncated]
|
Seems the check fail "dr2xx.cpp Line 1297: conversion from 'T' to 'unsigned long long' is ambiguous" has no relation with this change. |
We should add TTI check for the condition. I believe don't unroll on X86 is a correct decision. And I think you need to precommit tests first. |
In fact, there is no direct/strong relation with stack cost, it mostly base on unroll or not (or other loop optimizations). Maybe we should check "unroll" info (e.g #pragma unroll, any targets with this hint should try best to unroll too) before do or not do this folding. A little trouble is loop info is not well established now. |
Yeah, it is a problem we unroll or not. But this case generally I believe we don't want to unroll it. |
9f3ff6f
to
e963223
Compare
Update to handle unroll hint |
We need at lease one more IR test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoopUnroll supports upper bound unrolling. Why is it not working in this case?
Let me try |
After the branches fodling, the old loop condition "I < LoopCount" changed/disapeared, I think unroll can not make sure the "upper bound" |
Can you please share the IR before the unroll pass? |
with this patch: 4523 ; *** IR Dump After LoopDeletionPass on for.body *** 4581 ; *** IR Dump After LoopFullUnrollPass on for.body (invalidated) *** without this patch: |
I first guess the trip.count maybe too small, but I changed the iteration Num from 16 to 1600, without this patch, it still not unroll. |
AMDGPU can not unorll this case: https://godbolt.org/z/4Pq3bnzTT But the same code in X86 looks can unroll: https://godbolt.org/z/zr8aTG1KW We may need to continue debug on it. |
X86 do very conservative unroll too,its upper bound send to 4 (default is 8), if we not fold the loop branch, it can fully unroll (16) |
I think we should follow this principle: |
The ideal is right. But I think what Nikic say is loop unroll should handle the case( upper bound unrolling). But It doesn't work. We need to find why loop unroll doesn't work (maybe in UnrollRuntimeLoopRemainder), then can check if it can do in loop unroll or stop the SimpilfyCfg's transform.
So where is the different X86 can partial unroll but AMDGPU can not unroll at all? |
https://godbolt.org/z/cMeE61bhf |
1 In fact, I didn't much care about the different unroll between different targets. The loop unroll pass consider the TTI port, it is make sense to me "one do partial unroll or not" or "partial unroll with different unroll count". 2 I am creating the mininal IR test. (I'll replace current .cu test with it, duo to I use -O2 in current test) thanks again! |
e963223
to
6a80c39
Compare
Update: add ir test llvm/test/Transforms/SimplifyCFG/simplify-cfg-unroll.ll |
Constant iteration loop with unroll hint usually expected do unroll by consumers, folding branches in such loop header at SimplifyCFG will break unroll optimization.
6a80c39
to
cbcc7f3
Compare
rebase |
I checked, and for your test case, LoopUnroll recognizes the loop as an UpperBound unrolling candidate, but does not perform unrolling due to cost model. The pragma unroll metadata currently only takes effect if there is an exact trip count, but not if there is an upper bound trip count. Making it work with an upper bound trip count as well should fix your case. See the code in shouldPragmaUnroll(). |
First, many thanks for checking the test. |
I think you are confusing upper bound and runtime unrolling. An upper bound unroll is a type of full unroll. In this case it would unroll it to 16 iterations, which is what you want, no? |
Yes!: ) I thouth the upper bound is "partial unroll", and I am supprised if it can fully unroll to 16 iterations with the branch folding, anyway let me check shouldPragmaUnroll(), thank you a lot! |
Hi friends, I created a new PR at #74703, many thanks for reviewing!! |
c5043e5
to
cbcc7f3
Compare
[SimplifyCFG] Not folding branch in constant loops which expected unroll
for example:
#program unroll
for (int I = 0; I < ConstNum; ++I) { // ConstNum > 1
if (Cond2) {
break;
}
xxx loop body;
}
Folding these conditional branches may break loop unroll.