-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[PowerPC] Fix incorrect store alignment for __builtin_vsx_build_pair() #108606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen Author: Lei Huang (lei137) ChangesFixes #107229 Full diff: https://github.com/llvm/llvm-project/pull/108606.diff 3 Files Affected:
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index a52e880a764252..5db5ad5470c366 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18197,7 +18197,7 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
CallOps.push_back(Ops[i]);
llvm::Function *F = CGM.getIntrinsic(ID);
Value *Call = Builder.CreateCall(F, CallOps);
- return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign(64));
+ return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign());
}
case PPC::BI__builtin_ppc_compare_and_swap:
diff --git a/clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c b/clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c
index 8a2bc93dd6cd0a..cdbfdd6b7975ad 100644
--- a/clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c
+++ b/clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c
@@ -99,7 +99,7 @@ void test1(unsigned char *vqp, unsigned char *vpp, vector unsigned char vc1, vec
// CHECK-LE-NOOPT-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC1_ADDR]], align 16
// CHECK-LE-NOOPT-NEXT: [[TMP5:%.*]] = load <16 x i8>, ptr [[VC2_ADDR]], align 16
// CHECK-LE-NOOPT-NEXT: [[TMP6:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP5]], <16 x i8> [[TMP4]])
-// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP6]], ptr [[RES]], align 64
+// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP6]], ptr [[RES]], align 32
// CHECK-LE-NOOPT-NEXT: [[TMP7:%.*]] = load <256 x i1>, ptr [[RES]], align 32
// CHECK-LE-NOOPT-NEXT: [[TMP8:%.*]] = load ptr, ptr [[RESP_ADDR]], align 8
// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP7]], ptr [[TMP8]], align 32
diff --git a/clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c b/clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c
index 39c040967dc0c3..b18bb3ad050aca 100644
--- a/clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c
+++ b/clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c
@@ -85,11 +85,11 @@ void testVQLocal(int *ptr, vector unsigned char vc) {
// CHECK-NEXT: [[TMP3:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP5:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP3]], <16 x i8> [[TMP4]])
-// CHECK-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 64
+// CHECK-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 32
// CHECK-NEXT: [[TMP6:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP7:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP8:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP7]], <16 x i8> [[TMP6]])
-// CHECK-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 64
+// CHECK-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 32
// CHECK-NEXT: [[TMP9:%.*]] = load <256 x i1>, ptr [[VP3]], align 32
// CHECK-NEXT: [[TMP10:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP11:%.*]] = call <512 x i1> @llvm.ppc.mma.xvf64ger(<256 x i1> [[TMP9]], <16 x i8> [[TMP10]])
@@ -118,11 +118,11 @@ void testVQLocal(int *ptr, vector unsigned char vc) {
// CHECK-BE-NEXT: [[TMP3:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP5:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP3]], <16 x i8> [[TMP4]])
-// CHECK-BE-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 64
+// CHECK-BE-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 32
// CHECK-BE-NEXT: [[TMP6:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP7:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP8:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP6]], <16 x i8> [[TMP7]])
-// CHECK-BE-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 64
+// CHECK-BE-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 32
// CHECK-BE-NEXT: [[TMP9:%.*]] = load <256 x i1>, ptr [[VP3]], align 32
// CHECK-BE-NEXT: [[TMP10:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP11:%.*]] = call <512 x i1> @llvm.ppc.mma.xvf64ger(<256 x i1> [[TMP9]], <16 x i8> [[TMP10]])
|
@llvm/pr-subscribers-backend-powerpc Author: Lei Huang (lei137) ChangesFixes #107229 Full diff: https://github.com/llvm/llvm-project/pull/108606.diff 3 Files Affected:
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index a52e880a764252..5db5ad5470c366 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18197,7 +18197,7 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
CallOps.push_back(Ops[i]);
llvm::Function *F = CGM.getIntrinsic(ID);
Value *Call = Builder.CreateCall(F, CallOps);
- return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign(64));
+ return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign());
}
case PPC::BI__builtin_ppc_compare_and_swap:
diff --git a/clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c b/clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c
index 8a2bc93dd6cd0a..cdbfdd6b7975ad 100644
--- a/clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c
+++ b/clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c
@@ -99,7 +99,7 @@ void test1(unsigned char *vqp, unsigned char *vpp, vector unsigned char vc1, vec
// CHECK-LE-NOOPT-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC1_ADDR]], align 16
// CHECK-LE-NOOPT-NEXT: [[TMP5:%.*]] = load <16 x i8>, ptr [[VC2_ADDR]], align 16
// CHECK-LE-NOOPT-NEXT: [[TMP6:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP5]], <16 x i8> [[TMP4]])
-// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP6]], ptr [[RES]], align 64
+// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP6]], ptr [[RES]], align 32
// CHECK-LE-NOOPT-NEXT: [[TMP7:%.*]] = load <256 x i1>, ptr [[RES]], align 32
// CHECK-LE-NOOPT-NEXT: [[TMP8:%.*]] = load ptr, ptr [[RESP_ADDR]], align 8
// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP7]], ptr [[TMP8]], align 32
diff --git a/clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c b/clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c
index 39c040967dc0c3..b18bb3ad050aca 100644
--- a/clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c
+++ b/clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c
@@ -85,11 +85,11 @@ void testVQLocal(int *ptr, vector unsigned char vc) {
// CHECK-NEXT: [[TMP3:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP5:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP3]], <16 x i8> [[TMP4]])
-// CHECK-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 64
+// CHECK-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 32
// CHECK-NEXT: [[TMP6:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP7:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP8:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP7]], <16 x i8> [[TMP6]])
-// CHECK-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 64
+// CHECK-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 32
// CHECK-NEXT: [[TMP9:%.*]] = load <256 x i1>, ptr [[VP3]], align 32
// CHECK-NEXT: [[TMP10:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP11:%.*]] = call <512 x i1> @llvm.ppc.mma.xvf64ger(<256 x i1> [[TMP9]], <16 x i8> [[TMP10]])
@@ -118,11 +118,11 @@ void testVQLocal(int *ptr, vector unsigned char vc) {
// CHECK-BE-NEXT: [[TMP3:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP5:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP3]], <16 x i8> [[TMP4]])
-// CHECK-BE-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 64
+// CHECK-BE-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 32
// CHECK-BE-NEXT: [[TMP6:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP7:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP8:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP6]], <16 x i8> [[TMP7]])
-// CHECK-BE-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 64
+// CHECK-BE-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 32
// CHECK-BE-NEXT: [[TMP9:%.*]] = load <256 x i1>, ptr [[VP3]], align 32
// CHECK-BE-NEXT: [[TMP10:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP11:%.*]] = call <512 x i1> @llvm.ppc.mma.xvf64ger(<256 x i1> [[TMP9]], <16 x i8> [[TMP10]])
|
@@ -18197,7 +18197,7 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID, | |||
CallOps.push_back(Ops[i]); | |||
llvm::Function *F = CGM.getIntrinsic(ID); | |||
Value *Call = Builder.CreateCall(F, CallOps); | |||
return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign(64)); | |||
return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign()); | |
return Builder.CreateStore(Call, Ops[0]); |
Should work, I think? No need to use CreateAlignedStore if you want a naturally aligned store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to just explicitly write the expected alignment here, though; LLVM DataLayout rules for alignment can be a bit strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to just explicitly write the expected alignment here,
@efriedma-quic Do you mean like hard code the expected alignment?
Currently CreateAlignedStore(Call, Ops[0], MaybeAlign());
calls:
StoreInst *CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align,
bool isVolatile = false) {
if (!Align) {
const DataLayout &DL = BB->getDataLayout();
Align = DL.getABITypeAlign(Val->getType());
}
return Insert(new StoreInst(Val, Ptr, isVolatile, *Align));
}
and get the proper ABI alignment for the type given. Since this section of code process types that are either 32bit or 64bit aligned, we can't hard code any alignment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, didn't realize there were multiple different types coming through here. It's not worth it to duplicate the implementation of CreateStore... @nikic suggestion is fine, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic Using CreateStore(Value, Address)
will require extra code to convertOps[0[
to type Address
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it looks like CGBuilder doesn't directly re-export this IRBuilder API, it's called CreateDefaultAlignedStore() instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a FIXME for this function, do we still want to use it?
// FIXME: these "default-aligned" APIs should be removed,
// but I don't feel like fixing all the builtin code right now.
llvm::StoreInst *CreateDefaultAlignedStore(llvm::Value *Val,
llvm::Value *Addr,
bool IsVolatile = false)
Since these need to be aligned stores, maybe it's more clear to just use CreateAlignedStore()
?
The patch that added this FIXME:
commit 7f416cc426384ad1f891addb61d93e7ca1ffa0f2
Author: John McCall <[email protected]>
Date: Tue Sep 8 08:05:57 2015 +0000
Explicitly asks:
I partially punted on applying this work to CGBuiltin. Please
do not add more uses of the CreateDefaultAligned{Load,Store}
APIs; they will be going away eventually.
llvm-svn: 246985
It is an old patch so I am not sure how much of that is still true...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, I forgot we were trying to get rid of those at one point... that's probably never happening for LLVM in general, but maybe for clang it'll get finished at some point.
Looking at the code again, it's calling EmitPointerWithAlignment, so you can just take the alignment from that. Well, actually, it looks like there's a bug where we emit the first argument twice, but with that fixed, you should be able to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, don't think exactly how the builder call looks like here is important...
2c26898
to
7d0d507
Compare
Will this affect the code that is generated for ‘__vector_quad’ values? There is no direct way to load and store those so they will use pair loads and stores. |
Not AFAIK. The change will get the alignment from the type given. The LIT test being updated also have |
Fixes #107229