[PowerPC] Fix incorrect store alignment for __builtin_vsx_build_pair() #108606


Merged · 1 commit · Sep 23, 2024
2 changes: 1 addition & 1 deletion clang/lib/CodeGen/CGBuiltin.cpp
@@ -18175,7 +18175,7 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
CallOps.push_back(Ops[i]);
llvm::Function *F = CGM.getIntrinsic(ID);
Value *Call = Builder.CreateCall(F, CallOps);
-  return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign(64));
+  return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign());
Contributor (@nikic):
Suggested change:
-return Builder.CreateAlignedStore(Call, Ops[0], MaybeAlign());
+return Builder.CreateStore(Call, Ops[0]);

Should work, I think? No need to use CreateAlignedStore if you want a naturally aligned store.

Collaborator (@efriedma-quic):

I'd prefer to just explicitly write the expected alignment here, though; LLVM DataLayout rules for alignment can be a bit strange.

Contributor (author):

> I'd prefer to just explicitly write the expected alignment here,

@efriedma-quic Do you mean hard-coding the expected alignment?

Currently, CreateAlignedStore(Call, Ops[0], MaybeAlign()) calls:

StoreInst *CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align,
                                bool isVolatile = false) {
    if (!Align) {
      const DataLayout &DL = BB->getDataLayout();
      Align = DL.getABITypeAlign(Val->getType());
    }
    return Insert(new StoreInst(Val, Ptr, isVolatile, *Align));
  }

and gets the proper ABI alignment for the given type. Since this section of code processes types that are either 32-byte or 64-byte aligned, we can't hard-code any alignment here.

Collaborator (@efriedma-quic):

Oh, I didn't realize there were multiple different types coming through here. It's not worth duplicating the implementation of CreateStore... @nikic's suggestion is fine, then.

Contributor (author):

@nikic Using CreateStore(Value, Address) would require extra code to convert Ops[0] to type Address.

Contributor (@nikic):

Ah, it looks like CGBuilder doesn't directly re-export this IRBuilder API, it's called CreateDefaultAlignedStore() instead.

Contributor (author):

I see a FIXME for this function; do we still want to use it?

  // FIXME: these "default-aligned" APIs should be removed,
  // but I don't feel like fixing all the builtin code right now.
  llvm::StoreInst *CreateDefaultAlignedStore(llvm::Value *Val,
                                             llvm::Value *Addr,
                                             bool IsVolatile = false) 

Since these need to be aligned stores, maybe it's more clear to just use CreateAlignedStore()?
The patch that added this FIXME:

commit 7f416cc426384ad1f891addb61d93e7ca1ffa0f2
Author: John McCall <[email protected]>
Date:   Tue Sep 8 08:05:57 2015 +0000

Explicitly asks:

    I partially punted on applying this work to CGBuiltin.  Please
    do not add more uses of the CreateDefaultAligned{Load,Store}
    APIs; they will be going away eventually.

    llvm-svn: 246985

It is an old patch so I am not sure how much of that is still true...

Collaborator (@efriedma-quic):

Oh, right, I forgot we were trying to get rid of those at one point... that's probably never happening for LLVM in general, but maybe for clang it'll get finished at some point.

Looking at the code again, it's calling EmitPointerWithAlignment, so you can just take the alignment from that. Well, actually, it looks like there's a bug where we emit the first argument twice, but with that fixed, you should be able to do that.
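A rough, non-compiling sketch of what that could look like (names and surrounding code are assumptions based on the discussion above, not the actual fix; clang's internal APIs vary between versions):

```cpp
// Hypothetical sketch: reuse the alignment clang already computed for the
// destination pointer via EmitPointerWithAlignment, instead of hard-coding
// one or falling back to DataLayout's ABI alignment.
Address DstAddr = EmitPointerWithAlignment(E->getArg(0));
llvm::Function *F = CGM.getIntrinsic(ID);
Value *Call = Builder.CreateCall(F, CallOps);
return Builder.CreateAlignedStore(Call, DstAddr.emitRawPointer(*this),
                                  DstAddr.getAlignment().getAsAlign());
```

This also depends on first fixing the bug mentioned above where the first argument is emitted twice.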

}

case PPC::BI__builtin_ppc_compare_and_swap:
2 changes: 1 addition & 1 deletion clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c
@@ -99,7 +99,7 @@ void test1(unsigned char *vqp, unsigned char *vpp, vector unsigned char vc1, vec
// CHECK-LE-NOOPT-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC1_ADDR]], align 16
// CHECK-LE-NOOPT-NEXT: [[TMP5:%.*]] = load <16 x i8>, ptr [[VC2_ADDR]], align 16
// CHECK-LE-NOOPT-NEXT: [[TMP6:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP5]], <16 x i8> [[TMP4]])
-// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP6]], ptr [[RES]], align 64
+// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP6]], ptr [[RES]], align 32
// CHECK-LE-NOOPT-NEXT: [[TMP7:%.*]] = load <256 x i1>, ptr [[RES]], align 32
// CHECK-LE-NOOPT-NEXT: [[TMP8:%.*]] = load ptr, ptr [[RESP_ADDR]], align 8
// CHECK-LE-NOOPT-NEXT: store <256 x i1> [[TMP7]], ptr [[TMP8]], align 32
8 changes: 4 additions & 4 deletions clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c
@@ -85,11 +85,11 @@ void testVQLocal(int *ptr, vector unsigned char vc) {
// CHECK-NEXT: [[TMP3:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP5:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP3]], <16 x i8> [[TMP4]])
-// CHECK-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 64
+// CHECK-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 32
// CHECK-NEXT: [[TMP6:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP7:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP8:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP7]], <16 x i8> [[TMP6]])
-// CHECK-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 64
+// CHECK-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 32
// CHECK-NEXT: [[TMP9:%.*]] = load <256 x i1>, ptr [[VP3]], align 32
// CHECK-NEXT: [[TMP10:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-NEXT: [[TMP11:%.*]] = call <512 x i1> @llvm.ppc.mma.xvf64ger(<256 x i1> [[TMP9]], <16 x i8> [[TMP10]])
@@ -118,11 +118,11 @@ void testVQLocal(int *ptr, vector unsigned char vc) {
// CHECK-BE-NEXT: [[TMP3:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP4:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP5:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP3]], <16 x i8> [[TMP4]])
-// CHECK-BE-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 64
+// CHECK-BE-NEXT: store <256 x i1> [[TMP5]], ptr [[VP2]], align 32
// CHECK-BE-NEXT: [[TMP6:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP7:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP8:%.*]] = call <256 x i1> @llvm.ppc.vsx.assemble.pair(<16 x i8> [[TMP6]], <16 x i8> [[TMP7]])
-// CHECK-BE-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 64
+// CHECK-BE-NEXT: store <256 x i1> [[TMP8]], ptr [[VP2]], align 32
// CHECK-BE-NEXT: [[TMP9:%.*]] = load <256 x i1>, ptr [[VP3]], align 32
// CHECK-BE-NEXT: [[TMP10:%.*]] = load <16 x i8>, ptr [[VC_ADDR]], align 16
// CHECK-BE-NEXT: [[TMP11:%.*]] = call <512 x i1> @llvm.ppc.mma.xvf64ger(<256 x i1> [[TMP9]], <16 x i8> [[TMP10]])