
[SelectionDAG] Remove UnsafeFPMath check in visitFADDForFMACombine #127770


Merged: 3 commits, Jun 25, 2025
21 changes: 14 additions & 7 deletions llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16619,8 +16619,8 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
if (!HasFMAD && !HasFMA)
return SDValue();

bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
Options.UnsafeFPMath || HasFMAD);
bool AllowFusionGlobally =
Options.AllowFPOpFusion == FPOpFusion::Fast || HasFMAD;
// If the addition is not contractable, do not combine.
if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())
return SDValue();
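
The effect of this hunk can be sketched as a small truth-table model. This is an illustration under stated assumptions, not LLVM code: `FusionMode`, `FusionInputs`, `mayFuseBefore`, `mayFuseAfter` and `checkFusionModel` are hypothetical names, and the struct fields only mirror the values the real condition reads (`Options.AllowFPOpFusion`, `Options.UnsafeFPMath`, `HasFMAD`, and the node's `contract` flag).

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the values the combine inspects.
enum class FusionMode { Strict, Standard, Fast };

struct FusionInputs {
  FusionMode AllowFPOpFusion; // models Options.AllowFPOpFusion
  bool UnsafeFPMath;          // models Options.UnsafeFPMath
  bool HasFMAD;               // models the HasFMAD local
  bool NodeAllowsContract;    // models N->getFlags().hasAllowContract()
};

// Condition before the patch: UnsafeFPMath alone enables fusion globally.
inline bool mayFuseBefore(const FusionInputs &In) {
  bool AllowFusionGlobally = In.AllowFPOpFusion == FusionMode::Fast ||
                             In.UnsafeFPMath || In.HasFMAD;
  return AllowFusionGlobally || In.NodeAllowsContract;
}

// Condition after the patch: UnsafeFPMath no longer participates.
inline bool mayFuseAfter(const FusionInputs &In) {
  bool AllowFusionGlobally =
      In.AllowFPOpFusion == FusionMode::Fast || In.HasFMAD;
  return AllowFusionGlobally || In.NodeAllowsContract;
}

// Small self-check of the cases that change behaviour.
inline void checkFusionModel() {
  FusionInputs In{FusionMode::Standard, /*UnsafeFPMath=*/true,
                  /*HasFMAD=*/false, /*NodeAllowsContract=*/false};
  assert(mayFuseBefore(In));  // old: the global option was enough
  assert(!mayFuseAfter(In));  // new: the node must carry 'contract'
  In.NodeAllowsContract = true;
  assert(mayFuseAfter(In));   // per-node flag restores the fusion
}
```

In this model the only inputs whose outcome changes are those that relied on `UnsafeFPMath` alone, which is consistent with the test updates in this PR adding `contract` to the IR where `-enable-unsafe-fp-math` was dropped from the RUN lines.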
@@ -17826,6 +17826,7 @@ template <class MatchContextClass> SDValue DAGCombiner::visitFMA(SDNode *N) {
SDValue N2 = N->getOperand(2);
ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
ConstantFPSDNode *N1CFP = dyn_cast<ConstantFPSDNode>(N1);
ConstantFPSDNode *N2CFP = dyn_cast<ConstantFPSDNode>(N2);
EVT VT = N->getValueType(0);
SDLoc DL(N);
const TargetOptions &Options = DAG.getTarget().Options;
@@ -17855,11 +17856,17 @@ template <class MatchContextClass> SDValue DAGCombiner::visitFMA(SDNode *N) {
}

// FIXME: use fast math flags instead of Options.UnsafeFPMath
if (Options.UnsafeFPMath) {
if (N0CFP && N0CFP->isZero())
return N2;
if (N1CFP && N1CFP->isZero())
return N2;
// TODO: Finally migrate away from global TargetOptions.
if (Options.AllowFPOpFusion == FPOpFusion::Fast ||
Review comment (Member):

I believe the "Options.AllowFPOpFusion == FPOpFusion::Fast" part of the condition is incorrect: -fp-contract=fast means you can translate mul/add to fma globally. It does not mean you can ignore NaNs. Furthermore, no tests fail if I remove this part of the OR.

Review comment (Member):

Created PR #146592 to fix.
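
The objection can be checked numerically. The sketch below is an illustration rather than LLVM code: `foldedFma`, `exactFma`, `foldIsWrong` and `checkNaNInfCases` are hypothetical helpers that compare the `fma(0, x, c) -> c` fold against `std::fma` under ordinary IEEE-754 semantics.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the combine: pretend fma(0, x, c) folds to c.
inline double foldedFma(double /*x*/, double c) { return c; }

// Reference result with real IEEE-754 semantics.
inline double exactFma(double x, double c) { return std::fma(0.0, x, c); }

// True when the fold changes the value (sign of zero is ignored here).
inline bool foldIsWrong(double x, double c) {
  double Exact = exactFma(x, c);
  double Folded = foldedFma(x, c);
  if (std::isnan(Exact) || std::isnan(Folded))
    return std::isnan(Exact) != std::isnan(Folded);
  return Exact != Folded;
}

inline void checkNaNInfCases() {
  const double Inf = std::numeric_limits<double>::infinity();
  const double NaN = std::numeric_limits<double>::quiet_NaN();
  assert(!foldIsWrong(2.0, 1.0)); // finite x: 0 * 2 + 1 == 1
  assert(foldIsWrong(Inf, 1.0));  // 0 * inf is NaN, so the result is NaN
  assert(foldIsWrong(NaN, 1.0));  // NaN operand propagates
}
```

Fusing a multiply and an add into one `fma` does not change any of these outcomes, which is why contraction permission alone says nothing about whether NaN or infinite operands may be ignored.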

(Options.NoNaNsFPMath && Options.NoInfsFPMath) ||
(N->getFlags().hasNoNaNs() && N->getFlags().hasNoInfs())) {
if (Options.NoSignedZerosFPMath || N->getFlags().hasNoSignedZeros() ||
(N2CFP && !N2CFP->isExactlyValue(-0.0))) {
Review comment (Member):

Do we have tests for this negative zero constant case?

Review comment (Contributor Author):

Added a simple test for X86.
Eventually this should be merged into SelectionDAG::foldConstantFPMath, once it supports unary and ternary operators.
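
The negative-zero case discussed above can be reproduced outside LLVM. This is a minimal sketch assuming the default round-to-nearest mode; `exactFmaZero`, `foldChangesResult` and `checkSignedZeroCases` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Exact IEEE-754 result of fma(0, x, c).
inline double exactFmaZero(double x, double c) { return std::fma(0.0, x, c); }

// True when returning c instead of fma(0, x, c) changes the value or the
// sign bit of the result.
inline bool foldChangesResult(double x, double c) {
  double Exact = exactFmaZero(x, c);
  return Exact != c || std::signbit(Exact) != std::signbit(c);
}

inline void checkSignedZeroCases() {
  assert(foldChangesResult(1.0, -0.0));   // (+0) + (-0) is +0, not -0
  assert(!foldChangesResult(-1.0, -0.0)); // (-0) + (-0) stays -0
  assert(!foldChangesResult(1.0, 0.0));   // (+0) + (+0) is +0
  assert(!foldChangesResult(-1.0, 0.0));  // (-0) + (+0) is +0
  assert(!foldChangesResult(-1.0, 3.5));  // a zero product leaves 3.5 alone
}
```

For finite `x`, the only addend where the fold is observable is `-0.0`, which matches the `!N2CFP->isExactlyValue(-0.0)` escape hatch in the condition when no-signed-zeros is not set.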

if (N0CFP && N0CFP->isZero())
return N2;
if (N1CFP && N1CFP->isZero())
return N2;
}
}

// FIXME: Support splat of constant.
10 changes: 4 additions & 6 deletions llvm/test/CodeGen/AArch64/arm64-fp-contract-zero.ll
@@ -1,18 +1,16 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=arm64 -fp-contract=fast -o - %s | FileCheck %s
; RUN: llc -mtriple=arm64 -o - %s | FileCheck %s


; Make sure we don't try to fold an fneg into +0.0, creating an illegal constant
; -0.0. It's also good, though not essential, that we don't resort to a litpool.
define double @test_fms_fold(double %a, double %b) {
; CHECK-LABEL: test_fms_fold:
; CHECK: // %bb.0:
; CHECK-NEXT: movi d2, #0000000000000000
; CHECK-NEXT: fmul d1, d1, d2
; CHECK-NEXT: fnmsub d0, d0, d2, d1
; CHECK-NEXT: movi {{d[0-9]+}}, #0000000000000000
; CHECK-NEXT: ret
%mul = fmul double %a, 0.000000e+00
%mul1 = fmul double %b, 0.000000e+00
%mul = fmul fast double %a, 0.000000e+00
%mul1 = fmul fast double %b, 0.000000e+00
Review comment on lines +12 to +13 (Contributor):

Suggested change:
- %mul = fmul fast double %a, 0.000000e+00
- %mul1 = fmul fast double %b, 0.000000e+00
+ %mul = fmul contract double %a, 0.000000e+00
+ %mul1 = fmul contract double %b, 0.000000e+00

Review comment (Contributor Author):

I think this test case should keep the fast flag here, because the initial version of this test was:

; RUN: llc -mtriple=arm64 -fp-contract=fast -o - %s | FileCheck %s


; Make sure we don't try to fold an fneg into +0.0, creating an illegal constant
; -0.0. It's also good, though not essential, that we don't resort to a litpool.
define double @test_fms_fold(double %a, double %b) {
; CHECK-LABEL: test_fms_fold:
; CHECK: fmov {{d[0-9]+}}, xzr
; CHECK: ret
  %mul = fmul double %a, 0.000000e+00
  %mul1 = fmul double %b, 0.000000e+00
  %sub = fsub double %mul, %mul1
  ret double %sub
}

which ensures that constant folding does not generate -0.0 on arm.
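
One way to see why the flags on these two multiplies matter: under strict IEEE-754 semantics the test body does not always evaluate to +0.0, so collapsing it to a zero constant needs more than contraction permission. The sketch below is an illustration with hypothetical helper names (`strictFmsFold`, `isPositiveZero`, `checkStrictFmsFold`), not code from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Strict evaluation of the test body: (a * 0.0) - (b * 0.0).
inline double strictFmsFold(double a, double b) {
  double Mul = a * 0.0;
  double Mul1 = b * 0.0;
  return Mul - Mul1;
}

// True only for +0.0 (value zero with a clear sign bit).
inline bool isPositiveZero(double V) { return V == 0.0 && !std::signbit(V); }

inline void checkStrictFmsFold() {
  const double Inf = std::numeric_limits<double>::infinity();
  assert(isPositiveZero(strictFmsFold(1.0, 2.0)));   // (+0) - (+0) is +0
  assert(!isPositiveZero(strictFmsFold(-1.0, 1.0))); // (-0) - (+0) is -0
  assert(std::isnan(strictFmsFold(Inf, 1.0)));       // inf * 0 is NaN
}
```

The negative-operand and infinite-operand cases are exactly the ones that no-signed-zeros and no-NaNs assumptions rule out, which the `fast` flag implies and `contract` alone does not.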

Review comment (Contributor):

2 tests?

Review comment (Contributor Author, @paperchalice, Jun 18, 2025):

This test was introduced in 820e041 and was regenerated in d5f1131, which seems like another regression. See https://reviews.llvm.org/D99586

%sub = fsub double %mul, %mul1
ret double %sub
}
237 changes: 200 additions & 37 deletions llvm/test/CodeGen/AMDGPU/fdot2.ll
@@ -1,28 +1,53 @@
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -denormal-fp-math-f32=preserve-sign -enable-unsafe-fp-math -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX900
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -denormal-fp-math-f32=preserve-sign -enable-unsafe-fp-math -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GCN-DL-UNSAFE,GFX906-DL-UNSAFE
; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -denormal-fp-math-f32=preserve-sign -enable-unsafe-fp-math -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GCN-DL-UNSAFE,GFX10-DL-UNSAFE,GFX10-CONTRACT
; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -denormal-fp-math-f32=preserve-sign -enable-unsafe-fp-math -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GCN-DL-UNSAFE,GFX10-DL-UNSAFE,GFX10-CONTRACT
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -denormal-fp-math-f32=preserve-sign -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX900
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -denormal-fp-math-f32=preserve-sign -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GCN-DL-UNSAFE,GFX906-DL-UNSAFE
; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -denormal-fp-math-f32=preserve-sign -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GCN-DL-UNSAFE,GFX10-DL-UNSAFE,GFX10-CONTRACT
; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -denormal-fp-math-f32=preserve-sign -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GCN-DL-UNSAFE,GFX10-DL-UNSAFE,GFX10-CONTRACT
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -denormal-fp-math-f32=preserve-sign -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX906
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -denormal-fp-math=preserve-sign -fp-contract=fast -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX906-CONTRACT
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -denormal-fp-math=ieee -fp-contract=fast -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX906-DENORM-CONTRACT
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -denormal-fp-math-f32=preserve-sign -enable-unsafe-fp-math -mattr="+dot7-insts,-dot10-insts" -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX906-DOT10-DISABLED
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -denormal-fp-math-f32=preserve-sign -mattr="+dot7-insts,-dot10-insts" -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX906-DOT10-DISABLED
; (fadd (fmul S1.x, S2.x), (fadd (fmul (S1.y, S2.y), z))) -> (fdot2 S1, S2, z)

; Tests to make sure fdot2 is not generated when vector elements of dot-product expressions
; are not converted from f16 to f32.
; GCN-LABEL: {{^}}dotproduct_f16
; GCN-LABEL: {{^}}dotproduct_f16_contract
; GFX900: v_fma_f16
; GFX900: v_fma_f16

; GFX906: v_mul_f16_e32
; GFX906: v_mul_f16_e32

; GFX906-DL-UNSAFE: v_fma_f16
; GFX10-CONTRACT: v_fmac_f16

; GFX906-CONTRACT: v_mac_f16_e32
; GFX906-DENORM-CONTRACT: v_fma_f16
; GFX906-DOT10-DISABLED: v_fma_f16

define amdgpu_kernel void @dotproduct_f16_contract(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
entry:
%src1.vec = load <2 x half>, ptr addrspace(1) %src1
%src2.vec = load <2 x half>, ptr addrspace(1) %src2

%src1.el1 = extractelement <2 x half> %src1.vec, i64 0
%src2.el1 = extractelement <2 x half> %src2.vec, i64 0

%src1.el2 = extractelement <2 x half> %src1.vec, i64 1
%src2.el2 = extractelement <2 x half> %src2.vec, i64 1

%mul2 = fmul contract half %src1.el2, %src2.el2
%mul1 = fmul contract half %src1.el1, %src2.el1
%acc = load half, ptr addrspace(1) %dst, align 2
%acc1 = fadd contract half %mul2, %acc
%acc2 = fadd contract half %mul1, %acc1
store half %acc2, ptr addrspace(1) %dst, align 2
ret void
}

; GCN-LABEL: {{^}}dotproduct_f16

; GFX906: v_mul_f16_e32
; GFX906: v_mul_f16_e32

define amdgpu_kernel void @dotproduct_f16(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
@@ -45,18 +70,12 @@ entry:
ret void
}


; We only want to generate fdot2 if:
; - vector element of dot product is converted from f16 to f32, and
; - the vectors are of type <2 x half>, and
; - "dot10-insts" is enabled

; GCN-LABEL: {{^}}dotproduct_f16_f32
; GFX900: v_mad_mix_f32
; GFX900: v_mad_mix_f32

; GFX906: v_mad_f32
; GFX906: v_mac_f32_e32
; GCN-LABEL: {{^}}dotproduct_f16_f32_contract

; GFX906-DL-UNSAFE: v_dot2_f32_f16
; GFX10-DL-UNSAFE: v_dot2c_f32_f16
@@ -65,6 +84,39 @@ entry:

; GFX906-DENORM-CONTRACT: v_dot2_f32_f16
; GFX906-DOT10-DISABLED: v_fma_mix_f32
define amdgpu_kernel void @dotproduct_f16_f32_contract(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
entry:
%src1.vec = load <2 x half>, ptr addrspace(1) %src1
%src2.vec = load <2 x half>, ptr addrspace(1) %src2

%src1.el1 = extractelement <2 x half> %src1.vec, i64 0
%csrc1.el1 = fpext half %src1.el1 to float
%src2.el1 = extractelement <2 x half> %src2.vec, i64 0
%csrc2.el1 = fpext half %src2.el1 to float

%src1.el2 = extractelement <2 x half> %src1.vec, i64 1
%csrc1.el2 = fpext half %src1.el2 to float
%src2.el2 = extractelement <2 x half> %src2.vec, i64 1
%csrc2.el2 = fpext half %src2.el2 to float

%mul2 = fmul contract float %csrc1.el2, %csrc2.el2
%mul1 = fmul contract float %csrc1.el1, %csrc2.el1
%acc = load float, ptr addrspace(1) %dst, align 4
%acc1 = fadd contract float %mul2, %acc
%acc2 = fadd contract float %mul1, %acc1
store float %acc2, ptr addrspace(1) %dst, align 4
ret void
}

; GCN-LABEL: {{^}}dotproduct_f16_f32
; GFX900: v_mad_mix_f32
; GFX900: v_mad_mix_f32

; GFX906: v_mad_f32
; GFX906: v_mac_f32_e32

define amdgpu_kernel void @dotproduct_f16_f32(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
@@ -96,19 +148,46 @@ entry:
; - the vectors are of type <2 x half>, and
; - "dot10-insts" is enabled

; GCN-LABEL: {{^}}dotproduct_diffvecorder_contract
; GFX906-DL-UNSAFE: v_dot2_f32_f16
; GFX10-DL-UNSAFE: v_dot2c_f32_f16

; GFX906-CONTRACT: v_dot2_f32_f16
; GFX906-DENORM-CONTRACT: v_dot2_f32_f16
; GFX906-DOT10-DISABLED: v_fma_mix_f32
define amdgpu_kernel void @dotproduct_diffvecorder_contract(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
entry:
%src1.vec = load <2 x half>, ptr addrspace(1) %src1
%src2.vec = load <2 x half>, ptr addrspace(1) %src2

%src1.el1 = extractelement <2 x half> %src1.vec, i64 0
%csrc1.el1 = fpext half %src1.el1 to float
%src2.el1 = extractelement <2 x half> %src2.vec, i64 0
%csrc2.el1 = fpext half %src2.el1 to float

%src1.el2 = extractelement <2 x half> %src1.vec, i64 1
%csrc1.el2 = fpext half %src1.el2 to float
%src2.el2 = extractelement <2 x half> %src2.vec, i64 1
%csrc2.el2 = fpext half %src2.el2 to float

%mul2 = fmul contract float %csrc2.el2, %csrc1.el2
%mul1 = fmul contract float %csrc1.el1, %csrc2.el1
%acc = load float, ptr addrspace(1) %dst, align 4
%acc1 = fadd contract float %mul2, %acc
%acc2 = fadd contract float %mul1, %acc1
store float %acc2, ptr addrspace(1) %dst, align 4
ret void
}

; GCN-LABEL: {{^}}dotproduct_diffvecorder
; GFX900: v_mad_mix_f32
; GFX900: v_mad_mix_f32

; GFX906: v_mad_f32
; GFX906: v_mac_f32_e32

; GFX906-DL-UNSAFE: v_dot2_f32_f16
; GFX10-DL-UNSAFE: v_dot2c_f32_f16

; GFX906-CONTRACT: v_dot2_f32_f16
; GFX906-DENORM-CONTRACT: v_dot2_f32_f16
; GFX906-DOT10-DISABLED: v_fma_mix_f32
define amdgpu_kernel void @dotproduct_diffvecorder(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
@@ -136,17 +215,45 @@ entry:
}

; Tests to make sure dot product is not generated when the vectors are not of <2 x half>.
; GCN-LABEL: {{^}}dotproduct_v4f16
; GFX900: v_mad_mix_f32

; GFX906: v_mad_f32
; GFX906: v_mac_f32_e32
; GCN-LABEL: {{^}}dotproduct_v4f16_contract

; GCN-DL-UNSAFE: v_fma_mix_f32

; GFX906-CONTRACT: v_fma_mix_f32
; GFX906-DENORM-CONTRACT: v_fma_mix_f32
; GFX906-DOT10-DISABLED: v_fma_mix_f32
define amdgpu_kernel void @dotproduct_v4f16_contract(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
entry:
%src1.vec = load <4 x half>, ptr addrspace(1) %src1
%src2.vec = load <4 x half>, ptr addrspace(1) %src2

%src1.el1 = extractelement <4 x half> %src1.vec, i64 0
%csrc1.el1 = fpext half %src1.el1 to float
%src2.el1 = extractelement <4 x half> %src2.vec, i64 0
%csrc2.el1 = fpext half %src2.el1 to float

%src1.el2 = extractelement <4 x half> %src1.vec, i64 1
%csrc1.el2 = fpext half %src1.el2 to float
%src2.el2 = extractelement <4 x half> %src2.vec, i64 1
%csrc2.el2 = fpext half %src2.el2 to float

%mul2 = fmul contract float %csrc1.el2, %csrc2.el2
%mul1 = fmul float %csrc1.el1, %csrc2.el1
%acc = load float, ptr addrspace(1) %dst, align 4
%acc1 = fadd contract float %mul2, %acc
%acc2 = fadd contract float %mul1, %acc1
store float %acc2, ptr addrspace(1) %dst, align 4
ret void
}

; GCN-LABEL: {{^}}dotproduct_v4f16
; GFX900: v_mad_mix_f32

; GFX906: v_mad_f32
; GFX906: v_mac_f32_e32

define amdgpu_kernel void @dotproduct_v4f16(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
@@ -173,18 +280,46 @@ entry:
ret void
}

; GCN-LABEL: {{^}}NotAdotproductContract

; GCN-DL-UNSAFE: v_fma_mix_f32

; GFX906-CONTRACT: v_fma_mix_f32
; GFX906-DENORM-CONTRACT: v_fma_mix_f32
; GFX906-DOT10-DISABLED: v_fma_mix_f32
define amdgpu_kernel void @NotAdotproductContract(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
entry:
%src1.vec = load <2 x half>, ptr addrspace(1) %src1
%src2.vec = load <2 x half>, ptr addrspace(1) %src2

%src1.el1 = extractelement <2 x half> %src1.vec, i64 0
%csrc1.el1 = fpext half %src1.el1 to float
%src2.el1 = extractelement <2 x half> %src2.vec, i64 0
%csrc2.el1 = fpext half %src2.el1 to float

%src1.el2 = extractelement <2 x half> %src1.vec, i64 1
%csrc1.el2 = fpext half %src1.el2 to float
%src2.el2 = extractelement <2 x half> %src2.vec, i64 1
%csrc2.el2 = fpext half %src2.el2 to float

%mul2 = fmul contract float %csrc1.el2, %csrc1.el1
%mul1 = fmul contract float %csrc2.el1, %csrc2.el2
%acc = load float, ptr addrspace(1) %dst, align 4
%acc1 = fadd contract float %mul2, %acc
%acc2 = fadd contract float %mul1, %acc1
store float %acc2, ptr addrspace(1) %dst, align 4
ret void
}

; GCN-LABEL: {{^}}NotAdotproduct
; GFX900: v_mad_mix_f32
; GFX900: v_mad_mix_f32

; GFX906: v_mad_f32
; GFX906: v_mac_f32_e32

; GCN-DL-UNSAFE: v_fma_mix_f32

; GFX906-CONTRACT: v_fma_mix_f32
; GFX906-DENORM-CONTRACT: v_fma_mix_f32
; GFX906-DOT10-DISABLED: v_fma_mix_f32
define amdgpu_kernel void @NotAdotproduct(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
@@ -211,18 +346,46 @@ entry:
ret void
}

; GCN-LABEL: {{^}}Diff_Idx_NotAdotproductContract

; GCN-DL-UNSAFE: v_fma_mix_f32

; GFX906-CONTRACT: v_fma_mix_f32
; GFX906-DENORM-CONTRACT: v_fma_mix_f32
; GFX906-DOT10-DISABLED: v_fma_mix_f32
define amdgpu_kernel void @Diff_Idx_NotAdotproductContract(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {
entry:
%src1.vec = load <2 x half>, ptr addrspace(1) %src1
%src2.vec = load <2 x half>, ptr addrspace(1) %src2

%src1.el1 = extractelement <2 x half> %src1.vec, i64 0
%csrc1.el1 = fpext half %src1.el1 to float
%src2.el1 = extractelement <2 x half> %src2.vec, i64 0
%csrc2.el1 = fpext half %src2.el1 to float

%src1.el2 = extractelement <2 x half> %src1.vec, i64 1
%csrc1.el2 = fpext half %src1.el2 to float
%src2.el2 = extractelement <2 x half> %src2.vec, i64 1
%csrc2.el2 = fpext half %src2.el2 to float

%mul2 = fmul contract float %csrc1.el2, %csrc2.el1
%mul1 = fmul contract float %csrc1.el1, %csrc2.el2
%acc = load float, ptr addrspace(1) %dst, align 4
%acc1 = fadd contract float %mul2, %acc
%acc2 = fadd contract float %mul1, %acc1
store float %acc2, ptr addrspace(1) %dst, align 4
ret void
}

; GCN-LABEL: {{^}}Diff_Idx_NotAdotproduct
; GFX900: v_mad_mix_f32
; GFX900: v_mad_mix_f32

; GFX906: v_mad_f32
; GFX906: v_mac_f32_e32

; GCN-DL-UNSAFE: v_fma_mix_f32

; GFX906-CONTRACT: v_fma_mix_f32
; GFX906-DENORM-CONTRACT: v_fma_mix_f32
; GFX906-DOT10-DISABLED: v_fma_mix_f32
define amdgpu_kernel void @Diff_Idx_NotAdotproduct(ptr addrspace(1) %src1,
ptr addrspace(1) %src2,
ptr addrspace(1) nocapture %dst) {