Skip to content

[LV][LAA] Vectorize math lib calls with mem write-only attribute #78432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions llvm/lib/Analysis/LoopAccessAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2381,6 +2381,22 @@ bool LoopAccessInfo::canAnalyzeLoop() {
return true;
}

/// Returns whether \p I is a known math library call that has attribute
/// 'memory(argmem: write)' set.
static bool isMathLibCallMemWriteOnly(const TargetLibraryInfo *TLI,
const Instruction &I) {
auto *Call = dyn_cast<CallInst>(&I);
if (!Call)
return false;

auto ME = Call->getMemoryEffects();
LibFunc Func;
TLI->getLibFunc(*Call, Func);
return ME.onlyWritesMemory() && ME.onlyAccessesArgPointees() &&
(Func == LibFunc::LibFunc_modf || Func == LibFunc::LibFunc_modff ||
Func == LibFunc::LibFunc_frexp || Func == LibFunc::LibFunc_frexpf);
}

void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
const TargetLibraryInfo *TLI,
DominatorTree *DT) {
Expand Down Expand Up @@ -2477,6 +2493,15 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,

// Save 'store' instructions. Abort if other instructions write to memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment out of date

if (I.mayWriteToMemory()) {
// We can safety handle math functions that have vectorized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: safely

// counterparts and have the memory write-only attribute set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the reasoning here is correct; the writes could still alias with other accesses,

e.g. something like below would be considered as safe, but %out2 could overlap with %in in a way that results in a backwards dependence at runtime.

define void @frexp_f64(ptr %in, ptr %out1, ptr %out2, i32 %N) {
entry:
  %cmp4 = icmp sgt i32 %N, 0
  br i1 %cmp4, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:
  %wide.trip.count = zext nneg i32 %N to i64
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds double, ptr %in, i64 %indvars.iv
  %0 = load double, ptr %arrayidx, align 8
  %add.ptr = getelementptr inbounds i32, ptr %out2, i64 %indvars.iv
  %call = tail call double @frexp(double noundef %0, ptr noundef %add.ptr)
  store double %call, ptr %arrayidx, align 8
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}

declare double @frexp(double, ptr) #1
attributes #1 = { memory(argmem: write) }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fhahn,
you are right.

The whole logic was based on checking later in the

void LoopVectorizationCostModel::setVectorizedCallDecision(ElementCount VF) {
if the pointers are linear with the right step, but there is no checks for aliasing.
@paschalis-mpeis that means that there is an existing bug with sincos, as following code gets vectorised, without any checks:

double sin[N];
double cos[N];
sin[5] = M_PI;

for(int i=0; i<N; i++ ) {
 sincos(sin[5], sin+i, cos+i);
}

if (isMathLibCallMemWriteOnly(TLI, I)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the read case we also check for !VFDatabase::getMappings(*Call).empty(), but i think it should be enough that LV checks for it, that allows also to test your code in a target agnostic way, and without mappings for frexp

LLVM_DEBUG(dbgs() << "LAA: Allow to vectorize math function with "
"write-only attribute:"
<< I << "\n");
continue;
}

auto *St = dyn_cast<StoreInst>(&I);
if (!St) {
recordAnalysis("CantVectorizeInstruction", St)
Expand Down
116 changes: 116 additions & 0 deletions llvm/test/Analysis/LoopAccessAnalysis/attr-mem-write-only.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
; RUN: opt < %s -passes='print<access-info>' -debug-only=loop-accesses -disable-output 2>&1 | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also check the access-info report.


; REQUIRES: asserts


define void @frexp_f64(ptr %in, ptr %out1, ptr %out2, i32 %N) {
; CHECK: LAA: Allow to vectorize math function with write-only attribute: %call = tail call double @frexp
entry:
%cmp4 = icmp sgt i32 %N, 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be needed, please try to simplify the test cases

br i1 %cmp4, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:
%wide.trip.count = zext nneg i32 %N to i64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be needed, please try to simplify the test cases

br label %for.body

for.cond.cleanup:
ret void

for.body:
%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please strip %indvars. prefix to keep value names more concise

%arrayidx = getelementptr inbounds double, ptr %in, i64 %indvars.iv
%0 = load double, ptr %arrayidx, align 8
%add.ptr = getelementptr inbounds i32, ptr %out2, i64 %indvars.iv
%call = tail call double @frexp(double noundef %0, ptr noundef %add.ptr)
store double %call, ptr %out1, align 8
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}

declare double @frexp(double, ptr) #1

define void @frexp_f32(ptr readonly %in, ptr %out1, ptr %out2, i32 %N) {
; CHECK: LAA: Allow to vectorize math function with write-only attribute: %call = tail call float @frexpf
entry:
%cmp4 = icmp sgt i32 %N, 0
br i1 %cmp4, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:
%wide.trip.count = zext nneg i32 %N to i64
br label %for.body

for.cond.cleanup:
ret void

for.body:
%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds float, ptr %in, i64 %indvars.iv
%0 = load float, ptr %arrayidx, align 4
%add.ptr = getelementptr inbounds i32, ptr %out2, i64 %indvars.iv
%call = tail call float @frexpf(float noundef %0, ptr noundef %add.ptr)
store float %call, ptr %out1, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}

declare float @frexpf(float , ptr) #1

define void @modf_f64(ptr %in, ptr %out1, ptr %out2, i32 %N) {
; CHECK: LAA: Allow to vectorize math function with write-only attribute: %call = tail call double @modf
entry:
%cmp7 = icmp sgt i32 %N, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:
%wide.trip.count = zext nneg i32 %N to i64
br label %for.body

for.cond.cleanup:
ret void

for.body:
%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds double, ptr %in, i64 %indvars.iv
%0 = load double, ptr %arrayidx, align 8
%add.ptr = getelementptr inbounds double, ptr %out2, i64 %indvars.iv
%call = tail call double @modf(double noundef %0, ptr noundef %add.ptr)
%arrayidx2 = getelementptr inbounds double, ptr %out1, i64 %indvars.iv
store double %call, ptr %arrayidx2, align 8
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}

declare double @modf(double , ptr ) #1

define void @modf_f32(ptr %in, ptr %out1, ptr %out2, i32 %N) {
; CHECK: LAA: Allow to vectorize math function with write-only attribute: %call = tail call float @modff
entry:
%cmp7 = icmp sgt i32 %N, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:
%wide.trip.count = zext nneg i32 %N to i64
br label %for.body

for.cond.cleanup:
ret void

for.body:
%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds float, ptr %in, i64 %indvars.iv
%0 = load float, ptr %arrayidx, align 4
%add.ptr = getelementptr inbounds float, ptr %out2, i64 %indvars.iv
%call = tail call float @modff(float noundef %0, ptr noundef %add.ptr)
%arrayidx2 = getelementptr inbounds float, ptr %out1, i64 %indvars.iv
store float %call, ptr %arrayidx2, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}

declare float @modff(float noundef, ptr nocapture noundef) #1

attributes #1 = { memory(argmem: write) }
Loading
Loading