Skip to content

[BPF] Use mul for certain div/mod operations #110712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 1, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion llvm/lib/Target/BPF/BPFISelLowering.h
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,9 @@ class BPFTargetLowering : public TargetLowering {
return Op.size() >= 8 ? MVT::i64 : MVT::i32;
}

bool isIntDivCheap(EVT VT, AttributeList Attr) const override { return true; }
bool isIntDivCheap(EVT VT, AttributeList Attr) const override {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to keep this "cheap" when cpuv4 is enabled.
E.g. as far as I understand for arm64 BPF backend division/modulo operations are translated as single instructions.
Thus, with the following patch on top allow-diff-for-cpuv4.txt:

$ cat t1.c
struct S {
  int var[3];
};
int foo1 (struct S *a, struct S *b)
{
    return a - b;
}

$ clang --target=bpf -mcpu=v3 -O2 -S t1.c -o -
        ...
foo1:                                   # @foo1
# %bb.0:                                # %entry
	r0 = r1
	r0 -= r2
	r0 >>= 2
	w0 *= -1431655765
	exit
        ...

$ clang --target=bpf -mcpu=v4 -O2 -S t1.c -o -
        ...
foo1:                                   # @foo1
# %bb.0:                                # %entry
	r0 = r1
	r0 -= r2
	r0 s/= 12
	exit
        ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyz87 From @4ast the point is that even a single divide is expensive than multiplication/shift/... etc. The number of insns is not a criteria here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to keep this "cheap" when cpuv4 is enabled.

No. Integer div is the slowest operation in pretty much every cpu architecture.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see both x86 and arm LLVM backends define this as expensive.
I somehow understood this as a knob whether to generate sdiv at all, while this is only an optimization knob.

However, the commit description says:

The reason is that sdiv/smod is only supported at -mcpu=v4. At cpu v1/v2/v3, only udiv/umod is supported.
...
Basically sdiv can be replaced with sub, shr and imul.

Do we also need to add some generic lowering for cpuv{1,2,3}?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to add some generic lowering for cpuv{1,2,3}?

you mean to lower sdiv into udiv plus conditionals ?
It's doable, but let's do it in the separate patch and it's not a high priority imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to add some generic lowering for cpuv{1,2,3}?

you mean to lower sdiv into udiv plus conditionals ?

Yes

It's doable, but let's do it in the separate patch and it's not a high priority imo.

If we are sure that optimization pattern that fires in this case would cover all other uses, then yes, lower priority.

}

bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
Type *Ty) const override {
Expand Down
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/BPF/32-bit-subreg-alu.ll
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,7 @@ entry:
define dso_local i32 @div_i(i32 %a) local_unnamed_addr #0 {
entry:
%div = udiv i32 %a, 15
; CHECK: w{{[0-9]+}} /= 15
; CHECK-NOT: w{{[0-9]+}} /= 15
ret i32 %div
}

Expand All @@ -216,7 +216,7 @@ entry:
define dso_local i32 @rem_i(i32 %a) local_unnamed_addr #0 {
entry:
%rem = urem i32 %a, 15
; CHECK: w{{[0-9]+}} %= 15
; CHECK-NOT: w{{[0-9]+}} %= 15
ret i32 %rem
}

Expand Down
6 changes: 3 additions & 3 deletions llvm/test/CodeGen/BPF/sdiv_error.ll
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
; RUN: FileCheck %s < %t1
; CHECK: unsupported signed division

define i32 @test(i32 %len) {
%1 = sdiv i32 %len, 15
ret i32 %1
define i64 @test(i64 %len) {
%1 = sdiv i64 %len, 15
ret i64 %1
}
57 changes: 57 additions & 0 deletions llvm/test/CodeGen/BPF/sdiv_to_mul.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
; RUN: llc -march=bpfel -mcpu=v1 < %s | FileCheck --check-prefix=CHECK-V1 %s
; RUN: llc -march=bpfel -mcpu=v4 < %s | FileCheck --check-prefix=CHECK-V4 %s

target triple = "bpf"

; struct S {
; int var[3];
; };
; int foo1 (struct S *a, struct S *b)
; {
; return a - b;
; }
define dso_local i32 @foo1(ptr noundef %a, ptr noundef %b) local_unnamed_addr {
entry:
%sub.ptr.lhs.cast = ptrtoint ptr %a to i64
%sub.ptr.rhs.cast = ptrtoint ptr %b to i64
%sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
%sub.ptr.div = sdiv exact i64 %sub.ptr.sub, 12
%conv = trunc i64 %sub.ptr.div to i32
ret i32 %conv
}
; CHECK-V1: r0 = r1
; CHECK-V1: r0 -= r2
; CHECK-V1: r0 s>>= 2
; CHECK-V1: r1 = -6148914691236517205 ll
; CHECK-V1: r0 *= r1
; CHECK-V1: exit

; CHECK-V4: r0 = r1
; CHECK-V4: r0 -= r2
; CHECK-V4: r0 >>= 2
; CHECK-V4: w0 *= -1431655765
; CHECK-V4: exit

define dso_local noundef range(i32 -143165576, 143165577) i32 @foo2(i32 noundef %a) local_unnamed_addr {
entry:
%div = sdiv i32 %a, 15
ret i32 %div
}
; CHECK-V1-NOT: r[[#]] s/= 15
; CHECK-V4-NOT: w[[#]] s/= 15

define dso_local noundef range(i32 -14, 15) i32 @foo3(i32 noundef %a) local_unnamed_addr {
entry:
%rem = srem i32 %a, 15
ret i32 %rem
}
; CHECK-V1-NOT: r[[#]] s%= 15
; CHECK-V4-NOT: w[[#]] s%= 15

define dso_local i64 @foo4(i64 noundef %a) local_unnamed_addr {
entry:
%div = udiv exact i64 %a, 15
ret i64 %div
}
; CHECK-V1-NOT: r[[#]] /= 15
; CHECK-V4-NOT: w[[#]] /= 15
6 changes: 3 additions & 3 deletions llvm/test/CodeGen/BPF/srem_error.ll
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
; RUN: FileCheck %s < %t1
; CHECK: unsupported signed division

define i32 @test(i32 %len) {
%1 = srem i32 %len, 15
ret i32 %1
define i64 @test(i64 %len) {
%1 = srem i64 %len, 15
ret i64 %1
}
Loading