
Optimize performance of SIMD binary operations via polymorphic builtins #26699

Closed
wants to merge 4 commits from the simd_binary_ops branch

Conversation

@nvzqz (Contributor) commented Aug 16, 2019

These changes allow for expressing the vector semantics of our SIMD{n}<T> types directly to LLVM. This is done via polymorphic builtins defined by @gottesmm. These builtins are only called for stdlib types, and only when Swift._isConcrete (which calls Builtin.isConcrete, introduced in #26466) returns true.

This allows LLVM to generate efficient SIMD code in Debug builds, which may result in up to a 120x performance improvement for some operations such as addition. This also results in some performance improvements for Release builds since we no longer rely on the optimizer to auto-vectorize the loop code.

The only public-facing API changes are underscored requirements on protocols such as SIMDStorage and newly introduced underscored types such as _SIMDNever and _SIMDGenericNever<T>.
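For context, the operations affected are the element-wise operators on the stdlib SIMD types. A minimal example of the kind of code this PR aims to speed up in Debug builds (only existing stdlib API is used here; the codegen claim paraphrases the PR description):

```swift
// Element-wise wrapping addition on a standard-library SIMD type.
// With this PR's fast path, a call like this can lower to a single
// LLVM vector add instead of a per-lane scalar loop in Debug builds.
let a = SIMD4<Int32>(1, 2, 3, 4)
let b = SIMD4<Int32>(10, 20, 30, 40)
let sum = a &+ b  // SIMD4<Int32>(11, 22, 33, 44)

// In a generic context the scalar type is not statically known, so the
// existing loop-based generic path still applies:
func addAll<V: SIMD>(_ x: V, _ y: V) -> V where V.Scalar: FixedWidthInteger {
    return x &+ y
}
```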

@gottesmm (Contributor)

Make sure you munge the stuff (i.e. get rid of IRGenPrepare/etc) before the review starts.

@nvzqz (Author) commented Aug 16, 2019

@gottesmm Can you elaborate on why the IRGenPrepare stuff should go into IRGen directly?

@gottesmm (Contributor)

Is there any reason that it /shouldn't/ be in IRGen?

```
#ifndef BUILTIN_BINARY_OPERATION
#define BUILTIN_BINARY_OPERATION(Id, Name, Attrs, Overload) \
  BUILTIN(Id, Name, Attrs)
#define BUILTIN_BINARY_OPERATION(Id, Name, Attrs) BUILTIN(Id, Name, Attrs)
```

Can't we just get rid of BUILTIN_BINARY_OPERATION and use BUILTIN instead, then?


No. Sometimes you want to #define something just for Builtin Binary Operations and not all builtins.

gottesmm and others added 4 commits August 29, 2019 15:05
…TIONs.

TLDR: This patch introduces a new kind of builtin, a "polymorphic builtin". One
calls it like any other builtin, e.g.:

```
Builtin.generic_add(x, y)
```

but it comes with a contract: at constant propagation time, the optimizer
attempts to specialize the generic_add into the concrete builtin for its
operands' static type (e.g. add_Vec4xInt32), emitting a diagnostic if it
cannot.

DISCUSSION
----------

Today there are polymorphic instructions in LLVM IR. Yet, at the Swift and SIL
level, we instead represent these operations as builtins whose names are
resolved by splatting the operand type into the name. For example, adding
two things in LLVM:

```
  %2 = add i64 %0, %1
  %2 = add <2 x i64> %0, %1
  %2 = add <4 x i64> %0, %1
  %2 = add <8 x i64> %0, %1
```

Each of these add operations is performed by the same polymorphic instruction.
In contrast, we splat out these builtins in Swift today, i.e.:

```
let x, y: Builtin.Int32
Builtin.add_Int32(x, y)
let x, y: Builtin.Vec2xInt32
Builtin.add_Vec2xInt32(x, y)
...
```

In SIL, we translate these verbatim, and IRGen then lowers them to the
appropriate polymorphic instruction. Beyond being verbose, this prevents these
builtins (which need static types) from being used in polymorphic contexts.

These operations in Swift look like:

Builtin.add_Vec2
…pecialize polymorphic builtins as it inlines.

The reason I am doing this is that today, the builtin concrete-type
specialization happens in DiagnosticConstantPropagation. This is not for any
deep reason; it is just a peephole optimizer where we already do this sort of
thing (and emit diagnostics), so since we are emitting diagnostics it makes
sense to plug in there. Sadly, this is actually /after/ predictable memory
access optimizations. This means that if (without loss of generality) we
transform a generic_add into an add_Vec4xInt32, with loads/stores before/after
the builtin's arguments/results, we get unnecessary temporaries.

In contrast, by teaching the SILCloner how to specialize polymorphic builtins,
the specialization occurs during Mandatory Inlining, before both predictable
memory access optimizations and DiagnosticConstantPropagation. This means we
get a chance to eliminate any temporary stack slots, improving -Onone codegen.
If the SIMD type is known to have an inner vector representation that LLVM
understands, a fast path calls into a polymorphic builtin operation.
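The fast/slow path split described in these commits can be sketched roughly as follows. `Swift._isConcrete` and `Builtin.generic_add` come from the PR itself, but the surrounding function is a simplified illustration rather than the actual stdlib source; since `Builtin` is only visible inside the standard library, the fast path is shown as a comment and only the generic fallback loop is live here:

```swift
// Simplified sketch (not the actual stdlib implementation) of the
// concrete-type fast path for SIMD addition.
func wrappingAdd<V: SIMD>(_ a: V, _ b: V) -> V where V.Scalar: FixedWidthInteger {
    // Fast path from the PR (stdlib-internal, shown for illustration only):
    // if _isConcrete(V.self) {
    //     // Specializes to one LLVM vector add via the polymorphic builtin.
    //     return V(fromBuiltin: Builtin.generic_add(a, b))
    // }

    // Generic slow path: a per-lane loop that the optimizer must
    // auto-vectorize at -O, and which stays scalar at -Onone.
    var result = V()
    for i in result.indices {
        result[i] = a[i] &+ b[i]
    }
    return result
}
```

At -O the loop version is typically auto-vectorized anyway; as the PR description notes, the main beneficiary of the builtin path is Debug (-Onone) codegen.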
@nvzqz force-pushed the simd_binary_ops branch 3 times, most recently from 1ff040a to eb6f889 on August 30, 2019
@shahmishal (Member)

Please update the base branch to main by Oct 5th otherwise the pull request will be closed automatically.

  • How to change the base branch: (Link)
  • More detail about the branch update: (Link)

@shahmishal closed this Oct 5, 2020