[Transform] Introduce `microkernel` dialect optimization passes #296

huanghaixin008 · 2024-08-28T05:14:51Z

Tracking #297

This PR introduces the following passes to improve the runtime efficiency of the microkernel dialect:

- `EarlyDispatchMicrokernel`: dispatches microkernels at initialization time to reduce runtime cost, merging identical microkernel dispatches along the way.
- `InvariantMicrokernelCodeMotion`: hoists invariant microkernel-related code to improve performance.
- `MergeBranchMicrokernelContext`: merges identical microkernel context code in branches and hoists it out of the branch where possible, enabling further hoisting.
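To make the idea behind `EarlyDispatchMicrokernel` concrete, here is a minimal C++ sketch. It assumes a dispatch call that is expensive and returns a reusable kernel handle. It contrasts re-dispatching inside a hot loop with resolving each distinct configuration once up front and sharing the handle between identical call sites. `BrgemmConfig`, `FakeRuntime`, `runNaive` and `dispatchEarly` are hypothetical names; the `std::map` only plays the role of the module-level globals the pass initializes, and none of this is the PR's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical brgemm configuration (shapes and leading dimensions).
struct BrgemmConfig {
  int64_t m, n, k, lda, ldb, ldc;
  bool operator<(const BrgemmConfig &o) const {
    return std::tie(m, n, k, lda, ldb, ldc) <
           std::tie(o.m, o.n, o.k, o.lda, o.ldb, o.ldc);
  }
};

// Hypothetical stand-in for the expensive runtime dispatch call.
// It counts invocations so the effect of early dispatch is observable.
struct FakeRuntime {
  int dispatchCount = 0;
  int64_t dispatch(const BrgemmConfig &) { return ++dispatchCount; }
};

// Naive lowering: every iteration of the hot loop re-dispatches.
void runNaive(FakeRuntime &rt, const std::vector<BrgemmConfig> &sites,
              int iters) {
  for (int i = 0; i < iters; ++i)
    for (const auto &cfg : sites)
      (void)rt.dispatch(cfg); // the kernel would be executed with this handle
}

// Early dispatch: resolve each distinct config once before the loop,
// sharing a single handle between identical call sites.
std::vector<int64_t> dispatchEarly(FakeRuntime &rt,
                                   const std::vector<BrgemmConfig> &sites) {
  std::map<BrgemmConfig, int64_t> cache; // stands in for the globals
  std::vector<int64_t> handles;
  handles.reserve(sites.size());
  for (const auto &cfg : sites) {
    auto it = cache.find(cfg);
    if (it == cache.end())
      it = cache.emplace(cfg, rt.dispatch(cfg)).first;
    handles.push_back(it->second);
  }
  return handles;
}
```

With three call sites of which two are identical, `runNaive` over 10 iterations dispatches 30 times, while `dispatchEarly` dispatches twice and hands the same handle to both identical sites.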


kurapov-peter · 2024-08-28T15:42:46Z

test/mlir/test/gc/cpu-runner/brgemm-simple-for.mlir

+    return
+  }
+
+  // CHECK: BRGEMM DONE


Could you check for the result instead?

Result correctness check added.
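As a hedged sketch of what a result check for a batch-reduce GEMM can compare against, the C++ below computes a reference with the semantics C[m][n] += sum over b and k of A[b][m][k] * B[b][k][n], then compares element-wise within a tolerance. It assumes dense row-major f32 buffers; `refBatchReduceMatmul`, `allClose` and the tolerance defaults are illustrative choices, not the check that was actually added to `brgemm-simple-for.mlir`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference batch-reduce matmul on row-major buffers:
//   A: [batch][M][K], B: [batch][K][N], C: [M][N] (accumulated into).
void refBatchReduceMatmul(const std::vector<float> &A,
                          const std::vector<float> &B, std::vector<float> &C,
                          std::size_t batch, std::size_t M, std::size_t N,
                          std::size_t K) {
  for (std::size_t b = 0; b < batch; ++b)
    for (std::size_t m = 0; m < M; ++m)
      for (std::size_t k = 0; k < K; ++k) {
        const float a = A[(b * M + m) * K + k];
        for (std::size_t n = 0; n < N; ++n)
          C[m * N + n] += a * B[(b * K + k) * N + n];
      }
}

// Element-wise comparison with a combined absolute/relative tolerance.
bool allClose(const std::vector<float> &got, const std::vector<float> &want,
              float atol = 1e-5f, float rtol = 1e-4f) {
  if (got.size() != want.size())
    return false;
  for (std::size_t i = 0; i < got.size(); ++i)
    if (std::fabs(got[i] - want[i]) > atol + rtol * std::fabs(want[i]))
      return false;
  return true;
}
```

Comparing against a reference like this catches wrong values, whereas a marker such as `BRGEMM DONE` only shows that execution reached the end.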

kurapov-peter · 2024-08-28T15:50:46Z

lib/gc/Transforms/Microkernel/MergeBranchMicrokernelContext.cpp

+    auto tryAddrOfOp = dyn_cast_or_null<LLVM::AddressOfOp>(
+        tryLoadOp.getOperand().getDefiningOp());
+    if (!tryAddrOfOp)
+      return nullptr;
+    return traceDispatchInGlobalCtor(module, tryAddrOfOp.getGlobalName());


Suggested change

-    auto tryAddrOfOp = dyn_cast_or_null<LLVM::AddressOfOp>(
-        tryLoadOp.getOperand().getDefiningOp());
-    if (!tryAddrOfOp)
-      return nullptr;
-    return traceDispatchInGlobalCtor(module, tryAddrOfOp.getGlobalName());
+    if (auto tryAddrOfOp = dyn_cast_or_null<LLVM::AddressOfOp>(
+            tryLoadOp.getOperand().getDefiningOp()))
+      return traceDispatchInGlobalCtor(module, tryAddrOfOp.getGlobalName());

kurapov-peter · 2024-08-28T15:51:38Z

lib/gc/Transforms/Microkernel/MergeBranchMicrokernelContext.cpp

+    if (callee != StringAttr::get(op->getContext(), DNNL_BRGEMM_DISPATCH_NAME))
+      return nullptr;
+    return tryCallOp;
+  } else if (auto tryLoadOp = dyn_cast_or_null<LLVM::LoadOp>(kernelProducer)) {


Suggested change

-  } else if (auto tryLoadOp = dyn_cast_or_null<LLVM::LoadOp>(kernelProducer)) {
+  }
+  if (auto tryLoadOp = dyn_cast_or_null<LLVM::LoadOp>(kernelProducer)) {
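Both suggestions apply two common LLVM-style idioms: the "Don't use else after a return" rule from the LLVM Coding Standards, and scoping a cast result to the `if` condition instead of checking it separately. The self-contained sketch below shows the resulting shape of a producer-tracing function. It is a stand-in under stated assumptions: plain C++ `dynamic_cast` replaces LLVM's `dyn_cast_or_null`, and the `Op`, `CallOp`, `LoadOp`, `AddressOfOp` types, the `GlobalCtorTable` map, the `traceDispatch` helper and the placeholder value of `kDispatchName` are all hypothetical, not the MLIR classes used in `MergeBranchMicrokernelContext.cpp`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the LLVM-dialect ops involved.
struct Op {
  virtual ~Op() = default;
};
struct CallOp : Op {
  std::string callee;
};
struct AddressOfOp : Op {
  std::string globalName;
};
struct LoadOp : Op {
  Op *addrProducer = nullptr; // defining op of the load's address operand
};

// Mimics dyn_cast_or_null: null input or type mismatch yields nullptr.
template <typename T> T *dynCastOrNull(Op *op) {
  return op ? dynamic_cast<T *>(op) : nullptr;
}

// Placeholder name; the real constant is DNNL_BRGEMM_DISPATCH_NAME.
constexpr const char *kDispatchName = "hypothetical_brgemm_dispatch";

// Hypothetical: dispatch calls found in the global constructor, by global.
using GlobalCtorTable = std::map<std::string, CallOp *>;

CallOp *traceDispatch(Op *kernelProducer, const GlobalCtorTable &ctors) {
  if (auto *call = dynCastOrNull<CallOp>(kernelProducer)) {
    if (call->callee != kDispatchName)
      return nullptr;
    return call;
  }
  // No `else` needed: the branch above always returns.
  if (auto *load = dynCastOrNull<LoadOp>(kernelProducer)) {
    // The cast result is scoped to this `if`, so no separate null check.
    if (auto *addr = dynCastOrNull<AddressOfOp>(load->addrProducer)) {
      auto it = ctors.find(addr->globalName);
      return it == ctors.end() ? nullptr : it->second;
    }
  }
  return nullptr;
}
```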


ciyongch · 2024-09-02T02:00:56Z

lib/gc/Transforms/Microkernel/EarlyDispatchMicrokernel.cpp

+
+#define DEBUG_TYPE "early-dispatch-microkernel"
+
+static FailureOr<std::string>


Using a string as the key for the global kernel cache might not be a good option once post-op fusion or m_mask support is considered in the future.

The string name could be lengthy, but I think it's fairly self-explanatory and makes the pass independent of the compiler's internal state. Consider a scenario where the IR goes through the following pipeline:
EarlyDispatchMicrokernel -> SomeLoweringPassProducingNewBrgemm -> ConvertLinalgToBrgemm -> EarlyDispatchMicrokernel
If we use the global variable name as the cache key, we can easily deduplicate between the first and second EarlyDispatchMicrokernel runs. I think this would be hard to implement if the global kernel cache were kept as internal compiler state, especially in test/debug scenarios using mlir-opt, where the passes may be run one at a time in separate processes.

For post-op fusion and masks, we can encode the attributes into the name as well, using a predefined format, e.g.:
llvm.mlir.global internal @g_mask_1 = xxxx
llvm.mlir.global internal @g_dispatched_microkernel_brgemm_..._mask{g_mask_1}_fusing_relu() ...
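A minimal sketch of the name-as-cache-key scheme described above: the global's symbol name is derived deterministically from the kernel parameters, optionally extended with mask and post-op suffixes, so independent runs of the pass (even in separate mlir-opt processes) compute the same name and can deduplicate by looking the symbol up. The exact name format in the PR is elided above, so the format below (which only borrows the `_mask{...}_fusing_...` suffixes from the example), the `KernelKey` fields and the `std::set` standing in for the module's symbol table are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <sstream>
#include <string>

// Hypothetical parameters that identify a dispatched brgemm kernel.
struct KernelKey {
  int64_t m, n, k, lda, ldb, ldc;
  std::string dtype;                     // e.g. "f32"
  std::optional<std::string> maskGlobal; // name of a mask global, if any
  std::optional<std::string> postOp;     // fused post-op, if any
};

// Builds a deterministic, self-describing global name for the kernel.
std::string buildDispatchGlobalName(const KernelKey &key) {
  std::ostringstream os;
  os << "g_dispatched_microkernel_brgemm_" << key.dtype << "_" << key.m << "_"
     << key.n << "_" << key.k << "_" << key.lda << "_" << key.ldb << "_"
     << key.ldc;
  if (key.maskGlobal)
    os << "_mask{" << *key.maskGlobal << "}";
  if (key.postOp)
    os << "_fusing_" << *key.postOp;
  return os.str();
}

// Stand-in for a symbol-table lookup: returns true if a new global had to
// be created, false if an existing global with the same name is reused.
bool getOrCreateDispatchGlobal(std::set<std::string> &symbols,
                               const KernelKey &key) {
  return symbols.insert(buildDispatchGlobalName(key)).second;
}
```

Because the name depends only on the kernel parameters, a later run that sees the same brgemm produces the same symbol and reuses it instead of dispatching again.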


ciyongch · 2024-09-04T08:18:37Z

Please check the failed case:
scripts/correctness.sh: line 24: 5132 Segmentation fault (core dumped) python3 -m benchgc --verbose 0 --driver linalg --case batch_reduce_matmul --md 0:16x512x64xf32 --md 1:16x64x32xf32 --md 2:512x32xf32

huanghaixin008 · 2024-09-04T08:40:32Z


The failing correctness check has been fixed.

Huang, Haixin added 30 commits July 24, 2024 20:15

add microkernel dialect

8213a9a

fix licenses

1c43182

fix license check

e522618

fix tidy

88f645a

fix lint

8a7ec98

remove Utils borrowed from TPP

738ba0c

fix CMake

3f57403

fix per comments

e39ba7e

add dialect lowering pass

4acf417

remove irrelavant

6a1260a

refine cmake

1850d60

fix brgemm runtime

5bc44e4

support linalgx::batch_reduce_matmul_vnni

1c69ee6

fix runtime dnnl brgemm correctness

2ce6f4c

fix format

e0e8b94

support pattern with linalg.fill

921b0dc

move brgemm init_tiles to dispatch time

6ec1053

move mlir tests to right place

f014e73

use thread_local for scratch buffer

f586efb

refine memref ptr/offset extraction

c4e4bcf

revert pass change

f51ea4c

fix op preceding check

6ad33cf

fix utils header

a9a683a

accommodate to new utils

619f670

fix licenses

e31a6d3

update clang-tidy workflow

f8100e1

fix tidy

ae3e9f8

fix tidy

c5cbbd3

fix tidy

43b0c28

give teste better names

334be08

Merge branch 'main' of https://github.com/intel/graph-compiler into h…

a2e132e

…aixin/microkernel_dialect_opt

huanghaixin008 self-assigned this Aug 28, 2024

huanghaixin008 requested review from Menooker, kurapov-peter, zhczhong, ZhennanQin and ciyongch August 28, 2024 05:17

huanghaixin008 added the ready to review label Aug 28, 2024

huanghaixin008 linked an issue Aug 28, 2024 that may be closed by this pull request

Introduce microkernel dialect optimization passes #297

Closed

kurapov-peter reviewed Aug 28, 2024


zhczhong reviewed Aug 29, 2024


Huang, Haixin added 3 commits August 30, 2024 01:38

code & test refinements

e4b96f6

Merge branch 'main' of https://github.com/intel/graph-compiler into h…

9a0e020

…aixin/microkernel_dialect_opt

add microkernel passes to pipeline

eeb8e1f

ciyongch reviewed Sep 2, 2024


lib/gc/Transforms/Microkernel/MicrokernelInvariantCodeMotion.cpp (outdated)

lib/gc/Transforms/Microkernel/MicrokernelInvariantCodeMotion.cpp

Huang, Haixin added 4 commits September 3, 2024 01:46

fix per review

4610480

Merge branch 'main' of https://github.com/intel/graph-compiler into h…

9254dbb

…aixin/microkernel_dialect_opt

ignore upstream linalg op with invalid input

a853cd8

add TODO comments

fedd427

fix correctness check

eb94d05

ciyongch approved these changes Sep 5, 2024


zhczhong approved these changes Sep 5, 2024


huanghaixin008 requested review from ciyongch and zhczhong September 5, 2024 02:59

zhczhong approved these changes Sep 5, 2024


ciyongch merged commit bc0014b into main Sep 5, 2024
6 checks passed

lmontigny added this to the 0.1 CPU - General milestone Sep 5, 2024

