[Transform] Introduce microkernel dialect optimization passes #296
…aixin/microkernel_dialect_opt
  return
}

// CHECK: BRGEMM DONE
Could you check for the result instead?
Result correctness check added.
  auto tryAddrOfOp = dyn_cast_or_null<LLVM::AddressOfOp>(
      tryLoadOp.getOperand().getDefiningOp());
  if (!tryAddrOfOp)
    return nullptr;
  return traceDispatchInGlobalCtor(module, tryAddrOfOp.getGlobalName());
Suggested change:
-  auto tryAddrOfOp = dyn_cast_or_null<LLVM::AddressOfOp>(
-      tryLoadOp.getOperand().getDefiningOp());
-  if (!tryAddrOfOp)
-    return nullptr;
-  return traceDispatchInGlobalCtor(module, tryAddrOfOp.getGlobalName());
+  if (auto tryAddrOfOp = dyn_cast_or_null<LLVM::AddressOfOp>(
+          tryLoadOp.getOperand().getDefiningOp()))
+    return traceDispatchInGlobalCtor(module, tryAddrOfOp.getGlobalName());
etc.
fixed.
    if (callee != StringAttr::get(op->getContext(), DNNL_BRGEMM_DISPATCH_NAME))
      return nullptr;
    return tryCallOp;
  } else if (auto tryLoadOp = dyn_cast_or_null<LLVM::LoadOp>(kernelProducer)) {
Suggested change:
-  } else if (auto tryLoadOp = dyn_cast_or_null<LLVM::LoadOp>(kernelProducer)) {
+  }
+  if (auto tryLoadOp = dyn_cast_or_null<LLVM::LoadOp>(kernelProducer)) {
fixed.
…aixin/microkernel_dialect_opt

#define DEBUG_TYPE "early-dispatch-microkernel"

static FailureOr<std::string>
Using a string as the key for the global kernel cache might not be a good option once we consider post-op fusion or m_mask in the future.
The string name can get lengthy, but I think it is self-explanatory and keeps the pass independent of the compiler's internal state. Consider IR going through the following pipeline:
EarlyDispatchMicrokernel -> SomeLoweringPassProducingNewBrgemm -> ConvertLinalgToBrgemm -> EarlyDispatchMicrokernel
If we use the global variable name as the cache key, we can easily deduplicate between the first and second EarlyDispatchMicrokernel runs. I think this is hard to implement if the global kernel cache lives in the compiler's internal state, especially in test/debug scenarios using mlir-opt, where we might run the passes one by one in separate processes.
For things like post-op fusion and masks, we can encode the attribute in the name as well, using a predefined format, e.g.:
llvm.mlir.global internal @g_mask_1 = xxxx
llvm.mlir.global internal @g_dispatched_microkernel_brgemm_..._mask{g_mask_1}_fusing_relu() ...
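To illustrate the name-as-cache-key idea outside of MLIR, here is a minimal standalone sketch. Everything in it is an assumption made for illustration: `BrgemmConfig`, `makeKernelGlobalName`, `GlobalKernelTable`, and the `MxNxK_dtype` part of the name are hypothetical and are not taken from this PR. Only the `_mask{...}` and `_fusing_...` suffixes mirror the format proposed above. The set of names stands in for the module's symbol table, which is the only state that survives between separate pass runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Hypothetical description of a BRGEMM microkernel (illustrative fields only).
struct BrgemmConfig {
  int64_t m, n, k;
  std::string dtype;
  std::string maskGlobal; // name of a mask global, empty if unused
  std::string postOp;     // fused post-op name, empty if unused
};

// Encode every dispatch-relevant attribute into the global's name, so that
// two identical configurations always map to the same symbol.
std::string makeKernelGlobalName(const BrgemmConfig &cfg) {
  assert(cfg.m > 0 && cfg.n > 0 && cfg.k > 0 && "shape must be positive");
  std::ostringstream os;
  os << "g_dispatched_microkernel_brgemm_" << cfg.m << "x" << cfg.n << "x"
     << cfg.k << "_" << cfg.dtype;
  if (!cfg.maskGlobal.empty())
    os << "_mask{" << cfg.maskGlobal << "}";
  if (!cfg.postOp.empty())
    os << "_fusing_" << cfg.postOp;
  return os.str();
}

// Stand-in for the module's symbol table: it holds only names, so a second
// pass run that sees the same names can deduplicate without extra state.
class GlobalKernelTable {
public:
  // Returns true if a new global was created, false if one was reused.
  bool getOrCreate(const BrgemmConfig &cfg, std::string &nameOut) {
    nameOut = makeKernelGlobalName(cfg);
    return globals.insert(nameOut).second;
  }
  std::size_t size() const { return globals.size(); }

private:
  std::set<std::string> globals;
};
```

Under these assumptions, calling `getOrCreate` twice with the same configuration creates one global and reuses it on the second call, while adding a mask or post-op yields a distinct name and therefore a distinct cache entry.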
Please check the failed case:
The failed correctness check has been fixed.
Tracking #297
This PR introduces the following passes to optimize the runtime efficiency of the microkernel dialect:
EarlyDispatchMicrokernel: Dispatch microkernels at initialization time to reduce runtime cost, and merge identical microkernel dispatches along the way.
InvariantMicrokernelCodeMotion: Hoist invariant microkernel-related code to improve performance.
MergeBranchMicrokernelContext: Merge and hoist identical microkernel context code out of branches where possible, enabling further hoisting.
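The effect these passes aim for can be sketched in plain C++, independent of MLIR. This is only an analogy under stated assumptions: `dispatchKernel`, `dotAccumulate`, `runNaive`, and `runHoisted` are hypothetical names, and the dispatch counter exists purely to make the saved work observable. The sketch contrasts dispatching a kernel on every loop iteration with dispatching it once before the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how often the (hypothetical) dispatch routine runs.
static int gDispatchCalls = 0;

using Kernel = void (*)(const float *, const float *, float *, int64_t);

// Toy "microkernel": accumulates a dot product of length k into *c.
void dotAccumulate(const float *a, const float *b, float *c, int64_t k) {
  for (int64_t i = 0; i < k; ++i)
    *c += a[i] * b[i];
}

// Stand-in for an expensive kernel dispatch; the result depends only on k.
Kernel dispatchKernel(int64_t k) {
  assert(k > 0 && "reduction length must be positive");
  ++gDispatchCalls;
  return &dotAccumulate;
}

// Before: the loop-invariant dispatch is repeated for every batch.
float runNaive(const std::vector<float> &a, const std::vector<float> &b,
               int64_t batches, int64_t k) {
  assert(a.size() >= static_cast<std::size_t>(batches * k) &&
         b.size() >= static_cast<std::size_t>(batches * k));
  float acc = 0.0f;
  for (int64_t batch = 0; batch < batches; ++batch) {
    Kernel kernel = dispatchKernel(k);
    kernel(a.data() + batch * k, b.data() + batch * k, &acc, k);
  }
  return acc;
}

// After: the dispatch is hoisted out of the loop and runs once.
float runHoisted(const std::vector<float> &a, const std::vector<float> &b,
                 int64_t batches, int64_t k) {
  assert(a.size() >= static_cast<std::size_t>(batches * k) &&
         b.size() >= static_cast<std::size_t>(batches * k));
  float acc = 0.0f;
  Kernel kernel = dispatchKernel(k);
  for (int64_t batch = 0; batch < batches; ++batch)
    kernel(a.data() + batch * k, b.data() + batch * k, &acc, k);
  return acc;
}
```

Both variants compute the same result; the hoisted one performs a single dispatch regardless of the batch count. Early dispatch, as described above, goes one step further by moving that single dispatch to initialization time.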