Cadence fusiong3 operators m2 #7490
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7490
Note: links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit a0bdd97 with merge base a29dc49. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from bc5b6f0 to c77741a.
```diff
-using ::executorch::aten::Scalar;
-using ::executorch::aten::ScalarType;
-using ::executorch::aten::Tensor;
-using ::executorch::runtime::canCast;
-using ::executorch::runtime::Error;
-using ::executorch::runtime::KernelRuntimeContext;
+using exec_aten::Scalar;
+using exec_aten::ScalarType;
+using exec_aten::Tensor;
+using executorch::runtime::canCast;
+using torch::executor::Error;
+using torch::executor::KernelRuntimeContext;
```
I think we need to retain the fully-qualified ones; I'll let @hsharma35 confirm.
As part of the M1 release, we used "exec_aten::Scalar", similar to what is available in HiFi. Later, I think, Zonglin changed this in his PR; we tried to retain it. @hsharma35 and @zonglinpeng, please let me know the correct namespace to use.
Nit: please use the latest namespaces/headers, matching op_add.cpp in master. The headers/namespaces there follow the ExecuTorch C++ style guide.
@hsharma35 - Updated all the operators to match op_add.cpp in master.
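For reference, a sketch of the using-declaration style the thread converges on, based on the fully-qualified names in the diff above (op_add.cpp in master is the authoritative reference; the include shown is an assumption taken from the other snippets in this PR):

```cpp
#include <executorch/runtime/kernel/kernel_includes.h>

// Fully-qualified using-declarations per the ExecuTorch C++ style guide.
using ::executorch::aten::Scalar;
using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::canCast;
using ::executorch::runtime::Error;
using ::executorch::runtime::KernelRuntimeContext;
```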
```diff
-using ::executorch::aten::ScalarType;
-using ::executorch::aten::Tensor;
-using ::executorch::runtime::Error;
-using ::executorch::runtime::KernelRuntimeContext;
+using exec_aten::Scalar;
+using exec_aten::ScalarType;
+using exec_aten::Tensor;
+using torch::executor::Error;
+using torch::executor::KernelRuntimeContext;
```
Same here.
```diff
-using ::executorch::aten::Scalar;
-using ::executorch::aten::ScalarType;
-using ::executorch::aten::Tensor;
-using ::executorch::runtime::Error;
-using ::executorch::runtime::KernelRuntimeContext;
+using exec_aten::Scalar;
+using exec_aten::ScalarType;
+using exec_aten::Tensor;
+using torch::executor::Error;
+using torch::executor::KernelRuntimeContext;
```
I won't put this on the other files, but the same applies here and, I guess, to all new files.
```diff
@@ -30,6 +29,15 @@ namespace impl {
 namespace G3 {
 namespace native {

+#define XT_KERNEL_CHECK(ctx, out, kernel, ...) \
```
Why is this defined so many times? We should at least be able to move it to a shared location.
This PR creates a separate header for the XT_KERNEL_CHECK macro: #7516. Maybe rebase on top of it and include the common header instead? We can clean up in a different PR too.
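As a rough illustration of the pattern such a shared header captures, a hedged sketch (the actual definition lives in #7516; the `ET_KERNEL_CHECK_MSG` usage and error code below are assumptions, not the PR's real definition):

```cpp
#include <executorch/runtime/kernel/kernel_includes.h>

// Runs an nnlib kernel and aborts the calling operator, returning `out`,
// if the kernel reports a nonzero (failure) status.
#define XT_KERNEL_CHECK(ctx, out, kernel, ...)                      \
  do {                                                              \
    const int ret__ = kernel(__VA_ARGS__);                          \
    ET_KERNEL_CHECK_MSG(                                            \
        ctx, ret__ == 0, InvalidArgument, out, "Failed: " #kernel); \
  } while (0)
```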
Change done.
@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Added new operators sub, div, exp, permute, slice, mean in backends/cadence/fusion_g3. For cycle reduction, disabled error checks in operators using the macro "OPT_ARG_CHECK".
Pull Request resolved: pytorch#7490
Differential Revision: D67870337
Pulled By: zonglinpeng
```diff
+using exec_aten::Scalar;
+using exec_aten::ScalarType;
+using exec_aten::Tensor;
+using executorch::runtime::canCast;
```
Can we please undo the changes in headers / using declarations?
(I think this is likely due to a bad rebase)
```diff
+using Tensor = exec_aten::Tensor;
+using ScalarType = exec_aten::ScalarType;
+using IntArrayRef = exec_aten::ArrayRef<int64_t>;
+using torch::executor::Error;
+using torch::executor::KernelRuntimeContext;
```
Please undo the namespace and header changes.
…, dequantize, cat, layer norm, softmax to backends/cadence folder. Added operators to backends/cadence folder
Signed-off-by: [email protected] <[email protected]>
…ument checks in the operators using macro
…n to support integer inputs and float output
…ckends/cadence/fusion_g3/operators to solve a compilation error
…in the operators at backends/cadence/fusion_g3/operators folder
Force-pushed from 897987e to b626b7f.
@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
…oc in mean operator
```diff
@@ -461,7 +534,7 @@ void dequantize_impl(
       break;
   switch (input.scalar_type()) {
     ET_FORALL_INT_TYPES(SYM_CALCULATE_INT_TYPE_CHANNEL);
-    SYM_CALCULATE_INT_TYPE_CHANNEL(uint16_t, UInt16);
+    SYM_CALCULATE_INT_TYPE_CHANNEL(uint16_t, Bits16);
```
Is there a reason why it has to be Bits16 instead of UInt16? Bits16 is deprecated on our side.
In ExecuTorch 0.4.0, Bits16 was used, and later versions use both, though both are unsigned shorts. So we replaced UInt16 with Bits16, assuming Bits16 was the latest. We will change this and update the PR.
Thanks!
Replaced Bits16 with UInt16.
```cpp
}

int scratch_size = 1;
for (int i = 0; i < num_inp_dims; i++) {
```
Is the scratch size equal to the number of elements in the input? If so, use `in.numel()`. Can this scratch size be reduced? Inside nnlib, the expected size of this buffer is `(inp_length / inp_shape_max) * sizeof(WORD32)`.
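A minimal sketch of that reduced computation, assuming the `inp_shape`/`num_inp_dims` names from the snippet above and that nnlib's `WORD32` is a 32-bit integer:

```cpp
#include <cstddef>
#include <cstdint>

// Computes the scratch-buffer size nnlib expects:
// (inp_length / inp_shape_max) * sizeof(WORD32),
// rather than one slot per input element.
static size_t reduced_scratch_size(const int* inp_shape, int num_inp_dims) {
  int inp_length = 1;    // total number of input elements
  int inp_shape_max = 1; // largest single dimension
  for (int i = 0; i < num_inp_dims; i++) {
    inp_length *= inp_shape[i];
    if (inp_shape[i] > inp_shape_max) {
      inp_shape_max = inp_shape[i];
    }
  }
  return (size_t)(inp_length / inp_shape_max) * sizeof(int32_t);
}
```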
Reduced the memory size
LGTM, apart from some minor comments.
Feel free to ignore the comments marked "Nit".
```cpp
using Tensor = ::executorch::aten::Tensor;
using ScalarType = ::executorch::aten::ScalarType;
using IntArrayRef = ::executorch::aten::ArrayRef<int64_t>;
```
Please undo this change.
Done
```diff
@@ -32,8 +35,8 @@ template <typename CTYPE>
 void layer_norm(
     const Tensor& input,
     IntArrayRef normalized_shape,
-    const exec_aten::optional<Tensor>& weight,
-    const exec_aten::optional<Tensor>& bias,
+    const ::executorch::aten::optional<Tensor>& weight,
```
Nit: you can add `using ::executorch::aten::optional;` at the top of this file and use `optional` directly.
Done
```cpp
#include <executorch/runtime/kernel/kernel_includes.h>

using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
```
Use `using ::executorch::aten::SizesType;` instead of `using SizesType = ::executorch::aten::SizesType;`.
Done
```cpp
using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
using Tensor = ::executorch::aten::Tensor;
```
Same as SizesType: `using ::executorch::aten::Tensor;`
Done
```cpp
using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
using Tensor = ::executorch::aten::Tensor;
using IntArrayRef = ::executorch::aten::ArrayRef<int64_t>;
```
`using ::executorch::aten::IntArrayRef;`
Done
```cpp
using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::Error;
using torch::executor::RuntimeContext;
```
Nit: use `using ::executorch::runtime::KernelRuntimeContext;` instead.
Done
```diff
@@ -532,7 +575,7 @@ void quantize_impl(
   case ScalarType::in_dtype: \
     switch (out.scalar_type()) { \
       ET_FORALL_INT_TYPES_WITH(CTYPE_IN, SYM_QUANTIZE_IMPL_CHANNEL); \
-      SYM_QUANTIZE_IMPL_CHANNEL(CTYPE_IN, uint16_t, UInt16) \
+      SYM_QUANTIZE_IMPL_CHANNEL(CTYPE_IN, uint16_t, Bits16) \
```
@zonglinpeng should this be UInt16 instead?
Yes, I have a follow-up patch for this as well.
```cpp
if (out.scalar_type() == ScalarType::Int) {
  XT_KERNEL_CHECK(
      ctx,
      out,
      xa_nn_slice,
      out_data,
      out_shape,
      inp_data,
      inp_shape,
      in.dim(),
      (int)start,
      (int)(end - 1),
      (int)step,
      (int)dim,
      sizeof(int));
} else if (out.scalar_type() == ScalarType::Short) {
  XT_KERNEL_CHECK(
      ctx,
      out,
      xa_nn_slice,
      out_data,
      out_shape,
      inp_data,
      inp_shape,
      in.dim(),
      (int)start,
      (int)(end - 1),
      (int)step,
      (int)dim,
      sizeof(short));
} else if (out.scalar_type() == ScalarType::Char) {
  XT_KERNEL_CHECK(
      ctx,
      out,
      xa_nn_slice,
      out_data,
      out_shape,
      inp_data,
      inp_shape,
      in.dim(),
      (int)start,
      (int)(end - 1),
      (int)step,
      (int)dim,
      sizeof(char));
} else if (out.scalar_type() == (ScalarType)Uint) {
  XT_KERNEL_CHECK(
      ctx,
      out,
      xa_nn_slice,
      out_data,
      out_shape,
      inp_data,
      inp_shape,
      in.dim(),
      (int)start,
      (int)(end - 1),
      (int)step,
      (int)dim,
      sizeof(int));
} else if (out.scalar_type() == (ScalarType)Ushort) {
  XT_KERNEL_CHECK(
      ctx,
      out,
      xa_nn_slice,
      out_data,
      out_shape,
      inp_data,
      inp_shape,
      in.dim(),
      (int)start,
      (int)(end - 1),
      (int)step,
      (int)dim,
      sizeof(short));
} else if (out.scalar_type() == ScalarType::Byte) {
  XT_KERNEL_CHECK(
      ctx,
      out,
      xa_nn_slice,
      out_data,
      out_shape,
      inp_data,
      inp_shape,
      in.dim(),
      (int)start,
      (int)(end - 1),
      (int)step,
      (int)dim,
      sizeof(char));
```
These calls to `xa_nn_slice` differ only in the last argument. Let's create a separate function that returns the element size based on `out.scalar_type()` and pass the result to the `xa_nn_slice` call.
Done. A similar change is applied in the cat and permute operators as well.
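For illustration, a hedged sketch of such a helper (the name `get_element_size` and the unsupported-dtype handling are assumptions, not the PR's actual code):

```cpp
#include <executorch/runtime/kernel/kernel_includes.h>

using ::executorch::aten::ScalarType;

namespace {
// Maps the output dtype to the element size that xa_nn_slice copies.
// Returns 0 for dtypes this operator does not support.
int get_element_size(ScalarType type) {
  switch (type) {
    case ScalarType::Int:
      return sizeof(int);
    case ScalarType::Short:
      return sizeof(short);
    case ScalarType::Char:
    case ScalarType::Byte:
      return sizeof(char);
    default:
      return 0;
  }
}
} // namespace

// The six branches above then collapse to a single call:
// XT_KERNEL_CHECK(
//     ctx, out, xa_nn_slice,
//     out_data, out_shape, inp_data, inp_shape, in.dim(),
//     (int)start, (int)(end - 1), (int)step, (int)dim,
//     get_element_size(out.scalar_type()));
```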
Everything else looks good to me besides the unresolved comments/cleanup above. As soon as those are resolved I will trigger the merge. @ckmadhira thanks!
…and dequantize operators. Reduced the size of scratch memory in mean operator
LGTM! Thanks for addressing the comments.
@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Differential Revision: D67870337
Pull Request resolved: #7490
Summary
Added new operators sub, div, exp, permute, slice, mean in backends/cadence/fusion_g3.
For cycle reduction, disabled error checks in operators using the macro "OPT_ARG_CHECK".
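A hedged sketch of how such a compile-time toggle can work (the guard name CADENCE_ENABLE_ARG_CHECKS and the macro arguments are assumptions, not the PR's actual definition):

```cpp
#include <executorch/runtime/kernel/kernel_includes.h>

#ifdef CADENCE_ENABLE_ARG_CHECKS
// Validation builds keep the full argument checks.
#define OPT_ARG_CHECK(ctx, cond, err, retval) \
  ET_KERNEL_CHECK(ctx, cond, err, retval)
#else
// Release builds compile the checks away to save DSP cycles.
#define OPT_ARG_CHECK(ctx, cond, err, retval) ((void)0)
#endif
```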