Cadence fusiong3 operators m2 #7490

Merged

Conversation

ckmadhira
Contributor

@ckmadhira ckmadhira commented Jan 3, 2025

Summary

Added new operators sub, div, exp, permute, slice, and mean in backends/cadence/fusion_g3
To reduce cycle counts, disabled error checks in operators using the "OPT_ARG_CHECK" macro


pytorch-bot bot commented Jan 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7490

❌ 1 New Failure

As of commit a0bdd97 with merge base a29dc49 (image):

NEW FAILURE - The following job has failed:

  • pull / unittest-arm / linux-job (gh)
    RuntimeError: Command docker exec -t 4c981cca6bb6235830bc128f4b9daf20a60211052f039f89622c6d860e12bd39 /exec failed with exit code 127

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 3, 2025
@ckmadhira ckmadhira force-pushed the cadence_fusiong3_operators_M2 branch from bc5b6f0 to c77741a Compare January 3, 2025 11:03
Comment on lines 19 to 26
using ::executorch::aten::Scalar;
using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::canCast;
using ::executorch::runtime::Error;
using ::executorch::runtime::KernelRuntimeContext;
using exec_aten::Scalar;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using executorch::runtime::canCast;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
Contributor

I think we need to retain the full ones, I'll let @hsharma35 confirm

Contributor Author

As part of the M1 release, we used "exec_aten::Scalar", similar to what is available in HiFi. Later, I think, Zonglin changed this in his PR. We tried to retain the original form. @hsharma35 and @zonglinpeng, please let me know the correct namespace to use.

Contributor

Nit: please use the latest namespaces/headers, similar to op_add.cpp in master.
The headers/namespaces there follow the ExecuTorch C++ style guide.

Contributor Author

@hsharma35 - Updated all the operators similar to op_add.cpp in master

Comment on lines 16 to 19
using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::Error;
using ::executorch::runtime::KernelRuntimeContext;
using exec_aten::Scalar;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
Contributor

same here

Comment on lines 18 to 21
using ::executorch::aten::Scalar;
using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::Error;
using ::executorch::runtime::KernelRuntimeContext;
using exec_aten::Scalar;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
Contributor

won't put it on the other files, but same here and all new files I guess

@@ -30,6 +29,15 @@ namespace impl {
namespace G3 {
namespace native {

#define XT_KERNEL_CHECK(ctx, out, kernel, ...) \
Contributor

Why is this defined so many times? We should at least be able to move it to a shared location.

Contributor

This PR creates a separate header for the XT_KERNEL_CHECK macro. Maybe rebase on top of this and include the common header instead? We can cleanup in a different PR too.
#7516

Contributor Author

Change done.
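For context, a kernel-check macro of this shape wraps an nnlib call and bails out when the kernel reports an error; per the hunk above, the real `XT_KERNEL_CHECK(ctx, out, kernel, ...)` takes a context and output tensor and now lives in the shared header from #7516. A minimal standalone sketch — the status variable, the fake kernel, and the bail-out action are assumptions for illustration, not the real implementation:

```cpp
// Minimal standalone sketch of a kernel-check macro (not the real
// XT_KERNEL_CHECK): call the kernel, record its status, and bail out of
// the enclosing op on a nonzero return.
#define SKETCH_KERNEL_CHECK(status_var, kernel, ...) \
  do {                                               \
    (status_var) = (kernel)(__VA_ARGS__);            \
    if ((status_var) != 0) {                         \
      return;                                        \
    }                                                \
  } while (0)

// Stand-in "nnlib kernel": copies n ints, fails (nonzero) when n <= 0.
int fake_kernel(int* out, const int* in, int n) {
  if (n <= 0) return -1;
  for (int i = 0; i < n; ++i) out[i] = in[i];
  return 0;
}

int g_status = 0;  // stands in for the error recorded via ctx in the real macro

void run_op(int* out, const int* in, int n) {
  SKETCH_KERNEL_CHECK(g_status, fake_kernel, out, in, n);
  // further kernel calls would follow here; skipped if the check bailed out
}
```

Centralizing the macro in one header keeps every operator's error handling identical and avoids the copy-paste definitions flagged above.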

@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Jan 6, 2025
Summary:
Added new operators sub, div, exp, permute, slice, mean in backends/cadence/fusion_g3
For cycle reduction, disabled error checks in operators using macro "OPT_ARG_CHECK"

Pull Request resolved: pytorch#7490

Differential Revision: D67870337

Pulled By: zonglinpeng
using exec_aten::Scalar;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using executorch::runtime::canCast;
Contributor

Can we please undo the changes in headers / using declarations?
(I think this is likely due to a bad rebase)

@@ -30,6 +29,15 @@ namespace impl {
namespace G3 {
namespace native {

#define XT_KERNEL_CHECK(ctx, out, kernel, ...) \
Contributor

This PR creates a separate header for the XT_KERNEL_CHECK macro. Maybe rebase on top of this and include the common header instead? We can cleanup in a different PR too.
#7516

Comment on lines 17 to 21
using Tensor = exec_aten::Tensor;
using ScalarType = exec_aten::ScalarType;
using IntArrayRef = exec_aten::ArrayRef<int64_t>;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
Contributor

Please undo the namespace and header changes.

zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Jan 7, 2025
Summary:
Added new operators sub, div, exp, permute, slice, mean in backends/cadence/fusion_g3
For cycle reduction, disabled error checks in operators using macro "OPT_ARG_CHECK"

Pull Request resolved: pytorch#7490

Differential Revision: D67870337

Pulled By: zonglinpeng
@ckmadhira ckmadhira force-pushed the cadence_fusiong3_operators_M2 branch from 897987e to b626b7f Compare January 7, 2025 06:41
@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -461,7 +534,7 @@ void dequantize_impl(
break;
switch (input.scalar_type()) {
ET_FORALL_INT_TYPES(SYM_CALCULATE_INT_TYPE_CHANNEL);
SYM_CALCULATE_INT_TYPE_CHANNEL(uint16_t, UInt16);
SYM_CALCULATE_INT_TYPE_CHANNEL(uint16_t, Bits16);
Contributor

Is there a reason why it has to be Bits16 instead of UInt16? Bits16 is deprecated on ourside

Contributor Author

In ExecuTorch 0.4.0, Bits16 was used, and in later versions both are present, though both are unsigned shorts. So we replaced UInt16 with Bits16, assuming Bits16 was the latest. We will change this and update the PR.

Contributor

@zonglinpeng zonglinpeng Jan 10, 2025

Thanks!

Contributor Author

Updated Bits16 with UInt16
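The `ET_FORALL_INT_TYPES(SYM_CALCULATE_INT_TYPE_CHANNEL)` pattern in the hunk above is an X-macro: one macro lists the (ctype, name) pairs and a second expands each pair into a switch case, with the extra `UInt16` line covering a dtype outside the list. A standalone sketch of that pattern with stand-in types and a stand-in case body (the real ExecuTorch macros and type list differ):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in dtype enum; ExecuTorch's ScalarType has many more entries.
enum class DType { Char, Short, Int, UInt16 };

// X-macro: list the (ctype, name) pairs once...
#define FORALL_INT_TYPES(X) \
  X(int8_t, Char)           \
  X(int16_t, Short)         \
  X(int32_t, Int)

// ...then expand each pair into a switch case.
std::size_t dtype_size(DType t) {
  switch (t) {
#define SIZE_CASE(ctype, name) \
  case DType::name:            \
    return sizeof(ctype);
    FORALL_INT_TYPES(SIZE_CASE)
    // A dtype outside the "forall" list gets its own explicit expansion,
    // mirroring the extra SYM_CALCULATE_INT_TYPE_CHANNEL(uint16_t, UInt16).
    SIZE_CASE(uint16_t, UInt16)
#undef SIZE_CASE
  }
  return 0;
}
```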

}

int scratch_size = 1;
for (int i = 0; i < num_inp_dims; i++) {
Contributor

Is the scratch size equal to the number of elements in the input? If so, use in.numel().
Can this scratch size be reduced?
Inside nnlib, the size expected of this buffer is ((inp_length / inp_shape_max) * sizeof(WORD32)).

Contributor Author

Reduced the memory size
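The buffer size the reviewer quotes, `((inp_length / inp_shape_max) * sizeof(WORD32))`, can be computed directly from the input shape. A hedged sketch, with plain `int` standing in for nnlib's `WORD32` and the function name invented for illustration:

```cpp
#include <cstddef>

// Hypothetical helper computing the nnlib scratch requirement quoted in the
// review: ((inp_length / inp_shape_max) * sizeof(WORD32)), i.e. the total
// element count divided by the largest dimension, in 32-bit words.
std::size_t mean_scratch_bytes(const int* inp_shape, int num_dims) {
  int inp_length = 1;     // total number of input elements
  int inp_shape_max = 1;  // largest single dimension
  for (int i = 0; i < num_dims; ++i) {
    inp_length *= inp_shape[i];
    if (inp_shape[i] > inp_shape_max) inp_shape_max = inp_shape[i];
  }
  return static_cast<std::size_t>(inp_length / inp_shape_max) * sizeof(int);
}
```

For a 2x8x4 input this gives (64 / 8) * 4 = 32 bytes, versus 64 elements if the scratch were sized to `in.numel()`.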

Contributor

@hsharma35 hsharma35 left a comment

LGTM, apart from some minor comments.
Feel free to ignore the comments with "Nit"

Comment on lines 21 to 23
using Tensor = ::executorch::aten::Tensor;
using ScalarType = ::executorch::aten::ScalarType;
using IntArrayRef = ::executorch::aten::ArrayRef<int64_t>;
Contributor

Please undo this change.

Contributor Author

Done

@@ -32,8 +35,8 @@ template <typename CTYPE>
void layer_norm(
const Tensor& input,
IntArrayRef normalized_shape,
const exec_aten::optional<Tensor>& weight,
const exec_aten::optional<Tensor>& bias,
const ::executorch::aten::optional<Tensor>& weight,
Contributor

Nit: you can add using ::executorch::aten::optional at the top of this file and use optional directly.

Contributor Author

Done

#include <executorch/runtime/kernel/kernel_includes.h>

using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
Contributor

use ::executorch::aten::SizesType instead of using SizesType = ::executorch::aten::SizesType.

Contributor Author

Done


using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
using Tensor = ::executorch::aten::Tensor;
Contributor

Same as SizesType
using ::executorch::aten::Tensor;

Contributor Author

Done

using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
using Tensor = ::executorch::aten::Tensor;
using IntArrayRef = ::executorch::aten::ArrayRef<int64_t>;
Contributor

using ::executorch::aten::IntArrayRef;

Contributor Author

Done

using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::Error;
using torch::executor::RuntimeContext;
Contributor

Nit: using ::executorch::runtime::KernelRuntimeContext; instead

Contributor Author

Done

@@ -532,7 +575,7 @@ void quantize_impl(
case ScalarType::in_dtype: \
switch (out.scalar_type()) { \
ET_FORALL_INT_TYPES_WITH(CTYPE_IN, SYM_QUANTIZE_IMPL_CHANNEL); \
SYM_QUANTIZE_IMPL_CHANNEL(CTYPE_IN, uint16_t, UInt16) \
SYM_QUANTIZE_IMPL_CHANNEL(CTYPE_IN, uint16_t, Bits16) \
Contributor

@zonglinpeng should this be UInt16 instead?

Contributor

@zonglinpeng zonglinpeng Jan 10, 2025

yes, I have a follow up patch for this as well

Comment on lines 108 to 198
if (out.scalar_type() == ScalarType::Int) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(int));
} else if (out.scalar_type() == ScalarType::Short) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(short));
} else if (out.scalar_type() == ScalarType::Char) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(char));

} else if (out.scalar_type() == (ScalarType)Uint) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(int));
} else if (out.scalar_type() == (ScalarType)Ushort) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(short));
} else if (out.scalar_type() == ScalarType::Byte) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(char));
Contributor

These calls to xa_nn_slice differ only in the last argument. Let's create a separate function to get size based on out.scalar_type() and pass the result to xa_nn_slice call.

Contributor Author

Done. A similar change is applied in the cat and permute operators as well.
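The suggested refactor can be sketched as a small dtype-to-byte-size helper; the six `xa_nn_slice` branches then collapse into a single call that passes the helper's result as the last argument. `ScalarType` below is a stand-in enum covering only the dtypes handled above, not ExecuTorch's real one, and the helper name is invented for illustration:

```cpp
#include <cstddef>

// Stand-in for ExecuTorch's ScalarType; only the dtypes the slice op
// dispatches on are listed (Uint/Ushort stand in for the casted values).
enum class ScalarType { Byte, Char, Short, UInt16, Int, UInt32 };

// Map the output dtype to its element size once, instead of duplicating
// the whole xa_nn_slice call per dtype branch.
std::size_t slice_element_size(ScalarType t) {
  switch (t) {
    case ScalarType::Byte:
    case ScalarType::Char:
      return sizeof(char);
    case ScalarType::Short:
    case ScalarType::UInt16:
      return sizeof(short);
    case ScalarType::Int:
    case ScalarType::UInt32:
      return sizeof(int);
  }
  return 0;
}
```

With such a helper, the op body would make one `XT_KERNEL_CHECK(ctx, out, xa_nn_slice, ..., slice_element_size(out.scalar_type()))` call instead of six near-identical branches.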

@zonglinpeng
Contributor

Everything else looks good to me besides the unresolved comments/cleanup above. As soon as those are resolved, I will trigger the merge. @ckmadhira thanks!

…and dequantize operators. Reduced the size of scratch memory in mean operator
Contributor

@hsharma35 hsharma35 left a comment

LGTM! Thanks for addressing the comments.

@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot facebook-github-bot merged commit 94d83ad into pytorch:main Jan 10, 2025
44 of 46 checks passed
YIWENX14 pushed a commit that referenced this pull request Jan 28, 2025
Differential Revision: D67870337

Pull Request resolved: #7490
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: not user facing