Cadence fusiong3 operators m2 #7490

Merged

Conversation

ckmadhira
Contributor

@ckmadhira ckmadhira commented Jan 3, 2025

Summary

Added new operators sub, div, exp, permute, slice, and mean in backends/cadence/fusion_g3
To reduce cycle counts, disabled error checks in operators using the "OPT_ARG_CHECK" macro


pytorch-bot bot commented Jan 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7490

❌ 1 New Failure

As of commit a0bdd97 with merge base a29dc49 (image):

NEW FAILURE - The following job has failed:

  • pull / unittest-arm / linux-job (gh)
    RuntimeError: Command docker exec -t 4c981cca6bb6235830bc128f4b9daf20a60211052f039f89622c6d860e12bd39 /exec failed with exit code 127

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 3, 2025
@ckmadhira ckmadhira force-pushed the cadence_fusiong3_operators_M2 branch from bc5b6f0 to c77741a Compare January 3, 2025 11:03
Comment on lines 19 to 26
using ::executorch::aten::Scalar;
using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::canCast;
using ::executorch::runtime::Error;
using ::executorch::runtime::KernelRuntimeContext;
using exec_aten::Scalar;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using executorch::runtime::canCast;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
Contributor

I think we need to retain the full ones, I'll let @hsharma35 confirm

Contributor Author

As part of the M1 release, we used "exec_aten::Scalar", similar to what is available in HiFi. Later, I think, Zonglin changed this in his PR. We tried to retain the original form. @hsharma35 and @zonglinpeng, please let me know the correct namespace to use.

Contributor

Nit: please use the latest namespaces/headers, similar to op_add.cpp in master.
The headers/namespaces there follow the ExecuTorch C++ style guide.

Contributor Author

@hsharma35 - Updated all the operators similar to op_add.cpp in master

Comment on lines 16 to 19
using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::Error;
using ::executorch::runtime::KernelRuntimeContext;
using exec_aten::Scalar;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
Contributor

same here

Comment on lines 18 to 21
using ::executorch::aten::Scalar;
using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::Error;
using ::executorch::runtime::KernelRuntimeContext;
using exec_aten::Scalar;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
Contributor

won't put it on the other files, but same here and all new files I guess

@@ -30,6 +29,15 @@ namespace impl {
namespace G3 {
namespace native {

#define XT_KERNEL_CHECK(ctx, out, kernel, ...) \
Contributor

Why is this defined so many times? We should at least be able to move it to a shared location.

Contributor

This PR creates a separate header for the XT_KERNEL_CHECK macro. Maybe rebase on top of this and include the common header instead? We can cleanup in a different PR too.
#7516

Contributor Author

Change done.
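For context, a kernel-check macro of this shape wraps an nnlib call and bails out when the kernel reports an error; per the hunk above, the real `XT_KERNEL_CHECK(ctx, out, kernel, ...)` takes a context and output tensor and now lives in the shared header from #7516. A minimal standalone sketch — the status variable, the fake kernel, and the bail-out action are assumptions for illustration, not the real implementation:

```cpp
// Minimal standalone sketch of a kernel-check macro (not the real
// XT_KERNEL_CHECK): call the kernel, record its status, and bail out of
// the enclosing op on a nonzero return.
#define SKETCH_KERNEL_CHECK(status_var, kernel, ...) \
  do {                                               \
    (status_var) = (kernel)(__VA_ARGS__);            \
    if ((status_var) != 0) {                         \
      return;                                        \
    }                                                \
  } while (0)

// Stand-in "nnlib kernel": copies n ints, fails (nonzero) when n <= 0.
int fake_kernel(int* out, const int* in, int n) {
  if (n <= 0) return -1;
  for (int i = 0; i < n; ++i) out[i] = in[i];
  return 0;
}

int g_status = 0;  // stands in for the error recorded via ctx in the real macro

void run_op(int* out, const int* in, int n) {
  SKETCH_KERNEL_CHECK(g_status, fake_kernel, out, in, n);
  // further kernel calls would follow here; skipped if the check bailed out
}
```

Centralizing the macro in one header keeps every operator's error handling identical and avoids the copy-paste definitions flagged above.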

@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Jan 6, 2025
Summary:
Added new operators sub, div, exp, permute, slice, mean in backends/cadence/fusion_g3
For cycle reduction, disabled error checks in operators using macro "OPT_ARG_CHECK"

Pull Request resolved: pytorch#7490

Differential Revision: D67870337

Pulled By: zonglinpeng
using exec_aten::Scalar;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using executorch::runtime::canCast;
Contributor

Can we please undo the changes in headers / using declarations?
(I think this is likely due to a bad rebase)

@@ -30,6 +29,15 @@ namespace impl {
namespace G3 {
namespace native {

#define XT_KERNEL_CHECK(ctx, out, kernel, ...) \
Contributor

This PR creates a separate header for the XT_KERNEL_CHECK macro. Maybe rebase on top of this and include the common header instead? We can cleanup in a different PR too.
#7516

Comment on lines 17 to 21
using Tensor = exec_aten::Tensor;
using ScalarType = exec_aten::ScalarType;
using IntArrayRef = exec_aten::ArrayRef<int64_t>;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
Contributor

Please undo the namespace and header changes.

zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Jan 7, 2025
Summary:
Added new operators sub, div, exp, permute, slice, mean in backends/cadence/fusion_g3
For cycle reduction, disabled error checks in operators using macro "OPT_ARG_CHECK"

Pull Request resolved: pytorch#7490

Differential Revision: D67870337

Pulled By: zonglinpeng
@ckmadhira ckmadhira force-pushed the cadence_fusiong3_operators_M2 branch from 897987e to b626b7f Compare January 7, 2025 06:41
@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -461,7 +534,7 @@ void dequantize_impl(
break;
switch (input.scalar_type()) {
ET_FORALL_INT_TYPES(SYM_CALCULATE_INT_TYPE_CHANNEL);
SYM_CALCULATE_INT_TYPE_CHANNEL(uint16_t, UInt16);
SYM_CALCULATE_INT_TYPE_CHANNEL(uint16_t, Bits16);
Contributor

Is there a reason why it has to be Bits16 instead of UInt16? Bits16 is deprecated on ourside

Contributor Author

In ExecuTorch 0.4.0, Bits16 was used, and in later versions both are present, though both are unsigned shorts. So we replaced UInt16 with Bits16, assuming Bits16 was the latest. We will change this and update the PR.

Contributor

@zonglinpeng zonglinpeng Jan 10, 2025

Thanks!

Contributor Author

Updated Bits16 with UInt16
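The `ET_FORALL_INT_TYPES(SYM_CALCULATE_INT_TYPE_CHANNEL)` pattern in the hunk above is an X-macro: one macro lists the (ctype, name) pairs and a second expands each pair into a switch case, with the extra `UInt16` line covering a dtype outside the list. A standalone sketch of that pattern with stand-in types and a stand-in case body (the real ExecuTorch macros and type list differ):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in dtype enum; ExecuTorch's ScalarType has many more entries.
enum class DType { Char, Short, Int, UInt16 };

// X-macro: list the (ctype, name) pairs once...
#define FORALL_INT_TYPES(X) \
  X(int8_t, Char)           \
  X(int16_t, Short)         \
  X(int32_t, Int)

// ...then expand each pair into a switch case.
std::size_t dtype_size(DType t) {
  switch (t) {
#define SIZE_CASE(ctype, name) \
  case DType::name:            \
    return sizeof(ctype);
    FORALL_INT_TYPES(SIZE_CASE)
    // A dtype outside the "forall" list gets its own explicit expansion,
    // mirroring the extra SYM_CALCULATE_INT_TYPE_CHANNEL(uint16_t, UInt16).
    SIZE_CASE(uint16_t, UInt16)
#undef SIZE_CASE
  }
  return 0;
}
```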

}

int scratch_size = 1;
for (int i = 0; i < num_inp_dims; i++) {
Contributor

Is the scratch size equal to the number of elements in the input? If so, use in.numel().
Can this scratch size be reduced?
Inside nnlib, the size expected of this buffer is ((inp_length / inp_shape_max) * sizeof(WORD32)).

Contributor Author

Reduced the memory size
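The buffer size the reviewer quotes, `((inp_length / inp_shape_max) * sizeof(WORD32))`, can be computed directly from the input shape. A hedged sketch, with plain `int` standing in for nnlib's `WORD32` and the function name invented for illustration:

```cpp
#include <cstddef>

// Hypothetical helper computing the nnlib scratch requirement quoted in the
// review: ((inp_length / inp_shape_max) * sizeof(WORD32)), i.e. the total
// element count divided by the largest dimension, in 32-bit words.
std::size_t mean_scratch_bytes(const int* inp_shape, int num_dims) {
  int inp_length = 1;     // total number of input elements
  int inp_shape_max = 1;  // largest single dimension
  for (int i = 0; i < num_dims; ++i) {
    inp_length *= inp_shape[i];
    if (inp_shape[i] > inp_shape_max) inp_shape_max = inp_shape[i];
  }
  return static_cast<std::size_t>(inp_length / inp_shape_max) * sizeof(int);
}
```

For a 2x8x4 input this gives (64 / 8) * 4 = 32 bytes, versus 64 elements if the scratch were sized to `in.numel()`.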

Contributor

@hsharma35 hsharma35 left a comment

LGTM, apart from some minor comments.
Feel free to ignore the comments with "Nit"

Comment on lines 21 to 23
using Tensor = ::executorch::aten::Tensor;
using ScalarType = ::executorch::aten::ScalarType;
using IntArrayRef = ::executorch::aten::ArrayRef<int64_t>;
Contributor

Please undo this change.

Contributor Author

Done

@@ -32,8 +35,8 @@ template <typename CTYPE>
void layer_norm(
const Tensor& input,
IntArrayRef normalized_shape,
const exec_aten::optional<Tensor>& weight,
const exec_aten::optional<Tensor>& bias,
const ::executorch::aten::optional<Tensor>& weight,
Contributor

Nit: you can add using ::executorch::aten::optional at the top of this file and use optional directly.

Contributor Author

Done

#include <executorch/runtime/kernel/kernel_includes.h>

using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
Contributor

use ::executorch::aten::SizesType instead of using SizesType = ::executorch::aten::SizesType.

Contributor Author

Done


using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
using Tensor = ::executorch::aten::Tensor;
Contributor

Same as SizesType
using ::executorch::aten::Tensor;

Contributor Author

Done

using ::executorch::runtime::KernelRuntimeContext;
using SizesType = ::executorch::aten::SizesType;
using Tensor = ::executorch::aten::Tensor;
using IntArrayRef = ::executorch::aten::ArrayRef<int64_t>;
Contributor

using ::executorch::aten::IntArrayRef;

Contributor Author

Done

using ::executorch::aten::ScalarType;
using ::executorch::aten::Tensor;
using ::executorch::runtime::Error;
using torch::executor::RuntimeContext;
Contributor

Nit: using ::executorch::runtime::KernelRuntimeContext; instead

Contributor Author

Done

@@ -532,7 +575,7 @@ void quantize_impl(
case ScalarType::in_dtype: \
switch (out.scalar_type()) { \
ET_FORALL_INT_TYPES_WITH(CTYPE_IN, SYM_QUANTIZE_IMPL_CHANNEL); \
SYM_QUANTIZE_IMPL_CHANNEL(CTYPE_IN, uint16_t, UInt16) \
SYM_QUANTIZE_IMPL_CHANNEL(CTYPE_IN, uint16_t, Bits16) \
Contributor

@zonglinpeng should this be UInt16 instead?

Contributor

@zonglinpeng zonglinpeng Jan 10, 2025

yes, I have a follow up patch for this as well

Comment on lines 108 to 198
if (out.scalar_type() == ScalarType::Int) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(int));
} else if (out.scalar_type() == ScalarType::Short) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(short));
} else if (out.scalar_type() == ScalarType::Char) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(char));

} else if (out.scalar_type() == (ScalarType)Uint) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(int));
} else if (out.scalar_type() == (ScalarType)Ushort) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(short));
} else if (out.scalar_type() == ScalarType::Byte) {
XT_KERNEL_CHECK(
ctx,
out,
xa_nn_slice,
out_data,
out_shape,
inp_data,
inp_shape,
in.dim(),
(int)start,
(int)(end - 1),
(int)step,
(int)dim,
sizeof(char));
Contributor

These calls to xa_nn_slice differ only in the last argument. Let's create a separate function to get size based on out.scalar_type() and pass the result to xa_nn_slice call.

Contributor Author

Done. A similar change is applied in the cat and permute operators as well.
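The suggested refactor can be sketched as a small dtype-to-byte-size helper; the six `xa_nn_slice` branches then collapse into a single call that passes the helper's result as the last argument. `ScalarType` below is a stand-in enum covering only the dtypes handled above, not ExecuTorch's real one, and the helper name is invented for illustration:

```cpp
#include <cstddef>

// Stand-in for ExecuTorch's ScalarType; only the dtypes the slice op
// dispatches on are listed (Uint/Ushort stand in for the casted values).
enum class ScalarType { Byte, Char, Short, UInt16, Int, UInt32 };

// Map the output dtype to its element size once, instead of duplicating
// the whole xa_nn_slice call per dtype branch.
std::size_t slice_element_size(ScalarType t) {
  switch (t) {
    case ScalarType::Byte:
    case ScalarType::Char:
      return sizeof(char);
    case ScalarType::Short:
    case ScalarType::UInt16:
      return sizeof(short);
    case ScalarType::Int:
    case ScalarType::UInt32:
      return sizeof(int);
  }
  return 0;
}
```

With such a helper, the op body would make one `XT_KERNEL_CHECK(ctx, out, xa_nn_slice, ..., slice_element_size(out.scalar_type()))` call instead of six near-identical branches.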

@zonglinpeng
Contributor

Everything else looks good to me besides the unresolved comments/cleanup above. As soon as those are resolved, I will trigger the merge. @ckmadhira thanks!

…and dequantize operators. Reduced the size of scratch memory in mean operator
Contributor

@hsharma35 hsharma35 left a comment

LGTM! Thanks for addressing the comments.

@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot facebook-github-bot merged commit 94d83ad into pytorch:main Jan 10, 2025
44 of 46 checks passed
YIWENX14 pushed a commit that referenced this pull request Jan 28, 2025
Differential Revision: D67870337

Pull Request resolved: #7490
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: not user facing