[executorch] Populate cadence cpu ops #7165


Merged: 9 commits merged into main on Dec 4, 2024

Conversation

@zonglinpeng (Contributor) commented Dec 3, 2024

Summary

Quantized ops in the CPU flow are not fully migrated; this PR adds all of them.

Custom ops

  • quantized_linear_per_tensor_out
  • im2row_out
  • quantized_convolution_per_tensor_out

Native ops

  • op: transpose_copy.int_out
  • op: eq.Scalar_out
  • op: logical_not.out
  • op: any.out
  • op: native_group_norm.out
  • op: sum.IntList_out
  • op: select_copy.int_out
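
For orientation, ExecuTorch out-variant kernels follow a common shape: a kernel runtime context first, the inputs next, and the pre-allocated output tensor last. Below is a minimal sketch of what the quantized_linear_per_tensor_out entry point might look like; the parameter names and quantization arguments are illustrative assumptions, not the actual signature.

// Sketch only: the parameter list is assumed, not copied from the source.
#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/kernel/kernel_runtime_context.h>

using ::executorch::aten::Tensor;
using ::executorch::runtime::KernelRuntimeContext;

void quantized_linear_per_tensor_out(
    KernelRuntimeContext& ctx,
    const Tensor& input,        // quantized activations
    const Tensor& weight,       // quantized weights
    const Tensor& bias,         // bias accumulated in int32
    int64_t input_zero_point,   // per-tensor quantization parameters
    int64_t weight_zero_point,  // (names assumed for illustration)
    int64_t out_multiplier,
    int64_t out_shift,
    int64_t out_zero_point,
    Tensor& out);               // pre-allocated output, written in place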

Test plan

python3 -m examples.cadence.operators.quantized_linear_op
python3 -m examples.cadence.models.babyllama

pytorch-bot (bot) commented Dec 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7165


❌ 1 New Failure

As of commit 5e5ca52 with merge base de70b9b:

NEW FAILURE - The following job has failed:

  • pull / unittest-arm / linux-job (gh)
    RuntimeError: Command docker exec -t 95ab5480bc4b736819d72a7454e872ecaa71dad3ee9a1b0f67f2ee4470f8c85f /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Dec 3, 2024
facebook-github-bot (Contributor) commented:
@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mcremon-meta (Contributor) left a comment:

LGTM with the extra macro removed. Thanks!

_(uint8_t, Byte) \
_(int8_t, Char)
Review comment:

general comment, we should be able to add int16_t and uint16_t already, but let's ignore for now
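
For reference, a sketch of how that type list could be extended; the macro name and the 16-bit ScalarType entries are assumptions, to be verified against the ScalarType enum in this ExecuTorch version:

// Hypothetical extension; the macro name and the UInt16 entry are assumed.
#define ET_FORALL_QUANTIZED_TYPES(_) \
  _(uint8_t, Byte)                   \
  _(int8_t, Char)                    \
  _(int16_t, Short)                  \
  _(uint16_t, UInt16)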

Comment on lines 10 to 51
// Generate kernels that perform elementwise arithmetic on two quantized
// tensors. The tensors are either the same size, or the second tensor is a
// scalar.
#define DECLARE_POINTWISE_TENSOR_QUANTIZED_BINARY_OP(BINARY_FUNC_NAME, OP)    \
  template <typename T>                                                       \
  void BINARY_FUNC_NAME(                                                      \
      const ::executorch::aten::Tensor& X,                                    \
      float X_scale,                                                          \
      int32_t X_zero_point,                                                   \
      const ::executorch::aten::Tensor& Y,                                    \
      float Y_scale,                                                          \
      int32_t Y_zero_point,                                                   \
      float out_scale,                                                        \
      int32_t out_zero_point,                                                 \
      ::executorch::aten::Tensor& out) {                                      \
    const T* __restrict__ X_data = X.const_data_ptr<T>();                     \
    const T* __restrict__ Y_data = Y.const_data_ptr<T>();                     \
    T* __restrict__ out_data = out.mutable_data_ptr<T>();                     \
    size_t Y_numel = Y.numel();                                               \
    size_t X_numel = X.numel();                                               \
    float inv_out_scale = 1.0f / out_scale;                                   \
    /* Y has the same number of elements as X */                              \
    if (Y_numel == X_numel) {                                                 \
      for (size_t i = 0; i < X_numel; ++i) {                                  \
        float x = kernels::dequantize<T>(X_data[i], X_scale, X_zero_point);   \
        float y = kernels::dequantize<T>(Y_data[i], Y_scale, Y_zero_point);   \
        float z = x OP y;                                                     \
        out_data[i] = kernels::quantize<T>(z, inv_out_scale, out_zero_point); \
      }                                                                       \
    } /* Y is a scalar tensor */                                              \
    else if (Y_numel == 1) {                                                  \
      float y = kernels::dequantize<T>(Y_data[0], Y_scale, Y_zero_point);     \
      for (size_t i = 0; i < X_numel; ++i) {                                  \
        float x = kernels::dequantize<T>(X_data[i], X_scale, X_zero_point);   \
        float z = x OP y;                                                     \
        out_data[i] = kernels::quantize<T>(z, inv_out_scale, out_zero_point); \
      }                                                                       \
    } /* other broadcasting cases */                                          \
    else {                                                                    \
      ET_DCHECK_MSG(false, "Unsupported broadcasting");                       \
    }                                                                         \
  }
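
The kernels::dequantize and kernels::quantize helpers used above implement standard affine (de)quantization. A sketch of their likely semantics, assuming round-to-nearest and omitting the clamping a real kernel would need:

#include <cmath>
#include <cstdint>

// x = scale * (q - zero_point): map the integer code back to a real value.
template <typename T>
float dequantize(T q, float scale, int32_t zero_point) {
  return scale * (static_cast<int32_t>(q) - zero_point);
}

// Inverse map; the macro precomputes inv_out_scale = 1 / out_scale, so the
// hot loop only multiplies.
template <typename T>
T quantize(float x, float inv_scale, int32_t zero_point) {
  return static_cast<T>(
      static_cast<int32_t>(std::nearbyint(x * inv_scale)) + zero_point);
}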
Review comment:

this macro is not used. It's only for quantized_add, and will need to be removed anyway with PT2 quant
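
For context, each instantiation of the macro expands into a dequantize → OP → requantize loop. A hypothetical instantiation for the quantized_add case named above (the function name is illustrative):

// Illustrative only: the single consumer the review comment mentions.
DECLARE_POINTWISE_TENSOR_QUANTIZED_BINARY_OP(quantized_add_impl, +)

// Would then be dispatched per dtype, e.g.:
//   quantized_add_impl<uint8_t>(
//       X, X_scale, X_zero_point, Y, Y_scale, Y_zero_point,
//       out_scale, out_zero_point, out);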

facebook-github-bot pushed a commit that referenced this pull request Dec 4, 2024
Summary:
Quantized ops in the CPU flow are not fully migrated; this PR adds them all.
- quantized_linear_per_tensor_out

Test Plan:
python3 -m examples.cadence.operators.quantized_linear_op

Reviewed By: hsharma35, mcremon-meta

Differential Revision: D66726864

Pulled By: zonglinpeng
facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D66726864


hietalajulius and others added 2 commits December 4, 2024 10:59
Differential Revision: D66644092

Pull Request resolved: #7134
Summary: populate cadence cpu ops

Differential Revision: D66726864
@mcremon-meta (Contributor) left a comment:

not sure why there are so many changes, maybe a linter error?

@@ -74,6 +74,7 @@ exclude_patterns = [
# NB: Objective-C is not supported
'examples/apple/**',
'examples/demo-apps/apple_ios/**',
'examples/demo-apps/react-native/rnllama/ios/**',
Review comment:

lint change I guess?

@zonglinpeng (Author) replied:

caused by rebasing to main

* LICENSE file in the root directory of this source tree.
*/

#include <executorch/backends/cadence/reference/kernels/kernels.h>
Review comment:

can we put the convolution.cpp stuff in this one instead, and remove the non-quant version? It should always be quantized anyway, and it will be less confusing


zonglinpeng merged commit fd33294 into main on Dec 4, 2024 (40 of 42 checks passed)
zonglinpeng pushed commits referencing this pull request on Dec 4, 2024 (Differential Revision: D66726864)
zonglinpeng deleted the populate-cadence-cpu-ops branch on Dec 4, 2024 at 21:33
zonglinpeng added two commits that referenced this pull request on Dec 5, 2024
zonglinpeng restored the populate-cadence-cpu-ops branch on Dec 5, 2024 at 17:01
Labels: CLA Signed, fb-exported, topic: not user facing
4 participants