[executorch] Populate cadence cpu ops #7165


Merged: 9 commits merged into main on Dec 4, 2024

Conversation

@zonglinpeng (Contributor) commented Dec 3, 2024

Summary

Quantized ops in the CPU flow are not fully migrated; this PR adds all of them.

Custom ops

  • quantized_linear_per_tensor_out
  • im2row_out
  • quantized_convolution_per_tensor_out

Native ops

  • op: transpose_copy.int_out
  • op: eq.Scalar_out
  • op: logical_not.out
  • op: any.out
  • op: native_group_norm.out
  • op: sum.IntList_out
  • op: select_copy.int_out
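
For orientation, ExecuTorch out-variant kernels follow a common shape: a kernel runtime context first, the inputs next, and the pre-allocated output tensor last. Below is a minimal sketch of what the quantized_linear_per_tensor_out entry point might look like; the parameter names and quantization arguments are illustrative assumptions, not the actual signature.

// Sketch only: the parameter list is assumed, not copied from the source.
#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/kernel/kernel_runtime_context.h>

using ::executorch::aten::Tensor;
using ::executorch::runtime::KernelRuntimeContext;

void quantized_linear_per_tensor_out(
    KernelRuntimeContext& ctx,
    const Tensor& input,        // quantized activations
    const Tensor& weight,       // quantized weights
    const Tensor& bias,         // bias accumulated in int32
    int64_t input_zero_point,   // per-tensor quantization parameters
    int64_t weight_zero_point,  // (names assumed for illustration)
    int64_t out_multiplier,
    int64_t out_shift,
    int64_t out_zero_point,
    Tensor& out);               // pre-allocated output, written in place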

Test plan

python3 -m examples.cadence.operators.quantized_linear_op
python3 -m examples.cadence.models.babyllama

pytorch-bot (bot) commented Dec 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7165


❌ 1 New Failure

As of commit 5e5ca52 with merge base de70b9b:

NEW FAILURE - The following job has failed:

  • pull / unittest-arm / linux-job (gh)
    RuntimeError: Command docker exec -t 95ab5480bc4b736819d72a7454e872ecaa71dad3ee9a1b0f67f2ee4470f8c85f /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Dec 3, 2024
facebook-github-bot (Contributor) commented:
@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mcremon-meta (Contributor) left a comment:

LGTM with the extra macro removed. Thanks!

_(uint8_t, Byte) \
_(int8_t, Char)
Review comment:

general comment, we should be able to add int16_t and uint16_t already, but let's ignore for now
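
For reference, a sketch of how that type list could be extended; the macro name and the 16-bit ScalarType entries are assumptions, to be verified against the ScalarType enum in this ExecuTorch version:

// Hypothetical extension; the macro name and the UInt16 entry are assumed.
#define ET_FORALL_QUANTIZED_TYPES(_) \
  _(uint8_t, Byte)                   \
  _(int8_t, Char)                    \
  _(int16_t, Short)                  \
  _(uint16_t, UInt16)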

Comment on lines 10 to 51
// Generate kernels that perform elementwise arithmetic on two quantized
// tensors. The tensors are either the same size, or the second tensor is a
// scalar.
#define DECLARE_POINTWISE_TENSOR_QUANTIZED_BINARY_OP(BINARY_FUNC_NAME, OP)    \
  template <typename T>                                                       \
  void BINARY_FUNC_NAME(                                                      \
      const ::executorch::aten::Tensor& X,                                    \
      float X_scale,                                                          \
      int32_t X_zero_point,                                                   \
      const ::executorch::aten::Tensor& Y,                                    \
      float Y_scale,                                                          \
      int32_t Y_zero_point,                                                   \
      float out_scale,                                                        \
      int32_t out_zero_point,                                                 \
      ::executorch::aten::Tensor& out) {                                      \
    const T* __restrict__ X_data = X.const_data_ptr<T>();                     \
    const T* __restrict__ Y_data = Y.const_data_ptr<T>();                     \
    T* __restrict__ out_data = out.mutable_data_ptr<T>();                     \
    size_t Y_numel = Y.numel();                                               \
    size_t X_numel = X.numel();                                               \
    float inv_out_scale = 1.0f / out_scale;                                   \
    /* Y has the same number of elements as X */                              \
    if (Y_numel == X_numel) {                                                 \
      for (size_t i = 0; i < X_numel; ++i) {                                  \
        float x = kernels::dequantize<T>(X_data[i], X_scale, X_zero_point);   \
        float y = kernels::dequantize<T>(Y_data[i], Y_scale, Y_zero_point);   \
        float z = x OP y;                                                     \
        out_data[i] = kernels::quantize<T>(z, inv_out_scale, out_zero_point); \
      }                                                                       \
    } /* Y is a scalar tensor */                                              \
    else if (Y_numel == 1) {                                                  \
      float y = kernels::dequantize<T>(Y_data[0], Y_scale, Y_zero_point);     \
      for (size_t i = 0; i < X_numel; ++i) {                                  \
        float x = kernels::dequantize<T>(X_data[i], X_scale, X_zero_point);   \
        float z = x OP y;                                                     \
        out_data[i] = kernels::quantize<T>(z, inv_out_scale, out_zero_point); \
      }                                                                       \
    } /* other broadcasting cases */                                          \
    else {                                                                    \
      ET_DCHECK_MSG(false, "Unsupported broadcasting");                       \
    }                                                                         \
  }
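
The kernels::dequantize and kernels::quantize helpers used above implement standard affine (de)quantization. A sketch of their likely semantics, assuming round-to-nearest and omitting the clamping a real kernel would need:

#include <cmath>
#include <cstdint>

// x = scale * (q - zero_point): map the integer code back to a real value.
template <typename T>
float dequantize(T q, float scale, int32_t zero_point) {
  return scale * (static_cast<int32_t>(q) - zero_point);
}

// Inverse map; the macro precomputes inv_out_scale = 1 / out_scale, so the
// hot loop only multiplies.
template <typename T>
T quantize(float x, float inv_scale, int32_t zero_point) {
  return static_cast<T>(
      static_cast<int32_t>(std::nearbyint(x * inv_scale)) + zero_point);
}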
Review comment:

this macro is not used. It's only for quantized_add, and will need to be removed anyway with PT2 quant
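
For context, each instantiation of the macro expands into a dequantize → OP → requantize loop. A hypothetical instantiation for the quantized_add case named above (the function name is illustrative):

// Illustrative only: the single consumer the review comment mentions.
DECLARE_POINTWISE_TENSOR_QUANTIZED_BINARY_OP(quantized_add_impl, +)

// Would then be dispatched per dtype, e.g.:
//   quantized_add_impl<uint8_t>(
//       X, X_scale, X_zero_point, Y, Y_scale, Y_zero_point,
//       out_scale, out_zero_point, out);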

facebook-github-bot pushed a commit that referenced this pull request Dec 4, 2024
Summary:
Quantized ops in the CPU flow are not fully migrated; this PR adds them all.
- quantized_linear_per_tensor_out

Test Plan:
python3 -m examples.cadence.operators.quantized_linear_op

Reviewed By: hsharma35, mcremon-meta

Differential Revision: D66726864

Pulled By: zonglinpeng
facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D66726864


hietalajulius and others added 2 commits December 4, 2024 10:59
Differential Revision: D66644092

Pull Request resolved: #7134
Summary: populate cadence cpu ops

Differential Revision: D66726864
@mcremon-meta (Contributor) left a comment:

not sure why there are so many changes, maybe a linter error?

@@ -74,6 +74,7 @@ exclude_patterns = [
# NB: Objective-C is not supported
'examples/apple/**',
'examples/demo-apps/apple_ios/**',
'examples/demo-apps/react-native/rnllama/ios/**',
Review comment:

lint change I guess?

@zonglinpeng (Author) replied:

caused by rebasing to main

* LICENSE file in the root directory of this source tree.
*/

#include <executorch/backends/cadence/reference/kernels/kernels.h>
Review comment:

can we put the convolution.cpp stuff in this one instead, and remove the non-quant version? It should always be quantized anyway, and it will be less confusing


zonglinpeng merged commit fd33294 into main on Dec 4, 2024 (40 of 42 checks passed)
zonglinpeng pushed commits referencing this pull request on Dec 4, 2024 (Differential Revision: D66726864)
zonglinpeng deleted the populate-cadence-cpu-ops branch on Dec 4, 2024 at 21:33
zonglinpeng added two commits that referenced this pull request on Dec 5, 2024
zonglinpeng restored the populate-cadence-cpu-ops branch on Dec 5, 2024 at 17:01
Labels: CLA Signed, fb-exported, topic: not user facing
4 participants