[executorch] Populate cadence cpu ops #7165
Conversation
Helpful links: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7165
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 new failure as of commit 5e5ca52 with merge base de70b9b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
LGTM with the extra macro removed. Thanks!
```cpp
_(uint8_t, Byte) \
_(int8_t, Char)
```
General comment: we should already be able to add int16_t and uint16_t here, but let's ignore that for now.
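For reference, a hedged sketch of what that extension might look like, assuming the surrounding list macro follows the usual `_(ctype, ScalarTypeName)` pattern; the `UInt16`/`Short` scalar-type names are assumptions, not confirmed from this diff:

```cpp
// Hypothetical extension of the type list with 16-bit integer types.
_(uint8_t, Byte)    \
_(int8_t, Char)     \
_(uint16_t, UInt16) \
_(int16_t, Short)
```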
```cpp
// Generate kernels that perform elementwise arithmetic on two quantized
// tensors. The tensors are either the same size, or the second tensor is a
// scalar.
#define DECLARE_POINTWISE_TENSOR_QUANTIZED_BINARY_OP(BINARY_FUNC_NAME, OP)    \
  template <typename T>                                                       \
  void BINARY_FUNC_NAME(                                                      \
      const ::executorch::aten::Tensor& X,                                    \
      float X_scale,                                                          \
      int32_t X_zero_point,                                                   \
      const ::executorch::aten::Tensor& Y,                                    \
      float Y_scale,                                                          \
      int32_t Y_zero_point,                                                   \
      float out_scale,                                                        \
      int32_t out_zero_point,                                                 \
      ::executorch::aten::Tensor& out) {                                      \
    const T* __restrict__ X_data = X.const_data_ptr<T>();                     \
    const T* __restrict__ Y_data = Y.const_data_ptr<T>();                     \
    T* __restrict__ out_data = out.mutable_data_ptr<T>();                     \
    size_t Y_numel = Y.numel();                                               \
    size_t X_numel = X.numel();                                               \
    float inv_out_scale = 1.0f / out_scale;                                   \
    /* Y has the same number of elements as X */                              \
    if (Y_numel == X_numel) {                                                 \
      for (size_t i = 0; i < X_numel; ++i) {                                  \
        float x = kernels::dequantize<T>(X_data[i], X_scale, X_zero_point);   \
        float y = kernels::dequantize<T>(Y_data[i], Y_scale, Y_zero_point);   \
        float z = x OP y;                                                     \
        out_data[i] = kernels::quantize<T>(z, inv_out_scale, out_zero_point); \
      }                                                                       \
    } /* Y is a scalar Tensor */                                              \
    else if (Y_numel == 1) {                                                  \
      float y = kernels::dequantize<T>(Y_data[0], Y_scale, Y_zero_point);     \
      for (size_t i = 0; i < X_numel; ++i) {                                  \
        float x = kernels::dequantize<T>(X_data[i], X_scale, X_zero_point);   \
        float z = x OP y;                                                     \
        out_data[i] = kernels::quantize<T>(z, inv_out_scale, out_zero_point); \
      }                                                                       \
    } /* other broadcasting cases */                                          \
    else {                                                                    \
      ET_DCHECK_MSG(false, "Unsupported broadcasting");                       \
    }                                                                         \
  }
```
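For context, the macro relies on `kernels::dequantize` / `kernels::quantize` affine-quantization helpers. A minimal sketch of what such helpers typically look like, assuming standard affine quantization with rounding and clamping; this is an illustration, not the actual executorch/cadence implementation:

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

namespace kernels {

// Affine dequantization: map a quantized value back to float.
template <typename T>
float dequantize(T value, float scale, int32_t zero_point) {
  return (static_cast<int32_t>(value) - zero_point) * scale;
}

// Affine quantization: map a float into the quantized domain. Takes the
// precomputed inverse scale, so the per-element work is a multiply rather
// than a divide (matching the inv_out_scale usage in the macro above).
template <typename T>
T quantize(float value, float inv_scale, int32_t zero_point) {
  float q = std::round(value * inv_scale) + zero_point;
  // Clamp to the representable range of T before casting.
  q = std::fmax(q, static_cast<float>(std::numeric_limits<T>::min()));
  q = std::fmin(q, static_cast<float>(std::numeric_limits<T>::max()));
  return static_cast<T>(q);
}

} // namespace kernels
```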
This DECLARE_POINTWISE_TENSOR_QUANTIZED_BINARY_OP macro is not used. It's only for quantized_add, and it will need to be removed anyway with PT2 quant.
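For illustration, if the macro were kept, instantiating it for quantized_add would look roughly like this; the name `quantized_add_impl` is a hypothetical choice, not taken from the PR:

```cpp
// Hypothetical instantiation: declares a templated quantized_add_impl<T>()
// that adds two quantized tensors elementwise (or tensor + scalar tensor).
DECLARE_POINTWISE_TENSOR_QUANTIZED_BINARY_OP(quantized_add_impl, +)

// Example call site (uint8 tensors):
// quantized_add_impl<uint8_t>(
//     X, X_scale, X_zero_point, Y, Y_scale, Y_zero_point,
//     out_scale, out_zero_point, out);
```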
Summary: Quantized ops in the CPU flow are not fully migrated; this PR adds them all, including quantized_linear_per_tensor_out.
Test Plan: python3 -m examples.cadence.operators.quantized_linear_op
Reviewed By: hsharma35, mcremon-meta
Differential Revision: D66726864
Pulled By: zonglinpeng
Force-pushed from 5ebcf7f to 250cae2.
This pull request was exported from Phabricator. Differential Revision: D66726864
Force-pushed from 2be5be9 to 8442540.
Differential Revision: D66644092 Pull Request resolved: #7134
Not sure why there are so many changes; maybe a linter error?
```diff
@@ -74,6 +74,7 @@ exclude_patterns = [
     # NB: Objective-C is not supported
     'examples/apple/**',
     'examples/demo-apps/apple_ios/**',
+    'examples/demo-apps/react-native/rnllama/ios/**',
```
Lint change, I guess?
Caused by rebasing onto main.
```cpp
 * LICENSE file in the root directory of this source tree.
 */

#include <executorch/backends/cadence/reference/kernels/kernels.h>
```
Can we put the convolution.cpp stuff in this one instead, and remove the non-quant version? It should always be quantized anyway, and it will be less confusing.
This reverts commit 6f2e5f6.
This reverts commit 1eb924f.
Summary
Quantized ops in the CPU flow are not fully migrated. This PR adds them all:
- Custom ops
- Native ops

Test plan
python3 -m examples.cadence.operators.quantized_linear_op
python3 -m examples.cadence.models.babyllama