[llm] Use new API to register custom ops for llama model #2840
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2840
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit d1612c8 with merge base 081c849. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
backends/xnnpack/CMakeLists.txt
@@ -72,6 +72,7 @@ target_include_directories(
  xnnpack_schema INTERFACE ${_xnnpack_schema__include_dir}
  ${EXECUTORCH_ROOT}/third-party/flatbuffers/include)

target_compile_options(pthreadpool PUBLIC ${_common_compile_options})
Why is this needed? Also, pthreadpool-related stuff has moved to the root CMakeLists.
My local build failed, complaining about missing -fPIC.
if(ANDROID)
  list(APPEND link_libraries log)
endif()

target_compile_options(llama_main PUBLIC ${_common_compile_options}
    -DET_USE_THREADPOOL)
    -DET_USE_THREADPOOL)
typo?
Reformat cmakelists.txt
@@ -22,6 +21,7 @@
 #include <executorch/backends/xnnpack/threadpool/threadpool.h>
 #include <executorch/extension/parallel/thread_parallel.h>
 #endif
 #include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>
We should rename this header to something else
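For readers following along: here is a minimal sketch of what this header provides, using a hypothetical `my_copy.out` op rather than the PR's actual code. The `EXECUTORCH_LIBRARY` macro registers an out-variant kernel with the ExecuTorch runtime directly from C++, which is what removes the need for a yaml entry.

```cpp
// Minimal sketch: hypothetical op and simplified kernel, not the PR's code.
#include <cstring>

#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

// Hypothetical kernel: copies `in` into `out`. Real kernels validate shapes
// and dtypes and report failures through the runtime context.
exec_aten::Tensor& my_copy_out(
    RuntimeContext& ctx,
    const exec_aten::Tensor& in,
    exec_aten::Tensor& out) {
  (void)ctx;
  if (in.nbytes() != out.nbytes()) {
    return out; // a real kernel would flag an error here
  }
  std::memcpy(out.mutable_data_ptr(), in.const_data_ptr(), in.nbytes());
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch

// Makes "llama::my_copy.out" resolvable through the runtime's kernel registry,
// with no entry in a yaml file.
EXECUTORCH_LIBRARY(llama, "my_copy.out", torch::executor::native::my_copy_out);
```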
    // @lint-ignore CLANGTIDY facebook-hte-ParameterMightThrowOnCopy
    const c10::optional<double> scale) {
  auto output = at::empty_like(q_projected);
  WRAP_TO_ATEN(sdpa_with_kv_cache_out_no_context, 11)
What's this 11?
The 11th argument is `out`.
Number of args? I think there is some template magic that allows you to count the number of args, right?
Yeah, this is telling the template to return the 11th object, since it is `out`. See this code: https://github.com/pytorch/executorch/blob/main/extension/aten_util/make_aten_functor_from_et_functor.h#L268-L277
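To illustrate, a minimal sketch with a hypothetical three-argument kernel (`my_add_out_no_context` is made up, not from the PR): `WRAP_TO_ATEN(kernel, N)` produces an ATen-callable wrapper around an ExecuTorch out-variant kernel, and `N` is the index of the `out` argument that the wrapper hands back, which is why the sdpa call above passes 11.

```cpp
// Minimal sketch, assuming a hypothetical ExecuTorch kernel with 3 arguments.
#include <ATen/ATen.h>
#include <executorch/extension/aten_util/make_aten_functor_from_et_functor.h>

namespace torch {
namespace executor {
namespace native {

// Hypothetical "no context" kernel: `out` sits at argument index 2, so we
// pass 2 to WRAP_TO_ATEN below (the sdpa kernel's `out` sits at index 11).
exec_aten::Tensor& my_add_out_no_context(
    const exec_aten::Tensor& a,
    const exec_aten::Tensor& b,
    exec_aten::Tensor& out) {
  (void)a;
  (void)b;
  return out; // a real kernel would compute into `out`
}

at::Tensor my_add_aten(const at::Tensor& a, const at::Tensor& b) {
  auto out = at::empty_like(a);
  // Converts the at::Tensor arguments to ET tensors, runs the kernel, and
  // writes back through the argument at index 2, i.e. `out`.
  WRAP_TO_ATEN(my_add_out_no_context, 2)(a, b, out);
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch
```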
"Tensor(b!) value_cache, SymInt start_pos, SymInt seq_len, Tensor? attn_mask=None, " | ||
"float drpout_p=0.0, bool is_causal=False, float? scale=None) -> Tensor", | ||
&torch::executor::native::sdpa_with_kv_cache_aten); | ||
m.def( |
Why is this one needed? So that we can generate the out-variant one?
We need both `sdpa_with_kv_cache` and `sdpa_with_kv_cache.out` in ATen so that exir is happy.
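A minimal sketch of that point, with a hypothetical `my_ns::my_add` op and schema rather than the PR's: both the functional op and its `.out` variant get registered in the ATen op registry, since (roughly) exir traces the functional form and then needs a matching out-variant schema when it generates out-variant calls.

```cpp
// Minimal sketch: hypothetical namespace, op, and schemas.
#include <torch/library.h>

TORCH_LIBRARY(my_ns, m) {
  // Functional variant, used while exporting the eager model.
  m.def("my_add(Tensor a, Tensor b) -> Tensor");
  // Out variant, matching what the ExecuTorch kernel implements. The PR binds
  // implementations directly as a second argument to m.def; they can also be
  // bound separately via m.impl.
  m.def("my_add.out(Tensor a, Tensor b, *, Tensor(a!) out) -> Tensor(a!)");
}
```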
@@ -32,7 +32,7 @@ exec_aten::Tensor op_sdpa_with_kv_cache(
    exec_aten::optional<double> scale,
    exec_aten::Tensor& out) {
  exec_aten::RuntimeContext context{};
  return torch::executor::llama::sdpa_with_kv_cache_outf(
  return torch::executor::native::sdpa_with_kv_cache_out(
Why change the namespace?
The `llama` namespace was generated by FunctionHeaderWrapper.h. Here I added a header, op_sdpa.h, and that uses the same namespace as op_sdpa.cpp.
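A simplified sketch of the kind of declaration such a header adds (assumed and heavily abbreviated; the real header declares the full `sdpa_with_kv_cache_out` argument list). The point is that the hand-written header keeps the kernel in `torch::executor::native`, matching op_sdpa.cpp, instead of the generated `torch::executor::llama` wrapper namespace.

```cpp
// Simplified, assumed op_sdpa.h-style declaration; not the actual header.
#pragma once

#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

// Hand-written declaration in the same namespace as op_sdpa.cpp, so callers
// no longer go through the generated torch::executor::llama wrapper
// (sdpa_with_kv_cache_outf).
exec_aten::Tensor& sdpa_with_kv_cache_out(
    RuntimeContext& ctx,
    const exec_aten::Tensor& q_projected,
    /* ...remaining kv-cache arguments elided in this sketch... */
    exec_aten::Tensor& out);

} // namespace native
} // namespace executor
} // namespace torch
```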
# assuming we only hit this in OSS, find the default install path
prefix = os.environ.get("CMAKE_INSTALL_PREFIX", "../../../../cmake-out")
lib_path = os.path.join(prefix, "lib/libcustom_ops_aot_lib.so")
torch.ops.load_library(lib_path)
op = torch.ops.llama.sdpa_with_kv_cache.default
assert op is not None
This is a little bit clunky, but I don't know how to improve it.
runtime.cxx_library(
    name = "sdpa",
    name = "custom_ops",
Why rename this?
A lot of places refer to the examples/models/llama2/custom_ops:custom_ops library name. I'm just too lazy to change all of them to `sdpa`.
Left some comments
Summary: Using the following 2 APIs:
* `EXECUTORCH_LIBRARY` to replace the need for a yaml file. With this macro we can directly register a custom kernel into the ExecuTorch runtime.
* `WRAP_TO_ATEN` allows custom op authors to use the same kernel for ExecuTorch and PyTorch. This can be helpful during debugging.

Test Plan: Rely on the new CI job `test_llama` with the `xnnpack+kv+custom` option.

Reviewed By: kimishpatel
Differential Revision: D55713944
Pulled By: larryliu0820
This pull request was exported from Phabricator. Differential Revision: D55713944
@larryliu0820 merged this pull request in 020d8be.
This reverts commit 020d8be.