Simple sdpa #3165
Closed
Conversation
Summary: Pull Request resolved: pytorch#2919 The note directive in Sphinx doesn't render well in Markdown. Remove it to avoid causing confusion. Reviewed By: mergennachin, cccclai Differential Revision: D55881939 fbshipit-source-id: a4252f0b70593ecd97e5cc352c601e772a9c222a (cherry picked from commit dc7e4d5) Co-authored-by: Hansong Zhang <[email protected]>
Summary: Pull Request resolved: pytorch#2899 Reviewed By: mergennachin Differential Revision: D55829514 Pulled By: kirklandsign fbshipit-source-id: 3e5d222b969c7b13fc8902dbda738edb3cb898dc (cherry picked from commit 3e256ff) Co-authored-by: Hansong Zhang <[email protected]>
…ytorch#2911) Summary: Update the LLM getting started guide for uniform tone and tense. Informally following the Google developer documentation style guide: https://developers.google.com/style. Also, resolve a number of outstanding issues with incorrect or misleading documentation and steps. For reference, here are links to the current and proposed LLM guide: https://docs-preview.pytorch.org/pytorch/executorch/2911/llm/getting-started.html (proposed) https://pytorch.org/executorch/main/llm/getting-started.html (live) Pull Request resolved: pytorch#2911 Reviewed By: Gasoonjia, byjlw Differential Revision: D55867181 Pulled By: GregoryComer fbshipit-source-id: 5e865eaa4a0ae52845963b15c221a3d272431448 (cherry picked from commit 01bac3d)
Summary: Pull Request resolved: pytorch#2921 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D55885790 fbshipit-source-id: bb62a42b74ecdfb2e1f6bcebab979e2e8fcf0a3c (cherry picked from commit 9ba8bc9)
Summary: Pull Request resolved: pytorch#2926 Fixing issues we've seen in pytorch#2907 and pytorch#2805 bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: iseeyuan, cccclai Differential Revision: D55893925 fbshipit-source-id: c6e0264d868cb487faf02f95ff1bd223cbcc97ac (cherry picked from commit 6db9d72)
Summary: Pull Request resolved: pytorch#2927 ATT Created from CodeHub with https://fburl.com/edit-in-codehub Reviewed By: mergennachin Differential Revision: D55895703 fbshipit-source-id: 5466b44224b8ebf7b88d846354683da0c1f6a801 (cherry picked from commit ce447dc)
Summary: Pull Request resolved: pytorch#2932 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D55904722 fbshipit-source-id: 6057bc75f812e5ae9dd057bbed7291a539d80ff6 (cherry picked from commit 8cabeac)
Summary: Pull Request resolved: pytorch#2876 Fix the tag constant for the mutable buffer. The buffer shouldn't be tagged if it's going to be mutated by the delegate, which is more common in hardware backends. Will follow up and test having the delegate consume the mutation. Reviewed By: mcr229, angelayi Differential Revision: D55812844 fbshipit-source-id: e0be4c2dc295141d673cccb1aeecee45894b1e70 (cherry picked from commit 599cfde)
…torch#2959) Summary: Minor updates to the prerequisite section of the LLM getting started guide. Passing -s to pyenv install prevents a prompt if python 3.10 is already installed (it will just silently continue in this case when the flag is passed). Additionally, under pyenv, we should be using python, not python3. I also added a little bit of wording on env management. Pull Request resolved: pytorch#2940 Test Plan: Ran LLM guide prerequisite section on an m1 mac with pyenv-virtualenv. Reviewed By: byjlw Differential Revision: D55913382 Pulled By: GregoryComer fbshipit-source-id: 7f04262b025db83b8621c972c90d3cdc3f029377 (cherry picked from commit 218f643) Co-authored-by: Gregory Comer <[email protected]>
Summary: Version hash reported by https://github.com/facebook/buck2/releases/download/2024-02-15/buck2-x86_64-apple-darwin.zst Pull Request resolved: pytorch#2868 Reviewed By: Olivia-liu Differential Revision: D55914146 Pulled By: GregoryComer fbshipit-source-id: b9882900acfd4cb6f74eda90a7c99bdb119ec122 (cherry picked from commit de7fdaa)
…torch#2952) (pytorch#2971) Summary: Pull Request resolved: pytorch#2952
* Some auto-formatting by my VSCode (remove extra spaces)
* Remove imports that have already been imported in a previous part of the doc
* Other minor changes to keep consistency across the doc
* Link a screenshot instead of using the raw table because the original table is illegible: {F1482781056}
Reviewed By: GregoryComer Differential Revision: D55938344 fbshipit-source-id: 699abb9ebe1196ab73d90a3d08d60be7aa0d8688 (cherry picked from commit e733f2d) Co-authored-by: Olivia Liu <[email protected]>
Summary: Pull Request resolved: pytorch#2992 We should promote the llama2 page more in https://github.com/pytorch/executorch/tree/main/examples/ bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: iseeyuan Differential Revision: D56018978 fbshipit-source-id: cbbc7bd2ea4ce55e564bd6b4a2900f623599dde6 (cherry picked from commit e641ffc) Co-authored-by: Mergen Nachin <[email protected]>
…LLM (pytorch#2977) (pytorch#2997) Summary: Pull Request resolved: pytorch#2977 As titled Reviewed By: Gasoonjia Differential Revision: D55992093 fbshipit-source-id: 7864c330bd86af5d4127cacfd47e96f1e6666bfb (cherry picked from commit cb9caa3) Co-authored-by: Olivia Liu <[email protected]>
These pip dependencies need to be present to build the pip wheel. Also, change the version to a stub that looks less like a real version, until we can hook up the logic to get the version from the git repo state.
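For illustration, a minimal sketch of the kind of version stub described above, assuming a plain setuptools `setup.py`; the package name and stub value below are placeholders, not the actual ExecuTorch packaging code:

```python
# Illustrative only: a version stub that clearly isn't a real release number.
# The real value would eventually be derived from the git repo state.
from setuptools import setup

setup(
    name="executorch",             # assumed package name for this sketch
    version="0.1.0.dev0+unknown",  # obviously-fake stub version
)
```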
Manually install build requirements because `python setup.py bdist_wheel` does not install them.
setup.py is sometimes run as root in docker containers. buck2 doesn't allow running as root unless $HOME is owned by root or does not exist. So temporarily undefine it while configuring cmake, which runs buck2 to get some source lists. Also, the buck2 daemon can sometimes get stuck on the CI workers. Try killing it before starting the build, ignoring any failures.
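A hypothetical sketch of that workaround; the function below is illustrative of the approach (drop $HOME for the cmake configure step and pre-emptively kill the buck2 daemon), not the actual setup.py code:

```python
import os
import subprocess

def configure_cmake_without_home(build_dir: str) -> None:
    # buck2 refuses to run as root unless $HOME is owned by root or unset, so
    # drop HOME only for the cmake configure step (which invokes buck2).
    env = os.environ.copy()
    env.pop("HOME", None)
    # The buck2 daemon can get stuck on CI workers; try to kill it first and
    # ignore any failure (e.g. when no daemon is running).
    subprocess.run(["buck2", "kill"], check=False)
    subprocess.run(["cmake", "-S", ".", "-B", build_dir], env=env, check=True)
```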
Some CI jobs can fail with "OS file watch limit reached" when running buck2. This section should reduce the number of files that it tries to watch.
Change the build-wheels workflow to only fetch the first layer of submodules. ExecuTorch only needs the first layer of submodules to build its pip package, but the `build_wheels_*.yaml` workflows will recursively fetch all submodules by default. Fetching all submodules can also cause `buck2` to fail because it will try to watch too many files. This change makes `buck2` work on the CI runners, speeds up the jobs, and reduces disk/network usage.
Always build the pybindings when building the pip wheel. Always link in XNNPACK. On macOS, also link in MPS. Core ML can't build on the worker machine, though, because the version of macOS is too old; Core ML requires features introduced in macOS 10.15.
Passing the `std::` functions directly to unary_ufunc_realhb_to_bool can cause "error: cannot resolve overloaded function ‘isinf’ based on conversion to type ‘torch::executor::FunctionRef<bool(double)>’" in some compilation environments. This might be because these functions can be templatized, or because they became constexpr in C++23.
…ytorch#3052) Summary: This is a no-op Pull Request resolved: pytorch#3005 Test Plan: CI Run with `python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv --use_sdpa_with_kv_cache -X` and with `python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv -X` Make sure both work Reviewed By: cccclai Differential Revision: D56048177 Pulled By: mergennachin fbshipit-source-id: 3ac9ac5c34f6fe215de1cfe8b5ddc7aae3635359 (cherry picked from commit 488afc5) Co-authored-by: Mergen Nachin <[email protected]>
* add more instructions and examples on Delegation (pytorch#2973) Summary: Pull Request resolved: pytorch#2973 as title. Reviewed By: vmpuri, byjlw Differential Revision: D55988177 fbshipit-source-id: 8cdc953118ecd22e8e9a809f0dd716a30a7fc117 (cherry picked from commit 17c64a3)
* replace Executorch with ExecuTorch to fix lint error

---------
Co-authored-by: Songhao Jia <[email protected]>
…ytorch#3061) Summary: Pull Request resolved: pytorch#3007 Keep llama_transformer.py looking like the stock implementation so that it can be reused everywhere; do a module swap instead. Reviewed By: cccclai Differential Revision: D56048640 fbshipit-source-id: 76de1b09b7f5d79422bb3b32bc830a9a7ecd935c (cherry picked from commit 74eb8b3) Co-authored-by: Mergen Nachin <[email protected]>
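A rough sketch of what a module swap like this can look like: the stock transformer definition stays untouched, and the SDPA submodules are replaced after the model is built. The class names below are illustrative stand-ins, not the actual ExecuTorch modules:

```python
import torch.nn as nn
import torch.nn.functional as F

class StockSDPA(nn.Module):
    """Stand-in for the stock attention/SDPA module in llama_transformer.py."""
    def forward(self, q, k, v, mask):
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

class CustomSDPA(nn.Module):
    """Stand-in for the replacement module (e.g. KV-cache aware or custom-op based)."""
    def forward(self, q, k, v, mask):
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

def swap_sdpa_modules(model: nn.Module) -> nn.Module:
    # Recursively replace every StockSDPA instance, leaving the stock model
    # definition itself unchanged.
    for name, child in model.named_children():
        if isinstance(child, StockSDPA):
            setattr(model, name, CustomSDPA())
        else:
            swap_sdpa_modules(child)
    return model
```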
* Add executorch_no_prim_ops target (pytorch#2934) Summary: Pull Request resolved: pytorch#2934 Currently `libexecutorch.a` always contains prim ops. This becomes a problem when a binary contains two "versions" of `libexecutorch.a`, causing a double registration of the prim ops. For example, `libA.so` depends on `libexecutorch.a` and a binary `B` depends on both `libA.so` and `libexecutorch.a`. Since both `libexecutorch.a` and `libA.so` contain prim ops, they will be registered twice. In this PR I created another library, `executorch_no_prim_ops`, for `libA.so` to depend on. Reviewed By: cccclai, kirklandsign Differential Revision: D55907752 fbshipit-source-id: 755a9b8d5f6f7cf44d011b83bfdc18be6da1aa05 (cherry picked from commit d309e9d)
* Fix failing CI jobs caused by pytorch#2934 (pytorch#2961) Summary: Pull Request resolved: pytorch#2961 Fix these 3 CI job failures caused by pytorch#2934 (D55907752):
  * Apple / build-frameworks-ios / macos-job
  * trunk / test-arm-backend-delegation / linux-job
  * trunk / test-coreml-delegate / macos-job
  Reviewed By: kirklandsign Differential Revision: D55950023 fbshipit-source-id: 6166d9112e6d971d042df1400442395d8044c3b3 (cherry picked from commit d993797)
* [NOT-CLEAN-CP] Fix 3 CI jobs (pytorch#3006) Summary:
  * [NOT APPLICABLE IN RELEASE] Apple / build-frameworks-ios / macos-job: we removed libcustom_ops_lib.a in pytorch#2916, so it needs to be removed from `build_apple_frameworks.sh`.
  * [NOT APPLICABLE IN RELEASE] Lint / lintrunner / linux-job: remove an extra line in backends/qualcomm/quantizer/utils.py.
  * pull / unittest / macos (buck2) / macos-job: fix it by using `executorch_no_prim_ops` instead of `executorch` in MPS and CoreML.
  Pull Request resolved: pytorch#3006 Reviewed By: lucylq Differential Revision: D56048430 Pulled By: larryliu0820 fbshipit-source-id: 9dcb476eea446ea3aba566d595167c691fb00eec (cherry picked from commit 5b7c4ba)

---------
Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Mengwei Liu <[email protected]>
…torch#3026) Summary: We had some refactors recently and need to update the tutorial and CMake. See pytorch#2955 for issues. Pull Request resolved: pytorch#2956 Reviewed By: mcr229, cccclai Differential Revision: D55947725 Pulled By: kirklandsign fbshipit-source-id: f23af28b9a8fe071223d8ffa922a6cd4e49efe61 (cherry picked from commit c7fd394)
…orch#3027) Summary: * Update tutorial due to recent changes. * Clean up setup.sh for app helper lib build. Pull Request resolved: pytorch#2962 Reviewed By: cccclai Differential Revision: D55951189 Pulled By: kirklandsign fbshipit-source-id: 2c95e8580145b039f503e7cd99a4003867f8dbb0 (cherry picked from commit 26365f1)
* Skip annotate boolean input (pytorch#2957) Summary: Pull Request resolved: pytorch#2957 ghstack-source-id: 222200589 exported-using-ghexport It only makes sense to quantize fp tensors, not booleans. Add a check to make sure only fp tensors are annotated in the quantizer (see the sketch below). Reviewed By: jerryzh168 Differential Revision: D55946526 fbshipit-source-id: d94bfee38ab2d29fc9672ab631b4d5d0c5239d25
* fix lint
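A minimal sketch of the kind of check described above; the helper name and the way the dtype is read from the node's metadata are assumptions, not the actual quantizer code:

```python
import torch

def should_annotate(node) -> bool:
    """Only floating-point tensors make sense to quantize; skip bool (and other non-fp) inputs."""
    val = node.meta.get("val", None)     # assumed FakeTensor-like metadata on an FX node
    dtype = getattr(val, "dtype", None)
    return dtype is not None and dtype.is_floating_point
```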
Summary: Pull Request resolved: pytorch#3045 Reviewed By: clee2000 Differential Revision: D56201946 Pulled By: svekars fbshipit-source-id: 4212c24b02a1229ff06137b0d437b4e8c5dd454e (cherry picked from commit c73bfc0) Co-authored-by: Svetlana Karslioglu <[email protected]>
Summary: Move noindex logic to the build job Pull Request resolved: pytorch#3071 Reviewed By: clee2000 Differential Revision: D56218857 Pulled By: svekars fbshipit-source-id: 69dff489d98eee046d69185a6c03d62fbae37a16 (cherry picked from commit 5d7949d) Co-authored-by: Svetlana Karslioglu <[email protected]>
…3114) Summary: Pull Request resolved: pytorch#3036 sdpa (https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) takes an attention mask as input; refactor the sdpa module's inputs to be closer to the sdpa signature. ghstack-source-id: 222650466 exported-using-ghexport Reviewed By: mergennachin Differential Revision: D56119739 fbshipit-source-id: d9adda66e540abc518b7ffb6a5ebd2aab1626b3b (cherry picked from commit b341223)
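A small sketch of the refactored interface, assuming the module is reshaped to mirror the `F.scaled_dot_product_attention` signature; the class and argument names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDPA(nn.Module):
    def forward(
        self,
        q: torch.Tensor,
        k: torch.Tensor,
        v: torch.Tensor,
        attn_mask: torch.Tensor,  # the mask is now an explicit input, matching sdpa
    ) -> torch.Tensor:
        return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```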
Summary: Pull Request resolved: pytorch#3113 imported-using-ghimport Test Plan: Imported from OSS Reviewed By: cccclai Differential Revision: D56279743 Pulled By: SS-JIA fbshipit-source-id: af55cdf2d8518c582b7d8deccb731c6bc442a1c9 (cherry picked from commit 414cd05) Co-authored-by: Sicheng Jia <[email protected]>
* Pin Xcode projects to release/0.2 branch
* Update the version for the iOS frameworks upload workflow
…ch#2975) (pytorch#3157) Summary: It was a workaround to skip `aten.index_put` op in Core ML delegation, at the cost of partitioning the Llama model into 13 pieces. For better performance, we prefer to delegate the whole model to Core ML. Since Core ML has added the [necessary support](apple/coremltools#2190), it is time to revert this workaround Pull Request resolved: pytorch#2975 Reviewed By: kirklandsign Differential Revision: D56002979 Pulled By: cccclai fbshipit-source-id: e7a7c8c43706cb57eba3e6f720b3d713bec5065b (cherry picked from commit 7d4bafc) Co-authored-by: yifan_shen3 <[email protected]>
Summary: Pull Request resolved: pytorch#3037 Add a simple sdpa so it is decomposed into simpler ops, instead of decomposing F.scaled_dot_product_attention, which expands to 29 ops, including `torch.where`:

```
def forward(self, q, k, v):
    aten_mul_scalar = executorch_exir_dialects_edge__ops_aten_mul_Scalar(q, 0.5946035575013605);  q = None
    aten_full_default = executorch_exir_dialects_edge__ops_aten_full_default([8, 8], True, dtype = torch.bool, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_arange_start_step = executorch_exir_dialects_edge__ops_aten_arange_start_step(0, 8, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_unsqueeze_copy_default = executorch_exir_dialects_edge__ops_aten_unsqueeze_copy_default(aten_arange_start_step, -2);  aten_arange_start_step = None
    aten_arange_start_step_1 = executorch_exir_dialects_edge__ops_aten_arange_start_step(0, 8, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_unsqueeze_copy_default_1 = executorch_exir_dialects_edge__ops_aten_unsqueeze_copy_default(aten_arange_start_step_1, -1);  aten_arange_start_step_1 = None
    aten_sub_tensor = executorch_exir_dialects_edge__ops_aten_sub_Tensor(aten_unsqueeze_copy_default, aten_unsqueeze_copy_default_1);  aten_unsqueeze_copy_default = aten_unsqueeze_copy_default_1 = None
    aten_le_scalar = executorch_exir_dialects_edge__ops_aten_le_Scalar(aten_sub_tensor, 0);  aten_sub_tensor = None
    aten_logical_and_default = executorch_exir_dialects_edge__ops_aten_logical_and_default(aten_le_scalar, aten_full_default);  aten_le_scalar = aten_full_default = None
    aten_full_like_default = executorch_exir_dialects_edge__ops_aten_full_like_default(aten_logical_and_default, 0, dtype = torch.float32, pin_memory = False, memory_format = torch.preserve_format)
    aten_logical_not_default = executorch_exir_dialects_edge__ops_aten_logical_not_default(aten_logical_and_default);  aten_logical_and_default = None
    aten_scalar_tensor_default = executorch_exir_dialects_edge__ops_aten_scalar_tensor_default(-inf, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'))
    aten_where_self = executorch_exir_dialects_edge__ops_aten_where_self(aten_logical_not_default, aten_scalar_tensor_default, aten_full_like_default);  aten_logical_not_default = aten_scalar_tensor_default = aten_full_like_default = None
    aten_permute_copy_default = executorch_exir_dialects_edge__ops_aten_permute_copy_default(k, [0, 1, 3, 2]);  k = None
    aten_mul_scalar_1 = executorch_exir_dialects_edge__ops_aten_mul_Scalar(aten_permute_copy_default, 0.5946035575013605);  aten_permute_copy_default = None
    aten_expand_copy_default = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten_mul_scalar, [1, 1, 8, 8]);  aten_mul_scalar = None
    aten_view_copy_default = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default, [1, 8, 8]);  aten_expand_copy_default = None
    aten_expand_copy_default_1 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten_mul_scalar_1, [1, 1, 8, 8]);  aten_mul_scalar_1 = None
    aten_view_copy_default_1 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_1, [1, 8, 8]);  aten_expand_copy_default_1 = None
    aten_bmm_default = executorch_exir_dialects_edge__ops_aten_bmm_default(aten_view_copy_default, aten_view_copy_default_1);  aten_view_copy_default = aten_view_copy_default_1 = None
    aten_view_copy_default_2 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_bmm_default, [1, 1, 8, 8]);  aten_bmm_default = None
    aten_add_tensor = executorch_exir_dialects_edge__ops_aten_add_Tensor(aten_view_copy_default_2, aten_where_self);  aten_view_copy_default_2 = aten_where_self = None
    aten__softmax_default = executorch_exir_dialects_edge__ops_aten__softmax_default(aten_add_tensor, -1, False);  aten_add_tensor = None
    aten_expand_copy_default_2 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten__softmax_default, [1, 1, 8, 8]);  aten__softmax_default = None
    aten_view_copy_default_3 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_2, [1, 8, 8]);  aten_expand_copy_default_2 = None
    aten_expand_copy_default_3 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(v, [1, 1, 8, 8]);  v = None
    aten_view_copy_default_4 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_3, [1, 8, 8]);  aten_expand_copy_default_3 = None
    aten_bmm_default_1 = executorch_exir_dialects_edge__ops_aten_bmm_default(aten_view_copy_default_3, aten_view_copy_default_4);  aten_view_copy_default_3 = aten_view_copy_default_4 = None
    aten_view_copy_default_5 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_bmm_default_1, [1, 1, 8, 8]);  aten_bmm_default_1 = None
    return (aten_view_copy_default_5,)
```

After applying the diff, we remove the following ops:

```
%aten_full_like_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.full_like.default](args = (%aten_index_tensor_2, 0), kwargs = {dtype: torch.float32, pin_memory: False, memory_format: torch.preserve_format})
%aten_logical_not_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.logical_not.default](args = (%aten_index_tensor_2,), kwargs = {})
%aten_scalar_tensor_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.scalar_tensor.default](args = (-inf,), kwargs = {dtype: torch.float32, layout: torch.strided, device: cpu})
%aten_where_self : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.where.self](args = (%aten_logical_not_default, %aten_scalar_tensor_default, %aten_full_like_default), kwargs = {})
%aten_mul_scalar : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Scalar](args = (%aten_permute_copy_default_3, 0.5946035575013605), kwargs = {})
...
%aten_mul_scalar_1 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Scalar](args = (%aten_permute_copy_default_6, 0.5946035575013605), kwargs = {})
```

but introduce an add:

```
%aten_add_tensor_3 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.add.Tensor](args = (%aten_mul_tensor_11, %aten_index_tensor_2), kwargs = {})
```

ghstack-source-id: 223152096 exported-using-ghexport Reviewed By: mergennachin, kimishpatel Differential Revision: D56119737 fbshipit-source-id: ec8e875f0a4c4ec67b7493e4872c9a5b081e6de7 (cherry picked from commit cf78107)
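For intuition, here is a minimal sketch of the simplification this PR is after: express attention with a handful of primitive ops (scale, matmul, add an already-materialized float mask, softmax, matmul), so the exported graph no longer needs ops like `torch.where`, `full_like`, or `logical_not`. This illustrates the technique under assumed shapes and names, not the exact ExecuTorch implementation:

```python
import math
import torch

def simple_sdpa(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                attn_mask: torch.Tensor) -> torch.Tensor:
    # q, k, v: [batch, heads, seq, head_dim]; attn_mask is an additive float
    # mask (0 where attention is allowed, -inf where it is masked out).
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    scores = scores + attn_mask
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

# Example with the 8x8 causal case from the graph dump above.
q = k = v = torch.randn(1, 1, 8, 8)
causal_mask = torch.triu(torch.full((8, 8), float("-inf")), diagonal=1)
out = simple_sdpa(q, k, v, causal_mask)
```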
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3165
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New Failure as of commit dc0b5bd with merge base d3326a2.
Labels
CLA Signed