Simple sdpa #3165

Closed
wants to merge 38 commits into from

Conversation

cccclai
Contributor

@cccclai commented Apr 19, 2024

No description provided.

guangy10 and others added 30 commits April 8, 2024 12:08
Summary:
Pull Request resolved: pytorch#2919

The note directive in Sphinx doesn't render well in Markdown. Remove it to avoid causing confusion.

Reviewed By: mergennachin, cccclai

Differential Revision: D55881939

fbshipit-source-id: a4252f0b70593ecd97e5cc352c601e772a9c222a
(cherry picked from commit dc7e4d5)

Co-authored-by: Hansong Zhang <[email protected]>
Summary: Pull Request resolved: pytorch#2899

Reviewed By: mergennachin

Differential Revision: D55829514

Pulled By: kirklandsign

fbshipit-source-id: 3e5d222b969c7b13fc8902dbda738edb3cb898dc
(cherry picked from commit 3e256ff)

Co-authored-by: Hansong Zhang <[email protected]>
…ytorch#2911)

Summary:
Update the LLM getting started guide for uniform tone and tense. Informally following the Google developer documentation style guide: https://developers.google.com/style. Also, resolve a number of outstanding issues with incorrect or misleading documentation and steps.

For reference, here are links to the current and proposed LLM guide:
https://docs-preview.pytorch.org/pytorch/executorch/2911/llm/getting-started.html (proposed)
https://pytorch.org/executorch/main/llm/getting-started.html (live)

Pull Request resolved: pytorch#2911

Reviewed By: Gasoonjia, byjlw

Differential Revision: D55867181

Pulled By: GregoryComer

fbshipit-source-id: 5e865eaa4a0ae52845963b15c221a3d272431448
(cherry picked from commit 01bac3d)
Summary:
Pull Request resolved: pytorch#2921
overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: executorch

Differential Revision: D55885790

fbshipit-source-id: bb62a42b74ecdfb2e1f6bcebab979e2e8fcf0a3c
(cherry picked from commit 9ba8bc9)
Summary:
Pull Request resolved: pytorch#2926

Fixing issues we've seen in pytorch#2907 and pytorch#2805

bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: iseeyuan, cccclai

Differential Revision: D55893925

fbshipit-source-id: c6e0264d868cb487faf02f95ff1bd223cbcc97ac
(cherry picked from commit 6db9d72)
Summary:
Pull Request resolved: pytorch#2927

ATT

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: mergennachin

Differential Revision: D55895703

fbshipit-source-id: 5466b44224b8ebf7b88d846354683da0c1f6a801
(cherry picked from commit ce447dc)
Summary:
Pull Request resolved: pytorch#2932
overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: executorch

Differential Revision: D55904722

fbshipit-source-id: 6057bc75f812e5ae9dd057bbed7291a539d80ff6
(cherry picked from commit 8cabeac)
Summary:
Pull Request resolved: pytorch#2876

Fixing the tag constant for mutable buffers. A buffer shouldn't be tagged if it's going to be mutated by the delegate; this is more common in hardware backends.

Will follow up and test having the delegate consume the mutation.

Reviewed By: mcr229, angelayi

Differential Revision: D55812844

fbshipit-source-id: e0be4c2dc295141d673cccb1aeecee45894b1e70
(cherry picked from commit 599cfde)
…torch#2959)

Summary:
Minor updates to the prerequisite section of the LLM getting started guide. Passing -s to pyenv install prevents a prompt if python 3.10 is already installed (it will just silently continue in this case when the flag is passed). Additionally, under pyenv, we should be using python, not python3. I also added a little bit of wording on env management.

Pull Request resolved: pytorch#2940

Test Plan: Ran LLM guide prerequisite section on an m1 mac with pyenv-virtualenv.

Reviewed By: byjlw

Differential Revision: D55913382

Pulled By: GregoryComer

fbshipit-source-id: 7f04262b025db83b8621c972c90d3cdc3f029377
(cherry picked from commit 218f643)

Co-authored-by: Gregory Comer <[email protected]>
Summary:
Version hash reported by
https://github.com/facebook/buck2/releases/download/2024-02-15/buck2-x86_64-apple-darwin.zst

Pull Request resolved: pytorch#2868

Reviewed By: Olivia-liu

Differential Revision: D55914146

Pulled By: GregoryComer

fbshipit-source-id: b9882900acfd4cb6f74eda90a7c99bdb119ec122
(cherry picked from commit de7fdaa)
…torch#2952) (pytorch#2971)

Summary:
Pull Request resolved: pytorch#2952

* Some auto-formatting by my VSCode (remove extra spaces)
* Remove imports that have been imported in previous part of the doc
* Other minor changes to keep consistency across the doc
* Link a screenshot instead of using the raw table because the original table is illegible:
 {F1482781056}

Reviewed By: GregoryComer

Differential Revision: D55938344

fbshipit-source-id: 699abb9ebe1196ab73d90a3d08d60be7aa0d8688
(cherry picked from commit e733f2d)

Co-authored-by: Olivia Liu <[email protected]>
Summary:
Pull Request resolved: pytorch#2992

We should promote the llama2 page more in https://github.com/pytorch/executorch/tree/main/examples/

bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: iseeyuan

Differential Revision: D56018978

fbshipit-source-id: cbbc7bd2ea4ce55e564bd6b4a2900f623599dde6
(cherry picked from commit e641ffc)

Co-authored-by: Mergen Nachin <[email protected]>
…LLM (pytorch#2977) (pytorch#2997)

Summary:
Pull Request resolved: pytorch#2977

As titled

Reviewed By: Gasoonjia

Differential Revision: D55992093

fbshipit-source-id: 7864c330bd86af5d4127cacfd47e96f1e6666bfb
(cherry picked from commit cb9caa3)

Co-authored-by: Olivia Liu <[email protected]>
These pip dependencies need to be present to build the pip wheel.

Also, change the version to a stub that looks less like a real version,
until we can hook up the logic to get the version from the git repo
state.
Manually install build requirements because `python setup.py
bdist_wheel` does not install them.
setup.py is sometimes run as root in docker containers. buck2 doesn't
allow running as root unless $HOME is owned by root or does not exist.
So temporarily undefine it while configuring cmake, which runs buck2 to
get some source lists.
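
A minimal sketch of that approach, assuming a hypothetical helper inside a setup script; the function name and CMake arguments are illustrative, not the actual setup.py code:

```python
import os
import subprocess

def configure_cmake_without_home(build_dir: str) -> None:
    # buck2 (invoked by CMake to collect source lists) refuses to run as root
    # unless $HOME is owned by root or is unset, so drop HOME from the
    # environment for just this configure step.
    env = {k: v for k, v in os.environ.items() if k != "HOME"}
    subprocess.run(["cmake", "-S", ".", "-B", build_dir], env=env, check=True)
```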

Also, the buck2 daemon can sometimes get stuck on the CI workers. Try
killing it before starting the build, ignoring any failures.
Some CI jobs can fail with "OS file watch limit reached" when running
buck2. This section should reduce the number of files that it tries to
watch.
Change the build-wheels workflow to only fetch the first layer of
submodules. ExecuTorch only needs the first layer of submodules to
build its pip package, but the `build_wheels_*.yaml` workflows will
recursively fetch all submodules by default.

Fetching all submodules can also cause `buck2` to fail because it will
try to watch too many files.

This change makes `buck2` work on the CI runners, speeds up the jobs,
and reduces disk/network usage.
Always build the pybindings when building the pip wheel.

Always link in XNNPACK.

On macOS, also link in MPS. Core ML can't build on the worker machine,
though, because the version of macOS is too old; Core ML requires some
features introduced in macOS 10.15.
Passing the `std::` functions directly to unary_ufunc_realhb_to_bool
can cause "error: cannot resolve overloaded function ‘isinf’ based
on conversion to type ‘torch::executor::FunctionRef<bool(double)>’"
in some compilation environments.

Might be because these functions can be templatized, or because they
became constexpr in C++23.
…ytorch#3052)

Summary:
This is a no-op

Pull Request resolved: pytorch#3005

Test Plan:
CI

Run with

`python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv --use_sdpa_with_kv_cache -X`

and with

`python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv -X`

Make sure both work

Reviewed By: cccclai

Differential Revision: D56048177

Pulled By: mergennachin

fbshipit-source-id: 3ac9ac5c34f6fe215de1cfe8b5ddc7aae3635359
(cherry picked from commit 488afc5)

Co-authored-by: Mergen Nachin <[email protected]>
* add more instructions and examples on Delegation (pytorch#2973)

Summary:
Pull Request resolved: pytorch#2973

as title.

Reviewed By: vmpuri, byjlw

Differential Revision: D55988177

fbshipit-source-id: 8cdc953118ecd22e8e9a809f0dd716a30a7fc117
(cherry picked from commit 17c64a3)

* replace Executorch with ExecuTorch to fix lint error

---------

Co-authored-by: Songhao Jia <[email protected]>
…ytorch#3061)

Summary:
Pull Request resolved: pytorch#3007

Keep llama_transformer.py looking like the stock implementation, so that it can be reused everywhere.

Do module swap
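
A rough sketch of a module swap under these assumptions; `swap_modules` is a hypothetical helper, and the `SDPA`/`SDPASimple` names in the usage comment are placeholders for the stock and replacement attention modules:

```python
from torch import nn

def swap_modules(model: nn.Module, old_cls: type, new_factory) -> nn.Module:
    # Recursively replace every submodule of type `old_cls` with
    # `new_factory(old_module)`, leaving the model's source file untouched.
    for name, child in model.named_children():
        if isinstance(child, old_cls):
            setattr(model, name, new_factory(child))
        else:
            swap_modules(child, old_cls, new_factory)
    return model

# Usage (illustrative): swap the stock attention for the simplified one.
# swap_modules(llama_model, SDPA, lambda old: SDPASimple(old.dim, old.n_heads))
```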

Reviewed By: cccclai

Differential Revision: D56048640

fbshipit-source-id: 76de1b09b7f5d79422bb3b32bc830a9a7ecd935c
(cherry picked from commit 74eb8b3)

Co-authored-by: Mergen Nachin <[email protected]>
* Add executorch_no_prim_ops target (pytorch#2934)

Summary:
Pull Request resolved: pytorch#2934

Currently `libexecutorch.a` always contains prim ops. This becomes a problem when a binary contains two "versions" of `libexecutorch.a`, causing a double registration of the prim ops.

For example, `libA.so` depends on `libexecutorch.a` and a binary `B` depends on both `libA.so` and `libexecutorch.a`. Since both `libexecutorch.a` and `libA.so` contain prim ops, they will be registered twice.

In this PR I created another library `executorch_no_prim_ops` for `libA.so` to depend on.

Reviewed By: cccclai, kirklandsign

Differential Revision: D55907752

fbshipit-source-id: 755a9b8d5f6f7cf44d011b83bfdc18be6da1aa05
(cherry picked from commit d309e9d)

* Fix failing CI jobs caused by pytorch#2934 (pytorch#2961)

Summary:
Pull Request resolved: pytorch#2961

Fix these 3 CI job failures caused by pytorch#2934 (D55907752):

* Apple / build-frameworks-ios / macos-job
* trunk / test-arm-backend-delegation / linux-job
* trunk / test-coreml-delegate / macos-job

Reviewed By: kirklandsign

Differential Revision: D55950023

fbshipit-source-id: 6166d9112e6d971d042df1400442395d8044c3b3
(cherry picked from commit d993797)

* [NOT-CLEAN-CP] Fix 3 CI jobs (pytorch#3006)

Summary:
* [NOT APPLICABLE IN RELEASE] Apple / build-frameworks-ios / macos-job

We removed libcustom_ops_lib.a in pytorch#2916, so we need to remove it from `build_apple_frameworks.sh`.

* [NOT APPLICABLE IN RELEASE] Lint / lintrunner / linux-job

Remove extra line in backends/qualcomm/quantizer/utils.py

* pull / unittest / macos (buck2) / macos-job

Fix it by using `executorch_no_prim_ops` instead of `executorch` in MPS and CoreML.

Pull Request resolved: pytorch#3006

Reviewed By: lucylq

Differential Revision: D56048430

Pulled By: larryliu0820

fbshipit-source-id: 9dcb476eea446ea3aba566d595167c691fb00eec
(cherry picked from commit 5b7c4ba)

---------

Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
…torch#3026)

Summary:
We have had refactors recently and need to update the tutorial and CMake files.

See pytorch#2955 for issues.

Pull Request resolved: pytorch#2956

Reviewed By: mcr229, cccclai

Differential Revision: D55947725

Pulled By: kirklandsign

fbshipit-source-id: f23af28b9a8fe071223d8ffa922a6cd4e49efe61
(cherry picked from commit c7fd394)
…orch#3027)

Summary:
* Update tutorial due to recent changes.
* Clean up setup.sh for app helper lib build.

Pull Request resolved: pytorch#2962

Reviewed By: cccclai

Differential Revision: D55951189

Pulled By: kirklandsign

fbshipit-source-id: 2c95e8580145b039f503e7cd99a4003867f8dbb0
(cherry picked from commit 26365f1)
cccclai and others added 8 commits April 17, 2024 10:36
* Skip annotate boolean input (pytorch#2957)

Summary:
Pull Request resolved: pytorch#2957

ghstack-source-id: 222200589
exported-using-ghexport

It only makes sense to quantize floating-point tensors, not boolean ones. Add a check to make sure that only fp tensors are annotated in the quantizer.
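
A hedged sketch of the kind of check described here; `node` stands in for an FX node seen by the quantizer, and the helper name is illustrative rather than the actual quantizer code:

```python
import torch

def _is_float_tensor(node) -> bool:
    # Only floating-point tensors should be annotated for quantization;
    # boolean (and other non-fp) tensors are skipped.
    val = node.meta.get("val", None)
    return isinstance(val, torch.Tensor) and torch.is_floating_point(val)

# In the annotation loop (illustrative):
# if not _is_float_tensor(input_node):
#     continue  # skip annotating boolean/integer inputs
```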

Reviewed By: jerryzh168

Differential Revision: D55946526

fbshipit-source-id: d94bfee38ab2d29fc9672ab631b4d5d0c5239d25

* fix lint
Summary: Pull Request resolved: pytorch#3045

Reviewed By: clee2000

Differential Revision: D56201946

Pulled By: svekars

fbshipit-source-id: 4212c24b02a1229ff06137b0d437b4e8c5dd454e
(cherry picked from commit c73bfc0)

Co-authored-by: Svetlana Karslioglu <[email protected]>
Summary:
Move noindex logic to the build job

Pull Request resolved: pytorch#3071

Reviewed By: clee2000

Differential Revision: D56218857

Pulled By: svekars

fbshipit-source-id: 69dff489d98eee046d69185a6c03d62fbae37a16
(cherry picked from commit 5d7949d)

Co-authored-by: Svetlana Karslioglu <[email protected]>
…3114)

Summary:
Pull Request resolved: pytorch#3036

sdpa (https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) takes an attention mask as an input; refactor the SDPA module's inputs to more closely match the sdpa signature.
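
A minimal sketch of the idea, assuming a small wrapper module whose forward mirrors the sdpa signature by taking the attention mask as an explicit input (the class name is illustrative):

```python
import torch.nn.functional as F
from torch import nn

class SDPAWithMask(nn.Module):
    # forward takes (q, k, v, attn_mask), matching the arguments of
    # F.scaled_dot_product_attention instead of building the mask internally.
    def forward(self, q, k, v, attn_mask):
        return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```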
ghstack-source-id: 222650466
exported-using-ghexport

Reviewed By: mergennachin

Differential Revision: D56119739

fbshipit-source-id: d9adda66e540abc518b7ffb6a5ebd2aab1626b3b
(cherry picked from commit b341223)
Summary:
Pull Request resolved: pytorch#3113

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: cccclai

Differential Revision: D56279743

Pulled By: SS-JIA

fbshipit-source-id: af55cdf2d8518c582b7d8deccb731c6bc442a1c9
(cherry picked from commit 414cd05)

Co-authored-by: Sicheng Jia <[email protected]>
* Pin Xcode projects to release/0.2 branch

* Update the version for the iOS frameworks upload workflow
…ch#2975) (pytorch#3157)

Summary:
It was a workaround to skip the `aten.index_put` op in Core ML delegation, at the cost of partitioning the Llama model into 13 pieces.

For better performance, we prefer to delegate the whole model to Core ML. Since Core ML has added the [necessary support](apple/coremltools#2190), it is time to revert this workaround.

Pull Request resolved: pytorch#2975

Reviewed By: kirklandsign

Differential Revision: D56002979

Pulled By: cccclai

fbshipit-source-id: e7a7c8c43706cb57eba3e6f720b3d713bec5065b
(cherry picked from commit 7d4bafc)

Co-authored-by: yifan_shen3 <[email protected]>
Summary:
Pull Request resolved: pytorch#3037

Add a simple SDPA so it decomposes into simpler ops, instead of the full decomposition of F.scaled_dot_product_attention, which produces 29 ops including `torch.where` (a sketch of the simplified computation follows the graphs below):
```
def forward(self, q, k, v):
    aten_mul_scalar = executorch_exir_dialects_edge__ops_aten_mul_Scalar(q, 0.5946035575013605);  q = None
    aten_full_default = executorch_exir_dialects_edge__ops_aten_full_default([8, 8], True, dtype = torch.bool, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_arange_start_step = executorch_exir_dialects_edge__ops_aten_arange_start_step(0, 8, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_unsqueeze_copy_default = executorch_exir_dialects_edge__ops_aten_unsqueeze_copy_default(aten_arange_start_step, -2);  aten_arange_start_step = None
    aten_arange_start_step_1 = executorch_exir_dialects_edge__ops_aten_arange_start_step(0, 8, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_unsqueeze_copy_default_1 = executorch_exir_dialects_edge__ops_aten_unsqueeze_copy_default(aten_arange_start_step_1, -1);  aten_arange_start_step_1 = None
    aten_sub_tensor = executorch_exir_dialects_edge__ops_aten_sub_Tensor(aten_unsqueeze_copy_default, aten_unsqueeze_copy_default_1);  aten_unsqueeze_copy_default = aten_unsqueeze_copy_default_1 = None
    aten_le_scalar = executorch_exir_dialects_edge__ops_aten_le_Scalar(aten_sub_tensor, 0);  aten_sub_tensor = None
    aten_logical_and_default = executorch_exir_dialects_edge__ops_aten_logical_and_default(aten_le_scalar, aten_full_default);  aten_le_scalar = aten_full_default = None
    aten_full_like_default = executorch_exir_dialects_edge__ops_aten_full_like_default(aten_logical_and_default, 0, dtype = torch.float32, pin_memory = False, memory_format = torch.preserve_format)
    aten_logical_not_default = executorch_exir_dialects_edge__ops_aten_logical_not_default(aten_logical_and_default);  aten_logical_and_default = None
    aten_scalar_tensor_default = executorch_exir_dialects_edge__ops_aten_scalar_tensor_default(-inf, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'))
    aten_where_self = executorch_exir_dialects_edge__ops_aten_where_self(aten_logical_not_default, aten_scalar_tensor_default, aten_full_like_default);  aten_logical_not_default = aten_scalar_tensor_default = aten_full_like_default = None
    aten_permute_copy_default = executorch_exir_dialects_edge__ops_aten_permute_copy_default(k, [0, 1, 3, 2]);  k = None
    aten_mul_scalar_1 = executorch_exir_dialects_edge__ops_aten_mul_Scalar(aten_permute_copy_default, 0.5946035575013605);  aten_permute_copy_default = None
    aten_expand_copy_default = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten_mul_scalar, [1, 1, 8, 8]);  aten_mul_scalar = None
    aten_view_copy_default = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default, [1, 8, 8]);  aten_expand_copy_default = None
    aten_expand_copy_default_1 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten_mul_scalar_1, [1, 1, 8, 8]);  aten_mul_scalar_1 = None
    aten_view_copy_default_1 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_1, [1, 8, 8]);  aten_expand_copy_default_1 = None
    aten_bmm_default = executorch_exir_dialects_edge__ops_aten_bmm_default(aten_view_copy_default, aten_view_copy_default_1);  aten_view_copy_default = aten_view_copy_default_1 = None
    aten_view_copy_default_2 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_bmm_default, [1, 1, 8, 8]);  aten_bmm_default = None
    aten_add_tensor = executorch_exir_dialects_edge__ops_aten_add_Tensor(aten_view_copy_default_2, aten_where_self);  aten_view_copy_default_2 = aten_where_self = None
    aten__softmax_default = executorch_exir_dialects_edge__ops_aten__softmax_default(aten_add_tensor, -1, False);  aten_add_tensor = None
    aten_expand_copy_default_2 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten__softmax_default, [1, 1, 8, 8]);  aten__softmax_default = None
    aten_view_copy_default_3 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_2, [1, 8, 8]);  aten_expand_copy_default_2 = None
    aten_expand_copy_default_3 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(v, [1, 1, 8, 8]);  v = None
    aten_view_copy_default_4 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_3, [1, 8, 8]);  aten_expand_copy_default_3 = None
    aten_bmm_default_1 = executorch_exir_dialects_edge__ops_aten_bmm_default(aten_view_copy_default_3, aten_view_copy_default_4);  aten_view_copy_default_3 = aten_view_copy_default_4 = None
    aten_view_copy_default_5 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_bmm_default_1, [1, 1, 8, 8]);  aten_bmm_default_1 = None
    return (aten_view_copy_default_5,)
```
After applying the diff, we remove the following ops
```
    %aten_full_like_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.full_like.default](args = (%aten_index_tensor_2, 0), kwargs = {dtype: torch.float32, pin_memory: False, memory_format: torch.preserve_format})

    %aten_logical_not_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.logical_not.default](args = (%aten_index_tensor_2,), kwargs = {})

    %aten_scalar_tensor_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.scalar_tensor.default](args = (-inf,), kwargs = {dtype: torch.float32, layout: torch.strided, device: cpu})

    %aten_where_self : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.where.self](args = (%aten_logical_not_default, %aten_scalar_tensor_default, %aten_full_like_default), kwargs = {})

    %aten_mul_scalar : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Scalar](args = (%aten_permute_copy_default_3, 0.5946035575013605), kwargs = {})
    ...
    %aten_mul_scalar_1 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Scalar](args = (%aten_permute_copy_default_6, 0.5946035575013605), kwargs = {})
```
but introduce an add:
```
    %aten_add_tensor_3 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.add.Tensor](args = (%aten_mul_tensor_11, %aten_index_tensor_2), kwargs = {})
```
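
For reference, a minimal sketch of the simplified computation when the mask is already an additive float tensor, so no `torch.where`/`full_like` conversion is needed at export time; this illustrates the idea rather than reproducing the exact code in the diff:

```python
import math
import torch

def simple_sdpa(q, k, v, attn_mask):
    # q, k, v: [batch, heads, seq_len, head_dim]; attn_mask is a float tensor
    # added directly to the scores, avoiding the boolean-mask -> where path.
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    scores = scores + attn_mask
    attn = torch.softmax(scores, dim=-1)
    return torch.matmul(attn, v)
```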
ghstack-source-id: 223152096
exported-using-ghexport

Reviewed By: mergennachin, kimishpatel

Differential Revision: D56119737

fbshipit-source-id: ec8e875f0a4c4ec67b7493e4872c9a5b081e6de7
(cherry picked from commit cf78107)

pytorch-bot bot commented Apr 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3165

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit dc0b5bd with merge base d3326a2:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 19, 2024
@cccclai closed this Apr 19, 2024
@cccclai deleted the simple_sdpa branch April 19, 2024 19:03
@cccclai restored the simple_sdpa branch April 19, 2024 19:04