
[ET-VK][Llama] Apply XNNPACK partitioner as well when lowering to Vulkan #6857


Merged

2 commits merged into main on Nov 14, 2024

Conversation

pytorchbot (Collaborator)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #6830
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/147/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/147/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/146/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/147/orig
@diff-train-skip-merge

Pull Request resolved: #6829

## Context

In Vulkan, there is a limit on the number of elements a GPU buffer can contain. If a GPU buffer exceeds this limit, the API will either produce an error or exhibit undefined behaviour.
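As an illustration of the constraint (a sketch, not this PR's actual code; the helper name and the element-count interpretation of the limit are assumptions):

```python
import math

def fits_in_gpu_buffer(tensor_shape: tuple, buffer_limit: int) -> bool:
    # A tensor is only safe to place in a Vulkan GPU buffer if its total
    # element count stays under the limit; exceeding it risks an API
    # error or undefined behaviour, as noted above.
    return math.prod(tensor_shape) <= buffer_limit
```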

## Changes

Along with `texture_limits`, introduce a configurable `buffer_limit` entry in the partitioner configuration.
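A minimal sketch of the resulting configuration; aside from the `texture_limits` and `buffer_limit` names, which come from this PR, the dict shape and values are illustrative assumptions:

```python
# Sketch: Vulkan partitioner compile options with the new entry.
compile_options = {
    # Per-axis image texture extents (illustrative values).
    "texture_limits": (16384, 16384, 2048),
    # New: maximum element count for a delegated GPU buffer (assumed
    # units); ops touching larger tensors are left out of Vulkan.
    "buffer_limit": 1 << 27,
}
```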
ghstack-source-id: 253568943

Differential Revision: [D65899828](https://our.internmc.facebook.com/intern/diff/D65899828/)
Pull Request resolved: #6830

## Context

The final logit linear layer in the Transformer architecture involves extremely large tensors: both the output and weight tensors have a dimension equal to the vocabulary size, which can be very large. As a result, image textures cannot be used to execute this op when running with the Vulkan delegate, so an implementation using buffer-based tensors must be used.
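For a sense of scale (illustrative Llama-style numbers, not taken from this PR):

```python
# Illustrative sizes: the logit linear's weight and output both carry a
# dimension equal to the vocabulary size, so element counts explode.
dim, vocab_size, seq_len = 4096, 32000, 128  # assumed Llama-style values

weight_elems = dim * vocab_size      # 131,072,000 elements
output_elems = seq_len * vocab_size  # 4,096,000 elements per forward pass
```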

Unfortunately, the Vulkan delegate does not currently have a performant implementation of linear with buffer-based tensors. As a result, if this final linear layer executes in Vulkan, model inference is extremely slow.

## Changes

This diff prevents the final logit linear layer from being delegated to Vulkan by enforcing a GPU buffer limit.

It also modifies the export llama script to apply the XNNPACK partitioner after the Vulkan partitioner when lowering to Vulkan, ensuring that the remaining ops are accelerated with XNNPACK. For 4-bit quantization, an additional quantizer is applied after the Vulkan quantizer (which skips the final logit linear layer) so that the final logit linear can be quantized as well. A sketch of this partitioner ordering follows.
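A minimal sketch of that two-stage lowering, under stated assumptions: whether the export llama script uses this exact `to_edge_transform_and_lower` API is an assumption, the `buffer_limit` value is illustrative, and `TinyModel` is a stand-in for the actual Llama model.

```python
import torch
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Stand-in model; the real script exports a Llama transformer.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Linear(64, 128)

    def forward(self, x):
        return self.logits(x)

exported = torch.export.export(TinyModel(), (torch.randn(1, 64),))

# Vulkan partitions first; XNNPACK then claims whatever Vulkan skipped
# (e.g. a logit linear excluded by the buffer limit).
edge = to_edge_transform_and_lower(
    exported,
    partitioner=[
        VulkanPartitioner({"buffer_limit": 1 << 27}),  # option name per this PR; value assumed
        XnnpackPartitioner(),
    ],
)
```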

## Long Term

This is a temporary measure while an optimized buffer-based linear implementation is developed. Once the Vulkan implementation achieves performance parity with XNNPACK, the final logit linear will be delegated to Vulkan once more.
ghstack-source-id: 253568942

Differential Revision: [D65899827](https://our.internmc.facebook.com/intern/diff/D65899827/)

pytorch-bot bot commented Nov 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6857

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures

As of commit 5d6e508 with merge base ecdc007:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the ciflow/periodic and module: vulkan (Issues related to the Vulkan delegate and code under backends/vulkan/) labels on Nov 14, 2024
facebook-github-bot added the CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) label on Nov 14, 2024
Base automatically changed from gh/SS-JIA/146/orig to main November 14, 2024 18:18
SS-JIA merged commit f32cffd into main on Nov 14, 2024
73 of 75 checks passed
SS-JIA deleted the gh/SS-JIA/147/orig branch on November 14, 2024 at 18:18